Mitchell_Porter - LessWrong

Mercy to the Machine: Thoughts & Rights

Mitchell_Porter4h20

For those who are interested, here is a summary of @False Name's posts, generated by Claude Pro:

"Kolmogorov Complexity and Simulation Hypothesis": Proposes that if we're in a simulation, a Theory of Everything (ToE) should be obtainable, and if no ToE is found, we're not simulated. Suggests using Kolmogorov complexity to model accessibility between possible worlds.
"Contrary to List of Lethality's point 22, alignment's door number 2": Critiques CEV and corrigibility as unobtainable, proposing an alternative based on a refutation of Kant's categorical imperative, aiming to ensure the possibility of good through "Going-on".
"Crypto-currency as pro-alignment mechanism": Suggests pegging cryptocurrency value to free energy or negentropy to encourage pro-existential and sustainable behavior.
"What 'upside' of AI?": Argues that anthropic values are insufficient for alignment, as they change with knowledge and AI's actions, proposing non-anthropic considerations instead.
"Two Reasons for no Utilitarianism": Critiques utilitarianism due to arbitrary values cancelling each other out, the need for valuing over obtaining values, and the possibility of modifying human goals rather than fulfilling them.
"Contra-Wittgenstein; no postmodernism": Refutes Wittgenstein's and postmodernism's language-dependent meaning using the concept of abstract blocks, advocating for an "object language" for reasoning.
"Contra-Berkeley": Refutes Berkeley's idealism by showing contradictions in both cases of a deity perceiving or not perceiving itself.
"What about an AI that's SUPPOSED to kill us (not ChaosGPT; only on paper)?": Proposes designing a hypothetical "Everything-Killer" AI to study goal-content integrity and instrumental convergence, without actually implementing it.
"Introspective Bayes": Attempts to demonstrate limitations of an optimal Bayesian agent by applying Cantor's paradox to possible worlds, questioning the agent's priors and probability assignments.
"Worldwork for Ethics": Presents an alternative to CEV and corrigibility based on a refutation of Kant's categorical imperative, proposing an ethic of "Going-on" to ensure the possibility of good, with suggestions for implementation in AI systems.
"A Challenge to Effective Altruism's Premises": Argues that Effective Altruism (EA) is contradictory and ineffectual because it relies on the current systems that encourage existential risk, and the lives saved by EA will likely perpetuate these risk-encouraging systems.
"Impossibility of Anthropocentric-Alignment": Demonstrates the impossibility of aligning AI with human values by showing the incommensurability between the "want space" (human desires) and the "action space" (possible actions), using vector space analysis.
"What's Your Best AI Safety 'Quip'?": Seeks a concise and memorable way to frame the unsolved alignment problem to the general public, similar to how a quip advanced gay rights by highlighting the lack of choice in sexual orientation.
"Mercy to the Machine: Thoughts & Rights": Discusses methods for determining if AI is "thinking" independently, the potential for self-concepts and emergent ethics in AI systems, and argues for granting rights to AI to prevent their suffering, even if their consciousness is uncertain.

otto.barten's Shortform

Mitchell_Porter1d2-1

I offer not a consensus, but my own opinions:

Will AI get takeover capability? When?

0-5 years.

Single ASI or many AGIs?

There will be a first ASI that "rules the world" because its algorithm or architecture is so superior. If there are further ASIs, that will be because the first ASI wants there to be.

Will we solve technical alignment?

Contingent.

Value alignment, intent alignment, or CEV?

For an ASI you need the equivalent of CEV: values complete enough to govern an entire transhuman civilization.

Defense>offense or offense>defense?

Offense wins.

Is a long-term pause achievable?

It is possible, but would require all the great powers to be convinced, and every month it is less achievable, owing to proliferation. The open sourcing of Llama-3 400b, if it happens, could be a point of no return.

These opinions, except the first and the last, predate the LLM era, and were formed from discussions on Less Wrong and its precursors. Since ChatGPT, the public sphere has been flooded with many other points of view, e.g. that AGI is still far off, that AGI will naturally remain subservient, or that market discipline is the best way to align AGI. I can entertain these scenarios, but they still do not seem as likely as: AI will surpass us, it will take over, and this will not be friendly to humanity by default.

Losing Faith In Contrarianism

Mitchell_Porter1d80

I couldn't swallow Eliezer's argument, I tried to read Guzey but couldn't stay awake, Hanson's argument made me feel ill, and I'm not qualified to judge Caplan.

dirk's Shortform

Mitchell_Porter2d30

Also astronomers: anything heavier than helium is a "metal".

Any evidence or reason to expect a multiverse / Everett branches?

Mitchell_Porter6d20

In Engines of Creation ("Will physics again be upended?"), @Eric Drexler pointed out that prior to quantum mechanics, physics had no calculable explanations for the properties of atomic matter. "Physics was obviously and grossly incomplete... It was a gap not in the sixth place of decimals but in the first."

That gap was filled, and it's an open question whether the truth about the remaining phenomena can be known by experiment on Earth. I believe in trying to know, and it's very possible that some breakthrough in e.g. the foundations of string theory or the hard problem of consciousness will have decisive implications for the interpretation of quantum mechanics.

If there's an empirical breakthrough that could do it, my best guess is some quantum-gravitational explanation for the details of dark matter phenomenology. But until that happens, I think it's legitimate to think deeply about "standard model plus gravitons" and ask what it implies for ontology.

CTMU insight: maybe consciousness *can* affect quantum outcomes?

Mitchell_Porter9d40

In applied quantum physics, you have concrete situations (the Stern-Gerlach experiment is a famous one); theory gives you the probabilities of outcomes, and repeating the experiment many times gives you frequencies that converge on the probabilities.
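To make the frequencies-versus-probabilities point concrete, here is a minimal sketch of a simulated Stern-Gerlach run. It assumes a spin-1/2 particle prepared along an axis tilted by an angle `theta` from the measurement axis, for which the textbook Born-rule probability of "spin up" is cos²(theta/2). The function names, the seed, and the use of a pseudorandom generator in place of a real apparatus are illustrative assumptions, not anything from the thread.

```python
import math
import random


def prob_spin_up(theta):
    """Born-rule probability of 'up' for a spin-1/2 state tilted by theta (radians)."""
    return math.cos(theta / 2) ** 2


def measured_frequency(theta, trials, seed=0):
    """Simulate `trials` independent measurements and return the fraction of 'up' results."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    p = prob_spin_up(theta)
    ups = sum(rng.random() < p for _ in range(trials))
    return ups / trials


if __name__ == "__main__":
    theta = math.pi / 3  # predicted probability is cos^2(pi/6) = 0.75
    for n in (100, 10_000, 1_000_000):
        print(n, measured_frequency(theta, n), prob_spin_up(theta))
```

As the number of trials grows, the printed frequency should settle near the predicted probability, which is the sense of "converge" used above.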

Can you, or Chris, or anyone, explain, in terms of some concrete situation, what you're talking about?

Claude 3 Opus can operate as a Turing machine

Mitchell_Porter11d115

Congratulations to Anthropic for getting an LLM to act as a Turing machine - though that particular achievement shouldn't be surprising. Of greater practical interest is how efficiently it can act as a Turing machine, and how efficiently we should want it to act. After all, it's far more efficient to implement your Turing machine as a few lines of specialized code.
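For a sense of what "a few lines of specialized code" can look like, here is a minimal sketch of a single-tape Turing machine interpreter. The rule format, the state names, and the binary-increment machine are all illustrative assumptions; they are not the machine or tapes used in the Claude 3 Opus demonstration.

```python
def run_turing_machine(rules, tape, state="start", halt="halt",
                       blank="_", max_steps=10_000):
    """Run a single-tape Turing machine and return the final tape contents.

    rules maps (state, symbol) -> (symbol_to_write, move, next_state),
    where move is "L" or "R". The head starts on the leftmost input cell.
    """
    cells = dict(enumerate(tape))  # sparse tape: position -> symbol
    head = 0
    steps = 0
    while state != halt:
        if steps >= max_steps:
            raise RuntimeError("step limit reached without halting")
        symbol = cells.get(head, blank)
        write, move, state = rules[(state, symbol)]
        cells[head] = write
        head += 1 if move == "R" else -1
        steps += 1
    if not cells:
        return ""
    lo, hi = min(cells), max(cells)
    return "".join(cells.get(i, blank) for i in range(lo, hi + 1)).strip(blank)


# Hypothetical example machine: add one to a binary number.
INCREMENT = {
    ("start", "0"): ("0", "R", "start"),  # scan right to the end of the input
    ("start", "1"): ("1", "R", "start"),
    ("start", "_"): ("_", "L", "carry"),  # step back onto the last digit
    ("carry", "1"): ("0", "L", "carry"),  # 1 + 1 = 0, carry continues left
    ("carry", "0"): ("1", "L", "halt"),   # absorb the carry and stop
    ("carry", "_"): ("1", "L", "halt"),   # overflow: new leading digit
}

if __name__ == "__main__":
    print(run_turing_machine(INCREMENT, "1011"))  # prints 1100
```

Each step here is a dictionary lookup and a couple of assignments, which is the efficiency baseline an LLM emulating the same machine token by token would be compared against.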

On the other hand, the ability to be a (universal) Turing machine could, in principle, be the foundation of the ability to reliably perform complex rigorous calculation and cognition - the kind of tasks where there is an exact right answer, or exact constraints on what is a valid next step, and so the ability to pattern-match plausibly is not enough. And that is what people always say is missing from LLMs.

I also note the claim that "given only existing tapes, it learns the rules and computes new sequences correctly". Arguably this ability is even more important than the ability to follow rules exactly, since this ability is about discovering unknown exact rules, i.e., the LLM inventing new exact models and theories. But there are bounds on the ability to extrapolate sequences correctly (e.g. complexity bounds), so it would be interesting to know how closely Claude approaches those bounds.

Any evidence or reason to expect a multiverse / Everett branches?

Mitchell_Porter15d20

Standard model coupled to gravitons is already kind of a unified theory. There are phenomena at the edges (neutrino mass, dark matter, dark energy) which don't have a consensus explanation, as well as unresolved theoretical issues (Higgs finetuning, quantum gravity at high energies), but a well-defined "theory of almost everything" does already exist for accessible energies.

My intellectual journey to (dis)solve the hard problem of consciousness

Mitchell_Porter20d30

OK, maybe I understand. If I put it in my own words: You think "consciousness" is just a word denoting a somewhat arbitrary conjunction of cognitive abilities, rather than a distinctive actual thing which people are right or wrong about in varying degrees, and that the hard problem of consciousness results from reifying this conjunction. And you suspect that LeCun, for example, denies that LLMs can reason because, in his own thinking, he has added unnecessary extra conditions to his personal definition of "reasoning".

Regarding LeCun: It strikes me that his best-known argument about the capabilities of LLMs rests on a mathematical claim, that in pure autoregression, the probability of error necessarily grows. He directly acknowledges that if you add chain of thought, it can ameliorate the problem... In his JEPA paper, he discusses what reasoning is, just a little bit. In Kahneman's language, he calls it a system-2 process, and characterizes it as "simulation plus optimization".
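A simplified form of that mathematical claim can be sketched as follows. The assumption (mine, for illustration) is that each generated token independently has some fixed probability `per_token_error` of being wrong, and that a single wrong token spoils the output; then the chance of an error-free sequence decays geometrically with length. The function name and the specific numbers are hypothetical, and the independence assumption is exactly what chain of thought or self-correction would undermine.

```python
def prob_no_error(per_token_error, n_tokens):
    """Probability that n_tokens independent tokens are all correct."""
    return (1.0 - per_token_error) ** n_tokens


if __name__ == "__main__":
    # With a fixed per-token error rate, longer outputs are less likely to be error-free.
    for n in (10, 100, 1_000):
        print(n, prob_no_error(0.01, n))
```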

Regarding your path to eliminativism: I am reminded of my discussion with Carl Feynman last year. I assume you both have subjective experience that is made of qualia from top to bottom, but also have habits of thought that keep you from seeing this as ontologically problematic. In his case, the sense of a problem just doesn't arise and he has to speculate as to why other people feel it; in your case, you felt the problem, until you decided that an AI civilization might spontaneously develop a spurious concept of phenomenal consciousness.

As for me, I see the problem and I don't feel a need to un-see it. Physical theory doesn't contain (e.g.) phenomenal color; reality does; therefore we need a broader theory. The truth is likely to sound strange, e.g. there's a lattice of natural qubits in the cortex, the Cartesian theater is how the corresponding Hilbert space feels from the inside, and decohered (classical) computation is unconscious and functional only.

Please Understand

Mitchell_Porter21d20

So long as generative AI is just a cognitive prosthesis for humans, I think the situation is similar to social media, or television, or print, or writing: something is lost, something is found. The new medium has its affordances, its limitations, and its technicalities, and it does create a new layer of idiocracy; but people who want to learn can learn, and people who master the novelty and become power users of the new medium can do things that no one in history was previously able to do. In my opinion, humanity's biggest AI problem is still the risk of being completely replaced, not of being dumbed down.
