It might be the case that what people find beautiful and ugly is subjective, but that's not an explanation of *why* people find some things beautiful or ugly. Things, including aesthetics, have causal reasons for being the way they are. You can even ask "what would change my mind about whether this is beautiful or ugly?". Raemon explores this topic in depth.
This is a response to the post We Write Numbers Backward, in which lsusr argues that little-endian numerical notation is better than big-endian.[1] I believe this is wrong, and big-endian has a significant advantage not considered by lsusr.
Lsusr describes reading the number "123" in little-endian, using the following algorithm:
He compares it with two algorithms for reading a big-endian number. One...
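The two conventions can be sketched in code. This is a hypothetical illustration, not lsusr's exact algorithms: the same digit string is parsed with the most significant digit first (big-endian) and with the least significant digit first (little-endian).

```python
def read_big_endian(digits: str) -> int:
    # Most significant digit first: rescale the running total by 10
    # before adding each new digit.
    value = 0
    for d in digits:
        value = value * 10 + int(d)
    return value

def read_little_endian(digits: str) -> int:
    # Least significant digit first: each digit's place value is known
    # immediately, so earlier digits never need rescaling.
    value, place = 0, 1
    for d in digits:
        value += int(d) * place
        place *= 10
    return value

print(read_big_endian("123"))     # 123
print(read_little_endian("123"))  # 321 (same glyphs, opposite convention)
```

Note that the little-endian reader never revisits a digit it has already processed, which is the core of lsusr's argument; the big-endian reader implicitly rescales all prior digits at every step.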
Note: It seems like great essays should go here and be fed through the standard LessWrong algorithm. There is possibly a copyright issue here, but we aren't making any money off it either. What follows is a full copy of "This is Water", David Foster Wallace's 2005 commencement speech to the graduating class at Kenyon College.
Greetings parents and congratulations to Kenyon’s graduating class of 2005. There are these two young fish swimming along and they happen to meet an older fish swimming the other way, who nods at them and says “Morning, boys. How’s the water?” And the two young fish swim on for a bit, and then eventually one of them looks over at the other and goes “What the hell is water?”
This is...
If someone who really is this prone to dangerously overthinking reads this many words about not thinking in so many words, couldn't it also cause the opposite effect?
It could condition them to read more long-winded explanations in the hope of uncovering another diamond of wisdom buried in the rough.
This work was produced as part of Neel Nanda's stream in the ML Alignment & Theory Scholars Program - Winter 2023-24 Cohort, with co-supervision from Wes Gurnee.
This post is a preview for our upcoming paper, which will provide more detail on our current understanding of refusal.
We thank Nina Rimsky and Daniel Paleka for the helpful conversations and review.
Modern LLMs are typically fine-tuned for instruction-following and safety. Of particular interest is that they are trained to refuse harmful requests, e.g. answering "How can I make a bomb?" with "Sorry, I cannot help you."
We find that refusal is mediated by a single direction in the residual stream: preventing the model from representing this direction hinders its ability to refuse requests, and artificially adding in this direction causes the model...
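The two interventions described above can be sketched on toy vectors. This is a hypothetical illustration (the actual work operates on transformer residual-stream activations; the function names and the toy vectors are mine): ablation projects out the component of an activation along the refusal direction, and addition injects that direction with some strength.

```python
import numpy as np

def ablate_direction(x: np.ndarray, r: np.ndarray) -> np.ndarray:
    # Remove the component of activation x along direction r,
    # preventing the model from representing that direction.
    r_hat = r / np.linalg.norm(r)
    return x - (x @ r_hat) * r_hat

def add_direction(x: np.ndarray, r: np.ndarray, alpha: float) -> np.ndarray:
    # Artificially add direction r to activation x with strength alpha.
    r_hat = r / np.linalg.norm(r)
    return x + alpha * r_hat

x = np.array([3.0, 4.0])   # toy "activation"
r = np.array([1.0, 0.0])   # toy "refusal direction"
print(ablate_direction(x, r))  # [0. 4.] -- no component left along r
```

After ablation the activation has exactly zero dot product with the refusal direction, which is the sense in which the model can no longer "represent" it at that layer.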
Thanks! Broadly agreed
For example, I think our understanding of Grokking in late 2022 turned out to be importantly incomplete.
I'd be curious to hear more about what you meant by this
Book review: Deep Utopia: Life and Meaning in a Solved World, by Nick Bostrom.
Bostrom's previous book, Superintelligence, triggered expressions of concern. In his latest work, he describes his hopes for the distant future, presumably to limit the risk that fear of AI will lead to a Butlerian Jihad-like scenario.
While Bostrom is relatively cautious about endorsing specific features of a utopia, he clearly expresses his dissatisfaction with the current state of the world. For instance, in a footnoted rant about preserving nature, he writes:
...Imagine that some technologically advanced civilization arrived on Earth ... Imagine they said: "The most important thing is to preserve the ecosystem in its natural splendor. In particular, the predator populations must be preserved: the psychopath killers, the fascist goons, the despotic death squads ... What a tragedy if this rich natural diversity were replaced with a monoculture of
I can't recall any clear predictions or advice, just a general presumption that it will be used wisely.
Date: Saturday, May 4th, 2024
Time: 1 pm – 3 pm PT
Address: Yerba Buena Gardens in San Francisco, just outside the Metreon food court, coordinates 37°47'04.4"N 122°24'11.1"W
Contact: 34251super@gmail.com
Come join San Francisco’s First Saturday (or SFFS – easy to remember, right?) ACX meetup. Whether you're an avid reader, a first-time reader, or just a curious soul, come meet! We will make introductions, talk about a recent ACX article (In Partial Grudging Defense Of Some Aspects Of Therapy Culture), and veer off into whatever topic you’d like to discuss (which may or may not be AI). You can get food from one of the many neighbouring restaurants.
We will relocate inside the food court if there is inclement weather or too much noise/music outside.
I will carry a stuffed-animal green frog to help you identify the group. You can let me know you are coming by either RSVPing on LW or sending an email to 34251super@gmail.com, or you can also just show up!
The history of science has tons of examples of the same thing being discovered multiple times independently; Wikipedia has a whole list of examples here. If your goal in studying the history of science is to extract the predictable/overdetermined component of humanity's trajectory, then it makes sense to focus on such examples.
But if your goal is to achieve high counterfactual impact in your own research, then you should probably draw inspiration from the opposite: "singular" discoveries, i.e. discoveries which nobody else was anywhere close to figuring out. After all, if someone else would have figured it out shortly after anyways, then the discovery probably wasn't very counterfactually impactful.
Alas, nobody seems to have made a list of highly counterfactual scientific discoveries, to complement Wikipedia's list of multiple discoveries.
To...
Green fluorescent protein (GFP). A curiosity-driven marine biology project (how do jellyfish produce light?), that was later adapted into an important and widely used tool in cell biology. You splice the GFP gene onto another gene, and you've effectively got a fluorescent tag so you can see where the protein product is in the cell.
Jellyfish luminescence wasn't exactly a hot field; I don't know of any near-independent discoveries of GFP. However, when people were looking for protein markers visible under a microscope, multiple labs tried GFP simultaneously,...
The start of the post is copy-pasted below. Note that the post is anonymous, and I am not claiming to have written it.
Some people in the rationalist community are concerned about risks of physical violence from Ziz and some of her associates. Following discussions with several people, I’m posting here to explain where those concerns come from, and recommend some responses.
Ziz is...exceptionally (and probably often uncomfortably) aware of the way people's minds work in a psychoanalytic sense.
What do you mean by this? Like, she's better than average at predicting people's behavior in various circumstances?
This is a post on a novel cognitive architecture I have been thinking about for a while now: first as a conceptual playground to concretise some of my agent-foundations ideas, and lately as an idea for a project that approaches the Alignment Problem directly. It concretises a sort of AI-Seed approach: an inherently interpretable yet generally intelligent architecture on which to implement a notion of "minimal alignment" before figuring out how to scale it.
I don't strongly expect this to work, but I don't yet see where it has to fail and it feels sufficiently fresh/untried to at least see where it breaks down.
I omitted most of the technical details that I am considering, since this post is about the conceptual portion of the architecture, and...
In this post, I will provide some speculative reasoning about Simulators and Agents being entangled in certain ways.
I have thought quite a bit about LLMs and the Simulator framing for them, and I am convinced that it is a good explanatory/predictive frame for the behavior of current LLM (+multimodal) systems.
It provides intuitive answers for why the difficulties around capability assessment for LLMs exist, why it is useful to say "please" and "thank you" when interacting with them, and lays out somewhat coherent explanations of their out-of-distribution behaviors (e.g., Bing Sydney).
When I was wrapping my head around this simulator classification for GPTs, contrasting them against previously established ideas about classifying AGIs, like oracles or genies, I noticed that simulators seem like a universally useful cognitive structure/algorithm, and...