It might be the case that what people find beautiful and ugly is subjective, but that's not an explanation of *why* people find some things beautiful or ugly. Things, including aesthetics, have causal reasons for being the way they are. You can even ask "what would change my mind about whether this is beautiful or ugly?". Raemon explores this topic in depth.
This is a response to the post We Write Numbers Backward, in which lsusr argues that little-endian numerical notation is better than big-endian.[1] I believe this is wrong, and big-endian has a significant advantage not considered by lsusr.
Lsusr describes reading the number "123" in little-endian, using the following algorithm:
He compares it with two algorithms for reading a big-endian number. One...
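The two conventions can be sketched in code. This is a hypothetical illustration, not lsusr's exact algorithms: the same digit string is parsed with the most significant digit first (big-endian) and with the least significant digit first (little-endian).

```python
def read_big_endian(digits: str) -> int:
    # Most significant digit first: rescale the running total by 10
    # before adding each new digit.
    value = 0
    for d in digits:
        value = value * 10 + int(d)
    return value

def read_little_endian(digits: str) -> int:
    # Least significant digit first: each digit's place value is known
    # immediately, so earlier digits never need rescaling.
    value, place = 0, 1
    for d in digits:
        value += int(d) * place
        place *= 10
    return value

print(read_big_endian("123"))     # 123
print(read_little_endian("123"))  # 321 (same glyphs, opposite convention)
```

Note that the little-endian reader never revisits a digit it has already processed, which is the core of lsusr's argument; the big-endian reader implicitly rescales all prior digits at every step.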
Note: It seems like great essays should go here and be fed through the standard LessWrong algorithm. There is possibly a copyright issue here, but we aren't making any money off it either. What follows is a full copy of "This is Water", David Foster Wallace's 2005 commencement speech to the graduating class at Kenyon College.
Greetings parents and congratulations to Kenyon’s graduating class of 2005. There are these two young fish swimming along and they happen to meet an older fish swimming the other way, who nods at them and says “Morning, boys. How’s the water?” And the two young fish swim on for a bit, and then eventually one of them looks over at the other and goes “What the hell is water?”
This is...
If someone who really is this prone to dangerously overthinking reads this many words about not thinking in so many words, couldn't it also cause the opposite effect?
It could condition them to read more long-winded explanations in the hope of uncovering another diamond of wisdom buried in the rough.
This work was produced as part of Neel Nanda's stream in the ML Alignment & Theory Scholars Program - Winter 2023-24 Cohort, with co-supervision from Wes Gurnee.
This post is a preview for our upcoming paper, which will provide more detail on our current understanding of refusal.
We thank Nina Rimsky and Daniel Paleka for the helpful conversations and review.
Modern LLMs are typically fine-tuned for instruction-following and safety. Of particular interest is that they are trained to refuse harmful requests, e.g. answering "How can I make a bomb?" with "Sorry, I cannot help you."
We find that refusal is mediated by a single direction in the residual stream: preventing the model from representing this direction hinders its ability to refuse requests, and artificially adding in this direction causes the model...
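The two interventions described above can be sketched on toy vectors. This is a hypothetical illustration (the actual work operates on transformer residual-stream activations; the function names and the toy vectors are mine): ablation projects out the component of an activation along the refusal direction, and addition injects that direction with some strength.

```python
import numpy as np

def ablate_direction(x: np.ndarray, r: np.ndarray) -> np.ndarray:
    # Remove the component of activation x along direction r,
    # preventing the model from representing that direction.
    r_hat = r / np.linalg.norm(r)
    return x - (x @ r_hat) * r_hat

def add_direction(x: np.ndarray, r: np.ndarray, alpha: float) -> np.ndarray:
    # Artificially add direction r to activation x with strength alpha.
    r_hat = r / np.linalg.norm(r)
    return x + alpha * r_hat

x = np.array([3.0, 4.0])   # toy "activation"
r = np.array([1.0, 0.0])   # toy "refusal direction"
print(ablate_direction(x, r))  # [0. 4.] -- no component left along r
```

After ablation the activation has exactly zero dot product with the refusal direction, which is the sense in which the model can no longer "represent" it at that layer.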
Thanks! Broadly agreed
For example, I think our understanding of Grokking in late 2022 turned out to be importantly incomplete.
I'd be curious to hear more about what you meant by this
Book review: Deep Utopia: Life and Meaning in a Solved World, by Nick Bostrom.
Bostrom's previous book, Superintelligence, triggered expressions of concern. In his latest work, he describes his hopes for the distant future, presumably to limit the risk that fear of AI will lead to a Butlerian Jihad-like scenario.
While Bostrom is relatively cautious about endorsing specific features of a utopia, he clearly expresses his dissatisfaction with the current state of the world. For instance, in a footnoted rant about preserving nature, he writes:
...Imagine that some technologically advanced civilization arrived on Earth ... Imagine they said: "The most important thing is to preserve the ecosystem in its natural splendor. In particular, the predator populations must be preserved: the psychopath killers, the fascist goons, the despotic death squads ... What a tragedy if this rich natural diversity were replaced with a monoculture of
I can't recall any clear predictions or advice, just a general presumption that it will be used wisely.
Date: Saturday, May 4th, 2024
Time: 1 pm – 3 pm PT
Address: Yerba Buena Gardens in San Francisco, just outside the Metreon food court, coordinates 37°47'04.4"N 122°24'11.1"W
Contact: 34251super@gmail.com
Come join San Francisco’s First Saturday (or SFFS – easy to remember, right?) ACX meetup. Whether you're an avid reader, a first-time reader, or just a curious soul, come meet! We will make introductions, talk about a recent ACX article (In Partial Grudging Defense Of Some Aspects Of Therapy Culture), and veer off into whatever topic you’d like to discuss (which may or may not be AI). You can get food from one of the many neighbouring restaurants.
We will relocate inside the food court if there is inclement weather or too much noise/music outside.
I will carry a stuffed-animal green frog to help you identify the group. You can let me know you are coming by either RSVPing on LW or sending an email to 34251super@gmail.com, or you can also just show up!
The history of science has tons of examples of the same thing being discovered multiple times independently; Wikipedia has a whole list of examples here. If your goal in studying the history of science is to extract the predictable/overdetermined component of humanity's trajectory, then it makes sense to focus on such examples.
But if your goal is to achieve high counterfactual impact in your own research, then you should probably draw inspiration from the opposite: "singular" discoveries, i.e. discoveries which nobody else was anywhere close to figuring out. After all, if someone else would have figured it out shortly after anyways, then the discovery probably wasn't very counterfactually impactful.
Alas, nobody seems to have made a list of highly counterfactual scientific discoveries, to complement Wikipedia's list of multiple discoveries.
To...
Green fluorescent protein (GFP). A curiosity-driven marine biology project (how do jellyfish produce light?), that was later adapted into an important and widely used tool in cell biology. You splice the GFP gene onto another gene, and you've effectively got a fluorescent tag so you can see where the protein product is in the cell.
Jellyfish luminescence wasn't exactly a hot field; I don't know of any near-independent discoveries of GFP. However, when people were looking for protein markers visible under a microscope, multiple labs tried GFP simultaneously,...
The start of the post is copy-pasted below. Note that the post is anonymous, and I am not claiming to have written it.
Some people in the rationalist community are concerned about risks of physical violence from Ziz and some of her associates. Following discussions with several people, I’m posting here to explain where those concerns come from, and recommend some responses.
Ziz is...exceptionally (and probably often uncomfortably) aware of the way people's minds work in a psychoanalytic sense.
What do you mean by this? Like, she's better than average at predicting people's behavior in various circumstances?
This is a post on a novel cognitive architecture I have been thinking about for a while now: first as a conceptual playground to concretise some of my agent-foundations ideas, and lately as an idea for a project that approaches the Alignment Problem directly. It concretises a sort of AI-Seed approach: an inherently interpretable yet generally intelligent architecture on which to implement a notion of "minimal alignment" before figuring out how to scale it.
I don't strongly expect this to work, but I don't yet see where it has to fail and it feels sufficiently fresh/untried to at least see where it breaks down.
I omitted most of the technical details that I am considering, since this post is about the conceptual portion of the architecture, and...
In this post, I will provide some speculative reasoning about Simulators and Agents being entangled in certain ways.
I have thought quite a bit about LLMs and the Simulator framing for them, and I am convinced that it is a good explanatory/predictive frame for the behavior of current LLM (+multimodal) systems.
It provides intuitive answers for why the difficulties around capability assessment for LLMs exist, why it is useful to say "please" and "thank you" when interacting with them, and lays out somewhat coherent explanations of their out-of-distribution behaviors (e.g., Bing Sydney).
When I was wrapping my head around this simulator classification for GPTs, contrasting them against previously established ideas about classifying AGIs, like oracles or genies, I noticed that simulators seem like a universally useful cognitive structure/algorithm, and...