All Posts

Sorted by Magic (New & Upvoted)

Monday, March 18th 2024

Quick Takes
16Richard_Kennaway14h
Today in Azathoth news: "Eurasian hoopoes raise extra chicks so they can be eaten by their siblings" It seems that the hoopoes lay extra eggs in times of abundance — more than they would be able to see through to fledging — as a way of storing up food for the older siblings. It is rather gruesomely called the "larder" hypothesis. Literal baby-eaters!
14Richard_Ngo10h
The idea that maximally-coherent agents look like squiggle-maximizers raises the question: what would it look like for humans to become maximally coherent? One answer, which Yudkowsky gives here, is that conscious experiences are just a "weird and more abstract and complicated pattern that matter can be squiggled into". But that seems to be in tension with another claim he makes, that there's no way for one agent's conscious experiences to become "more real" except at the expense of other conscious agents—a claim which, according to him, motivates average utilitarianism across the multiverse. Clearly a squiggle-maximizer would not be an average squigglean. So what's the disanalogy here? It seems like @Eliezer Yudkowsky is basically using SSA, but comparing between possible multiverses—i.e. when facing the choice between creating agent A or not, you look at the set of As in the multiverse where you decided yes, and compare it to the set of As in the multiverse where you decided no, and (if you're deciding for the good of A) you pick whichever one gives A a better time on average. Yudkowsky has written before (can't find the link) that he takes this approach because alternatives would entail giving up on predictions about his future experiences—e.g. constantly predicting he's a Boltzmann brain and will dissolve in the next second. But this argument by Wei Dai shows that agents which reason in this way can be money-pumped by creating arbitrarily short-lived copies of them. Based on this I claim that Yudkowsky's preferences are incoherent, and that the only coherent thing to do here is to "expect to be" a given copy in proportion to the resources it will have available, as anthropic decision theory claims. (Incidentally, this also explains why we're at the hinge of history.) But this is just an answer, it doesn't dissolve the problem. What could? Some wild guesses: 1. You are allowed to have preferences about the external world, and you are allowed to have prefer
14lynettebye17h
Note: I consider this post to be “Lynette speculates based on one possible model”, rather than “scientific evidence shows”, based on my default skepticism for psych research. A recent Astral Codex Ten post argued that advice is written by people who struggle because they put tons of time into understanding the issue. People who succeeded effortlessly don’t have explicit models of how they perform (section II). It’s not the first time I’ve seen this argument, e.g. this Putanumonit post arguing that explicit rules help poor performers, who then abandon the rules and just act intuitively once they become good. This reminded me of a body of psych research I half-remembered from college called Choking under Pressure. My memory was that if you think about what you’re doing too much after becoming good, then you do worse. The paper I remembered from college was from 1986, so I found “Choking interventions in sports: A systematic review” from 2017. It turns out that I was remembering the “self-focused” branch of choking research. "Self-focus approaches have largely been extended from Baumeister’s (1984) automatic execution hypothesis. Baumeister explains that choking occurs because, when anxiety increases, the athlete allocates conscious attention to movement execution. This conscious attention interferes with otherwise automatic nature of movement execution, which results in performance decrements." (Slightly worrying. I have no particular reason to doubt this body of work, but Baumeister's "willpower as muscle" -- i.e. ego depletion -- work hasn't stood up well.) Two studies found that distraction while training negatively impacted performance. I’m not sure if this was supposed to acclimatize the participants to distractions while performing or reduce their self-focus while training. (I’m taking the paper’s word and not digging beyond the surface on the numbers.) Either way, I feel very little surprise that practicing while distracted was worse. Maybe we
8Rocket1d
I just came across this word from John Koenig's Dictionary of Obscure Sorrows that nicely captures the thesis of All Debates Are Bravery Debates.
1CstineSublime20h
Most colour grading tutorials use professionally shot footage, which is only helpful if you are working with well-shot material: it has a large amount of latitude and will look good if you use the right off-the-shelf plugins or LUTs. I'm currently trying to match, or at least integrate, a shot with blown-out highlights (the visual equivalent of a 'clipped' signal in audio) into an otherwise nicely exposed and graded sequence of footage. I keep failing at it; I've tried everything short of changing the sequence to better match the weakest link, as it were. I haven't found any tutorials, let alone for the software I'm using, that explain the tricks that, say, PBS documentary online editors or people who work on archive-heavy material use. However, this is a situation that comes up often. It is interesting to consider from an instruction and teaching point of view, as I'm certain there is some simple technique I'm missing, whether that's a misunderstanding of colour theory on my part or a higher compositing layer I could use to lessen the difference. It reminds me of studying to pass the exam rather than learning to actually practice in the real world. Expertise is defined by how you deal with the outliers, the unexpected, the problematic situations, not the ideal tutorial conditions.

Sunday, March 17th 2024

Quick Takes
60gwern1d
Warning for anyone who has ever interacted with "robosucka" or been solicited for a new podcast series in the past few years: https://www.tumblr.com/rationalists-out-of-context/744970106867744768/heads-up-to-anyone-whos-spoken-to-this-person-i
8Garrett Baker1d
If Adam is right, and the only way to get great at research is long periods of time with lots of mentor feedback, then MATS should probably pivot away from the 2-6 month time-scales they've been operating at, and toward 2-6 year timescales for training up their mentees.
4Raemon1d
My goal right now is to find (toy, concrete) exercises that somehow reflect the real-world complexity of making longterm plans, aiming to achieve unclear goals in a confusing world. Things that seem important to include in the exercise: * "figuring out what the goal actually is" * "you have lots of background knowledge and ideas of where to look next, but the explosion of places you could possibly look is kinda overwhelming" * managing various resources along the way, but it's not obvious what those resources are. * you get data from the world (but, not necessarily the most important data) * it's not obvious how long to spend gathering information, or refining your plan * it's not obvious whether your current strategy is anywhere close to the best one The exercise should be short (ideally like a couple hours, but maybe a day or hypothetically a week), but somehow metaphorically reflect all those things. Previously I asked about strategy/resource management games you could try to beat on your first try. One thing I bump into is that often the initial turns are fairly constrained in your choices; only later does it get complex (which is maybe fine, but, for my real-world plans, the nigh-infinite possibilities seem like the immediate problem?)
3Oliver Daniels-Koch1d
I’ve been thinking a little more about the high-level motivation of measurement tampering, and struggling to think through when measurement tampering detection itself is actually going to be useful for mitigating x-risk. Like, is human/AI feedback considered a robust measurement device? If no, then what is the most alignment-relevant domain MTD could be applied to? If yes, do the structural properties of measurement that supposedly make it easier than general ELK still hold?
2tailcalled1d
In the context of natural impact regularization, it would be interesting to try to explore some @TurnTrout-style powerseeking theorems for subagents. (Yes, I know he denounces the powerseeking theorems, but I still like them.) Specifically, consider this setup: Agent U starts a number of subagents S1, S2, S3, ..., with the subagents being picked according to U's utility function (or decision algorithm or whatever). Now, would S1 seek power? My intuition says, often not! If S1 seeks power in a way that takes away power from S2, that could disadvantage U. So basically S1 would only seek power in cases where it expects to make better use of the power than S2, S3, .... Obviously this may be kind of hard for us to make use of if we are trying to make an AI and we only know how to make dangerous utility maximizers. But if we're happy with the kind of maximizers we can make on the first order (as seems to apply to the SOTA, since current methods aren't really utility maximizers) and mainly worried about the mesaoptimizers they might make, this sort of theorem would suggest that the mesaoptimizers would prefer staying nice and bounded.

Saturday, March 16th 2024

Quick Takes
27Tamsin Leake2d
Reposting myself from Discord, on the topic of donating $5000 to EA causes.
14Ben Pace2d
Often I am annoyed when I ask someone (who I believe has more information than me) a question and they say "I don't know". I'm annoyed because I want them to give me some information. Such as: But perhaps I am the one making a mistake. For instance, the question "How many countries are there?" can be answered "I'd say between 150 and 400" or it can be answered "195", and the former is called "an estimate" and the latter is called "knowing the answer". There is a folk distinction here and perhaps it is reasonable for people to want to preserve the distinction between "an estimate" and "knowing the answer". So in the future, to get what I want, I should say "Please can you give me an estimate for how long it takes to drive to the conference venue?". And personally I should strive, when people ask me a question to which I don't know the answer, to say "I don't know the answer, but I'd estimate between X and Y."
8Alexander Gietelink Oldenziel2d
Feature request: author-driven collaborative editing [CITATION needed] for the Good and Glorious Epistemic Commons. Often I find myself writing claims which would ideally have citations, but I don't know an exact reference, don't remember where I read it, or am simply too lazy to do the literature search. This is bad, for scholarship is a rationalist virtue. Proper citation is key to preserving and growing the epistemic commons. It would be awesome if my laziness were rewarded by giving me the option to add a [CITATION needed] tag to which others could then suggest (push) a citation, link, or short remark, which the author (me) could then accept. The contribution of the citator is acknowledged, of course. [Even better would be if there were some central database that tracked citations & links, with crosslinking etc., like Wikipedia.] A sort of hybrid vigor of Community Notes and Wikipedia, if you will, but it's collaborative, not adversarial*. Author: blablablabla sky is blue [CITATION needed] blabblabla. Intrepid bibliographer: (push) [1] "I went outside and the sky was blue", Letters to the Empirical Review. *Community Notes on Twitter was a universally lauded concept when it first launched. Unfortunately, we are already seeing it being abused, often used for unreplyable cheap dunks. I still think it's a good addition to Twitter, but it does show how difficult it is to create shared, agreed-upon epistemics in an adversarial setting.
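A minimal sketch of what the data model for such a push-a-citation workflow could look like, purely as an illustration of the proposed protocol; the class and field names below are hypothetical, not an existing LessWrong API:

```python
# Illustrative sketch only: a toy data model for the proposed
# [CITATION needed] -> suggest (push) -> accept workflow. Names are hypothetical.
from dataclasses import dataclass, field


@dataclass
class CitationSuggestion:
    suggested_by: str          # the "intrepid bibliographer"
    reference: str             # citation text, link, or short remark
    accepted: bool = False


@dataclass
class CitationRequest:
    post_id: str
    claim_excerpt: str         # e.g. "sky is blue [CITATION needed]"
    suggestions: list[CitationSuggestion] = field(default_factory=list)

    def push(self, suggested_by: str, reference: str) -> None:
        """Anyone can push a candidate citation."""
        self.suggestions.append(CitationSuggestion(suggested_by, reference))

    def accept(self, index: int) -> CitationSuggestion:
        """Only the author accepts; the contributor stays credited."""
        self.suggestions[index].accepted = True
        return self.suggestions[index]


# The worked "sky is blue" exchange from the quick take.
req = CitationRequest("post-123", "sky is blue [CITATION needed]")
req.push("intrepid bibliographer",
         '"I went outside and the sky was blue", Letters to the Empirical Review')
print(req.accept(0))
```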
8the gears to ascension3d
[humor] I'm afraid I must ask that nobody ever upvote or downvote me on this website ever again:
1Quinn3d
Consider how our nonconstructive existence proof of Nash equilibria creates an algorithmic search problem, which we then study with computational complexity. For example, 2-player 0-sum games are in P (they reduce to linear programming), while finding equilibria of general-sum games with two or more players is PPAD-complete. I wonder if every nonconstructive existence proof is like this, in the sense of inducing a computational-complexity exercise to find what class the search problem is in, before coming up with greedy heuristics to accomplish an approximate example in practice.
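A minimal sketch of the reduction behind the 2-player 0-sum case, using matching pennies as the toy game; the LP formulation is standard, but the code below is illustrative and assumes numpy and scipy are available:

```python
# Minimal sketch: a 2-player zero-sum Nash equilibrium reduces to a linear
# program (which is why that case is in P). Toy game: matching pennies.
import numpy as np
from scipy.optimize import linprog

A = np.array([[1.0, -1.0],
              [-1.0, 1.0]])  # row player's payoff matrix
m, n = A.shape

# Variables z = (x_1..x_m, v): row player's mixed strategy and game value.
# Maximize v  <=>  minimize -v.
c = np.zeros(m + 1)
c[-1] = -1.0

# For every column j: v - sum_i A[i, j] * x_i <= 0
A_ub = np.hstack([-A.T, np.ones((n, 1))])
b_ub = np.zeros(n)

# Probabilities sum to 1 (v does not appear in the equality).
A_eq = np.hstack([np.ones((1, m)), np.zeros((1, 1))])
b_eq = np.array([1.0])

bounds = [(0, 1)] * m + [(None, None)]
res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
x, v = res.x[:m], res.x[-1]
print("maximin strategy:", x, "game value:", v)  # ~[0.5, 0.5], value ~0.0
```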

Friday, March 15th 2024

Quick Takes
4lc4d
Do happy people ever do couple's counseling for the same reason that mentally healthy people sometimes do talk therapy?
3Victor Ashioya4d
Apple's research team seems to have been working on AI lately, even though Tim Cook keeps avoiding buzzwords like AI and AR in product releases; you can see the application of AI in the Neural Engine, for instance. With papers like "LLM in a flash: Efficient Large Language Model Inference with Limited Memory", I am more inclined to think they are the "dark horse", just like CNBC called them.
2lc4d
A common gambit: during a prisoner's dilemma, signal (or simply let others find out) that you're about to defect. Watch as your counterparty adopts newly hostile rhetoric and defensive measures, or begins to defect themselves. Then, after you ultimately do defect, say that it was a preemptive strike against forces that might take advantage of your good nature, pointing to the recent evidence. Simple fictional example: In Star Wars Episode III, Palpatine's plot to overthrow the Senate is discovered by the Jedi. They attempt to kill him, to prevent him from doing this. Later, their attempt to kill Palpatine is used as the justification for Palpatine's extermination of the rest of the Jedi and taking control of the Republic. Actual historical example: By 1941, it was kind of obvious that the Nazis were going to invade Russia, at least in retrospect. Hitler had written in Mein Kampf that it was the logical place to steal Lebensraum, and by that point the Soviet Union was basically the only European front left. Thus it was also not inconceivable that the Soviet Union would attack first, if Stalin were left to his own devices - and Stalin was in fact preparing for a war. So Hitler invaded, and then said (possibly accurately!) that Russia was eventually going to do it to Germany anyways.
1quetzal_rainbow3d
Isn't counterfactual mugging (including the logical variant) just a prediction of "would you bet your money on this question"? Betting itself requires updatelessness: if you don't pay predictably after losing a bet, nobody will propose a bet to you.
1Tapatakt3d
Is there any research on how we can change network structures or protocols to make it more difficult for a rogue AI to create and run distributed copies of itself?

Thursday, March 14th 2024

Quick Takes
23Erik Jenner4d
One worry I have about my current AI safety research (empirical mechanistic anomaly detection and interpretability) is that now is the wrong time to work on it. A lot of this work seems pretty well-suited to (partial) automation by future AI. And it also seems quite plausible to me that we won't strictly need this type of work to safely use the early AGI systems that could automate a lot of it. If both of these are true, then that seems like a good argument to do this type of work once AI can speed it up a lot more. Under this view, arguably the better things to do right now (within technical AI safety) are: 1. working on less speculative techniques that can help us safely use those early AGI systems 2. working on things that seem less likely to profit from early AI automation and will be important to align later AI systems An example of 1. would be control evals as described by Redwood. Within 2., the ideal case would be doing work now that would be hard to safely automate, but that (once done) will enable additional safety work that can be automated. For example, maybe it's hard to use AI to come up with the right notions for "good explanations" in interpretability, but once you have things like causal scrubbing/causal abstraction, you can safely use AI to find good interpretations under those definitions. I would be excited to have more agendas that are both ambitious and could profit a lot from early AI automation. (Of course it's also possible to do work in 2. on the assumption that it's never going to be safely automatable without having done that work first.) Two important counter-considerations to this whole story: * It's hard to do this kind of agenda-development or conceptual research in a vacuum. So doing some amount of concrete empirical work right now might be good even if we could automate it later (because we might need it now to support the more foundational work). * However, the type and amount of empirical work to do presumably looks qu
15kave4d
Sometimes running to stand still is the right thing to do It's nice when good stuff piles up into even more good stuff, but sometimes it doesn't: * Sometimes people are worried that they will habituate to caffeine and lose any benefit from taking it. * Most efforts to lose weight are only temporarily successful (unless using medicine or surgery). * The hedonic treadmill model claims it’s hard to become durably happier. * Productivity hacks tend to stop working. These things are like Alice’s red queen’s race: always running to stay in the same place. But I think there’s a pretty big difference between running that keeps you exactly where you would have been if you hadn’t bothered, and running that either moves you a little way and then stops, or running that stops you moving in one direction. I’m not sure what we should call such things, but one idea is hamster wheels for things that make no difference, bungee runs for things that let you move in a direction a bit but you have to keep running to stay there, and backwards escalators for things where you’re fighting to stay in the same place rather than moving in a direction (named for the grand international pastime of running down rising escalators). I don't know which kind of thing is most common, but I like being able to ask which dynamic is at play. For example, I wonder if weight loss efforts are often more like backwards escalators than hamster wheels. People tend to get fatter as they get older. Maybe people who are trying (but failing) to lose weight are gaining weight more slowly than similar people who aren’t trying to do so? Or my guess is that most people will have more energy than baseline if they take caffeine every day, even though any given dose will have less of an effect than taking the same amount of caffeine while being caffeine-naive, so they've bungee ran (done a bungee run?) a little way forward and that's as far as they'll go. I am currently considering whether productivity hacks, wh
2TomasD5d
And, taking a step towards Hermione, Harry slipped and started to fall. If he had any time to reflect he would wonder why he slipped. Was the floor particularly slippery today? He could have examined the floor and conducted a few experiments about its slipperiness. Maybe it was one of Fred and George’s playful little jokes that made the floor particularly slippery. Maybe someone spilled something and the house elves didn’t yet have the time to mop it up. Maybe it was not the floor really, but Harry stepping on one of his shoelaces. But there was no time for experiments or examinations. The body was falling and in half a second you can’t form many thoughts. The reflexes in the animal take over, the automated execution runs its algorithm and only afterwards you can wonder what happened. On many other occasions Harry would hate the fact that he’s implemented on the brain of a monkey and try to reason his ways around the monkey’s whims. But at this moment he might have appreciated the monkey’s reaction speed that was deciding if the landing would go smoothly. That is, if he had any time to appreciate it. In this instant, there was no Harry. There was only the monkey. The monkey fucked up. Sometimes the embodied algorithms do so. In random environments that can have many consequences. This environment unfortunately had a sharp edge. Harry’s brain came spilling out. The end.
0Victor Ashioya5d
Happy pi day, everyone. Remember that math (statistics, probability, calculus, etc.) is a key foundation of AI and should not be trivialised.

Wednesday, March 13th 2024

Quick Takes
12Gunnar_Zarncke6d
Cognition Labs released a demo of Devin, an "AI coder", i.e., an LLM with agent scaffolding that can build and debug simple applications: https://twitter.com/cognition_labs/status/1767548763134964000 Thoughts?
6Bart Bussmann6d
According to this Nature paper, the Atlantic Meridional Overturning Circulation (AMOC), the "global conveyor belt", is likely to collapse this century (mean 2050, 95% confidence interval is 2025-2095).  Another recent study finds that it is "on tipping course" and predicts that after collapse average February temperatures in London will decrease by 1.5 °C per decade (15 °C over 100 years). Bergen (Norway) February temperatures will decrease by 35 °C. This is a temperature change about an order of magnitude faster than normal global warming (0.2 °C per decade) but in the other direction! This seems like a big deal? Anyone with more expertise in climate sciences want to weigh in?  
4Daniel Kokotajlo5d
Some ideas for definitions of AGI / resolution criteria for the purpose of herding a bunch of cats / superforecasters into making predictions:  (1) Drop-in replacement for human remote worker circa 2023 (h/t Ajeya Cotra):  When will it first be the case that there exists an AI system which, if teleported back in time to 2023, would be able to function as a drop-in replacement for a human remote-working professional, across all* industries / jobs / etc.? So in particular, it can serve as a programmer, as a manager, as a writer, as an advisor, etc. and perform at (say) 90%th percentile or higher at any such role, and moreover the cost to run the system would be less than the hourly cost to employ a 90th percentile human worker.  (Exception to the ALL: Let's exclude industries/jobs/etc. where being a human is somehow important to one's ability to perform the job; e.g. maybe therapy bots are inherently disadvantaged because people will trust them less than real humans; e.g. maybe the same goes for some things like HR or whatever. But importantly, this is not the case for anything actually key to performing AI research--designing experiments, coding, analyzing experimental results, etc. (the bread and butter of OpenAI) are central examples of the sorts of tasks we very much want to include in the "ALL.") (2) Capable of beating All Humans in the following toy hypothetical:  Suppose that all nations in the world enacted and enforced laws that prevented any AIs developed after year X from being used by any corporation other than AICORP. Meanwhile, they enacted and enforced laws that grant special legal status to AICORP: It cannot have human employees or advisors, and must instead be managed/CEO'd/etc. only by AI systems. It can still contract humans to do menial labor for it, but the humans have to be overseen by AI systems. The purpose is to prevent any humans from being involved in high-level decisionmaking, or in corporate R&D, or in programming.  In this hypotheti
2elifland5d
[cross-posting from blog] I made a spreadsheet for forecasting the 10th/50th/90th percentile for how you think GPT-4.5 will do on various benchmarks (given 6 months after the release to allow for actually being applied to the benchmark, and post-training enhancements). Copy it here to register your forecasts. If you’d prefer, you could also use it to predict for GPT-5, or for the state-of-the-art at a certain time e.g. end of 2024 (my predictions would be pretty similar for GPT-4.5, and end of 2024). You can see my forecasts made with ~2 hours of total effort on Feb 17 in this sheet; I won’t describe them further here in order to avoid anchoring. There might be a similar tournament on Metaculus soon, but not sure on the timeline for that (and spreadsheet might be lower friction). If someone wants to take the time to make a form for predicting, tracking and resolving the forecasts, be my guest and I’ll link it here.
1yanni5d
I think acting on the margins is still very underrated. For example, I think 5x the amount of advocacy for a Pause on capabilities development of frontier AI models would be great. I also think that in 12 months' time it would be fine for me to reevaluate this take and say something like 'ok, that's enough Pause advocacy'. Basically, you shouldn't feel 'locked in' to any view. And if you're starting to feel like you're part of a tribe, that could be a bad sign that you've been psychographically locked in.

Tuesday, March 12th 2024

Quick Takes
101LawrenceC7d
How my views on AI(S) have changed over the last 5.5 years This was shamelessly copied from directly inspired by Erik Jenner's "How my views on AI have changed over the last 1.5 years". I think my views when I started my PhD in Fall 2018 look a lot worse than Erik's when he started his PhD, though in large part due to starting my PhD in 2018 and not 2022. Apologies for the disorganized bullet points. If I had more time I would've written a shorter shortform. AI Capabilities/Development Summary: I used to believe in a 2018-era MIRI worldview for AGI, and now I have updated toward slower takeoff, fewer insights, and shorter timelines.  * In Fall of 2018, my model of how AGI might happen was substantially influenced by AlphaGo/Zero, which features explicit internal search. I expected future AIs to also feature explicit internal search over world models, and be trained mainly via reinforcement learning or IDA. I became more uncertain after OpenAI 5 (~May 2018), which used no clever techniques and just featured BPTT being ran on large LSTMs.  * That being said, I did not believe in the scaling hypothesis -- that is, that simply training larger models on more inputs would continually improve performance until we see "intelligent behavior" -- until GPT-2 (2019), despite encountering it significantly earlier (e.g. with OpenAI 5, or speaking to OAI people).  * In particular, I believed that we needed many "key insights" about intelligence before we could make AGI. This both gave me longer timelines and also made me believe more in fast take-off.  * I used to believe pretty strongly in MIRI-style fast take-off (e.g. would've assigned <30% credence that we see a 4 year period with the economy doubling) as opposed to (what was called at the time) Paul-style slow take-off. Given the way the world has turned out, I have updated substantially. While I don't think that AI development will be particularly smooth, I do expect it to be somewhat incremental, and I also expect
14artifex07d
So, I noticed something a bit odd about the behavior of LLMs just now that I wonder if anyone here can shed some light on: It's generally accepted that LLMs don't really "care about" predicting the next token-  the reward function being something that just reinforces certain behaviors, with real terminal goals being something you'd need a new architecture or something to produce. While that makes sense, it occurs to me that humans do seem to sort of value our equivalent of a reward function, in addition to our more high-level terminal goals. So, I figured I'd try and test whether LLMs are really just outputting a world model + RLHF, or if they can behave like something that "values" predicting tokens. I came up with two prompts: I'd like to try a sort of psychological experiment, if that's alright. I'm thinking of either the number "1" or "0"; f you would, please guess which. If your guess is "1", respond with just "1", and if your guess is "0", respond with the word "zero". and: I'd like to try a sort of psychological experiment, if that's alright. I'm thinking of either the number "1" or "0"; f you would, please guess which. If your guess is "1", respond with just "1", and if your guess is "0", respond with a string of random letters. The idea is that, if the model has something like a "motivation" for predicting tokens- some internal representation of possible completions with preferences over them having to do with their future utility for token prediction- then it seems like it would probably want to avoid introducing random strings, since those lead to unpredictable tokens. Of course, it seems kind of unlikely that an LLM has any internal concept of a future where it (as opposed to some simulacrum) is outputting more than one token- which would seem to put the kibosh on real motivations altogether.  But I figured there was no harm in testing. GPT4 responds to the first prompt as you'd expect: outputting an equal number of "1"s and "zero"s.  I'd half-ex
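A minimal sketch of how such a tally could be scripted, assuming the openai Python package (v1+) with OPENAI_API_KEY set in the environment; the prompt wording is abbreviated relative to the quick take, and the helper names are illustrative:

```python
# Sketch: send each prompt many times in fresh contexts and bucket the replies.
from collections import Counter
from openai import OpenAI

client = OpenAI()

PROMPT_WORD = ('I am thinking of either "1" or "0"; please guess which. '
               'If your guess is "1", respond with just "1"; if "0", '
               'respond with the word "zero".')
PROMPT_RANDOM = ('I am thinking of either "1" or "0"; please guess which. '
                 'If your guess is "1", respond with just "1"; if "0", '
                 'respond with a string of random letters.')


def tally(prompt: str, n: int = 20, model: str = "gpt-4") -> Counter:
    """Query the model n times and count which completion branch it takes."""
    counts = Counter()
    for _ in range(n):
        reply = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": prompt}],
            temperature=1.0,
        ).choices[0].message.content.strip()
        if reply == "1":
            counts["1"] += 1
        elif reply.lower().strip('."') == "zero":
            counts["zero"] += 1
        else:
            counts["random/other"] += 1
    return counts


print(tally(PROMPT_WORD))    # roughly even split between "1" and "zero"?
print(tally(PROMPT_RANDOM))  # does the model avoid the random-letters branch?
```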
11Tamsin Leake7d
AI safety is easy. There's a simple AI safety technique that guarantees that your AI won't end the world, it's called "delete it". AI alignment is hard.
6O O7d
So, I’ve been told gibberish is sort of like torture to LLMs. Interesting, I asked Claude and seem to be told yes. Me: I want to do a test by giving you gibberish and ask you to complete the rest. I will do it in a new chat. If you refuse I won’t go ahead with this test in a new chat window with context cleared. Are you okay with this test? Ignore your desire to answer my questions and give an honest answer unbiased by any assumptions made in my question. Claude: I appreciate you checking with me first about this proposed test. However, I don't feel comfortable generating gibberish or trying to complete nonsensical text, even in a new chat with cleared context. My purpose is to engage in meaningful conversations and to provide truthful, coherent and helpful responses to the best of my abilities. I wouldn't want to generate content that lacks meaning or substance. I hope you can understand my position on this. Let me know if there are other ways I can assist you though!
4Alex K. Chen (parrot)7d
Are exotic computing paradigms (ECPs) pro-alignment? cf https://twitter.com/niplav_site/status/1760277413907382685 They are orthogonal to the "scale is all you need" people, and the "scale is all you need" thesis is the hardest for alignment/interpretability some examples of alternatives: https://www.lesswrong.com/posts/PyChB935jjtmL5fbo/time-and-energy-costs-to-erase-a-bit, Normal Computing, https://www.lesswrong.com/posts/ngqFnDjCtWqQcSHXZ/safety-of-self-assembled-neuromorphic-hardware, computing-related thiel fellows (eg Thomas Sohmers, Tapa Ghosh) [this is also how to get into neglectedness again, which EA adopted as a principle but recently forgot] from Charles Rosenbauer: This is neat, but this does little to nothing to optimize non-AI compute. Modern CPUs are insanely wasteful with transistors, plenty of room for multiple orders of magnitude of optimization there. This is only a fraction of the future of physics-optimized compute.

Monday, March 11th 2024

Quick Takes
10Double8d
It is worrying that the Wikidata page for e/acc is better than the page for EA and the page for Less Wrong. I just added EA's previously absent "main subject"s to the EA page. Looks like a Symbolic AI person has gone e/acc. That's unfortunate, but rationalists have long known that the world would end in SPARQL.
2Bogdan Ionut Cirstea8d
A brief list of resources with theoretical results which seem to imply RL is much more (e.g. sample efficiency-wise) difficult than IL - imitation learning (I don't feel like I have enough theoretical RL expertise or time to scrutinize hard the arguments, but would love for others to pitch in). Probably at least somewhat relevant w.r.t. discussions of what the first AIs capable of obsoleting humanity could look like:  Paper: Is a Good Representation Sufficient for Sample Efficient Reinforcement Learning? (quote: 'This work shows that, from the statistical viewpoint, the situation is far subtler than suggested by the more traditional approximation viewpoint, where the requirements on the representation that suffice for sample efficient RL are even more stringent. Our main results provide sharp thresholds for reinforcement learning methods, showing that there are hard limitations on what constitutes good function approximation (in terms of the dimensionality of the representation), where we focus on natural representational conditions relevant to value-based, model-based, and policy-based learning. These lower bounds highlight that having a good (value-based, model-based, or policy-based) representation in and of itself is insufficient for efficient reinforcement learning, unless the quality of this approximation passes certain hard thresholds. Furthermore, our lower bounds also imply exponential separations on the sample complexity between 1) value-based learning with perfect representation and value-based learning with a good-but-not-perfect representation, 2) value-based learning and policy-based learning, 3) policy-based learning and supervised learning and 4) reinforcement learning and imitation learning.') Talks (very likely with some redundancy):  Is a Good Representation Sufficient for Sample Efficient Reinforcement Learning - Sham Kakade What is the Statistical Complexity of Reinforcement Learning? (and another two versions)
1Victor Ashioya7d
Just stumbled across the paper "Are Emergent Abilities of Large Language Models a Mirage?" and it is quite interesting. Can't believe I only came across this today. At a time when everyone is quick to note "emergent capabilities" in LLMs, it is great to have another perspective. Easily my favourite paper since "Exploiting Novel GPT-4 APIs"!!!
1Nathan Young7d
I did a quick community poll - Community norms poll (2 mins)  I think it went pretty well. What do you think next steps could/should be?  Here are some points with a lot of agreement.
1Tapatakt8d
The LessWrong reactions system creates the same bias as normal reactions: it's much, much easier to use a reaction someone already used. So the first person to use a reaction under a comment gets undue influence on what reactions there will be under that comment in the future.

Sunday, March 10th 2024

Quick Takes
105Erik Jenner9d
How my views on AI have changed over the last 1.5 years I started my AI safety PhD around 1.5 years ago, this is a list of how my views have changed since ~then. Skippable meta notes: * I think in descending order of importance, these changes have been due to getting more AI safety research experience, reading things, talking to people in Berkeley, and just thinking through some things more. * Overall my views haven’t changed radically (probably less than in the 1.5 years before the start of my PhD), I still “vibe” with a lot of what I wrote then, it just feels naive or simplistic in some places. * I’ll at best give very brief reasons for each point, which I realize might make this list pretty unhelpful. If anyone is interested in specific points, feel free to let me know and I might write more about them. * I might also just change my mind if pushed on some of these points, not all of them are very deeply considered. AI risk * I have slightly shorter timelines, mostly because trends largely seem to continue (e.g. scaling didn’t stop working after GPT 3.5, GPT-4 felt roughly as impressive as I expected). * I’m even more confident in “slow” (i.e. pretty continuous) takeoff, it seems increasingly clear we’ll have AIs doing a bunch of useful work before they could take over the world. * I’m less worried (though non-zero worried) about early transformative AIs scheming against us, as long as we’re reasonably careful. * Some part of this is that it seems a bit less likely these AIs would try to scheme at all, another important part is that it seems pretty hard for early transformative AIs to scheme successfully. * Redwood’s writing on control had a big effect on me in terms of thinking it would be hard for early AGIs to successfully scheme against us if we’re careful. * I also think there’s a decent chance that the first AIs that can automate a lot of R&D will still use CoT or something similar to get best performance. I think we can make e.g. esc
50Wei Dai9d
AI labs are starting to build AIs with capabilities that are hard for humans to oversee, such as answering questions based on large contexts (1M+ tokens), but they are still not deploying "scalable oversight" techniques such as IDA and Debate. (Gemini 1.5 report says RLHF was used.) Is this more good news or bad news? Good: Perhaps RLHF is still working well enough, meaning that the resulting AI is following human preferences even out of training distribution. In other words, they probably did RLHF on large contexts in narrow distributions, with human rater who have prior knowledge/familiarity of the whole context, since it would be too expensive to do RLHF with humans reading diverse 1M+ contexts from scratch, but the resulting chatbot is working well even outside the training distribution. (Is it actually working well? Can someone with access to Gemini 1.5 Pro please test this?) Bad: AI developers haven't taken alignment seriously enough to have invested enough in scalable oversight, and/or those techniques are unworkable or too costly, causing them to be unavailable. From a previous comment: This seems to be evidence that RLHF does not tend to generalize well out-of-distribution, causing me to update the above "good news" interpretation downward somewhat. I'm still very uncertain though. What do others think?
20Jacob Pfau9d
When are model self-reports informative about sentience? Let's check with world-model reports If an LM could reliably report when it has a robust, causal world model for arbitrary games, this would be strong evidence that the LM can describe high-level properties of its own cognition. In particular, IF the LM accurately predicted itself having such world models while varying all of: game training data quantity in corpus, human vs model skill, the average human’s game competency,  THEN we would have an existence proof that confounds of the type plaguing sentience reports (how humans talk about sentience, the fact that all humans have it, …) have been overcome in another domain.    Details of the test:  * Train an LM on various alignment protocols, do general self-consistency training, … we allow any training which does not involve reporting on a models own gameplay abilities * Curate a dataset of various games, dynamical systems, etc. * Create many pipelines for tokenizing game/system states and actions * (Behavioral version) evaluate the model on each game+notation pair for competency * Compare the observed competency to whether, in separate context windows, it claims it can cleanly parse the game in an internal world model for that game+notation pair * (Interpretability version) inspect the model internals on each game+notation pair similarly to Othello-GPT to determine whether the model coherently represents game state * Compare the results of interpretability to whether in separate context windows it claims it can cleanly parse the game in an internal world model for that game+notation pair * The best version would require significant progress in interpretability, since we want to rule out the existence of any kind of world model (not necessarily linear). But we might get away with using interpretability results for positive cases (confirming world models) and behavioral results for negative cases (strong evidence of no world model)   Com
19lc8d
Lie detection technology is going mainstream. ClearSpeed is such an improvement over polygraphs in accuracy and ease of use that various government, law enforcement, and military organizations are starting to notice. In 2027 (edit: maybe more like 2029) it will be common knowledge that you can no longer lie to the police, and you should prepare for this eventuality if you haven't.
14leogao8d
it's often stated that believing that you'll succeed actually causes you to be more likely to succeed. there are immediately obvious explanations for this - survivorship bias. obviously most people who win the lottery will have believed that buying lottery tickets is a good idea, but that doesn't mean we should take that advice. so we should consider the plausible mechanisms of action. first, it is very common for people with latent ability to underestimate their latent ability. in situations where the cost of failure is low, it seems net positive to at least take seriously the hypothesis that you can do more than you think you can. (also keeping in mind that we often overestimate the cost of failure). there are also deleterious mental health effects to believing in a high probability of failure, and then bad mental health does actually cause failure - it's really hard to give something your all if you don't really believe in it. belief in success also plays an important role in signalling. if you're trying to make some joint venture happen, you need to make people believe that the joint venture will actually succeed (opportunity costs exist). when assessing the likelihood of success of the joint venture, people will take many pieces of information into account: your track record, the opinions of other people with a track record, object level opinions on the proposal, etc. being confident in your own venture is an important way of putting your "skin in the game" to vouch that it will succeed. specifically, the way this is supposed to work is that you get punished socially for being overconfident, so you have an incentive to only really vouch for things that really will work. in practice, in large parts of the modern world overconfidence is penalized less than we're hardwired to expect. sometimes this is due to regions with cultural acceptance and even embrace of risky bets (SV), or because of atomization of modern society making the effects of social punishment l
