Community Events
- Cambridge ACX/SSC monthly meetup • Sat Apr 20 • Cambridgeshire
- Virtual AI Safety Unconference 2024 • Thu May 23 • Online
- St. Louis ACX Meetups Everywhere Spring 2024 • Sat Sep 9 • St. Louis
- ACX Montreal Meetup March 30th 2024 - Spaced Repetition • Sat Mar 30 • Montréal
All Posts
Sorted by Magic (New & Upvoted)
Karma • Title • Authors • Posted • Comments ([Ω] = crossposted to the Alignment Forum)

175 • "How could I have thought that faster?" • mesaoptimizer • 3d • 31
223 • My Clients, The Liars • ymeskhout • 20d • 84
139 • Using axis lines for good or evil • dynomight • 10d • 39
253 • Scale Was All We Needed, At First • Gabriel Mukobi • 7d • 29
104 • Social status part 1/2: negotiations over object-level preferences • Steven Byrnes • 14d • 15
217 • CFAR Takeaways: Andrew Critch • Raemon • 1mo • 62
340 • There is way too much serendipity • Malmesbury • 2mo • 56
203 • Believing In • AnnaSalamon • 2mo • 49
235 • The case for ensuring that powerful AIs are controlled [Ω] • ryan_greenblatt, Buck • 2mo • 66
288 • Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training [Ω] • evhub, Carson Denison, Meg, Monte M, David Duvenaud, Nicholas Schiefer, Ethan Perez • 2mo • 94
400 • Significantly Enhancing Adult Intelligence With Gene Editing May Be Possible • GeneSmith, kman • 3mo • 162
138 • And All the Shoggoths Merely Players • Zack_M_Davis • 1mo • 56
264 • Gentleness and the artificial Other • Joe Carlsmith • 3mo • 31
124 • Updatelessness doesn't solve most problems [Ω] • Martín Soto • 1mo • 43
57 • Acting Wholesomely • owencb • 17d • 62
150 • Without fundamental advances, misalignment and catastrophe are the default outcomes of training powerful AI [Ω] • Jeremy Gillen, peterbarnett • 2mo • 59
259 • Constellations are Younger than Continents • Jeffrey Heninger • 3mo • 22
287 • Speaking to Congressional staffers about AI risk • Akash, hath • 1mo • 23
303 • Shallow review of live agendas in alignment & safety [Ω] • technicalities, Stag • 4mo • 69
480 • The Talk: a brief explanation of sexual dimorphism • Malmesbury • 6mo • 72
96 • Attitudes about Applied Rationality • Camille Berger • 2mo • 17
122 • A Shutdown Problem Proposal [Ω] • johnswentworth, David Lorell • 2mo • 60
280 • Social Dark Matter • [DEACTIVATED] Duncan Sabien • 4mo • 112
215 • What are the results of more parental supervision and less outdoor play? • juliawise • 4mo • 30
260 • The 6D effect: When companies take risks, one email can be very powerful. • scasper • 4mo • 40
129 • Deep atheism and AI risk • Joe Carlsmith • 24d • 22
245 • AI Timelines [Ω] • habryka, Daniel Kokotajlo, Ajeya Cotra, Ege Erdil • 5mo • 74
324 • Inside Views, Impostor Syndrome, and the Great LARP • johnswentworth • 6mo • 53
145 • Discussion: Challenges with Unsupervised LLM Knowledge Discovery [Ω] • Seb Farquhar, Vikrant Varma, zac_kenton, gasteigerjo, Vlad Mikulik, Rohin Shah • 3mo • 21
281 • Towards Monosemanticity: Decomposing Language Models With Dictionary Learning [Ω] • Zac Hatfield-Dodds • 5mo • 18
240 • Book Review: Going Infinite • Zvi • 5mo • 109
238 • Alignment Implications of LLM Successes: a Debate in One Act [Ω] • Zack_M_Davis • 5mo • 50
131 • The Dark Arts • lsusr, Lyrongolem • 3mo • 49
148 • How useful is mechanistic interpretability? • ryan_greenblatt, Neel Nanda, Buck, habryka • 3mo • 53
179 • Thinking By The Clock • Screwtape • 4mo • 27
95 • A case for AI alignment being difficult [Ω] • jessicata • 3mo • 53
661 • SolidGoldMagikarp (plus, prompt generation) [Ω] • Jessica Rumbelow, mwatkins • 1y • 204
412 • The ants and the grasshopper • Richard_Ngo • 10mo • 35
137 • Moral Reality Check (a short story) • jessicata • 4mo • 44
305 • Model Organisms of Misalignment: The Case for a New Pillar of Alignment Research [Ω] • evhub, Nicholas Schiefer, Carson Denison, Ethan Perez • 7mo • 26
455 • How much do you believe your results? • Eric Neyman • 1y • 14
416 • Steering GPT-2-XL by adding an activation vector [Ω] • TurnTrout, Monte M, David Udell, lisathiergart, Ulisse Mini • 10mo • 97
249 • Dear Self; we need to talk about ambition • Elizabeth • 7mo • 25
90 • Meaning & Agency [Ω] • abramdemski • 3mo • 17
222 • Sum-threshold attacks • TsviBT • 5mo • 52
870 • Where I agree and disagree with Eliezer [Ω] • paulfchristiano • 2y • 219
887 • AGI Ruin: A List of Lethalities [Ω] • Eliezer Yudkowsky • 2y • 690
157 • Holly Elmore and Rob Miles dialogue on AI Safety Advocacy • jacobjacob, Robert Miles, Holly_Elmore • 5mo • 30
212 • What I would do if I wasn’t at ARC Evals [Ω] • LawrenceC • 7mo • 8
195 • UDT shows that decision theory is more puzzling than ever [Ω] • Wei Dai • 6mo • 51