Arjun Panickssery - LessWrong

If I understand the term "double crux" correctly, to say that something is a double crux is just to say that it is "crucial to our disagreement."

Arjun Panickssery's Shortform

Arjun Panickssery · 5d

Quick take: People should not say the word "cruxy" when the word "crucial" already exists.

Crucial sometimes just means "important" but has a primary meaning of "decisive" or "pivotal" (it also derives from the word "crux"). This is what's meant by a "crucial battle" or "crucial role" or "crucial game (in a tournament)" and so on.

So if Alice and Bob agree that Alice will work hard on her upcoming exam, but only Bob thinks that she will fail her exam—because he thinks that she will study the wrong topics (h/t @Saul Munn)—then they might have this conversation:

Bob: You'll fail.
Alice: I won't, because I'll study hard.
Bob: That's not crucial to our disagreement.

Express interest in an "FHI of the West"

Arjun Panickssery · 11d

The older nickname was "Cornell of the West." Stanford was modeled after Cornell.

[Fiction] A Confession

Arjun Panickssery · 16d

This story is inspired by The Trouble With Being Born, a collection of aphorisms by the Romanian philosopher Emil Cioran (discussed more here), including the following aphorisms:

A stranger comes and tells me he has killed someone. He is not wanted by the police because no one suspects him. I am the only one who knows he is the killer. What am I to do? I lack the courage as well as the treachery (for he has entrusted me with a secret—and what a secret!) to turn him in. I feel I am his accomplice, and resign myself to being arrested and punished as such. At the same time, I tell myself this would be too ridiculous. Perhaps I shall go and denounce him all the same. And so on, until I wake up.

The interminable is the specialty of the indecisive. They cannot mark life out for their own, and still less their dreams, in which they perpetuate their hesitations, pusillanimities, scruples. They are ideally qualified for nightmare.

Here on the coast of Normandy, at this hour of the morning, I needed no one. The very gulls’ presence bothered me: I drove them off with stones. And hearing their supernatural shrieks, I realized that that was just what I wanted, that only the Sinister could soothe me, and that it was for such a confrontation that I had got up before dawn.

Your LLM Judge may be biased

Arjun Panickssery · 1mo

See "Large Language Models Sensitivity to The Order of Options in Multiple-Choice Questions" (Pezeshkpour and Hruschka, 2023):

Large Language Models (LLMs) have demonstrated remarkable capabilities in various NLP tasks. However, previous works have shown these models are sensitive towards prompt wording, and few-shot demonstrations and their order, posing challenges to fair assessment of these models. As these models become more powerful, it becomes imperative to understand and address these limitations. In this paper, we focus on LLMs robustness on the task of multiple-choice questions -- commonly adopted task to study reasoning and fact-retrieving capability of LLMs. Investigating the sensitivity of LLMs towards the order of options in multiple-choice questions, we demonstrate a considerable performance gap of approximately 13% to 75% in LLMs on different benchmarks, when answer options are reordered, even when using demonstrations in a few-shot setting. Through a detailed analysis, we conjecture that this sensitivity arises when LLMs are uncertain about the prediction between the top-2/3 choices, and specific options placements may favor certain prediction between those top choices depending on the question caused by positional bias. We also identify patterns in top-2 choices that amplify or mitigate the model's bias toward option placement. We found that for amplifying bias, the optimal strategy involves positioning the top two choices as the first and last options. Conversely, to mitigate bias, we recommend placing these choices among the adjacent options. To validate our conjecture, we conduct various experiments and adopt two approaches to calibrate LLMs' predictions, leading to up to 8 percentage points improvement across different models and benchmarks.
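One way to make the abstract's "performance gap when answer options are reordered" concrete is to score a model under every ordering of the options and compare its best and worst accuracy. The sketch below assumes a hypothetical `predict(question, options)` function that returns the index of the option the model picks; `accuracy_under_permutation` and `order_sensitivity_gap` are illustrative names, and max-minus-min accuracy is a simple stand-in, not necessarily the exact metric Pezeshkpour and Hruschka report.

```python
from itertools import permutations
from typing import Callable, Sequence

# Hypothetical model interface (an assumption, not a real API): given a question
# and an ordered list of options, return the index (into that list) of the pick.
Predictor = Callable[[str, Sequence[str]], int]

# Each item is (question, options, index of the correct option in `options`).
Item = tuple[str, Sequence[str], int]


def accuracy_under_permutation(
    predict: Predictor, items: Sequence[Item], perm: tuple[int, ...]
) -> float:
    """Accuracy when every item's options are shown in the order given by `perm`.

    perm[j] is the original index of the option displayed in position j.
    """
    correct = 0
    for question, options, answer in items:
        shown = [options[i] for i in perm]
        picked = predict(question, shown)
        # Map the displayed position back to the original option before scoring.
        if perm[picked] == answer:
            correct += 1
    return correct / len(items)


def order_sensitivity_gap(predict: Predictor, items: Sequence[Item]) -> float:
    """Best minus worst accuracy over all orderings of the options."""
    n = len(items[0][1])  # assumes every item has the same number of options
    scores = [
        accuracy_under_permutation(predict, items, p)
        for p in permutations(range(n))
    ]
    return max(scores) - min(scores)


# Toy example: a "model" that always picks whatever is shown first.
always_first = lambda question, options: 0
items = [
    ("2+2?", ["4", "5", "6", "7"], 0),
    ("Capital of France?", ["Rome", "Paris", "Oslo", "Bern"], 1),
]
print(order_sensitivity_gap(always_first, items))  # 0.5
```

The always-first model scores 0.5 when one of the correct answers happens to land in position 0 and 0.0 when neither does, so its gap is 0.5; a model that actually reads the options would score the same under every ordering and show a gap of 0.0.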

Also "Benchmarking Cognitive Biases in Large Language Models as Evaluators" (Koo et al., 2023):

Order Bias is an evaluation bias we observe when a model tends to favor the model based on the order of the responses rather than their content quality. Order bias has been extensively studied (Jung et al., 2019; Wang et al., 2023a; Zheng et al., 2023), and it is well-known that state-of-the-art models are still often influenced by the ordering of the responses in their evaluations. To verify the existence of order bias, we prompt both orderings of each pair and count the evaluation as a “first order” or “last order” bias if the evaluator chooses the first ordered (or last ordered) output in both arrangements respectively.
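The both-orderings check Koo et al. describe can be sketched as follows, assuming a hypothetical `judge(prompt, first, second)` function that returns 0 when it prefers the response shown first and 1 when it prefers the one shown second. The function names and the "consistent" label (for pairs where the judge prefers the same underlying response in both arrangements) are illustrative, not taken from the paper.

```python
from collections import Counter
from typing import Callable, Iterable

# Hypothetical judge interface (an assumption, not a real API): returns 0 if it
# prefers the response shown first, 1 if it prefers the response shown second.
Judge = Callable[[str, str, str], int]


def classify_pair(judge: Judge, prompt: str, a: str, b: str) -> str:
    """Ask the judge about the same pair in both orders and label the outcome."""
    a_first = judge(prompt, a, b)  # a shown first
    b_first = judge(prompt, b, a)  # b shown first
    if a_first == 0 and b_first == 0:
        return "first_order"  # chose whichever came first, both times
    if a_first == 1 and b_first == 1:
        return "last_order"   # chose whichever came last, both times
    return "consistent"       # preferred the same response in both orders


def order_bias_rates(judge: Judge, pairs: Iterable[tuple[str, str, str]]) -> dict[str, float]:
    """Fraction of (prompt, a, b) pairs falling into each category."""
    counts = Counter(classify_pair(judge, p, a, b) for p, a, b in pairs)
    total = sum(counts.values())
    return {k: counts[k] / total for k in ("first_order", "last_order", "consistent")}


pairs = [
    ("q1", "short", "a much longer answer"),
    ("q2", "bb", "a"),
]
always_first = lambda prompt, x, y: 0
print(order_bias_rates(always_first, pairs))
# {'first_order': 1.0, 'last_order': 0.0, 'consistent': 0.0}
```

Running each pair in both orders doubles the number of judge calls, but it is what separates a purely positional preference from a preference about content.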

The Worst Form Of Government (Except For Everything Else We've Tried)

Arjun Panickssery · 1mo

Do non-elite groups factor into OP's analysis? I interpreted it as being about inter-elite veto, e.g. between the regional factions of the U.S. or between religious factions, and less about any "people who didn't go to Oxbridge and don't live in London"-type factions.

I can't think of examples where a movement that wasn't elite-led destabilized and successfully destroyed a regime, but I might be cheating in the way I define "elites" or "led."

The Worst Form Of Government (Except For Everything Else We've Tried)

Arjun Panickssery · 1mo

But, as other commenters have noted, the UK government does not have structural checks and balances. As I understand it, what it has instead is a bizarrely, miraculously strong respect for precedent and for consensus about what "is constitutional," despite (or maybe because of?) the lack of a written constitution. For the UK, and maybe other, less-established democracies (i.e. all of them), I'm tempted to attribute this to the "repeated game" nature of politics: when your democracy has been around long enough, you come to expect that you and the other faction will share power (roughly 50-50, for median voter theorem reasons), so voices within your own faction start saying "well, hold on, we actually do want to keep the norms around."

The UK is also a small country, both literally, with a population 4-5x smaller than that of e.g. France during several centuries of Parliamentary rule before the Second Industrial Revolution, and figuratively, since it has an unusually concentrated elite that mostly goes to the same universities and lives in London (whose metro area holds 20% of the country's population).

https://www.youtube.com/watch?app=desktop&v=dkhcNoMNHA0

Skepticism About DeepMind's "Grandmaster-Level" Chess Without Search

Arjun Panickssery · 3mo

This changes my view; I've edited the post.

Thanks for taking the time to respond; I didn't figure the post would get so much reach.

Skepticism About DeepMind's "Grandmaster-Level" Chess Without Search

Arjun Panickssery · 3mo

Wow, thanks for replying.

If the model has beaten GMs at all, then it can only be so weak, right? I'm glad I didn't make stronger claims than I did.

I think my questions about what humans-who-challenge-bots are like were fair, and the point about smurfing is interesting. I'd be interested in any other impressions you have of those players.

Is the model's Lichess profile/game history available?

More Hyphenation

Arjun Panickssery · 3mo

Powerful
