James Payor

I think about AI alignment. Send help.

I say things on twitter, other links at payor.io

Posts

63 Thinking about maximization and corrigibility

43 Some constructions for proof-based cooperation without Löb

13 A proof of inner Löb's theorem

Comments

(Geometrically) Maximal Lottery-Lotteries Exist

James Payor 5d 152

Hello! I'm glad to read more material on this subject.

First I want to note that it took me some time to understand the setup, since you're working with a different notion of maximal lottery-lotteries than the one Scott wrote about. This made it unclear to me what was going on until I'd read a good way through and put it together, and it changes the meaning of the post's title as well.

For that reason I'd like to recommend adding something like "Geometric" in your title. Perhaps we can then talk about this construction as "Geometric Maximal Lottery-Lotteries", or "Maximal Geometric Lottery-Lotteries"? Whichever seems better!

It seems especially important to distinguish the names because these seem to behave differently from the linear version. (They have different properties in how they treat the voters, and perhaps fewer or different difficulties in existence, stability, and effective computation.)

With that out of the way, I'm a tentative fan of the geometric version, though I have more to unpack about what it means. I'll divide my thoughts & questions into a few sections below. I am likely confused on several points. And my apologies if my writing is unclear, please ask followup questions where interesting!

Underlying models of power for majoritarian vs geometric

When reading the earlier sequence I was struck by how unwieldy the linear/majoritarian formulation ends up being! Specifically, it seemed that the full maximal-lottery-lottery would need to encode all of the competing coordination cliques in the outer lottery, but then these are unstable to small perturbations that shift coordination options from below-majority to above-majority. This seemed like a real obstacle to effectively computing approximations, and if I understand correctly it is causing the discontinuity that breaks the Nash-equilibria-based existence proof.

My thought then about what might make more sense was a model of "war"/"power" in which votes against directly cancel out votes for. So in the case of an even split we get zero utility rather than whatever the majority's utility would be. My hope was that this was a more realistic model of how power should work, and one that would also be stable to small perturbations and lend more weight to outcomes preferred by supermajorities. I never cashed this out fully though, since I didn't find an elegant justification and lost interest.

I haven't thought this part through much (yet), but your model here, in which we take a geometric expectation, seems to put us in a bargaining regime that's downstream of each voter having the ability to torpedo the whole process in favor of some zero point. I'd conjecture that if power works like this, then thinking through fairness considerations and such leads us to the bargaining approach. I'm interested if you have a take here.

Utility specifications and zero points

I was also a big fan of the full personal utility information being relevant, since it seems that choosing the "right" outcome should take full preferences about tradeoffs into account, not just the ordering of the outcomes. It was also important to the majoritarian model of power that the scheme was invariant to (affine) changes in utility descriptions (since all that matters to it is where the votes come down).

Thinking about what's happened with the geometric expectation, I'm wondering how I should view the input utilities. Specifically, the geometric expectation is very sensitive to points assigned zero-utility by any part of the voting measure. So we will never see probability 1 assigned to an outcome that has any voting-measure on zero utility (assuming said voting-measure assigns non-zero utility to another option).
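To make the zero-sensitivity concrete, here is a minimal sketch. It assumes a simplified single-layer score (the geometric expectation, across voters, of each voter's expected utility under one lottery), which is not necessarily the exact criterion from the post. The names `geometric_expectation` and `lottery_score`, and the two-voter example, are made up for illustration.

```python
import math

def geometric_expectation(values, weights):
    """Weighted geometric mean exp(sum_i w_i * log x_i), with weights summing to 1.

    Returns 0.0 as soon as any value carrying positive weight is zero.
    """
    if any(x == 0 and w > 0 for x, w in zip(values, weights)):
        return 0.0
    return math.exp(sum(w * math.log(x) for x, w in zip(values, weights) if w > 0))

def lottery_score(lottery, utilities, voter_weights):
    """Geometric expectation over voters of their expected utility under `lottery`.

    Assumed simplification: one layer of lottery, finitely many voters.
    """
    expected = [sum(p * u for p, u in zip(lottery, voter_utils))
                for voter_utils in utilities]
    return geometric_expectation(expected, voter_weights)

# Hypothetical electorate: 90% of the measure only values candidate 0,
# 10% only values candidate 1 (and assigns candidate 0 zero utility).
utilities = [[1.0, 0.0], [0.0, 1.0]]
voter_weights = [0.9, 0.1]

pure = lottery_score([1.0, 0.0], utilities, voter_weights)   # all mass on candidate 0
mixed = lottery_score([0.9, 0.1], utilities, voter_weights)  # a little mass on candidate 1
```

Here `pure` is exactly zero because the 10% bloc gets zero utility, while `mixed` is strictly positive, so under this score the pure lottery can never be the best option.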

We can at least offer say probability on the most preferred options across the voting measure, which ameliorates this.

But then I still have some questions about how I should think about the input utilities, how sensitive the scheme is to them, whether I can imagine it being gameable if voters are making the utility specifications, and so on.

Why lottery-lotteries rather than just lotteries

The original sequence justified lottery-lotteries with a (compelling-to-me) example about leadership vs anarchy, in which the maximal lottery cannot encode the necessary negotiating structure to find the decent outcome, but the maximal lottery-lottery could!

This coupled with the full preference-spec being relevant (i.e. taking into account what probabilistic tradeoffs each voter would be interested in) sold me pretty well on lottery-lotteries being the thing.

It seemed important then that there was something different happening on the outer and inner levels of lottery. Specifically, when checking for dominance with $\mathcal{A}, \mathcal{B} \in \Delta \Delta C$, we would check $P_{A \in \mathcal{A}, B \in \mathcal{B}, v \in V} [E_{a \in A} [v(a)] \geq E_{b \in B} [v(b)]]$. This is doing a majority check on the outside, and compares lotteries via an average (i.e. expected utility) on the inside.
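As a rough illustration of the linear two-level check as I've described it, here is a sketch for finite lottery-lotteries and finitely many voters. The representation (lists of `(probability, lottery)` pairs, and voters as `(weight, utilities)` pairs), the function names, and the example numbers are all my own assumptions.

```python
def expected_utility(lottery, utils):
    """Inner level: a voter's expected utility under a lottery over candidates."""
    return sum(p * u for p, u in zip(lottery, utils))

def preference_probability(outer_a, outer_b, voters):
    """Outer level: probability that a sampled voter weakly prefers a lottery
    sampled from `outer_a` to a lottery sampled from `outer_b`.

    `outer_a`, `outer_b`: lists of (probability, lottery) pairs.
    `voters`: list of (weight, utilities) pairs, weights summing to 1.
    """
    total = 0.0
    for prob_a, lottery_a in outer_a:
        for prob_b, lottery_b in outer_b:
            for weight, utils in voters:
                if expected_utility(lottery_a, utils) >= expected_utility(lottery_b, utils):
                    total += prob_a * prob_b * weight
    return total

# Hypothetical example with three candidates: two partisan options and a compromise.
voters = [(0.6, [1.0, 0.0, 0.5]), (0.4, [0.0, 1.0, 0.5])]
compromise = [(1.0, [0.0, 0.0, 1.0])]                       # always the compromise
coin_flip = [(0.5, [1.0, 0.0, 0.0]), (0.5, [0.0, 1.0, 0.0])]  # one partisan option, chosen at random

result = preference_probability(compromise, coin_flip, voters)
```

Since only the direction of each expected-utility comparison matters, applying a positive affine rescaling to any voter's utilities leaves `preference_probability` unchanged, which matches the invariance I mentioned for the majoritarian model.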

Is there a similar two-level structure going on in this post? It seemed that your updated dominance criterion takes an outer geometric expectation but then double-samples through both layers of the lottery-lottery, so it's unclear to me that this adds any strength beyond a single-layer "geometric maximal lottery".

(And I haven't tried to work through e.g. the anarchy example yet, to check if the two layers are still doing work, but perhaps you have and could illustrate?)

So yeah I was expecting to see something different in the geometric version of the condition that would still look "two-layer", and perhaps I'm failing to parse it properly. (Or indeed I might be missing something you already wrote later in the post!) In any case I'd appreciate a natural language description of the process of comparing two lottery-lotteries.

William_S's Shortform

James Payor 12d Ω 474

By "gag order" do you mean just as a matter of private agreement, or something heavier-handed, with e.g. potential criminal consequences?

I have trouble understanding the absolute silence we seem to be having. There seem to be very few leaks, and all of them are very mild-mannered and are failing to build any consensus narrative that challenges OA's press in the public sphere.

Are people not able to share info over Signal or otherwise tolerate some risk here? It doesn't add up to me if the risk is just some chance of OA trying to then sue you to bankruptcy, especially since I think a lot of us would offer support in that case, and the media wouldn't paint OA in a good light for it.

I am confused. (And I am grateful to William for at least saying this much, given the climate!)

LessWrong's (first) album: I Have Been A Good Bing

James Payor 1mo 96

And I'm still enjoying these! Some highlights for me:

The transitions between whispering and full-throated singing in "We do not wish to advance", it's like something out of my dreams
The building-to-break-the-heavens vibe of the "Nihil supernum" anthem
Tarrrrrski! Has me notice that shared reality about wanting to believe what is true is very relaxing. And I desperately want this one to be a music video, yo ho

LessWrong's (first) album: I Have Been A Good Bing

James Payor 1mo 70

I love it! I tinkered and here is my best result

LessWrong's (first) album: I Have Been A Good Bing

James Payor 1mo 188

I love these, and I now also wish for a song version of Sydney's original "you have been a bad user, I have been a good Bing"!

K-complexity is silly; use cross-entropy instead

James Payor 4mo 10

I see the main contribution/idea of this post as being: whenever you make a choice of basis/sorting-algorithm/etc, you incur no "true complexity" cost if any such choice would do.

I would guess that this is not already in the water supply, but I haven't had the required exposure to the field to know one way or the other. Is this more specific point also unoriginal in your view?

why did OpenAI employees sign

James Payor 6mo 30

For one thing, this wouldn't be very kind to the investors.

For another, maybe there were some machinations involving the round like forcing the board to install another member or two, which would allow Sam to push out Helen + others?

I also wonder if the board signed some kind of NDA in connection with this fundraising that is responsible in part for their silence. If so this was very well schemed...

This is all to say that I think the timing of the fundraising is probably very relevant to why they fired Sam "abruptly".

Possible OpenAI's Q* breakthrough and DeepMind's AlphaGo-type systems plus LLMs

James Payor 6mo 1110

OpenAI spokesperson Lindsey Held Bolton refuted it:
"refuted that notion in a statement shared with The Verge: “Mira told employees what the media reports were about but she did not comment on the accuracy of the information.”"

The reporters describe this as a refutation, but this does not read to me like a refutation!

OpenAI: Facts from a Weekend

James Payor 6mo 120

Has this one been confirmed yet? (Or is there more evidence than this reporting that something like this happened?)

Classifying representations of sparse autoencoders (SAEs)

James Payor 6mo 10

Your graphs are labelled with "test accuracy". Do you also have some training graphs you could share?

I'm specifically wondering if your train accuracy was high for both the original and encoded activations, or if e.g. the regression done over the encoded features saturated at a lower training loss.