Sequences

Linguistic Freedom: Map and Territory Revisted
INVESTIGATIONS INTO INFINITY

Comments

What did he say that was dishonest in the China article? (It's paywalled).

That's an excellent point.

I agree. I think that's probably a better way of clarifying the confusion than what I wrote.

I was under the impression that this meant that a sufficiently powerful AI would be outer-aligned by default, and that this is what enables several of the kinds of deceptions and other dangers we're worried about.


This would be the case if inner alignment meant what you think it does, but it doesn't.

Is the difference between the goal being specified by humans vs being learned and assumed by the AI itself?

Yeah, outer alignment is focused on whether we can define what we want the AI to learn (i.e. write down a reward function). Inner alignment is focused on what the learned artifact (the AI) ends up learning to pursue.

See DeepMind's "How undesired goals can arise with correct rewards" for an empirical example of inner misalignment.

From a quick skim, that post seems to only be arguing against scheming due to inner misalignment. Let me know if I'm wrong.

I don't think that's quite accurate: any sufficiently powerful AI will know what we want it to do.

I agree with Gwern. I think it's fairly rare that someone wants to write a whole entry themselves, or to write articles for all the concepts in a topic.

It's much more likely that someone just wants to add their own idiosyncratic takes on a topic. For example, I'd love to have a convenient way to write up my own idiosyncratic takes on decision theory. I tried including some of these in the main Wiki, but the edit was (understandably) reverted.

I expect that one of the main advantages of this style of content would be that you can just write a note without having to bother with an introduction or conclusion.

I also think it would be fairly important (though not at the start) to have a way of upweighting the notes added by particular users.

I agree with Gwern that this may result in more content being added to the main wiki pages when other users are in favour of this.

  1. Users can just create pages corresponding to their own categories
  2. Like Notion, we could allow two-way links between pages, so users would just tag the category in their own custom inclusions.

Cool idea, but before doing this one obvious inclusion would be to make it easier to tag LW articles, particularly your own articles, in posts by @including them.

I just dumped 100 mana on "no".

This comment indicates a major limitation which makes the result much less impressive.

Yeah, that's a pretty sharp limitation on the result.

I'd love to know if any other AI is able to pass this test when we exclude the tag.
