Davide_Zagami - LessWrong

AI Safety Prerequisites Course: Revamp and New Lessons

Registration and access to the lessons are completely free. Where do you see a paywall?

AI Safety Prerequisites Course: Basic abstract representations of computation

Hi, full-time content developer at RAISE here.

The overview page you are referring to (is it this one?) contains just some examples of subjects that we are working on.

1. One of the main goals is making a complete map of what is out there regarding AI Safety, and then recursively creating explanations for the concepts it contains. That could fit multiple audiences depending on how deep we are able to go. We have started doing that with IRL and IDA. We are also trying a bottom-up approach with the prerequisite course because why not.

2. Almost the same as reading papers, with clear pointers to references to quickly integrate any missing knowledge. Whether this will be achieved in the best case or in the average case is currently under testing.

3. I don’t know about the absolute amount of time required for that. Keep in mind that this remains to be confirmed, but we have recently started collecting some statistics that suggest reading RAISE material is going to be quicker than searching for the right papers and then reading and understanding them. This would be the second main goal.

(Of course, it's also much easier from my position to be engaging/critiquing existing works, than to actually put in the effort to make all of this happen. I don't mean any of the above as an indictment. It's admirable and impressive that y'all have coordinated to make this happen at all!)

Thanks :)

Duplication versus probability

Davide_Zagami6y20

After reading this I feel that how one should deal with anthropics strictly depends on goals. I'm not sure exactly which cognitive algorithm does the correct thing in general, but it seems that sometimes it reduces to "standard" probabilities and sometimes not. May I ask what UDT says about all of this exactly?

Suppose you're rushing an urgent message back to the general of your army, and you fall into a deep hole. Down here, conveniently, there's a lever that can create a duplicate of you outside the hole. You can also break open the lever and use the wiring as ropes to climb to the top. You estimate that the second course of action has a 50% chance of success. What do you do?

Obviously, if the message is your top priority, you pull the lever, and your duplicate will deliver that message. This succeeds all the time, while the wire-rope only has 50% chance of working.

Agree.

after pulling the lever, do you expect to be the copy on the top or the copy on the bottom?

Question without meaning per se, agree.

what if the lever initially creates one copy - and then, five seconds later, creates a million? How do you update your probabilities during these ten seconds?

Before pulling the lever, I commit to do the following.

For the first five seconds, I will think (all copies of me will think) "I am above". This way, 50% of all my copies will be wrong.

For the remaining five seconds, I will think (all copies of me will think) "I am above". This way, one millionth of all my copies will be wrong.

If each of my copies was receiving money for distinguishing which copy he is, then only one millionth of all my copies would be poor.

This sounds suspiciously like updating probabilities the "standard" way, especially if you substitute "copies" with "measure".
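The bookkeeping behind the two phases can be sketched in a few lines. This is only an illustration under stated assumptions: exactly one original stays in the hole, and "a million" means exactly 1,000,000 further duplicates on top of the first one. The function name `fraction_wrong` is hypothetical.

```python
from fractions import Fraction


def fraction_wrong(copies_above: int, copies_below: int = 1) -> Fraction:
    """Fraction of all copies that are wrong if every copy thinks "I am above".

    Assumption: only the copies still in the hole are wrong.
    """
    total = copies_above + copies_below
    return Fraction(copies_below, total)


# First five seconds: one duplicate above, the original still in the hole.
first_phase = fraction_wrong(copies_above=1)

# Remaining five seconds: a million more duplicates appear above.
second_phase = fraction_wrong(copies_above=1 + 1_000_000)

print(first_phase)   # 1/2
print(second_phase)  # 1/1000002
```

Under these assumptions the exact second-phase figure is 1/1,000,002, which the comment rounds to "one millionth".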

Against accusing people of motte and bailey

Davide_Zagami6y10

But suppose that we were discussing something of which there were both sensible and crazy interpretations - held by different people. So:

group A consistently makes and defends sensible claim A1

group B consistently makes and defends crazy claim B1

and maybe even:

group C consistently makes crazy claim B1, but when challenged on it, consistently retreats to defending A1

I may be missing something, but it seems to me that:

if C is accused of motte-and-bailey fallacy there is no problem;
if B is accused of motte-and-bailey fallacy there is a problem because they never defended claim A1;
if A is accused of motte-and-bailey fallacy there is a problem because they never defended claim B1.

I hope I'm not being silly: would it be fair to say that you are pointing to the existence of the "accuse people who are not making a motte-and-bailey fallacy of making a motte-and-bailey fallacy" fallacy? Could we call it "straw-motte-and-bailey fallacy" or something?

"Just Suffer Until It Passes"

Davide_Zagami6y40

Ah! I independently invented this strategy some months ago and, amazingly, it doesn't work for me simply because I'm somehow capable of remaining in the "do nothing" state for literally days. However, I thought it was a brilliant idea when I came up with it and I still think it is; I would be surprised if it didn't work for a lot of people.

Babble

Davide_Zagami6y60

This post made a lot of things click for me. Also it made me realize I am one of those with an "overdeveloped" Prune filter compared to the Babble filter. How could I not notice this? I knew something was wrong all along, but I couldn't pin down what, because I wasn't Babbling enough. I've gotta Babble more. Noted.

"Slow is smooth, and smooth is fast"

Davide_Zagami6y40

Extremely important post in my opinion. The central idea seems true to me. I would like to see if someone has (even anecdotal) evidence for the opposite.

The Mad Scientist Decision Problem

Davide_Zagami6y10

Probably you should have simply said something similar to "increasing portions of physical space have diminishing marginal returns to humans".
