3 Levels of Rationality Verification

Previously in series: Schools Proliferating Without Evidence
Followup to: A Sense That More Is Possible

I strongly suspect that there is a possible art of rationality (attaining the map that reflects the territory, choosing so as to direct reality into regions high in your preference ordering) which goes beyond the skills that are standard, and beyond what any single practitioner singly knows.  I have a sense that more is possible.

The degree to which a group of people can do anything useful about this, will depend overwhelmingly on what methods we can devise to verify our many amazing good ideas.

I suggest stratifying verification methods into 3 levels of usefulness:

  • Reputational
  • Experimental
  • Organizational

If your martial arts master occasionally fights realistic duels (ideally, real duels) against the masters of other schools, and wins or at least doesn't lose too often, then you know that the master's reputation is grounded in reality; you know that your master is not a complete poseur.  The same would go if your school regularly competed against other schools.  You'd be keepin' it real.

Some martial arts fail to compete realistically enough, and their students go down in seconds against real streetfighters.  Other martial arts schools fail to compete at all—except based on charisma and good stories—and their masters decide they have chi powers.  In this latter class we can also place the splintered schools of psychoanalysis.

So even just the basic step of trying to ground reputations in some realistic trial other than charisma and good stories, has tremendous positive effects on a whole field of endeavor.

But that doesn't yet get you a science.  A science requires that you be able to test 100 applications of method A against 100 applications of method B and run statistics on the results.  Experiments have to be replicable and replicated.  This requires standard measurements that can be run on students who've been taught using randomly-assigned alternative methods, not just realistic duels fought between masters using all of their accumulated techniques and strength.

The field of happiness studies was created, more or less, by realizing that asking people "On a scale of 1 to 10, how good do you feel right now?" was a measure that statistically validated well against other ideas for measuring happiness.  And this, despite all skepticism, looks like it's actually a pretty useful measure of some things, if you ask 100 people and average the results.

But suppose you wanted to put happier people in positions of power—pay happy people to train other people to be happier, or employ the happiest at a hedge fund?  Then you're going to need some test that's harder to game than just asking someone "How happy are you?"

This question of verification methods good enough to build organizations, is a huge problem at all levels of modern human society.  If you're going to use the SAT to control admissions to elite colleges, then can the SAT be defeated by studying just for the SAT in a way that ends up not correlating to other scholastic potential?  If you give colleges the power to grant degrees, then do they have an incentive not to fail people?  (I consider it drop-dead obvious that the task of verifying acquired skills and hence the power to grant degrees should be separated from the institutions that do the teaching, but let's not go into that.)  If a hedge fund posts 20% returns, are they really that much better than the indices, or are they selling puts that will blow up in a down market?

If you have a verification method that can be gamed, the whole field adapts to game it, and loses its purpose.  Colleges turn into tests of whether you can endure the classes.  High schools do nothing but teach to statewide tests.  Hedge funds sell puts to boost their returns.

On the other hand—we still manage to teach engineers, even though our organizational verification methods aren't perfect.  So what perfect or imperfect methods could you use for verifying rationality skills, that would be at least a little resistant to gaming?

(Added:  Measurements with high noise can still be used experimentally, if you randomly assign enough subjects to have an expectation of washing out the variance.  But for the organizational purpose of verifying particular individuals, you need low-noise measurements.)
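A rough illustration of why this works, under the simplifying assumption (introduced here, not part of the argument above) that each measurement is the true value plus independent noise with the same standard deviation in both groups. The function name and parameters are hypothetical:

```python
from math import sqrt

def standard_error_of_difference(noise_sd, n_per_group):
    """Standard error of (mean of group A - mean of group B).

    Assumes independent measurements with equal noise_sd in both groups.
    """
    return noise_sd * sqrt(2.0 / n_per_group)

# How the uncertainty in the group comparison shrinks as subjects are added.
for n in (1, 10, 100, 1000):
    print(n, round(standard_error_of_difference(10.0, n), 2))
```

The uncertainty in the comparison falls with the square root of the group size, which helps an experiment. It does nothing for a single individual's score, which keeps the full noise of the measurement.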

So I now put to you the question—how do you verify rationality skills?  At any of the three levels?  Brainstorm, I beg you; even a difficult and expensive measurement can become a gold standard to verify other metrics.  Feel free to email me at [email protected] to suggest any measurements that are better off not being publicly known (though this is of course a major disadvantage of that method).  Stupid ideas can suggest good ideas, so if you can't come up with a good idea, come up with a stupid one.

Reputational, experimental, organizational:

  • Something the masters and schools can do to keep it real (realistically real);
  • Something you can do to measure each of a hundred students;
  • Something you could use as a test even if people have an incentive to game it.

Finding good solutions at each level determines what a whole field of study can be useful for—how much it can hope to accomplish.  This is one of the Big Important Foundational Questions, so—

Think!

(PS:  And ponder on your own before you look at the other comments; we need breadth of coverage here.)

 

Part of the sequence The Craft and the Community

Next post: "Why Our Kind Can't Cooperate"

Previous post: "Schools Proliferating Without Evidence"

Comments


Occasionally, well-respected community members could say things that are intentionally false, but persuasive and subtle, a la http://www.overcomingbias.com/2008/02/my-favorite-lia.html.

You get points for catching these mistakes. Perhaps you submit your busts privately to some arbiter so others have the same challenge.

Later, the error is revealed and discussed.

This would also have the benefit of causing everyone to read the most-respected members' writings ultra-critically, rather than sitting back and being spoon-fed.

One key thing this idea has is short term feedback. Frequent, rapid feedback is essential for getting good at this kind of thing. (IMO that's why economics is still so useless relative to the other sciences: the experiments take fifty years to run.)

Well, you asked for DUMB ideas, so here's mine. It has the advantage that I'm sure no one else will suggest it. This is based on an accidental discovery (so far as I know, unpublished) that one can compare two arbitrary documents for similarity (even if they are in different word-processor formats) by running them both through a recognizer built out of a random state machine and comparing bit masks of all the states traversed. The more the documents have in common, the more states will be traversed by both.
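One possible reading of that description, as a minimal sketch. The table size, the byte-level alphabet, the fixed seed, and the overlap ratio used to compare the masks are all assumptions introduced for illustration; the comment specifies none of them, and nothing here shows that the method works well in practice.

```python
import random

def make_machine(num_states=256, seed=0):
    """Build a random transition table: table[state][byte] -> next state."""
    rng = random.Random(seed)
    return [[rng.randrange(num_states) for _ in range(256)]
            for _ in range(num_states)]

def visited_mask(machine, data):
    """Run the bytes through the machine; return a bit mask of visited states."""
    state = 0
    mask = 1  # the start state (bit 0) always counts as visited
    for byte in data:
        state = machine[state][byte]
        mask |= 1 << state
    return mask

def similarity(machine, doc_a, doc_b):
    """Shared visited states divided by states visited by either document."""
    mask_a = visited_mask(machine, doc_a)
    mask_b = visited_mask(machine, doc_b)
    shared = bin(mask_a & mask_b).count("1")
    either = bin(mask_a | mask_b).count("1")
    return shared / either

machine = make_machine()
print(similarity(machine, b"the smell of bacon", b"the smell of bacon"))
print(similarity(machine, b"the smell of bacon", b"a completely different text"))
```

Both documents must be run through the same machine (same seed), otherwise the masks are not comparable.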

So, let's assume we have a panel of highly rational individuals who are our control group. We generate a random multiple-choice questionnaire consisting of nonsensical questions and answers. Things like:

1) How Green is the Smell of Bacon?

a) 7.5

b) Neon

c) Introspection

d) Larger

You then do a correlation over how your panel of experts chose their answers and see if there is a common pattern. You then score students who take the test based on how similar to the common pattern they are.
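The scoring step might look something like the sketch below. It swaps the "correlation" for something simpler: take the panel's most common answer to each question and weight a student's matches by how strongly the panel agreed. That weighting scheme, and all the names, are assumptions made for illustration.

```python
from collections import Counter

def panel_consensus(panel_answers):
    """For each question, return (most common answer, share of panel choosing it).

    panel_answers is a list with one answer list per panelist.
    """
    num_questions = len(panel_answers[0])
    consensus = []
    for q in range(num_questions):
        counts = Counter(panelist[q] for panelist in panel_answers)
        answer, votes = counts.most_common(1)[0]
        consensus.append((answer, votes / len(panel_answers)))
    return consensus

def student_score(student_answers, consensus):
    """Agreement with the panel, weighted by panel agreement; 1.0 is a full match."""
    total = sum(weight for _, weight in consensus)
    earned = sum(weight
                 for (answer, weight), given in zip(consensus, student_answers)
                 if given == answer)
    return earned / total

panel = [["a", "b"], ["a", "c"], ["a", "b"]]
consensus = panel_consensus(panel)
print(student_score(["a", "b"], consensus))
print(student_score(["a", "c"], consensus))
```

Questions on which the panel splits evenly carry little weight, so they barely affect a student's score.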

Assuming this idea works at all, the advantage of this is that it would be extremely difficult to game. The disadvantage would be that it would penalize those who are significantly more rational than the 'norm'. It would probably also require the panel to be similar to each other in cognition. There is also the general problem of not knowing if you're really testing for what you think you're testing.

Frankly, I don't know if I'd be more happy if this was tested and shown to be workable, or if it turned out to be a really stupid idea.

NOT CRAZY ENOUGH! We need EVEN STUPIDER ideas!

(Voted up for being the best try so far, though.)

People tend to compartmentalize. We need to bear in mind that anything we come up with that involves testing someone when they know they're being tested can only check how rational they can be if they put their mind to it, not how rational they are when they're not being tested.

For 'hot' political and religious biases, create materials in which apparent advocates of different ideologies or parties are arguing for some particular empirical prediction, e.g. about the relationship between different tax rate changes and economic growth, with some predictions being right and some wrong. The subject then needs to make his or her own prediction about some easily-verifiable but obscure empirical fact related to the argument, e.g. whether a graph of GDP and tax rates matches Norway or Iceland.

Scoring would reflect the degree to which the ideological affiliation in the prompt biased the results. If it was being gamed you might need to add in scoring for accuracy. Challenges would be producing a large enough inventory of test items, keeping them secret, and the need to tailor tests to locally popular ideologies or ideologies of interest.

More surveys that study the relationship between knowledge about verifiable facts and values. What sorts of information do those with different values tend to have, and what are the values of those whose knowledge covers the pet facts of all camps? There is a fair amount of this literature in political science aimed at the electorate and its political knowledge, but it would be good to extend it to other topics, e.g. scientific ones.

Announced probability distributions (not just predictions, so as to enable better scoring) for the results of upcoming experiments. For instance, we know that in the next 2-3 years we are going to get a huge amount of genomic data that will answer a lot of questions about the genetic architecture of human diseases. Making public quantitative predictions about things like that could be quite informative.
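To make "better scoring" concrete, here is a small sketch using the logarithmic scoring rule, one common choice and not necessarily the one the comment has in mind. The forecasters, outcome labels, and probabilities are hypothetical.

```python
from math import log

def log_score(announced, outcome):
    """Log of the probability the forecaster gave to what actually happened.

    announced maps each possible outcome to a probability. Scores are at most
    zero, and closer to zero is better.
    """
    p = announced.get(outcome, 0.0)
    if p <= 0.0:
        return float("-inf")  # the forecaster ruled out what actually happened
    return log(p)

# Two hypothetical forecasters announce distributions before an experiment.
confident = {"few large-effect genes": 0.9, "many small-effect genes": 0.1}
hedged = {"few large-effect genes": 0.5, "many small-effect genes": 0.5}

result = "many small-effect genes"
print(log_score(confident, result), log_score(hedged, result))
```

A bare prediction only tells you who was right. A scored distribution also penalizes a forecaster for being confidently wrong.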

Organize large games/contests where a lot of candidates are locked up in an area, and have a finite time to reach a certain point / find a certain object.

The exact rules would be specially designed each time for that year's challenge, by a group of rationalists and game designers. So the details would vary, but some common themes would be:

  • physical prowess does not come into play (beyond maybe moving around faster, not getting tired as easily etc.)
  • some people would be liars / saboteurs, and not real candidates

For example, the candidates are blindfolded and brought into a large underground circular room, whose only unlocked exits are twenty slides along on the edge (so, one-way exit only). The goal is to take the exit that's due north.

Or, the players are dropped in a maze, and each player is given twenty balls with his name written on them. In the maze are tall glass tubes in which the player can drop their balls. The players know that at the end of the games everyone gets points for the balls with his name that are in "good" tubes (from 10 to 1 points, depending on whether his ball is at the bottom or top - only ten balls fit in a tube), and loses points for balls in "bad" tubes (whatever its position). There are also neutral tubes. On the tubes are various signs and portents, and on the walls are statements about the meanings of the signs ("about 10% of good tubes have red triangles", "two squares of the same color cancel out", "a blue triangle means that there's a bad tube close to this one"). The players have 30 minutes to place their balls.

Additional twists:

  • there are in fact several simultaneous games taking place, in the same place, but the rules are such that it's very difficult to tell who's part of which game (for example, if some players' goal is to unmask/identify other players)
  • the goal may not be reachable at all (no candidates accepted this year). The "global" rules of the contest might include that there must be a certain probability each year (10% ?) that the contest is impossible.
  • candidates are not alone but in teams

... well, there is plenty of inspiration to take from board games and TV shows. And many factors of those can be controlled by careful design (importance of luck or of trivia knowledge, how much "herd behaviour" can come into play, etc.). The games should be more complicated than what's said above, and contain many red herrings. The designers should try to introduce as many sources of bias and irrationality as possible.

Voted up if only because this reads like a description for the first reality TV show I would actually want to watch.

I think that the most important skill a rationalist can have is the ability to assess the quality of other rationalists, and to participate effectively in team projects. A measurement of individual rationality has to include how well a randomly selected team including that individual performs on team rationality tests.

So, I think that a rationalist 'decathlon' would consist of a variety of competitions between individuals and small teams including math/logic problems, general knowledge tests, cooperative and non-cooperative game theory games, prediction markets, and engineering challenges (egg drops, programming robots to compete in some arena, etc.)

But then there would be a second level, in which individuals and teams would compete in a prediction market in which they observe (by video recording) the deliberations of other teams on first-level problems and bet on their relative performance.

And even a third level, in which individuals observe the deliberations of second-level teams and bet on their performance in that second-level prediction market.

There are a variety of other things that might be interesting to measure - for example, what team sizes perform best, whether individual rationalism and team-participant rationalism are different skills, and whether team performance is best predicted by strongest member, average member, or weakest member.

Hrm... Well, one initial notion I have is along the lines of this: Rationality training should improve how good one can become at other stuff, or at least improve ability to gain skills/etc in other fields.

So, maybe tests could be something along the lines of find various subjects/fields a student is unfamiliar with and basically assign them to "get some knowledge and skill in this field."

How efficiently students can basically bootstrap up into something they're unfamiliar with should vary with their rationality, right? So something like this may be a starting point.

(Yes, I can see a bunch of details that would need to be worked out, but seems to be that this notion may at least be somewhere to start for developing rationality tests.)

Carry around a notepad, form probabilistic opinions on lots of little questions that you can find out the answer to soon after, record all the probabilities assigned to correct answers, where applicable add tags like "politics", "project completion", "my social status", "trivia", put into a spreadsheet or something and see if you're miscalibrated globally and for different tags.
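The spreadsheet step could be as simple as the sketch below. It assumes each notepad entry is a (tag, stated probability, whether it came true) triple, and compares the average stated probability with the observed hit rate per tag; the record layout and names are made up for illustration.

```python
from collections import defaultdict

def calibration_by_tag(records):
    """Return {tag: (mean stated probability, hit rate, overconfidence gap)}.

    records is a list of (tag, probability, came_true) tuples. An extra "ALL"
    entry covers every record. A positive gap suggests overconfidence.
    """
    groups = defaultdict(list)
    for tag, probability, came_true in records:
        groups[tag].append((probability, came_true))
        groups["ALL"].append((probability, came_true))
    report = {}
    for tag, items in groups.items():
        mean_p = sum(p for p, _ in items) / len(items)
        hit_rate = sum(1 for _, hit in items if hit) / len(items)
        report[tag] = (mean_p, hit_rate, mean_p - hit_rate)
    return report

notepad = [
    ("project completion", 0.9, False),
    ("project completion", 0.9, True),
    ("trivia", 0.6, True),
]
for tag, row in calibration_by_tag(notepad).items():
    print(tag, row)
```

With only a handful of entries per tag the gaps are mostly noise, so this only becomes informative after many recorded predictions.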

Compile a large enough database of historical events that nobody could memorize more than a fraction of it. For the test, choose a few events at random, describe the initial conditions and ask the candidate to predict the outcomes.

Here's a stupid idea: Evaluate people by auditing their domiciles. I've read (and from personal experience, I believe it) that you get really solid insight into someone's personal qualities by inspecting their home, as good as interviewing them and all of their friends and family. (I googled a bit, but I can't find the source.)

Anyway, it can probably be gamed.

Give the students sodium pentothal and ask if they're one of the top 50% of rationalists in their school. However many out of 200 say 'no', that's the school's percentage score. Schools scoring over 100% are thrown out for cheating.

There are two problems with measuring rationality, one of which is difficult but manageable, the other of which might be insurmountable. The first problem is that most conceivable tests of rationality require using information from other fields (such as finance, physics, or psychology), such that you can gain a considerable advantage on the test by studying things from that field which don't actually make you more rational. This can be solved with sufficient cleverness.

The second problem is that how rational someone is depends on how well they maintain it under stress. Pressure, fatigue, emotionally charged situations, alcohol, and/or deliberate manipulation, can make the best rationalists act completely insane. (About a year ago, I went on a reality television show, which was in a way like a series of rationality tests. I didn't do all that well, rationality-wise, but some people who should have known better did dramatically worse.)

Ask a thousand married rationalists of a given school to estimate the probability that their spouses have cheated on them. Confidentially ask their spouses if they have. Measure group calibration.

ETA: This applies to any potentially painful, but verifiable question. Ask them to draw a probability distribution over their date of death, or the longevity of their marriages. Estimate the probability of various kinds of cancer appearing over the next (5,10,15) years, etc. etc.

Here's an immoral one: crack a rationalist

Most, if not all, human minds are vulnerable to hacking, eg by cults, religions, pseudoscience, etc. The minds of rationalists should be harder to hack than others.

Make a copy of a (would-be) rationalist, subject the copy to significant emotional stress, and then send missionaries his way.

The myths carried by the missionaries should be invented for the challenge so everyone can agree that they are false, but should, of course, be significantly more plausible than today's religions.

"Make a copy of a (would-be) rationalist, subject the copy to significant emotional stress, and then send missionaries his way."

Moral qualms aside, we should probably have a back-up plan just in case we don't solve human uploading before we want to start testing.

Frank Mager, in various books, including "Preparing Instructional Objectives", suggests working backward from evidence that would make you conclude that someone is, e.g. a Bayesian Master Rationalist, to the tests (and instructional objectives) for a course of instruction intended to turn someone into a Bayesian Master Rationalist (or whatever you want to turn them into).

Clearly real life achievement correlates well with rationality, by definition. So an impractical but "gold standard guaranteed" test of rationality would be to wait until the person in question got to the age of, say, 50, and check to see whether they had made lots of money, or achieved other obvious life goals (fame, for example).

A more specific good test of rationality is the world of startups. Other than the OB/LW community, the entrepreneurial world is the closest to perfect rationality I have found. You could test someone in a month or so by asking them to enter a startup competition, like this:

http://www.cue.org.uk/5k-Challenge

Again, probably not so practical.

The kind of challenges you find on Alan Sugar's "The Apprentice" are fairly rationality oriented, and administrable in a few days.

Potential rationalists could be tested by putting them in the position of a venture capitalist/angel investor, and have a combination of real business and fakes come to pitch at them. This test could be made harder by supplying some of the fakes with convincing cover stories and the real businesses with poor presentation skills, forcing the rationalist to concentrate on the merit of the underlying idea rather than superficial clues. It could be made a better test by allowing participants to do their own research beforehand, and giving them the opportunity to hire and consult "experts" some of whom would again be fakes.

In general, rationality tests have to take a long time IMO, because genuinely creative thinking takes a long time at a serial speed of ~10Hz. Any "test" that takes a few hours (like an exam) is just going to be regurgitation of previously memorized material, or activation of previously trained-up hardwired circuits, a la "cached thoughts". It seems to me that a day long test is the absolute minimum.

Of course you could also administer a classic academic style exam on the OB material, cognitive biases, etc. But someone could do very well on that without really understanding it. Still, it would provide some indication of real life rationality performance.

I'm not sure why "teaching to the test" is so disparaged for its effects on the learning process. Obviously that is a different use for tests than evaluation of ability, as is the main goal here.

Studying for the LSAT taught me to feel genuine physical unease when I read a bad argument, then calm it by the next problem. It's very hard to turn that off when reading the newspaper.

The third stage of my growth as a rationalist was discovering this site. I no longer go through the day thinking of things I read and hear: "Wrong (fallacy), wrong (incorrect premise), wrong (fallacy), true (but irrelevant)." Now it's more like: "Wrong (fallacy), not even wrong (internally inconsistent), wrong (map/territory confusion), wrong (fallacy), not even wrong (argument from definition)."

I propose thinking of ways to hijack the human mental machinery as an alternative to overcoming it, akin to what evolution does.

"Piggyback" on other tests: ask people taking part in various tests (standardized exams, sport competitions, driving lessons, programming contests, art exhibitions - whatever) their chances of success (or their probability distribution over the range of results).

The other items should themselves be important enough, so it would fit well with a university cursus, so that it can be "automated" for a lot of things. The way of asking for predictions should be made so as to maximize bad predictions: for example the students are asked to give estimations in front of their peers (if that's shown to get them to overestimate), but afterwards not reminded of the prediction they gave nor of whether it came true (so that they don't deliberately try to make it come true).

It could also be extended to other events like "when I'll turn in my thesis" or even "whether I'll be single in a year" or "how much I'll weigh in six months".

The more subjects they have to estimate on, the better. At the end, measure the Bayes-score.

This could be combined with some more "dramatic" and explicit rationality tests (see the other comments) to constitute the scoring method of a university rationality course. The explicit rationality tests would also help take a bit of attention away from the day-to-day probability estimates on exams and stuff, to diminish the "only rational when deliberately thinking about it" phenomenon.

Oh, also - ask the students for an estimate before the exam and after the exam (but before they have a chance of talking to someone else). Maybe even a week before and a week after too.

There is a recent trend of 'serious games' which use video games to teach and train people in various capacities, including military, health care, management, as well as the traditional schooling. I see no reason why this couldn't be applied to rationality training.

I always liked adventure style games as a kid, such as King's Quest or Myst, and wondered why they aren't around any more. They seemed to be testing rationality in that you would need to guide the character through many interconnected puzzles while figuring out the model of the world and how best to achieve the goals of the protagonist. It seems like the perfect video game genre for both developing and testing rationality skills.

Specifically, I've thought of a microcosm of the real world, taking place in a different setting yet similar enough to our real world that there would be analogues to religion, science, politics, etc. As you progress through the game, say from child to adult, you learn about the world and see how different beliefs and strategies affect the game. Players would encounter similar challenges to the real world but be disconnected enough not to put up a defense mechanism, yet involved enough to care about the outcome. Add MMO et al features to taste.

I'm not sure if this has already been said, but does the "biases" literature not already contain a lot of perfectly good (although probably overly game-able) rationality tests? Just pick an experiment at random from Tversky and Kahneman and see how well the people in the school do.

Of course, there is a problem of people learning how to do some of these tests, but I'm pretty sure there are some that could be reworked so that they're pretty damned hard to pass even if you're well-acquainted with the literature. I'm thinking particularly those where half of the subjects are asked a different question to the other half, and the results compared - e.g., tests for the Lake Wobegon effect, for Social Attribution Bias, etc.

Use small-scale, limited-term betting markets with play money.

Put the group of people you want to rank relative to each other into a room - without internet access. Everyone starts with 0 points. People are ranked on how many points they have at the end of the test.

Participants make bets (for points) with each other. There's a time limit for settling those debts; all bets made have to be specified in a way that clearly determines the winner within a fixed period after the end of the test. Of course, bets that can be settled immediately (e.g. on current trivia, history or fiction) are also permissible.

Aside from that, there are no limits: Any time two participants agree they want to bet against each other, on whatever they specify for however many points they choose, they can register that bet.

For instance, Alice and Bob bet on the temperature as reported by for at 6:00 local time, monday after the test:

  • Bob will pay Alice 5 points if the temperature is at most 20 degrees Celsius
  • Otherwise, Alice will pay Bob 20 points.

After enough time has passed for all bets to be settled, have a trusted third party determine the winner for each, tally up the points and rank participants by final score.
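The trusted third party's bookkeeping could be sketched as follows. The bet record layout (a proposition name plus who pays whom under each outcome) is an assumption made for illustration; the Alice and Bob bet is the one from the example above.

```python
from collections import defaultdict

def settle(bets, outcomes):
    """Apply every bet's payout and rank players by final score.

    Each bet names a proposition and gives (payer, payee, points) tuples for
    the cases where the proposition turns out true or false.
    """
    scores = defaultdict(int)  # everyone starts at 0 points
    for bet in bets:
        came_true = outcomes[bet["proposition"]]
        payer, payee, points = bet["if_true"] if came_true else bet["if_false"]
        scores[payer] -= points
        scores[payee] += points
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

bets = [{
    "proposition": "temperature at most 20 degrees Celsius",
    "if_true": ("Bob", "Alice", 5),
    "if_false": ("Alice", "Bob", 20),
}]
outcomes = {"temperature at most 20 degrees Celsius": True}
print(settle(bets, outcomes))
```

Because every payout subtracts from one player exactly what it adds to another, the scores always sum to zero.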

This game is absolute zero-sum: the only way to earn points is by taking them from another participant. Test runs and outcomes can be published without obviously weakening the idea: If there's something to be learned from previous rounds, all participants have a chance to learn it.

Studying obsessively on certain subjects may help you, but only to the point that other participants don't know you've done it: If everyone knows that you are a major Higurashi no Naku Koro ni fan, they're unlikely to bet against you on that subject - or if they do, they won't bet very much.

Edit: Thinking about this some more, this kind of test has a failure mode: There's a strong incentive not to bet against people who are better at tests like this than you, so with sufficient information about the players the entire game may freeze up: For every possible bet, there's somebody who expects to end up worse off, no bets get made and everyone always walks out with 0 points.

Possible solution: Keep participants anonymous to each other during each test. If nobody knows who they're playing against, there's a higher chance they'll be willing to make some bets.

Well, there's always the idea of using fMRI scans to determine if someone is thinking in 'rational' patterns. You stick them under the machine and give them a test. You ignore the results of the test, but score the student on what parts of their brains light up.

(haven't looked through comments, so this may have been suggested many times over)

In a college-level rationality course, it would be most appropriate for a portion of the grade to be determined by an artificial economy. That is, set up a currency and a (relatively even) starting distribution, add (probabilistic) opportunities for investment (perhaps linked to other important parts of the course) and, most importantly, make defection possible, anonymous and easy. Make it, as much as possible, like a vast array of one-shot (or known number of iterations) Prisoner's Dilemmas.

Then allow students to organize into institutions with rules. Well-taught rationalists should be able to construct a very strong economy along these lines; poorly-taught ones will be only rational enough not to cooperate out of an irrational sense of honor. A student's final grade on that component will be the logarithm of their final wealth, curved as little as possible.

It would take a well-designed setup, of course, to ensure that we're truly measuring rationality and not (say) merely group camaraderie; but I think it could be worked out in a satisfactory way.

The main upshot of this as regards rationality verification: if two different rationality curricula run the same economy setup, a consistently better growth rate of one class economy is evidence of the second kind that more complete rationality is being taught. The students have a much bigger incentive towards their own grade than towards the reputation of the class, so it should be a pretty decent test.

I'm tempted to say "have them play poker", except it uses lots of domain-specific knowledge as well as general rationality. Perhaps if you could generate random games from a large enough space that people don't build up game-specific skills, and the games just end up testing general rationality? While poker-like games don't test all aspects of rationality, there are some things like "ability to keep making good decisions when frustrated / bored / angry" that these games test very well.

I think people would develop skill at the whole class of games...but at the same time, they would be improving their rationality.

Maybe there is a simple thing, which rational people can't do - always get wrong.

Some not very good examples could be:

Skipping with closed eyes.

Telling a lie to a stranger without it being discovered

Saying - "Ooops, I'm wrong," quickly enough

Going to church and sitting thru' a whole sermon without getting very very upset

Multi-tasking

Irony

Understanding metaphors metaphorically.......

Yeah... I can't think of any good actual examples either, but maybe we should be trying to falsify rationality, rather than verify it.

Another key feature of [edit] group rationality is the ability to not be swayed by what the social group thinks.

There are simple experiments (though I cannot think of the relevant keywords) where a test subject is put in a room full of confederates, all of whom estimate one line segment to be longer than another when the two lines are in fact the same length.

EDIT: Conforming to the group opinion (on average) increases the probability that you are right, thus improving individual truth-tracking. But adding more conformers to the LW community just screws it over. In the limit of infinitely many perfect conformers, the community would display irreversible belief hysteresis; as soon as >50% of the group believed X, all the conformers would switch to believing X, and they would stay that way.
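A toy version of that lock-in, assuming a small group in which conformers outnumber independent thinkers and simply copy the current majority (leaving ties unchanged). The group sizes and the update rule are illustrative assumptions, not a model anyone has validated.

```python
def update_conformers(independents, conformers):
    """Return the conformers' new beliefs about X (True means "believes X").

    Conformers adopt whatever more than half of the whole group believes.
    On an exact tie they keep their current beliefs.
    """
    group = independents + conformers
    believers = sum(group)
    if 2 * believers == len(group):
        return list(conformers)
    majority_believes = 2 * believers > len(group)
    return [majority_believes] * len(conformers)

independents = [True, True, True]
conformers = [True, True, True, True, False, False]

# Round 1: 7 of 9 believe X, so every conformer adopts X.
conformers = update_conformers(independents, conformers)

# Round 2: new evidence arrives and every independent thinker abandons X.
independents = [False, False, False]
conformers = update_conformers(independents, conformers)

print(conformers)  # the conformers alone still form a majority for X
```

Once the conformers agree with each other they are their own majority, so no change of mind among the independents can move the group again.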

EDIT: The Asch experiments found that only 25% of people would consistently report the truth their own eyes were telling them. Thus most people don't make good group rationalists.

This comment thread makes clear the need to distinguish between creating good individual rationalists and good group rationalists. Optimizing for group rationality and individual rationality are different tasks.

A gold-plated ignorance prior will be awarded to the first commenter who finds the link for me...

I don't see what I thought were the obvious answers, so here they are. The foundations are elsewhere on the site, but they seemed missing from this list.

Reputational: Expect Bayesian masters to participate in other scientific fields. People who make more discoveries in other fields get more street cred among rationalists, especially when they can explain how rationalism helped them make the discoveries. Obviously, this is a long-term process that doesn't lend itself to improving the art quickly.

Experimental: This one's a two-step process. First, ask a large collection of university professors to insert one lie into each of their lectures a la http://www.overcomingbias.com/2008/02/my-favorite-lia.html (mentioned in another comment). Have them note which students discover each lie, but don't have that count for any sort of grade (to prevent gaming). Second, sort students randomly into the experimental rationality classes, and/or have the classes "fill up" (with a lottery for seats) to provide a control. Look for whether there's a difference in lie-detection rates between the differently-taught groups.
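For the last step, one standard way to compare two detection rates is a two-proportion z-test. That choice is mine, not the comment's, and the counts below are made-up numbers purely to show the call. The sketch uses only the Python standard library.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_z(hits_a, n_a, hits_b, n_b):
    """Two-sided z-test for a difference between two detection rates.

    hits_* is the number of lies caught, n_* the number of opportunities.
    Returns (z statistic, two-sided p-value).
    """
    rate_a = hits_a / n_a
    rate_b = hits_b / n_b
    pooled = (hits_a + hits_b) / (n_a + n_b)
    variance = pooled * (1 - pooled) * (1 / n_a + 1 / n_b)
    if variance == 0:
        raise ValueError("rates are all 0 or all 1; the test is undefined")
    z = (rate_a - rate_b) / sqrt(variance)
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Hypothetical counts: lies caught out of lies planted, per class.
z, p = two_proportion_z(hits_a=45, n_a=100, hits_b=30, n_b=100)
print(round(z, 2), round(p, 3))
```

This treats each planted lie as an independent trial, which is only roughly true when the same students sit through many lectures.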

Experimental #2, much longer term: Track the career outcomes of the students who took each different rationality class. See whether there's a difference in winning between the groups.