Suppose you make a general rule, e.g. "I won't eat any cookies". Then you encounter a situation that legitimately feels exceptional: "These are generally considered the best cookies in the entire state". This tends to leave people torn between two threads of reasoning:

1) Clearly the optimal strategy is to make an exception this one time and then follow the rule the rest of the time.

2) If you break the rule this one time, then you are likely to dismantle the rule and end up not following it at all, so you should consider whether the rule is good on the whole, instead of just for this particular opportunity.

How can we resolve this? For a very, very long time I didn't want to take option 2) because I felt that adopting a sub-optimal strategy was irrational. It may seem silly, but I found this incredibly emotionally compelling and had no idea how to respond to it with logic. Surely rationality should never require you to be irrational? It took me literally years to work out that 1) is an incredibly misleading way of framing the situation. Instead, we should be thinking:

3) If future you will always make the most rational decision, the optimal strategy is clearly to make the exception. But if there is a chance that making the exception will cause you to fall off the path, either temporarily or permanently, then we need to account for those probabilities in the expected value calculation.

As soon as I realised this was the more accurate way of framing 1), all of its rhetorical power disappeared. Who cares about what is best for a perfectly rational agent? That isn't you. The other key benefit of this framing is that it pushes you to think in terms of probability. Too often I've thought, "I can make an exception without it becoming a habit". This is bad practice. Instead of thinking in terms of a binary "Can I?" or "Can't I?", we should be thinking in terms of probabilities: firstly, because a 0% probability is unrealistic; secondly, because making a probability estimate allows you to calibrate better over time.
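To make the framing in 3) concrete, here is a minimal sketch of the kind of expected value calculation it suggests. The numbers and variable names are entirely made up for illustration, not taken from the post; the only point is that the chance of falling off the path enters the calculation explicitly.

```python
# All numbers here are hypothetical, chosen only to illustrate the calculation.
value_of_exception = 5       # utility gained by eating the exceptional cookie
value_of_keeping_rule = 50   # total future utility of the rule staying intact
p_fall_off = 0.2             # estimated chance that one exception dismantles the rule

# Expected value of making the exception, accounting for the risk
# of falling off the path entirely.
ev_exception = value_of_exception - p_fall_off * value_of_keeping_rule

print(ev_exception)  # 5 - 0.2 * 50 = -5.0, so the exception isn't worth it here
```

With these particular (made-up) numbers, even a modest chance of dismantling the rule swamps the one-off gain, which is exactly the consideration the binary "Can I? / Can't I?" framing hides.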

We can also update 2) to make it a more sophisticated argument:

4) If you break the rule this one time, then you risk dismantling the rule and ending up not following it at all. Further, humans tend to be heavily biased towards believing that their future selves will make the decisions they want them to make, so much so that attempting to calculate this probability is hopeless. Instead, you should only make an exception if the utility gain would be large enough that you would be willing to lose the habit altogether.
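Continuing the same hypothetical numbers as in the earlier sketch (again, mine rather than the author's), rule 4) replaces the probability estimate with a blunter test:

```python
# The same hypothetical numbers as above, restated so this snippet stands alone.
value_of_exception = 5       # utility gained by eating the exceptional cookie
value_of_keeping_rule = 50   # total future utility of the rule staying intact

# Rule 4): skip the probability estimate entirely and make the exception
# only if it is worth losing the habit altogether.
make_exception = value_of_exception > value_of_keeping_rule

print(make_exception)  # False: the cookie is not worth the whole rule
```

This is deliberately more conservative than the expected value version: it behaves as if breaking the rule once loses the habit with certainty.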

We still have two different ways of thinking about the problem, but at least they are more sophisticated than when we started.

(This article was inspired by The Solitaire Principle. I wanted to explain how my thinking on this issue has progressed, and I also wanted a much shorter post to link people to.)

17 comments

I think this schema could benefit from a distinction between rules for internal and external consumption. For external consumption there's some benefit to having (implied) policies for exceptions that are responsive to likely costs, both the internal costs of acting according to the rule in an emergency, and the external cost of having one's expectations violated.

But for internal consumption, it makes more sense to, as Said Achmiz points out, just change the rule to a better one that gets the right answer in this case (and all the prior ones). I think people are confused by this in large part because they learned, from authoritarian systems, to rule themselves as though they were setting rules for external consumption or accountability, instead of reasoning about and doing what they want.

This leads to a weird tension where the same person is sternly setting rules for themselves (and threatening to shame themselves for rule violations), and trying to wiggle out of those same rules as though they were demands by a malevolent authority.

There is another alternative:

Consider even a single exception to totally undermine any rule. Consequently, only follow rules with no exceptions[1]. When you do encounter a legitimate exception to a heretofore-exceptionless rule, immediately discard the rule and replace it with a new rule—one which accounts for situations like this one, which, to the old rule, had to be exceptions.

This, of course, requires a meta-rule (or, if you like, a meta-habit):

Prefer simplicity in your rules. Be vigilant that your rules do not grow too complex; make sure you are not relaxing the legitimacy criteria of your exceptions. Periodically audit your rules, inspecting them for complexity; try to formulate simpler versions of complex rules.

So, when you encounter an exception, you neither break the rule once but keep following it thereafter, nor break it once and risk breaking it again. If this is really an exception, then that rule is immediately and automatically nullified, because good rules ought not have exceptions. Time for a new rule.

And if you’re not prepared to discard the rule and formulate a new one, well, then the exception must not be all that compelling; in which case, of course, keep following the existing rule, now and henceforth.


[1] By which I mean “only follow rules to which no legitimate exception will ever be encountered”, not “continue following a rule even if you encounter what seems like a legitimate exception”.


Edit: I feel like I haven’t made my core point strongly enough, so here’s an addendum.

Why do I say that good rules ought not have exceptions? Because rules already don’t have exceptions.

Exceptions are a fiction. They’re a way for us to avoid admitting (sometimes to ourselves, sometimes to others) that the rule as stated, together with the criteria for deciding whether something is a “legitimate” exception, is the actual rule.

The approach I describe above merely consists of making this fact explicit.

This is an interesting way of thinking about things, but you have to trust yourself not to constantly invent new exceptions. I suppose your ability to remember the rule places a natural limit on length.

That’s what the periodic audit is for.

Remember, the idea is that exceptions don’t exist. You’re not “inventing new exceptions”, you’re reformulating the rule to encompass things that would have to be exceptions to the old rule.

Done this way, it will quickly become evident if you’re gerrymandering the rule so much as to effectively neuter it; that’s when you say “ok, clearly this isn’t working; time to rethink my life”.

That sounds very close to the meta-rule of being only allowed to change the rule to a more precise rule. So you have the rule of not eating cookies and come across a very special cookie. Making an exception opens the door to arbitrary exceptions. But what about changing the rule to allow only cookies that you have never eaten before? That is clearly a rule that allows this special cookie and also future special cookies, satisfying the culinary curiosity without noticeably impacting the calories.

Making a rule for yourself is like making a rule for someone else, only that someone else is future you. If future you breaks your rules, just like if someone else breaks your rules (e.g. "don't cheat on me"), that weakens the trust between the two of you, and you'll find it harder to cooperate in the future. And it's not much fun not being able to cooperate between your past and future selves: that means you can't make plans expecting future you to follow through on them.

("But darling, clearly the optimal strategy was to cheat this one time and then be faithful the rest of the time...")

And on the flip side of that, if past-you makes the rules so strict that they're practically unfollowable for future-you, that also weakens the trust between the two of you, because future-you will perceive past-you as tyrannical and be less inclined to trust that the rule was actually a good idea in the first place (and therefore be less likely to follow the rules it perceives as not-so-good). So past-you has to earn the trust of future-you by making rules it expects future-you to be able to follow, and future-you has to earn the trust of past-you by following the rules past-you makes.

Yes, exactly. You can get a lot of mileage out of importing intuitions from how to build trust between people to how to build trust between your past and future selves. (For example there's an interesting analogy between labels in romantic relationships and personal identity.)

That's a fascinating way to think about it.

There is a wider picture to consider. If you find yourself faced with a decision whether or not to break a rule or commitment, to yourself or to someone else, it is worth asking how you got into that situation. These things usually do not come out of the blue.

If you committed to be at a certain place at a certain time, and find yourself running late, what happened earlier to make you late? Did you already, unconsciously, drop out on that commitment then? Did you waste time doing things that did not have to be handled at that moment? And if so, how did that happen? And so on.

And if you do find something unsatisfactory about your earlier actions, was that an isolated occurrence, or something you find yourself habitually doing?

Part of the power of committing to a rule, whether it is about a big matter or a small one, is that it gives you an opportunity to practice that exploration every time you bump up against it.

I suppose I've learned to have good enough knowledge of myself to often feel "I currently want to break a rule because I feel like I have a good enough excuse. If I make an exception to the rule in this situation, how many other times will I feel like I have an equivalently good excuse? If it's so often that the rule will cease to be useful, then I can't do it now (alternatively, I will update that it's a silly rule I shouldn't have wanted to follow in the first place). However, if it's legitimately a very rare reason, then that's acceptable."

This is the standard Functional Decision Theory approach, and I've found it useful in dieting.

There was a LW post on this by Scott that I found useful.

I like this as another take on the standard LessWrong question of when to make exceptions to your habits/rules (Scott, Zvi). Deriving important ideas for yourself is super valuable and I like this new way of framing it. Also this is short and readable - for these reasons, I've curated this post.

Another option is to go ahead and break the rule, without changing the rule or calling this an exception.

Somewhat related to that: how do you handle noticing after the fact that you broke a rule, when you didn't notice at the time?

I formed a similar argument around vegetarianism; I predicted that it is easier for me to draw a hard line than it is to reconsider that line on a case-by-case basis. Rational me is more than capable of distinguishing between lobster and cow, but there is a lot of power in being able to tell myself to just eat the things with the label.

This is an extreme overapproximation but, given the moral stakes and my general unreliability, the successful results seem sufficient justification.

given the moral stakes and my general unreliability

This is a very strange thing to say. I have never felt that I can't rely on myself not to violate my own important moral rules. Perhaps if I was an addict, but even then, I expect that having violated my moral rules I would realize that those rules weren't really important to me.

I am a very strong satisficer, in direct conflict with my moral system, which would rather I maximise, so I live under the general understanding that I'm very far from my ideal.

Suppose you make a general rule, e.g. "I won't eat any cookies". Then you encounter a situation that legitimately feels exceptional

This probably means that you're bad at making rules, i.e. that the rule you made was based on an oversimplified model of the world.