(Note: This post touches on hot-button issues more than I'd like. I'll probably come back later and edit it to avoid these.)

If you ever end up moderating a forum, or, like me, just become deeply involved in the meta section of one, it is almost inevitable that you will be drawn into disputes over exactly what one is and is not allowed to say on it. These choices can be very difficult because, more often than not, there are no good options. You typically have two choices:

  • Libertarian approach to moderation

In this approach, you refuse to get involved in particular kinds of disputes. The standard example is that you may refuse to moderate posts based on political opinions so long as the poster is polite and sticks to all the other rules. Or you may decide not to moderate insults and flaming, so long as no-one engages in doxxing. This approach has a major downside: if you aren't moderating based on politics, you'll have to let through posts from white supremacists so long as they are polite, and if you aren't moderating insults, you'll get a bunch of jerks using the forum. I'm not saying that this approach is flawed, just that this is what is likely to happen.

  • Non-libertarian approach to moderation

In this approach, you concede a need to at least occasionally intervene in a particular kind of dispute, such as banning the white supremacists or deleting comments that contain racial slurs. It is really, really good to be able to do this. On the other hand, once you intervene in a particular manner, you create the expectation of intervention. Suppose you've banned the white supremacists, but since people expect you to be consistent you end up extending the ban to other hateful ideologies too. Quite quickly, you'll find yourself being forced to render a verdict on a whole host of ideologies, not just the clear-cut examples. Before you intervened you could refuse to pick sides in certain disputes, but once you've intervened, your decision not to intervene in another situation will be taken to mean that you don't consider a particular ideology hateful. If this is a contentious issue, then people will be unhappy no matter how you decide; often you would much prefer not to have to make a ruling on this issue. Again, I'm not saying that this approach is flawed, just that if you choose it, you will most likely run into this issue.

Broader principle

Even if you don't moderate a forum, you will see this dilemma crop up in many other situations.

  • Suppose you are in charge of a company that is deciding whether or not to have a policy on what its employees post on social media. If you don't have such a policy, they may say horrible things that are offensive and destroy your company's reputation. If you do have such a policy and you choose not to fire someone over something they post, it will be seen as you believing that it is not hateful. So you may actually end up with a worse reputation than if you didn't have a policy at all.
  • Suppose that you are on a government panel that is deciding the extent to which you should regulate the safety of toys. You might decide to take a minimal regulatory approach and only ban the most unsafe toys. This would allow you to protect a number of people from harm, but you've now also created the expectation that any toy not banned must be safe. You could try writing the opposite on your website or try spending some of your minimal marketing budget informing people, but realistically most people will make this assumption regardless of what you do. You might even end up increasing the total number of injuries by luring consumers into a false sense of security.
  • Suppose you write up a set of rules for your club. This makes it easier to ensure that everyone is aware of them. However, in the absence of any written rules the expectation is that you should use your common sense. When these rules are written down, people will start to assume that if something isn't in the rules, it must be allowed, particularly if it would have been easy to put it in the rules if they had wanted to. So you may actually find that more people end up doing the things that you don't want.
  • During Coronavirus, some governments have made restrictions like banning gatherings of 100 people or more. This has been taken by people as meaning that it's okay to run gatherings with 99 people.

This is scary. In many of these cases, it seems incredibly obvious that those in power should, even if they do nothing else, at least regulate the worst cases. But quite quickly we see that this isn't as obvious as it sounds.

The mechanism in the Moderator's Dilemma is that refusing to ban a particular action is seen as either implicitly endorsing it, or at least claiming that it is "not bad". If this isn't the case, then you don't run into the Moderator's Dilemma. For example, you may be able to avoid the Moderator's Dilemma if you have a "Historical Schelling Point". If you ban racial slurs, there may be disputes over what counts as a slur, but because there is precedent for treating these differently from other things that are offensive, you may be able to avoid having to adjudicate flamewars where no slurs occur.

Relationship to other terms

This is related to a few other terms:

  • Precedent is a term most associated with legal contexts. Courts try to keep the law consistent so that people can follow it, so once one court rules one way, other courts will tend to rule in a manner consistent with this. However, while Precedent is primarily about consistency and only secondarily about how the rules might later expand, the Moderator's Dilemma is primarily about expansion, but also more about being forced to issue a ruling than about the outcomes.
  • Slippery Slopes are an argument that is often claimed to be a logical fallacy, but which is only poor reasoning if you fail to justify why we are likely to inevitably progress along the slope. The Moderator's Dilemma is a kind of slippery slope, but while the focus of the Slippery Slope is the endpoint and how bad it is, the Moderator's Dilemma is more about being forced to issue a ruling when you don't want to.

Conclusion

I believe that Terminology is Important and so the purpose of this article was to describe a particular situation that I have seen in multiple contexts, but which I did not already have a name for. Group rationality is hard because you can't just look at the immediate effects, but all of the effects that might occur downstream. This is a small attempt to grapple with these problems by identifying one particularly common downstream pattern.

Note: Please try to avoid turning the comments section into a debate on the political examples included in this post. I tried to avoid discussing these topics more than necessary, but these were the clearest examples that I could think of.

17 comments

I think the bigger issue is trust. Do you trust the moderators to moderate fairly? Do you trust that when the moderators ban something for violating policy, their interpretation of the policy is correct? Do you trust that, when you disagree with the moderators, they know better than you?

If a mod team has the trust of the community, it can use whatever moderation form it likes. If it doesn't have that trust, then any form of moderation will be called out by some as being unfair. If this sounds like politics, that's because it is.

I don't believe the establishment or dissipation of this trust has a lot to do with the specific form of moderation. I think it has more to do (at least in this community) with the moderators being transparent and self-critical about moderation decisions. It also requires people who are willing to stand fast in the face of public criticism without ignoring it or high-handedly proclaiming executive fiat. Policy can help here, but it is no substitute for personality. Focus on choosing people with good judgement and the correct temperament, and the policy takes care of itself. Conversely, choose people with bad judgement or the wrong kind of temperament, and no amount of formal policy will save you.

Is your point that there should be no formal policy in the first place?

Once you do have such a formal policy, the best of judgment doesn't necessarily help you if you can't circumvent the constraints set by these policies.

This is a good perspective from off-site: https://www.shamusyoung.com/twentysidedtale/?p=19709

Excellent article btw:

"And then, in every sample, you’re likely to have a tiny minority of completely batshit crazy moron assholes... A lot of things can put someone into this last group. Maybe they’re performing for attention and don’t care how destructive they’re being. Maybe they have a bunch of pain in their lives and they’re trying to share it. Maybe they were raised in some messed-up abusive environment and aggressive hate is their normal. Maybe they just aren’t very good at communicating. It doesn’t matter. They’re broken, and as a moderator you don’t have the power to fix them. The problem is in their heart, and nothing you say or do can make them care about others.

Most communities are built around the idea of getting these people to behave. This is a mistake. Broken people cannot be fixed by rules. If you make the rules loose, they will find weak spots and exploit them. If you make the rules tight and specific, they will rules-lawyer you to the brink of insanity. They will haggle over the specifics of the rules, and they will insist everyone be held to precisely the same standards. If you let someone else slide, the nut will condemn you as a hypocrite or accuse you of injustice.

... Instead of making rules to compel crazies to behave – which can become a full-time enforcement project – I allow them to act out. And then I ban them. I want to know who the crazy people are, as fast as possible. The sooner they reveal their character, the sooner I can pull them out of the pool before they make a mess. This isn’t hard. Problem People are usually easy to spot.

Now, in the context of an open system like this blog, “banning” doesn’t mean much. People can change personal details and come in as someone new. But so what? If someone assumes a new identity, they still have to pass the sanity test. They still have to behave like a human being. And if a banned person assumes a new identity and then behaves in a civilized manner? That’s not a flaw in enforcement. That’s mission accomplished."

> When these rules are written down, people will start to assume that if something isn't in the rules, it must be allowed, particularly if it would have been easy to put it in the rules if they had wanted to. So more people may end up doing things that you don't want.

A related thing that can happen here is that the people who would otherwise do forbidden things, will now do the closest-possible permitted thing.

Like, everyone likes saying numbers between 0 and 1. Most of your users like numbers close to 0, like 0.1 and 0.03. A few like numbers close to 1, like 0.95. Basically no one says numbers between 0.3 and 0.7.

Team Low convinces you that high numbers are bad, and pushes you to ban people from saying them. You agree, and pick a fairly arbitrary cutoff, say 0.6. You figure it doesn't matter much exactly where the cutoff is, there's a whole range of numbers that are basically functionally identical because nobody uses them.

But now you've got a lot of people saying 0.6, and Team Low isn't very happy about that (0.6 is still pretty high), and Team High aren't very happy either (you've banned their favourite numbers).
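The clustering described above can be sketched in a few lines. This is only an illustration under a stated assumption: a user whose favourite number is banned says the closest permitted number instead. The function name `apply_cutoff` and the sample preferences are hypothetical.

```python
def apply_cutoff(preferred, cutoff):
    """Return what each user says once numbers above `cutoff` are banned.

    Assumption: a user whose favourite number is banned falls back to
    the closest permitted number, which is the cutoff itself.
    """
    return [min(p, cutoff) for p in preferred]


# Hypothetical preferences: Team Low sits near 0, Team High near 1,
# and nobody naturally says anything between 0.3 and 0.7.
preferred = [0.03, 0.1, 0.1, 0.95, 0.9, 0.99]

said = apply_cutoff(preferred, cutoff=0.6)

# Count how many users now sit exactly on the line.
at_cutoff = sum(1 for s in said if s == 0.6)

print(said)
print(at_cutoff)
```

Before the ban nobody said 0.6; afterwards every member of Team High does, so the "unused" region where the cutoff was placed is no longer unused.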

It's not just that any toy not banned will be assumed safe. It's that toys will suddenly start edging as close as possible to the safety limits. If you say "food weights must be within 3% of what's on the packaging", then they'll be 3% below.

LARPs (live-action roleplay) have a rule 7: "don't take the piss." This allows enforcers to punish people for that kind of "closest-possible permitted thing." (an extension of the "don't take the piss" rule is "if you have to ask if you're taking the piss, you are").

If we port this back to the moderator's dilemma, this would entail giving the moderator more power to use judgements or to be capricious (and an acceptance of that by the people posting). This gives great power to the moderators, and can easily be abused (that's roughly how Chinese government censorship works); but if the moderators are reasonably trustworthy for that particular forum, and if they are known to be capricious, this can work very well.

> If you say "food weights must be within 3% of what's on the packaging", then they'll be 3% below.

I would guess that setting this sort of regulation takes this into account, and the process for devising it must be accordingly more complicated.

The way the regulatory agency ought to get this number (note: I have no relevant background or experience, so this is all wild guessing) might look something like:

- Estimate the cost to the supplier of determining the weight of a package as a function of average accuracy. For example, it may cost $0.01 per package to determine the weight with a standard deviation of 1%, $0.04 for std dev of 0.5%, etc.

- Estimate the cost to the consumer of inaccurate package weights (which could just be linear, as in a package which is 3% underweight may be 3% undervalue, but for large inaccuracies and certain products - like medicine - can be different).

- Find some balance between costs to the supplier (taking into account, perhaps, that such costs may be different for large suppliers with enough throughput to easily absorb the fixed cost and small suppliers which will face a daunting up-front cost) and the consumer (taking into account, perhaps, that some consumers may be more sensitive than others to inaccuracies).

In such a case, the average inaccuracy wouldn't be 3%; if the supplier had the capability to calculate their weights so exactly that every package was no more or less than 3% underweight, the regulatory agency would presumably insist on 0% inaccuracy. Instead, the supplier would pick an inaccuracy between 0% and 3% such that they would be certain that enough of their packages would be less than 3% inaccurate.

Any supplier which did not do so wisely might get away with it for a time, but would eventually put out a sub-par product and presumably be penalized for it. Even a self-interested person will build a margin of error to ensure that they are never found to be noncompliant. These margins may be different for different suppliers - if the agency makes regulations with the intent of keeping barriers to entry low, it will allow a wider margin of error so as not to exclude potential suppliers without affordable access to super-precise measurement technology, allowing suppliers with it to choose an inaccuracy which is allowed by regulation but still safe for them. While more difficult, a remedy to this issue may be to additionally regulate on intent (e.g. an internal email saying "let's aim for 2% inaccuracy, we can get away with it" would be evidence of wrongdoing) or known ability (e.g. holding suppliers with better measurement capability to a higher standard).
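The margin of error described above can be made concrete with a rough sketch. It is not how any real regulator or supplier works: it assumes fill errors are normally distributed around the supplier's target, and the function name `max_safe_underfill` and the example numbers are hypothetical.

```python
from statistics import NormalDist


def max_safe_underfill(limit_pct, sigma_pct, max_violation_rate):
    """Largest average underfill (in %) a supplier can aim for.

    Assumption: each package's underfill is normally distributed around
    the target with standard deviation `sigma_pct`. The target is chosen
    so that at most `max_violation_rate` of packages exceed `limit_pct`.
    """
    if not 0.0 < max_violation_rate < 1.0:
        raise ValueError("max_violation_rate must be strictly between 0 and 1")
    # Number of standard deviations needed between target and limit.
    z = NormalDist().inv_cdf(1.0 - max_violation_rate)
    # A supplier with poor measurement may have no safe underfill at all.
    return max(0.0, limit_pct - z * sigma_pct)


precise = max_safe_underfill(limit_pct=3.0, sigma_pct=0.5, max_violation_rate=0.001)
imprecise = max_safe_underfill(limit_pct=3.0, sigma_pct=1.0, max_violation_rate=0.001)
print(round(precise, 2), imprecise)
```

Under these assumptions, a supplier with a 0.5% standard deviation can aim for an underfill of roughly 1.45%, while one with a 1% standard deviation cannot safely underfill at all. That matches the point that better measurement lets a supplier sit closer to the line.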

To return to forum posting: rational trolls will not break every unwritten rule while perfectly respecting all of the written ones, because they likely can't be precise enough to do so consistently. Irrational trolls will try, fail, and be punished for it. Some trolls are better at staying close to the line without crossing it than others; moderators may want to hold these folks to higher standards or simply add a provision to punish those they believe to be acting within the rules but without good faith.

This is a good comment. I wonder how much regulator agencies do do that kind of thinking. It's the sort of "here's an interesting problem someone had, and here's the interesting solution" story that I'd be interested in reading about, and would weakly expect to reach high visibility in my social circles if someone wrote an article about it. So I guess this is weak evidence that they don't do that? But I'm certainly not confident one way or the other, and I'm interested in learning more.

You're right about the forum trolls; the 0.6 line is likely to cull some of them, and permitting an amount of arbitrariness in moderation will help cull others. I think the thing I'm trying to point at is still a relevant thing to keep in mind, but it's not as clear-cut as I described.

>I wonder how much regulator agencies do do that kind of thinking.

They might do (civil services are generally more careful than you'd expect), or it may emerge as they adjust their rules in response to what companies do.

I find this easier to parse from a non-neutral perspective: If all bad comments are (currently) overtly bad, you might think we could ban overt bad comments and win at moderation. But in fact, once the ban is in effect, the bad commenters might switch to covert bad comments instead.

The ban isn't necessarily wrong, but this effect has to be considered in the cost-benefit analysis.

>If you say "food weights must be within 3% of what's on the packaging", then they'll be 3% below.

How about "within 3%, and with an average no lower than the printed weight"? That seems like it would work for a lot of different levels of precision on the part of the company.
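A minimal sketch of that two-part rule, to show why it closes the "always 3% below" loophole. The function name `batch_complies` and the sample batches are hypothetical, and the check assumes the rule is applied to a batch of measured weights.

```python
def batch_complies(weights, printed_weight, tolerance=0.03):
    """Check a batch against both parts of the proposed rule.

    1. Every package is within `tolerance` of the printed weight.
    2. The batch average is no lower than the printed weight.
    """
    if not weights:
        raise ValueError("need at least one measured weight")
    each_within = all(
        abs(w - printed_weight) / printed_weight <= tolerance for w in weights
    )
    average_ok = sum(weights) / len(weights) >= printed_weight
    return each_within and average_ok


# Every package exactly 3% under: passes the per-package rule alone,
# but fails once the average is also checked.
always_low = batch_complies([97.0, 97.0, 97.0], printed_weight=100.0)

# Honest scatter around the printed weight passes both parts.
honest = batch_complies([97.0, 103.0, 100.0], printed_weight=100.0)

print(always_low, honest)
```

The per-package tolerance still forgives measurement noise, while the average condition removes the incentive to aim for the bottom of the allowed range.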

That's the correct solution for food weights, but this is sort of beyond philh's point, which is just that those you govern will adapt their behavior to the rules you put in place.

That may be enough: https://xkcd.com/810/

In US law*, restrictions on speech can be content-based (e.g. banning white supremacist content, even if it's polite) or content-neutral (e.g. banning insults by anyone). I think it maps rather well onto what you're describing and is a better dichotomy than libertarian vs. non-libertarian.

* Source: https://lawshelf.com/courseware/entry/limitations-on-expression

> In this approach, you concede a need to at least occasionally intervene in a particular kind of dispute such as banning the white supremacists

Another problem with this is what does one mean by "white supremacists"? The definition used by the people who most advocate banning them tends to include anyone who believes in certain statements about the differences between races that are almost certainly true. For example, how race correlates with IQ. This is especially a problem for a forum that wants to be "rational".

Yeah. Two decisions must be made: Do I want to ban ideology X? Is comment Y an example of ideology X?

So even if you could somehow reach a consensus about which ideologies are banned and which are not, users will continue their wars by arguing whether a given comment is an example of (or a "dog whistle" for) one of the banned ideologies. If you interpret it too literally, everything you banned will come back, only the users will avoid using certain keywords, sometimes openly mocking the moderation system. Interpreting it otherwise will force you to express opinions on a thousand topics, and any decision will seem biased in favor of some side.

Sorry for the downvote (because I'm guessing you weren't trying to needlessly debate the specific example you mention), but it seems pretty obvious that any 'formal' policy (i.e. 'laws') will create an opportunity for 'lawyering', e.g. arguing about the definition of specific formal terms.