Open thread, 23-29 June 2014

Previous open thread

 

If it's worth saying, but not worth its own post (even in Discussion), then it goes here.


Notes for future OT posters:

1. Please add the 'open_thread' tag.

2. Check if there is an active Open Thread before posting a new one.

3. Open Threads should be posted in Discussion, and not Main.

4. Open Threads should start on Monday, and end on Sunday.

Comments

EDIT: I've removed this draft & posted a longer version incorporating some of the feedback here at http://lesswrong.com/lw/khd/confound_it_correlation_is_usually_not_causation/

I would prefer posts like that to stand on their own in discussion and not be posted in an open thread.

Hi gwern, it's awesome you are grappling with these issues. Here are some rambling responses.


You might enjoy Sander Greenland's essay here:

http://bayes.cs.ucla.edu/TRIBUTE/festschrift-complete.pdf

Sander can be pretty bleak!


But does the number of causal relationships go up just as fast? I don't think so (although at the moment I can't prove it).

I am not sure exactly what you mean, but I can think of a formalization where this is not hard to show. We say A "structurally causes" B in a DAG G if and only if there is a directed path from A to B in G. We say A is "structurally dependent" with B in a DAG G if and only if there is a marginal d-connecting path from A to B in G.

A marginal d-connecting path between two nodes is a path with no consecutive edges of the form -> v <- (that is, no colliders on the path). In other words, all directed paths are marginal d-connecting, but the opposite isn't true.

The justification for this definition is that if A "structurally causes" B in a DAG G, then if we were to intervene on A, we would observe B change (but not vice versa) in "most" distributions that arise from causal structures consistent with G. Similarly, if A and B are "structurally dependent" in a DAG G, then in "most" distributions consistent with G, A and B would be marginally dependent (e.g. what you probably mean when you say 'correlations are there').

I qualify with "most" because we cannot simultaneously represent dependences and independences by a graph, so we have to choose. People have chosen to represent independences. That is, if in a DAG G some arrow is missing, then in any distribution (causal structure) consistent with G, there is some sort of independence (missing effect). But if the arrow is not missing we cannot say anything. Maybe there is dependence, maybe there is independence. An arrow may be present in G, and there may still be independence in a distribution consistent with G. We call such distributions "unfaithful" to G. If we pick distributions consistent with G randomly, we are unlikely to hit on unfaithful ones (the subset of all distributions consistent with G that is unfaithful to G has measure zero), but Nature does not pick randomly, so unfaithful distributions are a worry. They may arise for systematic reasons (maybe as the equilibrium of a feedback process in biology?).

If you accept above definition, then clearly for a DAG with n vertices, the number of pairwise structural dependence relationships is an upper bound on the number of pairwise structural causal relationships. I am not aware of anyone having worked out the exact combinatorics here, but it's clear there are many many more paths for structural dependence than paths for structural causality.
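The gap between the two counts can be checked by brute force on a toy graph. The sketch below (the DAG is a made-up example, not from the comment) counts unordered pairs with a directed path between them ("structural causation") versus pairs joined by a collider-free path ("structural dependence"):

```python
from itertools import combinations

# A small example DAG (hypothetical, for illustration), as node -> children.
dag = {
    "A": ["B"],
    "H": ["B", "C"],   # H plays the role of a common cause
    "D": ["C"],
    "B": ["E"],
    "C": [],
    "E": [],
}

parents = {v: [] for v in dag}
for p, children in dag.items():
    for c in children:
        parents[c].append(p)

def has_directed_path(src, dst):
    """Structural causation: a directed path src -> ... -> dst."""
    stack, seen = list(dag[src]), set()
    while stack:
        v = stack.pop()
        if v == dst:
            return True
        if v not in seen:
            seen.add(v)
            stack.extend(dag[v])
    return False

def marginally_d_connected(src, dst):
    """Structural dependence: a path with no collider (-> v <-) on it.
    Walk the skeleton, tracking whether we arrived via an arrow *into*
    the current node; if so, we may not leave against another arrow,
    since that would make the node a collider."""
    start = [(c, True) for c in dag[src]] + [(p, False) for p in parents[src]]
    stack, seen = list(start), set(start)
    while stack:
        v, arrived_into = stack.pop()
        if v == dst:
            return True
        nxt = [(c, True) for c in dag[v]]             # leave along v -> c: always fine
        if not arrived_into:                          # v cannot be a collider
            nxt += [(p, False) for p in parents[v]]   # leave against p -> v
        for s in nxt:
            if s not in seen:
                seen.add(s)
                stack.append(s)
    return False

nodes = list(dag)
causal = sum(has_directed_path(a, b) or has_directed_path(b, a)
             for a, b in combinations(nodes, 2))
dependent = sum(marginally_d_connected(a, b) for a, b in combinations(nodes, 2))
print(causal, dependent)  # every causal pair is dependent, but not vice versa
```

On this six-node graph the counts come out 7 causal pairs versus 9 dependent pairs; the gap widens quickly as graphs grow denser, in line with the combinatorial point above.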


But what you actually want is not a DAG with n vertices, but another type of graph with n vertices. The "Universe DAG" has a lot of vertices, but what we actually observe is a very small subset of these vertices, and we marginalize over the rest. The trouble is, if you start with a distribution that is consistent with a DAG, and you marginalize over some things, you may end up with a distribution that isn't well represented by a DAG. Or "DAG models aren't closed under marginalization."

That is, if our DAG is A -> B <- H -> C <- D, and we marginalize over H because we do not observe H, what we get is a distribution where no DAG can properly represent all conditional independences. We need another kind of graph.
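A quick simulation makes the marginalization point concrete. Below is a linear-Gaussian version of the DAG A -> B <- H -> C <- D (the 0.8 coefficients are illustrative assumptions); dropping H from the data leaves B and C clearly correlated, even though neither causes the other:

```python
import random

random.seed(0)

# Linear-Gaussian simulation of A -> B <- H -> C <- D; we then
# "marginalize over H" simply by discarding it from each record.
def simulate(n=20000):
    rows = []
    for _ in range(n):
        a = random.gauss(0, 1)
        h = random.gauss(0, 1)          # the hidden common cause
        d = random.gauss(0, 1)
        b = 0.8 * a + 0.8 * h + random.gauss(0, 1)
        c = 0.8 * h + 0.8 * d + random.gauss(0, 1)
        rows.append((b, c))             # H (and A, D) never reach the analyst
    return rows

def corr(pairs):
    """Sample Pearson correlation of a list of (x, y) pairs."""
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    syy = sum((y - my) ** 2 for _, y in pairs)
    return sxy / (sxx * syy) ** 0.5

r = corr(simulate())
print(round(r, 2))  # noticeably nonzero: B and C correlate with no causal link
```

With these coefficients the population correlation works out to 0.64/2.28 ≈ 0.28, which the sample estimate recovers.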

In fact, people have come up with a mixed graph (containing -> arrows and <-> arrows) to represent margins of DAGs. Here -> means the same as in a causal DAG, but <-> means "there is some sort of common cause/confounder that we don't want to explicitly write down." Note: <-> is not a correlative arrow, it is still encoding something causal (the presence of a hidden common cause or causes). I am being loose here -- in fact it is the absence of arrows that means things, not the presence.

I do a lot of work on these kinds of graphs, because these graphs are the sensible representation of data we typically get -- drawn from a marginal of a joint distribution consistent with a big unknown DAG.

But the combinatorics work out the same in these graphs -- the number of marginal d-connected paths is much bigger than the number of directed paths. This is probably the source of your intuition. Of course what often happens is you do have a (weak) causal link between A and B, but a much stronger non-causal link between A and B through an unobserved common parent. So the causal link is hard to find without "tricks."


The dependence that arises from a conditioned common effect (simplest case A -> [C] <- B) that people have brought up does arise in practice, usually when your samples aren't independent. Typical case: phone surveys are only administered to people with phones. Or case-control studies for rare diseases, which need to gather one arm from people who are actually already sick (called "outcome-dependent sampling").
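This selection effect is easy to demonstrate by simulation. In the toy model below, A and B are independent coin flips, and a unit only enters the sample if A or B holds (a crude stand-in for "only the already-sick get sampled"); within the sample, B suddenly predicts A:

```python
import random

random.seed(1)

# A and B are independent coin flips; selection keeps a unit only if
# A or B holds (Berkson-style selection on a common effect).
pop = [(random.random() < 0.5, random.random() < 0.5) for _ in range(100000)]
sample = [(a, b) for a, b in pop if a or b]

def p_a_given(data, b_value):
    """Estimate P(A = True | B = b_value) from the data."""
    rows = [a for a, b in data if b == b_value]
    return sum(rows) / len(rows)

# Whole population: knowing B tells you nothing about A (both near 0.5).
print(round(p_a_given(pop, True), 2), round(p_a_given(pop, False), 2))
# After selection on the common effect, A and B become strongly dependent:
# if B failed, A must have held for the unit to be sampled at all.
print(round(p_a_given(sample, True), 2), round(p_a_given(sample, False), 2))
```

In the full population both conditional probabilities sit near 0.5; in the selected sample, P(A | not B) jumps to exactly 1, purely as an artifact of how the data were gathered.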


Sterner measures might be needed: could we draw causal nets with not just arrows showing influence but also another kind of arrow showing correlations?

Phil Dawid works with DAG models that are partially causal and partially statistical. But I think we should first be very very clear on exactly what a statistical DAG model is, and what a causal DAG model is, and how they are different. Then we could start combining without confusion!


If you have a prior over DAG/mixed graph structures because you are Bayesian, you can obviously have beliefs about a causal relationship between A and B vs a dependent relationship between A and B, and update your beliefs based on evidence, etc. Bayesian reasoning about causality does involve saying at some point "I have an assumption that is letting me draw causal conclusions from a fact I observed about a joint distribution," which is not a trivial step (this is not unique to Bayesians, of course -- anyone who wants to do causality from observational data has to deal with this).


what's the psychology of this?

Pearl has this hypothesis that a lot of probabilistic fallacies/paradoxes/biases are due to the fact that causal and not probabilistic relationships are what our brain natively thinks about. So e.g. Simpson's paradox is surprising because we intuitively think of a conditional distribution (where conditioning can change anything!) as a kind of "interventional distribution" (no Simpson's type reversal under interventions: http://ftp.cs.ucla.edu/pub/stat_ser/r414.pdf).

This hypothesis would claim that people who haven't looked into the math just interpret statements about conditional probabilities as about "interventional probabilities" (or whatever their intuitive analogue of a causal thing is).
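The intuition behind Pearl's hypothesis is easiest to see in the standard kidney-stone numbers (Charig et al. 1986), where conditioning and intervening come apart. The sketch below verifies the reversal:

```python
# Classic kidney-stone data: (recovered, treated) per treatment and stone size.
# Treatment A wins within each stratum yet loses overall, because A was
# given mostly to the harder (large-stone) cases.
data = {
    ("A", "small"): (81, 87),
    ("A", "large"): (192, 263),
    ("B", "small"): (234, 270),
    ("B", "large"): (55, 80),
}

for size in ("small", "large"):
    rate_a = data[("A", size)][0] / data[("A", size)][1]
    rate_b = data[("B", size)][0] / data[("B", size)][1]
    assert rate_a > rate_b          # A wins in every stratum...

overall = {}
for t in ("A", "B"):
    recovered = sum(data[(t, s)][0] for s in ("small", "large"))
    treated = sum(data[(t, s)][1] for s in ("small", "large"))
    overall[t] = recovered / treated

assert overall["B"] > overall["A"]  # ...but B wins in aggregate
print(round(overall["A"], 2), round(overall["B"], 2))  # 0.78 0.83
```

The conditional distributions reverse; an interventional comparison (randomizing treatment so stone size cannot steer assignment) would not. If our brains natively read the conditional table as interventional, the reversal looks paradoxical.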

Good comment - upvoted. Just a minor question:

I qualify with "most" because we cannot simultaneously represent dependences and independences by a graph, so we have to choose. People have chosen to represent independences. That is, if in a DAG G some arrow is missing, then in any distribution (causal structure) consistent with G, there is some sort of independence (missing effect).

You probably did not intend to imply that this was an arbitrary choice, but it would still be interesting to hear your thoughts on it. It seems to me that the choice to represent independences by missing arrows was necessary. If they had instead chosen to represent dependences by present arrows, I don't see how the graphs would be useful for causal inference.

If missing arrows represent independences and the backdoor criterion holds, this is interpreted as "for all distributions that are consistent with the model, there is no confounding". This is clearly very useful. If arrows represented dependences, it would instead be interpreted as "For at least one distribution that is consistent with the DAG model, there is no confounding". This is not useful to the investigator.

Since unconfoundedness is an independence-relation, it is not clear to me how graphs that encode dependence-relations would be useful. Can you think of a graphical criterion for unconfoundedness in dependence graphs? Or would dependence graphs be useful for a different purpose?

Hi, thanks for this. I agree that this choice was not arbitrary at all!

There are a few related reasons why it was made.

(a) Pearl wisely noted that it is independences that we exploit for things like propagating beliefs around a sparse graph in polynomial time. When he was still arguing for the use of probability in AI, people in AI were still not fully on board, because they thought that to probabilistically reason about n binary variables we need a 2^n table for the joint, which is a non-starter (of course statisticians were on board w/ probability for hundreds of years even though they didn't have computers -- their solution was to use clever parametric models. In some sense Bayesian networks are just another kind of clever parametric model that finally penetrated the AI culture in the late 80s).

(b) We can define statistical (causal) models by either independences or dependences, but there is a lack of symmetry here that the symmetry of the "presence or absence of edges in a graph" masks. An independence is about a small part of the parameter space. That is, a model defined by an independence will generally correspond to a manifold of smaller dimension that sits in a space corresponding to a saturated model (no constraints). A model defined by dependences will just be that same space with a "small part" missing. Lowering dimension in a model is really nice in stats for a number of reasons.

(c) While conceivably we might be interested in a presence of a causal effect more than an absence of a causal effect, you are absolutely right that generally assumptions that allow us to equate a causal effect with some functional of observed data take the form of equality constraints (e.g. "independences in something.") So it is much more useful to represent that even if we care about the presence of an effect at the end of the day. We can just see how far from null the final effect number is -- we don't need a graphical representation. However a graphical representation for assumptions we are exploiting to get the effect as a functional of observed data is very handy -- this is what eventually led Jin Tian to his awesome identification algorithm on graphs.

(d) There is an interesting logical structure to conditional independence, e.g. Phil Dawid's graphoid axioms. There is something like that for dependences (Armstrong's axioms for functional dependence in db theory?) but the structure isn't as rich.

edit: there are actually only two semi-graphoid axioms: one for symmetry and one for the chain rule.

edit^2: graphoids are not complete (because conditional independence is actually kind of a nasty relation). But at least it's a ternary relation. There are far worse dragons in the cave of "equality constraints."

This post is a good example of why LW is dying. Specifically, that it was posted as a comment to a garbage-collector thread in the second-class area. Something is horribly wrong with the selection mechanism for what gets on the front page.

Underconfidence is a sin. This is specifically about gwern's calibration. (EDIT: or his preferences)

Not everyone is in the same situation. I mean, we recently had an article disproving the theory of relativity posted in Main (later moved to Discussion). Texts with less value than gwern's comment do regularly get posted as articles. So it's not like everyone is afraid to post anything. Some people should update towards posting articles, some people should update towards trying their ideas in Open Thread first. Maybe they need a little nudge from outside first.

How about adding this text to the Open Thread introduction?

As a rule of thumb, if your top-level comment here received 15 or more karma, you probably should repost it as a separate article -- either in the same form, or updated. (And probably post texts of similar quality directly as articles in the future.)

And a more courageous idea: Make a script which collects all top-level Open Thread articles with 15 and more karma from all Open Threads in history and sends their authors a message (one message per author with links to all such comments, to prevent inbox flood) that they should consider posting this as an article.

As a rule of thumb, if your top-level comment here received 15 or more karma, you probably should repost it as a separate article -- either in the same form, or updated. (And probably post texts of similar quality directly as articles in the future.)

More generally, I'd say that if it's longer than about four paragraphs, it's probably better suited as its own article than as a comment.

Let's talk about the fact that the top two comments on a very nice contribution in the open thread are about how this is the wrong place for the post, or how it is why LW is dying. Actually, let's not talk about that.

A simple informal reason for giving a small probability to "A causes B" could be this:

For any fixed pair of variables A and B, there is only one explanation "A causes B", one explanation "B causes A", and many explanations "C causes both A and B" (one for each candidate C). If we split 100% between all these explanations, the last group gets most of the probability mass.

And as you said, the more complex a given field is, the more realistic candidates for C there are.
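The counting argument above can be made explicit with a toy prior. Under the (strong, purely illustrative) assumption of a uniform prior over the competing explanations, the probability of "A causes B" shrinks as the field supplies more candidate confounders:

```python
# Toy version of the counting argument: a uniform prior over "A -> B",
# "B -> A", and one confounding story "C -> A, C -> B" per candidate C.
def p_a_causes_b(num_candidate_confounders):
    return 1 / (2 + num_candidate_confounders)

for k in (0, 10, 100):
    print(k, round(p_a_causes_b(k), 3))  # 0.5, then 0.083, then 0.01
```

A real prior would of course weight explanations unevenly, but the qualitative point survives: with many plausible Cs, most prior mass lands on confounding.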

I think that deserves its own post.

It would be interesting to try to come up with good priors for random causal networks.

You are leaving out one type of causation. It is possible that you are conditioning on some common effect C of A and B.

You may argue that this does not actually give a correlation between A and B, it only gives a correlation between A given C and B given C. However, in real life there will always be things you condition on whenever you collect data, so you cannot completely remove this possibility.

Regarding the psychology of why people overestimate the correlation-causation link, I was just recently reading this, and something vaguely relevant struck my eye:

Later, Johnson-Laird put forward the theory that individuals reason by carrying out three fundamental steps [21]:

  1. They imagine a state of affairs in which the premises are true – i.e. they construct a mental model of them.

  2. They formulate, if possible, an informative conclusion true in the model.

  3. They check for an alternative model of the premises in which the putative conclusion is false.

If there is no such model, then the conclusion is a valid inference from the premises.

Johnson-Laird and Steedman implemented the theory in a computer program that made deductions from singly-quantified assertions, and its predictions about the relative difficulty of such problems were strikingly confirmed: the greater the number of models that have to be constructed in order to draw the correct conclusion, the harder the task [25]. Johnson-Laird concluded [22] that comprehension is a process of constructing a mental model, and set out his theory in an influential book [23]. Since then he has applied the idea to reasoning about Boolean circuitry [3] and to reasoning in modal logic [24].

I hate cauliflower! But a few days ago a restaurant gave me some pickled cauliflower with my salmon dinner. Not recognizing the cauliflower for what it was, I tried and greatly enjoyed the vegetable. I liked it so much that I asked my waitress what the food was so I could be sure to get more of it in the future. Alas, as soon as the waitress told me that the food was cauliflower, I experienced a horrible nauseating after-taste from having consumed the cauliflower, reinforcing my disgust of this vile weed.

A session I'm planning on organising for one of the London meetups in August: filling procedural knowledge gaps.

A motivating example is setting up investment in an index fund. At our last meetup, there was some division in the group between people who'd already done this and found it very straightforward, vs. others who'd started looking into it but found it prohibitively difficult. The blame was squarely placed on procedural knowledge gaps, and the proposed solution was someone assembling a 15-minute talk on index funds, along with a step-by-step guide to explain all the actions required to get from not having an index-fund product to having one.

A few other examples and suggestions were proposed, but I'd like to invite suggestions from the LW-readership. Is there anything Less Wrong has convinced you it's a good idea to do, but for which you don't know the next actionable step in doing?

Some areas that I have gaps in that I would like to fill with actionable steps:

  • Networking

  • Finding a reputable lawyer (for just in case)

  • Signing up for cryonics

  • Index funds is a great one

the proposed solution was someone assembling a 15-minute talk on index funds, along with a step-by-step guide to explain all the actions required to get from not having an index-fund product to having one.

Oh, please make it also a text. With pictures (activating near mode thinking).

Alternatively, a LW wiki page. (Then it's okay without pictures.)

Not unexpectedly, the mysterious mass-downvoter was lying low during the recent discussion of the issue, but is back at it now.

Apologies for political content. It shouldn't hurt too much.

Many of you are probably familiar with Upworthy, which uses a system of A/B testing to find the most link-baity headlines for left-wing/progressive social media content before inflicting them on the world. My typical reaction to such content was "this would seem amazingly insightful if you hadn't thought about the issue for ten minutes already". Last year, shortly before blocking all such posts on my Facebook feed, I remember wondering what the right-wing equivalent would be.

I've recently become aware of Britain First, a British nationalist political group. They have a Facebook page which is the result of a deliberate, professional social media campaign, luring people in with schmaltzy motivational images and kitten pictures before sprinkling on a bit of "send the darkies back". It's not on the same level of technical sophistication or scale as Upworthy, but neither is its intended audience (c.f. my mum and dad).

This makes me wonder, where is the clever social media presence of all the sane stuff? There's a recognisable cluster of "actually, I think you'll find it's a bit more complicated than that" to which Less Wrong belongs, but I rarely see it being endorsed in the same way as either of the two above examples. Is it just too difficult to spin a populist narrative for this sort of material? Is there an underlying cooperation problem? Or is it that there hasn't been a concerted effort yet?

tl;dr: can we raise the sanity waterline with clever use of social media?

Have you heard the slogan, "The truth is too complicated to fit on a bumper sticker"? I'd wholeheartedly endorse that if its brevity didn't make me suspicious.

Some truths, like the sunk cost fallacy or the value of sensible communication, might be simple enough to fit in a brief funny video.

Firstly, it is an outrageous slur to think that the right-wing equivalent of Upworthy is the BNP. The right-wing equivalent is, of course, the Daily Mail, which has so mastered the art of click-baitery as to become the most read newspaper in the world (as of 2013 - may no longer be true).

Secondly, the social media cluster to which Less Wrong belongs is, obviously, Salon, Slate, and works of that ilk. Yes, Caliban, it's true.

And no, you cannot raise the sanity waterline with social media. All you will get is (as the joke goes) people enthusiastically retweeting a study that finally proves what they'd always believed about confirmation bias.

Secondly, the social media cluster to which Less Wrong belongs is, obviously, Salon, Slate, and works of that ilk. Yes, Caliban, it's true.

This seems obviously untrue. Salon and Slate don't seem to have any intellectual or philosophical content other than "left good, right bad." Salon/Slate are also borderline Buzzfeed clones at this point. Laughably bad.

Does anyone else experience the following problem:

Something reminds you of an event which happened long ago; the event was annoying or created some other negative emotion; and you again feel that annoyance or negative emotion. I get these "annoyance flashbacks" now and then and it seems like they are more frequent now that I calorie restrict.

Any good ideas for dealing with this?

Critical thinking exercise for anyone who wants to try it! I wasted an evening on this, so I thought I'd share it... Here is a Wikipedia article: https://en.wikipedia.org/wiki/Bicycle_face It looks pretty good, well-cited, and is about something amusing. However, it needs to be burned with fire; why?

(Please avoid looking at the article history, and respond in ROT13. If you saw any of the #lesswrong conversation on this article, please refrain from commenting or giving hints.)

V sbyybjrq n srj bs gur fbheprf naq vg frrzrq gb or n fznyy zvabevgl bs ohgguheg qbpgbef engure guna gur "zrqvpny rfgnoyvfuzrag". Vf gung gur chapuyvar?

Pybfr. Vg'f npghnyyl jbefr guna gung: sbe 'fznyy zvabevgl' ernq 'dhvgr cbffvoyl n fvatyr npghny qbpgbe jub qvqa'g rira pner gung zhpu nobhg vg', naq sbe 'frkvfg ercerffvba' gel 'vg jnfa'g traqrerq hagvy srzvavfgf n praghel yngre qrpvqrq gb qb fbzr zlgu-znxvat'. Gur ragver negvpyr vf fubg guebhtu jvgu OF hafhccbegrq ol gur cevznel fbheprf. Sbe rknzcyr, gurer'f mreb rivqrapr ovplpyr snpr qvq nalguvat gb qvfpbhentr jbzra sebz ovxvat, gubhtu gur negvpyr cebpynvzf vg unq n znwbe rssrpg. Vg ybbxf yvxr gur pbaprcg jnf pbasvarq gb bar be gjb bss-unaq zragvbaf naq fbzr arjfcncre tbffvc/pyvccvat pbyhzaf. Jvxvcrqvn jbegul? Jryy, 'jub jubz'...

EDIT: Sbe n zber qrgnvyrq cbvag-ol-cbvag pevgvdhr, frr zl pbzzrag va uggcf://cyhf.tbbtyr.pbz/h/0/103530621949492999968/cbfgf/vK58f8UkL5x

Recently, I started reading The Sound and the Fury by William Faulkner. I am ~50 pages in and I don't understand any of it. The stream of consciousness narrative is infuriatingly hard to read, and the storyline jumps across decades without any warning. What are some techniques I can use to improve my comprehension of the book?

What are you trying to optimize for? Are you sure the experience you're having now isn't the whole point of the thing?

I've begun researching cryonics to see if I can afford it/want to sign up. Since I know plenty here are already signed up, I was hoping someone could link me to a succinct breakdown of the costs involved. I've already looked over Alcor's webpage and the Cryonics Institute, but I'd like to hear from a neutral party. Membership dues and fees, average insurance costs (average since this would change from person to person), even peripheral things like lawyer fees (I assume you'll need some legal paperwork done for putting your body on ice). All the main steps necessary to signing up and staying safe.

Basically, I would very much appreciate any help in understanding the basic costs and payoffs so I can budget accordingly.

I've already looked over Alcor's webpage and the Cryonics Institute

Don't forget Oregon Cryonics.

How do you deal with the risk of people using high-power laser pointers on you? Where I am, many "kids" are "playing" with strong laser pointers they ordered over the Internet. The strong ones can easily cause permanent damage or permanently blind you. If this becomes more prevalent, what can we do to protect ourselves?

[LINK] Givewell discusses the progress of their thinking about the merits of funding efforts to ameliorate global existential risks.