Marsh et al. "Serotonin Transporter Genotype (5-HTTLPR) Predicts Utilitarian Moral Judgments"
The whole paper is here. In short, they found a genotype that predicts people's response to the original trolley problem:
A trolley (i.e. in British English a tram) is running out of control down a track. In its path are five people who have been tied to the track by a mad philosopher. Fortunately, you could flip a switch, which will lead the trolley down a different track to safety. Unfortunately, there is a single person tied to that track. Should you flip the switch or do nothing?
Participants with one variant of the serotonin transporter genotype (LL homozygotes) judged flipping the switch to be better than a morally neutral action. Participants with the other variant (S-allele carriers) judged flipping the switch to be no better than a morally neutral action. The two groups responded alike to the "fat man scenario", both rejecting the 'push' option.
Some quotes:
We hypothesized that 5-HTTLPR genotype would interact with intentionality in respondents who generated moral judgments. Whereas we predicted that all participants would eschew intentionally harming an innocent for utilitarian gains, we predicted that participants' judgments of foreseen but unintentional harm would diverge as a function of genotype. Specifically, we predicted that LL homozygotes would adhere to the principle of double effect and preferentially select the utilitarian option to save more lives despite unintentional harm to an innocent victim, whereas S-allele carriers would be less likely to endorse even unintentional harm. Results of behavioral testing confirmed this hypothesis.
Participants in this study judged the acceptability of actions that would unintentionally or intentionally harm an innocent victim in order to save others' lives. An analysis of variance revealed a genotype × scenario interaction, F(2, 63) = 4.52, p = .02. Results showed that, relative to long allele homozygotes (LL), carriers of the short (S) allele showed particular reluctance to endorse utilitarian actions resulting in foreseen harm to an innocent individual. LL genotype participants rated perpetrating unintentional harm as more acceptable (M = 4.98, SEM = 0.20) than did SL genotype participants (M = 4.65, SEM = 0.20) or SS genotype participants (M = 4.29, SEM = 0.30).
...
The results indicate that inherited variants in a genetic polymorphism that influences serotonin neurotransmission influence utilitarian moral judgments as well. This finding is interpreted in light of evidence that the S allele is associated with elevated emotional responsiveness.
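As a quick check on the genotype × scenario interaction quoted above, F(2, 63) = 4.52: when the numerator has 2 degrees of freedom, the upper tail of the F distribution has the exact closed form P(F > f) = (1 + 2f/d2)^(-d2/2), so the p-value can be computed without a statistics library. A minimal Python sketch (the function name is my own):

```python
def f_upper_tail_df1_eq_2(f_stat: float, df_denom: int) -> float:
    """P(F > f_stat) for an F(2, df_denom) distribution, via the exact closed form."""
    if f_stat < 0 or df_denom <= 0:
        raise ValueError("need f_stat >= 0 and df_denom > 0")
    return (1.0 + 2.0 * f_stat / df_denom) ** (-df_denom / 2.0)


# The interaction reported in the paper: F(2, 63) = 4.52.
p_interaction = f_upper_tail_df1_eq_2(4.52, 63)
print(f"p = {p_interaction:.3f}")
```

This comes out at roughly 0.015, under the conventional .05 threshold. Note that it is a per-test p-value: on its own it says nothing about how many other comparisons were run.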
I believe you. But suppose they were just looking for any genotype that might interact: given the zillions of genotypes, some genotype may well show an interaction purely by chance.
If the interaction did occur by random chance, you wouldn't expect it to recur in a second experiment. But is it really necessary to run two experiments? One could look for interactions in the first half of the observations and then see whether they recur in the second half. You would still get some false positives, but with a smaller probability. So in the end I'd expect any experimentalist working with such data to be aware of all this and to look for a signal with a false-positive probability below .05, with all of it factored into the statistical analysis.
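The split-half idea can be simulated directly. The sketch below invents a null world (all numbers are hypothetical: 1,000 candidate genotypes, none with any real effect, 40 carriers and 40 non-carriers per genotype, scores with unit variance), screens every genotype on the first half of the data with a simple two-sided z-test, and re-tests only the hits on the second half. Because the two halves are independent, on average about 50 genotypes (alpha × 1,000) pass the first screen and about 2.5 (alpha² × 1,000) pass both.

```python
import math
import random


def z_test_p(a: list[float], b: list[float]) -> float:
    """Two-sided z-test p-value for a difference in means, assuming unit variance."""
    diff = sum(a) / len(a) - sum(b) / len(b)
    z = diff / math.sqrt(1 / len(a) + 1 / len(b))
    return math.erfc(abs(z) / math.sqrt(2))


def split_half_screen(n_genotypes: int = 1000, n_per_group: int = 40,
                      alpha: float = 0.05, seed: int = 1) -> tuple[int, int]:
    """Return (hits in the first half, hits that also replicate in the second half)."""
    rng = random.Random(seed)
    half = n_per_group // 2
    first_half_hits = replicated = 0
    for _ in range(n_genotypes):
        # Null world: carriers and non-carriers share the same score distribution.
        carriers = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
        others = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
        if z_test_p(carriers[:half], others[:half]) < alpha:
            first_half_hits += 1
            # Only first-half hits are re-tested on the held-out second half.
            if z_test_p(carriers[half:], others[half:]) < alpha:
                replicated += 1
    return first_half_hits, replicated


print(split_half_screen())
```

The replication step cuts the false positives by roughly a factor of 1/alpha, but, as the comment says, it does not eliminate them.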
... so in the paper, is the probability of the interaction being significant calculated before or after accounting for the chance that any random genotype might show p < .05?
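In numbers: under a true null hypothesis a p-value is uniform on [0, 1], so with m independent tests at level alpha the chance of at least one spurious "significant" result is 1 - (1 - alpha)^m. The simplest correction, Bonferroni, judges each test at alpha/m instead (equivalently, multiplies each p-value by m), which keeps that familywise rate at or below alpha. A Python sketch assuming independent tests; the ten candidate genotypes in the example are hypothetical, since the excerpt doesn't say whether any genotypes besides 5-HTTLPR were examined.

```python
def familywise_error_rate(alpha: float, n_tests: int) -> float:
    """P(at least one p < alpha) over n_tests independent tests of true nulls."""
    return 1.0 - (1.0 - alpha) ** n_tests


def bonferroni_adjust(p_values: list[float]) -> list[float]:
    """Bonferroni-adjusted p-values: each p times the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Uncorrected: the chance of a spurious hit grows quickly with the number of tests.
for m in (1, 10, 100):
    print(f"{m:>3} tests at .05: P(any false positive) = {familywise_error_rate(0.05, m):.3f}")

# Corrected: testing each of 10 genotypes at .05 / 10 keeps the rate just under .05.
print(f"Bonferroni, 10 tests: {familywise_error_rate(0.05 / 10, 10):.3f}")

# Hypothetical: the best of 10 candidate genotypes came out at p = .02.
adjusted = bonferroni_adjust([0.02] + [0.5] * 9)
print(f"adjusted p for the best genotype: {adjusted[0]:.2f}")
```

With ten candidates the uncorrected chance of a spurious hit is already about 40%, and a p = .02 that was the best of ten tries becomes 0.2 after correction. Bonferroni is conservative when tests are correlated, as nearby genetic variants often are.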
As I understand them, proper scientific standards demand that researchers pick a hypothesis and then test it using pre-chosen statistical methods and a pre-chosen significance level, and ideally report their results by publishing a paper whether or not the null hypothesis is rejected at that level. Yudkowsky suggests that experimenters should be asked to publish their papers before they conduct their experiments, to ensure that this good practice is upheld (he also suggests that Bayesian likelihood ratios be published instead of frequentist statistics).
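For comparison, here is what a likelihood ratio looks like for a single test statistic. Assume z is normal with unit variance, with mean 0 under the null and mean delta under a simple alternative; the ratio of the two likelihoods is then exp(delta·z - delta²/2). Even the most favourable alternative (delta equal to the observed z) gives only about 6.8:1 at the p = .05 boundary (z = 1.96), and the posterior odds are the prior odds times that ratio. The 1:100 prior odds below are purely illustrative.

```python
import math


def likelihood_ratio(z_obs: float, effect: float) -> float:
    """Likelihood of z_obs under N(effect, 1) divided by its likelihood under N(0, 1)."""
    return math.exp(effect * z_obs - effect ** 2 / 2)


def posterior_odds(prior_odds: float, lr: float) -> float:
    """Bayes' rule in odds form: posterior odds = prior odds x likelihood ratio."""
    return prior_odds * lr


z = 1.96  # a result sitting exactly on the two-sided p = .05 boundary
best_lr = likelihood_ratio(z, effect=z)  # best case for the alternative
print(f"maximum likelihood ratio: {best_lr:.2f}")

# Illustrative prior: 1:100 odds that a given genotype affects these judgements.
print(f"posterior odds: {posterior_odds(1 / 100, best_lr):.3f}")
```

So a result that just clears p = .05 moves 1:100 odds to roughly 7:100, which is still long odds against the hypothesis.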
The (relatively) benign problem that is often seen is the "file drawer" effect, where only papers that reject the null hypothesis (i.e. have interesting results) get published. The more serious allegation is that researchers data-dredge: they conduct an experiment and then look for plausible hypotheses that they can claim to have "tested" with a positive result, or run various statistical analyses until they find one that suits them (it would be very optimistic to claim that this never happens!). As you say, in this kind of investigation there are bound to be many such spurious results to exploit, particularly in a field where 5% significance is the standard.
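The file-drawer effect is easy to see in simulation. In the hypothetical setup below every study measures an effect whose true size is zero (30 unit-variance observations per study, analysed with a z-test), but only studies with p < .05 are "published". About 5% get through, and every published estimate is at least 1.96/sqrt(30), about 0.36, in size, so the published record shows a sizeable effect that does not exist.

```python
import math
import random
import statistics


def simulate_file_drawer(n_studies: int = 2000, n: int = 30,
                         alpha: float = 0.05, seed: int = 2) -> list[float]:
    """Return the effect estimates of the studies that would be 'published'."""
    rng = random.Random(seed)
    published = []
    for _ in range(n_studies):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]  # true effect is zero
        estimate = statistics.fmean(sample)
        z = estimate * math.sqrt(n)  # z-test with known unit variance
        p = math.erfc(abs(z) / math.sqrt(2))
        if p < alpha:  # only "interesting" results leave the file drawer
            published.append(estimate)
    return published


published = simulate_file_drawer()
print(len(published), "of 2000 null studies published")
print("mean |effect| in published studies:",
      round(statistics.fmean(abs(e) for e in published), 2))
```

Data-dredging makes this worse: trying several analyses per study and keeping the best one raises the per-study chance of a "publishable" result well above 5%.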
I was questioning how likely it is that the researchers really thought they had good cause to run an experiment testing the hypothesis that some apparently obscure genotype is causally related to people making utilitarian moral judgements. It doesn't pass the laugh test for me, but that might just be my ignorance.