Inconsistent Multiple Testing Corrections
The Fallacy of Using Family-Based Error Rates to Make Inferences About Individual Hypotheses
During multiple testing, researchers often adjust their alpha level to control the familywise error rate for a statistical inference about a joint union alternative hypothesis (e.g., “H1,1 or H1,2”). However, in some cases, they do not make this inference. Instead, they make separate inferences about each of the individual hypotheses that comprise the joint hypothesis (e.g., H1,1 and H1,2). For example, a researcher might use a Bonferroni correction to adjust their alpha level from the conventional level of 0.050 to 0.025 when testing H1,1 and H1,2, find a significant result for H1,1 (p < 0.025) and not for H1,2 (p > 0.025), and so claim support for H1,1 and not for H1,2. However, these separate individual inferences do not require an alpha adjustment. Only a statistical inference about the union alternative hypothesis “H1,1 or H1,2” requires an alpha adjustment because it is based on “at least one” significant result among the two tests, and so it refers to the familywise error rate. Hence, an inconsistent correction occurs when a researcher corrects their alpha level during multiple testing but does not make an inference about a union alternative hypothesis. I discuss this inconsistent correction problem in this new article.
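To make the arithmetic in the example above concrete, here is a minimal sketch (mine, not the article's) of the Bonferroni adjustment for two tests, together with the familywise error rate that the adjustment is designed to control:

```python
# Sketch of the Bonferroni arithmetic from the example above,
# assuming two independent tests at a nominal alpha of 0.05.
alpha = 0.05
m = 2  # number of tests in the family

# Bonferroni-adjusted per-test alpha (0.050 -> 0.025 in the example).
alpha_bonferroni = alpha / m
print(alpha_bonferroni)  # 0.025

# Familywise error rate WITHOUT adjustment: the probability of at
# least one false positive among m tests of true null hypotheses.
fwer_unadjusted = 1 - (1 - alpha) ** m
print(round(fwer_unadjusted, 4))  # 0.0975
```

Note that the adjustment only matters for the “at least one” quantity on the last line; each individual test's error rate is alpha regardless of m.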
To be clear, I am not opposed to an alpha adjustment for multiple testing under the appropriate circumstances. Hence, my article is not an “anti-adjustment article” (Frane, 2019, p. 3). It is a pro-consistency article! My key point is that researchers should be logically consistent in their use of multiple testing corrections. If researchers use multiple testing corrections, then they should make corresponding statistical inferences about family-based joint hypotheses. They should not correct their alpha level and then only proceed to make statistical inferences about individual hypotheses because such inferences do not require an alpha adjustment (Armstrong, 2014, p. 505; Cook & Farewell, 1996, pp. 96–97; Fisher, 1971, p. 206; García-Pérez, 2023, p. 15; Greenland, 2021, p. 5; Hewes, 2003, p. 450; Hurlbert & Lombardi, 2012, p. 30; Matsunaga, 2007, p. 255; Molloy et al., 2022, p. 2; Parker & Weir, 2020, p. 564; Parker & Weir, 2022, p. 2; Rothman, 1990, p. 45; Rubin, 2017, pp. 271–272; Rubin, 2020a, p. 380; Rubin, 2021a, 2021b, pp. 10978–10983; Rubin, 2024; Savitz & Olshan, 1995, p. 906; Senn, 2007, pp. 150–151; Sinclair et al., 2013, p. 19; Tukey, 1953, p. 82; Turkheimer et al., 2004, p. 727; Veazie, 2006, p. 809; Wilson, 1962, p. 299; for the relevant quotes and links to these articles, please see Appendix B here).
Multiple testing increases the probability that at least one of your significant results is a false positive, but it does not increase the probability that each one of your significant results is a false positive. So, if you make an inference about a joint null hypothesis that can be rejected following at least one significant result, then an alpha adjustment is necessary, and if you don’t, it isn’t!
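This distinction is easy to check by simulation. The sketch below (my illustration, not from the article) tests m true null hypotheses many times over: the per-test false positive rate stays at alpha no matter how many tests are run, while the “at least one” (familywise) rate climbs toward 1 − (1 − alpha)^m.

```python
# Simulation: per-test vs. familywise false positive rates when
# testing m true null hypotheses without any alpha adjustment.
import random

random.seed(1)
alpha, m, n_sims = 0.05, 5, 100_000

per_test_errors = 0   # false positives summed over all individual tests
family_errors = 0     # simulations with at least one false positive

for _ in range(n_sims):
    # Under a true null hypothesis, each p-value is uniform on [0, 1].
    pvals = [random.random() for _ in range(m)]
    sig = [p < alpha for p in pvals]
    per_test_errors += sum(sig)
    family_errors += any(sig)

print(per_test_errors / (n_sims * m))  # ~0.05: unchanged by multiple testing
print(family_errors / n_sims)          # ~0.226, i.e., 1 - 0.95**5
```

Only an inference that rides on the second quantity, at least one significant result among the family, calls for an adjustment.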
Based on a review by García-Pérez (2023), I argue that inconsistent corrections are likely to be very common. I also point out that inconsistent corrections lead to a loss of statistical power. If a researcher adjusts their alpha level below its nominal level to account for multiple testing but only makes statistical inferences about individual hypotheses and not about a joint hypothesis, then they will have lowered the power of their individual tests for no good reason. Consequently, their Type I error rate will be unnecessarily low, and their Type II error rate will be unnecessarily high (see also García-Pérez, 2023, p. 11). I illustrate these issues using three recent psychology studies.
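The power cost of an inconsistent correction can also be illustrated numerically. The sketch below (mine; the effect size and sample size are hypothetical, and the two-sided power formula ignores the negligible far tail) compares the power of an individual z-test at the nominal alpha of 0.050 with its power at a Bonferroni-adjusted alpha of 0.025:

```python
# Power of an individual two-sided z-test at nominal vs. Bonferroni
# alpha, using only the Python standard library.
from statistics import NormalDist

nd = NormalDist()

def power(alpha, effect, n):
    """Approximate two-sided z-test power (far-tail term omitted)."""
    z_crit = nd.inv_cdf(1 - alpha / 2)
    noncentrality = effect * n ** 0.5
    return 1 - nd.cdf(z_crit - noncentrality)

# Hypothetical medium effect (d = 0.5) with n = 30.
print(round(power(0.050, 0.5, 30), 3))  # power at the nominal alpha
print(round(power(0.025, 0.5, 30), 3))  # lower power at the adjusted alpha
```

Halving alpha here costs roughly nine percentage points of power on each individual test, a loss that buys nothing if no family-based inference is ever made.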
I conclude that inconsistent corrections represent a symptom of statisticism: an overgeneralization of abstract statistical principles at the expense of context-specific nuance and caveats. In response, I argue that we should adopt an inference-based perspective that advocates an alpha adjustment in the case of inferences about intersection null hypotheses but not in the case of inferences about individual null hypotheses.
Further Information
The Article
Rubin, M. (2024). Inconsistent multiple testing corrections: The fallacy of using family-based error rates to make inferences about individual hypotheses. Methods in Psychology, 10, Article 100140. https://doi.org/10.1016/j.metip.2024.100140
Related Work
García-Pérez, M. A. (2023). Use and misuse of corrections for multiple testing. Methods in Psychology, 8, Article 100120. https://doi.org/10.1016/j.metip.2023.100120
Rubin, M. (2021). There’s no need to lower the significance threshold when conducting single tests of multiple individual hypotheses. Academia Letters, Article 610. https://doi.org/10.20935/AL610
Rubin, M. (2021). When to adjust alpha during multiple testing: A consideration of disjunction, conjunction, and individual testing. Synthese, 199, 10969–11000. https://doi.org/10.1007/s11229-021-03276-4
Love it. Related to an idea I've just started to work on, the seeming tension between pre-registration and MHC. Practitioners are hesitant to pre-register additional tests because the redundant correction you discuss is indeed the status quo. But this is obviously bad for science, especially in the case of an expensive field experiment: researchers testing *fewer* outcomes because of the power costs, even though the marginal resource cost of these tests is trivial compared to the experiment itself.