Preregistration, Severity, and Deviations
Preregistration does not improve the transparent evaluation of severity in Popper’s philosophy of science or when deviations are allowed
Preregistration Distinguishes Between Exploratory and Confirmatory Research?
Previous justifications for preregistration have focused on the distinction between “exploratory” and “confirmatory” research. However, as I discuss in this recent presentation, this distinction faces unresolved questions. For example, the distinction does not appear to have a formal definition in either statistical theory or the philosophy of science. In addition, critics have questioned related concerns about the “double use” of data and “circular reasoning” (Devezer et al., 2021; Rubin, 2020, 2022; Rubin & Donkin, 2024; Szollosi & Donkin, 2021; see also Mayo, 1996, pp. 137, 271-275; Mayo, 2018, p. 319).
Preregistration Improves the Transparent Evaluation of Severity
Lakens and colleagues provide a more coherent justification for preregistration based on Mayo’s (1996, 2018) error statistical approach (Lakens, 2019, 2024; Lakens et al., 2024; see also Vize et al., 2024). Specifically, Lakens (2019) argues that “preregistration has the goal to allow others to transparently evaluate the capacity of a test to falsify a prediction, or the severity of a test” (p. 221).
A hypothesis passes a severe test when there is a high probability that it would not have passed, or passed so well, if it were false (Mayo, 1996, 2018). A test procedure’s error probabilities play an important role in evaluating severity. In particular, “pre-data, the choices for the type I and II errors reflect the goal of ensuring the test is capable of licensing given inferences severely” (Mayo & Spanos, 2006, p. 350). For example, a test procedure with a nominal pre-data Type I error rate of α = 0.05 is capable of licensing specific inferences with a minimum “worst case” severity of 0.95 (i.e., 1 – α; Mayo, 1996, p. 399).
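To make the 1 – α figure concrete, here is a minimal sketch. It assumes a one-sided z-test of H0: μ ≤ μ0 against H1: μ > μ0 with a known standard deviation, which is only one simple setting in which severity can be computed. The function name, the parameter values, and the choice of test are illustrative assumptions rather than anything taken from Mayo's texts.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, mean 0 and sd 1


def severity_mu_greater_than(mu1, xbar, sigma, n):
    """Severity for inferring mu > mu1 after observing sample mean xbar.

    Assumes a one-sided z-test with known sigma: the probability of a
    sample mean no larger than the observed one if mu were equal to mu1.
    """
    standard_error = sigma / sqrt(n)
    return STD_NORMAL.cdf((xbar - mu1) / standard_error)


# Hypothetical values chosen only for illustration.
alpha, mu0, sigma, n = 0.05, 0.0, 1.0, 25
standard_error = sigma / sqrt(n)

# Smallest sample mean that still rejects H0 at level alpha.
cutoff = mu0 + STD_NORMAL.inv_cdf(1 - alpha) * standard_error

worst_case = severity_mu_greater_than(mu0, cutoff, sigma, n)
larger_mean = severity_mu_greater_than(mu0, cutoff + standard_error, sigma, n)

print(f"severity at the cutoff: {worst_case:.3f}")
print(f"severity one standard error above: {larger_mean:.3f}")
```

Under these assumptions, a result that only just reaches the rejection cutoff passes the inference μ > μ0 with severity 1 – α = 0.95, and results further beyond the cutoff pass it with higher severity. That is the sense in which 0.95 is a “worst case” value.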
Importantly, “biasing selection effects” in the experimental testing context (e.g., p-hacking) can lower the capability of a test procedure to license inferences severely by increasing the error probability with which the procedure passes hypotheses. From this error statistical perspective, preregistration allows a more transparent evaluation of the capability of a test procedure to perform severe tests. In particular, preregistration reveals a researcher’s planned hypotheses, methods, and analyses and enables a comparison with their reported hypotheses, methods, and analyses. This comparison can identify any biasing selection effects in the experimental testing context that may increase the test procedure’s error probabilities and lower its capability for severe tests.
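The effect of a biasing selection effect on a procedure's error probability can be sketched with a small simulation. The example below assumes one simple form of p-hacking: a researcher measures several independent outcomes, all with a true effect of zero, and reports a "success" if any one of them is significant. The function names, sample sizes, and number of outcomes are hypothetical choices for illustration only.

```python
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()


def two_sided_p(z):
    """Two-sided p value for a z statistic."""
    return 2 * (1 - STD_NORMAL.cdf(abs(z)))


def false_positive_rate(n_outcomes, n_studies=4000, n_per_study=20,
                        alpha=0.05, seed=1):
    """Share of null studies that report at least one significant outcome."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_studies):
        for _ in range(n_outcomes):
            # Every true effect is zero, so any significant result is an error.
            sample = [rng.gauss(0, 1) for _ in range(n_per_study)]
            z = (sum(sample) / n_per_study) * n_per_study ** 0.5
            if two_sided_p(z) < alpha:
                hits += 1
                break  # the researcher stops at the first "success"
    return hits / n_studies


planned = false_positive_rate(n_outcomes=1)
hacked = false_positive_rate(n_outcomes=5)
print(f"one planned outcome: {planned:.3f}")
print(f"best of five outcomes: {hacked:.3f}")
```

With k independent outcomes, the expected rate is 1 – (1 – α)^k, which is about 0.23 for k = 5. On the 1 – α reading of worst-case severity, the "best of five" procedure would therefore license inferences with a minimum severity of roughly 0.77 rather than the nominal 0.95. A reader who sees only the one reported outcome cannot tell which procedure was used, and this is the gap that a comparison of planned and reported analyses is meant to close.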
What Type of Severity?
Mayo’s (1996, 2018) error statistical conceptualization of severity is not the only one out there! Other types of severity have been proposed by Bandyopadhyay and Brittan (2006), Hellman (1997, p. 198), Hitchcock and Sober (2004, pp. 23-25), Horwich (1982, p. 105), Lakatos (1968, p. 382), Laudan (1997, p. 314), Popper (1962, 1983), and van Dongen et al. (2023). Furthermore, preregistration may not facilitate the transparent evaluation of these other types of severity. In my recent article, I illustrate this point by showing that, although preregistration can facilitate the transparent evaluation of Mayoian severity, it does not improve the transparent evaluation of Popperian severity.
I show that a valid measurement of Popperian severity can be made using a potentially p-hacked result, a potentially HARKed hypothesis, and potentially biased background knowledge. In addition, I show that Popper’s “requirement of sincerity” can be transparently evaluated during a public critical rational discussion among scientists. Preregistration does not facilitate a transparent evaluation in either case because neither evaluation requires knowledge of the researcher’s planned approach or unreported biasing selection effects.
Preregistration When Deviations are Allowed
I also argue that a preregistered test procedure that allows deviations does not provide a more transparent evaluation of Mayoian severity than a non-preregistered procedure. In particular, I consider deviations that are intended to maintain or increase the validity of a test procedure in light of unexpected issues that arise in particular samples of data (e.g., a violation of the assumption of homogeneity). I argue that a test procedure that allows these sample-based validity-enhancing deviations in its implementation will suffer an unknown inflation of its Type I error rate due to the forking paths problem (Gelman & Loken, 2013, 2014). Consequently, the test procedure will have an unknown reduction of its capability to license inferences with Mayoian severity.
I conclude that preregistration does not improve the transparent evaluation of severity in Popper’s philosophy of science or when deviations are allowed.
The Article
Rubin, M. (2025). Preregistration does not improve the transparent evaluation of severity in Popper’s philosophy of science or when deviations are allowed. arXiv.
