Aurélien Allard

28 December 2021

Bad news for reproducibility in Cancer biology.

The social and biomedical sciences have suffered from a replicability crisis over the last ten years. Researchers in psychology (Open Science Collaboration, 2015), behavioral economics (Camerer et al., 2016), and related fields (Camerer et al., 2018) have repeatedly shown that major experiments fail to replicate when rigorously conducted a second time. On 7 December 2021, the journal eLife published the results of the first major replicability project in the biomedical sciences, the Reproducibility Project: Cancer Biology (Errington, Denis, et al., 2021; Errington, Mathur, et al., 2021). Researchers attempted to reproduce 158 effects from 23 published papers. The outcome was a source of concern: according to some metrics, only 46% of the effects successfully replicated.


The Reproducibility Project: Cancer Biology (from now on: RP: CB) is the first of its kind in the biomedical fields. While there had already been strong doubts regarding the reliability of research in cancer biology, most of these worries rested on anecdotes. Ten years ago, scientists working in private companies published several papers alerting the scientific community that private research companies generally failed to replicate cancer-biology experiments coming out of universities and public research (Begley & Ellis, 2012; Prinz et al., 2011). However, for proprietary reasons, these publications included few methodological details, and it was hard to assess from the outside how the studies had been selected or whether the replications had been rigorously conducted. In contrast, the RP: CB is fully transparent, and its bleak results should serve as a stern warning for the field.

 

The authors originally aimed to replicate 193 experiments from 53 high-impact papers. However, they ran into unexpected difficulties and in the end had to limit the project to 50 experiments from 23 papers, for a total of 158 effects. Although this is the biggest replicability project ever conducted in the biomedical sciences, it remains a small sample, leaving a high degree of uncertainty in the estimates.


The project identified two different kinds of issues in cancer biology. The first concerns the transparency of published articles. The main reason behind the low number of replicated experiments lies in the difficulty of understanding precisely the methods and results published in the original articles. Raw data were publicly available for only 4 of the original 193 experiments, and in most cases (68%) the replicators were unable to obtain the data from the original authors. Moreover, none of the 193 experiments were described in enough detail to replicate the original methods, so the replicators had to rely on the original authors for key information. This means that, when the original authors were unavailable, replicating the original experiments was simply impossible.

 

The second kind of issue identified by the project concerns the stark reduction in effect sizes when going from the original experiments to the replications. The vast majority of the original effects were positive (136 out of 158, or 86%). Focusing on these positive effects, the team observed that the median effect size of the replications was 85% smaller than the median effect size of the original experiments. If the selected experiments had been an unbiased sample, we would expect no reduction at all. Still focusing on the positive results, 97% of the replication effect sizes were lower than the original effect sizes. Combining positive and null results, 92% of replication effect sizes were smaller than the originals. While deciding whether a replication counts as a success or a failure is open to debate, the RP: CB team applied five different criteria and concluded that, if a replication is deemed successful when it satisfies at least 3 of these 5 criteria, the overall replication rate was 46%.
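To make the headline 85% figure concrete, here is a minimal arithmetic sketch; the two effect sizes below are made-up illustrative values, not numbers reported by the project. If the median original effect size were $\widetilde{ES}_{\text{orig}} = 1.00$ and the median replication effect size were $\widetilde{ES}_{\text{rep}} = 0.15$, the relative reduction would be

$$1 - \frac{\widetilde{ES}_{\text{rep}}}{\widetilde{ES}_{\text{orig}}} = 1 - \frac{0.15}{1.00} = 0.85,$$

that is, an 85% drop of the kind observed for the positive effects.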


While this represents concerning news, a few caveats should be noted. First, these results probably shouldn’t be considered representative of all biology, or of all medical sciences. Clinical trials, that is, trials conducted on human participants, undergo much more scrutiny than animal or in vitro research, so it is likely that their results are more replicable. The low replicability of preclinical research is a massive source of waste and inefficiency (Arrowsmith, 2012), but it doesn’t mean that drugs destined for human consumption are unsafe or ineffective, since they have been subjected to much more rigorous tests.

 

Second, we should note two possible biases in the selection process. The studies were selected because of their high impact, and past research has shown that highly cited papers tend to be less replicable (Cova et al., 2018; Serra-Garcia & Gneezy, 2021). This means that randomly selected papers in cancer biology might have fared better. However, highly cited papers are arguably the most important ones, since they are the most likely to lead to clinical trials in the future; we might not care much about the replicability of papers that are not cited.


The second bias in study selection points in the opposite direction: the RP: CB only replicated studies that provided enough methodological detail, or whose authors were cooperative enough to fill in the missing information. This means that, out of all the most-cited papers, the team arguably only attempted to reproduce the most rigorous ones. Indeed, in an unrelated project in psychology, Wicherts and colleagues found preliminary evidence that original authors were less likely to share their data upon request if their articles contained more errors and weaker evidence, even though the requests made no reference to these problems (Wicherts et al., 2011). It is therefore plausible that the replicability of the most influential papers in cancer biology is even worse than the dismal 46% replication rate makes it appear. All in all, these results point towards the need for major reforms in cancer biology.

 

References

Arrowsmith, J. (2012). A decade of change. Nature Reviews Drug Discovery, 11(1), 17–18. https://doi.org/10.1038/nrd3630

 

Begley, C. G., & Ellis, L. M. (2012). Drug development: Raise standards for preclinical cancer research. Nature, 483(7391), 531–533. https://doi.org/10.1038/483531a

 

Camerer, C. F., Dreber, A., Forsell, E., Ho, T.-H., Huber, J., Johannesson, M., Kirchler, M., Almenberg, J., Altmejd, A., Chan, T., Heikensten, E., Holzmeister, F., Imai, T., Isaksson, S., Nave, G., Pfeiffer, T., Razen, M., & Wu, H. (2016). Evaluating replicability of laboratory experiments in economics. Science, 351(6280), 1433–1436. https://doi.org/10.1126/science.aaf0918

 

Camerer, C. F., Dreber, A., Holzmeister, F., Ho, T.-H., Huber, J., Johannesson, M., Kirchler, M., Nave, G., Nosek, B. A., Pfeiffer, T., Altmejd, A., Buttrick, N., Chan, T., Chen, Y., Forsell, E., Gampa, A., Heikensten, E., Hummer, L., Imai, T., … Wu, H. (2018). Evaluating the replicability of social science experiments in Nature and Science between 2010 and 2015. Nature Human Behaviour, 2(9), 637–644. https://doi.org/10.1038/s41562-018-0399-z

 

Cova, F., Strickland, B., Abatista, A., Allard, A., Andow, J., Attie, M., Beebe, J., Berniūnas, R., Boudesseul, J., Colombo, M., Cushman, F., Diaz, R., N’Djaye Nikolai van Dongen, N., Dranseika, V., Earp, B. D., Torres, A. G., Hannikainen, I., Hernández-Conde, J. V., Hu, W., … Zhou, X. (2018). Estimating the Reproducibility of Experimental Philosophy. Review of Philosophy and Psychology. https://doi.org/10.1007/s13164-018-0400-9

 

Errington, T. M., Denis, A., Perfito, N., Iorns, E., & Nosek, B. A. (2021). Challenges for assessing replicability in preclinical cancer biology. eLife, 10, e67995. https://doi.org/10.7554/eLife.67995

 

Errington, T. M., Mathur, M., Soderberg, C. K., Denis, A., Perfito, N., Iorns, E., & Nosek, B. A. (2021). Investigating the replicability of preclinical cancer biology. eLife, 10, e71601. https://doi.org/10.7554/eLife.71601

 

Open Science Collaboration. (2015). Estimating the reproducibility of psychological science. Science, 349(6251), aac4716. https://doi.org/10.1126/science.aac4716

 

Prinz, F., Schlange, T., & Asadullah, K. (2011). Believe it or not: How much can we rely on published data on potential drug targets? Nature Reviews Drug Discovery, 10(9), 712. https://doi.org/10.1038/nrd3439-c1

 

Serra-Garcia, M., & Gneezy, U. (2021). Nonreplicable publications are cited more than replicable ones. Science Advances, 7(21), eabd1705. https://doi.org/10.1126/sciadv.abd1705

 

Wicherts, J. M., Bakker, M., & Molenaar, D. (2011). Willingness to Share Research Data Is Related to the Strength of the Evidence and the Quality of Reporting of Statistical Results. PLOS ONE, 6(11), e26828. https://doi.org/10.1371/journal.pone.0026828

 

Feature image author – @Pavel Danilyuk
