Eugenijus Gefenas, Margarita Poškutė & Vygintas Aliukonis

01 April 2021



Publication bias – understanding statistical significance.


Publication bias has been observed across scientific fields for almost four decades[i]. Usually, research with negative results is less likely to be published than research with positive results. Meanwhile, the proportion of positive results in the scientific literature grew on average by about 6 percent yearly from 1990, reaching a soaring 85 percent in 2007[ii].

 

To better understand the topic we are dealing with, let's first clarify what positive and negative results mean. A positive result supports the research hypothesis – for example, that a new treatment is superior to the existing one – and is statistically significant. The term negative result, by contrast, can generate confusion because it covers two situations: results that fail to reach statistical significance, and results that support the null hypothesis[iii], i.e., that the new treatment being tested had no effect compared to the existing one.

 

A misconception often found in the literature is that a study without positive results was unsuccessful[iv]. This mistaken belief is so widespread that neither editors nor authors tend to publish negative results, as such publications attract hardly any readers and even fewer citations[v].


So what, one may ask, is wrong with not publishing negative results?

 

When negative results are not published, researchers cannot learn from other professionals in the field. Without knowing it, they repeat the same experiments again and again, wasting time and money. Worse still, such behavior does not move science forward and distorts the evidence. If the same experiment is repeated several times but only one run yields a positive outcome, can we say that the hypothesis has been confirmed? Due to publication bias, it is likely that only that single positive study will be published, significantly distorting the overall knowledge in that scientific field. While in social research such behavior mainly wastes resources, in biomedicine it can cost health or even lives. By limiting our analysis to positive results alone, we create unreasonably high expectations for new treatments or diagnostic methods.
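To see how this distortion arises, here is a minimal simulation sketch in Python (all numbers hypothetical): many teams run the same two-arm experiment on a treatment that truly has no effect, yet about 1 in 20 runs still crosses the conventional P ≤ 0.05 threshold by chance.

```python
# Sketch: publication bias under a true null hypothesis.
# Hypothetical setup: 1,000 teams run the same two-arm experiment on a
# treatment with no real effect; only "positive" runs get published.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n_experiments = 1_000
n_per_group = 30  # hypothetical sample size per arm

false_positives = 0
for _ in range(n_experiments):
    control = rng.normal(loc=0.0, scale=1.0, size=n_per_group)
    treated = rng.normal(loc=0.0, scale=1.0, size=n_per_group)  # same distribution: no effect
    _, p_value = stats.ttest_ind(control, treated)
    if p_value <= 0.05:
        false_positives += 1

# Roughly 5% of runs are "statistically significant" by chance alone; if only
# those are published, the literature reports an effect that does not exist.
print(f"{false_positives} of {n_experiments} experiments came out positive")
```

Running this prints a count close to 50 – the roughly 5 percent of false positives that, under publication bias, would become the entire published record.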

 

In this post, we want to focus on the statistical significance part of negative research results. You have probably read the phrase "statistically significant" or heard scientists refer to P-values when reporting the results of an experiment. Statistical significance is usually understood as P ≤ 0.05.


But what exactly is a P-value? A P-value is a number between 0 and 1 that expresses how strongly the observed data disagree with the null hypothesis – the hypothesis that there is no difference between the study groups. In other words, the smaller the P-value, the less compatible the data are with the assumption that the groups being studied are the same.
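One way to make that definition concrete is a permutation test, which literally acts out the null hypothesis: shuffle the group labels (so the groups really are the same) and count how often the shuffled data shows a difference at least as large as the one observed. A minimal Python sketch with hypothetical measurements:

```python
# Permutation test: the p-value as "how often would we see a difference this
# large if the groups were really the same?" (measurements are hypothetical)
import numpy as np

rng = np.random.default_rng(0)
group_a = np.array([5.1, 4.8, 6.2, 5.9, 5.4, 6.0])
group_b = np.array([4.2, 4.9, 4.5, 5.0, 4.4, 4.7])

observed = abs(group_a.mean() - group_b.mean())
pooled = np.concatenate([group_a, group_b])
n_a = len(group_a)

n_shuffles = 10_000
at_least_as_extreme = 0
for _ in range(n_shuffles):
    rng.shuffle(pooled)  # relabel at random: the null hypothesis made literal
    if abs(pooled[:n_a].mean() - pooled[n_a:].mean()) >= observed:
        at_least_as_extreme += 1

p_value = at_least_as_extreme / n_shuffles  # a number between 0 and 1
print(f"p-value = {p_value:.3f}")
```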

 

Put simply, P ≤ 0.05 means that if there were truly no difference between the study groups, data at least this extreme would turn up less than 5% of the time. However, there's no good reason for P ≤ 0.05 to be the cutoff for statistical significance – it's arbitrary.

 

For example, imagine that you are researching a drug that reduces blood pressure. In this case, the null hypothesis is that the drug is not effective. If, after conducting a randomized experiment, we find that the drug reduces blood pressure by some meaningful amount, we can reject the null hypothesis.
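As a sketch of how that decision might look in code (every number here is hypothetical: a true 8 mm Hg reduction, 50 patients per arm):

```python
# Simulated randomized trial of a hypothetical antihypertensive drug.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
placebo = rng.normal(loc=140, scale=12, size=50)  # systolic BP, mm Hg
drug = rng.normal(loc=132, scale=12, size=50)     # on average 8 mm Hg lower

_, p_value = stats.ttest_ind(placebo, drug)
if p_value <= 0.05:
    print(f"p = {p_value:.4f}: reject the null hypothesis of no effect")
else:
    print(f"p = {p_value:.4f}: cannot reject the null hypothesis")
```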


It is essential to understand that we have not proven that the drug is effective, because we have not tested it in every possible case – it was just a single experiment. However, we have provided evidence against the null hypothesis that the drug is not effective. The more experiments show that the drug effectively reduces blood pressure, the stronger our assurance becomes.

 

We can never claim with absolute certainty that a drug lowers blood pressure, because we cannot perform all possible experiments under all possible conditions. Still, we can reach a level of evidence that allows us, in practice, to prescribe the drug. So when you see someone reporting a statistically significant result, what they mean is that there was enough evidence to reject the null hypothesis.

 

But why do we want to talk about the P-value at all?

 

In 2016, the American Statistical Association (ASA) issued a statement warning researchers against over-reliance on the P-value. Many scholars now treat the label "statistically significant" as a sign that a study is of high quality, and statistically insignificant studies as inferior and not worthy of attention. However, the P-value was never intended for such use. According to the ASA, "Smaller p-values do not necessarily imply the presence of larger or more important effects, and larger p-values do not imply the lack of importance or even lack of effect."[vi]


A P-value above 0.05 does not guarantee inferior results. Statistics cannot be viewed in a vacuum when drawing scientific conclusions. By focusing on the P-value alone, without deeper analysis, researchers may mistakenly dismiss a therapeutic option as insignificant when the reality is different. A notable example is a clinical trial of pentoxifylline, an inexpensive drug registered in most parts of the world for the treatment of intermittent claudication. The researchers wanted to test the drug as a treatment for recurrent venous leg ulcers. In a randomized study, the investigators observed a clear clinical benefit compared to placebo, but the P-value was higher than 0.05 due to the study's methodological features. Consequently, the drug was not approved for this indication. Nevertheless, a later meta-analysis confirmed that the drug is effective in the treatment of venous leg ulcers[vii].
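Pooling is exactly how a meta-analysis can rescue such a result: each study may lean in the drug's favor without reaching P ≤ 0.05 on its own, yet the combined estimate can be decisive. Below is a minimal inverse-variance (fixed-effect) sketch in Python – the effect estimates and standard errors are hypothetical, not the actual pentoxifylline data:

```python
# Fixed-effect (inverse-variance) meta-analysis on hypothetical study results.
import math

# (effect estimate, standard error) per study: each leans positive,
# but none is statistically significant on its own
studies = [(0.30, 0.18), (0.25, 0.20), (0.35, 0.19), (0.20, 0.22)]

weights = [1 / se**2 for _, se in studies]  # weight each study by its precision
pooled = sum(w * eff for (eff, _), w in zip(studies, weights)) / sum(weights)
pooled_se = math.sqrt(1 / sum(weights))
z = pooled / pooled_se

# two-sided p-value from the normal approximation
p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
print(f"pooled effect = {pooled:.2f}, z = {z:.2f}, p = {p_value:.4f}")
```

With these illustrative numbers, no single study reaches significance, but the pooled estimate does – the same pattern the pentoxifylline meta-analysis revealed.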

 

This case confirms that no single experiment can provide absolute proof of the absence of an effect.

 

We also need to remember that statistical significance is not the same thing as clinical significance. Clinical significance is the practical importance of a finding. In contrast to the pentoxifylline example, there may be a statistically significant difference between two drugs that is so tiny that choosing one over the other changes nothing. Return to our earlier example of the new antihypertensive drug trial that showed a statistically significant blood pressure reduction: if that reduction is only 1 or 2 mm Hg, it has no clinical implications for the patient.
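This is easy to demonstrate: inflate the sample size enough and even a 1 mm Hg difference yields a vanishingly small P-value. A Python sketch with hypothetical data:

```python
# With huge samples, a clinically trivial effect becomes "statistically significant".
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
n = 20_000  # hypothetical, very large trial arms
placebo = rng.normal(loc=140, scale=12, size=n)  # systolic BP, mm Hg
drug = rng.normal(loc=139, scale=12, size=n)     # just 1 mm Hg lower on average

_, p_value = stats.ttest_ind(placebo, drug)
print(f"difference = {placebo.mean() - drug.mean():.2f} mm Hg, p = {p_value:.2e}")
# The p-value is tiny, yet a 1 mm Hg reduction changes nothing for the patient.
```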


In conclusion:

 

The P-value alone cannot answer the important questions. To draw broader conclusions about research results, you also need to consider factors such as the study's design and the results of other studies on similar topics. A study can have a P-value below 0.05 and still be poorly designed or disagree with all available research on the matter – and vice versa: a study with a P-value above 0.05 can be methodologically sound and provide invaluable knowledge. Therefore, the academic community needs to learn that exemplary research with excellent methodology is relevant to science, has lasting value, and promotes scientific development regardless of statistical significance.

[i] Begg CB. A measure to aid in the interpretation of published clinical trials. Stat Med. 1985;4(1):1–9.

[ii] Fanelli D. Negative results are disappearing from most disciplines and countries. Scientometrics. 2012;90(3):891–904. https://doi.org/10.1007/s11192-011-0494-7

[iii] Fanelli D. Negative results are disappearing from most disciplines and countries. Scientometrics. 2012;90(3):891–904. https://doi.org/10.1007/s11192-011-0494-7

[iv] Bresee L. The Importance of Negative and Neutral Studies for Advancing Clinical Practice. Can J Hosp Pharm. 2017;70(6):403–4.

[v] Dickersin K. The existence of publication bias and risk factors for its occurrence. JAMA. 1990;263:1385–9.

[vi] Wasserstein RL, Lazar NA. The ASA's Statement on p-Values: Context, Process, and Purpose. Am Stat. 2016;70:129–33. https://amstat.tandfonline.com/doi/full/10.1080/00031305.2016.1154108

[vii] Are You Overvaluing Your Clinical Trial P-Values? Clinical Leader. https://www.clinicalleader.com/doc/are-you-overvaluing-your-clinical-trial-p-values-0001
