
Episode 2

Ron Wasserstein: Misunderstandings of ‘statistical significance’

A small p-value is like a right swipe in Tinder. It means you have an interest. It doesn’t mean you’re ready to book the wedding venue.

GUEST RESEARCHER: RON WASSERSTEIN

Ron Wasserstein has been the executive director of the American Statistical Association (ASA) since 2007, promoting the practice and profession of statistics. Previously, he was a faculty member of the department of mathematics and statistics at Washburn University in Kansas.

The podcast episode focuses on Ron’s research article: ‘The ASA’s Statement on p-Values: Context, Process, and Purpose’. Ron was tasked with leading the creation of a framework outlining how p-values should be used in research, and this article was the result. It’s had over 300,000 views and over 1,000 citations and is changing the way researchers approach understanding their results.


TRANSCRIPT

Ron Wasserstein (RW): I use this example in my talks to various groups on p-values. A small p-value is like a right swipe in Tinder. It means you have an interest. It doesn’t mean that you’re ready to book a wedding venue.

Kaitlyn Regehr (KR): That was Ron Wasserstein, executive director of the American Statistical Association, the ASA, an organisation whose job it is to promote the practice and profession of statistics. Ron wasn’t talking about the probability of meeting a potential spouse on a dating app, but rather the cautionary approach to the use of p-values, that we as researchers should apply.

RW: Uncertainty exists every place. It’s just like the frigid weather in a Wisconsin winter. There are those that will flee from it, trying to hide in warmer havens elsewhere. Others will accept it and even delight in the omnipresent cold. These are the ones who buy the right gear and bravely take advantage of all the wonders of a challenging climate.

Significance tests and dichotomised p-values, which we’ll talk a lot about during this podcast, have turned many researchers into what I’ll call ‘scientific snowbirds’, trying to avoid dealing with uncertainty by escaping to a ‘happy place’ where results are either statistically significant or they’re not. But in the real world, data provide a noisy signal. Variation, one of the causes of uncertainty, is everywhere. Exact replication is difficult to achieve. So what we’ve argued, and what I’ll talk about during this podcast, is that it’s time to get the right statistical gear on and move towards a greater acceptance of uncertainty and an embracing of variation.

[How Researchers Changed the World podcast music and introduction]

KR: Welcome to How Researchers Changed the World, a podcast series supported by Taylor & Francis which will demonstrate the real-world relevance, value, and impact of academic research; and highlight the people and stories behind the research. My name is Dr. Kaitlyn Regehr. I’m an academic researcher; an author and a scholar of digital and modern culture, and I’m interested in how new technologies can broaden the reach and real-world impact of academic research.

In today’s episode we’re exploring the American Statistical Association’s statement on ‘P-values: Context, Process and Purpose’. P-values, or probability values, are a statistical evaluation tool used widely in research. The use of p-values in statistical hypothesis testing is common in many fields of research, such as physics, economics, finance, political science, psychology, biology and sociology. Their misuse has been a matter of considerable controversy for decades. And as Ron tells us, it’s time to set the record straight.

RW: I’m Ron Wasserstein and I’m the executive director of the American Statistical Association, the world’s largest community of statisticians. We have members, statisticians, in over 90 countries. We provide membership services for statisticians. Statisticians are people who work in colleges and universities but also in government and in every manner of industry, providing various kinds of really interesting statistical services for people who need to understand their data better.

KR: Okay so what exactly is a p-value and what’s the big problem?

RW: Why don’t I start with a little bit about what a p-value is and why anybody should care? So a p-value is, at the most basic level, a way of summarizing the compatibility or incompatibility between a set of data and a proposed model for that data. And the model includes a whole bunch of assumptions, one of which scientists like to call a null hypothesis. Simply stated, the smaller the p-value, the less compatible the data are with the model.

KR: Okay, that sounds a bit complex. Let’s pull that apart a bit.

RW: That’s pretty confusing, isn’t it? In fact, I’d say there’s a decent chance that when I hear this podcast later, I’ll discover I said something wrong in there. And so, right there is one of the problems. P-values are hard to explain properly and when you do, they aren’t quite what you want to know. And so, what happened is, over the years this p-value – it’s a useful tool, let me say that to begin with, it’s a very common and very useful tool for getting a handle on whether the data that we’ve collected fits or does not fit a model – but unfortunately, as you’ve gathered from my explanation, it’s also a tool that’s very easily misinterpreted.

It might be easy for someone to conclude, based on what I’ve said so far, that p-values are bad and that they’ve always been bad. That’s not true. P-values are a great tool. They were used very effectively for many decades to advance science in many fruitful ways. Unfortunately, they also began to be used inappropriately, to be misused and to be used less effectively.
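To make the idea of compatibility between data and a model concrete, here is a minimal Python sketch. It is an illustration only and does not come from the ASA statement: the model assumed here is a fair coin with independent flips (the null hypothesis being that the chance of heads is 0.5), and the function name `binomial_p_value` is hypothetical. The p-value it computes is the probability, under that model, of seeing at least as many heads as were observed.

```python
from math import comb


def binomial_p_value(heads: int, flips: int, p_null: float = 0.5) -> float:
    """One-sided p-value for a coin-flip experiment.

    Assumed model (hypothetical, for illustration): flips are independent
    and each lands heads with probability p_null. The result is the
    probability of observing `heads` or more heads in `flips` flips
    if that model holds.
    """
    return sum(
        comb(flips, k) * p_null**k * (1 - p_null) ** (flips - k)
        for k in range(heads, flips + 1)
    )


# 6 heads in 10 flips: 386 of the 1024 equally likely outcomes have 6+ heads
print(binomial_p_value(6, 10))  # about 0.377
# 9 heads in 10 flips: only 11 of the 1024 outcomes have 9+ heads
print(binomial_p_value(9, 10))  # about 0.011
```

Six heads in ten flips is quite compatible with the fair-coin model, while nine in ten is much less so. In neither case is the number the probability that the coin is fair; it only describes how surprising the data would be if every assumption of the model were true.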

KR: P-values and statistics go back a long way.

RW: The p-value was really made popular around 1925 by the most famous statistician of the twentieth century, R.A. Fisher.

KR: R.A. Fisher was a British statistician and geneticist. For his work in statistics, he has been credited as almost single-handedly creating the foundations for modern statistical science.

RW: His idea was this. If, as the result of your research, you got a small p-value, then that was worth looking into further. Unfortunately, he chose the word ‘significant’ for that. An article in Scientific American a few years ago named the word ‘significant’ as one of the seven most misunderstood words in science.

KR: Despite being a key weapon in the quantitative research arsenal, after years of controversy p-values were coming under widespread attack from various sources in the research community and, in turn, were bringing the entire field of statistics into disrepute.

RW: It really was right around 2010. For example, we saw an article with the headline ‘Odds are, it’s wrong: Science fails to face the shortcomings of statistics’. And that’s a pretty galling headline. But I went back this week and re-read that article, and in it the author refers to statistics as a ‘mutant form of mathematics’. That’s just painful.

KR: Yeah, and statisticians don’t like being referred to as mutants. But it also had wider implications for the field at large. For example, in 2015 the journal Basic and Applied Social Psychology banned the use of p-values and other statistical methods as a result of these broad-stroke misinterpretations.

RW: It was unfair and inappropriate and yet at the same time, our field had been writing for six decades about problems with p-values.

KR: For Ron and his colleagues at the ASA there was a lot at stake. Not only for the reputation of p-values but for the future of statistics as a valued and trusted research methodology. And so the ASA, led by Ron, decided to embark on the largest consultation and evaluation of its kind.

RW: We were challenged to do the ASA statement on p-values because of these attacks on statistics as a whole field of research. Because attacks on the misuse of p-values were essentially being conflated with attacks on statistics as a field.

In April of 2015 (April of 2014, actually), my friend and colleague George Cobb challenged me to take up this issue. I went to the ASA board of directors and asked for some time and resources to take up this issue. And specifically, I asked the board for permission to start the process that might lead to the ASA writing a position paper on this issue. It was not something that we had really done before, taking a position on an issue of statistical practice this broad. But I felt like it was something we should consider doing, and I asked the board if they agreed, and I’m very grateful to the board for giving me room to start this process.

KR: The gauntlet had been laid down and the stage had been set; not for the vindication of p-values, but for an open scientific discussion on how the ASA might provide guidance and a statement of their position on the matter.

RW: This is a unique situation, and we felt after much deliberation that it took a special measure, because this is a specific aspect of statistics, statistical inference, that goes well beyond ordinary statistical practice. It has been debated for so very long. We weren’t going to go about this in a way that said, “All right, we’re taking a vote on science here and majority rules. If 15 people vote this way in favour of p-values and 14 people vote this other way, then that’s science.” That’s not science, it’s never been science, and God help us all if that becomes science.

What we decided to do was to get experts with a variety of opinions together and see what could be agreed upon. And if there wasn’t anything that could be agreed upon, then so be it. If there was a set of principles upon which there was some agreement, then that’s what we would put forward. And ultimately, and I’ll say more about the process in a bit, that’s what was published in this article: some principles that this panel of experts felt they could put their names to, and that we could be collectively comfortable with putting the ASA’s name behind.

KR: Ron set out on a quest that would see hundreds of conversations, panels and meetings, disagreements and reconciliations. He began the process of finding out what others had to say about the topic, and we asked him about this process.

RW: I was tasked by the board to identify people who were interested in the topic and to convene the group and try to see whether consensus could be brought together. And so it wasn’t hard to identify people who were interested in the topic because lots of people have been writing on the topic for years. So I reached out to those people and I asked them about their interests, and I asked them to tell me who else was interested in the topic. And I had no idea how people would react. And I would say the most common reaction was, “I don’t think that there is a chance in hell that you’re going to get any kind of agreement on this, but I don’t want to be left out of the process, so I’m in.”

KR: Ron is an organizer and quickly positioned himself as the ‘ring master’, and like any maestro conducting a massive orchestra of disparate voices, he set about forming teams, assigning people to those teams, setting timelines and setting up a clear and well-thought-out reporting mechanism. Ron was excited but did this meticulous approach work?

RW: I have to say, that was the last point in time when that project was on the rails. After that, for the longest time, nothing went right. This was way, way harder than I imagined it, and for probably the next 16 to 17 months I wasn’t sure whether there would be an ASA statement on p-values and statistical significance.

[Break intro music]

KR: This series would not be possible without our supporting partner, internationally renowned publisher Taylor & Francis Group. We’ve been working with Taylor & Francis to create a 12-week learning program to accompany this series, aimed at academic researchers looking to supercharge their careers. It’s called ‘How Your Research Can Change The World’. Working with thousands of academic researchers, Taylor & Francis have sought to make the journey of publishing as painless as possible, no matter what stage of your career.

This flexible program will deliver the ultimate step-by-step guide to publishing your research, boosting your impact, and building your profile. Each week you’ll receive a chapter via email. Over 12 weeks those chapters will build into an indispensable guide you can continue to use throughout your research career. It’s completely free and at the end of the 12 weeks, you’ll receive a certificate and LinkedIn accreditation and have the opportunity to attend a boot camp organized by Taylor & Francis.

Interested? Head on over to howresearchers.com

[Break outro music]

KR: Before the break we heard Ron Wasserstein in uncertainty. In all probability his plan to unify clashing factions and publish a statement on p-values was in jeopardy. It was looking like the gamble would not pay off.

RW: This was an incredibly difficult topic on which to reach any kind of consensus. It was vastly more controversial than I realized it would be.

KR: Ron forged ahead, and after months of challenges and against seemingly impossible odds, an end was in sight.

RW: So, after many months of discussion, we gathered together right here in our offices in Alexandria, Virginia for two days of discussion. Not everybody involved in the process was able to join us but we had a couple of dozen people here. And over the course of two days of discussion ably facilitated by my colleague, Regina Nuzzo, we were able to hammer out six principles that we could agree on about p-values, that formed the framework for the ASA statement that was ultimately published.

KR: Those six principles are the following:

  1. P-values can indicate how incompatible the data are with a specified statistical model.
  2. P-values do not measure the probability that the studied hypothesis is true, or the probability that the data were produced by random chance alone.
  3. Scientific conclusions and business or policy decisions should not be based solely on whether a p-value passes a specific threshold.
  4. Proper inference requires full reporting and transparency.
  5. A p-value, or statistical significance, does not measure the size of an effect or the importance of a result.
  6. By itself, a p-value does not provide a good measure of evidence regarding a model or hypothesis.
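The fifth principle can be illustrated with a short Python sketch. This is a toy under stated assumptions and is not part of the ASA statement: it uses a two-sided z-test with a known standard deviation, and the function name `two_sided_p`, the effect size and the sample sizes are all hypothetical. The same tiny effect produces very different p-values depending only on how much data was collected.

```python
from math import erfc, sqrt


def two_sided_p(effect: float, sd: float, n: int) -> float:
    """Two-sided p-value for a one-sample z-test against a mean of zero.

    Assumptions (hypothetical, for illustration): observations are
    independent and normally distributed with known standard deviation
    `sd`, and `effect` is the observed sample mean from `n` observations.
    """
    z = effect / (sd / sqrt(n))
    # 2 * (1 - Phi(|z|)) written with the complementary error function
    return erfc(abs(z) / sqrt(2))


tiny_effect = 0.02  # two hundredths of a standard deviation
print(two_sided_p(tiny_effect, sd=1.0, n=100))        # large p-value
print(two_sided_p(tiny_effect, sd=1.0, n=1_000_000))  # vanishingly small p-value
```

The effect is identical in both calls. Only the sample size changes, which is why a small p-value on its own says nothing about whether a result is large or important.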

RW: We did that in November of 2016. We spent another three months writing what ultimately became the ASA statement on p-values, hashing that out together. Even then I wasn’t sure for another couple of months whether we would manage to get that agreed upon and signed off. But we did, and now some three years later that article has been viewed over 300,000 times, which is huge by statistics standards. It’s been cited over 1,700 times, which amounts to 11 citations per week over that three-year period. That’s extremely exciting. So I feel like it’s beginning to have an impact on science, which is what we really hoped for all along.

KR: The importance of this paper is hard to overstate. Underpinning many published scientific conclusions is the concept of “statistical significance,” and this is typically assessed with an index called the p-value. P-values are as widely used as they are misused. In an age of big data, unprecedented scientific developments and increasingly complex datasets, properly conducted analyses and the correct interpretation of statistical results play a key role in ensuring that conclusions are sound and that the uncertainty surrounding them is represented properly. Proper guidelines, and the proper use and interpretation of p-values, affect not only research but also research funding, journal practices, career advancement, scientific education, public policy, journalism, and law. So how has this impacted the research world?

RW: Our hope was, our hope still is, and we’re beginning to see that hope realized, that that tool could be re-harnessed to be properly used and that other tools could be appropriately used. That the p-value could be reined in and not have the oversized impact that it has. We’re seeing that start to happen; through those citations people are recognizing that the p-value needs to be properly used. And now with the new paper that we just released, and a whole bunch of other papers that were released with it, we think that the major shift that we hope to see with regard to statistical inference is taking place, and that there will be a major change in how science uses statistics effectively in evaluating research.

KR: But it would seem as though Ron’s work has only just begun. The ASA has just published further guidance in the most recent edition of The American Statistician, which is open access and written for non-statisticians. The guidance is intended to go further: it argues for an end to the concept of statistical significance and for a move towards an approach that the ASA has summed up as its ATOM principle: Accept uncertainty; be Thoughtful, Open and Modest.

The article spells out in great detail how those principles are intended to be applied. If you’re planning on using p-values, you need to be looking into this and the ASA statement. You can find links to, and summaries of, both on our website: howresearchers.com. There is also a full list of contributors. The debate on this matter rages on and according to Ron, that’s just the way it should be.

To find out more about this podcast and today’s topic, visit howresearchers.com/pvalues. We’d love to hear your feedback, so please follow us on Twitter, Facebook or LinkedIn @howresearchers. This podcast was written and produced by Monchü and recorded at Under the Apple Tree Studios. Our producers were Ryan Howe and Tabitha Whiting, with editing, mixing and mastering by Miles Myerscough-Harris at WBBC. We would like to acknowledge the incredible support of Taylor & Francis Group, with a special thank you to Elaine Devine and Clare Dodd.

I’m Dr. Kaitlyn Regehr. Join us next time for How Researchers Changed the World. Thanks for listening.
