comparing two groups, the control group and the experimental group, is the name of the game. And we said that we're trying to take some test statistic to measure the difference between those two groups. But how different is different enough to be significant? In other words, how do we know that the difference we saw in the experiment is not attributable to just chance? Well, the answer is we don't, but we can calculate the probability that it's attributable to chance, and that's what the p-value is. So the p-value is the following: if you repeated the experiment over and over again at the same sample size, what percentage of the time would you see results at least as extreme as the ones you got in this experiment, assuming the null hypothesis is true? Let me say that again. Assuming that there is no difference between the groups, that the control group really is the same population as the experimental group and the treatment has no effect, if I were to do the same experiment over and over again, what percentage of the time would I see a difference in the treatment group anyway, just by chance? That's what the p-value is.
Now, a little more terminology: you can think about two-sided versus one-sided tests. A two-sided test measures extremeness in terms of absolute value: the p-value is the probability that the statistic lands at least as far from mu_0 as the measured value, in either direction, which for a symmetric null distribution is two times the probability of exceeding the measured value on one side. A one-sided test only asks about one direction, greater than or less than. The notation here is that mu is a mean, and mu_0 is the mean of the population under the null hypothesis.
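To make the "repeat the experiment over and over" definition concrete, here is a minimal simulation sketch in Python. It is not from the lecture: the numbers (a null mean of 0, standard deviation 1, sample size 10, observed sample mean 0.7) are made up for illustration, and it uses a one-sample mean as the test statistic rather than a two-group comparison, just to keep it short.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: the null population has mean 0 and standard deviation 1,
# and one "experiment" draws n = 10 samples.
mu_0, sigma, n = 0.0, 1.0, 10
observed_mean = 0.70          # pretend this is the sample mean our experiment produced

# "Repeat the experiment over and over" under the null hypothesis and count how
# often the result comes out at least as extreme as the one we observed.
reps = 200_000
null_means = rng.normal(mu_0, sigma, size=(reps, n)).mean(axis=1)

p_one_sided = np.mean(null_means >= observed_mean)
p_two_sided = np.mean(np.abs(null_means - mu_0) >= abs(observed_mean - mu_0))

print(f"one-sided p ~ {p_one_sided:.3f}")   # roughly 0.013
print(f"two-sided p ~ {p_two_sided:.3f}")   # roughly 0.027
```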
This screenshot is taken from a nice applet that you can find online and play with. Here the null hypothesis is that the mean is 325, and we're doing a two-sided test: mu is not equal to 325, meaning it must be something different than that, either higher or lower. The sample size is 10 and the observed sample mean is 328. When you click the Show P button, the applet computes the p-value for you and shades these colored regions, and the area under the curve in those regions is the p-value: the probability of seeing something at least as extreme as the measured value. If the only change I make is that the sample mean is 329 instead of 328, it's even less likely that you would see this by chance, so the shaded area under those curves is even smaller, and you can see the p-value change: it goes from 0.0574 down to 0.0114.
Now, in order to make some sort of decision (did this treatment work, do we invest in it, do we move on to the next stage of trials?) we need some sort of threshold, some cut-off for the p-value. So what is that cut-off? Well, it's 0.05. Why? No good reason; it makes the math work out. It's a 1 in 20 chance: if you can show your result is rarer than a 1 in 20 chance, that's deemed to be good enough. This is the subject of a lot of controversy, depending on what sort of literature you're reading, and we'll talk a little more about it in a few segments, but that's all I'm going to say about it right now. So 0.05 is what people are looking for.
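For the applet example above, the same p-values can be computed directly rather than by simulation. This is a sketch, not the applet's actual code: the lecture doesn't state the standard deviation the applet uses, so sigma = 5 below is an assumption, which is why the printed values land close to, but not exactly on, the quoted 0.0574 and 0.0114.

```python
from math import sqrt
from scipy.stats import norm

# Applet example from the lecture: null mean 325, sample size 10, observed sample
# means of 328 and then 329. The standard deviation is not given in the lecture,
# so sigma = 5 here is an assumed value.
mu_0, sigma, n = 325.0, 5.0, 10
se = sigma / sqrt(n)                      # standard error of the sample mean

# With these assumptions this prints p of about 0.058 for 328 and about 0.011
# for 329, close to the 0.0574 and 0.0114 quoted from the applet.
for xbar in (328.0, 329.0):
    z = (xbar - mu_0) / se
    p_two_sided = 2 * norm.sf(abs(z))     # sf = 1 - cdf, the upper-tail area
    verdict = "significant" if p_two_sided < 0.05 else "not significant"
    print(f"mean {xbar}: z = {z:.2f}, two-sided p = {p_two_sided:.4f} ({verdict} at 0.05)")
```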
Alright. So now that you're armed with a little bit of basic terminology, let's go back to the first slide from this New Yorker article. The question we raised was: what accounts for this truth-wearing-off effect? How can we explain what's going on?
Okay. So one reason is publication bias. Let me read you a couple of quotes from this article about publication bias. "In the last few years, several meta-analyses" (and we'll talk about what a meta-analysis is in a little bit) "have re-appraised the efficacy and safety of antidepressants, and the therapeutic value of these drugs may have been significantly over-estimated." There are other examples in this article as well: you go back, review the literature, and find out that things have been overstated. Why? "Although publication bias has been documented in the literature for decades, and its origins and consequences debated extensively, there is evidence suggesting that this bias is increasing." Now, I haven't told you what publication bias is yet, but you may be familiar with the concept and have seen some of its effects. "A case in point is the field of biomedical research in autism spectrum disorder, which suggests that in some areas negative results are completely absent." So what does that mean? It means that you're only publishing papers that show significant positive gains. If we try several treatments and none of them work except for one, say we try 20 treatments and only one works, how many papers do we publish? One, not 20. So how is this a problem? A quick calculation, sketched below, shows why those 20 tries matter.
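Here is the quick calculation referred to above. It assumes the worst case, that none of the 20 hypothetical treatments has any real effect, and shows that a 0.05 threshold still hands you about one "significant" result on average, and at least one such result roughly 64% of the time.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical worst case: 20 treatments are tested and none of them actually
# works, so every null hypothesis is true.
alpha, n_treatments = 0.05, 20

# With independent tests at level alpha, by chance alone we expect about one
# false positive, and at least one shows up about 64% of the time.
expected_false_positives = alpha * n_treatments              # = 1.0
p_at_least_one = 1 - (1 - alpha) ** n_treatments             # about 0.64
print(expected_false_positives, round(p_at_least_one, 2))

# Simulation check: many labs, each testing 20 ineffective treatments and
# "publishing" only the significant results. Under the null, p-values are
# uniform on [0, 1].
labs = 100_000
p_values = rng.uniform(0, 1, size=(labs, n_treatments))
n_published = (p_values < alpha).sum(axis=1)
print(n_published.mean(), (n_published >= 1).mean())         # about 1.0 and 0.64
```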
So, can publication bias explain this decline effect? How would that actually work? Well, let's make a plot where study size is on the x-axis, and notice this is a log scale: this is 10, this is 100, this is 1000, and so on. By study size I mean the number of patients involved in the study. The bigger the study size, the more statistical power you have (we'll define statistical power precisely in a bit), and the better you're able to determine actual effects. The assumption here is that, as time goes on and you see some results, you or other researchers are able to garner more money, more funding, to do larger and larger studies. So this is maybe phase one, phase two, phase three trials of some new drug: they get bigger and bigger sets of patients as you get more momentum behind them.
Now, this data is not real, it's simulated, but imagine you see this kind of decline effect. The y-axis is the effect size (we'll talk about what effect size means precisely in a little while), but think of it as the degree of positive outcome: negative is bad and positive is good. So it might be the number of smokers you were able to convince to quit with some counseling intervention, or the increase in white blood cell count as a result of some treatment, and so on. Positive is good. Imagine the results show this sort of pattern, where early studies with just ten participants are way up here, and as the study size went up the effect got worse, and worse, and worse. That's the effect that we see. How do we explain it? Well, this kind of effect would be directly explainable just by publication bias. Imagine that every dot is now a test done by some group somewhere on this phenomenon. What you'd expect is this kind of funnel shape, where the studies get more and more accurate as they get larger and larger.
You can't get around this: as the study size goes up, you do have more statistical power, so you're better able to discriminate real effects from spurious ones. But notice that the actual effect the estimates are converging to here is 0.0. There is no effect, and yet, out among the small studies, you're of course going to get some apparently positive results just due to variability. So if you only report the positive ones, you end up with this mysterious decline effect. A simulation along those lines is sketched below.
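Since the lecture's plot is itself simulated, here is one way to simulate something like it; every number below (true effect of zero, study sizes from roughly 10 to 1000, the 1.96 significance cut-off) is an assumption for illustration. Each simulated study estimates a true effect of exactly 0, small studies are noisy and large studies are precise (the funnel), and keeping only the positive, significant results produces the decline pattern described above.

```python
import numpy as np

rng = np.random.default_rng(2)

# Simulated literature (all numbers made up): the true effect is exactly 0, and
# study sizes span roughly 10 to 1000 patients, uniform on a log scale as in the
# lecture's x-axis.
true_effect, sigma, n_studies = 0.0, 1.0, 10_000
sizes = np.round(10 ** rng.uniform(1, 3, n_studies)).astype(int)

# Each study's estimated effect is the true effect plus sampling noise whose
# standard error shrinks with study size: that's the funnel shape.
se = sigma / np.sqrt(sizes)
estimates = rng.normal(true_effect, se)

# Publication bias: only studies whose estimate is positive and "significant"
# (more than about 1.96 standard errors above zero) get published.
published = estimates > 1.96 * se

# The decline effect: among published studies, the average reported effect
# shrinks as studies get larger, even though the true effect is always 0.
for lo, hi in [(10, 32), (32, 100), (100, 316), (316, 1001)]:
    mask = published & (sizes >= lo) & (sizes < hi)
    if mask.any():
        print(f"study size {lo}-{hi}: mean published effect = {estimates[mask].mean():.3f}")
```

Plotting estimates[published] against sizes[published] on a log x-axis gives the picture the lecture describes: only the top edge of the funnel survives publication, so the visible "effect" starts high among the small studies and drifts down toward zero as the studies grow.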