We're starting a new section of the course, so where are we? We've talked about what, in these slides, I've been calling informatics: the management, manipulation, and integration of data, with some emphasis on scale and some emphasis on specific tools. Now we're moving into what I'll call analytics, and so we're going to talk about statistical estimation and prediction. One of the points I want to make is that this builds on the informatics, in that the things you're going to learn you can implement using the tools from before. We already saw a piece of this, where we did matrix multiplication in various tools. The other point, if you remember, that we made early on was that 80% of what people think of as analytics really boils down to the ability to do sums and averages, and we'll see a little bit of that in here. Understanding the problem and understanding the solution may be hard or easy depending on your background, but as far as implementing it, it's not too bad. And then we'll move on to visualization in a couple of weeks. [MUSIC] All right. To get started on this, [SOUND] I want to call your attention to an article from 2010 in The New Yorker, with the caveat that this is far from a research article; in fact, a lot of what the article has to say, I'm not sure I'd recommend taking to heart. But the topic they bring up, as the title says, is that the truth wears off. What they're exploring is the notion that statistical results in the sciences seem to have gotten weaker over time. John Davis, a researcher at the University of Illinois who works on antidepressants, is quoted in the article as talking
about how, according to a forthcoming analysis, the demonstrated efficacy of antidepressants has gone down as much as threefold in recent decades. That is, the effectiveness of these antidepressants, as measured by clinical trials, has gotten a lot weaker. The article also talks about Anders Moller, who studied barn swallows and discovered that the females were more likely to mate with males that had long, symmetrical feathers. These findings relied on precise measurements of the symmetry. This was a pretty significant discovery, but over the course of the next five or six years, the effect size, as measured by Moller himself and by other researchers, shrank by 80%. A lot of the article talks about Jonathan Schooler, who in 1990 discovered an effect he called verbal overshadowing. It was counterintuitive because it showed that people who are asked to describe, in English, a face they've seen are actually less likely to remember it than those who had just seen the face. Talking about the face somehow overshadowed the effect of just seeing it alone. Once again, this effect seemed to get weaker over time, and it became increasingly difficult to measure, including by Jonathan Schooler himself. In fact, he's quoted as saying this frustrated him; he was having trouble replicating his own result. The article also brings up, as a historical example, someone who is a fair amount less respected in the scientific community, and it's the person who actually coined the term "the decline effect," which comes up over and over again in the article. In the 1930s, Joseph Rhine tested individuals with card-guessing experiments in an attempt to measure extrasensory perception, or ESP; he's the one who coined that term as well. He had a few students who achieved multiple streaks of very low probability,
guessing many, many cards in a row correctly, and so on. But there was a decline effect: the same participants couldn't match their earlier performance. The article touches on what is essentially the correct explanation [LAUGH] for this effect, and it also brings up a bunch of quasi-mystical explanations that are, at least in my opinion, incorrect. So in the next few segments I want to explore this as a test case for statistics and statistical estimation, and use it as a vehicle to introduce the fundamental concepts of statistics, along with some of the somewhat more advanced concepts, especially as they relate to big data. To get started, let's talk about the background, and then we'll come back to this specific article and explore the reasons why the truth wears off. This is not going to be a replacement for an introductory college statistics course; it's a quick overview of the terminology and concepts you should be familiar with. We're talking about statistical inference here: methods for drawing conclusions about a general population from sample data. There are two key methods you can use: hypothesis tests and confidence intervals. We're going to bring up confidence intervals again later on, but I'm not going to talk about them directly right now. All right, so what is hypothesis testing? Well, you're going to be comparing an experimental group to a control group, and there's always going to be a null hypothesis. The null hypothesis is that there's just no difference between these two groups. Right?
The ones who received the treatment in question are no different from the ones who did not; the new website generates no more traffic than the old website, the control, the default, and so on. That's the null hypothesis. The alternative hypothesis is that there is an effect, that there's a statistically significant difference between the two. Here, "difference" is defined in terms of some test statistic. Most of the examples you'll find in an introductory course, or really any course, are going to be about comparing the means: the average effect in the control group was different from the average effect in the experimental group. Now, a lot of what statistics is about is actually designing the experiment to collect the data, and in a data science regime, a big data regime, we're less frequently in a position to design these things in the first place. A lot of times we are dealing with data whose collection we did not control. So that's maybe one difference between classical statistics and the way I want to present this material for purposes of data science. That being said, it's important to understand that careful experimental design is really the most important [LAUGH] thing there is in all this work; the analysis techniques are second fiddle to the proper collection of data. This includes things like randomized trials, blinded and double-blinded. Blinded means that the participants themselves do not know which group they're in, and that's pretty much non-negotiable, right? You can't tell people whether they're getting a placebo versus the actual drug, or they'll report their symptoms differently as a result. Randomized would also be non-negotiable, except for the fact that it's difficult to achieve in practice in some cases. Randomized means we draw a sample through some method and then assign individuals to the groups, the control group and the experimental group, by no systematic process whatsoever; it's purely random. This framework, expressed in just these few bullets at a high level, is unbelievably powerful. It's completely universal to data analysis, and it's really important to internalize these points. We'll go into some detail on the aspects that aren't included in these slides. Some examples you can dream up: measuring the effect of a new ad placement on your website compared to a control group seeing the existing placement; measuring the effect of a treatment against a sugar pill, or against the best existing treatment; and anything else you might imagine. To summarize hypothesis testing, you can organize the terminology into a grid, where there are two possibilities for the true state of the world. One is that the null hypothesis is true: there is no difference between the control group and the experimental group. The other is that the null hypothesis is false: there is an effect that you're measuring. Then there are also two possibilities for the outcome of your statistical test. In one case, you do not reject the null hypothesis; you find no evidence of any difference between the groups. In the other, you reject the null hypothesis; you do find evidence of differences between the groups. So if the null hypothesis is true, there is no difference, but you detect a difference, that's a Type 1 error.
The rate at which that happens is what we'll refer to as alpha, and that will come up at times as we continue this discussion. If you make the correct decision when the null hypothesis is true, the probability of that is 1 minus alpha. When the null hypothesis is false and you fail to reject it, that is, there is an effect and you fail to measure it, that's a Type 2 error, and its rate is beta. And when you do reject the null hypothesis when it's false, right, you detect an effect when there is an effect to detect, that happens with probability 1 minus beta. This is called the power of the test, the statistical power.