
[MUSIC].

All right, so we've started a new week, starting a new section of the course. Where are we? We've talked about what in these slides I've been calling informatics: the management, manipulation, and integration of data, with some emphasis on scale and some emphasis on specific tools.
Okay, so now we're moving into what I'll call analytics, where we're going to talk about statistical estimation and prediction. One of the points I want to make is that this builds on the informatics material, in that the things you're going to learn can be implemented using the tools we covered before. We already saw a piece of this when we did matrix multiplication in various tools. Okay.
The other point we made early on, if you remember, was that 80% of what people think of as analytics really boils down to the ability to do sums and averages. And we'll see a little bit of that in here.
Understanding the problem and understanding the solution may be hard or easy, depending on your background, but as far as implementing it goes, it's not too bad.
And then we'll move on to visualization in a couple of weeks.
[MUSIC].
All right.
So to get started on this, [SOUND] I want to call your attention to a 2010 article from The New Yorker, with the caveat that this is far from a research article; in fact, I'm not sure I'd recommend taking a lot of what it has to say to heart. But the topic it brings up is right there in the title: "The Truth Wears Off." What it explores is the notion that statistical results in the sciences seem to have gotten weaker over time.
John Davis, a researcher at the University of Illinois who works on antidepressants, is quoted in the article discussing a forthcoming analysis showing that the measured efficacy of antidepressants has gone down as much as threefold in recent decades. That is, the effectiveness of these antidepressants, as measured by clinical trials, has gotten a lot weaker.
The article also talks about Anders Moller, who studied barn swallows and discovered that females were more likely to mate with males that had long, symmetrical feathers. These findings relied on precise measurements of the symmetry. It was a pretty significant discovery, but over the course of the next five or six years the effect size, as measured by Moller himself and by other researchers, shrank by 80%.
Okay.
A lot of the article talks about Jonathan Schooler, who in 1990 discovered an effect he called verbal overshadowing. It was counterintuitive because it showed that people asked to describe, in words, a face they had seen were actually less likely to remember it than those who had just seen the face. Talking about the face somehow overshadowed the effect of simply seeing it.
Okay.
And once again, this effect seemed to get weaker over time and became increasingly difficult to measure, including by Schooler himself. In fact, he's quoted as saying this frustrated him; he was having trouble replicating his own result.
Okay.
The article also brings up, as a historical example, someone a fair amount less respected in the scientific community: the person who actually coined the term "the decline effect," which comes up over and over again in the article. In the 1930s, Joseph Rhine tested individuals with card-guessing experiments in an attempt to measure extrasensory perception, or ESP, a term he also coined.
He had a few students who achieved multiple streaks of very low probability, guessing many, many cards in a row correctly. But there was a decline effect: the same participants couldn't match their earlier performance.
And the article touches on what is essentially the correct explanation [LAUGH] for this effect, but it also entertains a bunch of quasi-mystical explanations that are, at least in my opinion, incorrect.
So in the next few segments, I want to explore this as a test case for statistics and statistical estimation, and use it as a vehicle to introduce the fundamental concepts of statistics, along with some somewhat more advanced concepts, especially as they relate to big data.
Okay.
So to get started, let's talk about the background, and then we'll come back to this specific article and explore the reasons why the truth wears off.
This is not going to be a replacement for an introductory college statistics course; it's going to be a quick overview of the terminology and the concepts you should be familiar with. Okay.
We're talking about statistical inference here: methods for drawing conclusions about a general population from sample data. There are two key methods you can use: hypothesis tests and confidence intervals. We're going to bring up confidence intervals again later on, but I'm not going to talk about them directly right now.
All right.
So what is hypothesis testing? You're going to be comparing an experimental group to a control group, and there's always going to be a null hypothesis. The null hypothesis is that there's just no difference between the two groups: the ones who received the treatment in question are no different than the ones who did not; the new website generates no more traffic than the old, default, control website; and so on.
Okay, so that's the null hypothesis.
The alternative hypothesis is that there is an effect: that there's a statistically significant difference between the two. Here, difference is defined in terms of some test statistic, and most of the examples you'll find in an introductory course, or really any course, are going to be about comparing means: the average effect in the control group was different than the average effect in the experimental group.
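To make this concrete, here's a minimal sketch of a two-sample test of means in Python; the traffic numbers are made up for illustration, and it assumes NumPy and SciPy are available.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Hypothetical data: daily traffic under the control (old) website
# and the experimental (new) website.
control = rng.normal(loc=100, scale=15, size=50)
experimental = rng.normal(loc=110, scale=15, size=50)

# The test statistic here compares the two group means.
t_stat, p_value = stats.ttest_ind(experimental, control)

# Reject the null hypothesis when the p-value falls below alpha.
alpha = 0.05
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
print("reject null" if p_value < alpha else "fail to reject null")
```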
Now, a lot of what statistics is about is actually designing the experiment to collect the data. And in a data science regime, in a big data regime, we're less frequently in a position to design these things in the first place; a lot of the time we're dealing with data whose collection we did not control. So that's maybe one difference between classical statistics and the way I want to present this material for the purposes of data science.
Okay.
That being said, it's important to understand that careful experimental design is really the most important [LAUGH] thing in all this work; the analysis techniques play second fiddle to the proper collection of data.
This includes things like randomized trials, blinded and double-blinded. Blinded means that the participants themselves do not know which group they're in, and that's pretty much non-negotiable: you can't tell people whether they're getting a placebo or the actual drug, or [LAUGH] they'll report their symptoms differently as a result.
Randomized would also be non-negotiable, except for the fact that in some cases it's difficult to achieve in practice. Randomized means we draw a sample through some method and then assign subjects to the control group and the experimental group with no process whatsoever; it's just purely random. Okay.
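As a minimal sketch of what that assignment step might look like in code (the subject IDs are invented for illustration):

```python
import random

random.seed(42)  # fixed seed so the example is reproducible

# A hypothetical sample of subjects drawn through some method.
subjects = [f"subject_{i}" for i in range(20)]

# Shuffle, then split down the middle: no process whatsoever
# decides who lands in which group; it's purely random.
random.shuffle(subjects)
half = len(subjects) // 2
control, experimental = subjects[:half], subjects[half:]

print("control:", control)
print("experimental:", experimental)
```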
This framework, expressed in just these few bullets at a high level, is unbelievably powerful; it's completely universal to data analysis. It's really important to internalize these points, and we'll go into some detail on the aspects that aren't included in these slides.
Some examples you can dream up: measuring the effect of a new ad placement on your website compared to a control group with the existing placement; measuring the effect of a treatment against a sugar pill, or against the best existing treatment; and anything else you might imagine.
So, to summarize hypothesis testing, you can organize the terminology into a grid where there are two possibilities for the true state of the world. One is that the null hypothesis is true: there is no difference between the control group and the experimental group. The other is that the null hypothesis is false: there is an effect that you're measuring.
Okay.
Then there are also two possibilities for the outcome of your statistical test. In one case, you do not reject the null hypothesis: you find no evidence of any difference between the groups. In the other, you reject the null hypothesis: you do find evidence of a difference between the groups.
So if the null hypothesis is true, there is no difference, but you detect a difference anyway: that's a Type 1 error, and the rate at which that happens is what we'll refer to as alpha, which will come up at times as we have this discussion. [INAUDIBLE] If you make the correct decision when the null hypothesis is true, the probability of that is 1 minus alpha. When the null hypothesis is false and you fail to reject it, meaning there is an effect and you fail to measure it, that's a Type 2 error, and its rate is beta. And when you do reject the null hypothesis when it's false, detecting an effect when there is an effect to detect, the probability of that is 1 minus beta, and this is called the power of the test, the statistical power.
