
[MUSIC].

Okay, so we talked about publication bias, and on the y-axis of that plot we had effect size.
So what is effect size?
Well, if you recall, the p-value was just the probability of getting, in repeated experiments, an effect at least that large or more extreme.
Okay, and so all it tells you is yes or no, whether the result was significant according to some threshold.
It doesn't tell you how large the effect was, or how important it was.
Okay, and that's what effect size tries to capture.
So consider a situation where you're comparing the means of an experimental group and a control group.
The effect size is the mean of the experimental group, minus the mean of the control group, divided by the standard deviation.
And so this tells you not just whether the result is significant, but how large the effect is.
And this measure is used prolifically in meta-analysis to combine results from multiple studies, and it's useful for standardizing across studies with different parameters, different sample sizes in particular, for example.
Okay, and in general you've got to be careful with this, because averaging results across different experiments can produce nonsense if you violate the assumptions under which the experiments were conducted. But it can be done.
Another caveat here is that other definitions of effect size exist, such as the odds ratio and the correlation coefficient, but we're just going to talk about this one.
And there's an argument that the effect size should be reported along with the p-value, even though this is very rarely done.
That argument is made by Robert Coe in a paper presented at the Annual Conference of the British Educational Research Association in 2002, cited at the bottom of the slide.
Okay, so a little more precisely, the effect size is the standardized mean difference: the mean of group one, minus the mean of group two, divided by, as we said, the standard deviation.
And here I'm writing this as the pooled standard deviation.
And there are lots of ways to compute a value for this notion of pooled.
One simple way is to just take the standard deviation of the control group, arguing that the variation is supposed to be the same because the two groups are drawn from the same population.
Okay, another way is to compute the actual pooled standard deviation, which is an expression that looks like this.
We're not going to go into much more detail than that, but the overall notion is pretty simple, and it's not unreasonable to use a straightforward definition of the standard deviation.
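To make this concrete, here's a minimal sketch in Python of the standardized mean difference using a pooled standard deviation; the function and variable names are just illustrative, not from the lecture.

```python
import numpy as np

def cohens_d(experiment, control):
    """Standardized mean difference between two groups (Cohen's d)."""
    experiment = np.asarray(experiment, dtype=float)
    control = np.asarray(control, dtype=float)
    n1, n2 = len(experiment), len(control)
    # Pooled standard deviation: weight each group's sample variance by its
    # degrees of freedom (n - 1), then take the square root.
    s1_sq = experiment.var(ddof=1)
    s2_sq = control.var(ddof=1)
    s_pooled = np.sqrt(((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2))
    return (experiment.mean() - control.mean()) / s_pooled
```

As noted above, you could also swap in the control group's standard deviation alone for the denominator without changing the overall idea.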
Okay, so what's a big effect size?
Well, because it's standardized, you can actually reason about what might be big and what might be small.
These cutoffs are somewhat made up on the fly, but one heuristic, due to Jacob Cohen, is that a small effect size is 0.2, a medium effect size is 0.5, and a large effect size is 0.8.
Remember, we're dividing by the standard deviation, so this is telling you how large the difference in means is relative to the variation in the data.
Okay.
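As a quick illustrative sketch, using Cohen's heuristic thresholds (the helper name is made up), you might classify a computed effect size like this:

```python
def label_effect(d):
    """Classify an effect size using Cohen's rough benchmarks."""
    d = abs(d)
    if d >= 0.8:
        return "large"
    if d >= 0.5:
        return "medium"
    if d >= 0.2:
        return "small"
    return "negligible"

print(label_effect(0.55))  # -> "medium"
```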
Finally, I'm not going to go into too much detail about confidence intervals, but I want to mention the confidence interval of the effect size, just to give you the intuition for it.
An advantage here is that these are maybe easier to interpret in terms of actual decision making.
So what does a 95% confidence interval of the effect size mean?
Well, it means that if we repeated the experiment 100 times, we'd expect the interval computed each time to include the true effect size about 95 out of 100 times.
A corollary of this is that if the 95% confidence interval includes 0.0, that's equivalent to saying that the result is not statistically significant; the interval is consistent with there being no effect whatsoever.
Okay, and equivalently, if that interval does not include 0.0, then the result is statistically significant.
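Here's a minimal sketch of one way to get such an interval, using a simple percentile bootstrap and the cohens_d sketch from above; the bootstrap approach and all names here are illustrative assumptions, not the lecture's prescribed method.

```python
import numpy as np

def bootstrap_ci_d(experiment, control, n_boot=10_000, alpha=0.05, seed=0):
    """Percentile-bootstrap confidence interval for the effect size (Cohen's d)."""
    rng = np.random.default_rng(seed)
    experiment = np.asarray(experiment, dtype=float)
    control = np.asarray(control, dtype=float)
    ds = []
    for _ in range(n_boot):
        # Resample each group with replacement and recompute the effect size.
        e = rng.choice(experiment, size=len(experiment), replace=True)
        c = rng.choice(control, size=len(control), replace=True)
        ds.append(cohens_d(e, c))
    lo, hi = np.percentile(ds, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return lo, hi
```

If the returned interval excludes 0.0, the result is statistically significant at that level; if it includes 0.0, it isn't.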
Fine, so that's some more terminology
here.
The other notion we mentioned in talking
about publication bias was the idea of a
meta-analysis.
Okay, so this is looking back over
previous studies and combining their
results.
And so in 1978, Glass statistically aggregated the findings of 375 psychotherapy studies to disprove a claim that psychotherapy was useless, and this was the coining of the term meta-analysis.
And it's built on earlier ideas from other statisticians, including Fisher, who has this quote: "When a number of quite independent tests of significance have been made, it sometimes happens that although few or none can be claimed individually as significant, yet the aggregate gives an impression that the probabilities are on the whole lower than would often have been obtained by chance."
So each result is individually a little weak, but we can aggregate them to get a more powerful result.
Okay.
And the reason I want to bring up this idea of meta-analysis is that it becomes even more important in the context of data science, because you'll often be working with data that you did not collect yourself, and so thinking of your work as a meta-analysis experiment is potentially useful.
Another point is that big data may have become big because it was combined from multiple different sources, and so understanding when this is okay to do and when it isn't is important.
So how do we do this meta-analysis?
Well, it's pretty simple: you take a weighted average of the results of the independent studies.
Okay.
So how do you define the weights?
You can define them in different ways, but you want to give more weight to the more precise studies when possible, the ones that have more power.
A very simple method is to weight by sample size: the weight for study i is the number of samples in study i divided by the sum of all the samples, the total sample size.
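A minimal sketch of that sample-size weighting; the names are illustrative, and effects here would be the per-study effect sizes:

```python
import numpy as np

def sample_size_weighted_mean(effects, sample_sizes):
    """Combine per-study effect sizes, weighting each study by its sample size."""
    effects = np.asarray(effects, dtype=float)
    n = np.asarray(sample_sizes, dtype=float)
    weights = n / n.sum()          # w_i = n_i / sum_j n_j
    return np.sum(weights * effects)
```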
A more sophisticated way is to use the inverse-variance weight, which is one over the standard error squared.
I'm not going to give you the formula for standard error; you can look it up.
There are lots of variants here, but the main idea, and the reason it's called inverse-variance, is that if the variance of a study is very high, you want to give that study lower weight, right?
That means it wasn't a very precise study, and this could be because the sample size was low, or it could be for other reasons.
Okay.
And the standard error is one common way of measuring that variance in this particular case.
But again, it's important to understand the intuition rather than just memorizing the formulas.
Okay?
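Here's a minimal sketch of inverse-variance weighting, assuming you already have a standard error for each study's effect estimate; the names are illustrative.

```python
import numpy as np

def inverse_variance_weighted_mean(effects, standard_errors):
    """Fixed-effect combination: weight each study by 1 / SE_i^2."""
    effects = np.asarray(effects, dtype=float)
    se = np.asarray(standard_errors, dtype=float)
    weights = 1.0 / se**2          # imprecise (high-variance) studies get small weight
    combined = np.sum(weights * effects) / np.sum(weights)
    combined_se = np.sqrt(1.0 / np.sum(weights))   # standard error of the combined estimate
    return combined, combined_se
```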
And then, as usual, there's a caveat here: this is all for a fixed-effect model, which assumes that every individual study is measuring the same true effect.
There are random-effects models that help account for cases when they may not be, but we're not going to talk about random-effects models.
Okay.
So, finally, one more effect that you saw in the context of publication bias; I didn't bring up the term at the time, but you saw it in the funnel plot, if you remember.
The funnel plot had high variance on one side and lower variance on the other.
And so the general term for this that I just wanted to introduce you to is heteroskedasticity.
This is when the variance itself is not constant.
Alright, so here in this example the variance is high over here, low in the middle, and high again over here.
Now, this is not necessarily a problem, and there are ways to correct for it.
It's not necessarily a problem because the estimates you generate are still unbiased.
Okay.
But it can inflate your overall error estimates, leading to a reduction in statistical power.
Right?
You end up with really high error numbers because all of these high-variance points count against you, even though you're actually doing a pretty good job with your predictions.
To say how this plot was generated: this was again simulated data, where I intentionally varied the variance along the x-axis.
So we chose some x values and then sampled y values according to a distribution whose variance changes in this way.
And here I just took the exact same x values, but repeated the sampling of the y values many, many times, which gives a clearer picture of the spread.
You can see these solid vertical bars because the same x values were used across all the experiments.
Okay?
And the point here is that, drawing the regression line over and over again, it didn't change all that much; we didn't get anything that looked like this, for example.
So again, the problem here is that you might inflate the error and you might lose statistical power, so a real effect gets overlooked.
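Here's a minimal sketch of that kind of simulation; all the specific parameters are made up for illustration. The x values are fixed, the y values are resampled many times with a noise level that depends on x, and a regression line is refit each time.

```python
import numpy as np

rng = np.random.default_rng(42)

x = np.linspace(-3, 3, 50)                # fixed x values, reused in every experiment
true_slope, true_intercept = 2.0, 1.0
noise_scale = 0.5 + 2.0 * np.abs(x)       # heteroskedastic: noise grows away from the middle

slopes = []
for _ in range(1000):
    # Resample y with x-dependent noise and refit the regression line.
    y = true_intercept + true_slope * x + rng.normal(0.0, noise_scale)
    slope, intercept = np.polyfit(x, y, deg=1)
    slopes.append(slope)

# The fitted slope stays unbiased (close to the true slope on average),
# but its spread reflects the error inflated by the high-variance points.
print(np.mean(slopes), np.std(slopes))
```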
