10. How does the usual 0.05 standard of statistical significance cause problems when
we try to find the genetic causes of disorders? You can use an example, but explain
the basic problem.
a. Since humans have so many genes, when we check for
differences between people with and without a certain
condition, like autism, the usual standard of statistical
significance isn't strict enough. Across that many tests, random
variation alone will give us statistically significant associations
between the presence or absence of certain genes and autism.
If we have our computers check 10,000 genes, we should expect
about 500 of them to show an association with autism at the
0.05 level of significance purely by chance.
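The arithmetic behind that answer can be sketched in a few lines. This is only an illustration: the function names are made up for this example, and the simulation assumes that no gene has any real effect, so each test's p-value is just a uniform random number between 0 and 1.

```python
import random

def expected_false_positives(num_tests, alpha=0.05):
    """Expected number of tests that pass the alpha cutoff by chance alone."""
    return num_tests * alpha

def simulate_false_positives(num_tests, alpha=0.05, seed=0):
    """Count chance 'hits' when no gene has any real effect.

    ASSUMPTION: under the null hypothesis a p-value is uniform on [0, 1),
    so each test is modeled as a single call to random().
    """
    rng = random.Random(seed)
    return sum(1 for _ in range(num_tests) if rng.random() < alpha)

print(expected_false_positives(10_000))   # the long-run average count
print(simulate_false_positives(10_000))   # one simulated scan of 10,000 genes
```

The simulated count lands near the expected count even though every "association" in it is noise.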
11. What is the difference between statistical significance and meaningful results? Use
an example to show that you understand this distinction.
a. When researchers use the word "significant," they are referring
to statistical significance, which at the 0.05 level means a
result that random variation alone would produce less than 1
time in 20. In normal vocabulary, however, the word means
"important," which is not what a researcher means. In the
example of the two halves of the alphabet, we might discover
from a study of 5,000 people that there was a statistically
significant difference of 1/16 of an inch between the two
groups. However, that's not necessarily important.
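A rough sketch of why a tiny difference can still be statistically significant: the test statistic grows with sample size even when the difference stays fixed. The within-group standard deviation of 1 inch below is an assumption chosen purely for illustration, and the function name is hypothetical.

```python
import math

def two_group_z(diff, sd, n_per_group):
    """z statistic for a difference between two group means (equal SD, equal n)."""
    standard_error = sd * math.sqrt(2 / n_per_group)
    return diff / standard_error

DIFF = 1 / 16   # inches, the difference from the example above
SD = 1.0        # ASSUMPTION: within-group standard deviation, illustration only

# 50 per group is a small study; 2,500 per group is 5,000 people in total.
for n in (50, 2500):
    z = two_group_z(DIFF, SD, n)
    verdict = "significant" if abs(z) > 1.96 else "not significant"
    print(n, round(z, 2), verdict)
```

The difference is 1/16 of an inch in both cases; only the sample size changes the verdict, which is why significance says nothing about importance.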
12. Why is it important to have a random sample from the population we are studying?
a. If a sample is not a random sample, it is likely to be biased
or unrepresentative of the population as a whole.
13. What are the two conditions which a random sample must meet?
a. Each member of the population must have an equal chance of
being selected.
b. Every possible group of a given size must have an equal
chance of being selected.
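As a small illustration of the first condition, Python's standard `random.sample` draws a sample in which every subset of the requested size is equally likely. The population of ten numbered people and the helper name are hypothetical.

```python
import random
from collections import Counter

def simple_random_sample(population, k, rng):
    """Draw k distinct members; every subset of size k is equally likely."""
    return rng.sample(population, k)

rng = random.Random(1)
population = list(range(10))   # hypothetical population of 10 people
TRIALS = 20_000

counts = Counter()
for _ in range(TRIALS):
    counts.update(simple_random_sample(population, 3, rng))

# Each member should turn up in about 3/10 of the samples.
for member in population:
    print(member, round(counts[member] / TRIALS, 3))
```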
14. What is a convenience sample?
a. A convenience sample is a sample of members of a population
who are easily available. This kind of sample is bad because it
is often made up of people like us, and lacks the degree of
variation found in the population as a whole.
15. What is a cluster sample? (This has nothing to do with random clusters on graphs.)
a. This is a randomly selected group, such as all those living on a
randomly chosen block. This kind of sample is bad because it
often violates the second condition of a random sample.
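The way a cluster sample can satisfy the first condition while breaking the second can be checked by listing every possible sample. The town below (three blocks, two residents each) is invented for the example.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical town: 3 blocks with 2 residents each.
blocks = {"A": ["a1", "a2"], "B": ["b1", "b2"], "C": ["c1", "c2"]}

# Cluster sampling: pick one block at random and take everyone on it.
possible_samples = [frozenset(people) for people in blocks.values()]
prob_each_sample = Fraction(1, len(possible_samples))

residents = [person for people in blocks.values() for person in people]

# Condition 1: chance that each individual ends up in the sample.
member_prob = {
    person: sum(prob_each_sample for s in possible_samples if person in s)
    for person in residents
}

# Condition 2: chance that each pair of residents is selected together.
pair_prob = {
    pair: sum(prob_each_sample for s in possible_samples if set(pair) <= s)
    for pair in combinations(residents, 2)
}

print(member_prob["a1"], member_prob["c2"])                 # individuals: equal
print(pair_prob[("a1", "a2")], pair_prob[("a1", "b1")])     # pairs: not equal
```

Every resident has the same chance of selection, but two neighbors can be chosen together while two people from different blocks never can.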
16. What is stratified sampling?
a. A technique designed to ensure that representatives of each
significant group in the population end up in the final sample.
Sometimes a sample ends up over- or underrepresenting a group,
and mathematical techniques are needed to ensure each
subgroup is properly weighted in the final results.
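One simple way to do this, sketched below, is proportional allocation followed by re-weighting. The group names, sizes, and function names are hypothetical, and real surveys often use more elaborate weighting schemes.

```python
import random

def stratified_sample(strata, total_size, rng):
    """Sample each stratum in proportion to its share of the population.

    Rounding means the pieces may not always add up to exactly total_size.
    """
    population_size = sum(len(members) for members in strata.values())
    sample = {}
    for name, members in strata.items():
        k = round(total_size * len(members) / population_size)
        sample[name] = rng.sample(members, k)
    return sample

def weighted_mean(stratum_means, stratum_sizes):
    """Combine per-stratum averages, weighting each by its population share."""
    total = sum(stratum_sizes.values())
    return sum(stratum_means[s] * stratum_sizes[s] / total for s in stratum_means)

# Hypothetical population: 700 urban and 300 rural residents.
strata = {"urban": list(range(700)), "rural": list(range(700, 1000))}
sample = stratified_sample(strata, 100, random.Random(0))
print({name: len(members) for name, members in sample.items()})

# If a survey over- or undersampled a group, re-weight the group averages.
print(weighted_mean({"urban": 10.0, "rural": 20.0}, {"urban": 700, "rural": 300}))
```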
17. What are often the two most difficult steps of statistical research?
a. Getting access to the entire population
1980 to 2005, and this was not a large enough data set. People
were led to believe that prices would go up and mortgages would
be a secure investment, when in reality the opposite was true
for both.
23. Why do large numbers of data sets (extending from height, to IQ, to parking
patterns, to popcorn popping) fit into bell curves?
a. The bell curve is common because whenever we measure
something that has many causes operating independently of
one another, the combined result follows a bell curve. This
happens when there is a large number of independent causes,
regardless of what we're studying.
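This can be checked with a quick simulation. In the sketch below each outcome is the sum of 50 small independent random contributions; the number of causes, the uniform distribution for each cause, and the function name are all assumptions made for illustration.

```python
import random
import statistics

def simulate_outcomes(num_causes, num_people, seed=0):
    """Each outcome is the sum of many small, independent random causes."""
    rng = random.Random(seed)
    return [
        sum(rng.random() for _ in range(num_causes))   # each cause adds 0 to 1
        for _ in range(num_people)
    ]

outcomes = simulate_outcomes(num_causes=50, num_people=10_000)
mean = statistics.fmean(outcomes)
sd = statistics.pstdev(outcomes)

# A bell curve puts roughly 68% of values within one SD of the mean.
within_one_sd = sum(1 for x in outcomes if abs(x - mean) <= sd) / len(outcomes)
print(round(mean, 2), round(sd, 2), round(within_one_sd, 3))
```

No single cause here is bell-shaped; the shape appears only once many of them are added together.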
24. What numerical coefficients of correlation are considered to indicate weak,
moderate, and strong correlation?
a. Weak = between -0.3 and +0.3
b. Moderate = between ±0.3 and ±0.5
c. Strong = beyond ±0.5 (closer to -1 or +1)
d. We square the numbers to emphasize the differences.
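These cutoffs can be written as a small lookup. The function name is hypothetical, and treating a coefficient of exactly 0.3 or 0.5 as moderate is an assumption, since the cutoffs above do not say which side the boundary values fall on.

```python
def correlation_strength(r):
    """Label a correlation coefficient using the cutoffs listed above."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and +1")
    magnitude = abs(r)        # the sign gives direction, not strength
    if magnitude < 0.3:
        return "weak"
    if magnitude <= 0.5:      # ASSUMPTION: exact boundary values count as moderate
        return "moderate"
    return "strong"

# Squaring spreads the values apart: 0.2 vs 0.7 becomes 0.04 vs 0.49.
for r in (0.2, -0.4, 0.7):
    print(r, correlation_strength(r), round(r * r, 2))
```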
25. What are the two forms of the gambler's fallacy and what basic principle do those
forms of the gambler's fallacy ignore?
a. One form = the belief that a coin which has come up heads 10
times in a row has a better than even chance of coming up tails
the next time, because it has to follow the laws of chance and
produce even results.
b. Other form = the belief that a coin that has come up heads 10
times in a row has some special property which gives it a better
than even chance of coming up heads the next time around (e.g.
luck).
c. The gambler's fallacy ignores the basic principle that these
objects (coins, dice) don't have memories, and the result of
one trial is not influenced by the results of previous trials.
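A simulation makes the "no memory" point concrete. This sketch looks at the flip that follows a run of heads; it uses runs of 5 rather than 10 only so that a modest number of flips produces plenty of runs. The function name and parameters are made up for the example.

```python
import random

def heads_rate_after_streak(num_flips, streak_length, seed=0):
    """Fraction of heads on flips that immediately follow a run of heads."""
    rng = random.Random(seed)
    run = 0            # current number of consecutive heads
    followups = 0      # flips that came right after a long enough run
    heads_after = 0    # how many of those follow-up flips were heads
    for _ in range(num_flips):
        is_heads = rng.random() < 0.5
        if run >= streak_length:
            followups += 1
            heads_after += is_heads
        run = run + 1 if is_heads else 0
    if followups == 0:
        raise ValueError("no streaks of that length occurred; use more flips")
    return heads_after / followups

rate = heads_rate_after_streak(num_flips=200_000, streak_length=5)
print(round(rate, 3))   # stays close to one half in either direction
```

Neither form of the fallacy shows up: after a streak, heads is neither "due to stop" nor "on a roll."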
26. What fallacy is being named by the warning "correlation is not causation"?
a. Post hoc ergo propter hoc ("after this, therefore because of
this").
27. At a critical point in calculating the coefficient of correlation we express
every piece of data in standard deviations for each of the factors we are testing for
correlation. Explain why we must do this.
a. Since we are dealing with two different kinds of data, measured
on different scales, and need to compare them, converting both
to standard deviation units lets us compare the variation of one
directly to the variation of the other.
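The idea can be sketched as follows: convert each variable to standard deviation units (z-scores), then average the products of the paired z-scores. The height and weight figures are invented for illustration, and the function names are not from any particular textbook.

```python
import statistics

def z_scores(values):
    """Express each value as a number of standard deviations from the mean."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)    # population standard deviation
    return [(v - mean) / sd for v in values]

def correlation(xs, ys):
    """Coefficient of correlation as the average product of paired z-scores."""
    zx, zy = z_scores(xs), z_scores(ys)
    return sum(a * b for a, b in zip(zx, zy)) / len(xs)

heights_in = [60, 64, 66, 70, 75]        # hypothetical data, inches
weights_lb = [110, 135, 150, 170, 205]   # hypothetical data, pounds
print(round(correlation(heights_in, weights_lb), 3))

# Changing units (inches to centimeters) leaves the z-scores, and so r, unchanged.
heights_cm = [h * 2.54 for h in heights_in]
print(round(correlation(heights_cm, weights_lb), 3))
```

Because z-scores have no units, inches and pounds can be multiplied together meaningfully, and switching to centimeters does not change the result.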