
General Terminology

Mean = Average (the sum of all the numbers divided by the count of numbers)


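Median = Middle value in the list of numbers. List the numbers in numerical order from smallest to
largest, and find the middle value. If the list has an even number of values, the median is the
average of the two middle values.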
Mode = Value that occurs most often. If no number in the list is repeated, then there is no mode for
the list.
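The three definitions above can be checked with a minimal Python sketch using the standard library's statistics module. The list of marks is made up for illustration and is not data from these notes.

```python
import statistics

# Illustrative values only (not data from these notes)
marks = [2, 3, 3, 5, 7, 10]

mean_value = statistics.mean(marks)      # (2 + 3 + 3 + 5 + 7 + 10) / 6
median_value = statistics.median(marks)  # even count: average of 3 and 5
mode_value = statistics.mode(marks)      # 3 is the only repeated value

# When no value repeats, multimode() returns every value,
# which corresponds to "no mode" in the sense used above.
all_tied = statistics.multimode([1, 2, 3])

print(mean_value, median_value, mode_value, all_tied)
```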

Normal Distribution
Normal distributions are important in statistics and are often used in the natural and social sciences
to represent real-valued random variables whose distributions are not known.
The normal distribution is sometimes informally called the bell curve. However, many other
distributions are bell-shaped (such as the Cauchy, Student's t, and logistic distributions).
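The probability density of the normal distribution with mean \mu and standard deviation \sigma is:
\[ f(x) = \frac{1}{\sigma\sqrt{2\pi}} \, e^{-\frac{(x-\mu)^2}{2\sigma^2}} \]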
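As a minimal sketch, the normal density can be evaluated directly in Python and cross-checked against statistics.NormalDist from the standard library. The function name normal_pdf and the parameter values are assumptions chosen for illustration.

```python
import math
from statistics import NormalDist

def normal_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and standard deviation sigma."""
    coeff = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    return coeff * math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))

# Peak height of the standard normal curve (at x = mu)
peak = normal_pdf(0.0, 0.0, 1.0)

# Cross-check against the standard library for arbitrary illustrative parameters
hand_value = normal_pdf(0.5, 1.0, 2.0)
library_value = NormalDist(mu=1.0, sigma=2.0).pdf(0.5)

print(peak, hand_value, library_value)
```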

The mean, median and mode of any normal distribution are always equal.
A normal curve is always symmetric about its centre.
Half (50%) of the values are less than the mean and half are greater than the
mean.
Characteristics of Perfectly Normal Distribution
Symmetric about the mean.
Mean, median and mode are the same
Defined by mean (μ) and standard deviation (σ)
Total area under the normal curve = 1 (like all valid PDFs)
About 68% of the total area under the curve is within 1 standard deviation of the mean; about 95%
is within 2 standard deviations of the mean; about 99.7% is within 3 standard deviations of the
mean
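The 68/95/99.7 figures can be reproduced from the standard normal cumulative distribution function. The sketch below uses Python's statistics.NormalDist; the helper name area_within is an assumption for illustration.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0.0, sigma=1.0)

def area_within(k):
    """Area under the standard normal curve within k standard deviations of the mean."""
    return std_normal.cdf(k) - std_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {area_within(k):.4f}")
```

Because any normal distribution can be rescaled to the standard normal, the same areas apply for every choice of mean and standard deviation.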

Is My Data Normally Distributed?


Visualization of the data (histogram, quantile-quantile plot)
Example A: histogram and Normal Q-Q plot of Sample A (figure not reproduced).
Note: A Q-Q plot takes your sample data, sorts it in ascending order, and plots it against
quantiles calculated from a theoretical distribution. The number of quantiles is selected to match
the size of your sample. While Normal Q-Q plots are the ones most often used in practice,
because so many statistical methods assume normality, Q-Q plots can be created for
any distribution.
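The construction described in the note can be sketched in Python without any plotting library. The sample values are invented, and the plotting positions (i - 0.5)/n are one common convention assumed here; other conventions exist and give slightly different theoretical quantiles.

```python
from statistics import NormalDist, mean, stdev

def normal_qq_points(sample):
    """Return (theoretical quantile, observed value) pairs for a Normal Q-Q plot."""
    xs = sorted(sample)                      # sort the sample in ascending order
    n = len(xs)
    reference = NormalDist(mean(xs), stdev(xs))
    # One theoretical quantile per observation; (i - 0.5) / n is an assumed convention
    theoretical = [reference.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, xs))

# Illustrative sample (not Sample A or B from these notes)
sample = [4.1, 5.0, 5.2, 5.9, 6.3, 7.4, 4.8]
points = normal_qq_points(sample)

for theoretical_q, observed in points:
    print(f"{theoretical_q:7.3f}  {observed:5.1f}")
```

If the sample is close to normal, these pairs fall near a straight line when plotted.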

Example B: histogram and Normal Q-Q plot of Sample B (figure not reproduced).
Skewness and kurtosis
Skewness and kurtosis measure how asymmetric (skewness) and how pointed (kurtosis)
a distribution is compared to a normal or standard normal distribution.
Let's again look at Samples A and B visualized above, where A looked positively
skewed and B looked relatively normal in both the histogram and the Q-Q plot. If we have
installed the moments package, the skewness of A and of B is found by calling its skewness()
function on each sample.

The values reveal that A is moderately (bordering on highly) skewed in the positive direction,
which is consistent with what we see in the histogram. Sample B, however, has a very low
skewness (0.125), and we would conclude based on a rule of thumb that B is relatively
symmetrically distributed, again consistent with what we saw in the histogram and Q-Q plot
for B.
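For readers without the moments package, the moment-based skewness coefficient (third central moment divided by the second central moment raised to the power 1.5) can be sketched in plain Python. This is assumed to be the same definition the package uses, and the two samples are invented stand-ins, not Samples A and B.

```python
def sample_skewness(values):
    """Moment-based skewness: m3 / m2**1.5, using divide-by-n central moments."""
    n = len(values)
    centre = sum(values) / n
    m2 = sum((v - centre) ** 2 for v in values) / n
    m3 = sum((v - centre) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

# Invented illustrative samples
right_skewed = [1, 1, 2, 2, 3, 4, 10]   # long tail on the right
symmetric = [1, 2, 3, 4, 5]             # mirror image around 3

skew_right = sample_skewness(right_skewed)
skew_symmetric = sample_skewness(symmetric)

print(skew_right, skew_symmetric)
```

A positive result indicates a longer right tail, a negative result a longer left tail, and a value near zero an approximately symmetric sample.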
Formal tests for normality
There are several statistical tests designed specifically to help make a decision about the
normality of your data.
Three of the most common and widely used formal tests for normality are: the Anderson-Darling
test, the Shapiro-Wilk test, and the Lilliefors (Kolmogorov-Smirnov) test.
In the three tests, the null and alternative hypotheses are the same (though the theory
behind each is very different): H0: The random observations are sampled from a normally
distributed population (i.e., the data is normally distributed); H1: The observations are from a
non-normally distributed population (i.e., the data is not normally distributed).
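To give a feel for what such a test measures, the sketch below computes only the Lilliefors (Kolmogorov-Smirnov) test statistic D: the largest gap between the empirical distribution of the sample and a normal distribution fitted with the sample mean and standard deviation. It does not compute a p-value, which requires Lilliefors critical values or a statistics package. The function name and both samples are assumptions for illustration.

```python
from statistics import NormalDist, mean, stdev

def lilliefors_statistic(sample):
    """Largest distance between the empirical CDF and a fitted normal CDF."""
    xs = sorted(sample)
    n = len(xs)
    fitted = NormalDist(mean(xs), stdev(xs))   # parameters estimated from the sample
    d = 0.0
    for i, x in enumerate(xs, start=1):
        cdf = fitted.cdf(x)
        # The empirical CDF jumps from (i - 1)/n to i/n at x; check both sides
        d = max(d, i / n - cdf, cdf - (i - 1) / n)
    return d

# Invented samples: one built from normal quantiles, one strongly right-skewed
near_normal = [NormalDist().inv_cdf((i - 0.5) / 20) for i in range(1, 21)]
skewed = [1, 1, 1, 1, 2, 2, 3, 5, 9, 30]

d_normal = lilliefors_statistic(near_normal)
d_skewed = lilliefors_statistic(skewed)

print(d_normal, d_skewed)
```

A larger D is stronger evidence against H0; whether it is large enough to reject depends on the sample size and the chosen significance level.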

Percentiles
The kth percentile of a set of values divides them so that k% of the values lie below and (100 − k)%
of the values lie above (25th percentile = lower quartile; 50th percentile = median; 75th
percentile = upper quartile).

Quantiles
It is more common in statistics to refer to quantiles. These are the same as percentiles, but are
indexed by sample fractions rather than by sample percentages.
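The relationship between quartiles, percentiles and quantiles can be illustrated with statistics.quantiles from Python's standard library. The data are made up, and the function's default interpolation method is only one of several conventions, so other software may return slightly different cut points.

```python
import statistics

# Illustrative data only
data = [12, 15, 17, 20, 22, 25, 28, 30, 35, 40, 41]

# Quartiles: 3 cut points dividing the data into 4 equal-probability groups
q1, q2, q3 = statistics.quantiles(data, n=4)

# Percentiles: 99 cut points; index k - 1 holds the kth percentile
percentiles = statistics.quantiles(data, n=100)
p33 = percentiles[32]   # 33rd percentile, i.e. the 0.33 quantile

print(q1, q2, q3, p33)
```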
Example:
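How is the 33rd percentile value calculated?
Z-score calculation: z = (x − μ)/σ, so the value at a given percentile is x = μ + zσ, where z is
the standard normal quantile for that percentile.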

Example:
What mark would a student have to achieve to be in the top 10% of the class?
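Being in the top 10% of the class means achieving a mark higher than 90% of the students, i.e. the
cumulative probability (P) is 0.9. For P = 0.9, the Z-score is 1.282, so the required mark is the
class mean plus 1.282 standard deviations.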
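The Z-score lookup and the conversion back to a mark can be sketched in Python. The class mean of 60 and standard deviation of 10 are hypothetical values chosen for illustration; they are not given in these notes.

```python
from statistics import NormalDist

# Z-score below which 90% of a standard normal distribution lies
z_90 = NormalDist().inv_cdf(0.90)

# Hypothetical class statistics (assumed, not from the notes)
class_mean = 60.0
class_sd = 10.0

# Mark needed to be in the top 10%: mean plus z standard deviations
cutoff_mark = class_mean + z_90 * class_sd

print(round(z_90, 3), round(cutoff_mark, 2))
```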
Log-normal Distribution
In probability theory, a log-normal (or lognormal) distribution is a continuous probability distribution
of a random variable whose logarithm is normally distributed. Thus, if the random variable X is log-
normally distributed, then Y=ln(X) has a normal distribution.
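Probability density function (for x > 0, where \mu and \sigma are the mean and standard deviation of ln(X)):
\[ f(x) = \frac{1}{x\,\sigma\sqrt{2\pi}} \, e^{-\frac{(\ln x-\mu)^2}{2\sigma^2}} \]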

Mean and standard deviation calculations (μ = mean of ln(X); σ = standard deviation of ln(X)):
\[ \mathrm{E}[X] = e^{\mu + 0.5\sigma^2} \]
\[ \mathrm{SD}[X] = \sqrt{\left(e^{\sigma^2} - 1\right) e^{2\mu + \sigma^2}} \]
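The log-normal mean and standard deviation can be computed from the parameters of ln(X) with a short Python function. This is a minimal sketch of the standard formulas, mean = exp(mu + 0.5 sigma^2) and variance = (exp(sigma^2) - 1) exp(2 mu + sigma^2); the function name and the parameter values mu = 0, sigma = 1 are assumptions for illustration.

```python
import math

def lognormal_mean_sd(mu, sigma):
    """Mean and standard deviation of X when ln(X) is normal with parameters mu, sigma."""
    mean = math.exp(mu + 0.5 * sigma ** 2)
    variance = (math.exp(sigma ** 2) - 1.0) * math.exp(2.0 * mu + sigma ** 2)
    return mean, math.sqrt(variance)

# Illustrative parameters: ln(X) is standard normal
ln_mean, ln_sd = 0.0, 1.0
x_mean, x_sd = lognormal_mean_sd(ln_mean, ln_sd)

print(round(x_mean, 4), round(x_sd, 4))
```

Note that the mean of X is larger than exp(mu), which is the median of X, reflecting the right skew of the log-normal distribution.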

Example: (ref: https://probabilityandstats.wordpress.com/2015/10/25/introducing-the-lognormal-distribution/)
