
Correlation and Regression Analyses

Measurement of more than one variable on each experimental or observational unit.

Examples:
Age and weight of an animal

Height and stem diameter of a tree

Concentration of potassium in a pot of soil and the weight of plants growing in the pot.
Univariate versus Bivariate analysis
We may study the mean and variance of one or both variables individually (univariate), but we will also be interested in studying any relationship between them (bivariate).

What about Multivariate?

More than two variables


Through a study of relationships we can:
a) quantify the changes in one variable that are caused by changes in another one,

b) predict the level of one variable from a measurement of another, and

c) quantify the degree of association between variables.
The calculations involved in regression and correlation are very similar, but there are functional and conceptual differences between them.
Covariance

A series of calculations common to both techniques involves the calculation of “covariance”.

Covariance is a measure of the way that two characteristics vary together, that is, of their joint variation.
Example of covariance analysis
Age (Days) 0 10 20 30 40 50 60

Weight (kg) 32 38 40 57 62 64 78

Age and weight of a calf from birth to 60 days
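
As a rough illustration of the covariance calculation, the sketch below applies the usual sample formula, the sum of products of deviations from the two means divided by n - 1, to the calf data above. The function name `covariance` and the variable names are illustrative and do not come from the slides, and the n - 1 divisor is an assumption about which form of covariance is intended.

```python
def covariance(xs, ys):
    """Sample covariance: sum of products of deviations divided by n - 1."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of the deviations of X and Y from their means
    sum_products = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sum_products / (n - 1)


# Calf data from the example: age in days, weight in kg
age_days = [0, 10, 20, 30, 40, 50, 60]
weight_kg = [32, 38, 40, 57, 62, 64, 78]

cov_age_weight = covariance(age_days, weight_kg)
print(round(cov_age_weight, 2))  # 353.33 (units: day x kg)
```

The positive value reflects that weight tends to increase as age increases.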


Scatter plot or Scattergram
A study was made to determine if changes in the
concentration of inorganic phosphorus (P) in soil caused
changes in the phosphorus contained in plants.

The following data are from 9 pots in a greenhouse experiment.


Soil P (ppm) Plant P (ppm)
1 64
4 71
5 54
9 81
11 76
13 93
23 77
23 95
28 109
Regression is used to quantify the
relationship between variables when one is
dependent on the other.

In our example, the weight of the calf depends upon its age, so we call weight the dependent variable and age the independent variable.
We can fit a regression line to the diagram
which represents the average trend.

Based on calculations and the regression line, we can predict the value of Y for a given X, within the range of X values studied.
Regression line
• The regression equation, given as
Y = a + bX
can be explained as follows:
a is the intercept, where the regression line cuts the Y axis, that is, the value of Y when X is zero.
b is the slope of the line, that is, the change in Y for a unit change in X. It is also called the regression coefficient.
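
The slides do not state how a and b are calculated. A minimal sketch, assuming the usual least-squares estimates (b is the sum of products divided by the sum of squares of X, and a is chosen so the line passes through the two means), applied to the calf data might look like this. The names `fit_line` and `predict` are illustrative, and the range check is one possible way to enforce the rule of predicting only within the X values studied.

```python
def fit_line(xs, ys):
    """Least-squares intercept a and slope b for Y = a + bX."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sum_products = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sum_squares_x = sum((x - mean_x) ** 2 for x in xs)
    b = sum_products / sum_squares_x   # change in Y per unit change in X
    a = mean_y - b * mean_x            # value of Y when X is zero
    return a, b


def predict(x, a, b, xs):
    """Predict Y at x, refusing to extrapolate beyond the observed X range."""
    if not min(xs) <= x <= max(xs):
        raise ValueError("x lies outside the range of X values studied")
    return a + b * x


# Calf data: age in days (independent), weight in kg (dependent)
age_days = [0, 10, 20, 30, 40, 50, 60]
weight_kg = [32, 38, 40, 57, 62, 64, 78]

a, b = fit_line(age_days, weight_kg)
print(round(a, 2), round(b, 3))  # 30.29 0.757

weight_at_25 = predict(25, a, b, age_days)
print(round(weight_at_25, 2))  # 49.21
```

Under these assumptions the fitted line suggests an average gain of about 0.757 kg per day from a starting weight of about 30.3 kg.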
CORRELATION ANALYSIS

Correlations are used to measure the relationship between variables that are each normally distributed, where no dependence of one variable on the other is known or assumed.

Remember that this contrasts with regression, where one variable is dependent on the other.
Correlation Coefficient

The correlation coefficient (r) measures the degree of association between the two variables.

r ranges in value from -1 through zero to +1 and has no units of measurement.

In contrast, the regression coefficient b has units (units of Y per unit of X).

The correlation coefficient r can be
calculated from the sums of squares and
products or from the variances and
covariance using the following formulae:

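$$
r = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sqrt{\sum (X - \bar{X})^2 \, \sum (Y - \bar{Y})^2}}
\quad \text{or} \quad
r = \frac{\mathrm{Cov}_{xy}}{\sqrt{s_x^2 \, s_y^2}}
$$
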
Plant height (cm) Leaf length (cm)
54 17
44 18
70 20
61 19
78 22
33 14
48 16
80 21
75 23
52 20
31 17
71 21
69 17
55 16
50 16
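
To illustrate that the two forms of the formula give the same answer, the sketch below computes r for the plant height and leaf length data both ways. The function names are illustrative, and the variances and covariance are assumed to use the n - 1 divisor.

```python
from math import sqrt

# Data from the table above
plant_height_cm = [54, 44, 70, 61, 78, 33, 48, 80, 75, 52, 31, 71, 69, 55, 50]
leaf_length_cm = [17, 18, 20, 19, 22, 14, 16, 21, 23, 20, 17, 21, 17, 16, 16]


def sums_of_squares_and_products(xs, ys):
    """Return (sum of products, sum of squares of X, sum of squares of Y)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sp = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return sp, ss_x, ss_y


def r_from_sums(xs, ys):
    """r from the sums of squares and products."""
    sp, ss_x, ss_y = sums_of_squares_and_products(xs, ys)
    return sp / sqrt(ss_x * ss_y)


def r_from_variances(xs, ys):
    """r from the covariance and the two variances (n - 1 divisor)."""
    sp, ss_x, ss_y = sums_of_squares_and_products(xs, ys)
    df = len(xs) - 1
    cov_xy, var_x, var_y = sp / df, ss_x / df, ss_y / df
    return cov_xy / sqrt(var_x * var_y)


r_sums = r_from_sums(plant_height_cm, leaf_length_cm)
r_vars = r_from_variances(plant_height_cm, leaf_length_cm)
print(round(r_sums, 3), round(r_vars, 3))
```

The n - 1 divisors cancel between numerator and denominator, which is why both routes agree.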
Types of correlation

Pearson’s Product-moment correlation

Spearman’s rank correlation

Correlation for dichotomous data
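
The slides list these types without giving formulas. As one hedged illustration, Spearman's rank correlation can be computed as Pearson's product-moment correlation applied to the ranks of the observations, with tied values given the average of their ranks. The helper names below are illustrative, and the data are the plant height and leaf length values from the earlier table.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # sorted positions are 0-based, ranks 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks


def pearson_r(xs, ys):
    """Pearson's product-moment correlation coefficient."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sp = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return sp / sqrt(ss_x * ss_y)


def spearman_r(xs, ys):
    """Spearman's rank correlation: Pearson's r computed on the ranks."""
    return pearson_r(average_ranks(xs), average_ranks(ys))


plant_height_cm = [54, 44, 70, 61, 78, 33, 48, 80, 75, 52, 31, 71, 69, 55, 50]
leaf_length_cm = [17, 18, 20, 19, 22, 14, 16, 21, 23, 20, 17, 21, 17, 16, 16]

rs = spearman_r(plant_height_cm, leaf_length_cm)
print(round(rs, 3))
```

Because it depends only on ranks, this coefficient equals 1 for any strictly increasing relationship, whether or not it is linear.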


Tests of significance in correlation analysis
• A: Test that the correlation coefficient equals zero

• B: Comparing an observed correlation with a reported value

• C: Comparing two estimates of r (r1 and r2)

• D: Determining a common correlation coefficient
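
The slides name these tests without giving the calculations. The sketch below shows one common way tests A and B are carried out, which may differ from the method the author intends: for A, a t statistic with n - 2 degrees of freedom, and for B, Fisher's z transformation compared against a standard normal distribution. The soil P and plant P data are taken from the greenhouse example, while the reported value of 0.5 is purely hypothetical.

```python
from math import atanh, sqrt
from statistics import NormalDist

# Greenhouse data from the earlier example (9 pots)
soil_p = [1, 4, 5, 9, 11, 13, 23, 23, 28]
plant_p = [64, 71, 54, 81, 76, 93, 77, 95, 109]


def pearson_r(xs, ys):
    """Pearson's product-moment correlation coefficient."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sp = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return sp / sqrt(ss_x * ss_y)


def t_for_zero_correlation(r, n):
    """Test A: t statistic (n - 2 degrees of freedom) for H0: correlation = 0."""
    return r * sqrt((n - 2) / (1 - r ** 2))


def z_against_reported(r, n, reported_r):
    """Test B: Fisher z statistic comparing r with a reported value."""
    return (atanh(r) - atanh(reported_r)) * sqrt(n - 3)


n = len(soil_p)
r = pearson_r(soil_p, plant_p)

t_stat = t_for_zero_correlation(r, n)
# Compare t_stat with the tabulated t value for n - 2 = 7 degrees of freedom
# (2.365 at the two-sided 5% level).

hypothetical_reported_r = 0.5  # assumed value for illustration only
z_stat = z_against_reported(r, n, hypothetical_reported_r)
p_two_sided = 2 * (1 - NormalDist().cdf(abs(z_stat)))

print(round(r, 3), round(t_stat, 2), round(z_stat, 2))
```

Tests C and D are commonly built on the same z transformation, using the difference between two transformed values or a weighted average of them.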
