
West Bengal University of Animal and Fishery Sciences


Faculty of Fishery Sciences

An assignment on:
BIVARIATE DATA, SCATTER DIAGRAM, SIMPLE
LINEAR CORRELATION, MEASURE AND
PROPERTIES

Course no: FES-225

SUBMITTED BY
DEBIPRASAD KANTAL
FS-10/14
B.F.SC. 2ND YR. 2ND SEM

SUBMITTED TO
DR. S. SAHU
DEPT. OF FES

Introduction
When one measurement is made on each observation, univariate analysis is applied.
If more than one measurement is made on each observation, multivariate analysis is
applied. In this section, we focus on bivariate analysis, where exactly two
measurements are made on each observation. The two measurements will be called
X and Y . Since X and Y are obtained for each observation, the data for one
observation is the pair (X, Y ).
Definition of bivariate data
Bivariate data deals with two variables that can change and are compared to find
relationships.
If one variable is influencing another variable, then you will have bivariate data that
has an independent and a dependent variable. This is because one variable depends
on the other for change.
An independent variable is a condition or piece of data in an experiment that can be
controlled or changed.
A dependent variable is a condition or piece of data in an experiment that is
controlled or influenced by an outside factor, most often the independent variable.
This is very different from univariate data, which is one variable in a data set that is
analyzed to describe a scenario or experiment.
For example, if Mindy is studying for a college test and tracks her study time and
her test scores, she might see that the more time she spends studying, the better her
test scores become. In this scenario, Mindy's test scores are the dependent variable
because they depend on the number of hours she studies, and the number of study
hours is the independent variable. Together, the two measurements form a bivariate
data set in which this relationship can be seen.
Scatter diagram
Scatter diagrams are used to demonstrate correlation between two quantitative
variables.
A scatter diagram is a graphic representation of the relationship between two variables.
Scatter diagrams help teams identify and understand possible cause-and-effect
relationships.
Often, this correlation is linear, which means that a straight-line model can be
developed.
If X and Y denote the two variables under study, the scatter diagram is obtained by
plotting the pairs of values of X and Y as points on Cartesian coordinates. This
diagram gives an indication of whether the variables are related and, if so, the possible
type of line or estimating equation which can describe the relationship.
If the scatter of points indicates that a straight line fits the data well, then the
relationship between the variables is said to be linear. The scatter diagrams in Fig. 1
and Fig. 2 are examples of linear relationships between X and Y. In Fig. 1, X tends to
increase as Y increases; the relationship between the variables is said to be direct and
linear. In Fig. 2, X decreases as Y increases; the relationship between the variables is
said to be inverse and linear.

If the scatter of points indicates that a curve can better fit the data, then the relationship
between the variables is said to be non-linear or curvilinear. Some curvilinear
relationships are shown in Fig. 3 and Fig. 4.

If the scatter of points is as shown in Fig. 5, then there is little or no relationship
between the variables.

Directions
1. Gather the data; determine the high and low values for each factor.
2. Decide which factor will be plotted on which axis.
   a. Theorizing a cause and effect relationship, put the suspected cause on the
      horizontal axis, and the suspected effect on the vertical axis.
3. Draw and label the axes clearly.
   a. Make the axes roughly the same length, creating a square plotting area.
   b. Label each axis with increasing values from left to right, and from bottom to
      top.
   c. Label each axis to match the full range of values for that factor. In other
      words, make the lowest numerical label slightly less than the lowest data
      value, and the highest label slightly greater than the highest value. The data
      should fill the whole plotting area.
4. Plot the paired data.
   a. Use concentric circles (or offset dots) to indicate identical paired-data
      points.
   b. Differentiate distinct strata by using filled vs. unfilled symbols, or different
      colors.
5. Title the chart and provide necessary annotations to describe what it shows.
6. Identify and classify the pattern of correlation shown by the plotting of the data.
7. Identify what you have learned; decide on your next steps.
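
Steps 3c and 4a of the directions can be sketched in Python. This is only an illustration: the function names, the 5% padding, and the sample pairs are assumptions made for the example, not part of the original directions.

```python
from collections import Counter


def axis_limits(values, pad_fraction=0.05):
    """Return (low, high) axis limits slightly outside the data range (step 3c)."""
    low, high = min(values), max(values)
    span = high - low
    # Assumption: if every value is identical, pad by one unit on each side.
    pad = span * pad_fraction if span > 0 else 1.0
    return low - pad, high + pad


def repeated_pairs(pairs):
    """Return the pairs that occur more than once, with their counts (step 4a)."""
    counts = Counter(pairs)
    return {pair: n for pair, n in counts.items() if n > 1}


# Hypothetical paired data: (suspected cause, suspected effect)
pairs = [(2, 10), (4, 14), (4, 14), (6, 19), (8, 22)]
x_values = [x for x, _ in pairs]
y_values = [y for _, y in pairs]

x_limits = axis_limits(x_values)      # limits for the horizontal axis
y_limits = axis_limits(y_values)      # limits for the vertical axis
duplicates = repeated_pairs(pairs)    # points needing concentric circles

print("horizontal axis:", x_limits)
print("vertical axis:", y_limits)
print("identical pairs:", duplicates)
```

The limits can then be passed to whatever plotting tool is used, and the repeated pairs mark the points that need concentric circles or offset dots.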
Introduction to Correlation Analysis
The statistical methods discussed so far are primarily intended to describe a single
variable, i.e., univariate populations. This chapter discusses techniques that are
useful for studying the relationships that exist when data on two or more variables
are available.
If data on two variables, say X and Y, are recorded for the same individual, the
result is called a bivariate population. In this bivariate population, for every value
of X there is a corresponding value of Y. By treating the variables X and Y separately,
measures of central tendency, dispersion etc. can be worked out. In addition to these
measures, it may be of interest to study the strength of the relationship existing
between the variables and the nature of their relationship. The study of the former
aspect is referred to as correlation and the latter as regression analysis.
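
As a small illustration of treating X and Y separately, the sketch below computes the mean and sample standard deviation of each variable with Python's standard `statistics` module. The length and weight values are hypothetical and serve only to show the layout of a bivariate sample.

```python
import statistics

# Hypothetical bivariate sample: one (length in cm, weight in g) pair per fish
sample = [(10.0, 12.0), (12.0, 20.0), (14.0, 31.0), (16.0, 45.0)]

# Split the pairs into the two variables
x = [pair[0] for pair in sample]
y = [pair[1] for pair in sample]

# Central tendency and dispersion, worked out for each variable on its own
mean_x, mean_y = statistics.mean(x), statistics.mean(y)
sd_x, sd_y = statistics.stdev(x), statistics.stdev(y)

print(f"X: mean = {mean_x}, sd = {sd_x:.3f}")
print(f"Y: mean = {mean_y}, sd = {sd_y:.3f}")
```

These separate summaries say nothing about how X and Y move together, which is what correlation measures.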
Correlation Classifications
Correlation can be classified into three basic categories:
Linear- Variables that are correlated through a linear relationship can display either
positive or negative correlation.
Non-linear- Two variables may be correlated but not through a linear model. This
type of model is called non-linear. The model might be one of a curve.
No correlation- Two quantitative variables may not be correlated at all.

[Figures: 1: linear correlation; 2: non-linear correlation; 3: no correlation]
Definition of simple linear correlation

Simple linear correlation is a measure of the degree to which two variables vary
together, or a measure of the intensity of the association between two variables.

The parameter being measured is ρ (rho), and it is estimated by the statistic r, the
correlation coefficient.
r can range from -1 to 1, and is independent of units of measurement.
The strength of the association increases as the absolute value of r approaches 1.0.
A value of 0 indicates there is no linear association between the two variables tested.
Strength of Correlation

The strength of a linear relationship is measured by the correlation coefficient.

Correlation may be strong, moderate, or weak. You can estimate the strength by
observing the variation of the points around the line.
Large variation around the line indicates weak correlation.
When the data is distributed quite close to the line, the correlation is said to be strong.
The correlation type is independent of the strength.
| r | > 0.90 implies a strong linear association.
0.65 < | r | < 0.90 implies a moderate linear association.
| r | < 0.65 implies a weak linear association.
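
The cut-offs above can be written as a short Python function. This is a sketch only: the function name is made up for the example, and because the text does not say which class the exact boundary values 0.65 and 0.90 belong to, the code assumes they fall in the lower class.

```python
def correlation_strength(r):
    """Classify r as strong, moderate, or weak using the cut-offs in the text."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and 1")
    size = abs(r)  # strength depends on the magnitude, not the sign
    if size > 0.90:
        return "strong"
    if size > 0.65:
        # Assumption: exactly 0.90 is treated as moderate
        return "moderate"
    # Assumption: exactly 0.65 is treated as weak
    return "weak"


labels = [correlation_strength(r) for r in (0.95, -0.80, 0.30)]
print(labels)
```

Note that -0.80 is classified by its magnitude: the sign gives the direction of the relationship, not its strength.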
Measure of Simple Correlation
Simple correlation is a statistical tool to study the degree of association or relationship
existing between two variables, when the relationship is linear or approximately linear.
The degree of relationship is quantified by a coefficient called Karl Pearson's product
moment correlation coefficient, or simply the correlation coefficient. It is denoted by r.
The working formula for r is given by

$$ r = \frac{\sum XY - \dfrac{(\sum X)(\sum Y)}{n}}{\sqrt{\left[\sum X^2 - \dfrac{(\sum X)^2}{n}\right]\left[\sum Y^2 - \dfrac{(\sum Y)^2}{n}\right]}} $$

In the above expression, X and Y denote the measurements on variables X and Y and n is
the number of pairs of observations, i.e. the sample size.
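
The calculation of r can be sketched in Python. The sketch assumes the usual computational form of Pearson's coefficient, in which the numerator is ΣXY − (ΣX)(ΣY)/n and the denominator is the square root of [ΣX² − (ΣX)²/n][ΣY² − (ΣY)²/n]. The function name and the five data pairs are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson's correlation coefficient from the sums used in the working formula."""
    if len(xs) != len(ys):
        raise ValueError("X and Y must have the same number of observations")
    n = len(xs)
    if n < 2:
        raise ValueError("at least two pairs are needed")

    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)

    numerator = sum_xy - sum_x * sum_y / n
    denominator = math.sqrt((sum_x2 - sum_x ** 2 / n) * (sum_y2 - sum_y ** 2 / n))
    if denominator == 0:
        raise ValueError("r is undefined when X or Y does not vary")
    return numerator / denominator


# Hypothetical paired measurements, n = 5
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r = pearson_r(x, y)
print(round(r, 4))
```

For these values ΣX = 15, ΣY = 20, ΣXY = 66, ΣX² = 55 and ΣY² = 86, so the numerator is 6 and the denominator is √60, giving r ≈ 0.7746.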

Examples:

Length and weight of juvenile fish, and income and expenditure, are some
examples of positive correlation.
Rate of infection and yield, and demand and supply, are some examples of
negative correlation.
Growth and demand for fish, and shoe size and the number of intelligent boys / girls,
are some examples of no correlation.

When r = +1 there exists a strict linear relationship and the correlation between the variables
is said to be perfectly positive.
When r = -1 the relationship is linear and correlation between the variables is perfectly
negative.
The correlation coefficient equal to one (either positive or negative) indicates perfect
correlation between the variables. Perfect correlation rarely occurs in biological data though
values as high as 0.99 have been obtained in some cases. The closer the value of the
coefficient to one, the greater is the intensity or the degree of association between the
variables. Values of r near zero may arise when there is no relationship or when there is a
real relationship but it is not linear.
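
The three situations described above (r = +1, r = -1, and r near zero despite a real but non-linear relationship) can be checked numerically. The sketch below is an illustration only: it uses the deviation form of Pearson's coefficient and made-up values of X, with Y generated from exact formulas.

```python
import math


def pearson_r(xs, ys):
    """Pearson's r computed from deviations about the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


x = [-2, -1, 0, 1, 2]

r_direct = pearson_r(x, [2 * v + 1 for v in x])    # exact rising straight line
r_inverse = pearson_r(x, [-3 * v + 4 for v in x])  # exact falling straight line
r_curve = pearson_r(x, [v * v for v in x])         # exact curve, Y = X squared

print(r_direct, r_inverse, r_curve)
```

In the last case Y is completely determined by X, yet r is zero because the relationship is symmetric about X = 0 and has no linear trend.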
Properties of the correlation coefficient

It is a pure number without units or dimensions.
It lies between -1 and 1, i.e., -1 ≤ r ≤ 1.
The correlation coefficient is independent of the origin and the scale of measurement
of the variables.
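
Independence from origin and scale can be illustrated by computing r from standardized values, since standardizing removes both the origin and the unit of each variable. This is a sketch under stated assumptions: the helper names and data are hypothetical, and the scale factors used are positive (a negative factor would reverse the sign of r).

```python
import statistics


def standardize(values):
    """Convert values to z-scores using the sample standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]


def r_from_z_scores(xs, ys):
    """Pearson's r as the sum of products of z-scores divided by n - 1."""
    zx, zy = standardize(xs), standardize(ys)
    return sum(a * b for a, b in zip(zx, zy)) / (len(xs) - 1)


# Hypothetical paired measurements
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r_original = r_from_z_scores(x, y)

# Change of origin and scale, for example converting units and adding an offset
u = [10 * v + 3 for v in x]
w = [0.5 * v - 7 for v in y]
r_transformed = r_from_z_scores(u, w)

print(round(r_original, 4), round(r_transformed, 4))
```

Both calls give the same value of r, up to floating-point rounding, even though the transformed variables have different origins and units.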

The variables are said to be positively correlated if r is positive and negatively correlated if
r is negative. Positive correlation indicates that two variables are moving in the same
direction, i.e., as one increases the other increases or if one decreases the other decreases.
Negative correlation indicates that the two variables are moving in opposite directions, i.e., as
one increases the other decreases.

EXERCISE
1. What is data?
2. What is bivariate data?
3. Define univariate data.
4. What are the types of variable?
5. What are independent and dependent variables?
6. What is a scatter diagram?
7. What are the basic categories of correlation?
8. Define linear correlation.
9. Define non-linear correlation.
10. When does no correlation occur?
11. How is simple linear correlation measured?
12. What is the range of r for moderate correlation?
13. What do you mean by strength of correlation?
14. Classify correlation.
15. What are the properties of the correlation coefficient?
16. Give an example of positive correlation.
17. Give an example of no correlation.
18. Give an example of negative correlation.
19. What are the types of strength of correlation?
20. What is the range of r for strong correlation?

Q:2 WRITE TRUE / FALSE FOR THE FOLLOWING SENTENCES
1. Values of r near zero may arise when there is no relationship or when there is a real
relationship but it is not linear. (true/false)
2. Negative correlation indicates that the two variables are moving in opposite directions.
(true/false)
3. If more than one measurement is made on each observation, univariate analysis is
applied. (true/false)
4. The variables are said to be positively correlated if r is negative and negatively
correlated if r is positive. (true/false)
5. | r | > 0.90 implies a moderately strong linear association. (true/false)
6. | r | > 0.65 implies a weak linear association. (true/false)
7. When one measurement is made on each observation, univariate analysis is
applied. (true/false)
8. A dependent variable is a condition or piece of data in an experiment that is not
controlled or influenced by an outside factor, most often the independent variable.
(true/false)
9. Scatter diagrams are used to demonstrate correlation between two quantitative
variables. (true/false)
10. Linear- Variables that are correlated through a linear relationship can display either
positive or negative correlation. (true/false)
11. Two variables may be correlated but through a linear model. This type of model is
called non-linear. (true/false)
12. Non-linear- The model might be one of a curve. (true/false)
13. No correlation- Two quantitative variables may not be correlated at all. (true/false)
14. A value of 0 indicates there is no association between the two variables
tested. (true/false)
15. r can range from -1 to 1, and is independent of units of measurement. (true/false)

Q:3 FILL IN THE BLANKS

1. When the data is distributed quite close to the line, the correlation is said to be ---------- (ans: strong)
2. The correlation coefficient is denoted by ---------- (ans: r)
3. The strength of a linear relationship is measured by ---------- (ans: correlation
coefficient)
4. The range of the correlation coefficient is ---------- (ans: -1 ≤ r ≤ 1)
5. ---------- is the range for moderate linear association. (ans: 0.65 < | r | < 0.90)
6. When one measurement is made on each observation, ---------- analysis is applied.
(ans: univariate)
7. A(n) ---------- variable is a condition or piece of data in an experiment that can
be controlled or changed. (ans: independent)
8. When two quantitative variables are not correlated at all, it is called ---------- (ans: no
correlation)

Q:4 MATCH THE FOLLOWING

Column-I                                                  Column-II
1  One measurement is made on each observation            A  -1 to 1
2  | r | < 0.65                                           B  strong correlation
3  Karl Pearson's product moment correlation coefficient  C  a strict linear relationship
4  r = +1                                                 D  r
5  Data is distributed quite close to the line            E  weak linear association
6  r can range                                            F  univariate analysis

ANS: (1-F), (2-E), (3-D), (4-C), (5-B), (6-A)
