Chi Square Test For Cross Tab - Session 9 & 10

Cross‐tabulation and Chi‐square test
Business Research Methodology

Dr. Gunjan Malhotra
Assistant Professor
mailforgunjan@gmail.com
Simple Tabulation for Ranking Type Questions – Bivariate variables
• Suppose we have ordinal‐scale questions.
• Q. Rank the 5 brands of refrigerators shown below on a scale of 1 to 5 (1=Best and 5=Worst), according to your opinion.
BRAND RANK
Whirlpool ___
Kelvinator ___
Godrej ___
Samsung ___
Videocon ___
Output table formulation
Table 1
BRAND       RANK 1  RANK 2  RANK 3  RANK 4  RANK 5
Whirlpool   x       x       x       x       x
Kelvinator  x       x       x       x       x
Godrej      x       x       x       x       x
Samsung     x       x       x       x       x
Videocon    x       x       x       x       x
Univariate tables
• For constructing univariate tables ‐ take up one column at a time and do separate frequency tables or charts. E.g.
BRAND       No. of People who Ranked it No. 1
Whirlpool   90
Kelvinator  60
Godrej      70
Samsung     32
Videocon    45
TOTAL       297
• We can calculate the percentage of the total for each brand. E.g. 90/297 works out to .303, or 30.3% who ranked Whirlpool as No. 1, and so on.
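The percentage calculation above can be sketched in a few lines of Python. This is only an illustration using the rank-1 counts from the table; the variable names are assumptions made for the example.

```python
# Number of people who ranked each brand No. 1 (figures from the table above).
rank1_counts = {
    "Whirlpool": 90,
    "Kelvinator": 60,
    "Godrej": 70,
    "Samsung": 32,
    "Videocon": 45,
}

total = sum(rank1_counts.values())  # all respondents who gave a rank-1 answer

# Share of the total for each brand, as a percentage rounded to one decimal.
rank1_pct = {brand: round(100 * n / total, 1) for brand, n in rank1_counts.items()}

for brand, pct in rank1_pct.items():
    print(f"{brand:<11}{rank1_counts[brand]:>4}{pct:>7.1f}%")
```

The same loop can be repeated for the RANK 2 to RANK 5 columns, one column at a time.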
Simple Tabulation for Rating Type Questions
Q. Rate the following attributes of LIRIL soap on a scale of 1 to 5 (1=Very Unsatisfactory to 5=Very Satisfactory).
Lather __________________________________
1 2 3 4 5
Fragrance __________________________________
1 2 3 4 5
• For each attribute, the number of people who rated it as 1, 2, 3, 4
or 5 can be tabulated in separate tables like:
RATING Lather
1 30
2 25
3 50
4 76
5 22
TOTAL 203
Alternatively, we can tabulate ratings for all attributes as follows ‐
RATING LATHER FRAGRANCE ATR.3 ATR.4 ATR.5
1 x x x x x
2 x x x x x
3 x x x x x
4 x x x x x
5 x x x x x
Second Stage Analysis – Cross Tabulation
• A cross‐tabulation can be done by combining any two of the questions and tabulating the data together. This is a 2‐variable cross tabulation.
• E.g. a cross‐tabulation between Brand Preference for brands of tea and Region to which Respondent belongs.
BRAND        Region‐wise Buyers (No.)
             North     South  East  West      Total
Brooke Bond  25(50%)   20     20    15(30%)   80(40%)
Lipton       10(20%)   15     20    5(10%)    50(25%)
Tata         15(30%)   15     10    30(60%)   70(35%)
Total        50(100%)  50     50    50(100%)  200(100%)
– An extension of this could be adding percentages.

Calculating Percentages in a Cross Tabulation
• In the above example, we can compute percentages
  • row‐wise,
  • column‐wise, or
  • on the total sample of 200.
• The general rule is to calculate percentages across the dependent variable (across Brand categories), so that the percentages within each category of the independent variable add up to 100%.
• Assume that brand preference depends on the region to which respondents belong, i.e. “Brand” ‐ dependent variable, and “Region” ‐ independent variable.
• The interpretation is – “Out of 50 respondents from the Northern Region, 50% buy Brooke Bond, 20% buy Lipton, and 30% buy Tata Tea”.
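A minimal Python sketch of the column-wise and total-sample percentages for the tea example follows. The counts come from the cross-tabulation above; the names `buyers`, `column_pct` and `total_pct` are illustrative assumptions.

```python
# Region-wise buyers of each tea brand (counts from the cross-tabulation above).
regions = ["North", "South", "East", "West"]
buyers = {
    "Brooke Bond": [25, 20, 20, 15],
    "Lipton":      [10, 15, 20, 5],
    "Tata":        [15, 15, 10, 30],
}

# Column totals: number of respondents in each region.
region_totals = [sum(col) for col in zip(*buyers.values())]
grand_total = sum(region_totals)

# Column-wise percentages: each brand's share within a region
# (Brand is the dependent variable, Region the independent one).
column_pct = {
    brand: [100 * n / total for n, total in zip(row, region_totals)]
    for brand, row in buyers.items()
}

# Percentages on the total sample: each brand's share of all respondents.
total_pct = {brand: 100 * sum(row) / grand_total for brand, row in buyers.items()}

print(f"{'':<12}" + "  ".join(f"{r:>6}" for r in regions) + "  |  Total")
for brand in buyers:
    cells = "  ".join(f"{p:5.1f}%" for p in column_pct[brand])
    print(f"{brand:<12}{cells}  | {total_pct[brand]:5.1f}%")
```

Reading down the North column reproduces the interpretation quoted above: 50% Brooke Bond, 20% Lipton and 30% Tata.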
Chi‐square test
1. Univariate ‐ Chi‐square test for goodness of fit
• Test for significance in the analysis of frequency distributions.
• Each question represents a variable under study.
• Compare observed frequencies with expected frequencies.
2. Bivariate ‐ Chi‐square test for relatedness or independence
– Chi‐square allows testing for significant differences between groups.
[Two different questions in a questionnaire may represent two variables.]
Chi‐square test for Goodness of Fit
• Is used to analyze probabilities of multinomial distribution trials along a single dimension.
• The Chi‐square goodness‐of‐fit test compares the expected (theoretical) frequencies of categories from a population distribution to the observed (actual) frequencies from a distribution to determine whether there is a difference between what was expected and what was observed.
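$$\chi^2 = \sum_i \frac{(O_i - E_i)^2}{E_i}$$
where O_i is the observed frequency and E_i is the expected frequency of category i.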
Example 1: Chi Square test for goodness of fit
‐ Equal expected frequency
• The table outlines the attitudes of 60 people towards US military bases in Australia. A chi‐square test for goodness of fit will allow us to determine if differences in frequency exist across response categories.
• Ho: There is no significant difference across frequency of attitudes towards military bases in Australia.
Attitude towards US military bases in Australia / Frequency of response (observed frequencies)
In favour 8
Against 20
Undecided 32
Output 1: Chi‐Square test – equal expected
frequencies
Interpretation 1: Chi‐square test – equal expected frequencies
• The output shows that the chi‐square value is significant (p < .05). (Ho: rejected).
• Therefore, it can be concluded that there are significant differences in the frequency of attitudes towards military bases in Australia.
• The results show that people are largely undecided on this issue, chi‐square (2, N=60) = 14.4, p < .05.
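The reported statistic can be checked with a short Python sketch. The function name `chi_square_gof` is hypothetical, and the p-value line relies on the fact that, for 2 degrees of freedom only, the upper-tail probability of the chi-square distribution is exactly exp(-x/2); other degrees of freedom would need a statistics library or tables.

```python
import math

def chi_square_gof(observed, expected=None):
    """Return (chi-square statistic, degrees of freedom) for a goodness-of-fit test."""
    if expected is None:
        # Equal expected frequencies: the total spread evenly over the categories.
        expected = [sum(observed) / len(observed)] * len(observed)
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return stat, len(observed) - 1

# Observed frequencies: In favour, Against, Undecided (60 respondents).
observed = [8, 20, 32]
stat, df = chi_square_gof(observed)

# Closed form valid for df = 2 only: P(X >= x) = exp(-x / 2).
p_value = math.exp(-stat / 2)

print(f"chi-square({df}, N={sum(observed)}) = {stat:.1f}, p = {p_value:.4f}")
```

With equal expected frequencies of 20 per category, the sketch gives a statistic of 14.4 on 2 degrees of freedom and a p-value well below .05, in line with the interpretation above.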
Example 2: Chi‐square test for goodness of fit – Unequal expected frequencies
• Sometimes the expected frequencies are not evenly balanced across categories.
• E.g. the expected frequency for each category was 15, 15 and 30.
Attitude towards US military bases in Australia / Frequency of response (observed frequencies) / Expected frequency of responses
In favour 8 15
Against 20 15
Undecided 32 30
Output 2: Chi‐square test – unequal expected frequencies
Interpretation 2: Chi‐square test – unequal expected frequencies
• The output shows that the chi‐square value is not significant (p = .079 > .05). (Ho: not rejected).
• Therefore, it can be concluded that there are no significant differences between the observed and the expected frequencies of attitudes towards military bases in Australia.
• The results show that people are largely undecided on this issue, in line with the expected frequencies, chi‐square (2, N=60) = 5.067, p > .05.
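As a check on the figures quoted above, the statistic can be worked out by hand from the observed (8, 20, 32) and expected (15, 15, 30) frequencies. The closed-form p-value is specific to 2 degrees of freedom, where the upper-tail probability of the chi-square distribution equals e^{-x/2}.

```latex
\chi^2 = \frac{(8-15)^2}{15} + \frac{(20-15)^2}{15} + \frac{(32-30)^2}{30}
       = \frac{49}{15} + \frac{25}{15} + \frac{4}{30} \approx 5.067,
\qquad df = 3 - 1 = 2,
\qquad p = e^{-5.067/2} \approx 0.079
```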
Chi‐square test of Independence
• Qualitative variables ‐ nominal data.
• Used to test if the two variables are statistically associated with each other significantly.
• Used to analyze the frequencies of two variables with multiple categories to determine whether the two variables are independent.
• It is possible to do a cross‐tabulation (and a chi‐squared test – with given table value, df, confidence level) for any two nominal variables in the survey.
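For reference, the test of independence uses the same form of statistic as the goodness-of-fit test, with the expected frequency of each cell computed from the row and column totals of the cross-tabulation. The symbols R_i (row total), C_j (column total), N (sample size), r (number of rows) and c (number of columns) are notation introduced here, not taken from the slides.

```latex
E_{ij} = \frac{R_i \, C_j}{N}, \qquad
\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}, \qquad
df = (r - 1)(c - 1)
```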
Example 1: Chi‐square test for cross‐tab
• Let us assume that we have conducted a consumer survey for a brand of detergent. One of the questions dealt with the income category of the respondent. Another asked the respondent to rate his purchase intentions.
• Ho: There is no significant association between Respondent Income and Purchase Intention.
S. No.  INCOME          CODE  INTENT     INTCODE
1       Less Than 5000  1     NONE       1
2       Less Than 5000  1     LOW        2
3       Less Than 5000  1     LOW        2
4       Less Than 5000  1     NONE       1
5       Less Than 5000  1     HIGH       3
6       5001-10000      2     LOW        2
7       5001-10000      2     HIGH       3
8       5001-10000      2     VERY HIGH  4
9       5001-10000      2     HIGH       3
10      5001-10000      2     LOW        2
11      10001-20000     3     HIGH       3
12      10001-20000     3     VERY HIGH  4
13      10001-20000     3     CERTAIN    5
14      10001-20000     3     HIGH       3
15      10001-20000     3     VERY HIGH  4
16      Above 20000     4     HIGH       3
17      Above 20000     4     CERTAIN    5
18      Above 20000     4     VERY HIGH  4
19      Above 20000     4     CERTAIN    5
20      Above 20000     4     CERTAIN    5
Both variables are coded.
Income codes and their equivalent incomes are –
Code Income in Rs. per Month

1 Less than 5000
2 5001 to 10,000
3 10,001 to 20,000
4 Above 20,000
Purchase Intention codes are as follows –
Code Explanation (Value Labels for the Variable)

1 None – No intention to buy
2 Low – Low intention to buy
3 High – High intention
4 Very High – Very high intention
5 Certain – Certain to buy
INCOME Per Month by PURCHASE INTENTION
Income per Month in Rs. →
Purchase Intent  Code  Less than 5000  5001-10000  10001-20000  Above 20000  TOTAL
None             1     2               0           0            0            2
Low              2     2               2           0            0            4
High             3     1               2           2            1            6
V. High          4     0               1           2            1            4
Certain          5     0               0           1            3            4
TOTAL                  5               5           5            5            20
Cross‐tabulation of Code (column – income per month) and Intcode (row – purchase intent).
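The cross-tabulation can be rebuilt from the coded responses by counting how often each (income code, intention code) pair occurs. The sketch below uses only the Python standard library; the `records` list transcribes the CODE and INTCODE columns of the 20 respondents, and the variable names are illustrative.

```python
from collections import Counter

# (income code, purchase-intention code) for respondents 1 to 20.
records = [
    (1, 1), (1, 2), (1, 2), (1, 1), (1, 3),
    (2, 2), (2, 3), (2, 4), (2, 3), (2, 2),
    (3, 3), (3, 4), (3, 5), (3, 3), (3, 4),
    (4, 3), (4, 5), (4, 4), (4, 5), (4, 5),
]

cell_counts = Counter(records)  # missing pairs count as 0
income_codes = [1, 2, 3, 4]
intent_codes = [1, 2, 3, 4, 5]

# Rows = purchase intention, columns = income, as in the table above.
crosstab = [
    [cell_counts[(income, intent)] for income in income_codes]
    for intent in intent_codes
]

for intent, row in zip(intent_codes, crosstab):
    print(intent, row, sum(row))
print("TOTAL", [sum(col) for col in zip(*crosstab)], len(records))
```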
Result 1: Chi‐Square test for cross‐tab
Interpretation 1: Chi‐square test for cross‐tab
• The cross‐tabulation shows the number of respondents falling into each cell (a cell is the combination of one INCOME category with one PURCHASE INTENTION category).
• The first line of the chi‐squared test reads a significance level of 0.097. This means the chi‐squared test is showing a significant association between these two variables at a 90 percent confidence level (equivalent to 0.10 significance level).
• Thus, we conclude that at 90 percent confidence level, PURCHASE INTENTION and INCOME are associated significantly with each other. This may lead us to conclude that the price of the detergent is important in its purchase.
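The significance level of 0.097 can be reproduced from the cross-tabulation with a short sketch. The function names are hypothetical, and the p-value helper uses a closed form that holds only when the degrees of freedom are even (here df = 12); it is not a general replacement for a statistics package.

```python
import math

def chi_square_independence(table):
    """Return (chi-square statistic, df) for a table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence (assumes no empty row or column).
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df

def chi2_upper_tail_even_df(x, df):
    """P(X >= x) for a chi-square variable; closed form valid for even df only."""
    if df % 2 != 0:
        raise ValueError("this closed form requires an even number of degrees of freedom")
    half = x / 2
    return math.exp(-half) * sum(half ** k / math.factorial(k) for k in range(df // 2))

# Rows: None, Low, High, V. High, Certain; columns: the four income categories.
table = [
    [2, 0, 0, 0],
    [2, 2, 0, 0],
    [1, 2, 2, 1],
    [0, 1, 2, 1],
    [0, 0, 1, 3],
]

stat, df = chi_square_independence(table)
p_value = chi2_upper_tail_even_df(stat, df)
print(f"chi-square = {stat:.3f}, df = {df}, p = {p_value:.3f}")
```

A p-value below 0.10 but above 0.05 corresponds to significance at the 90 percent confidence level only.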
Example 2: Chi‐square test for Cross‐tabs
• Suppose the researcher wants to find the association between educational background (independent variable) of PGDM students and their performance in terms of grade (dependent variable) secured.
• A bivariate cross‐tabulation has been done by combining the above two variables and tabulating the data together.
• Here the assumption is made by our group based on information extracted from the database (performance) of B‐schools.
• We want to test, at the 90% and 95% confidence levels, the significance of the association between EDUCATIONAL BACKGROUND of PGDM students and their PERFORMANCE in terms of GRADE.
• Further, the variables are coded.
• Educational backgrounds and their equivalent codes are:
Educational background  Code
B.Com                   1
B.E.                    2
B.Sc.                   3
B.B.A.                  4
B.A.                    5
• Grade codes are as follows:
Grade Obtained  Grade Code
A               1
B               2
C               3
• These two variables were cross‐tabulated for twenty‐five observations.
• A cross‐tabulation with a Chi‐squared test was performed using the SPSS package.
Input data table
S.No.  Roll No.  Background  Code  Grade  Grdcode
1      1         B.Com       1     B      2
2      2         B.Com       1     C      3
3      3         B.Com       1     A      1
4      4         B.Com       1     C      3
5      5         B.Com       1     B      2
6      6         B.E.        2     A      1
7      7         B.E.        2     A      1
8      8         B.E.        2     A      1
9      9         B.E.        2     B      2
10     10        B.E.        2     A      1
11     11        B.Sc.       3     B      2
12     12        B.Sc.       3     B      2
13     13        B.Sc.       3     C      3
14     14        B.Sc.       3     C      3
15     15        B.Sc.       3     C      3
16     16        BBA         4     A      1
17     17        BBA         4     B      2
18     18        BBA         4     C      3
19     19        BBA         4     C      3
20     20        BBA         4     B      2
21     21        B.A.        5     C      3
22     22        B.A.        5     C      3
23     23        B.A.        5     C      3
24     24        B.A.        5     C      3
25     25        B.A.        5     B      2
Output table 2: Grades Vs Entry Qualification
Result 2: Chi‐Square test for cross‐tab
Interpretation 2: Chi‐Square test for cross‐tab
• The Chi‐square test revealed a significant association between the educational background of the students and their performance in terms of grade.
• A significance level of 0.089 (Pearson’s) has been achieved. This means the Chi‐square test is showing a significant association between the above two variables at the 91.1% confidence level (100 – 8.9).
• Thus we conclude that at the 90% confidence level, educational background of PGDM students and their performance in terms of grade are associated significantly with each other, whereas this is not significant at the 95% confidence level.
• From the obtained contingency coefficient (C) of 0.596, it can be inferred that the association between the dependent and independent variable is fairly strong, as the value 0.596 is closer to 1 than to 0.
• From the Lambda asymmetric value (with grade code dependent) of 0.286, we conclude that there is a moderate level of association between the above two variables. This lambda value tells us that there is a 28.6% reduction in error in predicting the grade of a student when we know his educational background.
• This leads us to conclude that educational background plays a vital role in the performance of the students of the PGDM course.
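The contingency coefficient and lambda quoted above can be recomputed from the input data. In the sketch below, the cell counts are tallied from the input data table (the SPSS output table itself is not reproduced in these slides), and the formulas used, C = sqrt(chi-square / (chi-square + N)) and the Goodman-Kruskal lambda, are the standard definitions, assumed here to be what the SPSS output reports.

```python
import math

# Rows: B.Com, B.E., B.Sc., BBA, B.A.; columns: grades A, B, C.
# Counts tallied from the 25 records in the input data table.
table = [
    [1, 2, 2],
    [4, 1, 0],
    [0, 2, 3],
    [1, 2, 2],
    [0, 1, 4],
]

row_totals = [sum(row) for row in table]
col_totals = [sum(col) for col in zip(*table)]
n = sum(row_totals)

# Pearson chi-square statistic with expected counts row_total * col_total / n.
chi2 = sum(
    (table[i][j] - row_totals[i] * col_totals[j] / n) ** 2
    / (row_totals[i] * col_totals[j] / n)
    for i in range(len(table))
    for j in range(len(table[0]))
)

# Contingency coefficient.
contingency_c = math.sqrt(chi2 / (chi2 + n))

# Lambda with grade as the dependent variable: proportional reduction in
# prediction errors once the educational background is known.
errors_without = n - max(col_totals)              # always guess the modal grade
errors_with = n - sum(max(row) for row in table)  # guess the modal grade per background
lambda_grade = (errors_without - errors_with) / errors_without

print(f"chi-square = {chi2:.2f}, C = {contingency_c:.3f}, lambda = {lambda_grade:.3f}")
```

Without knowing the background, guessing the most common grade (C) is wrong for 14 of 25 students; knowing the background cuts this to 10, which is where the 28.6% reduction comes from.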
Example 3: Chi‐square test for cross tab ‐ 3
• A manufacturer was interested in assessing how children aged four, five and six play with one of the manufacturer’s toys. Each child was asked 15 questions. Following the child’s completed interview, the parent was asked the same 15 questions to validate the child’s answers. The following table lists the number of responses to selected items from the survey. One hundred interviews were conducted with both the parent and the child. Notice that item response rates varied from question to question. For each question, state at least one method that could be used to attempt to correct for this item nonresponse bias.
Question                          # Children Responding  # Parents Responding
Age of child                      95                     100
Location of play                  80                     85
How much the child liked the toy  30                     50
Result 3: Chi‐square test for cross‐tab
• Thank you…
