
Cross‐tabulation and Chi‐square test

Business Research Methodology



Dr. Gunjan Malhotra
Assistant Professor
mailforgunjan@gmail.com
Simple Tabulation for Ranking Type Questions – Bivariate variables
• Suppose we have ordinal‐scale questions, e.g.:

• Q. Rank the 5 brands of refrigerators shown below on a scale of 1 to 5 (1=Best and 5=Worst), according to your opinion.

BRAND RANK
Whirlpool ___
Kelvinator ___
Godrej ___
Samsung ___
Videocon ___
Output table formulation

Table 1
BRAND         RANK 1   RANK 2   RANK 3   RANK 4   RANK 5
Whirlpool       x        x        x        x        x
Kelvinator      x        x        x        x        x
Godrej          x        x        x        x        x
Samsung         x        x        x        x        x
Videocon        x        x        x        x        x
Univariate tables
• To construct univariate tables, take up one column at a time and build a separate frequency table or chart for it. E.g.

BRAND         No. of people who ranked it No. 1
Whirlpool      90
Kelvinator     60
Godrej         70
Samsung        32
Videocon       45
TOTAL         297

• We can calculate the percentage of the total for each brand. E.g. 90/297 works out to 0.303, or 30.3% who ranked Whirlpool as No. 1, and so on.
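The percentage calculation above can be sketched in a few lines of Python. This is only an illustration; the variable names (`first_rank_counts`, `first_rank_pct`) are assumptions introduced here, and the counts are the ones from the frequency table.

```python
# Number of respondents who ranked each brand No. 1 (from the frequency table).
first_rank_counts = {
    "Whirlpool": 90,
    "Kelvinator": 60,
    "Godrej": 70,
    "Samsung": 32,
    "Videocon": 45,
}

# Total number of respondents across all brands.
total = sum(first_rank_counts.values())

# Each brand's share of the total, as a percentage rounded to one decimal.
first_rank_pct = {
    brand: round(100 * count / total, 1)
    for brand, count in first_rank_counts.items()
}

for brand, count in first_rank_counts.items():
    print(f"{brand:<12}{count:>5}{first_rank_pct[brand]:>8.1f}%")
print(f"{'TOTAL':<12}{total:>5}")
```

For Whirlpool this gives 90/297, i.e. 30.3%, matching the figure quoted above.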
Simple Tabulation for Rating Type Questions
Q. Rate the following attributes of LIRIL soap on a scale of 1 to 5 (1=Very Unsatisfactory to 5=Very Satisfactory).
Lather          __________________________________
                1        2        3        4        5
Fragrance       __________________________________
                1        2        3        4        5

• For each attribute, the number of people who rated it as 1, 2, 3, 4 
or 5 can be tabulated in separate tables like:
RATING                 Lather
1 30
2 25
3 50
4 76
5 22
TOTAL             203
Alternatively, we can tabulate ratings for all attributes as follows:

RATING    LATHER    FRAGRANCE    ATR.3    ATR.4    ATR.5
1           x           x          x        x        x
2           x           x          x        x        x
3           x           x          x        x        x
4           x           x          x        x        x
5           x           x          x        x        x
Second Stage Analysis – Cross Tabulation
• A cross‐tabulation can be done by combining any two of the questions and tabulating the data together. This is a 2‐variable cross tabulation.

• E.g. a cross‐tabulation between Brand Preference for brands of tea and Region to which Respondent belongs.

BRAND          Regionwise Buyers (No.)
               North       South   East   West        Total
Brooke Bond    25 (50%)    20      20     15 (30%)    80 (40%)
Lipton         10 (20%)    15      20      5 (10%)    50 (25%)
Tata           15 (30%)    15      10     30 (60%)    70 (35%)
Total          50 (100%)   50      50     50 (100%)   200 (100%)

– An extension of this could be adding percentages.


Calculating Percentages in a Cross Tabulation
• In the above example, we can compute percentages
   • row‐wise,
   • column‐wise, or
   • on the total sample of 200.

• The general rule is to calculate percentages across the dependent variable (across Brand categories), so that within each category of the independent variable the percentages add up to 100%.

• Assume that brand preference depends on the region to which respondents belong, i.e. “Brand” is the dependent variable and “Region” is the independent variable.

• The interpretation is: “Out of 50 respondents from the Northern Region, 50% buy Brooke Bond, 20% buy Lipton, and 30% buy Tata Tea”.
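As a rough sketch of this rule, the snippet below computes percentages within each region (the independent variable) across the brand categories (the dependent variable), using the counts from the tea cross-tabulation. The helper name `column_percentages` and the variable names are assumptions made for illustration.

```python
regions = ["North", "South", "East", "West"]

# Buyers per brand, one count per region, in the order given in `regions`.
buyers = {
    "Brooke Bond": [25, 20, 20, 15],
    "Lipton": [10, 15, 20, 5],
    "Tata": [15, 15, 10, 30],
}

# Column (region) totals: the base for each percentage.
region_totals = [sum(column) for column in zip(*buyers.values())]


def column_percentages(counts, totals):
    """Express every cell as a percentage of its column total."""
    return {
        brand: [round(100 * n / t, 1) for n, t in zip(row, totals)]
        for brand, row in counts.items()
    }


col_pct = column_percentages(buyers, region_totals)

print("Brand".ljust(14) + "".join(r.rjust(8) for r in regions))
for brand, row in col_pct.items():
    print(brand.ljust(14) + "".join(f"{p:>7.1f}%" for p in row))
```

Each region's column adds up to 100%, which is what allows the statement "out of 50 respondents from the Northern Region, 50% buy Brooke Bond".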
Chi‐square test
1.   Univariate ‐ Chi‐square test for goodness of fit

• Test for significance in the analysis of frequency distributions.
• Each question represents a variable under study.
• Compare observed frequencies with expected frequencies.

2.   Bivariate ‐ Chi‐square test for relatedness or independence

– Chi‐Square allows testing for significant differences between groups.
[Two different questions in a questionnaire may represent two variables.]
Chi‐square test for Goodness of Fit
• is used to analyze probabilities of multinomial distribution trials along a single dimension.
• The Chi‐square goodness‐of‐fit test compares the expected (theoretical) frequencies of categories from a population distribution to the observed (actual) frequencies from a distribution to determine whether there is a difference between what was expected and what was observed.

$$\chi^2 = \sum_i \frac{(O_i - E_i)^2}{E_i}$$

where O_i is the observed frequency and E_i the expected frequency of category i.
Example 1: Chi‐square test for goodness of fit ‐ Equal expected frequency
• The table outlines the attitudes of 60 people towards US military bases in Australia. A chi‐square test for goodness of fit will allow us to determine if differences in frequency exist across response categories.
• Ho: There is no significant difference across frequency of attitudes towards military bases in Australia.

Attitude towards US military     Frequency of response
bases in Australia               (Observed frequencies)

In favour                          8
Against                           20
Undecided                         32
Output 1: Chi‐Square test – equal expected 
frequencies
Interpretation 1: Chi‐square test – equal expected frequencies

• The output shows that the chi‐square value is significant (p < .05), so Ho is rejected.

• Therefore, it can be concluded that there are significant differences in the frequency of attitudes towards military bases in Australia.

• The results show that people are largely undecided on this issue, chi‐square (2, N=60) = 14.4, p < .05.
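The reported statistic can be checked by hand. The sketch below is not the SPSS procedure used for the output; it is a plain-Python illustration, with `chi_square_gof` as a hypothetical helper name. It assumes equal expected frequencies (60 / 3 = 20 per category) and relies on the fact that, for 2 degrees of freedom, the upper-tail probability of the chi-square distribution is exactly exp(-chi2 / 2).

```python
import math


def chi_square_gof(observed, expected):
    """Chi-square statistic: sum of (O - E)^2 / E over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


observed = [8, 20, 32]                    # In favour, Against, Undecided
n = sum(observed)                         # 60 respondents
expected = [n / len(observed)] * len(observed)   # 20 per category under Ho

chi2 = chi_square_gof(observed, expected)
df = len(observed) - 1                    # categories minus one

# Valid only for df = 2: P(X > chi2) = exp(-chi2 / 2).
p_value = math.exp(-chi2 / 2)

print(f"chi-square({df}, N={n}) = {chi2:.1f}, p = {p_value:.4f}")
```

The statistic comes out at 14.4 with 2 degrees of freedom, and the p-value is well below .05, in line with the interpretation above.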
Example 2: Chi‐square test for goodness of fit – Unequal expected frequencies
• Sometimes the expected frequencies are not evenly balanced across categories.
• E.g. the expected frequencies for the three categories were 15, 15 and 30.

Attitude towards US military     Frequency of response      Expected frequency
bases in Australia               (Observed frequencies)     of responses

In favour                          8                         15
Against                           20                         15
Undecided                         32                         30
Output 2: Chi‐square test – unequal 
expected frequencies
Interpretation 2: Chi‐square test – unequal expected frequencies
• The output shows that the chi‐square value is not significant (p = .079 > .05), so Ho is not rejected.

• Therefore, it can be concluded that the observed frequencies of attitudes towards military bases in Australia do not differ significantly from the expected frequencies.

• The results show that people are largely undecided on this issue, in line with the expected frequencies, chi‐square (2, N=60) = 5.067, p > .05.
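To see where the value 5.067 comes from, the following sketch lists each category's contribution (O - E)^2 / E for the unequal expected frequencies 15, 15 and 30. It is a hand calculation in plain Python, not the SPSS output; the variable names are assumptions, and the p-value again uses the closed form exp(-chi2 / 2), which holds only for 2 degrees of freedom.

```python
import math

categories = ["In favour", "Against", "Undecided"]
observed = [8, 20, 32]
expected = [15, 15, 30]    # unequal expected frequencies from the table

# Contribution of each category to the chi-square statistic.
contributions = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
chi2 = sum(contributions)
df = len(categories) - 1

# Valid only for df = 2: P(X > chi2) = exp(-chi2 / 2).
p_value = math.exp(-chi2 / 2)

for name, o, e, c in zip(categories, observed, expected, contributions):
    print(f"{name:<10} O={o:>2}  E={e:>2}  (O-E)^2/E = {c:.3f}")
print(f"chi-square({df}, N={sum(observed)}) = {chi2:.3f}, p = {p_value:.3f}")
```

Most of the statistic comes from the "In favour" category (49/15), while "Undecided" contributes very little (4/30); the total is about 5.067 with p of about .079.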
Chi‐square test of Independence
• Qualitative variables ‐ nominal data

• Used to test if the two variables are statistically associated with each other significantly.

• Used to analyze the frequencies of two variables with multiple categories to determine whether the two variables are independent.

• It is possible to do a cross‐tabulation (and a chi‐squared test – with given table value, df, confidence level) for any two nominal variables in the survey.
Example 1: Chi‐square test for cross‐tab

• Let us assume that we have conducted a consumer survey for a brand of detergent. One of the questions dealt with the income category of the respondent. Another asked the respondent to rate his purchase intentions.

• Ho: There is no significant association between Respondent Income and Purchase Intention.
S. No.   INCOME            CODE   INTENT      INTCODE
1        Less Than 5000    1      NONE        1
2        Less Than 5000    1      LOW         2
3        Less Than 5000    1      LOW         2
4        Less Than 5000    1      NONE        1
5        Less Than 5000    1      HIGH        3
6        5001-10000        2      LOW         2
7        5001-10000        2      HIGH        3
8        5001-10000        2      VERY HIGH   4
9        5001-10000        2      HIGH        3
10       5001-10000        2      LOW         2
11       10001-20000       3      HIGH        3
12       10001-20000       3      VERY HIGH   4
13       10001-20000       3      CERTAIN     5
14       10001-20000       3      HIGH        3
15       10001-20000       3      VERY HIGH   4
16       Above 20000       4      HIGH        3
17       Above 20000       4      CERTAIN     5
18       Above 20000       4      VERY HIGH   4
19       Above 20000       4      CERTAIN     5
20       Above 20000       4      CERTAIN     5
Both variables are coded.

Income codes and their equivalent incomes are:

Code   Income in Rs. per Month
1      Less than 5000
2      5001 to 10,000
3      10,001 to 20,000
4      Above 20,000

Purchase Intention codes are as follows:

Code   Explanation (Value Labels for the Variable)
1      None – No intention to buy
2      Low – Low intention to buy
3      High – High intention
4      Very High – Very high intention
5      Certain – Certain to buy
INCOME per Month by PURCHASE INTENTION

                       Income per Month in Rs. →
Purchase   Code   Less than   5001-    10001-   Above    TOTAL
Intent            5000        10000    20000    20000
None       1      2           0        0        0        2
Low        2      2           2        0        0        4
High       3      1           2        2        1        6
V. High    4      0           1        2        1        4
Certain    5      0           0        1        3        4
TOTAL             5           5        5        5        20

Cross‐tabulation of Code (column: income per month) and Intcode (row: purchase intent).
Result 1: Chi‐Square test for cross‐tab
Interpretation 1: Chi‐square test for cross‐tab
• The cross‐tabulation shows the number of respondents falling into each cell (a cell is the combination of one INCOME category with one PURCHASE INTENTION category).

• The first line of the chi‐squared test reads a significance level of 0.097. Since 0.097 is below 0.10, the chi‐squared test is showing a significant association between these two variables at a 90 percent confidence level (equivalent to a 0.10 significance level).

• Thus, we conclude that at the 90 percent confidence level, PURCHASE INTENTION and INCOME are associated significantly with each other. This may lead us to conclude that the price of the detergent is important in its purchase.
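The significance level of 0.097 can be approximately reproduced from the income by purchase-intention cross-tabulation. The sketch below is a plain-Python illustration, not the statistical package's own routine. Expected counts are computed as (row total × column total) / N, and the p-value uses a closed form for the chi-square upper tail that is valid only when the degrees of freedom are even (here df = 12). The function names `chi_square_independence` and `chi2_sf_even_df` are assumptions introduced for this example.

```python
import math


def chi_square_independence(table):
    """Pearson chi-square statistic and degrees of freedom for a 2-D table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df


def chi2_sf_even_df(x, df):
    """Upper-tail probability P(X > x); this closed form needs an even df."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("closed form only holds for positive even df")
    half = x / 2
    return math.exp(-half) * sum(
        half ** i / math.factorial(i) for i in range(df // 2)
    )


# Rows: None, Low, High, V. High, Certain; columns: the four income codes.
table = [
    [2, 0, 0, 0],
    [2, 2, 0, 0],
    [1, 2, 2, 1],
    [0, 1, 2, 1],
    [0, 0, 1, 3],
]

chi2, df = chi_square_independence(table)
p_value = chi2_sf_even_df(chi2, df)
print(f"chi-square = {chi2:.3f}, df = {df}, p = {p_value:.3f}")
```

With these counts the statistic is about 18.667 on 12 degrees of freedom, and the p-value falls between .05 and .10, which is why the association is significant at the 90 percent level but would not be at 95 percent.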
Example 2: Chi‐square test for Cross‐tabs
• Suppose the researcher wants to examine the association between educational background (independent variable) of PGDM students and their performance in terms of grade (dependent variable) secured.
• A bivariate cross‐tabulation has been done by combining the above two variables and tabulating the data together.
• Here an assumption is made by our group based on information extracted from the database (performance) of B‐schools.
• We want to test, at the 90% and 95% confidence levels, the significance of the association between EDUCATIONAL BACKGROUND of PGDM students and their PERFORMANCE in terms of GRADE.
• Further, the variables are coded.

• Educational backgrounds and their equivalent codes are:
Educational background   Code
B.Com                    1
B.E.                     2
B.Sc.                    3
B.B.A.                   4
B.A.                     5

• Grade codes are as follows:
Grade Obtained   Grade Code
A                1
B                2
C                3
• These two variables were cross‐tabulated for twenty‐five observations.
• A cross‐tabulation with a Chi‐squared test was performed using the SPSS package.
Input data table
S.No.   Roll No.   Background   Code   Grade   Grdcode
1       1          B.Com        1      B       2
2       2          B.Com        1      C       3
3       3          B.Com        1      A       1
4       4          B.Com        1      C       3
5       5          B.Com        1      B       2
6       6          B.E.         2      A       1
7       7          B.E.         2      A       1
8       8          B.E.         2      A       1
9       9          B.E.         2      B       2
10      10         B.E.         2      A       1
11      11         B.Sc.        3      B       2
12      12         B.Sc.        3      B       2
13      13         B.Sc.        3      C       3
14      14         B.Sc.        3      C       3
15      15         B.Sc.        3      C       3
16      16         BBA          4      A       1
17      17         BBA          4      B       2
18      18         BBA          4      C       3
19      19         BBA          4      C       3
20      20         BBA          4      B       2
21      21         B.A.         5      C       3
22      22         B.A.         5      C       3
23      23         B.A.         5      C       3
24      24         B.A.         5      C       3
25      25         B.A.         5      B       2
Output table 2: Grades Vs Entry Qualification
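The SPSS output table itself is not reproduced here. As a rough sketch, the same grade by background cross-tabulation can be rebuilt from the input data table by counting (background, grade) pairs. The record list below transcribes the 25 rows of the input data in roll-number order; the variable names are assumptions made for this illustration.

```python
from collections import Counter

backgrounds = ["B.Com", "B.E.", "B.Sc.", "BBA", "B.A."]
grades = ["A", "B", "C"]

# (background, grade) for roll numbers 1 to 25, five students per background.
records = (
    [("B.Com", g) for g in "BCACB"]
    + [("B.E.", g) for g in "AAABA"]
    + [("B.Sc.", g) for g in "BBCCC"]
    + [("BBA", g) for g in "ABCCB"]
    + [("B.A.", g) for g in "CCCCB"]
)

# Count how many students fall into each (background, grade) cell.
cell_counts = Counter(records)

# Rows are grades, columns are backgrounds, as in "Grades Vs Entry Qualification".
crosstab = {g: [cell_counts[(b, g)] for b in backgrounds] for g in grades}

print("Grade".ljust(8) + "".join(b.rjust(7) for b in backgrounds) + "  Total")
for g in grades:
    row = crosstab[g]
    print(g.ljust(8) + "".join(str(n).rjust(7) for n in row) + str(sum(row)).rjust(7))
```

Counting this way gives row totals of 6 A grades, 8 B grades and 11 C grades across the 25 students.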
Result 2: Chi‐Square test for cross‐tab
Interpretation 2: Chi‐Square test for cross‐tab
• The Chi‐square test revealed a significant association between the educational background of the students and their performance in terms of grade.

• A significance level of 0.089 (Pearson’s) has been achieved. This means the Chi‐square test is showing a significant association between the above two variables at the 91.1% confidence level (100 – 8.9).

• Thus we conclude that at the 90% confidence level, educational background of PGDM students and their performance in terms of grade are associated significantly with each other, whereas the association is not significant at the 95% confidence level.
• From the obtained contingency coefficient (C) of 0.596, it can be inferred that the association between the dependent and independent variable is fairly strong, as the value 0.596 is closer to 1 than to 0.

• From the Lambda asymmetric value (with grade code dependent) of 0.286, we conclude that there is a moderate level of association between the above two variables. This lambda value tells us that there is a 28.6% reduction in error in predicting the grade of a student when we know his educational background.

• This leads us to conclude that educational background plays a vital role in the performance of the students of the PGDM course.
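The three figures quoted above (significance 0.089, C = 0.596, lambda = 0.286) can be cross-checked from the 25 observations in the input data table. The sketch below is a hand calculation in plain Python, not the SPSS procedure. It assumes the usual definitions: C = sqrt(chi2 / (chi2 + N)), and Goodman and Kruskal's lambda = (E1 - E2) / E1, where E1 is the number of prediction errors when always guessing the most common grade and E2 the number of errors when guessing the most common grade within each background. The p-value uses a closed form that holds only for even degrees of freedom (here df = 8). All variable names are assumptions.

```python
import math

# Rows: grades A, B, C; columns: B.Com, B.E., B.Sc., BBA, B.A.
# (counts tallied from the input data table).
table = [
    [1, 4, 0, 1, 0],
    [2, 1, 2, 2, 1],
    [2, 0, 3, 2, 4],
]

columns = list(zip(*table))
row_totals = [sum(row) for row in table]
col_totals = [sum(col) for col in columns]
n = sum(row_totals)

# Pearson chi-square with expected = row total * column total / N.
chi2 = sum(
    (table[i][j] - row_totals[i] * col_totals[j] / n) ** 2
    / (row_totals[i] * col_totals[j] / n)
    for i in range(len(table))
    for j in range(len(columns))
)
df = (len(table) - 1) * (len(columns) - 1)

# Upper-tail probability; this closed form is valid for even df only.
half = chi2 / 2
p_value = math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(df // 2))

# Contingency coefficient.
contingency_c = math.sqrt(chi2 / (chi2 + n))

# Lambda with grade as the dependent variable.
errors_without = n - max(row_totals)
errors_with = sum(total - max(col) for total, col in zip(col_totals, columns))
lambda_grade = (errors_without - errors_with) / errors_without

print(f"chi-square = {chi2:.2f}, df = {df}, p = {p_value:.3f}")
print(f"C = {contingency_c:.3f}, lambda = {lambda_grade:.3f}")
```

Under these assumptions the calculation gives a chi-square of 13.75 on 8 degrees of freedom, with prediction errors falling from 14 to 10 once educational background is known, which is the 28.6% reduction described above.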
Example 3: Chi‐square test for cross‐tab
• A manufacturer was interested in assessing how children aged four, five and six play with one of the manufacturer’s toys. Each child was asked 15 questions. Following the child’s completed interview, the parent was asked the same 15 questions to validate the child’s answers. The following table lists the number of responses to selected items from the survey. One hundred interviews were conducted with both the parent and the child. Notice that item response rates varied from question to question. For each question, state at least one method that could be used to attempt to correct for this item nonresponse bias.

Question                            # Children     # Parents
                                    Responding     Responding
Age of child                        95             100
Location of play                    80             85
How much the child liked the toy    30             50
Result 3: Chi‐square test for cross‐tab
• Thank you…
