
Statistics Assignment 4

Classroom Fraud

Group 8
Pia Bakshi
Shruti Shukla
Srilakshmi Anumolu

Vikas Vimal

Introduction
The question asks the reader to examine both the behavioral and the statistical
origins and implications of fraudulent conduct by a teacher. We have been asked
to assess the data sets of two classes and determine whether either of them
appears to have been fraudulently manipulated.

Procedure
Classrooms A and B each have 22 students, who were expected to answer 44
multiple choice questions. A correct answer is recorded as the letter of the
correct option, and a wrong answer is recorded as 1, 2, 3 or 4, corresponding
to the wrong options a, b, c and d, respectively. 0 denotes an unanswered
question.
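
The coding scheme above can be turned into the two counts used later. The following is a minimal sketch, assuming each student's responses are stored as one string with one character per question; the three answer sheets are hypothetical and are not the assignment's data.

```python
# Hypothetical answer sheets: one string per student, one character per question.
# Letters mark correct answers; digits 1-4 mark wrong options a-d; 0 marks a blank.
sheets = [
    "ab0d2",
    "a1cd0",
    "3bcd4",
]

def is_correct(mark: str) -> bool:
    """A response counts as correct when it is recorded as a letter."""
    return mark.isalpha()

# Number of correct answers per student (input to the first statistical approach).
correct_per_student = [sum(is_correct(m) for m in sheet) for sheet in sheets]

# Number of students answering each question correctly (second approach).
correct_per_question = [sum(is_correct(m) for m in column) for column in zip(*sheets)]

# Number of unanswered questions per student.
blank_per_student = [sheet.count("0") for sheet in sheets]

print(correct_per_student)
print(correct_per_question)
print(blank_per_student)
```

The per-student list feeds the first statistical approach and the per-question list feeds the second.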

Before proceeding with the explanation, it is pertinent to state the
underlying assumption of our argument. We have assumed that the pattern of
answering the questions (number of right answers by a student) follows a normal
distribution, and that any fraudulent activity would make it appear otherwise.
We have adopted two complementary methods to arrive at a conclusion:
1. Two statistical approaches
- On the basis of the number of answers the students answered correctly
- On the basis of the number of students who got the correct answers
2. Behavioral approach

From the mean and the standard deviation of the data, we can determine the
expected probability of a student's number of correct answers falling in a given
range, in the first case, and of the number of students answering a question
correctly falling in a given range, in the second case.

On the basis of the number of answers the students got correctly:
We divided each student's total number of correct answers into ranges of 3
(3-6, 6-9, 9-12 and so on, where 3 is the lower limit and 6 the upper limit of
the first range). Next, we found the probability p of a student's total falling
in each range if the totals follow a normal distribution.
To test the goodness of fit of the observed distribution against the normal
distribution (Chi-square test), we found the expected number of students in each
range (np) by multiplying the number of students (n = 22) by the above-mentioned
p, and compared it with Sn, the number of students actually falling in that
range.
$$\chi^2 = \sum \frac{(S_n - np)^2}{np}$$

DOF = 7
χ² (Upper limit @95% confidence) = 16.01
χ² (Lower limit @95% confidence) = 1.69
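
The procedure described above can be sketched in a few lines. This is only an illustration under stated assumptions: the scores are hypothetical, the normal curve is fitted with the population standard deviation, each range is treated as including its lower limit and excluding its upper limit, and the function name `chi_square_normal` is ours. The limits 1.69 and 16.01 are the ones quoted in the text for DOF = 7.

```python
from statistics import NormalDist, mean, pstdev

def chi_square_normal(values, edges):
    """Chi-square statistic of binned values against a fitted normal curve.

    values: observed scores (e.g. correct answers per student)
    edges:  range boundaries; range i covers [edges[i], edges[i + 1])
    """
    n = len(values)
    fitted = NormalDist(mean(values), pstdev(values))
    total = 0.0
    for low, high in zip(edges, edges[1:]):
        s_n = sum(low <= v < high for v in values)   # observed count Sn
        p = fitted.cdf(high) - fitted.cdf(low)       # normal probability of the range
        total += (s_n - n * p) ** 2 / (n * p)        # this range's contribution
    return total

# Hypothetical scores for 22 students (not the assignment's data).
scores = [5, 7, 8, 9, 10, 10, 11, 12, 12, 13, 13,
          14, 14, 15, 15, 16, 17, 18, 19, 20, 22, 24]
edges = list(range(3, 28, 3))                        # 3-6, 6-9, ..., 24-27
statistic = chi_square_normal(scores, edges)

# Limits quoted in the text for DOF = 7.
LOWER, UPPER = 1.69, 16.01
print(round(statistic, 2), LOWER <= statistic <= UPPER)
```

The second printed value shows whether the statistic lies inside the quoted 95% range.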

Observations:
Classroom A:
DOF = 7
χ² (Upper limit @95% confidence) = 16.01
χ² (Lower limit @95% confidence) = 1.69

χ² obtained from the data for Classroom A is 8.49. The expected range of χ² is
1.69 to 16.01. A value falling in this range lies within the 95% confidence
zone, thus we can reject the hypothesis in our case that the data is fraudulent.

[Figure: Classroom A, expected (np, red bars) and actual (Sn, blue bars) number of students in each range of correct answers]

As is visible from the graph, the expected values follow a normal distribution
pattern. The values obtained from the data set, on the other hand, show a
positive disparity (blue bar higher than red), indicating a higher number of
correct answers in a given range than the expected value.
This leads us to believe that there is a small possibility of this data being
fraudulent.

Classroom B:
DOF = 7
χ² (Upper limit @95% confidence) = 16.01
χ² (Lower limit @95% confidence) = 1.69

χ² obtained from the data for Classroom B is 2.75. The expected range of χ² is
1.69 to 16.01. A value falling in this range lies within the 95% confidence
zone, thus we can reject the hypothesis in our case that the data is fraudulent.

[Figure: Classroom B, expected (np, red bars) and actual (Sn, blue bars) number of students in each range of correct answers]

As is visible from the graph, the expected values follow a normal distribution
pattern. The values obtained from the data set closely resemble the expected
normal distribution (red bars), and there is very little disparity (difference
between the blue bars and the red bars), suggesting that fraudulent manipulation
of the data is unlikely.

On the basis of the number of students who got the correct answers:
We divided the number of students into ranges of 3 (0-3, 3-6, 6-9, 9-12 and so
on, where 0 is the lower limit and 3 the upper limit of the first range) on the
basis of how many students answered each question correctly.
Next, we found the probability of a question being answered correctly by a
certain number of students. We grouped the data into boxes of three students,
i.e., the questions solved by 1, 2 or 3 students fall in the range 0-3, those
solved by 4, 5 or 6 students fall in the next box, and so on.
To test the association between the number of students and the probability of
answerability, we performed the Chi-square test on the given data and the
expected data. We arrived at the expected data from the mean and standard
deviation of the given data, assuming that it should follow a normal
distribution.
We found the expected number of questions in each range (np) by multiplying the
number of questions (n = 44, the value of n used in the tables) by the
above-mentioned p; Sn is the number of questions actually falling in a range.
$$\chi^2 = \sum \frac{(S_n - np)^2}{np}$$

Observations:

Classroom A:

DOF = 6
χ² (Upper limit @95% confidence) = 14.449
χ² (Lower limit @95% confidence) = 1.237
The χ² obtained from the data is 3776.12.
The expected range of χ² is 1.237 to 14.449. A value falling outside this range
lies outside the 95% confidence zone, thus we accept that our data for
Classroom A is fraudulent.

Classroom A

No. of questions solved by   Actual (Sn)   Upper limit   Lower limit   p      Expected (np)   (Sn-np)^2/np
0 - 3 students               [missing]     3             0             0.11   5.06            0.22
3 - 6 students               [missing]     6             3             0.33   14.53           6.25
6 - 9 students               [missing]     9             6             0.36   15.93           5.00
9 - 12 students              [missing]     12            9             0.15   6.66            0.27
12 - 15 students             11            15            12            0.02   1.06            93.53
15 - 18 students             [missing]     18            15            0.00   0.06            766.85
18 - 21 students             [missing]     21            18            0.00   0.00            2,904.00

n = 44
χ² = 3,776
mean = 6.29
Std Dev = 2.93
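
As a rough check on how the expected column is built, the sketch below recomputes np from the mean (6.29), standard deviation (2.93) and n (44) printed with the Classroom A table. It assumes that each range's p is the normal probability between its lower and upper limits; the variable names are illustrative, and Python's `statistics.NormalDist` stands in for whatever tool was actually used.

```python
from statistics import NormalDist

# Values printed with the Classroom A table.
MEAN, STD_DEV, N = 6.29, 2.93, 44
fitted = NormalDist(MEAN, STD_DEV)

# Range limits: 0-3, 3-6, ..., 18-21 students.
edges = [0, 3, 6, 9, 12, 15, 18, 21]

expected = []
for low, high in zip(edges, edges[1:]):
    p = fitted.cdf(high) - fitted.cdf(low)   # probability of the range under the normal curve
    expected.append(N * p)                   # expected count np

for (low, high), np_value in zip(zip(edges, edges[1:]), expected):
    print(f"{low}-{high}: np = {np_value:.2f}")
```

Under these assumptions the recomputed values should agree with the table's np column up to rounding of p.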

[Figure: Classroom A, Expected (np, green bars) and Actual (Sn, blue bars) number of questions for each range of students, 0-3 to 18-21]

As is visible from the graph, the expected values follow a normal distribution
pattern with a positive skew.

The values obtained from the data set, on the other hand, are negatively skewed
relative to the expected values (blue bar higher than green), indicating a
greater ability to answer correctly in a given range than expected. This leads
us to believe that there is a possibility of this data being fraudulent.
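
The size of the Classroom A statistic can be traced to individual ranges. The sketch below simply adds up the (Sn-np)^2/np column of the Classroom A table and reports how much of the total comes from the three upper ranges; the dictionary layout and names are ours.

```python
# (Sn - np)^2 / np column of the Classroom A table, keyed by range of students.
contributions = {
    "0-3": 0.22, "3-6": 6.25, "6-9": 5.00, "9-12": 0.27,
    "12-15": 93.53, "15-18": 766.85, "18-21": 2904.00,
}

chi_square = sum(contributions.values())

# Share of the statistic that comes from the three upper ranges.
upper = ("12-15", "15-18", "18-21")
upper_share = sum(contributions[r] for r in upper) / chi_square

print(f"chi-square = {chi_square:.2f}")
print(f"share from ranges above 12 students = {upper_share:.1%}")
```

With the table's figures, nearly all of the total comes from the ranges above 12 students, where np is close to zero.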

Classroom B:

DOF = 6
χ² (Upper limit @95% confidence) = 14.449
χ² (Lower limit @95% confidence) = 1.237
χ² obtained from the data for Classroom B is 5.49.
A value falling in this range lies within the 95% confidence zone, thus we can
reject the hypothesis in our case that the data is fraudulent.

[Figure: Classroom B, Expected (np, green bars) and Actual (Sn, blue bars) number of questions for each range of students, 0-3 to 18-21]

As is visible from the graph, the expected values follow a normal distribution
pattern with a negative skew.

The values obtained from the data set show a similar negative skew (green bar
higher than blue). This leads us to believe that there is no statistically
significant possibility of this data being fraudulent.

Classroom B

No. of questions solved by   Actual (Sn)   Upper limit   Lower limit   p      Expected (np)   (Sn-np)^2/np
0 - 3 students               [missing]     3             0             0.15   6.67            0.07
3 - 6 students               14            6             3             0.23   10.00           1.60
6 - 9 students               [missing]     9             6             0.24   10.36           1.09
9 - 12 students              10            12            9             0.17   7.41            0.90
12 - 15 students             [missing]     15            12            0.08   3.66            1.49
15 - 18 students             [missing]     18            15            0.03   1.25            0.05
18 - 21 students             [missing]     21            18            0.01   0.29            0.29

n = 44
χ² = 5.49
mean = 6.29
Std Dev = 2.93
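
The Classroom B decision can be checked directly from the table. The sketch below adds up the (Sn-np)^2/np column and compares the total with the limits quoted in the text for DOF = 6; the helper name `within_limits` is ours.

```python
# (Sn - np)^2 / np column of the Classroom B table, in range order 0-3 ... 18-21.
contributions_b = [0.07, 1.60, 1.09, 0.90, 1.49, 0.05, 0.29]

# Limits quoted in the text for DOF = 6.
LOWER, UPPER = 1.237, 14.449

def within_limits(statistic: float, lower: float, upper: float) -> bool:
    """True when the statistic lies inside the quoted 95% range."""
    return lower <= statistic <= upper

chi_square_b = sum(contributions_b)
print(round(chi_square_b, 2), within_limits(chi_square_b, LOWER, UPPER))

# Classroom A's statistic from the text, for comparison.
print(within_limits(3776.12, LOWER, UPPER))
```

The column total reproduces the 5.49 quoted for Classroom B, which lies inside the range, while Classroom A's 3776.12 does not.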

Behavioral Approach
We base our deduction on a simple premise: the examination is intended to
reflect the class's academic prowess in the subject and, by corollary, the
teacher's ability and efficiency.
If we set aside the statistical or quantitative view and consider the
psychological or behavioral side of the issue, we arrive at a simple logical
argument: poor performance in an examination by the class reflects poorly on
the teacher's efficiency and effectiveness, too.

The question clearly states that there is definite fraudulent activity in the
examination conducted, in either Classroom A or Classroom B. If the teacher
were to manipulate the results at all, she/he would move them in the direction
most favorable to her/him, that is, towards improved class performance.
In light of that, she/he would raise the cumulative results of her/his class
and, as indicated in the graphs comparing actual performance with expected
performance, the results of A seem tampered with.
It is also important to note that increased class performance (in terms of
marks/student output) suggests efficient performance on the part of the teacher.
Taking the statistical and behavioral studies of the data together (under the
assumption of a normal distribution), it is reasonable to conclude that there is
a chance of fraudulent activity, and the data lead us to believe that this
manipulation occurred in Classroom A.
