
FREQUENCY DISTRIBUTIONS

How to organize, present and analyze data

[Figure: chart titled "Content of 60s Pop Songs" with the categories Yeah, Actual Lyrics, Baby, Oooh]

FREQUENCY DISTRIBUTIONS
Consider the following example
How old is John?
How old is Mary?
How old is Frank?
How old am I?

EXAMPLE: DISCRETE VARIABLE


Take a sample of 40 values representing the age of EHL students (in whole years, hence a discrete variable). File: 40Ages manual
Count the number of times each age appears in the sample and tally it on the given diagram.

ABSOLUTE FREQUENCY DISTRIBUTION


Here the y-values represent the frequency in absolute values

RELATIVE FREQUENCY DISTRIBUTION


Here the y-values represent the frequency as a percentage of the sample size (40 values):

$\frac{4}{40} = 10\%$

$\frac{2}{40} = 5\%$

$\frac{3}{40} = 7.5\%$
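
As a minimal sketch, both distributions can be computed in a few lines of Python. The `ages` list is a small sample invented for illustration (it is not the EHL sample of 40 students), and the variable names are assumptions.

```python
from collections import Counter

# Hypothetical sample of ages, invented for illustration only.
ages = [19, 20, 20, 21, 21, 21, 22, 22, 23, 25]

# Absolute frequency: how many times each age appears in the sample.
absolute = Counter(ages)

# Relative frequency: absolute frequency divided by the sample size.
n = len(ages)
relative = {age: count / n for age, count in sorted(absolute.items())}

for age, share in relative.items():
    print(f"{age}: n = {absolute[age]}, f = {share:.1%}")
```

The relative frequencies always add up to 100%, which is a quick check on the tally.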

THE MOST FREQUENT VALUE: THE MODE


The MODE is found with the Excel function: MODE(range)
Result: 21 years
There are 8 students aged 21 in this sample. This is the LARGEST frequency, so 21 is the MODE.
The set of these 8 students aged 21 is called the MODAL CLASS.

SPECIAL CASE
This frequency distribution has two (nearly equal) peaks: Bi-modal distribution
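
A short sketch of how the mode, or several modes, can be found in Python with the standard library. Both samples are invented for illustration.

```python
from statistics import multimode

# Hypothetical samples, invented for illustration only.
unimodal = [19, 20, 21, 21, 21, 22, 23]
bimodal = [19, 20, 20, 20, 21, 24, 24, 24, 25]

modes_uni = multimode(unimodal)  # a single most frequent value
modes_bi = multimode(bimodal)    # two values tie for the largest frequency

print(modes_uni)  # [21]
print(modes_bi)   # [20, 24]
```

Note that `multimode` only reports exact ties. Two peaks that are nearly equal, as on this slide, have to be judged from the graph.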

THE MEDIAN VALUE: A DEMOCRATIC VALUE

The median divides the data into two EQUAL parts:
50% of the data values are BELOW the MEDIAN value
50% of the data values are ABOVE the MEDIAN value
Excel function: MEDIAN(range)

POSITION OF THE MEDIAN


The MEDIAN value is 21.5 years (found with Excel).
Notice that there are 20 students younger and 20 students older than the MEDIAN.

WHAT IS THE MEDIAN ?


Median: the central data point of a data set after sorting.
If the data set has an odd number of values, it is literally the data value in the center of the sorted data set.
If the data set has an even number of values, it is the average of the two values closest to the center of the sorted data set.

Example: annual precipitation in Geneva between 1976 and 1993 (mm)


583 730 890 688 777 528 958 901 875 884 926 969 524 1258 756 850 619 939

After sorting:

524 528 583 619 688 730 756 777 850 875 884 890 901 926 939 958 969 1258

To find the position of the median: $\frac{n+1}{2} = \frac{18+1}{2} = 9.5$, that is, the 9.5th value out of 18, the center of the data set.

Here: $\frac{850+875}{2} = 862.5$
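
The same steps can be sketched in Python using the Geneva figures above. The variable names are assumptions, and the result is cross-checked against the standard library's `median`.

```python
from statistics import median

precipitation_mm = [583, 730, 890, 688, 777, 528, 958, 901, 875,
                    884, 926, 969, 524, 1258, 756, 850, 619, 939]

ordered = sorted(precipitation_mm)
n = len(ordered)
position = (n + 1) / 2  # 9.5: halfway between the 9th and 10th values

# n is even, so average the two values around the center (ranks 9 and 10).
lower, upper = ordered[n // 2 - 1], ordered[n // 2]
median_by_hand = (lower + upper) / 2

assert median_by_hand == median(precipitation_mm)
print(position, lower, upper, median_by_hand)  # 9.5 850 875 862.5
```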


THE AVERAGE (AVG) VALUE: A BALANCED MEASURE

Symbol: $\bar{x}$

Formula: $\bar{x} = \frac{\sum x_i}{n}$

$x_i$: the values of the variable
$\sum$: SUM
$\sum x_i$: the SUM of ALL the given values
n = number of values
Excel function: AVERAGE(range)
NB: In many textbooks the average is called the mean. This gives the honest average a poor image, so the term is not used in this course.
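
As an illustration, here is the average of the Geneva precipitation figures from the median example, computed as the sum of all values divided by the number of values. The variable names are assumptions.

```python
precipitation_mm = [583, 730, 890, 688, 777, 528, 958, 901, 875,
                    884, 926, 969, 524, 1258, 756, 850, 619, 939]

n = len(precipitation_mm)      # number of values
total = sum(precipitation_mm)  # the sum of all the given values
avg = total / n

print(total, n, round(avg, 2))  # 14655 18 814.17
```

Here the average (about 814 mm) is lower than the median (862.5 mm), so the two measures do not have to agree.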


POSITION OF THE AVG


The AVG value is 21.65 years (found with Excel).
This point on the Age axis can be considered the CENTROID of this distribution, hence the idea of a balanced value.


QUICK QUIZ
You surveyed 10 different families to see how many children they have. You obtained the following observations: 0, 0, 1, 1, 2, 2, 2, 3, 4, 5
Indicate whether each statement is true or false.
The mode is 5
The average is 2.5
The median is 2
The variable is quantitative
The variable is quantitative continuous


THE AVG OF CLASSIFIED DATA


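When data are classified or grouped in any way, the average can be calculated with the following formula:

Formula: $\bar{x} = \frac{\sum n_i x_i}{\sum n_i}$

$x_i$ = the value of the variable at the MIDDLE of the frequency class
$n_i$ = the value of the frequency (the number of observations in the class)

File: 40Ages computer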
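
A minimal sketch of the grouped average, assuming the usual weighted form: the sum of (frequency × class midpoint) divided by the total frequency. The classes and frequencies below are invented for illustration and are not taken from the 40Ages file.

```python
# Hypothetical age classes (lower bound, upper bound) and their frequencies.
classes = [(18, 20), (20, 22), (22, 24), (24, 26)]
frequencies = [6, 18, 12, 4]

# The middle of each class stands in for every value inside it.
midpoints = [(low + high) / 2 for low, high in classes]

n = sum(frequencies)
grouped_avg = sum(n_i * x_i for n_i, x_i in zip(frequencies, midpoints)) / n

print(midpoints, n, grouped_avg)
```

Because each value is replaced by its class midpoint, the result is an approximation of the average of the raw data.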


SYMMETRICAL DISTRIBUTIONS
In perfectly symmetrical frequency distributions, the relative positions of MODE, MEDIAN and AVG coincide


ASYMMETRICAL DISTRIBUTIONS
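In an asymmetrical frequency distribution the relative positions of these three parameters appear as shown. This distribution is skewed to the right, which typically gives MODE < MEDIAN < AVG. The mirror image of this situation is also possible.
[Figure: right-skewed distribution with the positions of the MODE, MEDIAN and AVG marked]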


THE RANGE OF A GROUP OF VALUES


Age distribution of 40 students
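
The range is the difference between the largest and the smallest value. As a small sketch, here it is for the Geneva precipitation figures used in the median example (the 40 ages are not reproduced in these slides); the variable names are assumptions.

```python
precipitation_mm = [583, 730, 890, 688, 777, 528, 958, 901, 875,
                    884, 926, 969, 524, 1258, 756, 850, 619, 939]

smallest = min(precipitation_mm)
largest = max(precipitation_mm)
value_range = largest - smallest

print(smallest, largest, value_range)  # 524 1258 734
```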


QUICK QUIZ
From the following frequency distribution, indicate whether each statement is true or false.
[Figure: chart of the frequency distribution; frequency axis from 0 to 60, value axis from 1 to 17]

The distribution is left skewed
The mode is smaller than the median and the average
Mode = Median = Average
The mode is between 50 and 60
The average is higher than 5
The median is between 4 and 5


EXERCISE 1
You are given the sizes of the last 20 burgers sold at a fast-food restaurant. Answer the following questions.
What is the type of the variable Burger Size?
Compute the range.
Calculate the mode, median and average.
Classify the data into 4 classes and compute the frequency distribution.
Represent the relative frequency distribution graphically and comment on it.


QUICK QUIZ
The table below reports the number of clients who came to your restaurant on each of the last 50 days.
Compute the missing values.
xi: 25, 26, 27, 28, 29, 30, > 30
ni: 5
fi: 10.00%, 12.00%, 18.00%, 22.00%, 10.00%
Fi: 10.00%, 32.00%, 50.00%, 72.00%, 100.00%

Indicate whether each statement is true or false.
x3 = 27 clients
The sample size is 50 clients
f4 = 18% of days 28 clients came to your restaurant
The median is 28 clients
The average cannot be calculated

9 11 5


EXERCISE 2
Using data from the customer satisfaction feedback for one service, answer the following questions:
What is the type of the variable?
Compute the absolute and relative frequency distributions.
Graph the relative frequency and comment on your results.


GRAPHICAL TOOLS
Use of different graphical representations depends on the nature (qualitative or quantitative) of the variable being studied.

Qualitative Variable
Circle diagram
Bar chart

Quantitative Variable

Discrete
Bar chart
Stem and Leaf
Box Plot

Continuous
Histogram
Density Curve
Box Plot


GRAPHICAL TOOLS: CIRCLE DIAGRAM


Represents the categories of the variable as sectors of a disc. The area of each sector is determined by an angle that is proportional to the observed relative frequency:
$\alpha_i = 360^\circ \times f_i$
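
A small sketch of the angle calculation, where each angle is 360 degrees times the relative frequency of the category. The satisfaction categories and counts are invented for illustration.

```python
# Hypothetical absolute frequencies for a qualitative variable.
counts = {"Satisfied": 24, "Neutral": 10, "Dissatisfied": 6}

n = sum(counts.values())

# Angle of each sector: 360 degrees times the relative frequency count / n.
angles = {category: 360 * count / n for category, count in counts.items()}

for category, angle in angles.items():
    print(f"{category}: {angle:.0f} degrees")
```

The angles add up to 360 degrees, just as the relative frequencies add up to 100%.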


GRAPHICAL TOOLS: BAR CHART


Represents the various possible values of the variable according to their absolute or relative frequency.


GRAPHICAL TOOLS : STEM AND LEAF PLOTS


Annual precipitation in Geneva between 1976 and 1993 (mm):
583 730 890 688 777 528 958 901 875 884 926 969 524 1258 756 850 619 939

Procedure:
Separate each number into a stem and a leaf. Here, each value is first rounded to the nearest ten; the number of hundreds is then the stem and the tens digit is the leaf (for example, 528 rounds to 530: stem 5, leaf 3).
Group the numbers with the same stems.

Stem | Leaf
5 | 238
6 | 29
7 | 368
8 | 5889
9 | 03467
10 |
11 |
12 | 6

Remarks:
Stem and leaf plots simultaneously show the distribution of the data and the data itself.
The leaves are sorted in increasing order.
The most difficult step is the choice of scale: tens/hundreds; sometimes 5/50, 2/20, etc.
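
The procedure can be sketched in Python as follows. The function name `stem_and_leaf` is hypothetical, and rounding each value to the nearest ten before splitting it is an assumption, chosen because it reproduces the leaves shown on this slide.

```python
precipitation_mm = [583, 730, 890, 688, 777, 528, 958, 901, 875,
                    884, 926, 969, 524, 1258, 756, 850, 619, 939]

def stem_and_leaf(values):
    """Map each stem (hundreds) to its sorted leaves (tens digit after rounding)."""
    rows = {}
    for value in sorted(values):
        tens = (value + 5) // 10       # round to the nearest ten, halves up
        stem, leaf = divmod(tens, 10)  # 524 -> 52 -> stem 5, leaf 2
        rows.setdefault(stem, []).append(leaf)
    # Keep empty stems so that gaps in the data stay visible.
    return {stem: rows.get(stem, []) for stem in range(min(rows), max(rows) + 1)}

plot = stem_and_leaf(precipitation_mm)
for stem, leaves in plot.items():
    print(f"{stem:>2} | {''.join(map(str, leaves))}")
```

Keeping the empty stems 10 and 11 makes the isolated value 1258 stand out at the bottom of the plot.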

QUICK QUIZ
As a marketing consultant, you observed 50 consecutive shoppers at a grocery store and recorded how much money each shopper spent in the store.
The following graph provides this information.
[Figure: stem-and-leaf plot with stems 0 to 6; leaves in reading order: 2 7 7 8 9 0 1 2 3 3 4 4 4 5 5 5 5 7 7 8 8 9 0 0 1 1 1 1 4 6 7 9 9 1 2 3 3 4 5 6 8 9 1 4 6 2 2 4 4 9]
Reading scale: 1 | 0 represents 10 francs

Indicate whether each statement is true or false.
This graphical representation is called a histogram.
The average expenditure cannot be calculated.
The distribution of expenditures is skewed to the left.
The median is at 21.


QUICK QUIZ
The scores of a team on the last Statistics quiz are given in the stem-and-leaf plot below. The quiz was graded out of 70 points.
Reading scale: 1 | 5 represents 15 points
[Figure: stem-and-leaf plot with stems 1 to 6; leaf rows in reading order: 079, 11368, 0135677, 1112]

Indicate whether each statement is true or false.
Team 2 is made up of 6 students.
The range of the scores is 59.
The highest score obtained is 70.
The median is 32.
40% of the students scored less than 30 points.
The average cannot be calculated.
The variable is quantitative discrete.
25% of the students have more than 36 points.
The circle diagram could be a good graphical representation of the observations.


GRAPHICAL TOOLS: HISTOGRAM


Represents the distribution of the variable, taking into account both the frequency and the width (amplitude) of the classes.
[Figure: Distribution of employees' wages by salary class, Switzerland 2008. Monthly net salary, private and public sector (Confederation) together]
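
One common way to take the class width into account is to draw each bar with a height equal to the relative frequency divided by the class width, so that the bar's area represents the frequency. The sketch below assumes that convention; the classes and frequencies are invented and are not the 2008 Swiss figures.

```python
# Hypothetical salary classes (lower, upper) in CHF, with unequal widths,
# and their relative frequencies.
classes = [(0, 2000), (2000, 3000), (3000, 6000)]
rel_freq = [0.20, 0.30, 0.50]

# Bar height = relative frequency per unit of class width.
heights = [f / (high - low) for f, (low, high) in zip(rel_freq, classes)]

# Bar area = height x width, which gives back the relative frequency.
areas = [h * (high - low) for h, (low, high) in zip(heights, classes)]

for (low, high), h in zip(classes, heights):
    print(f"{low}-{high} CHF: height = {h:.6f} per CHF")
```

With this convention the widest class (3000 to 6000 CHF) does not get the tallest bar, even though it holds half of the observations.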


GRAPHICAL TOOLS: BOX PLOT


A box plot gives a compact visual representation of many important characteristics of a data set. Data needed:
Minimum and Maximum
Average
Median
First and Third quartiles (Q1 and Q3)
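
A minimal sketch of how these values can be obtained in Python, using the Geneva precipitation figures from the median example. Quartile conventions differ between tools; the `method="inclusive"` setting of `statistics.quantiles` is an assumption here, and other conventions give slightly different Q1 and Q3.

```python
from statistics import mean, quantiles

precipitation_mm = [583, 730, 890, 688, 777, 528, 958, 901, 875,
                    884, 926, 969, 524, 1258, 756, 850, 619, 939]

data = sorted(precipitation_mm)

# Cut the sorted data into four parts: Q1, median, Q3.
q1, med, q3 = quantiles(data, n=4, method="inclusive")

summary = {
    "min": data[0],
    "Q1": q1,
    "median": med,
    "Q3": q3,
    "max": data[-1],
    "average": mean(data),
}

for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

The box spans Q1 to Q3 with a line at the median, and the whiskers extend toward the minimum and maximum.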


BOX PLOT ILLUSTRATION


QUICK QUIZ
The box plot below represents Swiss civil aviation airport traffic in 2009.

From the box plot above, indicate whether each statement is true or false.
75% of airports have an annual traffic lower than 100'000 flights.
Half of the airports have an annual traffic greater than 70'000 flights.
The skew is positive.
Two airports in particular have the most traffic.


GRAPH EXAMPLES


GRAPH EXAMPLES
In October 2012, a well-known newspaper reported that the average salary in Switzerland ranked 6th among the 29 countries included in the study. Below is the reference graph published by the OFS (Office fédéral de la statistique). What can you conclude?


QUICK QUIZ
We would like to study the distribution of net monthly salary for Swiss employees in 2013. Relative frequencies per class are given in the table below:

Salary classification    Relative frequency
0-3000 CHF               2%
3000-4000 CHF            14%
4000-5000 CHF            24%
5000-6000 CHF            20%
6000-7000 CHF            13%
7000-8000 CHF            9%
8000 and more CHF        19%
Total                    100%

Given this information, indicate whether each statement is true or false.

The data cannot be graphically represented in terms of relative frequency because the last class "8000 and more" is open.
The most suitable graph is the circle diagram because the variable "Salary" is quantitative continuous.
A histogram would be the best graphical representation of the data.
The stem and leaf graph is not possible because the variable "Salary" is classified.


EXERCISE 3
The life cycle of 20 bulbs from the company Superligth SA has been measured during a control. The results obtained are in the stem-and-leaf plot (see Excel file).
Find the quartiles of this distribution and compute the IQR.
Find the average life cycle knowing that the sum of the leaves is 18800 hours.
Find the mode.


EXERCISE 4
Answer the following questions using the available distribution of exam grades.
How many students attended the exam?
Compute the 5-number summary of the exam results.
What is the average grade?
Draw the graph of the distribution and comment on it.
