
This article was downloaded by: [University of Tennessee, Knoxville]

On: 17 March 2013, At: 09:00


Publisher: Routledge
Informa Ltd Registered in England and Wales Registered Number: 1072954 Registered office: Mortimer House,
37-41 Mortimer Street, London W1T 3JH, UK

Journal of Social Service Research


Publication details, including instructions for authors and subscription information:
http://www.tandfonline.com/loi/wssr20

A Comparison of Psychometric Properties and Normality in 4-, 5-, 6-, and 11-Point Likert Scales
Shing-On Leung
University of Macau, Macau, China
Version of record first published: 16 May 2011.

To cite this article: Shing-On Leung (2011): A Comparison of Psychometric Properties and Normality in 4-, 5-, 6-, and 11-Point
Likert Scales, Journal of Social Service Research, 37:4, 412-421

To link to this article: http://dx.doi.org/10.1080/01488376.2011.580697


Full terms and conditions of use: http://www.tandfonline.com/page/terms-and-conditions

This article may be used for research, teaching, and private study purposes. Any substantial or systematic
reproduction, redistribution, reselling, loan, sub-licensing, systematic supply, or distribution in any form to
anyone is expressly forbidden.

The publisher does not give any warranty express or implied or make any representation that the contents
will be complete or accurate or up to date. The accuracy of any instructions, formulae, and drug doses should
be independently verified with primary sources. The publisher shall not be liable for any loss, actions, claims,
proceedings, demand, or costs or damages whatsoever or howsoever caused arising directly or indirectly in
connection with or arising out of the use of this material.
Journal of Social Service Research, 37:412-421, 2011
Copyright © Taylor & Francis Group, LLC
ISSN: 0148-8376 print / 1540-7314 online
DOI: 10.1080/01488376.2011.580697

A Comparison of Psychometric Properties and Normality in 4-, 5-, 6-, and 11-Point Likert Scales
Shing-On Leung

ABSTRACT. The Likert scale is very popular, but the number of scale points to use is still controversial. This article studies the differences among 4-, 5-, 6-, and 11-point Likert scales with a sample of 1,217 students in Macau, using the Rosenberg Self-Esteem Scale as the measuring instrument. There is no major difference in internal structure in terms of means, standard deviations, item-item correlations, item-total correlations, Cronbach's alpha, or factor loadings. Findings indicate that having more scale points seems to reduce skewness, and the 11-point scale, ranging from 0 to 10, has the smallest kurtosis and is closest to normal. Only the 6- and 11-point scales follow normal distributions according to the Kolmogorov-Smirnov and Shapiro-Wilk statistics. Results on predictive validity are inconclusive. This article discusses future applications and suggests the use of an 11-point scale, as it increases sensitivity and is closer to the interval level of scaling and to normality. Recommendations are made to assist social workers and teachers in using self-reported measurement scales.

KEYWORDS. Likert scale, interval scale, normality, reliability
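The comparisons summarized above rest on a handful of standard item- and scale-level statistics. As a reader's aid, the Python below sketches them under stated assumptions; it is a hypothetical illustration, not the author's procedure (the analyses in this article were run in SPSS). It assumes items coded 0 to k-1, respondents stored as rows of a list of lists, and made-up data, and every function name is invented for illustration. It covers reverse coding of negatively worded items, a linear rescaling to the 0-100 range used for the means and SDs, Cronbach's alpha, item-total correlations in the "corrected" form (item versus the sum of the other items, which is an assumption about how they were computed), moment-based skewness and excess kurtosis, and a Lilliefors-style KS distance against a normal distribution fitted with the sample mean and SD.

```python
# Hypothetical sketch: illustrative stand-ins for the statistics compared in this
# article, NOT the author's SPSS procedure. Assumes items coded 0..k-1 and a
# respondents-by-items matrix stored as a list of lists.
from math import sqrt
from statistics import NormalDist, fmean, pstdev, stdev, variance


def reverse_code(score, k):
    """Reverse a negatively worded k-point item coded 0..k-1."""
    return (k - 1) - score


def rescale_0_100(score, k):
    """Linearly map a score on a k-point item (coded 0..k-1) to 0-100."""
    return 100 * score / (k - 1)


def cronbach_alpha(rows):
    """alpha = n/(n-1) * (1 - sum of item variances / variance of totals)."""
    n_items = len(rows[0])
    item_var = sum(variance(col) for col in zip(*rows))
    total_var = variance([sum(r) for r in rows])
    return n_items / (n_items - 1) * (1 - item_var / total_var)


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def corrected_item_total(rows, j):
    """Correlation of item j with the sum of the other items (assumed form)."""
    return pearson([r[j] for r in rows], [sum(r) - r[j] for r in rows])


def skew_kurtosis(x):
    """Moment-based skewness and excess kurtosis (both 0 for a normal variate)."""
    m, s, n = fmean(x), pstdev(x), len(x)
    g1 = sum((v - m) ** 3 for v in x) / n / s**3
    g2 = sum((v - m) ** 4 for v in x) / n / s**4 - 3
    return g1, g2


def ks_statistic(x):
    """KS distance between the sample ECDF and a normal fitted by mean and SD."""
    fitted = NormalDist(fmean(x), stdev(x))
    xs = sorted(x)
    n = len(xs)
    d = 0.0
    for i, v in enumerate(xs):
        f = fitted.cdf(v)
        # Check the ECDF step just after and just before this observation.
        d = max(d, (i + 1) / n - f, f - i / n)
    return d


if __name__ == "__main__":
    # Made-up 4-point responses coded 0..3; the second item is negatively worded.
    raw = [[3, 0, 2], [2, 1, 2], [1, 2, 1], [3, 1, 3], [0, 3, 0], [2, 1, 1]]
    rows = [[a, reverse_code(b, 4), c] for a, b, c in raw]
    totals = [sum(r) for r in rows]
    print("alpha:", round(cronbach_alpha(rows), 3))
    print("item-total r:", [round(corrected_item_total(rows, j), 3) for j in range(3)])
    print("skewness, excess kurtosis:", skew_kurtosis(totals))
    print("KS D:", round(ks_statistic(totals), 3))
    print("mean item score on 0-100:", round(rescale_0_100(fmean(totals) / 3, 4), 1))
```

Statistical packages such as SPSS typically apply small-sample adjustments to skewness and kurtosis and report the KS test with a Lilliefors significance correction, so values from this sketch will not reproduce Tables 4 to 6 exactly. The Shapiro-Wilk test is left out because a correct implementation needs tabulated coefficients; scipy.stats.shapiro is one existing implementation.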

Shing-On Leung, Professor, University of Macau, Macau, China.
Address correspondence to: Shing-On Leung, Faculty of Education, University of Macau, Taipa, Macau, China (E-mail: soleung@umac.mo).

BACKGROUND

The Likert (1932) scale is one of the most widely used instruments for measuring opinion, preference, and attitude. It consists of a number of items, each with around 4 to 7 points or categories. Analysis can be based on individual items or on a summation of items forming a scale. Some distinguish a Likert item from a Likert-type item, the former having bivalent and symmetrical labels about a middle or neutral point (Kislenko & Grevholm, 2008). Though the Likert scale remains popular, several issues need to be examined. First, all points can be labeled, but sometimes only the endpoints are identified. Originally, Likert (1932) labeled each option to be chosen, but Cummins and Gullone (2000) suggested that naming the categories between the ends might detract from the interval nature of the scale and hence that only end-defined labels were needed. Wyatt and Meyers (1987) used four differently labeled 5-point scales and found that these might differ in variability but not in means or reliability, the latter depending on item-item correlations, which do not themselves depend on variance. They also found that more absolute endpoints might result in more responses concentrated in the middle, and vice versa, and suggested the use of less absolute labels. Different studies may choose different labels, with a blank included as one of the options. In this article, only the endpoints are labeled.

Second, there is no agreement on the number of scale points to be used. Most studies use 4 to 7 points, while some extend to 10
or 11. Cummins and Gullone (2000) reviewed this issue and suggested that expanding beyond 5 or 7 points might increase sensitivity without affecting reliability. Borgers, Sikkel, and Hox (2004) suggested 4 points as an optimum after considering the number of options, the neutral point, and reliability. L. Chang (1994) used a modeling approach, fitting empirical data, to evaluate 4- and 6-point scales and concluded that the number of scale points had no effect on criterion-related validity; the final decision might depend more on the empirical setting. Allen and Seaman (2007) proposed that 7 points could be shown to reach the upper limits of reliability. There is no common agreement on this issue, but Likert (1932) and others recommended using as wide a scale as possible, because points can always be collapsed into condensed categories, but not vice versa (Allen & Seaman, 2007). When M. L. M. Chang (2010) extended Somech and Ron's (2007) organizational citizenship behavior scale from 7 to 11 points, very significant results were found. This provides support for using an 11-point scale in this article and for comparing its internal structure with that of 4-, 5-, and 6-point Likert scales.

Third, there is the question of whether there should be a middle or neutral point. Garland (1991) showed that social desirability bias may be reduced by removing it and that retaining it might distort the results. But removing it also introduces a forced choice into the scale (Allen & Seaman, 2007), as a respondent might be forced to declare a stand instead of remaining neutral, which in some political and sensitive cases may not be desirable. For the most part, whether to use a neutral point could constitute a never-ending debate.

Fourth, there is the long-standing and controversial question of whether the Likert scale can be used numerically: technically, whether it is a matter of ordinal or interval-scale measurement. If we treat it as interval-scale data, analysis will be more powerful and easier to interpret, but this usage may cause the findings to be misleading or misrepresented (Allen & Seaman, 2007). For example, on a 5-point scale labeled from 1 to 5, a mean of 3 may arise from responses spread evenly over the 5 points (20% at each), but an extreme U-shaped distribution with 50% at each of the two extremes yields the same value. The two obviously carry different meanings (Kislenko & Grevholm, 2008). Clason and Dormody (1994) reviewed 95 articles in the Journal of Agricultural Education and found that 54%, 32%, and 13% of these articles used descriptive, parametric, and nonparametric statistics, respectively. There were thus about 2.5 (≈ 32/13) times as many studies treating Likert scales as continuous scales as treating them as ordinal-discrete scales. Goldstein and Hersen (1984) thought the interval assumption seemed unlikely, and Hart (1996) stated that respondents did not use a linear interval scale. Parker, McDaniel, and Crumpton-Young (2002) compared a 5-point Likert scale with a visual analogue scale with a linguistic anchor using t-tests and analysis of variance, and found no significant differences. These results support the use of parametric procedures for Likert scales. Even if the interval nature of individual items is rejected, the summed scale score may still be of the interval type, since the sum may be insensitive to violations of the interval assumption at item level. Kislenko and Grevholm (2008) referred to these two positions as supporters of measurement (assuming an ordinal nature at item level) and supporters of statistics (assuming an interval nature at summed-scale level). However, most studies insist that the set of items should first have passed some tests of internal consistency, such as Cronbach's alpha or factor analysis (Allen & Seaman, 2007). Therefore, one purpose of this article is to evaluate and compare the reliability and factor structure of Likert scales with different numbers of points.

Fifth, beyond interval scaling, the next question is whether normal distribution assumptions hold, as these are the basis for many parametric procedures such as t-tests, factor analysis, and related tests. Both Clason and Dormody (1994) and Wu (2007) found it hard to see how normality could hold for the Likert scale. Wu (2007) proposed using Snell's transformation to bring the data closer to normality, but even after doing so, normality was still rejected. Therefore, this article tests the normality assumption for these different Likert point scales.

When using the Likert scale, there is no common agreement on the number of points to be used, and normality assumptions remain the
subject of debate. Therefore, the purpose of this article is to compare 4-, 5-, 6-, and 11-point Likert scales in terms of internal structure, normality, and predictive validity. The advantages of using an 11-point scale ranging from 0 to 10 are also considered in the Discussion section.

METHODS

Instrument

The Rosenberg Self-Esteem Scale (RSES) was selected as the measuring instrument because it is widely used and has been translated into many different languages. This study used a Chinese version provided by Leung and Wong (2008), because the respondents were all local Chinese in Macau. In using this translated version, however, a problem was noted in the translation of Item 8, which reads: "I wish I could have more respect for myself." Leung and Wong (2008) used four different Chinese versions of Item 8 to alleviate the problem, but none proved satisfactory in terms of item-item correlations, item-total correlations, reliabilities, and factor analysis. We therefore include both the full-length (10-item) scale and a 9-item scale, the latter excluding Item 8.

As the purpose is to compare 4-, 5-, 6-, and 11-point scales, the same questionnaire was used in each case, differing only in the number of scale points. Because the study includes an 11-point scale and it is very difficult to find suitable labels for 11 points, only end-defined labels are used on all scales for comparative purposes. This is consistent with the view that adding labels in the middle may detract from the interval nature of the scale (Cummins & Gullone, 2000); further, it is impossible to have the same labels for all scales. The questionnaires are designed to be self-administered by respondents.

Participants

Our sample consisted of secondary-school students in Grades 7 to 12 (i.e., 13-18 years of age) in Macau. Students were able to read and understand the questions without difficulty, and no further clarification was needed. Teachers in the schools were responsible for distributing the questionnaires and returning them to the researchers. Because it was difficult to distribute four versions of the scale evenly at one time, the process was divided into two stages. In the first stage, the researchers distributed 5- and 11-point questionnaires alternately, so that the first, third, fifth, etc., students seated in a classroom received the 5-point questionnaire and the second, fourth, sixth, etc., students received the 11-point questionnaire. This resulted in two subsamples with similar characteristics (N = 271 and 272 for the 5- and 11-point scales, respectively). In the second stage, 4-, 5-, and 6-point questionnaires were distributed in a similar way, resulting in three subsamples of similar characteristics and sizes (N = 231, 223, and 220 for the 4-, 5-, and 6-point scales, respectively). The final sample consisted of 1,217 students in Macau. The two 5-point subsamples were found to have very similar characteristics and were therefore combined into one. Finally, there were four samples, for the 4-, 5-, 6-, and 11-point scales, drawn from the same sampling frame and of a similar nature for purposes of comparison.

RESULTS

Internal Structure

Basic psychometric properties of the 10-item RSES are reported in Table 1.

In Table 1, the means and standard deviations (SD) are transformed to a range of 0 to 100 regardless of the number of scale points. The means are the same, 55 (rounded to the nearest integer), and the SDs differ by at most 1 (either 17 or 18). Skewness and kurtosis are reported later, as they relate to normality.

As for item-item correlations, the minima for all point scales involve Item 8, the item with which Leung and Wong (2008) reported problems. The item pair 2 and 6 has the maximum correlation across all point scales, and the averages are similar.

For item-total correlations, all minima are also associated with Item 8. The maxima are associated with Items 2 and 6 for the 11- and 4-point scales, respectively, and Item 9 for the
TABLE 1. Basic Psychometric Properties of the 10-Item Rosenberg Self-Esteem Scale of the 4-, 5-, 6-, and 11-Point Likert Scales

4-point 5-point 6-point 11-point

Mean 55 55 55 55
SD 18 18 17 17
Item-item corr (min) r (8,10) = .13 r (4,8) = .19 r (8,10) = .21 r (7,8) = .08
Item-item corr (max) r (2,6) = .65 r (2,6) = .69 r (2,6) = .62 r (2,6) = .63
Item-item corr (average) .25 .29 .24 .28
Item-total corr (min) r (T,8) = .23 r (T,8) = .08 r (T,8) = .22 r (T,8) = .24
Item-total corr (max) r (T,6) = .75 r (T,9) = .77 r (T,9) = .71 r (T,2) = .73
Item-total corr (average) .57 .60 .56 .59
Cronbach's alpha .78 .81 .76 .80
Eigenvalue (first) 3.6 (36%) 4.0 (41%) 3.3 (34%) 3.8 (38%)
Eigenvalue (second) 1.5 (50%) 1.7 (58%) 1.8 (52%) 1.5 (54%)

5- and 6-point scales. The maximum item-total correlations range from .71 to .77 and show little variation. The averages are also similar, ranging from .56 to .60. Cronbach's alpha reliability (internal consistency) is very similar for all point scales, ranging from .76 to .81. Given the argument above that a summed scale which passes internal-consistency tests may be treated as interval-level data, these values support treating the summed RSES score as an interval scale.

To examine the internal structure further, we conducted exploratory factor analysis in the Statistical Package for the Social Sciences (SPSS), using default settings otherwise, with principal component analysis for extraction and varimax for rotation. The 5-point scale has the highest first eigenvalue and the highest cumulative percentage of variance explained, although the 6-point scale has the highest second eigenvalue. The first eigenvalue ranges from 3.3 to 4.0, and the corresponding percentages of variance range from 34% to 41%. For the second factor, the corresponding figures range from 1.5 to 1.8 and from 50% to 58% (cumulative).

The above results show that all point scales have very similar basic psychometric properties in terms of means, SDs, item-item correlations, item-total correlations, reliabilities, and exploratory factor analysis. As Item 8 may affect the results, we remove it and report the 9-item results in Table 2.

When Item 8 is removed, the mean, 57, is the same for all scales, and the SDs are also similar (18 to 20). The minimum item-item correlations are no longer associated with Item 8; instead, Item 4 is involved in the minimum for the 5-, 6-, and 11-point scales. For the maximum, the results are the same as for the 10-item scale, because it is

TABLE 2. Basic Psychometric Properties of the 9-Item Rosenberg Self-Esteem Scale of the 4-,
5-, 6-, and 11-Point Likert Scales

4-point 5-point 6-point 11-point

Mean 57 57 57 57
SD 19 20 18 18
Item-item corr (min) r (3,9) = .13 r (4,6) = .16 r (4,5) = .09 r (4,6) = .02
Item-item corr (max) r (2,6) = .65 r (2,6) = .69 r (2,6) = .62 r (2,6) = .63
Item-item corr (average) .31 .38 .30 .35
Item-total corr (min) r (T,4) = .51 r (T,4) = .59 r (T,10) = .53 r (T,4) = .52
Item-total corr (max) r (T,6) = .75 r (T,9) = .76 r (T,9) = .71 r (T,2) = .73
Item-total corr (average) .62 .67 .61 .64
Cronbach's alpha .81 .85 .79 .83
Eigenvalue (first) 3.5 (39%) 4.1 (45%) 3.3 (37%) 3.8 (43%)
Eigenvalue (second) 1.3 (54%) 1.6 (63%) 1.6 (55%) 1.4 (59%)
not related to Item 8. The averages range from .30 to .38.

The minima of item-total correlations are also not associated with Item 8; they appear for Item 4 on the 4-, 5-, and 11-point scales and for Item 10 on the 6-point scale. The maxima appear for Item 6 on the 4-point scale, Item 9 on the 5- and 6-point scales, and Item 2 on the 11-point scale. The averages of item-total correlations vary little (from .61 to .67) but are higher than the 10-item results.

Cronbach's alphas range from .79 to .85, higher than the 10-item results. Factor analysis with the same settings gives results similar to the 10-item results. The 5-point scale again has the highest first eigenvalue (first eigenvalues range from 3.3 to 4.1) and the highest percentage of variance explained (from 37% to 45%). The 9-item results are very similar to the 10-item ones; hence, from now on we report only results based on 10 items.

Another indication of internal structure is the factor loadings from the factor analysis conducted earlier. We use a minimum factor saliency criterion of |.40| for interpretation: items were allocated to factors if their loadings exceeded |.40|. Factor loadings are reported in Table 3.

With this criterion, Items 1, 3, 4, 7, and 10, which are positive statements, load on the second component in all scales. Items 2, 5, 6, and 9, which are negative statements, load on the first component in all scales. Item 8 loads differently across scales. In its English version, Item 8 is a negative statement. As indicated earlier, no appropriate Chinese version has been found (Leung & Wong, 2008), so the results for this item cannot be counted. As a result, the patterns of factor loadings, and hence the internal structure, are largely the same across the different scales.

The above results indicate that the internal structure, as reflected by basic psychometric properties and factor loadings, is very similar across the point scales. High reliabilities suggest that the RSES yields interval-scale data. However, can the RSES be considered normally distributed?

Normality

We use three steps to investigate the normality of the RSES: (i) skewness and kurtosis, (ii) Kolmogorov-Smirnov (KS) and Shapiro-Wilk (SW) tests for normality, and (iii) normal Q-Q plots. Table 4 reports the skewness and kurtosis. The first part of Table 4 reports skewness and kurtosis for the 10 items individually on all scales. The second part reports the same figures for the RSES and two single items: single-item self-esteem (SISE; Robins, Hendin, & Trzesniewski, 2001) and subjective socioeconomic status (SSES; Leung, 2009). These two items are included to give additional information on whether having

TABLE 3. Factor Loadings of Rosenberg Self-Esteem Scale of the 4-, 5-, 6-, and 11-Point Likert
Scales

First component Second component

4-pt 5-pt 6-pt 11-pt 4-pt 5-pt 6-pt 11-pt

1+ .418 .362 .142 .322 .509 .653 .659 .663


2 .767 .845 .728 .739 .142 .128 .220 .310
3+ .170 .215 .259 .151 .639 .707 .643 .715
4+ .159 .091 .029 .077 .547 .805 .758 .766
5 .640 .708 .757 .568 .286 .166 .090 .303
6 .812 .852 .808 .862 .232 .105 .138 .076
7+ .107 .189 .030 .138 .711 .731 .744 .725
8 .495 .248 .467 .440 .403 .475 .324 .262
9 .716 .833 .694 .701 .234 .202 .307 .356
10+ .131 .325 .111 .264 .712 .608 .585 .636

Note. Those items with a + sign are positively worded items. Those without are negative. Those loadings with absolute values higher than
|.40| are in bold and italic.

TABLE 4. Skewness and Kurtosis of 10 Items in RSES, SISE, and SSES of 4-, 5-, 6-, and
11-Point Likert Scales

Skewness Kurtosis

Item 4-pt 5-pt 6-pt 11-pt 4-pt 5-pt 6-pt 11-pt

1 0.29 0.22 0.43 0.12 0.39 0.20 0.06 0.02


2 0.19 0.19 0.16 0.13 1.13 1.09 1.13 1.01
3 0.33 0.22 0.07 0.02 0.46 0.28 0.55 0.19
4 0.11 0.10 0.06 0.12 0.61 0.46 0.42 0.27
5 0.27 0.30 0.19 0.23 0.93 0.90 1.12 0.76
6 0.19 0.31 0.15 0.14 1.01 0.98 1.08 1.02
7 0.34 0.15 0.24 0.33 0.56 0.55 0.67 0.31
8 0.31 0.20 0.26 0.24 0.90 0.67 0.85 0.55

9 0.72 0.62 0.57 0.47 0.64 0.66 0.83 0.94


10 0.42 0.35 0.55 0.42 0.70 0.59 0.38 0.61
RSES 0.29 0.18 0.08 0.14 0.15 0.21 0.16 0.04
SISE 0.18 0.12 0.13 0.28 0.81 0.40 0.67 0.32
SSES 0.29 0.19 0.39 0.06 1.22 1.38 1.36 0.97

Note. Those values closest to 0 are in bold and italic. RSES = Rosenberg Self-Esteem Scale; SISE = single-item self-esteem; SSES =
subjective socioeconomic status.

more points in a Likert scale produces a closer approach to normality for some single-item measures.

Normal variates have zero skewness and zero (excess) kurtosis, so smaller absolute values imply a closer approach to normality. The smallest absolute values (SAV) across all scales are in bold and italic in Table 4. For skewness, the 11-point scale has the SAV for five RSES items. Items 4 and 5 have the SAV on the 6-point scale, and Items 7, 8, and 10 have the SAV on the 5-point scale. The lack of a neutral point in the 4- and 6-point scales may result in more skewness. Because all negative statements are reverse coded, higher values on every item imply higher self-esteem. Most items are negatively skewed, suggesting that more students rate their self-esteem highly. As self-esteem may by its nature be negatively skewed among Chinese people, another way of proceeding is to look at the kurtosis.

In respect of kurtosis, the 11-point scale has the SAV for almost all items, the exceptions being Items 6, 9, and 10. But for Items 6 and 10, the 11-point values are not far from the minimum (1.02 vs. 1.01 for Item 6, and 0.61 vs. 0.59 for Item 10). For the summated RSES, SISE, and SSES, the 11-point scale has the SAV, indicating the closest approach to normality. Most importantly, the very small kurtosis value of 0.04 for the RSES implies a very close approach to normality, and this is supported by the normality tests.

The KS and SW tests are used for a formal statistical assessment of normality. The KS test is a nonparametric test that compares the sample distribution with a reference distribution, which is normal in this case; the larger the KS statistic, the farther from normal, and vice versa. As a nonparametric test, it is less powerful than the parametric SW test, which is designed specifically for testing normality; contrary to the KS test, the higher the SW value, the closer to normality. For both tests, if the significance value (p value) is less than .05, we reject the null hypothesis that the data come from a normal distribution. We use the SPSS default settings for these two tests. Table 5 reports the KS and SW statistics for the 10 items, the RSES, SISE, and SSES.

The arrangement of Table 5 is very similar to that of Table 4, except that KS and SW statistics replace skewness and kurtosis. For KS statistics, the smallest value across scales is in bold and italic, with the 11-point scale having the smallest value for all items except Item 3. For SW statistics, the largest value is in bold and italic, with the 11-point scale having the largest values for all items, the minor complication being that in

TABLE 5. Kolmogorov-Smirnov (KS) and Shapiro-Wilk (SW) Statistics of 4-, 5-, 6-, and 11-Point Likert Scales

KS statistics SW statistics

Item 4-pt 5-pt 6-pt 11-pt 4-pt 5-pt 6-pt 11-pt

1 .28 .21 .21 .14 .86 .90 .91 .95


2 .19 .15 .14 .09 .87 .89 .91 .93
3 .27 .22 .15 .20 .86 .90 .93 .94
4 .24 .20 .18 .15 .87 .91 .93 .96
5 .22 .16 .14 .12 .87 .89 .91 .94
6 .21 .16 .13 .10 .87 .89 .91 .93
7 .25 .21 .15 .15 .87 .90 .93 .93
8 .23 .20 .14 .13 .87 .90 .91 .93

9 .24 .20 .18 .16 .81 .86 .87 .89


10 .22 .18 .16 .14 .85 .89 .89 .92
RSES .09 .07 .05 .05 .980 .987 .994 .994
SISE .22 .22 .16 .15 .88 .90 .93 .94
SSES .19 .19 .17 .15 .92 .93 .94 .95

Note. For KS statistics, those values closest to 0 are in bold and italic. For SW statistics, those values closest to 1 are in bold and italic. RSES
= Rosenberg Self-Esteem Scale; SISE = single-item self-esteem; SSES = subjective socioeconomic status.

Item 7 and the RSES, the 6- and 11-point scales have the same largest value. Thus, the 11-point scale is closest to normality.

Table 5 reports only the values of the statistics, without significance levels (p values). In fact, for all individual items, all tests are statistically significant; hence, the null hypothesis that the item scores come from a normal distribution is rejected. At item level, even with as many as 11 points, the distributions are still far from normal. The only exception is the summated RSES, whose significance values are reported in Table 6.

Table 6 reports the significance values of the normality tests of the RSES for the different point scales. The tests are statistically significant for the 4- and 5-point scales but not for the 6- and 11-point scales, so normality is not rejected for the latter; that is, they are consistent with a normal distribution. Though the KS and SW tests are nonparametric and parametric, respectively, they arrive at the same conclusion.

TABLE 6. Kolmogorov-Smirnov (KS) and Shapiro-Wilk (SW) Statistics, Degrees of Freedom, and Significance Values of Rosenberg Self-Esteem Scale for 4-, 5-, 6-, and 11-Point Likert Scales

Points KS statistics df Sig SW statistics df Sig
4 .090 231 .000 .980 231 .002
5 .066 493 .000 .987 493 .000
6 .051 219 .200 .994 219 .582
11 .049 272 .200 .994 272 .408

To supplement these results graphically, we constructed normal Q-Q plots. If the distribution is normal, the plot forms a straight line, and any departure from that line indicates a departure from normality. The Q-Q plots are shown in Figure 1. In Figure 1, the 4-point scale has the fewest data points, and the 11-point scale has the most. Because the RSES is made up of individual items, more points at item level mean more points at scale level, so the 11-point scale naturally has the most points. Importantly, the 11-point scale is the most closely aligned with a straight line and is therefore closest to normality among all point scales. All the results above relate to internal structure; we now move on to predictive validity, where external criteria are involved.

Predictive Validity

For predictive validity, external criteria are needed. In this article, the areas involved are socioeconomic status (SES), perceived academic competence, relationship with others, and
FIGURE 1. Normal Q-Q Plots of 4-, 5-, 6-, and 11-Point Rosenberg Self-Esteem Scale (RSES) Likert Scales

feelings toward school and family life. Correlations between the RSES and the external criteria are reported in Table 7, where the variables used in each area are listed in the first column.

The Table 7 results are very diverse. In SES, the 6-point scale gives the highest correlation for income, father's occupation, mother's occupation, and SSES. The 4- and 5-point scales have the highest correlations for father's and mother's education, respectively. SSES has a much higher correlation than the other SES variables, which implies that subjective perception of SES correlates better with subjective perception of self-esteem. In perceived academic competence, the 4-point scale has the highest correlation for general competence, but the 11-point scale has the highest correlations for the subjects Chinese, English, and mathematics. In relations with parents (both mother and father), the 11-point scale is the highest, while the 5-point scale is the highest for relationships with brothers and sisters. The 5- and 6-point scales both have the highest correlation with peers, and the 5-point scale is highest for the variable welcome by peers. In feelings toward school and family life, the 4- and 5-point scales have the highest correlations with feelings toward family and school life, respectively. Overall, there is no clear pattern for predictive validity, and no point scale has dominant predictive validity over the others.

DISCUSSION

In this article, using the Chinese RSES, there are no differences among 4-, 5-, 6-, and 11-point Likert scales in terms of mean, SD, item-item correlation, item-total correlation, reliability, exploratory factor analysis, or factor loading. However, the 6- and 11-point scales of the RSES follow a normal distribution, while the 4- and 5-point scales do not, which favors the 6- and 11-point scales on the principle that the closer to normality, the better. With regard to the existence of a neutral
point, our findings do not show any differences, as there is no major difference among the point scales. The 5- and 11-point scales possess a neutral point, while the 4- and 6-point scales do not. It should be noted that the effect of a neutral point may be diluted by neighboring categories in a long scale such as an 11-point scale, and this is an added advantage of using a longer scale.

TABLE 7. Correlations With External Variables of Rosenberg Self-Esteem Scale for 4-, 5-, 6-, and 11-Point Likert Scales

4-pt 5-pt 6-pt 11-pt

Socioeconomic status
Income .17 .16 .19 .02
Father's education .20 .08 .05 .01
Mother's education .06 .14 .07 .04
Father's occupation .05 .01 .11 .00
Mother's occupation .04 .14 .15 .05
Subjective SES .26 .18 .39 .27
Perceived academic competence
General .39 .35 .29 .33
Chinese .31 .31 .20 .42
English .25 .28 .22 .31
Mathematics .28 .28 .22 .30
Relationship with
Father .22 .23 .15 .24
Mother .22 .18 .20 .28
Brothers/sisters .23 .30 .20 .27
Peers .17 .23 .23 .21
Welcome by peers .33 .41 .26 .29
Feelings toward
School life .24 .35 .31 .32
Family life .38 .31 .27 .35

Note. The highest correlations along a row are in bold and italic. SES = socioeconomic status.

Many researchers question whether the Likert scale can be considered an interval-level scale. High reliabilities for the RSES in this study suggest that, at the summated-scale level, this is indeed the case. The skewness of individual items and of the RSES suggests that they are not normal, but this may instead be related to the nature of self-esteem. As the 11-point scale has the smallest kurtosis at both item and scale level, it is closer to normality than the other scales. However, there are arguments against an 11-point scale. With so many points, more effort, and hence fatigue, may be entailed in choosing a response for each item.

rious. Further, if fatigue were really a factor, we might have observed some differences in our results. As that has not happened, there may in fact be no fatigue problem.

There is support in the literature for using an 11-point scale. Cummins and Gullone (2000) argued that the sensitivity of most people's perceptions should not be limited to five levels. The 10- and 100-point formats are not new: before Likert (1932), Freyd (1923) had noted that many scales of the time were based on 10- or 100-point formats. Hodge and Gillespie (2007) used the Phrase Completion Scale (PCS), also an 11-point scale with labels at the endpoints, for three reasons: unidimensionality, univocality, and normality. They pointed out that the PCS performed better on these three aspects and that the traditional 5-point Likert scale could capture only a portion of the underlying attitudinal continuum, as it focused only on moderate levels of agreement, which might not reflect those at the ends.

One minor complication is whether the range should start with 0 or 1. Another is whether the ends should be relabeled −5 and +5, or otherwise. The author agrees with Hodge and Gillespie (2007) that a 0 better conveys the complete absence of an attribute, while a 10 fully indicates its presence. Because a 0-to-10 range is very popular and easily comprehended by most people, we prefer to keep the present numbers. We recommend an 11-point scale, or PCS, because it increases scale sensitivity, is closer to normality, and is easily understood.

This article provides evidence to support the use of the 11-point Likert scale, or PCS, but more evidence from future research is needed before more conclusive findings can be reached. In terms of correlations, reliabilities, and factor structures, no differences were found among the different point scales, and this may need further investigation and explanation. This study only compares the 11-point scale with the 4-, 5-, and 6-point scales, and the comparison between the 11-point scale and the 7- to 10-point scales is limited. This empirical study can be further extended by utilizing measuring instruments other than the RSES and in-
But a scale from 0 to 10 is a very popular and vestigating cultural as well as other demographic
natural system, which most people become used differences such as gender and age. In addition,
to, and the problem may accordingly not be se- the author is now conducting simulations from
both symmetric normal and other skewed distributions to study the effect of the number of points in a Likert scale on departure from normality.

Regarding implications for social service practice, frontline social workers and teachers often use Likert scales to assess students and adolescents on various psychological or educational constructs. In addition, young people may be asked to complete questionnaires themselves as part of the assessment. In both cases, whether respondents are assessed by others or self-report, this article supports the use of the 11-point scale, not only for its psychometric properties but also for its ease of comprehension.

REFERENCES

Allen, E., & Seaman, C. (2007). Likert scales and data analyses. Quality Progress, 40(7), 64–65.
Borgers, N., Sikkel, D., & Hox, J. (2004). Response effects in surveys on children and adolescents: The effect of number of response options, negative wording, and neutral midpoint. Quality and Quantity, 38(1), 17–33.
Chang, L. (1994). A psychometric evaluation of 4-point and 6-point Likert-type scales in relation to reliability and validity. Applied Psychological Measurement, 18(3), 205–215.
Chang, M. L. M. (2010). A correlational study between organizational citizenship and organizational justice among teachers in Macao (Unpublished master's thesis). University of Macau, Macau, China.
Clason, D. L., & Dormody, T. J. (1994). Analyzing data measured by individual Likert-type items. Journal of Agricultural Education, 35(4), 31–35.
Cummins, R. A., & Gullone, E. (2000). Why we should not use 5-point Likert scales: The case for subjective quality-of-life measurement. In Proceedings of the Second International Conference on Quality of Life in Cities (pp. 74–93). Kent Ridge, Singapore: National University of Singapore.
Freyd, M. (1923). The graphic rating scale. Journal of Educational Psychology, 14, 83–102.
Garland, R. (1991). The midpoint on a rating scale: Is it desirable? Marketing Bulletin, 2, 66–70.
Goldstein, G., & Hersen, M. (1984). Handbook of psychological assessment. New York, NY: Pergamon Press.
Hart, M. C. (1996). Improving the discrimination of SERVQUAL by using magnitude scaling. In G. K. Kanji (Ed.), Total quality management in action (pp. 267–270). London, England: Chapman and Hall.
Hodge, D. R., & Gillespie, D. F. (2007). Phrase completion scales: A better measurement approach than Likert scales? Journal of Social Service Research, 34(4), 1–12.
Kislenko, K., & Grevholm, B. (2008, July). The Likert scale used in research on affect: A short discussion of terminology and appropriate analyzing methods. Paper presented at the 11th International Congress on Mathematical Education, Monterrey, Mexico.
Leung, S. O. (2009). Subjective social economic status of adolescence in Macau. International Journal of Adolescence and Youth, 15, 145–153.
Leung, S. O., & Wong, P. M. (2008). Validity and reliability of Chinese Rosenberg Self-Esteem Scale. New Horizons in Education, 56(1), 62–69.
Likert, R. (1932). A technique for the measurement of attitudes. Archives of Psychology, 140, 5–53.
Parker, P. L., McDaniel, H. S., & Crumpton-Young, L. L. (2002). Do research participants give interval or ordinal answers in response to Likert scales? Retrieved from http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.19.6352&rep=rep1&type=pdf
Robins, R. W., Hendin, H. M., & Trzesniewski, K. H. (2001). Measuring global self-esteem: Construct validation of a single-item measure and the Rosenberg Self-Esteem Scale. Personality and Social Psychology Bulletin, 27(2), 151–161.
Somech, A., & Ron, I. (2007). Promoting organizational citizenship behavior in schools: The impact of individual and organizational characteristics. Educational Administration Quarterly, 43(1), 38–66.
Wu, C.-H. (2007). An empirical study on the transformation of Likert-scale data to numerical scores. Applied Mathematical Sciences, 1(58), 2851–2862.
Wyatt, R. C., & Meyers, L. S. (1987). Psychometric properties of four 5-point Likert-type response scales. Educational and Psychological Measurement, 47(1), 27–35.
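
The discussion argues that the high reliabilities of the RSES support treating the summated score as interval-level. The reliability coefficient is not named in this section; assuming it is internal consistency measured by Cronbach's alpha, a minimal Python sketch of that computation from a respondents-by-items matrix follows. The function name `cronbach_alpha` and the small 0-to-10 data set are hypothetical, for illustration only, and are not the study's data or code.

```python
from statistics import pvariance
from typing import Sequence


def cronbach_alpha(responses: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha for a respondents-by-items matrix of Likert responses.

    Assumed definition (internal consistency):
        alpha = k / (k - 1) * (1 - sum of item variances / variance of total scores),
    where each total is a respondent's summated scale score.
    """
    if len(responses) < 2:
        raise ValueError("need at least two respondents")
    k = len(responses[0])
    if k < 2 or any(len(row) != k for row in responses):
        raise ValueError("every respondent needs the same number (>= 2) of items")
    # Variance of each item (column) across respondents.
    item_vars = [pvariance([row[j] for row in responses]) for j in range(k)]
    # Variance of the summated scale scores (row sums).
    total_var = pvariance([sum(row) for row in responses])
    if total_var == 0:
        raise ValueError("total scores have zero variance; alpha is undefined")
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical 0-to-10 responses: 4 respondents x 3 items (illustrative only).
example = [
    [7, 8, 6],
    [3, 4, 2],
    [9, 9, 10],
    [5, 6, 5],
]
print(round(cronbach_alpha(example), 3))  # prints 0.975
```

Population and sample variances give the same alpha as long as one kind is used throughout, because the n/(n − 1) factor cancels in the ratio.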

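The discussion links the 11-point scale's closeness to normality to its smaller kurtosis, and mentions ongoing simulations from symmetric normal and skewed distributions. The sketch below is not the author's simulation design; it is a minimal illustration under stated assumptions: a latent score is drawn from a standard normal or a standardized exponential distribution, cut into k equal-width categories over the hypothetical range [−3, 3], and summarized by moment-based skewness and excess kurtosis for k = 4, 5, 6, and 11. All function names, the cut range, the sample size, and the seed are hypothetical choices.

```python
import random
from typing import Callable, Dict, Sequence, Tuple


def skewness_and_excess_kurtosis(xs: Sequence[float]) -> Tuple[float, float]:
    """Moment-based sample skewness (g1) and excess kurtosis (g2), no bias correction."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0


def to_likert(z: float, k: int, lo: float = -3.0, hi: float = 3.0) -> int:
    """Map a latent score z to a k-point response coded 1..k.

    Hypothetical discretization: k equal-width bins over [lo, hi];
    scores outside that range fall into the end categories.
    """
    width = (hi - lo) / k
    category = int((z - lo) // width) + 1
    return min(max(category, 1), k)


def simulate(
    latent: Callable[[random.Random], float],
    ks: Sequence[int] = (4, 5, 6, 11),
    n: int = 20_000,
    seed: int = 1,
) -> Dict[int, Tuple[float, float]]:
    """Cut the same latent draws into each k-point scale; return {k: (g1, g2)}."""
    rng = random.Random(seed)
    zs = [latent(rng) for _ in range(n)]
    return {k: skewness_and_excess_kurtosis([to_likert(z, k) for z in zs]) for k in ks}


def normal_latent(rng: random.Random) -> float:
    """Symmetric latent trait: standard normal."""
    return rng.gauss(0.0, 1.0)


def skewed_latent(rng: random.Random) -> float:
    """Skewed latent trait: exponential(1) shifted to mean 0 (SD 1, skewness 2)."""
    return rng.expovariate(1.0) - 1.0


if __name__ == "__main__":
    for name, latent in (("normal", normal_latent), ("skewed", skewed_latent)):
        for k, (g1, g2) in simulate(latent).items():
            print(f"{name} latent, {k}-point: skewness {g1:+.3f}, excess kurtosis {g2:+.3f}")
```

Adding a constant to every response (for example, coding an 11-point scale 0 to 10 or −5 to +5) leaves skewness, excess kurtosis, and Cronbach's alpha unchanged, so the choice of starting value discussed above affects labeling and comprehension, not these statistics.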