
EDUC684

Measurement and Psychological Assessment
Today’s Agenda
• My contact information
• About you
• Review syllabus
• What is a Test/Assessment
• Concepts
• Uses for Tests
• Historical Perspective
My Contact Information
• Tim Victor
• Email: tvictor@gse.upenn.edu
• Phone: 484.432.3550
What is a Test or Assessment?
• The term test is used to refer to a
standardized and systematic procedure for
obtaining a sample of behavior.

• The term assessment generally refers to a
broader process of collecting information.
12 Assumptions Underlying Testing and
Assessment
1. Psychological traits and states exist.
2. Psychological traits and states can be quantified and measured.
3. Various approaches to measuring aspects of the same thing can be useful.
4. Assessment (not just testing) can provide answers to some of life's most momentous questions.
5. Assessment can pinpoint phenomena that require further attention or study.
6. Various sources of data (information) enrich and are part of the assessment process.
7. Various sources of error are part of the assessment process.
8. Tests and other measurement techniques have strengths and weaknesses.
9. Test-related behavior predicts non-test-related behavior.
10. Present-day behavior sampling predicts future behavior.
11. Testing and assessment can be conducted in a fair and unbiased manner.
12. Testing and assessment benefit society.
What is Measurement?
• Measurement is the process of ordering
observations along a scale.
• Even if you are not interested in testing per
se, you should be interested in
measurement.
• Without sound measurements, any research
result is suspect.
Testing Concepts
1. A test focuses on a particular domain.
2. A test is a sample of behaviors, products,
answers, or performances from the domain
of interest.
3. A test is an aid in making inferences about
the larger domain of interest.
What is a Test?
• Standardized procedure for sampling & describing
behavior or other phenomenon
• Characteristics
– Standardized procedure
– Behavior sample
– Score/categories
– Norms
– Prediction of important behaviors
• Psychometrician
– A psychology or education specialist who develops and
evaluates tests
Standardized Procedures
• Uniform procedures for administering the test
• Assures that everyone across settings and
examiners takes the same test
• Assumes competent examiners
• Test developers/publishers formulate instructions
and produce stimulus materials
• Instructions must specify exactly how the test is to
be administered and under what conditions.
Sample of Behavior
• The sample of behavior is limited
• Behavior sample is useful if it allows one to
predict some important, real-world behavior.
– IQ → Academic Achievement
– MCAT → Medical School Performance
• Test items do not have to resemble the behavior
that you are trying to predict.
• The only psychometric requirement is that the
response to the item predicts the behavior of
interest.
Scores/Categories
• Most tests produce numeric scores or place
individuals in categories
• We measure abstractions, so test scores also
include error
– $Y_{ij} = t_{\cdot j} + e_{ij}$
• $Y_{ij}$ is the observed response of person j on
their theoretical ith attempt
• $t_{\cdot j}$ is the jth respondent's true ability
• $e_{ij}$ is the measurement error for that observation
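The observed-score model above can be illustrated with a small simulation. This is only a sketch: it assumes the errors are independent draws from a normal distribution with mean zero, which the slide does not state, and the function name, true score, and error spread are all hypothetical values chosen for illustration.

```python
import random
import statistics


def simulate_observed_scores(true_score, error_sd, n_attempts, seed=0):
    """Return n_attempts observed scores for one person: true score plus error.

    Assumption (not from the slides): errors are independent and
    normally distributed with mean 0 and standard deviation error_sd.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    return [true_score + rng.gauss(0.0, error_sd) for _ in range(n_attempts)]


# Hypothetical person with a true ability of 100 and an error SD of 5.
scores = simulate_observed_scores(true_score=100.0, error_sd=5.0, n_attempts=10_000)

# Any single observed score can miss the true score, but under the
# zero-mean-error assumption the average over many attempts approaches it.
mean_observed = statistics.fmean(scores)
print(round(mean_observed, 2))
```

In practice a person is tested once or a few times, not thousands, so the true score is never observed directly; the simulation only shows why a single score should be read as an estimate.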
Norms or Standards
• Norms
– A summary of test results for a large and representative
group of subjects
• An individual’s test performance is interpreted by
comparing it to the normative group
• Norms establish average or typical performance
• The Standardization Sample must be
representative of the population for whom the test
is intended
Prediction of Nontest Behavior
• The actual test items are usually of little interest in
and of themselves, but are of great interest if they
are useful in predicting real-world behavior.
• The items’ ability to do so depends solely on
validation research
• Do not be persuaded by test titles, marketing, etc.
Only well-done validity studies will determine the
usefulness of test scores.
Norm vs. Criterion-Referenced Tests
• Norm-referenced tests rely on a normative group
for interpretation of individual scores.
• Criterion-referenced tests measure an individual’s
performance relative to a specified standard or
criterion.
• For example, if Jane must get 16 out of 20
questions correct in order to advance to the next
reading level, then the criterion is 16. There is no
normative group to which Jane is compared.
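The contrast between the two interpretations can be sketched in a few lines. The cutoff of 16 comes from the example above; Jane's score, the norm-group scores, and both function names are hypothetical, and "percent of the norm group scoring below" is only one of several ways a percentile rank can be defined.

```python
from bisect import bisect_left


def meets_criterion(score, cutoff):
    """Criterion-referenced: compare the score to a fixed standard."""
    return score >= cutoff


def percentile_rank(score, norm_scores):
    """Norm-referenced: percent of the norm group scoring below this score."""
    ordered = sorted(norm_scores)
    below = bisect_left(ordered, score)  # count of scores strictly below
    return 100.0 * below / len(ordered)


jane_score = 17  # hypothetical score out of 20

# Criterion-referenced reading: only the cutoff matters.
print(meets_criterion(jane_score, cutoff=16))

# Norm-referenced reading: the same score is interpreted against a
# (made-up) normative group instead of a fixed standard.
hypothetical_norm_group = [10, 12, 13, 14, 15, 15, 16, 17, 18, 19]
print(percentile_rank(jane_score, hypothetical_norm_group))
```

The same raw score yields two different kinds of statement: "Jane advances" under the criterion, and "Jane scored above a given share of the comparison group" under the norms.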
Testing versus Assessment
• Testing involves the administration and
interpretation of tests
• Assessment refers to a more comprehensive
process. This process includes observations,
interviews, checklists, inventories, as well
as tests.
• Group versus Individual Tests.
Types of Psychological Tests
• Ability
– Intelligence
– Aptitude
– Achievement
– Creativity
– Neuropsychological
• Personality
– Personality
– Interest inventories
– Behavioral procedures
Intelligence Tests
• Sample a BROAD range of skills, each
requiring some degree of intellectual ability,
in order to estimate GENERAL level of
intelligence
• Word definitions, memory for designs,
comprehension items, reasoning, arithmetic,
etc.
• Usually the total score (IQ) is of the greatest
interest.
Aptitude Tests
• Single versus Multiple Aptitude Test
Batteries
• Usually used to predict success in an
occupation, training course, or educational
program.
• Most commonly used to determine college
admissions: SAT, GRE
• Useful because they predict academic
success.
Achievement Tests
• Measure success or accomplishment in a
subject; how much has the person learned?
• Aptitude tests are designed to predict future
learning. Achievement tests are supposed to
measure past learning.
• The difference between the two types of test
lies in how they are used.
Creativity Tests
• Instead of emphasizing the correct answer,
creativity tests probe for multiple and even unique
responses to problems.
• Divergent versus Convergent thinking.
• Creativity tests were immediately popular with
educators
• And less popular with psychometricians, who see
a fairly strong relationship between scores on
creativity and intelligence tests.
Neuropsychological Tests
• Used in cases of brain damage/dysfunction
• Multiple test battery
• Tests sensory, motor, cognitive-memory, and
personality-behavior processes.
• Very lengthy administration time, usually 3 to 8 hours
• Specialized area of clinical psychology
• Primary use now is in describing functional capacities of
brain injured patients.
• Some have argued that these tests are rendered obsolete in
detecting and localizing brain damage by advanced
imaging techniques.
Personality Tests
• Measure traits, qualities, and behaviors
• Used to predict future behavior
• Many different forms, including:
– Questionnaires
– Projective
– Checklists
– Inventories
Interest Inventories
• Measure preferences for certain activities
and occupations
• Assumption is that interest patterns are
related to and predict job satisfaction
Behavioral Procedures
• Antecedents and consequences of behavior
• Number of occurrences over a specified
period of time.
• Checklists, rating scales, interviews, and
structured interviews.
Uses of Tests
• Overall – To make decisions about
persons
• Five general uses of test scores
1. Classification
2. Diagnosis and Treatment Planning
3. Self-Knowledge
4. Program Evaluation
5. Research
Classification
• Assigning a person to one category rather
than another

• Differential treatment
– People in one category will be treated
differently from people in other categories.
Types of Classification
• Placement
– Test scores are used to sort people into different
programs according to skills, e.g., the math
placement test
• Screening
– Quick, simple tests used to identify people who
might have a certain characteristic. Often result
in many misclassifications.
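One way to see why screening tests often misclassify is to work through the counts for a characteristic that is uncommon. The sketch below is illustrative arithmetic only: the function name is hypothetical, and the group size, base rate, and the two accuracy rates are made-up numbers that do not describe any real test.

```python
def screening_outcomes(n_people, base_rate, hit_rate, correct_rejection_rate):
    """Split a screened group into the four possible outcomes.

    base_rate: share of people who really have the characteristic.
    hit_rate: share of those people the screen flags.
    correct_rejection_rate: share of everyone else the screen clears.
    """
    with_trait = n_people * base_rate
    without_trait = n_people - with_trait
    true_pos = with_trait * hit_rate
    true_neg = without_trait * correct_rejection_rate
    return {
        "true_pos": true_pos,
        "false_neg": with_trait - true_pos,
        "true_neg": true_neg,
        "false_pos": without_trait - true_neg,
    }


# Hypothetical screen: 1,000 people, 5% have the characteristic, and the
# screen is right 90% of the time for both groups.
outcomes = screening_outcomes(1000, 0.05, 0.9, 0.9)

# Share of flagged people who really have the characteristic.
flagged = outcomes["true_pos"] + outcomes["false_pos"]
share_flagged_correctly = outcomes["true_pos"] / flagged
print(outcomes, round(share_flagged_correctly, 3))
```

With these made-up numbers, the people flagged in error outnumber those flagged correctly, even though the screen is right 90% of the time in each group. That is why a positive screen normally leads to fuller assessment and not straight to a decision.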
Types of Classification
• Certification
– Pass/Fail
• Usually passing confers privileges
– Usually intended to indicate minimum competency
– Driver’s exam is an example
Diagnosis
– Determining the nature and source of an abnormality
(behavioral or otherwise) and classifying it within an
acceptable diagnostic system, e.g., DSM-IV, ICD-9
– Should lead to some form of remediation or treatment
Who May Purchase (License) Tests?

• Many test purchases are restricted because:
– Unqualified persons may cause harm (some
“qualified” persons cause harm too)
– Exposure of test content to potential examinees
invalidates the selection process
– If test content is made public, the test is
rendered useless.
Who May Purchase (License) Tests?
• Type A
– Tests, books, guides, and programs that are available to
any licensee
• Type B
– User must have earned a degree from accredited college
or university and have satisfactorily completed a course
in the interpretation of psychological assessments
• Type C
– User must fulfill all qualifications required
of Type B users, plus possess an advanced degree in a
profession that provides specialized training in the
interpretation of psychological assessments.
Information Sources
• Mental Measurements Yearbook
• Test Critiques
• Tests in Print
• Test Publisher Catalogs
• The Internet
Standardization
• Test instructions and procedures must be
followed to the letter
• Modification of the test in any way will
likely modify the validity of the inferences
based on test scores.
• Some slight modifications are permissible
under special circumstances
Desirable Testing Conditions
• Examiners must be very familiar with test
materials and procedures

• Some instructional sets and procedures are
quite complex
• Rehearsal of procedure and preparation for
the unexpected
Desirable Conditions: Group Testing
• Incorrect timing – too much or too little
• Incorrect presentation of the directions
• Poor physical conditions:
– Inadequate lighting
– Room temperature
• Noise control
• Writing surface
• Penalties/Advantages for guessing?
Examiner Influences
• Rapport
– Establishing an atmosphere of comfort and trust, where
examinees are highly motivated and cooperative
– This is especially important for individual testing
– Failure to establish rapport may result in anxiety,
affecting ability and personality test results
– Examiners with different styles may get different IQ
values when testing similar children
Disabilities
• Impairments in hearing, vision, speech or motor
control can seriously distort test results
• Some people have multiple impairments which
can cause test results that dramatically
underestimate IQ
• Cases where unrecognized deafness has led to
misdiagnosis of mental retardation
• Speech problems – especially difficult to score
responses if the examiner cannot understand the
respondent.
Disabilities
• Motor problems, such as cerebral palsy, may
disadvantage examinees on nonverbal or
performance tests that require motor responses.
• Sometimes standardized tests have been adapted
for examinees with special needs
• In other cases, special tests have been developed
altogether
• My advice is to get an expert in testing the
physically disabled, someone who is more likely
to have the correct instruments and is aware of the
potential difficulties.
Historical Milestones in Measurement & Testing
2200 BC Chinese emperors set up civil service exams

1115 BC Formal examinations were used for candidates of public office in China

1219 University of Bologna employed oral & written law examinations

1400s Louvain University used oral exams to place students in the following
categories: Honors, Satisfactory, Charity Passes, and Failures
1510 Fitzherbert (1470-1538) proposed a test of mentality consisting of counting 20
pence, telling one's age, and identifying one's father
1500s During the late 1500s, Jesuits (the Catholic order founded by St. Ignatius of
Loyola) uniformly adopted written tests for student placement and
evaluation
~1800 Written examinations were commonly replacing oral exams because of
questions of fairness
1809 Gauss (1777-1855) developed a theory concerning errors in observations
Historical Milestones in Measurement & Testing
1845 Schools begin testing students in a uniform way: oral examinations
were state-of-the-art. Boston was the first district to use short-
answer tests throughout its schools
1869 Galton (1822-1911) published Classification of Men According to Their
Natural Gifts, which stimulated the study of mental inheritance and
individual differences; he is considered the founder of individual
psychology
1874 Portland, Maine began standardized testing, based on a citywide
curriculum and a test to measure whether students successfully
learned it
1879 Wundt (1832-1920) founded the first psychological laboratory in
Leipzig, Germany
1888 Cattell (1860-1944) opened a testing laboratory at the University of
Pennsylvania; helped to establish the foundations of mental
measurement in the USA
~1900 The College Entrance Examination Board (newly organized)
administered essay exams in rhetoric, Greek, and other prep-school
curriculum basics to divide students among universities [achieving
a better fit between prep-schools & Ivy League]
Historical Milestones in Measurement & Testing

1904 Pearson (1857-1936) formulated the theory of correlation

1904 Spearman (1863-1936) introduced a two-factor theory of intelligence that posited a
general factor (g) and specific factors (s)
1905 Binet (1857-1911) and Simon (1873-1961) developed a useful intelligence test for
screening school children
1906 E. L. Thorndike (1874-1949) studied animal intelligence, formulated laws of
learning, and developed principles of test construction
1909 Goddard (1866-1957) translated the Binet-Simon Scale from French into English

1915 American Educational Research Association founded

1916 Terman (1877-1956) published the Stanford Revision and Extension of the Binet-
Simon Intelligence Scale; with Merrill (1888-1978), in 1937 he issued a
revision called the Stanford-Binet Intelligence Scale; other revisions were
published in 1960 and 1972
Historical Milestones in Measurement & Testing

1917 Yerkes (1876-1956), with colleagues, published the Army Alpha and Army Beta
tests, which were group-administered intelligence tests used for the
assessment of military recruits in the USA
1923 Kelley (1884-1961), Ruch (1903-1982), and Terman (1877-1956) published the
Stanford Achievement Test
1920s The College Board began investigating the uses of MC items in college entrance
exams
1926 First administration of the Scholastic Aptitude Test, designed by Brigham,
professor of Psychology at Princeton
1933 Thurstone (1887-1955) proposed a multiple factor analytic approach to the study
of human abilities
1933 Tiegs (1891-1970) and Clark (1895-1964) published the Progressive Achievement
Tests, later renamed the California Achievement Tests
1933 Harvard University, through the efforts of President Conant and his assistant
Chauncey, used the SAT for their scholarship program
1936 Lindquist (1901-1978), with colleagues, published the Iowa Every-Pupil Tests of
Basic Skills, later renamed the Iowa Test of Basic Skills
Historical Milestones in Measurement & Testing

1930s By the late 1930s, MC tests replaced most of the College Board’s essay tests

1937 Bender (1897-1987) published the Bender Visual Motor Gestalt Test

1937 National Council on Measurement in Education founded

1938 Buros (1905-1978) published the first Mental Measurements Yearbook

1939 Wechsler (1896-1981) published the Wechsler-Bellevue Intelligence Scale;
revisions were issued in 1955 and 1981 under the titles Wechsler Adult
Intelligence Scale and Wechsler Adult Intelligence Scale--Revised

1948 ETS was established and could process 4,000 tests a day. Today, they can score
64,000 forms a day
1949 Wechsler Intelligence Scale for Children was published; a revision was issued in
1974 under the title Wechsler Intelligence Scale for Children--Revised
Historical Milestones in Measurement & Testing

1950s Lindquist, at the University of Iowa, led the development of the optical scanner

1959 Guilford (1897- ) proposed a Structure of Intellect model of intelligence based on
factor analytic methods
1969 National Assessment of Educational Progress initiated, testing samples of
students aged 9, 13, and 17 years
1975 U.S. Public Law 94-142 passed, proclaiming the right to equal education for all
handicapped children
1979 Judge Peckham in California ruled in Larry P. v. Wilson Riles that intelligence
tests used for the assessment of Black children for classes for the educable
mentally retarded are culturally biased.
1980 Judge Grady in Illinois ruled in Parents in Action on Special Education v. Joseph
P. Hannon that intelligence tests are not racially or culturally biased and do
not discriminate against Black children
1986 R. L. Thorndike (1910- ), Hagen (1915- ), and Sattler (1931- ) published the
Stanford-Binet Intelligence Scale: Fourth Edition, in which a point-scale
format replaced the age-scale format of the Stanford-Binet Intelligence Scale:
Form L-M.
Historical Milestones in Measurement & Testing

2000 A federal district court upheld the Texas graduation test in GI Forum v. Texas
Education Agency. The court held "While the [graduation test] does adversely
affect minority students in significant numbers, the [state] has demonstrated
an educational necessity for the test, and the Plaintiffs have failed to identify
equally effective alternatives…. The [state] has provided adequate notice of
the consequences of the exam and has ensured that the exam is strongly
correlated to material actually taught in the classroom. In addition, the test is
valid and in keeping with current educational norms. Finally, the test does not
perpetuate prior educational discrimination…. Instead, the test seeks to
identify inequities and to address them" (Phillips, 2000).
