
Received: 12 June 2017 Accepted: 4 December 2017

DOI: 10.1002/pits.22100

RESEARCH ARTICLE

Examining learning rates in the evaluation of academic interventions that target reading fluency

Benjamin G. Solomon1, Brian C. Poncy2, Devin J. Caravello1, Emily M. Schweiger1

1 University at Albany, State University of New York
2 Oklahoma State University

Correspondence
Benjamin Solomon, PhD, University at Albany, Division of School Psychology, Department of Educational and Counseling Psychology, State University of New York, Education 232, 1400 Washington Ave., Albany, NY 12222.
Email: bgsolomon@albany.edu

Abstract
The purpose of the current study is to determine whether single-case intervention studies targeting reading fluency, ranked by traditional outcome metrics (i.e., effect sizes derived from phase differences), were discrepant with rankings based on instructional efficiency, including growth per session and minutes of instruction. Converging with prior findings, we found great variability in reported sessions and minutes of instruction across studies, as well as divergences in rankings based on outcome variables. These findings raise questions as to how literature syntheses on the topic of academic intervention are interpreted and how selection of evidence-based intervention occurs.

KEYWORDS
cumulative instructional time, effect sizes, oral reading fluency

A fundamental responsibility of trainers and school-based practitioners is the promotion and dissemination of
evidence-based intervention (EBI), which is in line with a national call for use of educational practices with a prepon-
derance of high-quality evidence (e.g., Individuals with Disabilities Education Act, 2004). Such a focus has spurred
increased emphasis on the use of experimental designs to generate valid causal conclusions regarding treatment effec-
tiveness (Kratochwill et al., 2010; Shadish, Cook, & Campbell, 2001). However, the demand for a higher and consistent
standard regarding the identification of EBI has also put focus on how study findings are summarized and presented to
consumers. That is, academic growth under the presence of intervention can be summarized in different ways, which
can be misleading if nuisance variables are not controlled and comparative metrics are not equated across studies (Bramlett,
Cates, Savina, & Lauinger, 2010; Poncy et al., 2015; Skinner, Belfiore, & Watson, 1995; Wolery, Busick, Reichow, & Bar-
ton, 2010). Such confusion can lead to the selection of EBIs that are less than optimal given the interventions available
in the peer-reviewed literature.

1 THE ROLE OF CUMULATIVE INSTRUCTIONAL TIME (CIT) IN THE DISCUSSION OF EBIs

Skinner et al. (1995) identified CIT, or the amount of time required for an intervention to be administered as described,
as an overlooked variable in the comparative analysis of EBIs. One intervention may appear more effective than


another partially or totally because its reported dosage, in terms of required session length and number of sessions, is greater than that of a comparison intervention, raising the question of what would happen had CIT been equated. Practitioners may then elect to use interventions that have been shown to be effective but that increase learning rates less than another available intervention would. To remedy this, the authors encouraged
shifting discussion of outcomes from phase differences or pre- to post-improvement to rates of learning during inter-
vention, i.e., the difference in skill performance per minute of instruction from one instructional environment, such as
the baseline, to the next, such as the intervention.
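To make this arithmetic concrete, consider an invented illustration (the numbers are ours, not drawn from any study): a student who averages 50 WCPM in baseline and 65 WCPM during an intervention delivered in six 20-minute sessions has a learning rate of (65 − 50 WCPM)/(6 × 20 minutes) = 15/120 ≈ 0.13 WCPM per instructional minute. Unlike the raw 15-WCPM phase difference, this rate can be compared across interventions with different dosages.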
Consideration of CIT is most relevant in comparative research where two or more interventions are contrasted.
Classically, when an intervention is compared to a no-treatment group or baseline phase, CIT is not a concern because
the “business-as-usual” condition represents no time away from everyday instruction (Poncy et al., 2015). In this case,
although the CIT of the intervention condition is of interest, the rate of learning cannot be compared to the no-
treatment condition, where no time was spent in intervention. However, when two interventions are compared within
or across experiments, comparisons of CIT become meaningful, as both interventions represent time away from the
student's primary learning environment and the interventions can be precisely timed. Accompanying traditional out-
come metrics with such rate-based metrics allows for equitable comparisons across studies (Skinner, 2008; Skinner
et al., 1995; Skinner, Fletcher, & Henington, 1996). This is particularly important given that instructional time in schools
is at a premium (Fisher et al., 1978; Gettinger & Ball, 2008). A basic tenet of the EBI movement is that research should
increasingly move toward establishing the best intervention for a given instructional deficit. Therefore, comparative
studies that contrast multiple interventions should become more frequent as the technology of EBI moves forward
and discussion of rates of learning become more relevant.

2 APPLIED EXAMPLES OF EQUATING CIT

Supporting this conceptual argument is a line of applied studies that have targeted CIT as the time function by which
change in the dependent variable (DV) was measured. For example, Nist and Joseph (2008) specified growth along both
the x-axis of intervention sessions and CIT, comparing improvement in words mastered under the presence of flashcard
drill and practice and incremental rehearsal interventions. In the context of traditional session-by-session growth,
incremental rehearsal was most effective. However, in the latter analysis, flashcard drill and practice was most efficient.
The authors theorized that this was because the use of known words in incremental rehearsal would draw out CIT as
these words do not represent learning opportunities.
Additional studies comparing growth by way of sessions and CIT appeared around this time (Cates et al., 2003;
Joseph & Nist, 2006; Volpe, Mulé, Briesch, Joseph, & Burns, 2011; Yaw et al., 2014). The message across these stud-
ies was fairly consistent: rate-based metrics need to be considered and often diverge from the outcomes commonly
reported, such as phase differences or rate of growth per session, not only when experimentally comparing interven-
tions but also within the broader context of EBI selection, literature review, and meta-analysis.

3 CONSIDERING CIT WITHIN META-ANALYSIS

The implications of CIT are particularly relevant in the context of meta-analysis. Meta-analysis is a framework for
pooling and comparing treatment effects across studies by standardizing the between groups or phase difference
within studies (i.e., the effect size [ES]), commonly between a no-treatment and treatment condition. Advances in
the technology by which single-case (SC) studies can be summarized using ESs have resulted in a sharp increase in
such studies. An ES is typically selected that reflects the difference between an adjacent intervention phase and the
baseline phase, mirroring their group design counterparts (i.e., Cohen's d, Hedges’ g), and a variety of parametric and
nonparametric techniques are used to accomplish this task (e.g., Parker, Vannest, Davis, & Sauber, 2011; Shadish,

Rindskopf, & Hedges, 2008). However, an assumption of comparative meta-analysis is that included studies are
sufficiently exchangeable, meaning that included ESs of the sample are random draws from the possible population
of all studies. Differences among studies may occur, but are not predictable prior to analysis (Cameron et al., 2015;
Higgins, Thompson, & Spiegelhalter, 2009). Large discrepancies in CIT likely violate this assumption.
Several meta-analyses of reading intervention studies using group designs have acknowledged the potential influ-
ence of CIT by conducting moderator analysis, typically categorizing intervention studies into groups based on arbi-
trary cut-offs for the number of sessions or hours of interventions reported (e.g., Flynn, Zheng, & Swanson, 2012;
Scammacca, Roberts, Vaughn, & Stuebing, 2015; Suggate, 2016). These studies are discrepant in their findings, with some reporting that the duration of intervention moderated effects and others reporting that the duration of intervention
was not influential. For example, Berkeley, Scruggs, and Mastropieri (2010) reported that medium-length interventions
(1 week to 1 month) were more effective on average than short (<1 week) or long interventions (>1 month). However,
this traditional approach does not equalize the effect of CIT on other, more meaningful comparisons (i.e., comparative
effectiveness of different interventions). This is also in contrast to how Skinner et al. (1995) envisioned evaluating the
comparative quality of interventions: as operating on rates of growth based on time spent in intervention and fixing
dosage across all other dimensions of analysis. These estimates, pooled by intervention or a relevant moderator, may
obscure the instructional efficiency of individual interventions.
To demonstrate how CIT potentially serves as a nuisance variable in research synthesis, Poncy et al. (2015) reviewed
SC intervention studies targeting math fluency, including eight different types of interventions. These authors ranked
the associated outcome metrics by traditional ES (i.e., phase comparison) and when the rate of growth was measured
as ES/CIT. Their ranking of interventions from most to least effective varied substantially based on the metric used. The
authors concluded that neither meta-analysis nor applied comparative study of math fluency interventions is straightforward;
in either case, learning rates should be considered. Poncy et al. (2015), however, had two limitations warranting further
research. First, these authors focused on the subset of the peer-reviewed literature that described math interventions
and associated digits correct per minute outcomes. It is possible that findings in literacy-based interventions are dif-
ferent, whereby traditional metrics of growth (the ESs) mirror rate-based metrics, and thus CIT is not a practical cause
for concern.
Second, Poncy et al. (2015) defined their rate metric as summative ESs divided by CIT. This is in line with most extant
SC meta-analyses, which generally have used more traditional ESs that reflect average phase differences or, more com-
monly, phase nonoverlap. However, doing so has limitations. These rates have no identifiable variance and can nei-
ther be tested for significance nor pooled using traditional meta-analytic procedures. They are primarily descriptive in
nature. Fortunately, these hurdles can be overcome by, alternatively, using simple growth models, such as those out-
lined by Moeyaert et al. (2015), who provided adjustments to traditional regression analysis to capture such effects
specifically for SC designs. Rather than dividing the calculated ES by CIT post hoc, a time-series regressor is specified
in an ordinary least squares (OLS) regression that mirrors the rate metric in question (e.g., CIT accumulated over the course of the intervention [see Method section]). Unlike dividing ES by CIT, this method yields results that are easy to visualize, take the form of standard OLS beta weights, and can be pooled using traditional meta-analytic procedures.
As an example, Figure 1 yields growth estimates for subject two from Walcott, Marett, and Hessel (2014), which is
an included article from the current study. Here, we demonstrate how regressions isolating (1) average growth across
phases, (2) growth per session of intervention, or (3) growth per 15 minutes of intervention results in different esti-
mates of effect and, therefore, different interpretations of the outcome data.

4 CURRENT STUDY

Isolating the instructional efficiency of an intervention has direct implications for researchers and practitioners engag-
ing in the testing and selection of EBI. The current study builds upon Poncy et al. (2015) by exploring CIT atten-
uation across the body of reading fluency intervention work for traditional growth and rates of learning (CIT) and

FIGURE 1 Linear regression methods by which to capture study effects. The upper panel demonstrates change measured as a phase difference. The lower right panel shows change captured for session-by-session growth. The lower left panel envisions growth by means of cumulative instructional time, specified as per 15 minutes of instruction

session-by-session growth across SC studies. The purpose of the current study is to investigate the extant literature
for reading fluency interventions, isolating and comparing various rates of learning using growth models. Our research
questions were:

1. Do studies ranked by traditional outcome metrics (i.e., phase differences) differ in order when ranked by rate metrics
(learning rate defined session-by-session or by CIT)?
2. What is the statistical relationship among these outcome metrics?

5 METHOD

5.1 Criteria for study inclusion


We conducted ERIC, PsycINFO, and EBSCO searches using the key-term combinations “oral reading fluency” (ORF)
and “intervention” and “words correct per minute” (WCPM) and “intervention” in December 2016. An updated search
was completed in September 2017. We identified 197 unique peer-reviewed articles. These articles were subjected to
the following inclusion criteria:

1. The article describes an SC experimental reading intervention. Experimental is defined as the study describing the
evaluation of at least two conditions entailing at least three demonstrations of behavioral change at separate time
points (Kratochwill et al., 2010). Intervention is defined as any modification of literacy instruction beyond described
typical classroom instruction for a subset of students.
2. A baseline condition is included that entails no treatment. All studies that met this qualification also included formative ORF as a DV, which otherwise would have served as an additional inclusion criterion.
3. The duration of the intervention session is reported in minutes or seconds as well as the number of sessions.

Identified articles from the Boolean searches were first screened for appropriateness based on their title and
abstracts. Remaining articles were then sequentially screened against the inclusion criteria described previously. Figure 2
details this process and associated reliability (kappa) from an independent rater for each criterion. An additional three
SC articles were removed upon further inspection because ORF progress monitoring occurred on trained probes (e.g.,
final read of repeated readings), whereas all other included studies examined generalization probes. The final pool of
articles consisted of 18 SC studies. These studies were then organized into homogeneous groups (see Table 1). An inde-
pendent researcher reviewed these categories and then grouped studies blind to the coding of the first author. Agree-
ment between the raters was 100%.

5.2 Oral reading fluency


In the current study, we used ORF WCPM as an outcome metric. Doing so allowed us to investigate effects over both
standardized and unstandardized metrics. Furthermore, ORF is a straightforward capstone skill that entails little con-
cern regarding instructional scope. This would be a concern for discrete instructional sets such as letter-sounds or
sight-word acquisition, where one study may use, for example, five letter sounds as an outcome, while another may use
20, affecting outcome metrics (Poncy et al., 2015).

FIGURE 2 Article search and elimination process. Numbers in parentheses represent the number of articles eliminated at each step

TABLE 1 Summary of included interventions

Repeated reading (RR) group: A target probe is presented to a student who reads it multiple times with some level of corrective feedback. Some minor variations in the primary procedure exist.
RR+: Additional components are added that serve different purposes and may significantly increase cumulative instructional time. Examples include listening passage preview or phrase drill error correction.
Peer administered: Interventions included peer-assisted learning strategies (Fuchs et al., 2001) that use explicit instruction and repetition of passages administered by a peer student.
Direct instruction: Broad-based explicit instruction with time devoted to teaching multiple evidence-based pillars of reading, for example, phonics (15 minutes), fluency (10 minutes), and vocabulary (5 minutes).
Earobics: Interactive computer-based instruction that focuses on multiple skills with a general focus on phonics.
Folding in flashcarding: A flashcard-based acquisition intervention entailing presentations of known and unknown words. Self-graphing was also used as reported presently.
Read naturally software edition: RRs with additional comprehension questions and vocabulary preview. Because the intervention is computer administered, it was separated from the RR studies.
Video modeling: Students view themselves reading prior to practicing fluency passages in conjunction with performance feedback and goal setting.

6 PROCEDURES

6.1 Data recovery


GraphClick (Version 3.0.3; 2012) was used to recover data from article PDFs. This program retrieves raw-scale data after the user manually defines the axes on the copied digital graph. Several studies have reported that GraphClick is both
reliable and valid for the purpose of extracting SC data (Flower, McKenna, & Upreti, 2016; Rakap, Rakap, Evran, & Cig,
2016).

6.2 Outcome variables


We calculated average phase differences, equivalent to a traditional ES, as ORFi = β0 + β1phase + εi, where β1 is the statistic of interest and subscript i refers to the value of ORF at time point i. The corresponding model for session-by-session growth was ORFi = β0 + β1phase + β2session + β3(phase × session) + εi. Here, β3, the interaction, is the statistic of interest, representing differences in growth slopes from baseline to intervention, and β1 and β2 represent main effects for phase and time, respectively. To generate standardized ESs, raw ESs were standardized by the standard deviation (SD) of the residuals of the OLS-estimated line of best fit and then adjusted for small-sample bias using Hedges' (1981) approximation, 1 − 3/(4m − 1), where m equals the degrees of freedom for the model, m = i − p, i equals the number of measurement occasions, and p equals the number of predictors, as recommended by Moeyaert et al. (2015) and Van den Noortgate and Onghena (2008); the resulting estimate is designated g. Subject-level effects were weighted by the inverse of their variance and pooled within studies, resulting in study-level effects. When multiple studies existed within an intervention category, we pooled studies, weighting by sample size, to increase interpretability (Hunter & Schmidt, 2004).
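As a minimal sketch of these calculations for a single subject, assuming a digitized series like those recovered by GraphClick (the data and variance values below are invented, and statsmodels simply stands in for whatever software was actually used):

```python
# Sketch: unstandardized phase effect, standardized with Hedges' small-sample
# correction, for one hypothetical subject. All values are invented.
import numpy as np
import statsmodels.api as sm

orf = np.array([48, 52, 50, 49, 60, 66, 71, 78, 83])  # WCPM at each occasion
phase = np.array([0, 0, 0, 0, 1, 1, 1, 1, 1])         # 0 = baseline, 1 = intervention

# Phase-difference model: ORFi = b0 + b1*phase + e; b1 is b_phase.
X = sm.add_constant(phase)
fit = sm.OLS(orf, X).fit()
b_phase = fit.params[1]

# Standardize by the SD of the OLS residuals, then apply Hedges' correction
# 1 - 3/(4m - 1) with m = i - p (we count the intercept among the p predictors,
# an assumption, since the article does not spell this out).
i, p = len(orf), X.shape[1]
m = i - p
g_phase = (b_phase / fit.resid.std(ddof=p)) * (1 - 3 / (4 * m - 1))

# Inverse-variance weighting of subject-level effects into a study-level effect.
effects = np.array([g_phase, 1.80, 2.20])    # hypothetical subject-level g's
variances = np.array([0.30, 0.25, 0.40])     # hypothetical sampling variances
study_g = np.sum(effects / variances) / np.sum(1 / variances)
print(f"b_phase = {b_phase:.2f}, g_phase = {g_phase:.2f}, pooled g = {study_g:.2f}")
```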
To calculate improvement in ORF as growth by CIT, we replaced β2session with β2CIT in the above regression, where time is measured as accumulated instructional time. So that this model resulted in coefficients that were interpretable (i.e., not infinitesimal), we specified growth in ORF per 15 minutes of CIT rather than per minute. For example, if an intervention was reported to take 20 minutes per session across four sessions, the new time predictor increased accordingly: 20/15, 40/15, 60/15, 80/15, or 1.33, 2.67, 4.00, 5.33. This model is an effective mathematical representation of Skinner et al.'s (1995) difference in the rate of learning. Target metrics included bphase, gphase, bsession, gsession, bCIT, and gCIT. Each initial adjacent comparison from baseline to intervention was included. All SC studies were multiple baseline designs. If reported instructional session time varied by subject or session, the mean value of the reported range was used.
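The session and CIT growth models might then be fit as below (a sketch with the same invented series; the article does not specify how the time regressor is coded during baseline occasions, so scaling the full session index by the reported session length is our assumption):

```python
# Sketch: growth per session vs. growth per 15 minutes of CIT for one subject.
import numpy as np
import statsmodels.api as sm

orf = np.array([48, 52, 50, 49, 60, 66, 71, 78, 83])
phase = np.array([0, 0, 0, 0, 1, 1, 1, 1, 1])
session = np.arange(1, len(orf) + 1)

# ORFi = b0 + b1*phase + b2*session + b3*(phase x session) + e;
# b3 (params[3]) is growth per intervention session, controlling baseline slope.
X_session = sm.add_constant(np.column_stack([phase, session, phase * session]))
b_session = sm.OLS(orf, X_session).fit().params[3]

# CIT model: a 20-minute session adds 20/15 units per occasion (20/15, 40/15,
# 60/15, ...), so the interaction becomes growth per 15 minutes of instruction.
minutes_per_session = 20                      # hypothetical reported session length
cit = session * minutes_per_session / 15
X_cit = sm.add_constant(np.column_stack([phase, cit, phase * cit]))
b_cit = sm.OLS(orf, X_cit).fit().params[3]
print(f"b_session = {b_session:.2f} WCPM/session, b_CIT = {b_cit:.2f} WCPM/15 min")
```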
Eight different intervention types were identified. We aggregated results into tables and figures as described
later. Next, and most importantly, we calculated correlations across outcome DVs, correcting for family-wise error,
so as to explore covariance between growth per minute of intervention, growth per session, and the traditional phase
difference ES. In this analysis, each study participant was treated as an individual subject. A related secondary analysis
investigated whether the additional predictors (i.e., the slope and interaction term) explained additional variance in
ORF scores beyond that of a linear model with only the phase predictor. Therefore, the current study analyzed results
at two levels: by intervention type and by subject, collapsed across interventions.

7 RESULTS

7.1 Summary of included studies


Included studies are summarized in Table 2. The average number of baseline sessions, summed across participants, was
X̄ = 33.76, SD = 19.45, and the average intervention phase length, summed across subjects, was X̄ = 59.00, SD = 34.25.
The average number of subjects per study was X̄ = 4.82. The average instructional time per session per student and the
number of intervention sessions was X̄ = 21.95 minutes, SD = 10.83, and X̄ = 12.20, SD = 6.46, respectively. Therefore,
the average CIT per subject was X̄ = 270.35 minutes, SD = 208.13, ranging from as few as 40 minutes of instruction
spanning 2 weeks to as long as 1,110 minutes spanning most of the school year.

7.2 Summary across metrics


Results serve to contrast variability across metrics when looking at effects within the context of average phase change,
session-by-session growth, and growth per 15 minutes of delivered instruction (e.g., CIT). Individual metrics are first
discussed to facilitate understanding of how such shifts may alter discussion of EBI in practice, followed by comparisons
across time units of analysis at the subject level.

7.3 Average growth across phases (traditional estimation)


Here, distributional information represents the linear regression with one predictor, phase, representing the average
improvement in ORF from baseline to intervention, which is reported in Table 3 (gphase , bphase ). Table 4 summarizes
ordinal intervention ranking positions across all metrics. This estimator is comparable to traditional SC ESs, which com-
monly quantify the difference or nonoverlap value of the average intervention performance against average baseline
performance. The weighted mean for the standardized effect across intervention categories was gphase = 2.14, 95% CI
[2.12, 2.16], whereas the mean for the unstandardized effect was bphase = 16.75 WCPM [16.73, 16.77]. Folding-in flash-
carding and repeated readings plus additional components yielded the two highest standardized effects (gphase = 3.69,
2.67, respectively), whereas repeated readings plus additional components and video modeling yielded the two high-
est unstandardized effects (bphase = 25.19, 22.56, respectively). Earobics was the lowest ranked intervention across both metrics (gphase = 1.21, bphase = 5.09). Note that variations across unstandardized and standardized metrics are
expected because the latter accounts for variance associated with error.

TABLE 2 Summary of included studies

Study | Intervention | Classification | Minutes per session | Sessions
Albers & Hoffman, 2012 | FI and PF | FI | 17.5 | 21
Allen-DeBoer, Malmgren, & Glass, 2006 | Corrective reading | DI | 30.0 | 30
Barton-Ardwood, Wehby, & Falk, 2005 | Horizons and peer-assisted learning strategies | PEER | 30.0 | 7
Bennett, Gardner, Cartledge, Ramnath, & Council, 2017 | Computer-administered RR | RNSE | 25.0 | 10
Bray et al., 1998 | Video self-modeling | VM | 3.5 | 8
De la Colina et al., 2001 | RR, PF, and goal setting | RR+ | 45.0 | 36
Harris, Oakes, Lane, & Rutherford, 2009 | Sonday, great leaps, response-cost | DI | 30.0 | 8.50
Hofstadter-Duke & Daly, 2011 | Peer-mediated listening passage preview and RR | PEER | 30.0 | 16
Hua, Hendrickson, & Therrien, 2012 | Reread-adapt and answer-comprehend | RR+ | 15.0 | 21
Keyes, Cartledge, Gibson, & Robinson-Ervin, 2016 | Computer-administered read naturally | RNSE | 32.5 | 20
Klingbeil, Moeyaert, Archer, Chimboza, & Zwolski, 2017 | Peer-mediated FI | PEER | 10.0 | 21
Lo, Cooke, & Starling, 2011 | Group RR | RR-G | 17.5 | 19
Musti-Rao, Hawkins, & Barkley, 2009 | Group RR | RR-G | 10.0 | 11
Neddenriep, Fritz, & Carrier, 2010 | RR, feedback, and error correction | RR+ | 30.0 | 12
Oddo, Barnett, Hawkins, & Musti-Rao, 2010 | Group RR | RR-G | 10.0 | 24
O'Shaughnessy & Swanson, 2000 | Phonological awareness training | DI | 30.0 | 18
Ross & Begeny, 2015 | Modeling, RR, feedback, and reward | RR+ | 7.0 | 5
Walcott et al., 2014 | Earobics | Earobics | 20.0 | 16

Note: RR-G, group-administered repeated readings; PEER, peer-administered interventions; RR+, individual repeated readings plus additional components; CR, corrective reading; FI, folding-in; VM, video modeling; RNSE, read naturally software edition; DI, comprehensive direct instruction; PF, performance feedback.

TABLE 3 Results across outcome metrics and intervention categories

bphase bsession bCIT gphase (CI) gsession (CI) gCIT (CI)


RR-G 20.29 0.89 0.09 2.02 (1.98, 2.05) 0.12 (0.08, 0.15) 0.23 (0.18, 0.28)
RR+ 25.19 2.55 0.08 1.56 (1.53, 1.59) 0.14 (0.11, 0.17) 0.25 (0.20, 0.30)
Earobics 5.09 1.23 0.06 1.21 (1.08, 1.35) 0.40 (0.21, 0.58) 0.36 (0.07, 0.65)
PEER 6.44 0.15 0.05 1.43 (1.33, 1.52) 0.09 (-0.02, 0.20) 0.01 (-0.11, 0.10)
DI 12.65 1.33 0.04 1.82 (1.79, 1.86) 0.32 (0.30, 0.35) 0.55 (0.47, 0.63)
VM 22.56 2.84 0.71 2.43 (2.36, 2.49) 0.36 (0.29, 0.44) 1.02 (0.91, 1.13)
FI 18.77 1.04 0.06 3.69 (3.56, 3.83) 0.26 (0.06, 0.46) 0.14 (-0.16, 0.44)
RNSE 16.87 1.85 0.13 2.17 (2.12, 2.21) 0.29 (0.23, 0.34) 0.05 (-0.02, 0.11)

Note: CIT, cumulative instructional time; CI, 95% confidence interval; RR-G, group-administered repeated readings; PEER,
peer-administered interventions; RR+, individual repeated readings plus additional components; CR, corrective reading; FI,
folding-in; VM, video modeling; RNSE, read naturally software edition; DI, comprehensive direct instruction; Numbers in
parentheses represent confidence intervals.

TABLE 4 Ordinal intervention rankings by metric

Rank PhaseUST SessionUST CITUST PhaseST SessionST CITST


1 Earobics Peer Peer Earobics Peer Peer
2 Peer RR-G RNSE Peer RR-G RNSE
3 DI FI FI RR-G FI FI
4 RNSE Earobics RR+ DI RNSE RR-G
5 FI DI DI RNSE VM RR+
6 RR-G RNSE Earobics VM Earobics Earobics
7 VM RR+ RR-G RR+ RR+ DI
8 RR+ VM VM FI DI VM

Note: RR-G, group-administered repeated readings; PEER, peer-administered interventions; RR+, individual repeated read-
ings plus additional components; CR, corrective reading; FI, folding-in; VM, video modeling; RNSE, read naturally software edi-
tion; DI, comprehensive direct instruction; UST, unstandardized effect; ST, standardized effect. Interventions ranked lowest to
highest.

7.4 Rate of growth per session


Growth is now reported as the interaction of phase and time where bsession and gsession are the average
growth per session of intervention, controlling for baseline slope (Table 3). Average growth across subjects was
0.41 [0.39, 0.43] and 1.45 WCPM per session [1.43, 1.48] across standardized and unstandardized metrics,
respectively. Earobics, formerly the least effective intervention, moved to a middle position across both metrics (gsession = 0.40, bsession = 1.23). Video modeling moved to the highest position for the unstandardized metric (bsession = 2.84), and peer-administered interventions became the lowest ranked intervention across metrics (gsession = 0.18, bsession = 0.33).

7.5 Rate of growth by minute


Here, rate of growth per 15 minutes of instruction is reported across intervention categories. The average rate was 1.16 WCPM [1.14, 1.19] and 0.28 [0.26, 0.31] per 15 minutes of intervention for the unstandardized and standardized
effects, respectively. Aggregated by intervention category, video modeling remained efficient across both metrics (gCIT
= 1.02, bCIT = 0.71). However, its separation from the other interventions increased substantially, whereas the bulk of
the other intervention categories clustered together. Interestingly, folding-in flashcarding yielded a pattern of dimin-
ishing rank across metrics. For example, although ranked first by phase, it was ranked fourth by session and sixth by
CIT for the unstandardized effect, bCIT = 0.14 WCPM. A similar pattern was noted for read naturally software edition
across the standardized metric.

7.6 Correlations among outcome variables


We examined Spearman's 𝜌 correlations across metrics to ascertain to what degree rankings of subjects held across
outcome metrics, as reported in Table 5. A nonparametric approach was taken because of high kurtosis and skew of
certain variables. Of note is that effects based on phase differences correlated at a generally low level with effects
derived from session-by-session growth or growth by CIT, ranging from 𝜌 = 0.02 to 𝜌 = 0.33, thus suggesting effect
ranking changed dramatically based on metric. Relationships between session-by-session growth and growth by CIT
were larger, ranging from 𝜌 = 0.49 to 𝜌 = 0.63, and were all statistically significant. As expected, relationships across
unstandardized and standardized effects for the same metrics were very high.
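A sketch of that correlational analysis, with invented subject-level effect arrays standing in for the recovered data (scipy supplies Spearman's ρ and statsmodels the Holm correction):

```python
# Sketch: Spearman correlations among outcome metrics with Holm FWE control.
import numpy as np
from scipy.stats import spearmanr
from statsmodels.stats.multitest import multipletests

rng = np.random.default_rng(0)
n = 60                                           # hypothetical subject count
b_phase = rng.normal(15, 6, n)                   # invented subject-level effects
b_session = 0.05 * b_phase + rng.normal(1.0, 0.6, n)
b_cit = 0.60 * b_session + rng.normal(0.1, 0.1, n)

metrics = {"phase": b_phase, "session": b_session, "CIT": b_cit}
names = list(metrics)
labels, rhos, pvals = [], [], []
for a in range(len(names)):
    for b in range(a + 1, len(names)):
        rho, p = spearmanr(metrics[names[a]], metrics[names[b]])
        labels.append(f"{names[a]} vs {names[b]}")
        rhos.append(rho)
        pvals.append(p)

# Holm's step-down procedure controls family-wise error across the set of tests.
reject, p_adj, _, _ = multipletests(pvals, alpha=0.05, method="holm")
for label, rho, p, sig in zip(labels, rhos, p_adj, reject):
    print(f"{label}: rho = {rho:.2f}, Holm-adjusted p = {p:.3f}, significant: {sig}")
```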
In a secondary analysis, we conducted hierarchical regressions on each subject's data to determine whether there was a change in adjusted R2 values from the base model, ORFi = β0 + β1phase + εi, to the full model, ORFi = β0 + β1phase + β2session + β3(phase × session) + εi. On average, the adjusted R2 increased 15.49% from the base to the full model.

TABLE 5 Spearman ρ correlations among outcome variables

           PhaseUST  SessionUST  CITUST  PhaseST  SessionST  CITST
PhaseUST   –
SessionUST 0.33b     –
CITUST     0.10      0.50b       –
PhaseST    0.65      0.12        0.09    –
SessionST  0.02      0.83a       0.51a   0.17     –
CITST      -0.04     0.49        0.94a   0.13     0.63a      –

Note: UST, unstandardized effect; ST, standardized effect. Family-wise error controlled with the Holm's procedure.
a Significant at the 0.05 level.
b Significant at the 0.01 level.

This suggests that the full model was, on average, a better solution to explaining the data, even when corrected for the additional predictors. Results based on CIT were identical.
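For a single subject, the hierarchical comparison could be sketched as follows (invented data again; adjusted R2 already penalizes the two added predictors, so a positive change favors the full model):

```python
# Sketch: change in adjusted R^2 from the phase-only model to the full model.
import numpy as np
import statsmodels.api as sm

orf = np.array([48, 52, 50, 49, 60, 66, 71, 78, 83])
phase = np.array([0, 0, 0, 0, 1, 1, 1, 1, 1])
session = np.arange(1, len(orf) + 1)

base = sm.OLS(orf, sm.add_constant(phase)).fit()
full = sm.OLS(orf, sm.add_constant(
    np.column_stack([phase, session, phase * session]))).fit()

delta = full.rsquared_adj - base.rsquared_adj
print(f"Adjusted R^2: {base.rsquared_adj:.3f} -> {full.rsquared_adj:.3f} "
      f"(change = {delta:+.3f})")
```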

8 DISCUSSION

The purpose of the current study is to replicate and extend the findings of Poncy et al. (2015) by investigating and com-
paring instructional rates across reading fluency interventions by both sessions and minutes of reported intervention
exposure using growth models. Poncy et al. (2015) found that estimates of effect evaluated by means of traditional
ESs were discrepant with rates of learning, measured as ES/CIT. These authors therefore cautioned that the varied lev-
els of CIT required to administer academic interventions may obscure rates of learning when engaging in comparative
intervention research and EBI selection. Similar results were found in the present study. We caution, as did Poncy et al.
(2015), that the purpose of the current study was to evaluate metrics of growth, not the relative quality of specific
interventions. Intervention effects were aggregated to demonstrate variability across metrics.

8.1 Instructional time


One of the most notable findings from this study was simply the wide range of sessions and CIT across studies. The
largest CIT reported was 1,110 minutes of instruction (De la Colina, Parker, Hasbrouck, & Lara-Alecio, 2001), while the
shortest duration was 40 minutes (Bray, Kehle, Spackman, & Hintze, 1998). More relevant, for studies conducted post–
No Child Left Behind Act of 2001 (2002), when EBI became a focal consideration in school practice, the average CIT was
270.35 minutes with similarly wide dispersion, SD = 208.13, min = 95 min, max = 893 min. It is difficult to envision any
scenario, whether it be through meta-analysis or literature review, in which these wide discrepancies would not limit
the validity of decisions regarding best practice either at the statistical or practical level unless they were explicitly
recognized.

8.2 Measures of growth


The results of the current study converged with the findings of Poncy et al. (2015) in demonstrating that growth,
measured as phase differences—analogous to more common estimators (e.g., nonoverlap metrics, Cohen's d)—confers
little information regarding rates of growth. Rate of growth is a significant concern of school-based interventionists
given the brief amount of time students are available for intervention (Fisher et al., 1978; Gettinger & Ball, 2008; Skin-
ner et al., 1995). The purpose of academic intervention is to manipulate rates of growth (i.e., increase the steepness)
of learning over time via a modification of the instructional setting (Skinner et al., 1995). Presently, included studies
yielded concerning levels of discordance across growth metrics. For example, read naturally software edition shifted

from one of the highest to one of the lowest ranked interventions based on the type of growth examined for the stan-
dardized effect. At the same time, video modeling increased in rank when CIT was taken into account across both rate
metrics and unstandardized and standardized effects. This particular example serves as an excellent demonstration of
why acknowledging CIT is important. Given this intervention's brief CIT (eight sessions at 4 minutes each), the rate
of learning, 0.71 WCPM/15 minutes, was well beyond that of many other interventions even though, by nature of
the experimental design, its phase difference was average for the pool of studies. Simply acknowledging the number
of sessions—information easily acquired from any SC study—would better approximate learning rate in this case. In
the context of traditional ESs used in meta-analysis, this subtlety would likely be overlooked. The analysis conducted
presently acknowledges this impressive effect given the brevity of the intervention.
Correlational analysis also supports these conclusions. Although growth per session and growth per minute corre-
lated at a significant level, the relationships were far from perfect (i.e., 𝜌 = 1.00), which is in line with the findings of Nist
and Joseph (2008) and other similar studies, which found practical variation in growth rates based on whether growth
over sessions or CIT were examined. Of concern, the relationship between phase differences and growth by session
or per 15 minutes of instructional time was practically nonexistent. The one exception was the significant relationship
between the unstandardized phase and phase × session effect. However, even this correlation, 𝜌 = 0.33, was practically
low. If rate of growth per instructional minute is an ideal target of analysis (Skinner et al., 1995), the current analysis
demonstrated that traditional outcome metrics do not reflect that growth. Consumers of EBI should not make infer-
ences regarding intervention efficiency based on descriptions of phase differences. In the context of individual studies
or meta-analysis, it is important that rate-based metrics accompany more traditional analyses and that both be dis-
cussed. Accounting for session-by-session growth is a more precise estimate, quite achievable using the growth mod-
els described presently, and increasingly common (Moeyaert, Ferron, Beretvas, & Van den Noortgate, 2014; Moeyaert
et al., 2015; Shadish, Kyse, & Rindskopf, 2013); however, this metric does not fully overlap with CIT.

8.3 Limitations
The current study must be considered in light of a number of limitations. For one, although each article included reported session time to the minute, there are undoubtedly some errors in these reports. Given the importance of CIT
and the findings of the current study, we encourage researchers to report CIT and rate-based metrics with greater fre-
quency. Similarly, we reiterate that the current study's focus was on the nature of various outcome variables, not the
comparative quality of EBIs. Some interventions were represented by relatively few studies. We caution that readers
should not draw conclusions regarding EBI from these rankings.
Rate-based outcome metrics may be overly conservative in the event that a researcher purposefully extends an
intervention phase to solidify their demonstration of operant control of reading fluency behavior. A treatment effect
may asymptote, thereby resulting in a reduction in the rate of learning for every additional intervention session that
occurs. Relatedly, the use of rate-based metrics carries a strong assumption of linearity, that is, improvement in fluency
occurs identically for each additional minute or session of dosage. We therefore stress that rate-based metrics and
summaries of CIT accompany traditional estimates of effect. When this information is discrepant, it offers researchers an opportunity to explain why. Future research could also experiment with calculating CIT from the beginning
of the intervention phase to the point at which the interventionist identifies the student as achieving the instruc-
tional goal, with further observations explicitly identified as a demonstration of effect (Poncy et al., 2015). There is
some precedent for this but far more research into such analytic models is required (see Sullivan, Shadish, & Steiner,
2015).
Unlike prior meta-analyses of literacy interventions, we purposefully chose not to include any additional modera-
tors of effect. Given the multitude of interventions examined and moderators analyzed in prior studies with paradoxi-
cal findings, it is unlikely that the influence of the moderators could be parsed out from the effect of the interventions themselves, nor were such moderators of interest in addressing the research questions. Nonetheless, moderator
analysis may prove of value in future comparative studies examining rate-based metrics.

9 CONCLUSION

The purpose of the current study is to examine how traditional measures of effect correspond with the rate-based
metrics advocated for by Skinner and colleagues across different research designs. Converging with the findings of
Poncy et al. (2015), we found substantial shifts in rankings based on outcome metrics, in addition to wide ranges in CIT
across articles. We therefore advise those conducting comparative intervention synthesis to explicitly acknowledge
CIT either through parallel analysis with traditional estimators, which is a different approach from simple moderator analysis, or as a primary outcome variable. This is particularly relevant for studies focusing on skill acquisition;
slope, not level, differences are expected. The current study also demonstrates the flexibility of using growth models,
of which there is an increasingly wide assortment, to model SC intervention effects within or across studies.
Finally, we recommend practitioners locate information on the required CIT of interventions and, when possible,
compare rates of learning under the presence of intervention in addition to or instead of simple phase differences. This
is not to say that interventions that take less time to administer are inherently better. Furthermore, there are additional
factors associated with intervention success that practitioners may want to consider in selecting EBI, such as long-term
retention (Nist & Joseph, 2008; Zaslofsky, Scholin, Burns, & Varma, 2016) or ease of administration. Nonetheless, we
recommend that those comparing EBI treat rate of learning as a primary outcome variable, just as phase differences
have historically been reported and interpreted. Considering such information will result in the selection of interven-
tions that are most efficient in resolving instructional deficits, resulting in the optimal use of instructional resources in
the school.

ORCID
Benjamin G. Solomon http://orcid.org/0000-0002-8457-1112

REFERENCES
*Studies included in the analysis.
*Albers, C. A., & Hoffman, A. (2012). Using flashcard drill methods and self-graphing procedures to improve the reading perfor-
mance of English language learners. Journal of Applied School Psychology, 28, 367–388.
*Allen-DeBoer, R. A., Malmgren, K. W., & Glass, M. (2006). Reading instruction for youth with emotional and behavioral disor-
ders in a juvenile correctional facility. Behavioral Disorders, 32, 18–28.
*Barton-Ardwood, S. M., Wehby, J. H., & Falk, K. B. (2005). Reading instruction for elementary-age students with emotional and
behavioral disorders: Academic and behavioral outcomes. Exceptional Children, 72, 7–27.
*Bennett, J. G., Gardner, R., III, Cartledge, G., Ramnath, R., & Council, M. R., III. (2017). Second-grade urban learners: Preliminary
findings for a computer-assisted, culturally relevant, repeated reading intervention. Education and Treatment of Children, 40,
145–186.
Berkeley, S., Scruggs, T. E., & Mastropieri, M. A. (2010). Reading comprehension instruction for students with learning disabili-
ties, 1995–2006: A meta-analysis. Remedial and Special Education, 31, 423–436.
Bramlett, R., Cates, G. L., Savina, E., & Lauinger, B. (2010). Assessing effectiveness and efficiency of academic interventions in
school psychology journals: 1995–2005. Psychology in the Schools, 47, 114–125.
*Bray, M. A., Kehle, T. J., Spackman, V. S., & Hintze, J. M. (1998). An intervention program to increase reading fluency. Special
Services in the Schools, 14, 105–125.
Cameron, C., Fireman, B., Hutton, B., Clifford, T., Coyle, D., Wells, G., … Toh, S. (2015). Network meta-analysis incorporating
randomized controlled trials and non-randomized comparative cohort studies for assessing the safety and effectiveness of
medical treatments: Challenges and opportunities. Systematic Reviews, 4, 147–155.
Cates, G. L., Skinner, C. H., Steuart Watson, T., Meadows, T. J., Weaver, A., & Jackson, B. (2003). Instructional effectiveness and
instructional efficiency considerations for data-based decision making: An evaluation of interspersing procedures. School
Psychology Review, 23, 601–616.
*De la Colina, M. G., Parker, R. I., Hasbrouck, J. E., & Lara-Alecio, R. (2001). Intensive intervention in reading fluency for at-risk
beginning Spanish readers. Bilingual Research Journal, 25, 503–538.

Fisher, C., Filby, N., Marliave, R., Cahen, L., Dishaw, M., Moore, J., & Berliner, D. (1978). Teaching behaviors, academic learning
time, and student achievement: Final report of phase III-B of the Beginning Teacher Evaluation Study. San Francisco, CA: Far West
Laboratory for Educational Research and Development.
Flower, A., McKenna, W. J., & Upreti, G. (2016). Validity and reliability of GraphClick and DataThief III for data extraction.
Behavior Modification, 40, 396–413.
Flynn, L. J., Zheng, X., & Swanson, H. L. (2012). Instructing struggling older readers: A selective meta-analysis of intervention
research. Learning Disabilities Research & Practice, 27, 21–32.
Fuchs, D., Fuchs, L. S., Thompson, A., Svenson, E., Yen, L., Al Otaiba, S., … Saenz, L. (2001). Peer-assisted learning strategies in reading: Extensions for first-grade, kindergarten, and high school. Remedial and Special Education, 22, 15–21.
Gettinger, M., & Ball, C. (2008). Best practices in increasing academic engaged time. In A. Thomas, & J. Grimes (Eds.), Best prac-
tices in school psychology V (pp. 1043–1058). Bethesda, MD: The National Association of School Psychologists.
GraphClick [Computer software]. (2012). Retrieved from http://www.arizona-software.ch/graphclick
*Harris, P. J., Oakes, W. P., Lane, K. L., & Rutherford, R. B., Jr. (2009). Improving the early literacy skills of students at risk for
internalizing or externalizing behaviors with limited reading skills. Behavioral Disorders, 34, 72–90.
Hedges, L. V. (1981). Distribution theory for Glass's estimator of effect size and related estimators. Journal of Educational Statis-
tics, 6, 107–128.
Higgins, J. P. T., Thompson, S. G., & Spiegelhalter, D. J. (2009). A re-evaluation of random-effects meta-analysis. Journal of the Royal Statistical Society. Series A, 172, 137–159.
*Hofstadter-Duke, K. L., & Daly, E. J., III. (2011). Improving oral reading fluency with a peer-mediated intervention. Journal of
Applied Behavior Analysis, 44, 641–646.
*Hua, Y., Hendrickson, J. M., & Therrien, W. (2012). Effects of combined reading and question generation on reading fluency
and comprehension of three young adults with autism and intellectual disability. Focus on Autism and Other Developmental
Disabilities, 27, 135–146.
Hunter, J. E., & Schmidt, F. L. (2004). Methods of meta-analysis: Correcting error and bias in research synthesis (2nd ed.). Thousand
Oaks, CA: Sage.
Individuals with Disabilities Education Act, 20 U.S.C. § 1400 (2004).
Joseph, L. M., & Nist, L. M. (2006). Comparing the effects of unknown-known ratios on word reading learning versus learning
rates. Journal of Behavioral Education, 15, 69–79.
*Keyes, S. E., Cartledge, G., Gibson, L., Jr., & Robinson-Ervin, P. (2016). Programming for generalization of oral reading fluency
using computer-assisted instruction and changing fluency criteria. Education and Treatment of Children, 39, 141–172.
*Klingbeil, D. A., Moeyaert, M., Archer, C. Y., Chimboza, T. M., & Zwolski, S. A., Jr. (2017). Efficacy of peer-mediated incremental
rehearsal for English language learners. School Psychology Review, 46, 122–140.
Kratochwill, T. R., Hitchcock, J., Horner, R. H., Levin, J. R., Odom, S. L., Rindskopf, D. M., & Shadish, W. R. (2010). Single-case
designs technical documentation. Retrieved from http://ies.ed.gov/ncee/wwc/pdf/wwc_scd.pdf
*Lo, Y., Cooke, N. L., & Starling, A. L. P. (2011). Using a repeated reading program to improve generalization of oral reading
fluency. Education and Treatment of Children, 34, 115–140.
Moeyaert, M., Ferron, J. M., Beretvas, N., & Van den Noortgate, W. (2014). From a single-level analysis to a multilevel analysis
of single-case experimental designs. Journal of School Psychology, 52, 191–211.
Moeyaert, M., Ugille, M., Ferron, J. M., Onghena, P., Heyvaert, M., Beretvas, S. N., & Van den Noortgate, W. (2015). Estimating
intervention effects across different types of single-subject experimental designs: Empirical illustration. School Psychology
Quarterly, 30, 50–63.
*Musti-Rao, S., Hawkins, R. O., & Barkley, E. A. (2009). Effects of repeated readings on the oral reading fluency of urban fourth-
grade students: Implications for practice. Preventing School Failure, 54, 12–23.
*Neddenriep, C. E., Fritz, A. M., & Carrier, M. E. (2010). Assessing for generalized improvements in reading comprehension by
intervening to improve reading fluency. Psychology in the Schools, 48, 14–27.
Nist, L., & Joseph, L. M. (2008). Effectiveness and efficiency of flashcard drill instructional methods on urban first-graders’ word
recognition, acquisition, maintenance, and generalization. School Psychology Review, 37, 294–308.
No Child Left Behind Act of 2001, 20 U.S.C. § 6319 (2002).
*Oddo, M., Barnett, D. W., Hawkins, R. O., & Musti-Rao, S. (2010). Reciprocal peer tutoring and repeated reading: Increasing
practicality using student groups. Psychology in the Schools, 47, 842–858.
*O'Shaughnessy, T. E., & Swanson, H. L. (2000). A comparison of two reading interventions for children with reading disabilities.
Journal of Learning Disabilities, 33, 257–277.

Parker, R. I., Vannest, K. J., Davis, J. L., & Sauber, S. B. (2011). Combining nonoverlap and trend for single-case research: Tau-U.
Behavior Therapy, 42, 284–299.
Poncy, B. C., Solomon, B., Duhon, G., Skinner, C., Moore, K., & Simons, S. (2015). An analysis of learning rate and curricular scope:
Caution when choosing academic interventions based on aggregated outcomes. School Psychology Review, 44, 289–305.
Rakap, S., Rakap, S., Evran, D., & Cig, O. (2016). Comparative evaluation of the reliability and validity of three data extraction programs: UnGraph, GraphClick, and DigitizeIt. Computers in Human Behavior, 55, 159–166.
*Ross, S. G., & Begeny, J. C. (2015). An examination of treatment intensity with an oral reading fluency intervention: Do intervention duration and student–teacher instructional ratios impact intervention effectiveness? Journal of Behavioral Education, 24, 11–32.
Scammacca, N. K., Roberts, G., Vaughn, S., & Stuebing, K. K. (2015). A meta-analysis of interventions for struggling readers in
grades 4–12: 1980–2011. Journal of Learning Disabilities, 48, 369–390.
Shadish, W. R., Cook, T. D., & Campbell, D. T. (2001). Experimental and quasi-experimental designs for generalized causal inference.
Belmont, CA: Wadsworth.
Shadish, W. R., Kyse, E. N., & Rindskopf, D. M. (2013). Analyzing data from single-case designs using multilevel models: New
applications and some agenda items for future research. Psychological Methods, 18, 385–405.
Shadish, W. R., Rindskopf, D. M., & Hedges, L. V. (2008). The state of the science in the meta-analysis of single-case experimental
designs. Evidence-Based Communication Assessment and Intervention, 2, 188–196.
Skinner, C. H. (2008). Theoretical and applied implications of precisely measuring learning rates. School Psychology Review, 37,
309–314.
Skinner, C. H., Belfiore, P. J., & Watson, T. S. (1995). Assessing the relative effects of interventions in students with mild disabil-
ities: Assessing instructional time. Journal of Psychoeducational Assessment, 20, 346–357.
Skinner, C. H., Fletcher, P. A., & Henington, C. (1996). Increasing learning rates by increasing student response rates: A summary
of research. School Psychology Quarterly, 11, 313–325.
Suggate, S. P. (2016). A meta-analysis of the long-term effects of phonemic awareness, phonics, fluency, and reading compre-
hension interventions. Journal of Learning Disabilities, 49, 77–96.
Sullivan, K. J., Shadish, W. R., & Steiner, P. M. (2015). An introduction to modeling longitudinal data with generalized additive
models: Applications to single-case designs. Psychological Methods, 20, 26–42.
Van den Noortgate, W., & Onghena, P. (2008). A multilevel meta-analysis of single-subject experimental design studies. Evidence
Based Communication Assessment and Intervention, 2, 142–151.
Volpe, R. J., Mulé, C. M., Briesch, A. M., Joseph, L. M., & Burns, M. K. (2011). A comparison of two flashcard drill methods target-
ing word recognition. Journal of Behavioral Education, 20, 117–137.
*Walcott, C. M., Marett, K., & Hessel, A. B. (2014). Effectiveness of a computer-assisted intervention for young children with
attention and reading problems. Journal of Applied School Psychology, 30, 83–106.
Wolery, M., Busick, M., Reichow, B., & Barton, E. E. (2010). Comparison of overlap methods for quantitatively synthesizing
single-subject data. The Journal of Special Education, 44, 18–28.
Yaw, J., Skinner, C. H., Maurer, K., Skinner, A. L., Cihak, D., Wilhoit, B., … Booher, J. (2014). Measurement scale influences in the
evaluation of sight word reading interventions. Journal of Applied Behavior Analysis, 47, 360–379.
Zaslofsky, A. F., Scholin, S. E., Burns, M. K., & Varma, S. (2016). Comparison of opportunities to respond and generation effect as
potential causal mechanisms for incremental rehearsal with multiplication combinations. Journal of School Psychology, 55,
71–78.

How to cite this article: Solomon BG, Poncy BC, Caravello D, Schweiger EM. Examining Learning Rates
in the Evaluation of Academic Interventions that Target Reading Fluency. Psychol Schs. 2018;55:151–164.
https://doi.org/10.1002/pits.22100
