
Big Data, Data Mining, and Machine Learning: Value Creation for Business
Leaders and Practitioners. Jared Dean.
© 2014 SAS Institute Inc. Published 2014 by John Wiley & Sons, Inc.

CHAPTER 7
Incremental Response Modeling

A standard part of marketing campaigns in many industries is to offer
coupons to encourage adoption of the new goods or service. This
enticement is often essential for success because it provides some
additional incentive to persuade customers to switch products if they are
currently satisfied with the goods or service in question. For example,
if I am a longtime buyer of laundry detergent XYZ, to get me to try a
competitor's laundry product, detergent ABC, I will need some incentive
great enough to get me to move outside my comfort zone of buying
detergent XYZ on my trips to the store. This incentive or inducement
could be superior performance, but I will never know that the perfor-
mance is superior unless I try the product. Another inducement could
be value. Detergent ABC cleans just as well as detergent XYZ, and I get
a larger quantity of detergent ABC for the same price as detergent XYZ.
This strategy, like superior quality, is also predicated on my trying the
product. All strategies that will successfully convert me from detergent XYZ
to detergent ABC require me to try the product, and the most common
way to do that is to give me a coupon. Coupons are very common for
most household products and packaged foods. These coupons can, in
some cases, be successful in changing buying behavior from one product
to another product. Now consider this alternative case: I have been buy-
ing detergent XYZ regularly, but I am not very happy with it and I have
decided to switch to detergent ABC because of a new ad I saw while
watching TV. Now when I receive a coupon for detergent ABC that I had
planned to buy anyway, I purchase my detergent ABC, but the manufacturer
earns less profit because I purchased a product I had already
planned to buy at a lower price than I was already willing to pay.
This is what incremental response modeling methods do: deter-
mine those people who purchased the product because of an induce-
ment (coupon) in order to maximize the profit for the organization.

BUILDING THE RESPONSE MODEL

The people reached by a marketing campaign will fall into one of three groups:
people who will not purchase the product regardless of the incentive, those
who switched products because of the campaign, and those who were going
to switch already and can now purchase the product at a reduced price.
Of these three groups, we would like to target with our marketing only
those who switched just because of the coupon. There is a fourth category
that I will discuss here and not refer to again. This is the group referred
to as sleeping dogs; they are mostly encountered in political campaigns,
not marketing campaigns. Sleeping dogs are those people who purchase
your product or subscribe to your political view but who, when included in
a campaign, respond negatively and leave your brand.
Public radio is another example to demonstrate incremental response.
Public radio, supported by listeners, has fund drives several
times throughout the year to raise money to support the programming.
Many listeners donate as soon as they hear the fund drive begin
because they want to support the programming and feel a duty to do
so. Another group probably would not donate to the station were it not
for the appeal of a coffee mug, tote, video collection, or some other
gift offered as a reward for their generosity, which sways them to pick up
the phone or go to the website and contribute. The problem for public
broadcasting is that it cannot discern between the two groups. If it could
discern between the groups, it could save the costs associated with
those giveaway items and therefore reduce overall cost.

Figure 7.1 Control versus Treatment (two bars, Control and Treatment, each split into Purchase = Yes and Purchase = No; the additional Purchase = Yes portion of the Treatment bar is labeled Incremental Response)

The method behind incremental response modeling is this: Begin
with a treatment group and a control group. These groups should be
formed following a methodology from the clinical trial literature. In
this example, the treatment is a coupon. Note that you can use multiple
treatments, but here we will discuss only the binary case.
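
As a minimal sketch of forming the two groups, the snippet below assumes that the clinical-trial methodology meant here is simple random assignment; the chapter does not spell this out. The function name, the 50/50 split, and the customer IDs are all hypothetical.

```python
import random

def assign_groups(customer_ids, treatment_fraction=0.5, seed=0):
    """Randomly split customers into (treatment, control) lists.

    Assumption: plain random assignment stands in for whatever
    clinical-trial design is actually used.
    """
    shuffled = list(customer_ids)
    # A seeded generator keeps the split reproducible.
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * treatment_fraction)
    return shuffled[:cut], shuffled[cut:]

# Hypothetical customer IDs 1..10: half receive the coupon, half do not.
treatment, control = assign_groups(range(1, 11))
print("coupon:", sorted(treatment))
print("no coupon:", sorted(control))
```

Every customer lands in exactly one group, so any later difference in purchase rates can be attributed to the coupon rather than to how the groups were chosen.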
Once you have divided your sample into the two assigned groups,
administered either the treatment or the control, and then gathered the
results, you can begin to apply this methodology. Figure 7.1 shows the
difference between the control group and treatment group. The treat-
ment group received the coupon for detergent ABC while the control
group received no coupon. You can see that the coupon was effective in
increasing sales (based on the size of the box labeled Purchase=Yes),
but the top section in the treatment group represents all people who
purchased the detergent. The ones in the box labeled Incremental Re-
sponse purchased because of the coupon and the rest would have pur-
chased regardless. Therefore, the treatment group did not maximize
profit because the detergent was sold at a lower price than could have
otherwise been demanded. It is rare but possible that in some cases the
treatment group, those who received the coupon, could actually generate
less profit than the control group, for which nothing was done.

MEASURING THE INCREMENTAL RESPONSE

A traditional approach to assessing the incremental response is to use a
differencing technique based on two predictive models. Take the likelihood
to purchase from the predictive model built for the treatment group:

$$P_t(y = 1 \mid x)$$

The likelihood to purchase for the control group is similar:

$$P_c(y = 1 \mid x)$$

The incremental response likelihood can be calculated as the difference:

$$P_D = P_t(y = 1 \mid x) - P_c(y = 1 \mid x)$$

Then sort the resulting $P_D$ from largest to smallest; the observations
in the top deciles are taken to be the incremental responders. Any
predictive model can be employed in the differencing technique, giving, for
example, a regression-based differencing model or a tree-based differencing model.
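
The differencing technique can be sketched in a few lines. This is only an illustration: in place of a regression or tree model it uses a deliberately simple stand-in, the purchase rate within each customer segment, as the estimate of the purchase likelihood. The segment names and the toy data are hypothetical.

```python
from collections import defaultdict

def purchase_rate_by_segment(rows):
    """Estimate P(y = 1 | x) as the purchase rate within each segment x.

    Stand-in for a real predictive model (regression, tree, ...).
    rows: iterable of (segment, purchased) with purchased in {0, 1}.
    """
    bought = defaultdict(int)
    total = defaultdict(int)
    for segment, purchased in rows:
        total[segment] += 1
        bought[segment] += purchased
    return {seg: bought[seg] / total[seg] for seg in total}

def incremental_scores(treatment_rows, control_rows):
    """Return (segment, P_t - P_c) pairs sorted from largest to smallest."""
    p_t = purchase_rate_by_segment(treatment_rows)
    p_c = purchase_rate_by_segment(control_rows)
    shared = p_t.keys() & p_c.keys()  # segments seen in both groups
    scores = {seg: p_t[seg] - p_c[seg] for seg in shared}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical toy data: 10 customers per segment in each group.
treatment = ([("loyal", 1)] * 8 + [("loyal", 0)] * 2
             + [("switcher", 1)] * 6 + [("switcher", 0)] * 4
             + [("never", 0)] * 10)
control = ([("loyal", 1)] * 8 + [("loyal", 0)] * 2
           + [("switcher", 1)] * 1 + [("switcher", 0)] * 9
           + [("never", 0)] * 10)

ranked = incremental_scores(treatment, control)
for segment, score in ranked:
    print(f"{segment}: {score:+.2f}")
```

In this toy data only the "switcher" segment shows a positive difference, so it would be ranked first and treated as the group of incremental responders.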
An improved method is to look only at the control group, the peo-
ple who did not get the coupon, and classify each person as an outlier
or not. Several techniques can be used for classifying outliers when
you have only one group. A method that has good results classifying
outliers is one-class support vector machines (SVMs).
Recently a new method was suggested that uses an outlier detection
technique, particularly the one-class SVM. The suggested method uses
the control group data to train the model and uses the treatment group
as a validation set. The detected outliers are considered as incremental
responses. This new method shows much better results than the dif-
ferencing technique. The technique is illustrated with plots below, but
more details can be found in the paper by Lee listed in the references.
In Figure 7.2, we see a graphical representation of the points from
the control group that have been identified as outliers. The dots closer
to the origin than the dashed line are classified as part of the negative
class, and other dots up and to the right of the dashed line are classi-
fied to the positive class. The points in the negative class are considered
outliers (those between the origin and the dashed line). They receive
this designation because there are particular features, or a combination
of many features, that identify them as different from the overall group.

Figure 7.2 Outlier Identification of Control Group

Figure 7.3 Separation of Responders and Nonresponders in the Control Group

One reason to identify some of the points in the control group
as outliers is to narrow the region that the control group identifies
so that we can better determine which observations are incremental
responders when we apply the model to the treatment group. To apply
this to our example, we would look at the people in the control group
who purchased detergent ABC and then, using a one-class SVM model,
identify some of those as outliers. This would leave us with a set of
ranges for each attribute of our customers that we can use to identify
them as people who would purchase detergent ABC without a coupon, as
shown in Figure 7.3.
The region of the responders for the control group is shown by
the larger circle in Figure 7.3. This region can then be projected to
the treatment group. Applying this region to the treatment group
will identify the incremental responders, those individuals who
purchased detergent ABC because of the coupon. This is illustrated
graphically in Figure 7.4. This is only a representation in two
dimensions. In practice, this would be in dozens, hundreds, or even
thousands of dimensions.
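
The projection idea can be sketched as follows. This is not a one-class SVM: to stay self-contained it swaps in a much cruder outlier rule, a circle around the centroid of the control-group buyers that keeps a fixed fraction of them. The function names, the `keep_fraction` parameter, and the two-dimensional toy points are hypothetical.

```python
import math

def fit_region(control_points, keep_fraction=0.9):
    """Fit a circular region around the control-group buyers.

    Stand-in for a one-class SVM: the farthest (1 - keep_fraction)
    of control points are treated as outliers and left outside.
    Returns (center, radius).
    """
    n = len(control_points)
    dims = len(control_points[0])
    center = tuple(sum(p[d] for p in control_points) / n for d in range(dims))
    distances = sorted(math.dist(p, center) for p in control_points)
    cutoff_index = max(0, math.ceil(keep_fraction * n) - 1)
    return center, distances[cutoff_index]

def potential_incremental(treatment_points, center, radius):
    """Treatment-group buyers that fall outside the control region."""
    return [p for p in treatment_points if math.dist(p, center) > radius]

# Hypothetical buyers described by two attributes (x1, x2).
control_buyers = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.2), (0.9, 0.8),
                  (1.0, 1.1), (1.1, 0.9), (0.9, 1.0), (1.0, 0.9), (4.0, 4.0)]
treatment_buyers = [(1.0, 1.0), (1.1, 1.0), (3.0, 0.5), (0.2, 3.5)]

center, radius = fit_region(control_buyers)
flagged = potential_incremental(treatment_buyers, center, radius)
print(flagged)
```

Here the control buyer at (4.0, 4.0) is left outside the region as an outlier, and the two treatment buyers far from the region are flagged as potential incremental responders. A real one-class SVM would learn a more flexible boundary than a circle.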
Figure 7.4 shows the projection of the control group (those who
bought detergent ABC without a coupon), after the outliers were
removed using a one-class SVM model, onto the treatment group (those
who bought detergent ABC after being sent a coupon). To interpret this
plot, examine the different parts. The first group to identify is the
part of the treatment group that falls inside the oval; these are
responders who were unaffected by the coupon. This means that for those
people inside the upper oval, the coupon did not influence their
purchasing choice (they buy detergent ABC regardless of the promotion).
The data points outside of the oval in the treatment response group
are the potential incremental responders. Those are the people who
purchased because of the treatment; in this specific example, the
coupon for detergent ABC. I used the word potential above because
there is no definitive way in real life to objectively measure
those people who responded only as a result of the treatment. This
can be tested empirically using simulation, and that work has
illustrated the effectiveness of this method. Figure 7.5 is an example
of a simulation study.

Figure 7.4 Projection of Control Group to Treatment Group (panels: Treatment, Control)

Figure 7.5 shows 1,300 responders in the treatment group. This
includes 300 true incremental responders. The method described above
identified 296 observations as incremental responders, and 280 of
those identified were true positives. This is all the more impressive
because, as you can see, there is no simple way to use straight lines
and separate the gray true incremental responders from the black
nonincremental responders. This leaves 20 true responders who were
not identified and 16 who were incorrectly identified. See Table 7.1 for
a tabular view.

Figure 7.5 Simulation of 1,300 Responders to Coupon Offer (scatter plot of x2 against x1; legend: Respondents, Nonresponders)

Table 7.1 Classification of Incremental Responders

                 Correctly Classified   Falsely Classified
Responders               280                    20
Nonresponders            986                    16
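
The counts quoted for the simulation can be turned into standard classification rates with a short calculation. The sketch below uses only the numbers stated in the text (1,300 responders, 300 true incremental responders, 296 identified, 280 true positives). The variable names are hypothetical, and the choice of which rate to call the "error rate" is an assumption rather than something the chapter defines.

```python
total_responders = 1300   # all responders in the treatment group
true_incremental = 300    # true incremental responders
flagged = 296             # observations identified as incremental
true_positives = 280      # identified and truly incremental

false_positives = flagged - true_positives            # incorrectly identified
false_negatives = true_incremental - true_positives   # missed responders
# Derived from the text's counts; Table 7.1 prints 986 for this cell.
true_negatives = (total_responders - true_incremental) - false_positives

precision = true_positives / flagged
recall = true_positives / true_incremental
false_discovery_rate = false_positives / flagged
misclassification_rate = (false_positives + false_negatives) / total_responders

print(f"precision:              {precision:.3f}")
print(f"recall:                 {recall:.3f}")
print(f"false discovery rate:   {false_discovery_rate:.3f}")
print(f"misclassification rate: {misclassification_rate:.3f}")
```

The 5.4% error rate quoted for this simulation is consistent with the false discovery rate, that is, 16 wrong out of the 296 identified. Counting both kinds of error against all 1,300 responders gives a lower figure.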
This simulation yields a 5.4% error rate, which is a significant
improvement over the differencing method explained at the beginning
of the chapter. Incremental response modeling holds much promise
in the areas of targeted advertising and microsegmentation. The ability
to select only those people who will respond only when they receive
the treatment is very powerful and can contribute significantly
to increased revenue. Consider that the typical coupon sells the goods or
service at 90% of the regular price (a 10% discount). Every correctly
identified true incremental responder will raise revenue by 0.9, and every
correctly identified nonincremental responder (those who are not
influenced by the treatment either to purchase or not) will raise revenue
by 0.1 because those items will not be sold at a discount needlessly.
Then account for the nominal cost of the treatment: ad campaign, postage,
printing costs, channel management. We have the following revenue
adjustments:

$$\text{Incremental Revenue} = 0.9r + 0.1n - \text{campaign costs}$$

where
r = incremental responders who will purchase the product if they
receive the coupon but otherwise will not
n = nonincremental responders whose decision is not influenced by
the coupon, so withholding it avoids a needless discount
By taking the simulation example but increasing the error rate to
nearly double, at 10%, you can see the advantage of using incremental
response:

$$\text{Incremental response revenue} = 0.9(270) + 0.1(900) - \text{campaign costs} = 333 \text{ units} - \text{campaign costs}$$

compared to:

$$\text{Control only} = 0.9(0) + 0.1(1000) - \text{campaign costs} = 100 \text{ units} - \text{campaign costs}$$

$$\text{Treatment only} = 0.9(300) + 0.1(0) - \text{campaign costs} = 270 \text{ units} - \text{campaign costs}$$

Considering the campaign costs of each scenario, the treatment will have
the largest costs because a coupon is being sent to each of the 1,300
people. This will be followed by the incremental response group, where
coupons are sent only to those predicted to respond; and finally the
control group, where there are no campaign costs because of the lack of
campaign.
Treatment campaign costs > Incremental response campaign costs > Control campaign costs = 0
When the campaign costs are added to the calculations, the
incremental response is an even better option than either the treatment
or the control group. The increasing amount of information that is
made available will, over time, reduce the error rates, yielding even
larger revenues for those organizations that leverage this powerful
technique.
