
Big Data, Data Mining, and Machine Learning: Value Creation for Business Leaders and Practitioners. Jared Dean.
© 2014 SAS Institute Inc. Published 2014 by John Wiley & Sons, Inc.

CHAPTER 15
Case Study of Mobile Application Recommendations

With the rapid growth of smartphones over the past five years, a new market for smartphone applications has emerged, and with it stiff competition for mind share. All vendors in this space want the ability to recommend applications (apps) that users will like. The goal is to ensure that recommendations are not seen as spam (unwanted solicitations) but as great advice, moving the vendor from simply suggesting content to the highly prized role of trusted advisor. The business problem in this case is very straightforward: Given the apps users have on their mobile phones, what apps are they likely to use? This problem poses several challenges, the first being the size of the data. With hundreds of millions of cell phone users, each with a nearly unique set of app purchases, finding good recommendations is difficult. The other issue is the level of detail about the apps. For this business challenge, the data was a set of binary variables, and the binary state could be defined in a number of different ways: Did they have the application installed? Had they ever used the application? Had the application been used in the last time period?

Table 15.1 Sparse Customer and Item Data Sheet

Customer          Item1   Item2   Item3   Item4 . . .   ItemN
Customer1           1       1
Customer2                   1
Customer3                           1
Customer4 . . .                             1
CustomerN                                                 1

Regardless of how you frame the problem, the output is a probability of purchasing the app (the purchase price may be free) based on a set of binary input variables.
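To make the data layout concrete, here is a minimal sketch (the customer and app names are hypothetical, not from the case study) of how such sparse binary inputs are often stored: only the 1s are kept, since a dense customer-by-app table would be almost entirely empty.

```python
# Sparse representation: store only the observed 1s, never the empty cells.
# A dense table of N customers x M apps would be mostly missing values;
# a dict of sets costs memory only for (customer, app) pairs that exist.
installs = {
    "customer1": {"app_maps", "app_mail"},
    "customer2": {"app_mail"},
    "customer3": {"app_games"},
}

def has_app(customer, app):
    """Binary input variable: 1 if the customer has the app, 0 otherwise."""
    return 1 if app in installs.get(customer, set()) else 0
```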
This problem is divided into several stages, each requiring different skill sets. The first stage is data collection and aggregation. Traditional predictive modeling through data mining generally requires that input data be rectangular, and missing values pose severe problems for some algorithms. In this case, taking every app that could be recommended or purchased as a variable and every customer as a row (see Table 15.1) creates a very sparse data set. (There are lots of missing values.) Normally imputation would be an approach to deal with these missing values, but because the data is so sparse and the variables are binary, imputation is very difficult without creating bias and skewing results. This high level of missing values makes neural networks and logistic regression poor techniques for this type of problem. Decision trees handle missing values better, but they tend to be unstable. The techniques of factorization machines and stochastic gradient descent generally behave better for these types of problems.
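To illustrate why factorization machines suit this setting, the following sketch (hypothetical sizes and data, not the case study's actual model) scores a set of active binary inputs with a second-order factorization machine and fits it by stochastic gradient descent. Pairwise interactions are learned through low-rank latent vectors, so two apps rarely seen together can still interact through shared factors:

```python
import math
import random

random.seed(0)

K = 4        # latent factor dimension (illustrative choice)
N_APPS = 6   # number of binary input variables (apps)

w0 = 0.0
w = [0.0] * N_APPS
V = [[random.gauss(0, 0.01) for _ in range(K)] for _ in range(N_APPS)]

def fm_predict(active):
    """Factorization-machine score for a set of active (value-1) binary inputs.
    Pairwise interactions use the O(K * |active|) identity:
    sum_{i<j} <v_i, v_j> = 0.5 * sum_f ((sum_i v_if)^2 - sum_i v_if^2)."""
    linear = w0 + sum(w[i] for i in active)
    pair = 0.0
    for f in range(K):
        s = sum(V[i][f] for i in active)
        s2 = sum(V[i][f] ** 2 for i in active)
        pair += 0.5 * (s * s - s2)
    return 1.0 / (1.0 + math.exp(-(linear + pair)))  # probability of use

def sgd_step(active, y, lr=0.1):
    """One stochastic gradient descent update toward target y (0 or 1)."""
    global w0
    err = fm_predict(active) - y  # gradient of log loss wrt the raw score
    w0 -= lr * err
    for i in active:
        w[i] -= lr * err
    for f in range(K):
        s = sum(V[i][f] for i in active)
        grads = [(i, err * (s - V[i][f])) for i in active]
        for i, g in grads:
            V[i][f] -= lr * g
```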
The second problem is data size. Because making recommendations while considering only a subset of the data is suboptimal, the quality of the recommendations is called into question. Therefore, sufficient computing hardware, especially main memory, must be allocated for the problem size, along with software that is designed to perform in a parallel and distributed environment. The parallel nature allows all the CPU cores on a machine to be utilized, and the distribution property allows multiple machines to work together on the problem. This distributed nature also allows for expansion of your hardware as the problem size increases; this often happens after a successful implementation, when new, larger problems emerge that, if they can be solved, will provide additional business value and are therefore very desirable.
The parallel and distributed design of the software also allows for a critical component: speed. It is not useful to solve a problem in a day if the answer is needed in a few seconds (what most people are willing to wait) and the answer cannot be precomputed. This problem is often overcome by doing offline training. The recommendation model is trained and refined using historical data and then deployed so that new events (a user coming to an app store, for example) are scored and apps are recommended. This pattern of offline training and model scoring solves the problem of quick recommendation but introduces a concern about the useful life of a model, or model decay. Having the ability to quickly train, deploy, and retrain is the ideal situation for recommendations that need to be near real time or faster. Alternatively, a service-based architecture could be used, where the model is trained and then held in memory. When a recommendation is needed, an application programming interface (API) is called with the new record, and a recommendation is made using a content-based filtering or collaborative filtering method. These methods produce a small subset of the entire collection of apps available in the virtual store based on criteria and then update the overall table so that a future recommendation request will use all available information.
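A toy sketch of the collaborative filtering idea (the users and app names below are invented for illustration): score each app the user lacks by how often it co-occurs with apps the user already has, then return a small subset as candidate recommendations.

```python
from collections import defaultdict

# Hypothetical install history: user -> set of installed apps.
installs = {
    "u1": {"maps", "mail", "music"},
    "u2": {"maps", "mail"},
    "u3": {"mail", "music"},
    "u4": {"maps", "photos"},
}

def recommend(user_apps, top_n=3):
    """Item-based collaborative filtering: rank apps the user does not have
    by how strongly they co-occur with apps the user already has."""
    scores = defaultdict(int)
    for other in installs.values():
        overlap = other & user_apps
        if overlap:                       # this user shares taste with us
            for app in other - user_apps:
                scores[app] += len(overlap)
    return sorted(scores, key=scores.get, reverse=True)[:top_n]
```

A user holding only "maps" would be steered first toward "mail", the app most often installed alongside it in this toy history.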
For this case study there were about 200 apps to be considered. The first step was to look for clusters that would help reduce the problem space. This clustering reduced the number of input variables from 200 down to a few dozen. After the clustering was completed, the cluster variables were added to the list of potential input variables. Other techniques, including singular value decomposition and principal component analysis, were used to find relationships between apps. After the data had been enriched with these additional features, variable selection techniques, both supervised and unsupervised, were used to eliminate variables that did not provide useful information.
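As a sketch of how singular value decomposition condenses related apps into a derived feature (the tiny binary matrix below is invented, not the case study's data), power iteration on X-transpose-X recovers the leading right singular vector; projecting each customer row onto it yields a single feature summarizing the dominant co-installation pattern:

```python
# Hypothetical 5-customer x 4-app binary matrix (1 = app installed).
X = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]

def leading_component(X, iters=100):
    """Power iteration on X^T X: returns the first right singular vector,
    the direction of greatest shared variation among the app columns."""
    m = len(X[0])
    v = [1.0] * m
    for _ in range(iters):
        u = [sum(row[j] * v[j] for j in range(m)) for row in X]       # u = X v
        v = [sum(X[i][j] * u[i] for i in range(len(X))) for j in range(m)]
        norm = sum(x * x for x in v) ** 0.5
        v = [x / norm for x in v]                                     # renormalize
    return v

def project(row, v):
    """Collapse a customer's raw binary inputs into one derived feature."""
    return sum(a * b for a, b in zip(row, v))
```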
This was followed by multiple iterations of different tree models. The trees were boosted and ensembles were created in a variety of ways to produce powerful and stable models. The models were evaluated against a holdout sample that had been partitioned earlier, not randomly as is typical, but by a time window. Random sampling would have biased the recommendations because users would appear in both the training and validation data; with a time partition, the model could be tested under real-world conditions, because once the model is developed and deployed, it must be updated with an active learning paradigm or retrained before it decays excessively.
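A minimal sketch of the time-window partition described above (the dates and events are hypothetical): everything before a cutoff date trains the model and everything on or after it is held out, so the same user may appear on both sides but no future event leaks into training.

```python
from datetime import date

# Hypothetical event log: (event_date, user, app_used).
events = [
    (date(2013, 1, 5), "u1", "maps"),
    (date(2013, 2, 10), "u2", "mail"),
    (date(2013, 3, 1), "u1", "music"),
    (date(2013, 3, 15), "u3", "maps"),
]

def time_partition(events, cutoff):
    """Partition by a time window rather than randomly: events before the
    cutoff train the model; events on or after it validate it."""
    train = [e for e in events if e[0] < cutoff]
    holdout = [e for e in events if e[0] >= cutoff]
    return train, holdout
```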
