MARSv 2

MARS
User Guide
Salford Systems, 2001

Copyright
Copyright 2001, Salford Systems; all rights reserved worldwide. No part of this publication
may be reproduced, transmitted, transcribed, stored in a retrieval system, or
translated into any language or computer language, in any form or by any means,
electronic, mechanical, magnetic, optical, chemical, manual or otherwise without the
express written permission of Salford Systems.
Limited Warranty
Salford Systems warrants for a period of ninety (90) days from the date of delivery that,
under normal use, and without unauthorized modification, the program substantially
conforms to the accompanying specifications and any Salford Systems authorized
advertising material; that, under normal use, the magnetic media upon which this
program is recorded will not be defective; and that the user documentation is substantially
complete and contains the information Salford Systems deems necessary to use
the program.
If, during the ninety (90) day period, a demonstrable defect in the program's magnetic
media or documentation should appear, you may return the software to Salford Systems
for repair or replacement, at Salford Systems' option. If Salford Systems cannot
repair the defect or replace the software with functionally equivalent software within
sixty (60) days of Salford Systems' receipt of the defective software, then you shall be
entitled to a full refund of the license fee.
Salford Systems cannot and does not warrant that the functions contained in the
program will meet your requirements or that the operation of the program will be
uninterrupted or error free.
Salford Systems disclaims any and all liability for special, incidental, or consequential
damages, including loss of profit, arising out of or with respect to the use, operation, or
support of this program, even if Salford Systems has been apprised of the possibility of
such damages.
Trademarks
MARS is a trademark of JerIll, Inc and is exclusively licensed to Salford Systems.

CART is a registered trademark of California Statistical Software, Inc., and is exclusively
licensed to Salford Systems. DBMS/COPY is a trademark of Conceptual
Software. All other trademarks and registered trademarks mentioned herein are the
property of their respective owners.
Table of Contents
CHAPTER 1: INTRODUCTION TO MARS .......................................................... 1
ABOUT THIS USER GUIDE ...................................................................................... 3
CHAPTER 2: INSTALLING AND STARTING MARS ............................................... 5

SYSTEM REQUIREMENTS ....................................................................................... 5
INSTALLATION PROCEDURE ..................................................................................... 5
STARTING MARS FOR WINDOWS ............................................................................ 6
PREPARING YOUR DATA FOR MARS ...................................................................... 6
Accessing your data regardless of the original file format ................................... 6
Setting up dynamic link to DBMS/COPY .............................................................. 6
Making sure that you are using the most up-to-date file access drivers ................. 6
Using ODBC ......................................................................................................... 6
Variable Names .................................................................................................... 7
Character or Text Data .......................................................................................... 7
CHAPTER 3: MARS BASICS ...................................................................... 9

THE MODELER'S CHALLENGE ............................................................................... 9
Global Parametric Modeling vs. Local Nonparametric Modeling ........................... 10
Smoothing Techniques ......................................................................................... 11
Kernel Smoothers and Density Estimators ......................................................... 12
Bias-Variance Tradeoff in Global and Local Modeling .......................................... 12
Fatal Flaw in Nonparametric Modeling: The Curse of Dimensionality ................... 13
MARS SMOOTHING, SPLINES AND KNOTS SELECTION ............................................. 14
MARS BASIS FUNCTIONS .................................................................................. 17
Mirror-Image Basis Functions ............................................................................. 24
Generation of Basis Functions ............................................................................ 26
HANDLING OF CATEGORICAL PREDICTOR VARIABLES ................................................. 27
CLASSIC FORWARD KNOT PLACEMENT REPORT ....................................................... 28
INTERPRETING MARS BASIS FUNCTIONS ............................................................... 29
HANDLING OF MISSING VALUES ............................................................................ 31
CONSTRUCTION OF INTERACTION TERMS ................................................................. 32
HOW A VARIABLE IS ENTERED LINEARLY ................................................................ 34
TESTING, VALIDATION AND PROTECTION AGAINST OVERFITTING .................................... 36
How many degrees of freedom in a MARS model? .............................................. 36
Generalized Cross-Validation: GCV Measure of Mean Square Error ................... 37
Impact of the DF Charged Per Basis Function .................................................... 37
How Final Model Size Varies With DF ................................................................ 38
CHAPTER 4: SETTING CONTROL PARAMETERS & REFINING MODELS .................... 39

MAXIMUM NUMBER OF BASIS FUNCTIONS TO ALLOW ................................................ 39
Maximum Basis Functions in Data Mining Applications ...................................... 40
FORCING VARIABLES INTO THE MODEL ................................................................... 40
FORBIDDING TRANSFORMATIONS OF SELECTED VARIABLES ........................................ 40
PENALTY ON ADDED VARIABLES ............................................................................ 41
MINIMUM NUMBER OF OBSERVATIONS BETWEEN KNOTS (MINIMUM SPAN) ................... 41
ALLOWING OR DISALLOWING SPECIFIC INTERACTIONS ................................................ 42
MARS SEARCH INTENSITY .................................................................................. 43
Search Intensity or Speed Parameter ................................................................. 43
CHAPTER 5: READING & INTERPRETING MODEL RESULTS .................................. 45

BASIS FUNCTION AND MODEL CODE ..................................................................... 45
MODEL SUMMARY ............................................................................................. 46
ANOVA TABLE ................................................................................................ 47
VARIABLE IMPORTANCE TABLE .............................................................................. 49
FINAL MODEL SUMMARY .................................................................................... 50
BASIS FUNCTIONS .............................................................................................. 50
GAINS CHART ................................................................................................... 51
BINARY GAINS CHART ........................................................................................ 52
PREDICTION SUCCESS TABLE ............................................................................... 53
BINARY THRESHOLD TABLE ................................................................................. 54
TWO- AND THREE-DIMENSIONAL CURVE AND SURFACE PLOTS .................................. 55
Exporting and Printing 2-D and 3-D Plots ............................................................ 57
THE MARS MODEL SELECTOR ........................................................................... 57
SAVING AND OPENING SELECTOR FILES ................................................................. 59
CHAPTER 6: HANDS-ON TOUR OF GRAPHICAL USER INTERFACE ......................... 61

GETTING STARTED .............................................................................................. 61
SETTING UP THE MODEL ..................................................................................... 61
Variables Dialog .................................................................................................. 61
Interactions Dialog .............................................................................................. 64
Select Dialog ...................................................................................................... 65
Options and Limits Dialog ................................................................................... 65
Testing Dialog ..................................................................................................... 67
EDIT OPTIONS ................................................................................................... 69
Reporting Dialog ................................................................................................. 69
Random Number Dialog ...................................................................................... 69
Directories Dialog ............................................................................................... 70
SAVING A MARS MODEL ................................................................................... 71
APPLYING A MARS MODEL TO DATA .................................................................... 71
INTERACTIVE EXAMPLE ........................................................................................ 73
MARS Report Writer ................................................................................................. 77

Default Options ................................................................................................... 78
Pre-configured Reports ....................................................................................... 78
Printing and Saving Reports ................................................................................ 79
THE DATA VIEWER ............................................................................................. 80
ON-LINE HELP .................................................................................................. 81
CHAPTER 7: COMMAND-LINE CONTROL AND BATCH MODE ............................... 83

ALTERNATIVE CONTROL MODES IN MARS FOR WINDOWS ......................................... 83
Command-Line Mode .......................................................................................... 84
Command Log .................................................................................................... 84
Creating and Submitting Batch Files ................................................................... 84
COMMAND-LINE SYNTAX AND OPTIONS .................................................................. 85
Example Command Files ....................................................................................... 85
COMMAND REFERENCE ....................................................................................... 87
ADDITIVE ........................................................................................................... 87
APPLY ............................................................................................................... 88
BOPTIONS ......................................................................................................... 89
CDF .................................................................................................................... 91
DESCRIPTIVE .................................................................................................... 92
ESTIMATE .......................................................................................................... 94
EXCLUDE ........................................................................................................... 95
FORMAT ............................................................................................................. 96
HELP .................................................................................................................. 97
HISTOGRAM ...................................................................................................... 98
IDVAR ................................................................................................................. 99
INTERACT ......................................................................................................... 100
KEEP ................................................................................................................ 101
KNOT ................................................................................................................ 102
LIMIT ................................................................................................................. 103
LINEAR ............................................................................................................. 104
LOPTIONS ......................................................................................................... 105
MODEL ............................................................................................................. 106
NAMES ............................................................................................................. 107
NEW ................................................................................................................. 108
OPTIONS .......................................................................................................... 109
OUTPUT ............................................................................................................ 110
PAGE ................................................................................................................ 111
PRINT ................................................................................................................ 112
QUIT .................................................................................................................. 113
REGRESSION ................................................................................................... 114
REM .................................................................................................................. 115
SEED ................................................................................................................ 117
SELECT ............................................................................................................ 118
SEQUENCE ...................................................................................................... 119

STORE .............................................................................................................. 120
SUBMIT ............................................................................................................. 121
USE .................................................................................................................. 122
WEIGHT ............................................................................................................ 123
XYPLOT ............................................................................................................. 124
APPENDIX I. BASIC PROGRAMMING LANGUAGE ........................................... 125

GETTING STARTED ............................................................................................ 126
MISSING VALUES ............................................................................................. 133
MORE EXAMPLES ............................................................................................ 134
FILTERING OR SPLITTING THE DATA SET ................................................................ 135
ADVANCED PROGRAMMING FEATURES .................................................................. 136
PROGRAMMING COMMAND REFERENCE ................................................................. 87
IF...THEN Statement ...................................................................................... 137
LET Statement ................................................................................................... 138
ELSE Statement ................................................................................................ 139
FOR...NEXT Statement ...................................................................................... 140
DIM Statement ................................................................................................... 141
DELETE Statement ........................................................................................... 142
GOTO Statement ............................................................................................... 143
STOP Statement ............................................................................................... 144
APPENDIX II. FURTHER READING AND REFERENCES .......................................... 145

INDEX .................................................................................................. 147
Chapter 1: Introduction to MARS
Welcome to MARS, the world's first truly successful automated regression modeling
tool. Multivariate Adaptive Regression Splines was developed in the early 1990s
by world-renowned Stanford physicist and statistician Jerome Friedman, but has become
widely known in the data mining and business intelligence worlds only recently through
our seminars and the enthusiastic endorsement of leading data mining specialists. We
expect that you will find MARS an essential component in your data analysis tool kit.
MARS is an innovative and flexible modeling tool that automates the building of accurate
predictive models for continuous and binary dependent variables. It excels at finding
optimal variable transformations and interactions, the complex data structure that often
hides in high-dimensional data. In doing so, this new approach to regression modeling
effectively uncovers important data patterns and relationships that are difficult, if not
impossible, for other methods to reveal.
Although regression is one of the most widely used tools in statistical analysis, it is
almost never used in data mining. Nevertheless, a well-crafted regression model can be
ideal for predictive modeling and data mining because of the following important
characteristics:
• A regression model predicts the outcome variable by forming a weighted sum of
the predictor variables; thus, the predicted outcome changes in a smooth and
regular fashion as the inputs change.
• This is in contrast to a decision tree where a small change in a predictor could
either move a prediction to a different node in the tree or result in no change in the
prediction at all.
• When scoring a database, regression models typically produce unique scores
for each record. Decision trees assign the same score to all records arriving at a
specific node; thus, the smaller the decision tree, the fewer the number of unique
scores assigned.
• In a regression, it is often possible to read the effect of a predictor variable on
the outcome by examining its slope coefficient; experienced modelers can often
read the effects of interactions directly from the model itself.
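The contrast between regression scores and decision-tree scores can be illustrated with a small sketch. Everything in it is invented for illustration: the coefficients, the split thresholds, the node scores, and the five records are hypothetical and do not come from MARS or CART output.

```python
def regression_score(x1, x2):
    """Weighted sum of the predictors (hypothetical coefficients)."""
    return 0.5 + 2.0 * x1 - 1.0 * x2


def tree_score(x1, x2):
    """A three-node decision tree (hypothetical splits and node scores)."""
    if x1 < 0.5:
        return 0.2
    return 0.9 if x2 < 0.5 else 0.6


# Five hypothetical records, each with two predictor values.
records = [(0.10, 0.30), (0.20, 0.80), (0.60, 0.10), (0.70, 0.40), (0.90, 0.90)]

unique_regression_scores = {regression_score(x1, x2) for x1, x2 in records}
unique_tree_scores = {tree_score(x1, x2) for x1, x2 in records}

print("unique regression scores:", len(unique_regression_scores))
print("unique tree scores:      ", len(unique_tree_scores))
```

In this sketch every record receives its own regression score, while the tree can never produce more distinct scores than it has terminal nodes.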
Developing a good regression model is usually an extremely time-intensive activity requiring
considerable modeling expertise, even for small databases. For the large databases
common in data mining projects, model-building challenges have deterred data miners
from using this otherwise very effective tool. However, with the advent of MARS, regression
models can now be routinely and automatically developed for the most complex data
structures.
Given a target variable and a set of candidate predictor variables, MARS automates all
aspects of model development and model deployment, including:
• variable selection: separating relevant from irrelevant predictor variables
• transforming predictor variables exhibiting a nonlinear relationship with the target
• determining interactions between predictor variables
• handling missing values with new nested variable techniques
• conducting extensive self-tests to protect against overfitting
MARS enables you to rapidly search through all possible models and to quickly identify
the optimal solution. Because the software can be exploited via intelligent default settings,
for the first time analysts at all technical levels can easily access the innovations in MARS.
MARS essentially builds flexible models by fitting piecewise linear regressions; that is,
the nonlinearity of a model is approximated through the use of separate regression slopes
in distinct intervals of the predictor variable space. An example of a piecewise linear
regression is shown below.
The slope of the regression line is allowed to change from one interval to the other as the
two knot points are crossed. The variables to use and the end points of the intervals for
each variable are found via a fast but very intensive search procedure. In addition to
searching variables one by one, MARS also searches for interactions between variables,
allowing any degree of interaction to be considered.
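A piecewise linear regression with two knots can be written down in a few lines. The sketch below is only an illustration: it assumes the common "hinge" form max(0, x - knot) for the slope changes, and the intercept, slopes, and knot locations are made-up values rather than output from MARS.

```python
def hinge(x, knot):
    """Return max(0, x - knot): zero up to the knot, linear beyond it."""
    return max(0.0, x - knot)


def piecewise_linear(x, intercept, slope, knots, slope_changes):
    """Straight line whose slope changes by slope_changes[i] once x passes knots[i]."""
    y = intercept + slope * x
    for knot, change in zip(knots, slope_changes):
        y += change * hinge(x, knot)
    return y


# Hypothetical model: slope 1.0 below x = 2, slope 3.0 between 2 and 5, slope 0.5 above 5.
KNOTS = [2.0, 5.0]
SLOPE_CHANGES = [2.0, -2.5]

for x in [0.0, 2.0, 3.0, 5.0, 6.0]:
    print(x, piecewise_linear(x, 0.0, 1.0, KNOTS, SLOPE_CHANGES))
```

The fitted line stays continuous at each knot; only its slope changes, which is what lets a handful of knots approximate a curved relationship.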
The optimal MARS model is selected in a two-stage process. In the first stage, MARS
constructs an overly large model by adding basis functions, the formal mechanism by
which variable intervals are defined. Basis functions represent either single variable
transformations or multivariable interaction terms. As basis functions are added, the
model becomes more flexible and more complex, and the process continues until a user-
specified maximum number of basis functions is reached.
In the second stage, basis functions are deleted in order of least contribution to the model
until an optimal model is found. By allowing for any arbitrary shape for the function as
well as for interactions, and by using this two-stage model selection method, MARS is
capable of reliably tracking the very complex data structures that often hide in high-
dimensional data.
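The two-stage idea can be sketched for a single predictor as follows. This is a simplified illustration, not the MARS implementation: it assumes hinge-shaped basis functions, picks each new term greedily by residual sum of squares, and scores candidate models in the second stage with a generalized cross-validation (GCV) style criterion. The function names, the toy data, and the charge of 3 degrees of freedom per basis function (`df_per_term`) are all assumptions made for the example.

```python
def hinge(x, knot, sign):
    """Basis function max(0, sign * (x - knot)); sign is +1 or -1."""
    return max(0.0, sign * (x - knot))


def solve(a, b):
    """Solve a * coef = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [v - f * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_sse(xs, ys, basis):
    """Least-squares fit of an intercept plus the basis terms; return the SSE."""
    rows = [[1.0] + [hinge(x, k, s) for k, s in basis] for x in xs]
    p = len(rows[0])
    # Normal equations, with a tiny ridge term so collinear terms cannot break the solve.
    xtx = [[sum(r[i] * r[j] for r in rows) + (1e-8 if i == j else 0.0)
            for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(p)]
    coef = solve(xtx, xty)
    return sum((y - sum(c * v for c, v in zip(coef, r))) ** 2
               for r, y in zip(rows, ys))


def gcv(xs, ys, basis, df_per_term=3.0):
    """GCV-style score: mean squared error inflated by a model-size penalty (assumed form)."""
    n = len(xs)
    effective_params = 1.0 + df_per_term * len(basis)
    if effective_params >= n:
        return float("inf")
    return (fit_sse(xs, ys, basis) / n) / (1.0 - effective_params / n) ** 2


def forward_stage(xs, ys, max_terms):
    """Stage 1: keep adding the basis function that lowers the SSE most."""
    candidates = [(k, s) for k in sorted(set(xs))[1:-1] for s in (1, -1)]
    basis = []
    while len(basis) < max_terms:
        remaining = [c for c in candidates if c not in basis]
        if not remaining:
            break
        basis.append(min(remaining, key=lambda c: fit_sse(xs, ys, basis + [c])))
    return basis


def backward_stage(xs, ys, basis):
    """Stage 2: delete the least useful term each round; keep the best-scoring model."""
    current = list(basis)
    best, best_score = list(current), gcv(xs, ys, current)
    while current:
        drop = min(current,
                   key=lambda c: fit_sse(xs, ys, [b for b in current if b != c]))
        current = [b for b in current if b != drop]
        score = gcv(xs, ys, current)
        if score <= best_score:
            best, best_score = list(current), score
    return best


# Toy data: flat up to x = 6, slope 2 afterwards, plus a small deterministic wiggle.
xs = [float(i) for i in range(12)]
ys = [1.0 + 2.0 * max(0.0, x - 6.0) + 0.01 * ((7 * i) % 5 - 2)
      for i, x in enumerate(xs)]

grown = forward_stage(xs, ys, max_terms=3)
pruned = backward_stage(xs, ys, grown)
print("after stage 1:", grown)
print("after stage 2:", pruned)
```

The first stage deliberately overshoots (three terms for data with a single bend), and the second stage trades fit against model size to discard terms that do not earn their keep.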
The MARS output contains an easy-to-deploy regression model that can be simply applied
to new data from within MARS or exported as C-, SAS- or XML/PMML-compatible code.
To facilitate interpretation of the model, the output includes interpretive summary reports
as well as exportable two- and three-dimensional curve and surface plots.
The best way to learn about MARS is to use it, so let's start now!
About this User Guide

The remainder of this User Guide is organized as follows:
Chapter 2. Installing and Starting MARS

Chapter 3. MARS Basics
Chapter 4. Setting Model Control Parameters and Refining Models
Chapter 5. Interpreting Model Results
Chapter 6. Hands-On Tour of the Graphical User Interface
Chapter 7. Using Command-Line Control and Batch Mode
Appendix I. Guide to the BASIC Programming Language
Appendix II. Further Reading and References
Because MARS is such a new tool, this User Guide assumes no prior knowledge of the
methodology underlying MARS or familiarity with the output. The main body of this User
Guide, Chapters 3, 4 and 5, contains an extensive discussion of how to use the technique
and how to interpret the results.
If you have already installed MARS and would like to begin immediately, see
Chapter 6 for hands-on advice. You can consult the rest of the manual later
for details.
MARS for Windows incorporates alternative control modes that extend the program's
features and capabilities. In addition to controlling MARS with the graphical user interface
(GUI), you can also issue commands at the command prompt or submit a command file.
Chapter 6 provides a hands-on tour to introduce you to the MARS for Windows GUI, menus
and dialogs. Chapter 7 describes the situations in which you may want to take advantage
of the two alternative modes of control and provides a guide to using the command-line
and batch file features.
The current release of MARS for UNIX platforms is entirely command-line driven but all the
graphical results can be displayed on a PC using the MARS model viewer. Because the
Windows version has command analogs that are automatically displayed in the Command
Log window, users running MARS on UNIX platforms may find it instructive to start with a
Windows version. This will enable you to use the GUI to set up your MARS models, view
and learn the commands in the MARS Command Log, and save the command file (which
you can subsequently submit on the UNIX platform). In this manner, the command log
can be used as a tutor: "What would the commands have been to accomplish what I just
did in the graphical user interface?" Windows MARS can also be used to display all UNIX
results. See the sections on UNIX MARS for details.
MARS also offers an integrated BASIC programming language that allows you to define
new variables, modify existing variables, access mathematical, statistical and probability
distribution functions and define flexible criteria to control case deletion. BASIC commands
are implemented through the command interface, either interactively or via batch command
files. Appendix I provides a detailed guide to using BASIC.
If you would like to read more about MARS and its history, Appendix II contains suggestions
for further readings. In particular, for a detailed technical discussion of the MARS
methodology, see Friedman, 1991a (also available on the MARS distribution CD as a .pdf
file).
Chapter 2: Installing and Starting MARS
This chapter provides system requirements and instructions for installing and starting
MARS for Windows 95/98/NT/2000. For guidance on installing MARS software on a
UNIX platform, see the documentation accompanying the software.
System Requirements
To install and run MARS for Windows, the minimum hardware you need includes:
• 80486 processor or higher,
• 64 MB of random-access memory (RAM),
• hard disk with 15 MB of free space for program files,
• additional hard disk space for scratch files (with the required space contingent
on the size of the input data set),
• CD-ROM drive, and
• Windows 95/98, Windows 2000, Windows ME, or Windows NT 4.+.
For optimal performance, we strongly recommend that MARS run on a Pentium machine
with 64 megabytes of memory or more. Because MARS is CPU intensive, the faster your
CPU, the faster MARS will run.
Installation Procedure
To install MARS:
1. Launch Windows.
2. Insert the CD into the CD drive.
3. From the Start menu, select Run.
4. In the Run dialog box, type the letter of your CD drive followed by :\SETUP and
press Enter.
5. Respond to the questions about where you want to install MARS.
6. Click Finish to exit the Installer.
READ ME if you are installing MARS as an upgrade to a previous version!

If you are upgrading to MARS from an earlier version, the installer creates by default a new
folder containing the new MARS. Your current MARS files are not affected.
READ ME if reading or writing .SYS files: By default, Windows hides file types with
.sys extensions. To allow Windows to let you see SYSTAT 1.0-7.0 files (which also have
a .sys extension), select Folder Options from the View menu in Windows Explorer. Click
the View tab and check Show All Files.
READ ME if running Windows NT 4.+: If you are running Windows NT, permissions
may require modification so that MARS can write temporary files to the hard drive.
Starting MARS for Windows

Start MARS by double-clicking the MARS icon or by selecting MARS from the Program
Manager in the Start menu. The first time you launch MARS, you will be prompted for a
serial number. Enter the serial number provided with your distribution CD and press
Enter.
MARS for Windows takes advantage of Windows' preemptive multitasking

ability, so you can start a MARS run and then switch to other Windows
tasks. Be aware that performance in MARS and your other active applications
will decrease as you open additional applications. If MARS is running slowly
you may want to close down other applications.
Preparing Your Data for MARS

Accessing your data regardless of the original file format
All Salford Systems data analysis software can access data in over 80 file formats. To
turn on this capability, select the File pulldown menu and make sure that the "Use DBMS/COPY"
item is checked. Selecting the item turns the check mark on and off. If MARS
cannot locate the DBMS/COPY directory it will ask you to find it in an explorer-type
dialog; MARS will then remember the location for future use.
Setting up dynamic link to DBMS/COPY

To link MARS with DBMS/COPY, select Use DBMS/COPY from the File menu. When
you open your first file, a dialog box appears that instructs you to specify the location of
the DBMS/COPY directory on your machine. The directory location is retained by MARS
so you will not be prompted for this information again in subsequent MARS sessions
unless you move and/or rename the DBMS/COPY folder.
Making sure that you are using the most up-to-date file access drivers
If you have previously installed DBMS/COPY you may have an older version that does not
support recently introduced file formats. If MARS detects an older version during the
install procedure it will place a newer version in a DBMSCOPY directory beneath your
MARS directory. We recommend that you use the DBMS/COPY live update over the
web to keep your file access software current (start up DBMS/COPY, choose "Skip This"
on the introductory menu and select "DBMSCOPY on the Web" from the help menu).
Using ODBC
To access data stored in a SQL-based system, use the ODBC Query Builder in DBMS/COPY
to extract the data you need and save it as a flat file in your preferred format.
See the accompanying DBMS/COPY User's Guide for further instructions.
Variable Names
Variable names should not begin with an underscore, a numeric value or any other non-
alphabetical character (e.g., $, &, ^) and should not contain blank spaces. When you try
to compute a model using an illegal variable name, MARS will issue the following error message:
ABOUT HERE MARS EXPECTS A NUMERIC VARIABLE
For MARS 1.x and 2.0, variable names also should not exceed eight characters. MARS
will truncate anything that exceeds this limit. If this results in duplicate variable names,
the names will automatically be modified so that each variable has a unique name. The
last character in each duplicate name is replaced with a sequential number beginning
with one. Later versions of MARS will support long variable names (up to 36 characters).
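The naming rules above can be checked before a data set is handed to MARS. The sketch below is a hypothetical pre-check, not code taken from MARS: the function names are invented, and it assumes that every member of a group of colliding truncated names is renumbered (the text does not say whether the first occurrence keeps its name). It also ignores the corner cases of more than nine duplicates and of a renumbered name colliding with an existing one.

```python
import re
from collections import Counter

MAX_LEN = 8  # limit stated for MARS 1.x and 2.0


def is_legal_name(name):
    """True if the name starts with a letter and contains no blank spaces."""
    return bool(re.match(r"[A-Za-z]", name)) and " " not in name


def truncate_and_dedupe(names):
    """Truncate names to MAX_LEN and renumber any duplicates this creates."""
    short = [name[:MAX_LEN] for name in names]
    counts = Counter(short)
    seen = Counter()
    result = []
    for name in short:
        if counts[name] > 1:
            # Assumption: replace the last character with 1, 2, ... in order of appearance.
            seen[name] += 1
            name = name[:-1] + str(seen[name])
        result.append(name)
    return result


print(is_legal_name("_id"), is_legal_name("2ndvar"), is_legal_name("my var"))
print(truncate_and_dedupe(["CUSTOMER_AGE", "CUSTOMER_INCOME", "REGION"]))
```

Renaming variables yourself before import keeps the mapping between original and truncated names under your control rather than leaving it to automatic renumbering.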
Character or Text Data
MARS 2.0 and earlier versions of MARS will not permit text variables to be used as either
target or predictor variables. You will need to convert text variables to numeric equivalents
before they can appear in models. A later release of MARS will permit use of text
variables directly; please check with Salford Systems for availability.
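One simple way to convert a text variable to numeric equivalents is to assign each distinct value an integer code. The sketch below is an illustration only: the function name and the sample values are hypothetical, and the choice of consecutive integers in alphabetical order is an assumption, not a MARS requirement.

```python
def encode_text_column(values):
    """Map each distinct text value to an integer code (1, 2, ...) in sorted order."""
    codes = {label: i for i, label in enumerate(sorted(set(values)), start=1)}
    return [codes[v] for v in values], codes


# Hypothetical text column.
region = ["north", "south", "north", "east"]
encoded, lookup = encode_text_column(region)
print(encoded)
print(lookup)
```

Codes produced this way are labels rather than quantities, so the resulting variable should be treated as categorical and not as a continuous predictor. Keep the lookup table so the codes can be translated back when reading model output.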
Chapter 3: MARS Basics
This chapter describes what's under the hood, beginning with why the MARS engine is
both unique and innovative. Because MARS is such a new tool, we assume no prior
knowledge of the adaptive modeling methodology underlying MARS. To put this
methodology into context, the first section discusses the modeler's challenge and
addresses how MARS meets this challenge. The remaining sections provide detailed
explanations of how the MARS model is generated, how MARS handles categorical
variables and missing values, how the optimal model is selected and, finally, how testing
regimens are used to protect against overfitting.
The Modeler's Challenge

The modeler's task, to accurately predict y given some variable x, can be thought of as
approximating the generic function, y = f(x) + noise. In tackling this task, the primary
challenges are four-fold:
1) Which predictor (independent) variables should be used?

2) How do the predictor variables combine to generate y; i.e., what is the
mathematical form for f(x)?
3) What is the underlying functional form for each predictor, e.g., log, square root,
power, inverse, S-shape?
4) What interaction terms are needed and what degree of interaction?
The conventional approach employed in classical statistical modeling is the process of
trial and error. Based on domain expertise and past experience, the modeler:
specifies a plausible model,

generates plausible competing models,
diagnoses residuals,
tests natural hypotheses (e.g., do specific demographics affect behavior?),
assesses performance via goodness of fit, lift curves, Lorenz curves, etc.,
compares the performance to other models on similar problems, and
revises the model in light of these tests, diagnoses, and comparisons.
Because estimating the model is typically the last step in the analysis process, time
often does not permit adequate attention to be paid to each of these steps. A model is
declared final because improvements in the model become negligible or, as is usually
the case with large databases, time has simply run out. The quality of the final results is
usually highly dependent on the skill of the modeler; however, even expert modelers can
overlook important effects and be fooled by data anomalies (e.g., multivariate outliers,
data coding errors, etc.).
The modern approach to modeling is different from this conventional approach in that it is
much more data driven than user driven. Modern analysis tools, which began to emerge
in the 1980s, are very computer intensive, usually combining intelligent algorithms and
brute force searches. Examples of such tools include neural nets, genetic algorithms,
rule induction and decision trees.
Some of the modern methods, given a target variable and a set of candidate predictor
variables, let the data dictate the functional form. For example, Generalized Additive
Models (GAM) determine the functional form for a variable. Other methods are even more
fully automated. CART and MARS, for example, automatically determine both variable
selection and functional form.
Of course, it is important not to be overly data driven. A priori knowledge is very valuable
and can help shape a model when several alternatives are all consistent with the data.
Domain expertise can help detect errors (e.g., price increases should reduce quantity
demanded) and user-imposed constraints can yield better models.
Ultimately, the intended use of the final model will influence how it should be developed.
For example, if predictive accuracy is the sole criterion by which a model is to be assessed,
its complexity and comprehensibility are irrelevant. Therefore, some modern methods,
such as boosted decision trees or neural nets, that do not yield easy to understand
models, can be utilized.
On the other hand, if understanding the data generation process is important, a model
that can be easily understood is desirable. In this situation, the modeler wants to be able
to tell a story and use the insights gained to make decisions; thus, a single decision tree
or regression model can be used to yield results that represent the data and to assist in
understanding the underlying data patterns and relationships.
Global Parametric Modeling vs. Local Nonparametric Modeling

Global parametric models, such as linear and logistic regression, while relatively easy
and quick to compute, are only accurate if the specified model is a reasonable
approximation to the true underlying function. Typical parametric models have limited
flexibility, usually performing best when simple. Further extensions to model specification,
such as polynomials in predictors, can mistrack. If the true function is sufficiently complex,
a good approximation may, in reality, be impossible.
For smaller data sets, of course, the modeler is forced to use parametric modeling
techniques. When data points are scarce, all points will influence almost every aspect of
the model. The best example is the simple linear regression where all data points are
used to help locate the regression line.
In summary, the key strengths of global parametric models are that they can be very
accurate when developed by expert modelers and they are efficient vis-à-vis data use.
Their weaknesses include vulnerability to outliers and subtleties missed by the modeler.
Nonparametric models, in contrast, are developed locally rather than globally. The extreme,
of course, is to simply reproduce the data exactly; however, this is not a useful model! A
simplification or summary of the data is needed; smoothing is one example of such
simplification. For example, if the objective is to summarize how a target variable y
behaves in a small region of data containing low values of X1, a single value for the entire
region (e.g., median or mean) can be used or a curve, surface or regression can be fitted
to the region. Then, developing a separate summary in the remaining regions of the X1
predictor space paints a picture illustrating how y behaves in the entire range of X1
values. Painting the complete picture may require some cross-region smoothing to join
the functions in the neighboring regions.
Smoothing Techniques
Common smoothing techniques, available in several statistical packages, include:
• running mean
• running median
• Distance Weighted Least Squares (DWLS, fitting a new regression at each value
  of X and then down-weighting points by their distance from the current value of X)
• LOWESS (locally-weighted regression smooths)
• LOESS (same as LOWESS but down-weights outliers from local regressions)
Almost all smooths require the choice of a tuning parameter, typically a window size
indicating how large a fraction of the data to use when evaluating a smooth for any value
of X. The larger the window size, the less local and the more smooth the result. For
example, a median smooth using a 100% window and a very flexible smooth using 5% of
the data are illustrated below. The 5% window appears to impose too little structure on
the data points while the 100% window appears to impose too much structure.
[Figure: left panel, "Median Smooth MV on LSTAT Using 100% Window"; right panel, "Super Flexible Smooth MV on LSTAT Using 5% Window". Vertical axis: MV; horizontal axis: LSTAT.]
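The effect of the window size can be sketched with a simple running smoother. The code below is a minimal illustration, not the procedure used to produce the plots above: the function running_smooth and the synthetic data are assumptions, and the window is implemented as the nearest fraction of points around each X value.

```python
import numpy as np


def running_smooth(x, y, window=0.2, stat=np.median):
    """Smooth y against x using the nearest `window` fraction of the points."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    k = max(1, int(round(window * len(x))))  # number of points per window
    fitted = np.empty(len(x))
    for i, x0 in enumerate(x):
        nearest = np.argsort(np.abs(x - x0))[:k]  # the k points closest to x0
        fitted[i] = stat(y[nearest])
    return fitted


# Synthetic data standing in for a declining, noisy relationship.
rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0.0, 40.0, 200))
y = 40.0 * np.exp(-x / 15.0) + rng.normal(0.0, 3.0, 200)

wide = running_smooth(x, y, window=1.00)    # 100% window: one global median
narrow = running_smooth(x, y, window=0.05)  # 5% window: very local, very wiggly
print(wide[:3], narrow[:3])
```

With a 100% window every fitted value is the overall median, which imposes too much structure; with a 5% window the fit follows the noise. Passing stat=np.mean gives a running mean instead of a running median.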
The goal of non-parametric modeling is to predict y as a function of X. To estimate the
expected value of y for a specific set of Xs, data records with that specific set of Xs must
be available. If too few (or no) data points with that specific set of Xs are available, the
modeler is forced to make do with data points that are close. Data points that are not
quite so close can be down-weighted and used; however, bringing in points from farther
away introduces bias and raises the following questions: How should a local neighborhood
be defined? (In other words, what is close?) How should data points far away from a
specific combination of Xs be down-weighted?
The size of the neighborhood can be selected by the user or can be determined via
experimentation or cross validation. A large number of weighting functions, or kernels,
are available for down-weighting far-away observations. The specific kernel used, however,
is less important than the neighborhood or bandwidth size.
Kernel Smooths and Density Estimators

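A generic kernel density estimator is represented by:
$$\hat{f}(X) = \frac{1}{nb} \sum_{i=1}^{n} K\!\left(\frac{X - X_i}{b}\right)$$
where X_1, ..., X_n are the n observed data values.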
K( ) is the weighting function, also known as the kernel function, that integrates to 1. The
function is typically a bell-shaped curve like the normal, so the weight declines with the
distance from the center of the interval.
The bandwidth, or size of the window about X, is represented by b. For some kernels,
data outside the window have a weight equal to zero. Within the data window, K( ) could
be a constant. The smaller the window, the more local the estimator.
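The fitted value of y corresponding to predictor data value X is:
$$\hat{y}(X) = \frac{\sum_{i=1}^{n} K\!\left(\frac{X - X_i}{b}\right) y_i}{\sum_{i=1}^{n} K\!\left(\frac{X - X_i}{b}\right)}$$
As is evident from the formula, kernels are used to weight all observations in the neighborhood.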
The fitted value of y corresponding to X is a weighted average of y-values. Note that X
need not be an observed data value.
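A kernel density estimate and a kernel-weighted fitted value can be sketched as follows. This is an illustrative sketch that assumes the standard textbook forms of both estimators and a Gaussian (normal) kernel; the function names are hypothetical and nothing here is taken from MARS itself.

```python
import numpy as np


def gaussian_kernel(u):
    """Bell-shaped weighting function K( ) that integrates to 1."""
    return np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)


def kernel_density(x0, x, b):
    """Density estimate at x0 from data x with bandwidth b."""
    x = np.asarray(x, dtype=float)
    return gaussian_kernel((x0 - x) / b).sum() / (len(x) * b)


def kernel_fit(x0, x, y, b):
    """Fitted y at x0: a kernel-weighted average of the observed y-values."""
    weights = gaussian_kernel((x0 - np.asarray(x, dtype=float)) / b)
    return np.sum(weights * np.asarray(y, dtype=float)) / np.sum(weights)


x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
y = 2.0 * x + 1.0
# x0 = 2.5 is not an observed data value, yet a fitted value is still defined.
print(kernel_density(2.5, x, b=1.0), kernel_fit(2.5, x, y, b=1.0))
```

A smaller bandwidth b concentrates the weight on the nearest observations (a more local estimator), while a very large b makes the fitted value approach the overall mean of y.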
Bias-Variance Tradeoff in Global and Local Modeling

The more global a model, the more likely it is to be biased for at least some regions of X
(i.e., the expected result is systematically too high for some values of X and too low for
other values). However, given that global models make use of all the data, they should
have low variance (i.e., stable results from sample to sample).
The more local a portion of a model, the higher the variance is likely to be because the
amount of relevant data is small; however, the localization is faithful to the data and thus
minimizes bias. Given that simple global models tend to be stable but biased, and more
complex local models tend to have the reverse properties, the challenge is to find the
optimal balance between bias and variance (i.e., to minimize mean squared error [MSE]).
A classic example from insurance risk assessment illustrates this tradeoff. The assessor
is estimating the risk that a restaurant located in a small town will burn down. Because
it is located in a small town and, thus, few observations are available, the assessor
borrows data from neighboring towns. These data are perhaps less relevant, but are
used in the absence of any other information.
Quantifying the Bias-Variance Tradeoff
A squared error loss function typical for any approximation to the non-linear function
f(x) can be defined as:
$$\mathrm{MSE} = \mathrm{Variance} + \mathrm{Bias}^2$$
The variance portion measures how different the model predictions would be from training
sample to training sample; in other words, it answers the question, how stable are the
results? The bias portion measures the tendency of the model to systematically mistrack.
Note that MSE is sensitive to outliers, so alternative criteria may be more robust.
The bias-variance tradeoff is real; thus, the modeler will often want to permit some bias in
the model. The availability of repeated observations for every possible value of the predictor
vector x is the only way to completely avoid bias.
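The decomposition can be checked numerically with a small simulation. The sketch below is an assumption-laden illustration: the "true" function (a sine curve), the noise level, and the two competing predictors (a global straight line and a local mean of the five nearest points) are all hypothetical choices made only to show how bias and variance are measured across repeated training samples.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable


def true_f(x):
    return np.sin(x)  # hypothetical true function


def global_line(x, y, x0):
    """Global parametric model: one straight line fitted to all the data."""
    slope, intercept = np.polyfit(x, y, 1)
    return slope * x0 + intercept


def local_mean(x, y, x0, k=5):
    """Local model: the mean of the k observations nearest to x0."""
    nearest = np.argsort(np.abs(x - x0))[:k]
    return y[nearest].mean()


def decompose(predict, x0, n_samples=500, n_obs=50):
    """Return (bias squared, variance, MSE) of predict at x0 over many samples."""
    preds = np.empty(n_samples)
    for s in range(n_samples):
        x = rng.uniform(0.0, 6.0, n_obs)
        y = true_f(x) + rng.normal(0.0, 0.3, n_obs)
        preds[s] = predict(x, y, x0)
    bias2 = (preds.mean() - true_f(x0)) ** 2
    variance = preds.var()
    mse = np.mean((preds - true_f(x0)) ** 2)
    return bias2, variance, mse


results = {name: decompose(fn, x0=1.5)
           for name, fn in [("global line", global_line), ("local mean", local_mean)]}
for name, (bias2, variance, mse) in results.items():
    print(f"{name}: bias^2={bias2:.4f} variance={variance:.4f} mse={mse:.4f}")
```

In this setup the straight line mistracks the curve near x0 = 1.5 and therefore shows the larger squared bias, while for each predictor the reported MSE equals the variance plus the squared bias.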
Fatal Flaw in Nonparametric Modeling: The Curse of Dimensionality

Most of the research in fully-nonparametric models focuses on functions with 1, 2 or 3
predictor variables. In Multivariate Density Estimation (Wiley, 1992), Scott suggests a
practical limit of five dimensions. More recent work may have pushed this up to eight
dimensions, but attempting to use these ideas directly in the context of most market
research or data mining contexts is hopeless.
For example, suppose we decide to look at only two regions for each variable in a database,
values below average and values above average. Given two predictors, four regions will
need to be investigated: low/low, low/high, high/low, and high/high. Similarly, with three
variables, eight regions will need to be investigated, with 4 variables, 16 regions, etc.
Now consider 35 predictor variables: even with only two intervals per variable, 2^35 (or about 34
billion) regions, most of which will be empty, will need to be examined!
Given the number of records in most data sets, it is infeasible to approximate the function
y=f(x) by summarizing y in each distinct region of x. For some variables, two regions may
not be enough to track the specifics of the function. If the relationship of y to some xs is
different in 3 or 4 regions, for example, the number of regions requiring examination is
even larger than 34 billion with only 35 variables. Given that the number of regions cannot
be specified a priori, specifying too few regions in advance can have serious implications
for the final model.
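The arithmetic behind these region counts is easy to reproduce. The short sketch below (the function name n_regions and the sample size of one million records are illustrative assumptions) shows how quickly the number of regions outgrows any realistic data set.

```python
def n_regions(n_predictors, intervals_per_predictor=2):
    """Number of distinct regions when every predictor is cut into intervals."""
    return intervals_per_predictor ** n_predictors


for p in (2, 3, 4, 35):
    print(p, "predictors:", n_regions(p), "regions")

# Three intervals per variable instead of two makes the count far larger still.
print("35 predictors, 3 intervals:", n_regions(35, 3), "regions")

# Even a hypothetical data set of one million records leaves almost every region empty.
records = 1_000_000
print("records per region:", records / n_regions(35))
```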
A solution is needed that satisfies the following two criteria:
judicious selection of which regions to look at and their boundaries

judicious determination of how many intervals are needed for each variable
(e.g., if a function is very squiggly in a certain region, many intervals are
required, whereas if a function is a straight line, only one interval is needed)
Given these two criteria, a successful method will essentially need to be ADAPTIVE to
the characteristics of the data. Such a solution will probably ignore quite a few variables
(affecting variable selection) and will take into account only a few variables at a time (also
reducing the number of regions). Even if the method selects 30 variables for the model, it
will not look at all 30 simultaneously. A decision tree accomplishes a similar simplification:
at a single node, only ancestor splits are considered; thus, at a depth of six
levels in the tree, only six variables are being used to define the node.
MARS Smoothing, Splines and Knots Selection

The two classical splines used in mathematics, discussed here for historical reasons,
are:
1) interpolating splines: a spline passes through every data point (curve drawing),
and
2) smoothing splines: the curve needs to be close to the data points.
To estimate the most common form, the cubic spline, a uniform grid is placed on the
predictors and a reasonable number of knots are selected. A cubic regression is then fit
within each region. This approach, popular with physicists and engineers who want
continuous second derivatives, requires many coefficients (four per region) to be estimated.
Normally, three constraints, which dramatically reduce the number of free parameters,
can be placed on cubic splines:
1) curve segments must join,

2) continuous first derivatives at knots (higher degree of smoothness), and
3) continuous second derivatives at knots (highest degree of smoothness).
Piece-wise linear regression splines, the simplest version of splines, have been well
known for some time. Instead of fitting a single straight line to the data, the regression is
allowed to bend. For example, a MARS spline with three knots is illustrated on the left
with the actual data shown on the right:
[Figure: left panel, MARS spline (ESTIMATE) versus LSTAT; right panel, actual data (MV) versus LSTAT.]
A key concept underlying the spline is the knot. A knot marks the end of one region of
data and the beginning of another. Thus, the knot is where the behavior of the function
changes. Between knots, the model could be global (e.g., linear regression).
In a classical spline, the knots are predetermined and evenly spaced, whereas in MARS,
the knots are determined by a search procedure. Only as many knots as needed are
included in a MARS model. If a straight line is a good fit, there will be no interior knots. In
MARS, however, there is always at least one pseudo knot that corresponds to the
smallest observed value of the predictor; this topic is revisited below.
With only one predictor and one knot to select, placement is straightforward: test every
possible knot location and select the model with the best fit (i.e., the smallest SSE). An
additional constraint requiring a minimum amount of data in each interval can also be
imposed to prevent one knot from being placed too close to another.
In determining the exact knot location, all possible values on the real line cannot be
considered without considerable computer resources, and, in reality, only actual data
values are examined. It is also advantageous to allow points between actual data values
(e.g., mid-point) to be examined. For example, a better fit might be obtained if a change
in slope is allowed at a mid-point rather than at an actual data value.
Finding the one best knot in a simple regression is a straightforward search problem:
simply examine a large number of potential knots and choose the one with the best R-
squared. However, finding the best pair of knots requires far more computation, and
finding the best set of knots when the actual number needed is unknown is an even more
challenging task.
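The single-knot search described above can be sketched as an exhaustive loop over candidate knot locations. This is a simplified illustration rather than the MARS algorithm itself: the function best_single_knot, the min_span argument, and the choice to model the data with an intercept, a linear term and one hinge term are assumptions, and only actual data values are tried as knots.

```python
import numpy as np


def best_single_knot(x, y, min_span=5):
    """Try each distinct data value as a knot; return (knot, SSE) with the smallest SSE."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    best_sse, best_knot = np.inf, None
    for c in np.unique(x):
        # Require a minimum amount of data on each side of the candidate knot.
        if np.sum(x < c) < min_span or np.sum(x > c) < min_span:
            continue
        # Piecewise linear model: the slope is allowed to change at c.
        design = np.column_stack([np.ones_like(x), x, np.maximum(0.0, x - c)])
        coef, *_ = np.linalg.lstsq(design, y, rcond=None)
        sse = np.sum((y - design @ coef) ** 2)
        if sse < best_sse:
            best_sse, best_knot = sse, c
    return best_knot, best_sse


# Noise-free example whose slope changes from 2 to 5 at X = 10.
x = np.arange(0, 21, dtype=float)
y = 2.0 * x + 3.0 * np.maximum(0.0, x - 10.0)
knot, sse = best_single_knot(x, y)
print(knot, sse)
```

Searching for the best pair of knots in the same way would require a nested loop over all pairs of candidates, which is why the cost grows so quickly with the number of knots.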
MARS finds the location and number of needed knots in a forward/backward stepwise
fashion. A model which is clearly overfit with too many knots is generated first, then
those knots that contribute least to the overall fit are removed. Thus, the forward knot
selection will include many incorrect knot locations, but these erroneous knots will eventually
(although this is not guaranteed) be deleted from the model in the backward
pruning step.
Strictly speaking, there may be no true set of knot locations as the true function may in
fact be smooth. For example, the true flat-top function illustrated below on the left has
two knots at X=30 and X=60. The observed data, displayed on the right, contain random
error. The best single knot is at X=45 and this is the knot MARS finds first.
[Figure: two panels plotted against X (vertical axes labeled YACT and Y): the true flat-top function on the left and the observed data with random error on the right.]
As the number of knots allowed in a forward search is increased, MARS finds the following
approximations to the flat-top function:
[Figure: six panels showing the MARS approximations M0Y, M1Y, M2Y, M3Y, M4Y and M5Y, each plotted against X.]
MARS Basis Functions
Thinking in terms of knot selection works very well to illustrate splines in one dimension;
however, this context is unwieldy for working with a large number of variables simultaneously.
Both concise notation and easy to manipulate programming expressions are required. It
is also not clear how to construct or represent interactions using knot locations.
In MARS, basis functions are the machinery used for generalizing the search for knots.
Basis functions are a set of functions used to represent the information contained in one
or more variables. Much like principal components, basis functions essentially re-express
the relationship of the predictor variables with the target variable.
The hockey stick basis function is the core building block of the MARS model and is often
applied to a single variable multiple times. The hockey stick function maps variable X to
new variable X*:
X* = max (0, X - c), or
X* = max (0, c - X)
where, in the first form, X* is set to 0 for all values of X up to some threshold value c and X* is
equal to X - c (the amount by which X exceeds the threshold) for all values of X greater than c.
The second form is the mirror image: X* is equal to c - X for values of X below c and 0 otherwise.
For example, consider a predictor variable X, ranging from 0 to 100. Eight
basis functions, all graphed with the same dimensions, are displayed below for c=10, 20,
30, 40, 50, 60, 70 and 80.
[Figure: basis functions BF10, BF20, BF30, BF40, BF50, BF60, BF70 and BF80 (Value) plotted against X.]
BF10 is offset from the original value by 10 whereas BF80 is zero for most of its range.
Such basis functions can be constructed for any value of c. MARS in fact considers
constructing one for every possible data value.
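The hockey stick transform is simple to express in code. The sketch below is illustrative only; the function names hockey_stick and mirror_hockey_stick are hypothetical, and NumPy is used merely for convenience.

```python
import numpy as np


def hockey_stick(x, c):
    """Standard basis function max(0, X - c): zero up to c, then X - c."""
    return np.maximum(0.0, np.asarray(x, dtype=float) - c)


def mirror_hockey_stick(x, c):
    """Mirror-image basis function max(0, c - X): c - X below c, then zero."""
    return np.maximum(0.0, c - np.asarray(x, dtype=float))


x = np.array([0.0, 5.0, 10.0, 50.0, 100.0])
bf10 = hockey_stick(x, 10)   # offset from the original value by 10
bf80 = hockey_stick(x, 80)   # zero for most of the range of X
print(bf10, bf80, mirror_hockey_stick(x, 10))
```

One such column can be built for every candidate threshold c, and the resulting columns are then used as ordinary regressors.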
To illustrate how MARS uses hockey stick functions to represent splines, let's look at the
Boston Housing dataset analyzed in the pricing study by Harrison and Rubinfeld (1978).
(The data set, Boston.syd, is included with your MARS software and can be found in the
Sample Datasets folder in your MARS directory.)
Harrison and Rubinfeld studied the relationship between quality of life variables and 1970
property values in Boston. The variables examined for 506 census tracts included:
MV median value of owner-occupied homes in tract (000s)

CRIM per capita crime rates
NOX concentration of nitrogen oxides (pphm)
AGE percent built before 1940
DIS weighted distance to centers of employment
RM average number of rooms per house
LSTAT percent neighborhood lower SES
RAD accessibility to radial highways
ZN percent land zoned for lots
B measure of black population
CHAS borders Charles River (0/1)
INDUS percent non-retail business
TAX tax rate
PT pupil teacher ratio
Summary statistics for the Boston Housing data and a frequency bar chart for the target
variable, MV, are provided below:
Variable    N         Mean      Std Dev      Minimum      Maximum
CRIM      506    3.6135225    8.6015425    0.0063200   88.9761963
ZN        506   11.3636364   23.3224530            0  100.0000000
INDUS     506   11.1367741    6.8603497    0.4599998   27.7399902
CHAS      506    0.0691700    0.2539940            0    1.0000000
NOX       506    0.5546949    0.1158777    0.3850000    0.8709998
RM        506    6.2846327    0.7026170    3.5609989    8.7799988
AGE       506   68.5748845   28.1488568    2.8999996  100.0000000
DIS       506    3.7950416    2.1057095    1.1295996   12.1264954
RAD       506    9.5494071    8.7072594    1.0000000   24.0000000
TAX       506  408.2371542  168.5371161  187.0000000  711.0000000
PT        506   18.4555289    2.1649459   12.5999985   22.0000000
B         506  356.6739363   91.2948426    0.3199999  396.8999023
LSTAT     506   12.6530591    7.1410593    1.7299995   37.9700012
MV        506   22.5328007    9.1971031    5.0000000   50.0000000
[Figure: frequency bar chart of MV. Vertical axes: Count and Proportion per Bar; horizontal axis: MV.]
Pairwise scatter plots with smooths of the core variables, displayed below, clearly suggest
some non-normal distributions as well as non-linear relationships.
[Figure: pairwise scatter plot matrix with smooths for MV, RM, LSTAT and AGE.]
Some additional pairwise scatter plots also clearly suggest the presence of nonlinear
relationships and indicate that interaction terms are probably warranted.
[Figure: scatter plots of MV against CRIM, ZN, INDUS, NOX, RAD, TAX, PT and DIS.]
To illustrate how hockey stick functions represent splines, let's first define a hypothetical
basis function BF1 on the variable INDUS:
BF1 = max (0, INDUS-4).
Then, instead of using INDUS in a regression, we use the following function:
y = constant + b1 * BF1 + error.
The effect of INDUS on the dependent variable (the slope of y with respect to INDUS) is 0 for
all values below 4 and b1 for values above 4. Now consider a second basis function,
BF2 = max (0, INDUS - 8). The regression function is:
y = constant + b1 * BF1 + b2 * BF2 + error,
and the effect of INDUS on y is:
0 for INDUS <= 4
b1 for 4 < INDUS <= 8
b1 + b2 for INDUS > 8.
Summary statistics for the original INDUS variable and the two basis functions, BF1 and
BF2, are displayed below:
                  BF1      BF2    INDUS
N of cases        506      506      506
Minimum         0.000    0.000    0.460
Maximum        23.740   19.740   27.740
Mean            7.374    4.620   11.137
Standard Dev    6.569    5.378    6.860
Number=0           93      217        0
Note that the maximum values of the BFs are simply the INDUS maximum shifted down by the
knot value (27.740 - 4 and 27.740 - 8); however, the mean is not simply shifted, as the max( )
function is a non-linear transform of INDUS.
An alternative notation for the basis function is (X - knot)+, which has exactly the
same meaning as MAX(0, X - knot).
The spline for a MARS regression of MV on INDUS with one basis function and the spline
with two basis functions are displayed below. In the one basis function model, the slope
starts at zero and then becomes -0.659 after INDUS=4. In the two basis function model,
the slope starts at 0, becomes -2.439 after INDUS=4, and then -0.224 (-2.439 + 2.215)
after INDUS=8.
MARS: MV on INDUS (1 basis function)
MV = 27.395 - 0.659 * (INDUS - 4)+
[Figure: Predicted MV versus INDUS for the one basis function model.]
MV = 30.290 - 2.439 * (INDUS - 4)+ + 2.215 * (INDUS - 8)+
[Figure: Predicted MV versus INDUS for the two basis function model.]
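The slopes quoted for the two basis function model can be verified directly from its equation. The sketch below evaluates that equation with the coefficients printed above and measures the slope in each interval by a simple difference; the function names are hypothetical.

```python
def mv_two_bf(indus):
    """Two basis function model: MV = 30.290 - 2.439*(INDUS-4)+ + 2.215*(INDUS-8)+."""
    bf1 = max(0.0, indus - 4.0)
    bf2 = max(0.0, indus - 8.0)
    return 30.290 - 2.439 * bf1 + 2.215 * bf2


def slope(f, x, h=0.5):
    """Central-difference slope; exact on a straight segment with no knot within h of x."""
    return (f(x + h) - f(x - h)) / (2.0 * h)


for indus in (2.0, 6.0, 20.0):  # one point inside each of the three intervals
    print(indus, round(slope(mv_two_bf, indus), 3))
```

The three slopes are 0 below the first knot, -2.439 between the knots, and -2.439 + 2.215 = -0.224 above the second knot.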
A standard basis function, (X - knot)+, does not provide for a non-zero slope for values
below the knot. To handle this, MARS uses a mirror image basis function, as shown
below on the left. The standard basis function is displayed on the right.
[Figure: left panel, mirror-image basis function BF20R versus X; right panel, standard basis function BF20 versus X.]
The mirror-image hockey stick function looks at the interval of a variable X that lies
below the threshold c. Consider, for example, BF = max (0, 20 - X) displayed below. The
left panel is for the mirror image BF and the right panel displays the basis function BF*-1:
[Figure: two panels, labeled BF20RR and BF20R, each plotted against X.]
The basis function is downward sloping at 45 degrees, taking on the value 20 when X=0.
It declines until hitting 0 at X=20 and remains 0 for all other X. The mirror-image function
is just a mathematical convenience: with a negative coefficient it yields any needed
slope for the X interval, 0 to 20.
We now have the following three basis functions for INDUS:
BF1 = max (0, INDUS - 4),
BF2 = max (0, INDUS - 8), and
BF3 = max (0, 4 - INDUS).
The regression equation is:
MV = 29.433 + 0.925*(4 - INDUS)+ - 2.180*(INDUS - 4)+ + 1.939*(INDUS - 8)+
As displayed below, all three line segments have a negative slope even though two of the
coefficients are greater than zero:
[Figure: ESTIMATE versus INDUS for the three basis function model.]
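Why all three segments slope downward can be seen by adding up, interval by interval, the contribution of each basis function to the slope. The sketch below is an illustration using the coefficients of the regression equation above; the tuple layout and the function name segment_slope are assumptions made for this example.

```python
# (coefficient, knot, direction): direction +1 stands for (X - knot)+,
# direction -1 stands for the mirror image (knot - X)+.
TERMS = [(0.925, 4.0, -1), (-2.180, 4.0, +1), (1.939, 8.0, +1)]


def segment_slope(x, terms=TERMS):
    """Slope of the fitted curve at x (x must not sit exactly on a knot)."""
    total = 0.0
    for coef, knot, direction in terms:
        active = x > knot if direction == +1 else x < knot
        if active:
            # d/dX of (X - knot)+ is +1 when active; of (knot - X)+ it is -1.
            total += coef * direction
    return total


for indus in (2.0, 6.0, 20.0):
    print(indus, round(segment_slope(indus), 3))
```

The positive coefficient 0.925 multiplies a mirror-image function, whose own slope is -1, so the first segment has slope -0.925; the remaining segments have slopes -2.180 and -2.180 + 1.939 = -0.241.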
By its very nature, any hockey stick function defines a knot where a regression can
change slope. Running a regression on hockey stick functions is equivalent to specifying
a piecewise linear regression. Thus, the problem of locating knots is now translated into
the problem of defining basis functions.
As noted above, basis functions are much more convenient to work with mathematically.
For example, you can interact a basis function transforming one variable with a basis
function representing another variable. In addition, the programming code to define a
basis function is straightforward.
Mirror-Image Basis Functions

As noted above, MARS creates basis functions in pairs: a mirror image always joins the
standard basis function. Thus, there are twice as many basis functions possible as there
are distinct data values. The mirror image, reminiscent of CART's left and right child
nodes from a parent node split, is required by MARS so that it can ultimately find the right
model.
The functions are not all linearly independent but do increase the flexibility of the model.
For a given set of knots, only one mirror image basis function will be linearly independent
of the standard basis functions. Further, it does not matter which mirror image basis
function is added as they will all yield the same model. However, using the mirror image
instead of the standard basis function at any knot will change the model.
To illustrate that the addition of a particular mirror-image basis function does not affect the
final model, let's force MARS to keep extra basis functions when regressing MV on
INDUS; the set of basis functions and the resulting plot are now:
BF1 = max (0, INDUS - 8.140)
BF3 = max (0, INDUS - 6.410)
BF7 = max (0, INDUS - 8.560)
BF9 = max (0, INDUS - 3.970)
BF11 = max (0, INDUS - 5.190)
[Figure: ESTIMATE versus INDUS.]
The following results summarize a regression adding BF1R, the mirror image of BF1, and
a second regression adding BF3R, the mirror image of BF3. The two models are identical
with the exception of a shift in the estimated beta coefficients on the added mirror image
functions.
Model 1: Dep Var: MV   N: 506   Multiple R: 0.580   Squared multiple R: 0.336
Adjusted squared multiple R: 0.328   Standard error of estimate: 7.539
Effect   Coefficient  Std Error  Std Coef  Tolerance       t  P(2 Tail)
CONSTANT      31.056      4.342     0.000          .   7.153      0.000
BF1           19.390      4.972    11.229      0.000   3.899      0.000
BF3          -16.166      2.716   -10.446      0.000  -5.952      0.000
BF7          -11.734      4.095    -6.581      0.000  -2.865      0.004
BF9           -8.818      2.176    -6.304      0.001  -4.053      0.000
BF11          17.015      3.142    11.652      0.000   5.415      0.000
BF1R           0.022      0.804     0.005      0.037   0.028      0.978

Model 2: Dep Var: MV   N: 506   Multiple R: 0.580   Squared multiple R: 0.336
Adjusted squared multiple R: 0.328   Standard error of estimate: 7.539
Effect   Coefficient  Std Error  Std Coef  Tolerance       t  P(2 Tail)
CONSTANT      31.095      2.982     0.000          .  10.428      0.000
BF1           19.412      4.900    11.242      0.000   3.962      0.000
BF3          -16.188      2.780   -10.461      0.000  -5.823      0.000
BF7          -11.734      4.095    -6.581      0.000  -2.865      0.004
BF9           -8.818      2.176    -6.304      0.001  -4.053      0.000
BF11          17.015      3.142    11.652      0.000   5.415      0.000
BF3R           0.022      0.804     0.004      0.079   0.028      0.978
Although the coefficients for BF1 and BF3 have been shifted, the two regressions give
exactly the same predictions.
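The reason the two regressions agree is the identity (c - X)+ = (X - c)+ - X + c: once the standard basis function at knot c is in the model, its mirror image contributes nothing beyond a linear term in X and a shift in the constant, whichever knot it comes from. The sketch below checks this numerically on synthetic data (the Boston data are not used here); the variable names and the choice of two knots are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(0.0, 28.0, 200)                      # synthetic stand-in for INDUS
y = 30.0 - 0.5 * x + rng.normal(0.0, 2.0, 200)       # synthetic stand-in for MV


def pos(u):
    """The (.)+ operator: max(0, u)."""
    return np.maximum(0.0, u)


knots = [8.14, 6.41]
standard = [pos(x - c) for c in knots]  # standard basis functions at both knots


def fitted(extra_column):
    """Least-squares fit on constant + standard BFs + one extra column."""
    design = np.column_stack([np.ones_like(x)] + standard + [extra_column])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return design @ coef


fit_a = fitted(pos(knots[0] - x))  # add the mirror image at the first knot
fit_b = fitted(pos(knots[1] - x))  # add the mirror image at the second knot
print(np.allclose(fit_a, fit_b))
```

The estimated coefficients differ between the two fits, but the fitted values coincide because both designs span the same set of piecewise linear functions.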
Generation of Basis Functions

MARS generates basis functions by searching in a stepwise manner. It starts with just a
constant in the model and then begins the search for a variable-knot combination that
improves the model the most (or, alternatively, worsens the model the least). The
improvement is measured in part by the change in MSE. Adding a basis function never
increases the MSE computed on the training data and almost always reduces it. (Degrees of
freedom and penalties are revisited below.)
MARS searches for a pair of hockey stick basis functions, the primary and mirror image,
even though only one might be linearly independent of the other terms. This search is
then repeated, with MARS searching for the best variable to add given the basis functions
already in the model. This brute-force search theoretically continues until every possible
basis function has been added to the model.
The MARS technology is similar to the CART methodology in that a model is deliberately
overfit and then pruned back. The core notion is that a good model cannot be built from a
forward stepping plus stopping rule; rather, the model must be generously overfit and then
the unneeded basis functions removed. However, the model still needs to be limited due
to the intensity of the search that may be required with larger datasets. For example, with
400 variables and 10,000 records, there are potentially 400*10,000 or 4 million knots to
examine just for the main effects. Even if most variables have a limited number of distinct
values (e.g., dummies only allow one knot, age may only have 50 distinct values), the
total number of possible knots will be very large.
In practice, the user specifies an upper limit for the number of knots to be generated in the
forward stage. The limit should be large enough to ensure that the true model can be
captured. A good rule of thumb for determining the minimum number is three to four times
the number of basis functions in the optimal model. This limit may have to be set by trial
and error. (See also Chapter 4 for advice on this topic.)
Handling of Categorical Predictor Variables

In classical modeling, a categorical variable is expanded into a set of dummy variables,
one for each level of the categorical predictor. The set of dummies, mutually exclusive
and collectively exhaustive, is used as input to the classical model (possibly with one level
omitted as a reference). MARS also generates dummy variables, but the dummies represent
collections of levels of the predictor. MARS may generate fewer dummies than there are
levels of the predictor and the dummies generated may overlap and thus may not be
mutually exclusive. Consider, for example, a variable REGION with four levels representing
North, East, South and West. MARS might generate the following basis functions (or
dummy variables):
REGION 1010
REGION 1001
REGION 0110
The first basis function represents levels 1 and 3 (North and South), the second represents
levels 1 and 4 (North and West), and the third represents levels 2 and 3 (East and
South). In each case, MARS has found some reason to group the four levels in these
patterns. MARS can easily create conventional dummies such as 1000 (North) or 0100
(East); whether it does so depends on which dummies improve the model the most.
Theoretically, one dummy could be created for every level of the categorical predictor. In
practice, however, levels are almost always grouped together, with MARS combining
levels that are similar in context. For each dummy created, a complementary basis
function is also implicitly created (e.g., 1010 is the complement to 0101). For example,
returning to the Boston Housing data, when RAD (accessibility to radial highways) is
declared categorical in the model, MV = constant + INDUS + RAD, MARS reports the
following in the text output:
Categorical Predictor Variables: 1

Variable   NLEV   Actual   Internal   Counts
2  RAD        9       1.          1       20
                      2.          2       24
                      3.          3       38
                      4.          4      110
                      5.          5      115
                      6.          6       26
                      7.          7       17
                      8.          8       24
                     24.          9      132
The basis functions, two of which are categorical basis functions for RAD, are displayed
below. Note that the basis functions representing the categorical predictors are not
graphed.
BF1 = max(0, INDUS - 8.140);

BF3 = ( RAD = 1 OR RAD = 4 OR RAD = 6 OR RAD = 24);
BF5 = ( RAD = 4 OR RAD = 6 OR RAD = 8);
BF7 = max(0, INDUS - 10.810);
BF9 = max(0, INDUS - 6.960);
BF11 = max(0, INDUS - 5.190);
BF13 = max(0, INDUS - 3.970);
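The two kinds of basis function in this listing can be evaluated by hand. The following Python sketch is illustrative only: the helper names hinge and in_levels and the sample record are assumptions made for this example, not part of MARS.

```python
def hinge(x, knot):
    """Upper hinge max(0, x - knot), as in BF1 = max(0, INDUS - 8.140)."""
    return max(0.0, x - knot)


def in_levels(value, levels):
    """Categorical basis function: 1 if value is in the level set, else 0."""
    return 1 if value in levels else 0


# One hypothetical record (values chosen for illustration only).
indus, rad = 10.0, 4

bf1 = hinge(indus, 8.140)            # INDUS lies above this knot
bf3 = in_levels(rad, {1, 4, 6, 24})  # BF3 = (RAD = 1 OR 4 OR 6 OR 24)
bf5 = in_levels(rad, {4, 6, 8})      # BF5 = (RAD = 4 OR 6 OR 8)
bf7 = hinge(indus, 10.810)           # INDUS lies below this knot, so zero

print(bf1, bf3, bf5, bf7)
```

A record can switch on several categorical basis functions at once (here RAD = 4 belongs to both BF3 and BF5), which is the overlap described earlier in this section.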
Classic Forward Knot Placement Report

The Forward Knot Placement Report in the text output displays the basis functions in the
order in which they are entered in the MARS model. As noted above, basis functions are
added until the maximum number of basis functions allowed is reached. For example,
the Forward Stepwise Knot Placement Report for the model, MV = constant + INDUS +
RAD, is displayed below.
Forward Stepwise Knot Placement
===============================

BasFn(s)      GCV  IndBsFns  EfPrms  Variable  Knot       Parent  BsF
       0   84.754       0.0     1.0
  2    1   62.906       2.0     9.7  INDUS     8.140
  4    3   61.002       3.0    17.3  RAD       100101001
  6    5   61.565       4.0    25.0  RAD       000101010
  8    7   62.025       5.0    32.7  INDUS     10.810
 10    9   63.828       6.0    40.3  INDUS     6.960
 12   11   65.379       7.0    48.0  INDUS     5.190
 14   13   66.332       8.0    55.7  INDUS     3.970
      15   67.898       9.0    63.3  RAD       011101000
The constant is always entered into the model first as BF0. The next two basis functions,
BF1 and BF2, are mirror image functions for INDUS with the knot located at 8.140. The
basis function corresponding to the upper portion of the variable is numbered first. Thus,
BF1 is (INDUS-8.140)+ and BF2 is (8.140-INDUS)+.
Next, two dummies for RAD (RAD=1,4,6,24 and RAD=4,6,8) and their complements are
entered as BF3, BF4, BF5 and BF6. MARS continues to add basis functions, adding
BF7-BF15, until the maximum allowed number of basis functions is reached (in this case,
15).
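The 0/1 strings in the Knot column for RAD can be read against the level table shown earlier. The sketch below assumes that position i in the pattern corresponds to internal level i, which is consistent with BF3 and BF5 as printed; the function names are hypothetical.

```python
def decode_levels(pattern, actual_levels):
    """Return the actual level values flagged by a 0/1 pattern string."""
    if len(pattern) != len(actual_levels):
        raise ValueError("pattern length must equal the number of levels")
    return [lvl for bit, lvl in zip(pattern, actual_levels) if bit == "1"]


def complement(pattern):
    """Flip every bit to get the pattern of the complementary dummy."""
    return "".join("1" if bit == "0" else "0" for bit in pattern)


# Actual RAD levels in internal order 1..9, from the categorical predictor table.
RAD_LEVELS = [1, 2, 3, 4, 5, 6, 7, 8, 24]

print(decode_levels("100101001", RAD_LEVELS))              # levels in BF3
print(decode_levels("000101010", RAD_LEVELS))              # levels in BF5
print(decode_levels(complement("100101001"), RAD_LEVELS))  # complement of BF3
```

Under that assumption the first pattern decodes to RAD = 1, 4, 6, 24 and the second to RAD = 4, 6, 8, matching the two dummies named in the text.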
Interpreting MARS Basis Functions

To help you interpret your MARS models, in this section we examine in detail the MARS
basis functions for a stated preference choice experiment conducted in Europe in the
early 1990s. The data set contained primary attributes and demographic information for a
sample of 3,000 individuals interested in cellular phones. The primary attribute variables
included usage charges (USE_FEE, typical usage charge for 100+ minutes) and cost of
the equipment (EQPRICE). Other candidate predictor variables included:
• Sex (GENDER), Income (INCOME), Age (AGE)
• Region of residence (REGION)
• Occupation, Type of Job, Self-employed, etc.
• Length of commute
• Binary indicators for whether the respondent had a cell phone in the past and whether
  the respondent has a cell phone now (CURROWN)
• Typical use for cell phone (business, personal)
• Owns PC, home FAX, portable home phone, etc.
• Average land line phone bill (AVGBILL)
Using conventional logistic regression, the original LOGIT model estimating the response
rate (the model's dependent variable) included two price variables as well as dummy variables
for levels of all categorical predictors. The final log-likelihood for this model was -133864
(df=31).
The MARS model also contains main effects but with an optimal transformation on prices,
as illustrated below. The end result is the addition of one MARS basis function to the
original model to capture the price spline. The log-likelihood for this model is -133801
(df=32; χ²=126 on 1 df).
[Figure: two panels plotting ESTIMATE (vertical axis) against EQPRICE (horizontal axis, 0 to 250)]
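The chi-square value quoted for the added price spline can be checked from the two log-likelihoods: it is twice the gain in log-likelihood. The sketch below is a hand check, not MARS output; the function names are assumptions, and the p-value uses the identity that the upper tail of a chi-square with 1 df equals erfc(sqrt(x/2)).

```python
import math


def lr_statistic(loglik_restricted, loglik_full):
    """Likelihood-ratio statistic: twice the gain in log-likelihood."""
    return 2.0 * (loglik_full - loglik_restricted)


def chi2_sf_1df(x):
    """Upper-tail probability of a chi-square variate with 1 df."""
    return math.erfc(math.sqrt(x / 2.0))


# Log-likelihoods reported in the text for the LOGIT and MARS models.
stat = lr_statistic(-133864.0, -133801.0)
p_value = chi2_sf_1df(stat)

print(stat)       # 126.0
print(p_value)    # vanishingly small, so the spline term is strongly supported
```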
As illustrated below, the simple relationship between the price predictor variable, EQPRICE,
and the dependent variable looks reasonably linear but in fact is not linear when the other
variables are controlled for.
[Figure: ME1DPV2 (vertical axis) plotted against EQPRICE (horizontal axis, 0 to 250)]
The final set of basis functions for the MARS cell choice model, which we now review one
by one, is reproduced below:
BF1 = max(0, AVGBILL - 4.000);

BF2 = max(0, 4.000 - AVGBILL );
BF3 = max(0, USE_FEE - 5.000);
BF4 = max(0, CURROWN - .351896E-09);
BF6 = max(0, 4.000 - AGE );
BF7 = max(0, MOBILE - .520776E-08);
BF9 = max(0, 80.000 - EQPRICE );
BF10 = ( INCOME > .);
BF11 = ( INCOME = .);
BF12 = max(0, INCOME + .110320E-07) * BF10;
BF13 = max(0, REGION - 12.000) * BF11;
BF14 = max(0, 12.000 - REGION ) * BF11;
BF15 = max(0, GENDER + .432800E-07);
The first pair of basis functions, BF1 and BF2, is a mirror image pair for AVGBILL, the land
line phone bill:
BF1 = max(0, AVGBILL - 4.000) and BF2 = max(0, 4.000 - AVGBILL).
As no other AVGBILL basis functions appear in the model, MARS located a single knot at
4.0 for this variable.
Next is an upper basis function, BF3, for USE_FEE (or monthly cost):
BF3 = max(0, USE_FEE - 5.000).
The absence of a mirror image function for USE_FEE implies that there is a zero slope
until the knot at 5.0 and then a downward slope (read from the reported coefficient).
The knot for the next basis function, BF4, is placed at the minimum observed data value
for the variable CURROWN; thus, MARS wants to keep the variable linear. The minimum
observed data value is technically a knot but, practically, it is not.
BF4 = max(0, CURROWN - .351896E-09).
The next set of three basis functions, BF6, BF7, and BF9, is similar to BF4. The next two
functions, BF10 and BF11, are missing value indicator basis functions for INCOME, which
brings us to the next topic, how MARS handles missing values.
Handling of Missing Values

When MARS reads in a data set, each variable is checked for missing values. MARS
automatically creates a missing value indicator for each variable with missing values; for
example, X_mis is created for variable X and is coded as 0 if non-missing and 1 if
missing. If an input data set has 100 variables and each variable has some missing
values, MARS automatically creates one missing value indicator for each of these 100
variables, effectively doubling the number of predictors available to MARS. The automati-
cally created dummy variable can actually be used to indicate both data absent (X_mis=1)
and data present (X_mis=0) and MARS will often use both forms of the dummies in the
same model.
Missing value indicators are essential to MARS modeling for two reasons. First, if a
variable X has missing values, it can only be used in the regression model when X is not
missing. MARS imposes this restriction by using the variable X only when interacted with
the X_mis=0 basis function. The basis function
X * (X_mis=0)
generates the variable X if X is not missing and zero otherwise. Thus, MARS effectively
substitutes 0 for all missing values, a method commonly used in conventional modeling.
Second, the missing value indicators are used to develop surrogate sub-models that
apply only when some needed data are missing. For example, the basis function
Z * (X_mis=1)
generates Z if X is missing and zero otherwise, allowing MARS to use Z in place of X for
cases with missing X values. Note that the coefficient for Z could be quite different from
the coefficients generated for basis functions constructed from X.
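The two constructions above, X * (X_mis=0) and Z * (X_mis=1), can be written out as follows. This is a minimal sketch that assumes missing values are represented by None; the function names are hypothetical and the two records are invented for illustration.

```python
def x_absent(x):
    """Missing value indicator X_mis: 1 when x is missing, 0 otherwise."""
    return 1 if x is None else 0


def x_when_present(x):
    """X * (X_mis=0): x itself when observed, zero when missing."""
    return 0.0 if x is None else x


def z_when_x_missing(z, x):
    """Z * (X_mis=1): z when x is missing, zero otherwise."""
    return z * x_absent(x)


# Two hypothetical records: X observed in the first, missing in the second.
for x, z in [(3.5, 10.0), (None, 7.0)]:
    print(x_absent(x), x_when_present(x), z_when_x_missing(z, x))
```

Because each record activates only one of the two terms, the regression can assign Z a coefficient for the missing-X cases that is unrelated to the coefficients on the X terms.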
In general, if you direct MARS to generate an additive model, no interactions are allowed
between basis functions created from primary variables. MARS does not, however, con-
sider interactions with missing value indicators to be genuine interactions. Thus, an
additive model might contain high-level interactions involving missing-value basis func-
tions such as:
BF = (AGE >.) * (INCOME >.) * max (0, EDUC -12) * (EDUC >.)
This basis function creates an effect for individuals with at least some college who also
have non-missing age and income data. There is no limit on the degree of interaction that
MARS will consider when examining missing value indicators. Also, as shown in the
basis function above, the missing value indicators in interaction basis functions could
indicate data present ( > . ) or data absent ( = . ). Neither is favored by MARS; rather, the
best is entered.
Returning to the cell phone example, the BF10 through BF14 basis functions in the choice
model presented above are reproduced below:
BF10 = (INCOME > .);

BF11 = (INCOME = .);
BF12 = max(0, INCOME + .110320E-07) * BF10;
BF13 = max(0, REGION- 12.000) * BF11;
BF14 = max(0, 12.000 - REGION ) * BF11.
BF10 is the data present indicator coded 1 when non-missing and 0 when missing, whereas
BF11 is the data absent indicator coded 1 when missing and 0 when non-missing. When
income is available, BF12 is positive and BF13 and BF14 are zero. When income is
missing, BF12 is zero and one of the two basis functions for REGION (BF13 or its mirror
image, BF14) is positive while the other is zero. BF13 and BF14 are thus acting as
surrogates for INCOME when it is missing.
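The switching behaviour of BF10 through BF14 can be traced with a few records. In this sketch a missing INCOME is represented by None, and the INCOME and REGION values are invented for illustration (the coding of these variables is not given in the text); the function name is hypothetical.

```python
def cell_basis_functions(income, region):
    """Evaluate BF10-BF14 from the listing; income=None means missing."""
    bf10 = 0 if income is None else 1        # (INCOME > .)  data present
    bf11 = 1 - bf10                          # (INCOME = .)  data absent
    # Guard the hinge so a missing INCOME never enters the arithmetic.
    bf12 = max(0.0, income + 0.110320e-07) if bf10 else 0.0
    bf13 = max(0.0, region - 12.0) * bf11    # REGION terms act only when
    bf14 = max(0.0, 12.0 - region) * bf11    # INCOME is missing
    return bf10, bf11, bf12, bf13, bf14


print(cell_basis_functions(3.0, 15.0))    # income present: only BF12 is active
print(cell_basis_functions(None, 15.0))   # income missing: BF13 is active
print(cell_basis_functions(None, 9.0))    # income missing: BF14 is active
```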
There is no guarantee that a surrogate for all variables with missing values will be found
and kept in the final MARS model; however, MARS will search all possible surrogates in
the basis function generation stage.
Construction of Interaction Terms

Let us now turn from ADDITIVE entry of basis function pairs into a MARS model to entry
of interaction terms. At the user's option, MARS will also test interactions with candidate
basis functions in a three-stage process:
1) identify candidate pair of basis functions,

2) test contribution when added to model as standalone regressors, and
3) test contribution when interacted with basis functions already in the model.
If the candidate pair of basis functions contribute more when interacted with ONE basis
function already in the model, then an interaction is added to the model instead of a main
effect. Let's return to the Boston Housing data set to examine interaction terms in more
detail.
First, let's look at the main effects model using the following set of variables as candidate
predictors: CRIM, INDUS, RM, AGE, DIS, TAX, PT, and LSTAT. The forward basis function
generation begins with:
BasFn(s)      GCV  IndBsFns  EfPrms  Variable     Knot
       0   84.754       0.0     1.0
  2    1   29.201       2.0    13.3  LSTAT       6.070
  4    3   21.137       4.0    25.5  RM          6.431
  6    5   19.570       6.0    37.8  DIS         1.425
  8    7   17.797       8.0    50.1  CRIM       11.160
       9   17.474       9.0    61.4  PT         12.600
From a total of 24 generated basis functions, seven make it into the final model. RM, DIS,
PT, TAX, and CRIM all have just one basis function. Thus, each of these variables has a
region with a slope of 0. Two basis functions are included in the final model for LSTAT, a
standard and a mirror image. The main effects model has a regression R² equal to 0.841.
Let's now rerun the same model but allow MARS to search for interactions. The forward
basis function generation begins with:
BasFn(s)      GCV  IndBsFns  EfPrms  Variable     Knot  Parent  BsF
       0   84.754       0.0     1.0
  2    1   29.649       2.0    17.0  LSTAT       6.070
  4    3   21.810       4.0    33.0  RM          6.431
  6    5   20.518       6.0    49.0  PT         18.600  RM        3
  8    7   18.138       8.0    65.0  DIS         1.425
       9   16.872       9.0    80.0  TAX       187.000  LSTAT     1
 11   10   16.326      11.0    96.0  TAX       296.000  LSTAT     2
 13   12   16.827      13.0   112.0  LSTAT      20.620  RM        4
The first two pairs of basis functions for LSTAT and RM are identical to those in the main
effects progression. The third pair, however, differs: (PT - 18.6)+ and (18.6 - PT)+ are
interacted with (RM - 6.431)+.
The Variable column displays the variable name, the Parent column displays the
previously-entered variable participating in the interaction, and the BsF column
displays the number of the basis function involved in the interaction.
As we've seen, MARS builds up its interactions by combining a SINGLE previously entered
basis function with a PAIR of new basis functions. The new pair of basis functions (a
standard and a mirror image) could be a previously entered pair, a new pair for an existing
model variable, or a pair for a new variable. Interactions are thus built by accretion: one
of the members of the interaction must first appear as a main effect basis function, then
an interaction can be created involving this term. The second member of the interaction
does NOT need to appear as a main effect; in fact, an analyst might wish to require
otherwise via ex post modification of the model.
How a Variable is Entered Linearly

When no transformation is needed, MARS will enter a variable without genuine knots.
The knot selected will be equal to the minimum value of the variable in the data set. With
this type of knot, there is no lower region of the data and only one basis function is
created. We saw an example of such a basis function in the main effects Boston Housing
model:

9 17.474 9.0 61.4 PT 12.600
Only one basis function number is listed because 12.6 is the smallest value of PT in the
data. You will see this pattern for any variable you require MARS to enter linearly, a user
option to prevent MARS from transforming selected variables.
Generally, an interaction term is region specific, such as:
(PT - 18.6)+ * (RM - 6.431)+
This is not a conventional interaction of PT and RM because the interaction is confined to
the data region where RM>6.431 and PT>18.6. In this example, MARS determined
that either there is no RM*PT interaction outside this region or that the interaction is
different.
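The region-specific nature of the term (PT - 18.6)+ * (RM - 6.431)+ is easy to confirm numerically: the product is non-zero only when both hinge factors are positive. The sketch below uses hypothetical PT and RM values and hypothetical function names.

```python
def hinge(x, knot):
    """Upper hinge (x - knot)+, i.e. max(0, x - knot)."""
    return max(0.0, x - knot)


def pt_rm_interaction(pt, rm):
    """The interaction term (PT - 18.6)+ * (RM - 6.431)+."""
    return hinge(pt, 18.6) * hinge(rm, 6.431)


# Four hypothetical records, one in each quadrant around the two knots.
for pt, rm in [(20.0, 7.0), (20.0, 6.0), (17.0, 7.0), (17.0, 6.0)]:
    print(pt, rm, pt_rm_interaction(pt, rm))
```

Only the first record, where both variables exceed their knots, produces a non-zero value; a conventional PT*RM product would be non-zero for all four.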
In the example above, we saw:

9 16.872 9.0 80.0 TAX 187.000 LSTAT 1
11 10 16.326 11.0 96.0 TAX 296.000 LSTAT 2
The variable TAX is entered without transformation and interacted with the upper half of the
initial LSTAT spline, BF1. TAX is then entered again as a pair of basis functions interacted
with the LOWER half of the initial LSTAT spline, BF2.
As noted above, by default, MARS fits an additive model. Variable transformations of any
complexity are allowed but interactions are not allowed (with the exception of missing
value indicators) unless you specify otherwise.
The user can specify an upper limit to the degree of interactions to be considered by
MARS. We recommend that the following series of models should be examined:
additive (main effects only)

2-way interactions
3-way interactions
4-way interactions, etc.
Illustrative results from such a series of models are displayed below:
Model            GCV-R2   #BasisFuns   Naive-R2

Main Effects       .801            7       .841
2-way interact     .835           10       .880
Based on performance and judgment, the best model should then be selected from this
series. We have also experimented with combining the best basis functions from several
MARS runs, for example, combining the best set from a no-interactions model with the
best set from a model allowing two-way interactions. In some situations, selecting the
best subset of regressors from the pooled set of candidates can yield better models. See
Chapter 4, Setting Control Parameters & Refining Models, for further guidance on model
building.
Testing, Validation and Protection Against Overfitting

Like CART, MARS uses a strategy of deliberately overfitting a model and then pruning
away those parts that contribute least to the overall model performance. For this strategy
to work effectively, the model must be allowed to grow to at least twice the size of the
optimal model. In the examples we have discussed thus far, the best model includes
about 12 basis functions. To capture a near-optimal specification in the Boston Housing
and Cell Choice examples, we allowed MARS to construct 25 basis functions.
Once the maximum number of basis functions has been added to the overfit model,
MARS begins pruning using the following deletion procedure:
1) starting with the largest model, MARS determines the ONE basis function which,
using a residual sum of squares criterion, hurts the model the least if dropped;
2) after refitting the now-pruned model, MARS again identifies a basis function to
drop;
3) this process is repeated until all basis functions have been eliminated.
The end result of this deletion procedure is a unique sequence of candidate models. If
there are 25 basis functions, then there are at most 25 candidate models. An alternative
to this process would be to consider all possible subset deletions, but this would be
computationally burdensome and carries a high risk of overfitting.
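The deletion procedure above can be sketched in a few lines. This is a simplified illustration, not the MARS implementation: it refits an ordinary least-squares model from scratch at every step using a small hand-written solver, and the names solve, rss and backward_deletion, as well as the toy data, are assumptions of this example.

```python
def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def rss(columns, y):
    """Residual sum of squares of y regressed on a constant plus columns."""
    cols = [[1.0] * len(y)] + columns
    xtx = [[sum(u * v for u, v in zip(ci, cj)) for cj in cols] for ci in cols]
    xty = [sum(u * v for u, v in zip(ci, y)) for ci in cols]
    beta = solve(xtx, xty)
    fitted = [sum(b * c[i] for b, c in zip(beta, cols)) for i in range(len(y))]
    return sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))


def backward_deletion(basis, y):
    """Drop one basis function at a time, always the least harmful one.

    basis maps a name to its column of values. Returns a list of
    (names_kept, rss) pairs, from the largest model down to constant only.
    """
    kept = dict(basis)
    sequence = [(sorted(kept), rss(list(kept.values()), y))]
    while kept:
        # Refit without each candidate; delete the one whose removal costs least.
        trial = {name: rss([c for k, c in kept.items() if k != name], y)
                 for name in kept}
        drop = min(trial, key=trial.get)
        del kept[drop]
        sequence.append((sorted(kept), trial[drop]))
    return sequence


# Toy data: y depends strongly on h4, moderately on m4, barely on h7.
x = [float(v) for v in range(10)]
basis = {
    "h4": [max(0.0, v - 4.0) for v in x],
    "h7": [max(0.0, v - 7.0) for v in x],
    "m4": [max(0.0, 4.0 - v) for v in x],
}
y = [2.0 * a + 0.01 * b + 0.5 * c
     for a, b, c in zip(basis["h4"], basis["h7"], basis["m4"])]

for names, value in backward_deletion(basis, y):
    print(len(names), names, round(value, 4))
```

The sketch also records the constant-only model at the end of the sequence. Each step removes exactly one term, so the sequence is nested and unique, which is what makes the later choice among candidate models cheap.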
On a naive R² criterion the largest model will always be best. To protect against overfitting,
MARS uses a penalty to adjust the R². The penalty is similar in spirit to AIC (Akaike
Information Criterion), but is determined dynamically from the data.
MARS automatically determines the order in which the terms are dropped. In classical
modeling, of course, there are no restrictions on the order of deletion and the analyst uses
the t-test and F-test to make such judgments.
How many degrees of freedom in a MARS model?

A MARS basis function is not like an ordinary regressor; rather, the basis function is found
via intensive search procedures. Every distinct data value may have been checked as a
possible knot location. An effective degrees of freedom measure is used to take this
exhaustive search into account.
Based on the results of his experiments, Friedman suggests that the degrees of freedom
charged per knot should be between 2 and 5. Our research, however, suggests that this
factor needs to be much higher for data mining applications and moderately higher for
market research problems. A reasonable range for the degrees of freedom is between 10
and 20 for data sets of modest size (e.g., 1,000 records with 30 variables) and between 20
and 200 for data sets typically encountered in data mining (e.g., 20,000 records and 300
variables). See Chapter 4 for further discussion of this issue.
Generalized Cross Validation: GCV Measure of Mean Square Error

The optimal MARS model is the one with the lowest GCV (generalized cross-validation)
measure. The GCV criterion, introduced by spline pioneer Grace Wahba (see Craven and
Wahba, 1979), actually does not involve cross validation:
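$$\mathrm{GCV}(M) = \frac{\frac{1}{N}\sum_{i=1}^{N}\left[y_i - f_M(x_i)\right]^2}{\left[1 - C(M)/N\right]^2}$$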
C(M) is the cost-complexity measure of a model containing M basis functions. Note that
C(M)=M is the usual measure used in linear regression. The MSE is calculated by
dividing the sum of squared errors by N-M instead of by N.
The GCV formula allows C(M) > M; in other words, it allows each basis function to be charged
with more than one degree of freedom.
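As a minimal sketch, assuming the standard Craven and Wahba form in which the average squared error is divided by (1 - C(M)/N) squared, the GCV measure can be computed as follows. The function name and the numbers in the example are hypothetical and are not taken from any MARS run.

```python
def gcv(sse, n, c_m):
    """GCV = (sse / n) / (1 - c_m / n) ** 2.

    sse  sum of squared errors of the model
    n    number of observations
    c_m  cost-complexity measure C(M), the effective number of parameters
    """
    if not 0 <= c_m < n:
        raise ValueError("C(M) must be non-negative and smaller than n")
    return (sse / n) / (1.0 - c_m / n) ** 2


# Same fit quality, increasing complexity charge (hypothetical numbers).
for cost in (0, 10, 25, 50):
    print(cost, gcv(1000.0, 100, cost))
```

With the error held fixed, a larger C(M) always raises the GCV, which is how charging more degrees of freedom per basis function penalizes larger models.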
Impact of the DF Charged Per Basis Function

The degrees of freedom charged per basis function (or knot) does not affect the forward
stepping of the MARS procedure; regardless of the DF setting, the basis functions and
the knot locations will be identical.
The impact of the DF setting is on the final model selected and in performance measures
such as GCV. The higher the DF setting, the smaller the final model and, conversely, the
smaller the DF setting, the larger the model.
Let's reexamine the Boston Housing main effect model.
BasFn(s)      GCV  IndBsFns  EfPrms  Variable     Knot
       0   84.754       0.0     1.0
  2    1   28.021       2.0     3.0  LSTAT       6.070
  4    3   19.440       4.0     5.0  RM          6.431
  6    5   17.229       6.0     7.0  DIS         1.425
  8    7   14.977       8.0     9.0  CRIM       11.160
       9   14.044       9.0    10.0  PT         12.600
 11   10   13.418      11.0    12.0  TAX       337.000
 13   12   13.115      13.0    14.0  INDUS      25.650
 15   14   12.923      14.0    15.1  RM          8.040
 17   16   12.816      15.0    16.1  TAX       224.000
 19   18   12.681      16.0    17.1  CRIM        0.382
 21   20   12.569      18.0    19.1  AGE        98.800
 23   22   12.462      19.0    20.1  CRIM        0.825
 25   24   12.353      20.0    21.1  AGE        97.300
When the DF is set to 1, some basis functions will be dropped. In this case, basis
functions numbered 4, 15, 17, 19, and 23 are deleted. BF4 and BF15 are dropped because
the slope is truly 0 for RM<=6.431. BF17 is dropped because a mirror image TAX basis
function, BF11, is included in the model. Finally, BF19 and BF23 are dropped because
they are redundant: the CRIM mirror image is already included.
How Final Model Size Varies With DF

With a high enough DF setting, a null model is selected. (Note that this is just like CART:
with a high enough penalty on child nodes, the maximal tree will be pruned all the way
back to the root node.) By judiciously choosing the DF, you can get almost any size
model but the model will be one of the models from the sequence determined in the
deletion stage.
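The effect of the DF setting on the selected model can be illustrated with a made-up deletion sequence. Everything in this sketch is hypothetical: the (size, SSE) pairs, the sample size, and in particular the charging rule C = 1 + size + df * size, which is assumed here only to make the example concrete and is not claimed to be the rule MARS applies.

```python
def gcv(sse, n, cost):
    """GCV with an effective parameter count of cost."""
    return (sse / n) / (1.0 - cost / n) ** 2


def select_size(sequence, n, df):
    """Return the model size in the deletion sequence with the lowest GCV."""
    best_size, best_score = None, float("inf")
    for size, sse in sequence:
        cost = 1.0 + size + df * size   # assumed charging rule, see lead-in
        if cost >= n:
            continue                    # too costly to score at this sample size
        score = gcv(sse, n, cost)
        if score < best_score:
            best_size, best_score = size, score
    return best_size


# Hypothetical deletion sequence: (number of basis functions, SSE).
SEQUENCE = [(0, 1000.0), (1, 600.0), (2, 450.0), (3, 400.0), (4, 390.0), (5, 388.0)]
N = 200

for df in (0, 5, 30, 100):
    print(df, select_size(SEQUENCE, N, df))
```

The forward and deletion stages are untouched by df; only the point chosen along the fixed sequence moves, from a large model at low df down to the null model at a high enough df.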
The table below reports the size of the final model selected by MARS for the Boston
Housing data set when all variables are allowed into a main effects model:
DF setting   Number BFs   Variables (8 is max)

       500            0            0
       400            1            1
       100            2            2
        50            4            3
        25            6            5
        15            7            6
         5           16            8
         1           19            8
The df charged per knot is clearly vital, so how should you decide on the optimal setting?
You can specify the setting manually or use one of the two automated testing methods:
1) random selection of a portion of the data for testing, and

2) genuine cross validation (with a default set to 10-fold).
Manually setting df is reasonable at two junctures: at the beginning of an analysis or
during exploratory work. When using automated approaches, the subset reserved for
training is used to generate the basis functions and estimate the model. Then, the test or
cross-validated data are used to determine which model is best. Note that these three
different approaches are likely to yield different sized models. See Chapter 4 for additional
discussion.
Chapter 4: Setting Control Parameters & Refining Models
This chapter provides practical advice on setting MARS control parameters and guidance
on how to refine your MARS models.
MARS models can be shaped and refined using the following techniques, each of which
can influence the final model:
• changing the number of basis functions generated in the forward stage
• forcing variables into the model
• forbidding transformation of selected variables
• placing a penalty on the number of distinct variables (in addition to the number of
  basis functions)
• specifying a minimum distance (or minimum span) between knots
• allowing select interactions only
• modifying MARS search intensity
• manually selecting a model other than the optimal model from the selector
Maximum Number of Basis Functions to Allow

In the first stage of model development, MARS constructs an overly large model by
progressively adding new basis functions or splines (new main effects or interaction terms).
How far MARS carries this first stage is user-specified via selection of the maximum
number of basis functions. The default setting is a maximum of 15 basis functions, a
rather low limit. You will almost always want to set this limit much higher.
A rule of thumb is that this maximum should be at least two to four times the size of the
truth; thus, if previous experience suggests that a robust model has approximately 35
predictors, the maximum number of basis functions should be set to at least 70 and more
likely 100.
The larger this maximum is set, the longer a MARS run will take. MARS will attempt to
create as many basis functions as are allowed even if there is no sensible way to create
that many. You should specify the maximum number judiciously, as MARS will take your
limit literally.
The limit should be reassessed when you increase the maximum number of allowable
interactions. A main effects model can only search one variable at a time so the number
of possible basis functions is limited by the number of distinct data values; however, a
two-way interaction model has many more possible functions. To ensure that both
interactions and main effects are properly searched, the maximum number might need to
be increased.
Maximum Basis Functions in Data Mining Applications

The number of basis functions needed in an optimal model will depend on how fast the
function is changing slope and how much the function slope changes over its entire range.
The faster the change, the more knots required to track it.
In data mining problems, complex interactions must be permitted; thus, it is reasonable
to allow literally hundreds of basis functions in a forward search. A very quick way to get
a ballpark estimate is to first run a CART model (given that CART will run faster than MARS)
and then allow twice as many basis functions as terminal nodes in the optimal CART tree.
In our data mining experience we have rarely allowed for more than 250 basis functions.
Forcing Variables into the Model

At present, there is no simple way to force variables into a MARS model. An indirect way,
however, can be used to force a variable into a MARS model linearly: regress the target
variable on the variable you wish to include and then use the residuals of the regression
equation as the new target variable. That is, run a linear regression:
y = constant + bZ
and save the residuals, e. Then use e as the target variable in your MARS model and
specify all other variables, including Z, as candidate predictors. Note that Z needs to be
included as a legal 2nd stage regressor to capture non-linearity. Subsequent releases of
MARS will allow direct forcing of user-specified variables.
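The residual step of this workaround is an ordinary one-variable regression done outside MARS. A minimal sketch, with a hypothetical function name and made-up data:

```python
def linear_residuals(y, z):
    """Residuals e from the least-squares fit y = constant + b*z."""
    n = len(y)
    z_bar = sum(z) / n
    y_bar = sum(y) / n
    szz = sum((zi - z_bar) ** 2 for zi in z)
    if szz == 0:
        raise ValueError("z must vary to estimate a slope")
    b = sum((zi - z_bar) * (yi - y_bar) for zi, yi in zip(z, y)) / szz
    a = y_bar - b * z_bar
    return [yi - (a + b * zi) for yi, zi in zip(y, z)]


# Hypothetical target y and forced variable z.
z = [1.0, 2.0, 3.0, 4.0]
y = [1.0, 2.0, 2.0, 5.0]

e = linear_residuals(y, z)
print(e)  # use e as the target in the subsequent MARS run
```

The residuals are uncorrelated with Z by construction, so any Z basis function that MARS later selects reflects non-linearity in Z rather than its linear effect.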
Forbidding Transformations of Selected Variables

Forbidding transformations of selected variables is equivalent to forbidding knots. If a
variable enters at all, it will have a pseudo-knot at the minimum value of the variable in the
training data. There is no guarantee, of course, that the variable will be kept in the final
model after the backwards deletion stage.
Reasons to forbid transformations of certain variables include:
1) a priori judgment on the part of the modeler, and

2) variable is a score or predicted value from another model and needs to stay linear
for interpretability.
If transformations are forbidden on all variables, MARS will produce a variation
of a stepwise regression. This model can be used as a baseline from which
to measure the benefits of transformations.
Penalty on Added Variables

The penalty on added variables causes MARS to favor reusing variables already in the
model over adding new variables. As the penalty is increased, MARS tends to create new
knots in existing variables or generate interaction terms involving existing variables.
The penalty was originally introduced to deal with multicollinearity. Suppose, for example,
that X1, X2, and X3 are all highly correlated. If X1 is entered into the model first and there
is a penalty on added variables, MARS will lean towards using X1 exclusively instead of
some combination of X1, X2, and X3. If the correlation between the variables is quite high,
there will be little lost in fit as a result. The penalty can also be used to encourage
parsimonious models containing few variables, though they might contain many basis
functions.
Minimum Number of Observations Between Knots (Minimum Span)

By default, MARS allows a knot (i.e., a new hockey-stick basis function) to be generated
at every observed data value; this default allows the MARS regression to change slope or
direction anywhere and as often as the data dictate. To the extent that these knots are
redundant, they will be deleted in the backwards stage.
Consider, for example, a simple model with one predictor variable LSTAT. MARS might
generate a function like that shown below.
[Figure: MV (vertical axis, 0 to 60) plotted against LSTAT (horizontal axis, 0 to 40)]
The function traces out a very jagged relationship between LSTAT and MV with many sign
changes. If MARS selects such a model, it will have strong support in the data.
Nevertheless, this degree of flexibility may be undesirable in many applications and a
smoother albeit less locally-accurate model might be preferred.
An effective way to restrain knot placement, i.e., to make MARS less locally adaptive, is
to specify a moderately large minimum span or, equivalently, a minimum number of
observations between knots. If, for example, the minimum number is set to 100, there
must be at least 100 observations (as opposed to data values) between knots.
For data mining applications, settings as high as several hundred or more may be appropriate
to restrain the adaptiveness of MARS. Even if true wobbles exist in the data, a high
setting may also be useful as a simplifying constraint.
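One way to picture the minimum span constraint is as a filter on candidate knot locations. The sketch below is an assumption about how such a filter could work, not a description of the MARS internals: it walks the sorted observations and accepts a value as a candidate knot only when at least min_span observations lie between it and the previously accepted knot (or the start of the data). The function name is hypothetical.

```python
def candidate_knots(values, min_span):
    """Candidate knots with at least min_span observations between them."""
    if min_span < 0:
        raise ValueError("min_span must be non-negative")
    ordered = sorted(values)
    knots = []
    last = -1  # sorted position of the last accepted knot (-1 = before the data)
    for i, v in enumerate(ordered):
        if i - last - 1 < min_span:
            continue          # too few observations since the last knot
        if knots and v == knots[-1]:
            continue          # same data value, not a new knot location
        knots.append(v)
        last = i
    return knots


print(candidate_knots(range(10), 0))        # every distinct value is eligible
print(candidate_knots(range(10), 2))        # knots thinned out
print(candidate_knots([3, 1, 2, 2, 1], 0))  # ties count as observations, not values
```

Note that the spacing is counted in observations, so a heavily tied variable can satisfy the span with very few distinct values between knots.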
Allowing or Disallowing Specific Interactions

MARS allows both global control over the maximum degree of any interaction and local
control over any specific pairwise interaction. The global control in the Options and Limits
dialog can be used, for example, to allow 2-way or 3-way interactions. (The default
setting is no interaction.)
If you simply permit MARS to introduce interactions into a model, MARS will consider all
interactions between basis functions constructed from any variables. Depending on your
type of problem, you may wish to forbid some specific interactions. MARS allows you to
either exclude a variable from any interaction or to specify in detail precisely which
interactions are allowed and which are forbidden.
The Interactions dialog (in the Model Setup command center) provides a matrix with all
candidate predictor variables appearing in both row and column headers. Any cell in this
matrix can be set to disallow a local interaction; for example, an interaction between
INDUS and RAD may be disallowed in any context (2-way, 3-way, etc.).
Specific variables can be easily excluded from all interactions by checking the variable
row in the Non-interacting column of the Variables dialog. Thus, for example, INDUS
could be prohibited from interacting with any other variable but interactions involving other
variables in the model be allowed.
Developing a robust MARS model is a sequential process not unlike developing a parametric
regression model. The difference is that MARS does much of the work for you. We
recommend you start by not allowing any interactions; this default will develop a main
effects model that will be the most easily understood. For such a model, MARS will
search for an optimal transform for each variable via its decomposition into basis functions,
and the final model will be a sum of coefficients multiplying single basis functions.
Examine the main-effects model: what story does it tell? Is it plausible? How does the
model fit the data? As there is no guarantee that a main-effects model will be adequate for
predictive accuracy or a faithful representation of the true data generation process, it is
necessary to then experiment with higher-order interactions. Allow two- and three-way
interactions to see if the model fit can be substantially improved.
To investigate the possibility of needing even higher-order interactions, we recommend
you first run CART. CART models are interaction-only and can contain very high-order
interactions. If the CART model performs well, consider using one of the following two
strategies:
1) Allow MARS to grow a model with the interaction level set to the depth of a
satisfactory CART tree; that is, if the CART tree (after reasonable pruning)
reaches a depth of six for a substantial number of cases, then let MARS
search for up to 6-way interactions
2) Develop a hybrid CART-MARS model in which the CART terminal nodes (which
capture the complex interactions) are entered as eligible predictor variables in a
main-effects MARS model.
The CART-MARS hybrid model is discussed in detail in our white paper on the topic
available on our web site as a set of PowerPoint slides.
MARS Search Intensity

A brute force implementation of the MARS search procedure requires running times
proportional to:
$pN^2M^4$
where p is equal to the number of variables, N is equal to the sample size, and M is equal
to the maximum number of allowed basis functions. Intelligent programming reduces the
$M^4$ to $M^3$, but this is still a very heavy computational burden.
To reduce compute times further, MARS allows intelligent search strategies that reduce
the running time to a multiple of $M^2$. Once the model has grown to a reasonable size,
speed is gained by not testing every possible knot in every variable. Potential knots that
yielded very low improvements on the last iteration are not reevaluated for several cycles.
(Model performance is not likely to change quickly, especially when the model is already
large.)
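The practical meaning of these exponents is easiest to see as a ratio. The sketch below assumes the running time keeps the form p times N squared times a power of M and ignores constant factors; the function name and the problem sizes are hypothetical.

```python
def relative_cost(p, n, m, m_power=4):
    """Running-time proxy p * n**2 * m**m_power (constant factors ignored)."""
    return p * n ** 2 * m ** m_power


# Effect of doubling the maximum number of basis functions from 25 to 50.
for power in (4, 3, 2):
    ratio = relative_cost(10, 1000, 50, power) / relative_cost(10, 1000, 25, power)
    print(power, ratio)
```

Doubling M multiplies the proxy by 16 under the brute force search, by 8 after the first reduction, and by 4 under the fast search strategies.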
Search Intensity or Speed Parameter

The speed parameter (located in the Options and Limits dialog) is set by default to 4, but
can be lowered to 1, 2, or 3 or increased to 5. A speed setting of 1 does almost no
optimization, with exhaustive searches conducted before every basis function selection.
On the other extreme, a speed setting of 5 results in a quick and dirty estimation. The
focus of this fast search is narrowed to the best-performing basis functions in previous
iterations.
The results CAN DIFFER if the speed setting is decreased but should be relatively similar.
Given a choice between using a smaller data set and lower speed setting (higher search
intensity) or larger data set and higher speed setting (lower search intensity), we recommend
the latter. The gain from using more training data typically outweighs the potential loss
from a less-thorough search.
Our experience to date also suggests caution when using the highest speed setting. It is
definitely worthwhile to check near final models with lower speed settings to ensure that
nothing of importance has been overlooked.
A speed setting of 1 forces MARS to test every possible knot at every forward step. While
this ensures that MARS will miss nothing in its model-building phase, it is also extremely
slow on large databases. For real-world problems we recommend the speed setting of 4;
it offers judicious search, high speed, and additional protection from overfitting. However,
for small, and especially for artificial, data sets, if you wish to ensure that MARS does find
the literally true model, a speed setting of 1 is called for.
Chapter 5: Reading & Interpreting Model Results

This chapter is a guide to reading and interpreting MARS results. To help you understand
the model, MARS produces simple graphs displaying the relationship between each
important variable and the target. These 2-D and 3-D graphs are available for every model
and sub-model you care to examine. The technical basis of a MARS model is a regression
model built on new variables that MARS creates from the original predictors. This
regression model can be automatically applied to new data from within MARS itself or
exported as C-, SAS- or XML/PMML-compatible code.
Basis Function and Model Code

From a production and scoring perspective, the final MARS output is a formula for scoring
a database. The formula looks like:
y = B0 + B1*BF1 + B2*BF2 + ... + Bk*BFk
and can be found in this form in the Basis Function tab on the model summary. The
programming code for creating the basis functions and generating the fitted values can be
cut and pasted into commonly-used statistical packages and database management
tools.
Note that the final model equation, illustrated below for the Boston Housing model reviewed
earlier, specifies which basis functions are used directly in the model. Some basis functions
are used only to create other functions but do not enter the model directly. For example,
BF2 enters only indirectly in the construction of BF7 and BF8.
BF1 = max(0, LSTAT - 6.070);
BF2 = max(0, 6.070 - LSTAT );
BF3 = max(0, RM - 6.431);
BF5 = max(0, NOX - 0.647) * BF3;
BF7 = max(0, TAX - 296.000) * BF2;
BF8 = max(0, 296.000 - TAX ) * BF2;
BF10 = max(0, 1.425 - DIS );
Y = 23.914 - 0.613 * BF1 + 11.594 * BF3 - 224.875 * BF5
    + 0.018 * BF7 + 0.034 * BF8 + 41.816 * BF10;
MODEL MV = BF1 BF3 BF5 BF7 BF8 BF10.
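As an illustration of how this code can be carried into another environment, the sketch below transcribes the Boston Housing basis functions and model equation into Python. The knots and coefficients are copied from the listing above; the function name score_boston and the sample record are hypothetical.

```python
def score_boston(lstat, rm, nox, tax, dis):
    """Fitted MV for one record, using the basis functions listed above."""
    bf1 = max(0.0, lstat - 6.070)
    bf2 = max(0.0, 6.070 - lstat)      # used only to build BF7 and BF8
    bf3 = max(0.0, rm - 6.431)
    bf5 = max(0.0, nox - 0.647) * bf3
    bf7 = max(0.0, tax - 296.000) * bf2
    bf8 = max(0.0, 296.000 - tax) * bf2
    bf10 = max(0.0, 1.425 - dis)
    return (23.914 - 0.613 * bf1 + 11.594 * bf3 - 224.875 * bf5
            + 0.018 * bf7 + 0.034 * bf8 + 41.816 * bf10)


# Hypothetical record: only BF1 is nonzero here (LSTAT is 4 units above its knot).
print(round(score_boston(lstat=10.07, rm=6.0, nox=0.5, tax=296.0, dis=2.0), 3))
```

Note that BF2 never appears in the final sum; it matters only through the interaction terms BF7 and BF8.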
The challenging part of MARS work is determining what needs to go into the model
components. Once defined, the formula can simply be applied to new data. The final
model equation, however, may not lend itself to easy interpretation. An examination of
the ANOVA and variable importance tables as well as the curve and surface plots will
assist you in understanding the final MARS model.
After the MARS model is estimated, a Text Report appears in the MARS Report window
and the Results dialog opens. The Results dialog displays a Model Summary Table,
Anova Decomposition Table, Variable Importance Table, Final Model Table, Basis Functions,
Gains and Lift charts, and the 2-D and 3-D curve and surface plots.
While the Text Report will primarily be of use to experts, the history of the forward stepping
may be of use to anyone curious about the model generation process. To access the Text
Report, close or minimize the Results dialog box. A hyper-linked outline of the report
contents is displayed in the left panel. Click on Learn Sample Stats, Forward Knot
Placement, Final Model, ANOVA Decomposition, Variable Importance, Basis Functions
or OLS Predicted Results to view that section of the output in the right panel. Alternatively,
use the vertical scroll bars to browse the output.
You may also want to save a copy of the output as a permanent record of
your analysis. To save the report, select Save Mars Report from the File
menu. In the Save Report to File dialog specify a file name and directory.
By default, the text file will be saved with a .dat extension. You can also
copy and paste sections of the report into another application or to the
clipboard.
Model Summary
The GUI output contains the core summary information you will want to examine. As
shown below for the Boston Housing example, the Model Summary Table gives an overview
of the variables, terms and parameters in the final MARS model and reports both R-
square and mean-square measures.
The Model Summary includes:
Target variable        Name, min, max, mean and variance of the target
                       variable
Direct variables       Number of variables used to construct basis functions,
                       not counting missing value indicators
Total variables used   Total number of variables used to construct basis
                       functions
Terms in the model     Number of coefficients in the MARS model, excluding
                       the intercept
Effective parameters   Number of effective parameters (based on number of
                       terms in the model, number of knots, and degrees of
                       freedom charged per knot)
Naive                  R-squared value for regression equation using final
                       MARS model
Naive-Adjusted         Adjusted R-squared for naive regression model
GCV R-Square           1 - Final-GCV / Initial-GCV
Naive MSE              Mean-squared error from regression equation above
MARS GCV               Penalized mean-squared error
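The two GCV entries can be illustrated with a short Python sketch. The GCV R-Square entry is read here as 1 - Final-GCV / Initial-GCV. The penalized mean-squared error uses the form commonly given in the MARS literature, MSE / (1 - C/N)^2, where C is the number of effective parameters and N the number of records; that MARS computes its reported GCV in exactly this way is an assumption, and the function names and numbers are hypothetical.

```python
def gcv(mse, n_obs, effective_params):
    """Penalized mean-squared error: MSE inflated by the effective parameter count.

    Assumes the commonly published form MSE / (1 - C/N)^2.
    """
    return mse / (1.0 - effective_params / n_obs) ** 2


def gcv_r_square(final_gcv, initial_gcv):
    """GCV R-Square: 1 - Final-GCV / Initial-GCV."""
    return 1.0 - final_gcv / initial_gcv


# Hypothetical values: a constant-only model versus a fitted model.
initial = gcv(mse=84.0, n_obs=506, effective_params=1)
final = gcv(mse=12.0, n_obs=506, effective_params=25)
print(round(gcv_r_square(final, initial), 3))
```

Because the penalty grows with the number of effective parameters, GCV R-Square is never higher than the naive R-squared computed from the same errors.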
The Text Report also includes learning sample summary statistics: mean, standard
deviation, number of observations with valid values, and sum for continuous variables and
min, max, and 25th, 50th and 75th percentiles for all variables. For a detailed explanation of the
Forward Stepwise Knot Placement Report, see Chapter 3.
ANOVA Table
Because MARS models are made up of pieces of variables or basis functions, the
conventional regression output can be challenging to interpret. Each subinterval of the
piecewise model has its own coefficient, and the larger picture may not be evident from
such a report. To provide another perspective on the model, MARS produces the ANOVA
Table in both the Text Report and in one of the Model Results dialogs.
The ANOVA Table summarizes the model and streamlines the output by aggregating the
basis functions into groups involving the same raw variables. For example, if three basis
functions were generated for AGE, they would be combined together into a single main
AGE effect and the contribution of the group would be reported. Similarly, if two basis
functions involving the interaction of AGE and EDUC were also present in the model,
another entry in the ANOVA table would report the combined contribution of these two
basis functions.
MARS constructs the ANOVA table by first rearranging the final MARS model so that all
basis functions involving any ONE variable are grouped together. In some cases, only one
basis function is used. This means that the variable enters with a possible upper or lower
threshold.
When more than one basis function is involved, a nonlinear transformation involving a
change in slope has been found. This group of functions represents a non-parametric
approximation to the best transformation MARS can find for that variable. Although there
is no theoretical upper-limit on the number of basis functions that might be needed to
properly represent the transformation of a variable, most applications (with the exception
of those involving time series data) will require relatively few basis functions.
The ANOVA table then continues with entries for all basis functions involving the same
PAIR of variables. These represent the optimal transform corresponding to the interaction
of those two variables. A larger number of basis functions is usually required for such
aggregates. Again, there will be one such collection for every distinct pair of variables
appearing in two-way interactions. Thus, if AGE and EDUC interact and AGE and INCOME
interact, each pair will appear in the ANOVA table as a separate entry (or row).
As illustrated below for the Boston Housing example, the first two columns in the ANOVA
table list the basis function collection number and the standard deviation of the collection.
The larger the standard deviation, the greater the contribution to the overall explanatory
power of the model.
The third column (labeled GCV in the text report and Cost of Omission in the GUI table)
displays the contribution of the collection of basis functions as measured by the resulting
loss of fit if that entire collection were to be deleted from the model. The next two columns
list the number of basis functions aggregated into the collection and the number of effective
parameters (or, similarly, the total degrees of freedom charged for the collection). The
final column lists the variable names involved in that entry, one name for main effects, two
for bivariate interactions, three for three-way interactions, etc.
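The grouping step behind the ANOVA table can be sketched in a few lines of Python. The example below uses the basis functions of the Boston Housing model shown earlier in this chapter and collects them by the set of raw variables each one involves. It covers only the grouping and the count of basis functions per collection, not the standard deviation or GCV columns, and the function and variable names are hypothetical.

```python
from collections import defaultdict

# (basis function, raw variables it involves), from the Boston Housing model.
# BF5 = NOX hinge * BF3 (RM); BF7 and BF8 = TAX hinges * BF2 (LSTAT).
terms = [
    ("BF1", {"LSTAT"}),
    ("BF3", {"RM"}),
    ("BF5", {"NOX", "RM"}),
    ("BF7", {"TAX", "LSTAT"}),
    ("BF8", {"TAX", "LSTAT"}),
    ("BF10", {"DIS"}),
]


def group_by_variables(terms):
    """Collect basis functions that involve exactly the same raw variables."""
    groups = defaultdict(list)
    for name, variables in terms:
        groups[tuple(sorted(variables))].append(name)
    # Main effects first, then two-way interactions, and so on.
    return sorted(groups.items(), key=lambda item: (len(item[0]), item[0]))


for number, (variables, names) in enumerate(group_by_variables(terms), start=1):
    print(number, len(names), ", ".join(variables), names)
```

In this model BF7 and BF8 fall into a single LSTAT-TAX collection, so the table would have five rows for six basis functions.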
Variable Importance Table

To calculate variable importance scores, MARS refits the model after dropping all terms
involving the variable in question and calculating the reduction in goodness of fit. The
least important variable is the one with the smallest impact on the model quality; similarly,
the most important variable is the one that, when omitted, degrades the model fit the
most. The model variables are ranked from most to least important and displayed in the
variable importance table (shown in both the Text Report and the Variable Importance
Dialog). An illustrative table for the Boston Housing example is displayed below.
Final Model Summary

The final pruned model is summarized in the Final Model dialog and in the Final Model
section of the Text Report; the summary includes:
Basis Function   Assigned basis function number
Coefficient      Regression coefficient for basis function term
Variable         Variable used to generate basis function
Parent           Variable interacted with (if BF is interaction term)
Knot             Value at which function changes direction
Basis Functions
These are the transformed predictors used by MARS. Most basis functions represent a
region of one of the raw variables and there will be as many basis functions for a variable
as there are distinct regions needed in the model. Some basis functions will be missing
value indicators and others will involve interactions between two or more variable regions.
You can easily export the basis functions to a text file by selecting File|Export Rules...
menu, or you can click your right mouse button and select Export.
The exported basis function code is directly compatible with many statistical package
programming languages, as well as with database management tools for generating fitted
values and for forecasting. Contact Salford Systems for products to export to XML/
PMML.
Gains Chart
Once you have generated or selected a MARS model the summary reports include Gains,
Lift, and Cumulative Lift Graphs and tables. The training data are scored using the current
MARS model and provide a predicted value for every record. The data are then sorted in
descending order by this score and divided into 10 equal-sized groups (deciles). The Gains table reports
the results, displaying the average predicted and actual target values within each decile.
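The construction of the Gains table can be sketched in Python as follows. This is a minimal illustration of the steps just described (score, sort in descending order, split into ten groups, average within each group); the function name and the way records are split when the count is not a multiple of ten are assumptions.

```python
def gains_table(scores, actuals, n_bins=10):
    """Return (bin, n_records, avg_predicted, avg_actual) rows, best scores first."""
    ranked = sorted(zip(scores, actuals), key=lambda pair: pair[0], reverse=True)
    n = len(ranked)
    rows = []
    for b in range(n_bins):
        # Integer arithmetic keeps the bins as equal in size as possible.
        chunk = ranked[b * n // n_bins:(b + 1) * n // n_bins]
        if not chunk:
            continue
        avg_predicted = sum(score for score, _ in chunk) / len(chunk)
        avg_actual = sum(actual for _, actual in chunk) / len(chunk)
        rows.append((b + 1, len(chunk), avg_predicted, avg_actual))
    return rows


# Hypothetical scored data: 20 records with a 0/1 target.
example_scores = [i / 20 for i in range(20)]
example_actuals = [1 if i >= 10 else 0 for i in range(20)]
for row in gains_table(example_scores, example_actuals):
    print(row)
```

A model that ranks well shows average actual values that fall steadily from the first group to the last.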
If you use the selector to choose a different model from the backwards deletion sequence,
the summary reports will all update automatically. You can display summary reports for
any number of sub-models at the same time for easy comparisons.
The three buttons, [Gains], [Lift], and [Cum. Lift], toggle the graph among
the three views.
Binary Gains Chart

If you specify the dependent variable as Binary, the Gains Chart results tab changes
slightly with the addition of the [Continuous] and [Binary] buttons. These give you the
ability to toggle between treating the target variable as binary, or as continuous. If you
click the [Binary] button, MARS will report on the assumption that the target is either 0 or
1 or a probability between 0 and 1. (See the Variables Dialog section in Chapter 6 for additional
information.)
The two buttons, [Continuous] and [Binary], toggle the graph between the
two views.
Remember that the binary option is a reporting feature; it does not change
the way in which MARS generates its models. Theoretically, it should be
possible to get more accurate models by running a logistic regression
version of MARS. Such a version is planned for future release.
Prediction Success Table

When you specify the dependent variable as Binary, the Prediction Success tab (also
known as the confusion matrix) is added to the Model Results dialog. This tab displays
a summary report showing how MARS is making mistakes (if any).
The table displays the following:
ACTUAL CLASS      Class level. Using a default threshold of .5, MARS will
                  consider all target values (actual or predicted) less than .5
                  to correspond to non-response (Class 0) and values greater
                  than .5 to correspond to the response (Class 1).
PREDICTED 0       Number of cases classified as non-response (Class 0) by
                  Actual Class.
PREDICTED 1       Number of cases classified as response (Class 1) by Actual
                  Class.
TOTAL CASES       Total number of cases in the class.
PERCENT CORRECT   Percent of cases for the class that were classified correctly.
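A prediction success table of this shape can be reproduced outside MARS with a short Python sketch. The function name and output layout are hypothetical. The text does not say how a value exactly equal to the threshold is treated, so assigning it to Class 0 here is an assumption.

```python
def prediction_success(actuals, scores, threshold=0.5):
    """Cross-classify actual versus predicted class at the given threshold."""
    counts = {0: {0: 0, 1: 0}, 1: {0: 0, 1: 0}}
    for actual, score in zip(actuals, scores):
        # Values above the threshold are Class 1; everything else is Class 0
        # (treatment of exact ties is an assumption).
        actual_class = 1 if actual > threshold else 0
        predicted_class = 1 if score > threshold else 0
        counts[actual_class][predicted_class] += 1

    rows = []
    for cls in (0, 1):
        total = counts[cls][0] + counts[cls][1]
        percent = 100.0 * counts[cls][cls] / total if total else 0.0
        rows.append({"actual_class": cls,
                     "predicted_0": counts[cls][0],
                     "predicted_1": counts[cls][1],
                     "total_cases": total,
                     "percent_correct": percent})
    return rows


# Hypothetical data: five records with 0/1 targets and MARS-style scores.
for row in prediction_success([0, 0, 0, 1, 1], [0.1, 0.6, 0.3, 0.8, 0.4]):
    print(row)
```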
Binary Threshold Table

At the time of model setup, if you specify the dependent variable as Binary, and request a
Threshold Table, both the Gain Chart and the Prediction Success tabs include a binary
Threshold slide control as seen here.
Like any regression model MARS predicts a continuous score. For the binary target
variable the score will be like a probability, but some scores (for the least likely outcomes)
may be negative, and others (for the most likely outcomes) may be greater than 1. To
convert a MARS model score to a class assignment we have to decide how high the
MARS score needs to be to count as a 1 (i.e., yes or response). We accomplish this
conversion by selecting a threshold. For example, we might specify that a score greater
than 0.5 counts as a response and anything else as a non-response; this is the default in
MARS. Frequently, however, a different threshold would be beneficial. For example, a low
threshold such as .3 would end up classifying many more records as a response than
would the 0.5 threshold. For some low-cost marketing campaigns, approaching pros-
pects with a modest chance of response might still be quite profitable. A low threshold
might also be used if the costs of missing a potential response far outweigh the costs of
misclassifying a non-responder as a responder. While changing the threshold from .5
might increase the absolute number of mistakes made, great benefits might be obtained
from using a different threshold.
Even if the costs of misclassification are not very different between responders and non-
responders, you may still want to tune the balance between the errors in each class. The
threshold table is designed to help you decide on a threshold. For each of the 100
thresholds between 0.00 and 1.00 the table displays the breakdown of the training data
among true negatives, false negatives, false positives, and true positives. Studying the table
can help you identify the best cutoff for overall classification accuracy and help you trade off
one type of error for the other.
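The idea of the threshold table can be sketched in Python by recomputing the four cells at each candidate cutoff. The grid of thresholds from .01 to .99, the 0/1 coding of the actual values, and the function names are assumptions made for illustration; the table MARS produces may differ in layout and in the exact set of thresholds.

```python
def threshold_table(actuals, scores, thresholds=None):
    """One row per threshold with TN, FN, FP, TP counts and overall % correct.

    Assumes actuals are coded 0 or 1 and that a score above the threshold
    counts as a predicted response.
    """
    if thresholds is None:
        thresholds = [i / 100 for i in range(1, 100)]
    rows = []
    for t in thresholds:
        tn = fn = fp = tp = 0
        for actual, score in zip(actuals, scores):
            predicted = 1 if score > t else 0
            if actual == 1 and predicted == 1:
                tp += 1
            elif actual == 1:
                fn += 1
            elif predicted == 1:
                fp += 1
            else:
                tn += 1
        rows.append({"threshold": t, "tn": tn, "fn": fn, "fp": fp, "tp": tp,
                     "pct_correct": 100.0 * (tn + tp) / len(actuals)})
    return rows


def best_threshold(rows):
    """Threshold with the highest overall % correct (first one if tied)."""
    return max(rows, key=lambda row: row["pct_correct"])["threshold"]


# Hypothetical data: four records.
table = threshold_table([0, 0, 1, 1], [0.2, 0.45, 0.55, 0.9])
print(best_threshold(table))
```

Overall accuracy is only one possible criterion; when the two kinds of error have different costs, the same rows can be ranked by a cost-weighted sum of the fn and fp counts instead.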
The portion of the Threshold Table shown above displays the threshold values of .40 to
.50. You can see that as the threshold is decreased from .50 down to .40, the % Correct
Overall increases. If you look more closely, the % Correct 1 is increasing faster than the
% Correct 0 is decreasing.
Two- and Three-Dimensional Curve and Surface Plots

Exportable two- and three-dimensional plots, viewed by clicking on [Curves and Surfaces]
in the Model Results dialog, illustrate which variables MARS used to construct the final
model and the type of transformation needed to obtain the best fit. The 2-D line graphs
depict the relationship between the target variable and a single predictor variable. Low
resolution 2-D plots are also produced in the text report.
Three-dimensional, rotatable surface (or contour) plots depict the relationship between a
pair of predictor variables and the target variable. In addition, you can enable Mesh,
Shaded, Contour and/or Zones by clicking [Mesh], [Shaded], [Contours] and [Zones].
To rotate the 3-D plots, click the rotation buttons in the lower-right corner.
Example 2-D and 3-D plots from the Boston Housing example discussed earlier are
displayed below.
If you instruct MARS not to use any interactions in the model, the graphs will be 2-D
graphs. If two-way but not higher-order interactions are permitted, the graphical output is
likely to be a mixture of 2-D and 3-D plots. In those rare cases in which MARS cannot find
any valid predictive model, no graphical output is produced. Similarly, if the final model
contains three-way and higher-order interactions exclusively, no graphs are produced.
(Future releases will allow you to display 2-D and 3-D slices from the higher-dimensional
interactions.)
In most business applications, such as predicting response rates to a direct mail offer,
transforms are typically depicted as one of the following six types:
1) single change in slope
2) minimum threshold before variable has effect
3) upper bound beyond which variable has no further effect
4) ramp function where variable has no effect below minimum threshold, then rises
to an upper bound beyond which it has no further effect
5) U-shape change in direction
6) V-shape change in direction
More complex functions are of course possible, but the simpler transforms are both more
likely and more plausible in business applications. In contrast, in physical science
applications, the true relationship between the target variable and its predictors can be
extraordinarily complex; MARS is capable of tracking such functions regardless of the
number of twists and turns involved.
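Several of the simple transform shapes listed above can be written as sums of the max(0, ...) basis functions described earlier in this chapter. The Python sketch below shows four of them; the function names, knots, and slopes are hypothetical and chosen only to make the shapes easy to verify.

```python
def hinge(x, knot):
    """max(0, x - knot): zero up to the knot, then rising."""
    return max(0.0, x - knot)


def mirror_hinge(x, knot):
    """max(0, knot - x): falling toward the knot, then zero."""
    return max(0.0, knot - x)


def minimum_threshold(x, knot, slope):
    """No effect below the knot, linear effect above it."""
    return slope * hinge(x, knot)


def upper_bound(x, knot, slope):
    """Rises with x until the knot, then stays flat."""
    return -slope * mirror_hinge(x, knot)


def ramp(x, low, high, slope):
    """Flat below low, rising between low and high, flat above high."""
    return slope * (hinge(x, low) - hinge(x, high))


def v_shape(x, knot, left_slope, right_slope):
    """Falls toward the knot and rises after it."""
    return left_slope * mirror_hinge(x, knot) + right_slope * hinge(x, knot)


for x in (0.0, 1.0, 2.0, 3.0, 5.0):
    print(x, ramp(x, low=1.0, high=3.0, slope=2.0), v_shape(x, 1.0, 1.0, 1.0))
```

The ramp needs two basis functions with knots at the lower and upper thresholds, which is why even simple shapes can contribute more than one term to the model.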
Exporting and Printing 2-D and 3-D Plots

The 2- and 3-D graphs can be printed directly from MARS or exported as Windows metafiles
(.wmf), jpeg, bitmap (bmp), or portable network graphics (png). To print the plots, select
Print from the File menu when the Curves and Surface dialog is active. You can also
change the page setup options prior to printing by first selecting Page Setup from the
File menu.
To export a 2- or 3-D graph, click on the graph (which will activate a black box around the
graph indicating that it has been selected) and select Export Graph... from the File
menu. As illustrated below, in the Export Graph dialog, select one of the five graphical file
format options from the drop-down menu, enter a file name and directory location, and
click [Save].
The MARS Model Selector

MARS develops its models by conducting a forward-stepping search in which a deliber-
ately overfit model is generated. This overfit model contains redundant and unneeded
terms (basis functions) that will hurt performance on new data. The purpose of the
second model development phase is to eliminate all the unneeded terms, leaving a high
performance model. To accomplish this MARS goes through a backward deletion pro-
cess in which it drops one basis function at a time. Starting with the large model ob-
tained at the end of the forward stepping, MARS does a test deletion of every basis
function. The basis function which is least costly to drop is the first to go. This process is
repeated, dropping one basis function at a time, until the largest model has been trimmed
back to only a constant. Each backward step is evaluated on the GCV (generalized
cross-validation, a penalized mean squared error) criterion. After the entire sequence of
backward steps is completed, MARS selects the one model with the lowest GCV.
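The backward deletion loop can be outlined in Python as below. This is a sketch of the control flow only: gcv_of is a hypothetical callable that stands in for refitting the model on a given set of basis functions and returning its GCV, and "least costly to drop" is interpreted as the deletion that leaves the lowest GCV. The toy GCV function and its numbers are invented for the demonstration.

```python
def backward_deletion(terms, gcv_of):
    """Drop one basis function at a time down to a constant-only model.

    Returns the full sequence of (terms, gcv) steps plus the step with
    the lowest GCV. gcv_of is a caller-supplied stand-in for model refitting.
    """
    current = list(terms)
    sequence = [(list(current), gcv_of(current))]
    while current:
        # Test-delete every remaining basis function and keep the cheapest deletion.
        candidates = [[t for t in current if t != drop] for drop in current]
        current = min(candidates, key=gcv_of)
        sequence.append((list(current), gcv_of(current)))
    best_terms, best_gcv = min(sequence, key=lambda step: step[1])
    return sequence, best_terms, best_gcv


# Toy stand-in: each term lowers the error by a fixed amount, and each term
# costs a penalty of 0.5. BF3 helps less than it costs.
usefulness = {"BF1": 5.0, "BF2": 3.0, "BF3": 0.1}


def toy_gcv(terms):
    return 10.0 - sum(usefulness[t] for t in terms) + 0.5 * len(terms)


steps, chosen, chosen_gcv = backward_deletion(["BF1", "BF2", "BF3"], toy_gcv)
for step_terms, step_gcv in steps:
    print(len(step_terms), step_terms, round(step_gcv, 2))
print("selected:", chosen)
```

The full sequence is kept rather than only the winner, since every model along the deletion path remains a candidate for the analyst to inspect.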
A key parameter governing this process is the Degrees of Freedom penalty applied to knots.
Typically, the higher the penalty the smaller the final model will be. MARS allows you to
manually set the DF penalty, or to estimate it via a testing procedure (cross validation or
random setaside of records for test). Regardless of how the DF penalty is set, there will be
room for uncertainty regarding the best MARS model, as we cannot know the optimal DF
penalty with certainty.
In MARS 1.0, if you leaned towards a smaller model than the one automatically selected
by MARS for any reason, your only option was to increase the DF penalty and rerun the
analysis. This was a trial and error process because there was no way of knowing how
much smaller a model would become once the DF penalty was increased. MARS 2.0
makes selection of a model from the backwards deletion sequence a simple matter of
clicking on a row in the model selector.
To illustrate this, open the BOSTON.SYD data and set up a model with MV as the target
variable and all other variables as predictors. Note that in the bottom right hand corner of
the model dialog you see an [All Models] and a [Best Model] button. Click on [All
Models] and you will see the selector displayed below after MARS completes its analy-
sis.
The selector lists every model MARS identified in the backwards-stepping procedure (15
models in this example). The first column in the selector lists the number of basis func-
tions in the model, which decline from 15 to 1 by one basis function per row. The second
column lists the number of variables (not basis functions) being used in the model. A
model with many basis functions could easily be built up out of just a few variables if each
variable contained many knots. By checking this column you can see where the model
gets smaller by dropping a variable rather than one of several knots in a variable. The
GCV, the mean-square-error as adjusted by the DF penalty, usually is large for the most
overfit model, declines to a minimum at the MARS optimal model, and then rises again as
the model is trimmed back too far. As noted above, if we knew the DF penalty with
certainty the MARS optimal model would be best, but because we don't, there is room for
analyst judgment in model selection.
To select a model with seven basis functions, either double-click on the 7-basis function row in
the selector, or highlight it and then click the [Select] button in the lower right hand corner.
You will now obtain a complete set of MARS reports for the 7-basis function model.
Suppose you place great importance on small models with few predictors. The previous
table and graph show that you could prune the model back to four basis functions involving
only four distinct variables to achieve a GCV R-squared of 0.811. If you think the reduction
in model size compensates for the loss of accuracy, or if you simply believe that the
smaller model is likely to be better in a changing environment you can override MARS and
select this model. By double clicking on the row with four basis functions you make this
the new model; all reports will reflect this. Visit any summary tab and you will see that
the results are for the smaller model.
To score new data with a selected model other than the MARS optimal model you will need
to save the basis function code and run it through a data-processing step. See the readme
files that came with your software for further details and breaking news on improvements to
the MARS scoring capabilities.
Saving and Opening Selector Files

MARS allows you to save the Selector to a file and then later reload it. To save a Selector
file, bring the Selector window to the foreground and select Save|Save Selector... from
the File menu. In the Save Selector dialog box, click on the File Name text box to
change the default file name. The file extension is .SLC and cannot be changed. Select
the directory in which the Selector file should be saved and click on [Save].
To open a Selector file you have previously saved, select Open|Selector... from the File
menu. In the Open MARS Selector dialog box, specify the name and directory location of
the .SLC file and click on [Open]. Windows-compatible selector files are produced by all
versions of MARS, including UNIX versions. To view models generated by a UNIX MARS
just download the selector file to a PC, and load using your PC MARS.
Opening a Selector file in subsequent sessions allows you to continue your exploration of
detailed and summary reports for each of the models in the sequence; however, reopening
the file does not reload the model setup specifications in the GUI dialogs. To save your
model setup specifications, save the settings in a command file prior to exiting MARS.
The commands, by default stored in MARS's command log, can be accessed by
selecting Open Command Log from the View menu (or by clicking the Open Command
Log toolbar icon). To save the command log, select Save from the File menu. To then
reload your settings in the Model Dialog, simply submit the command log. The last set of
model setup commands in the command file appears in the tabbed Model Setup dialogs.
For more on using the Command Log, see the section titled Command Log Control in
Chapter 6.
Chapter 6: Hands-On Tour of Graphical User Interface
This chapter provides a hands-on tour of the graphical user interface: menus, commands,
and dialogs. You will learn how to set up a MARS analysis, set optional control
parameters, save your MARS models, and apply them to new data.
Getting Started
Most predictive modeling techniques give more satisfactory results if the input data are
properly cleaned. MARS is no exception to this general rule. Data cleaning involves
correcting data entry errors, resolving inconsistencies, and, in particular, capping, removing,
or replacing outliers. Univariate outliers (i.e., outliers relative to a single variable) can be
detected via straightforward descriptive statistics produced optionally in every MARS run.
Multivariate outliers are of less concern and can be detected in CART runs (as outliers are
typically isolated in small terminal nodes). Once the data are clean and have passed
elementary tests of soundness, MARS modeling can begin. (Note: CART is more resistant
to data problems than MARS. If you have serious concerns regarding your data quality
and reliability, be sure to conduct some CART analyses as well.)
To open an input data file:
1. Select Open from the File menu (or click on the Open File icon in the toolbar).
(To change or set default input and output directories, select Options... from the
Edit menu and select the Directories tab in the Options dialog.)
2. In the Open Data File dialog, select the data set's file format and then browse for
the file. MARS for Windows reads over 80 different file formats.
After you open a file, the Model Setup dialog automatically opens. The MARS Report
window appears in the background with Report Contents in the left panel and text output
in the right panel. The initial text output contains the variable names, the size of the file,
and the number of records read.
Setting Up the Model

The Model Setup dialog is the primary control center for conducting MARS analyses.
Commonly used analysis functions are conveniently located in the five Model setup tabs:
Variables, Interactions, Select, Options and Limits, and Testing.
Variables Dialog
The Variables dialog, shown below, allows you to specify target and predictor variables,
specify the weight variable, indicate categorical variables, and identify predictor variables
that should be considered non-transforming and/or globally non-interacting.
- To specify the target variable, highlight the name from the variables list and
  click on [Select] in the Target Variable box.
- To request a Binary scoring of the dependent variable, check the Binary
  checkbox. Additionally, you can request that a Binary Threshold Table be
  generated by checking the Table checkbox in conjunction with Binary.
  See the section below titled Binary Target Variables for additional information
  on this subject. The resulting tables, dialogs, and graphs generated by these
  two options are discussed in further detail in Chapter 5.
- To specify a weighting variable, highlight the variable name and click on [Select]
  in the Weighting Variable box.
- To select predictor variables, highlight the variables from the variables list and
  click on [Select Predictors]. (Standard Windows conventions can be used to
  select more than one predictor at a time.)
- To restrict the list of candidate predictor variables, once specified, double click on
  the check box in the Exclude column for the variables you wish to exclude (or
  alternatively highlight the variable in the predictor variable list and click on [Remove
  Predictor]).
- To select which variables MARS should consider as categorical (rather than
  continuous), double click the check box in the Categorical column for each variable
  (or click on the box and then click on [Check]). MARS will determine the number
  of unique levels for you; any value in the data generates a valid level.
- To specify variables which, if they are part of the model, should be entered into
  the model as linear functions only (i.e., no transformation), double click the check
  box in the Non-transforming column for each variable. This option can be applied
  to both continuous and categorical variables.
- To specify variables which, if they are part of the model, should not be allowed to
  interact with any other variables, double click the check box in the Non-interaction
  column for each variable.
- Note that the Variables tab (and also the other four model setup dialogs) contains
  an [Auto-Save Model] button in the lower-left corner. At any point during the
  model setup, you can create a special model file in which the MARS model will
  be saved for later application to new data. This file will always have a .MDL
  extension.
When you first open a database the Variables tab is colored red, meaning that
you are not ready to start an analysis until information on this tab is provided.
Once you have selected your target variable and the candidate predictors you are
ready to begin analysis and the Variables tab turns black; you do not need to
touch any other MARS setting.
We recommend that at the very least you consider how many forward steps
MARS should be allowed to take. Otherwise, you do not need to fuss with any
other settings to get useful results. But please keep reading!
Binary Target Variables

In its current implementation, MARS is a regression procedure, meaning that it always
treats the target variable as continuous. In spite of this, MARS can be used effectively to
estimate binary response models, models in which the target is typically coded as 0 or 1.
MARS prediction for such problems will normally be fractions between 0 and 1 and thus
can be viewed as probabilities. However, nothing prevents the MARS model from predicting
negative values or values larger than 1. Therefore, when reviewing binary response models,
the MARS predictions should be viewed as scores rather than literal probabilities.
MARS now contains some new features to assist binary response modeling. To invoke
them, be sure to check the binary box in the model setup dialog (see Variable setup
dialog screen shot above).
If you select the binary target variable option, MARS will generate reports on the assumption
that the target is either 0 or 1 or a probability between 0 and 1. Using a default threshold
of .5, MARS will consider all target values (actual or predicted) less than .5 to correspond
to non-response and values greater than .5 to correspond to response. After classifying
all predicted and actual values as either 0 or 1, a prediction success (confusion matrix)
table is produced cross-classifying actual vs. predicted class assignments.
It will often be necessary to adjust the threshold dividing 0 from 1 to obtain sensible
results. Use the slider found in the post-processing dialogs to change the threshold as
needed. A table showing all four cells of the prediction success matrix (true positive, false
positive, true negative, false negative) for each threshold from .01 to .99 can help you
decide quickly on the best threshold setting. Details of the binary results are discussed
in Chapter 5.
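The threshold table described above can be pictured with a short sketch. The Python below is only an illustration, not the MARS implementation: the function names prediction_success and threshold_table are hypothetical, and counting a value exactly equal to the threshold as non-response is an assumption, since the text does not say how ties are handled.

```python
def prediction_success(actual, predicted, threshold=0.5):
    """Return (true_pos, false_pos, true_neg, false_neg) for one threshold.

    Both actual and predicted values are classified with the same threshold.
    Assumption: a value exactly equal to the threshold counts as non-response.
    """
    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        actual_response = a > threshold
        predicted_response = p > threshold
        if predicted_response and actual_response:
            tp += 1
        elif predicted_response:
            fp += 1
        elif actual_response:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def threshold_table(actual, predicted):
    """One row (threshold, tp, fp, tn, fn) for each threshold from .01 to .99."""
    rows = []
    for step in range(1, 100):
        threshold = step / 100
        rows.append((threshold,) + prediction_success(actual, predicted, threshold))
    return rows
```

Scanning the rows of such a table for the threshold with the most acceptable mix of false positives and false negatives is the same exercise as moving the slider in the post-processing dialogs. Note that predicted scores below 0 or above 1 are handled without any special treatment.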
Remember that the binary option is a reporting feature; it does not change the
way in which MARS generates its models because the underlying algorithm is
not specifically adapted to the problem. Theoretically, it would be possible to
get more accurate models by running a logistic regression version of MARS.
Such a version is planned for future release.
Interactions Dialog
By default, MARS will consider all possible interactions between selected predictor variables.
The Interactions dialog allows you to limit or restrict pair-wise interactions between any
variable and any other variable in the model. This dialog, accessible only after predictor
variables are selected in the Variables dialog, displays the selected predictor variables in
a matrix, with each predictor appearing as both a row and a column header. Note that if
you checked the Non-interaction box in the Variables dialog for a particular predictor
variable, that particular variable will not appear in the Interactions dialog.
To disallow a pairwise interaction, double-click (or highlight and click on [Uncheck]) the
cell that corresponds to the variable pair you would like to restrict. In the Interactions
dialog displayed above, for example, unchecking the cell corresponding to the variable ZN
in the 2nd row and INDUS in the 4th column would disallow any interaction between these
two variables.
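One way to think about the Interactions dialog is as a list of variable pairs that MARS is still allowed to combine. The following Python sketch is an illustration only; the function name allowed_pairs is hypothetical, and the three predictor names are used simply as sample data.

```python
from itertools import combinations


def allowed_pairs(predictors, disallowed=()):
    """List every pair of predictors that may still interact.

    The order of the two names within a disallowed pair does not matter,
    mirroring the symmetric row/column matrix shown in the dialog.
    """
    blocked = {frozenset(pair) for pair in disallowed}
    return [pair for pair in combinations(predictors, 2)
            if frozenset(pair) not in blocked]


# Unchecking the ZN / INDUS cell removes that single pair from consideration.
pairs = allowed_pairs(["CRIM", "ZN", "INDUS"], disallowed=[("INDUS", "ZN")])
```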
Select Dialog
The Select dialog allows you to build a model on a subset of your data. Selection criteria,
which can be specified using any variable in the database, are constructed as follows:
1. Select a variable from the variable list and double-click to add that variable to the
Select text box.
2. Select one of the logical relations by clicking on its radio button.
3. Enter a numerical constant in the Value text box.
4. Click on [Add to List] to accept the constructed criteria (similarly, use [Delete
from List] to remove).
5. Repeat the process to create compound selection criteria involving several
variables.
For an example, see the Select dialog above. To analyze records with AGE less than or
equal to 50, double-click on AGE, click on =<, enter 50 in the Value box and click on [Add
to List]. For complex subset selection it is easier to use the built-in BASIC programming
language to delete unwanted records. (See Appendix I for details.)
Options and Limits Dialog

The Options and Limits dialog allows you to change the speed factor, specify a penalty for
added variables, set the number of maximum basis functions and/or interactions, and
specify how many records to process.
Speed Setting. As discussed in Chapter 4, the speed acceleration setting controls the
trade-off between speed and accuracy in the MARS search for the best model. At each step
in model construction, MARS assesses a potentially very large number of basis functions
for entry into the model. With a high-speed setting (e.g., set to 4 or 5), MARS only
searches the best candidates (the top-ranked candidates from previous steps). The
exact number of candidates that MARS considers is progressively reduced with each
higher-speed setting. Every so often, MARS will search all candidate basis functions and
update its list of top contenders, giving every basis function a chance to be selected.
As discussed in Chapter 4, a speed setting of 5 does risk generation of a
suboptimal model in favor of speed; however, the highest setting is noticeably
faster. We recommend using the default setting of 4 in almost all analyses.
What should you do if you get different models using different speed settings? The final
choice of the model is up to you. Because the speed setting simply influences how
thorough MARS is in its search for the best basis functions, it is possible that a less-
thorough search would yield a better model.
Penalty on Added Variables. Since MARS produces a regression model, undesirable
high collinearity between predictors is possible. To reduce the potential for collinearity
you can set a penalty on new variables, a fractional penalty for increasing the number of
variables (not basis functions) in the model. The penalty affects the forward-stepping
process, producing models containing fewer distinct variables.
The count of variables in the model is entirely distinct from the count of basis functions. A
variable could be represented by just one basis function or by dozens. The added variable
penalty induces MARS to make do with fewer variables but the final model could have
quite a few basis functions.
The default setting is no penalty. Other choices, which can be set by clicking on the
appropriate radio button, are moderate (0.05), heavy (0.10), and a user-specified penalty,
which cannot exceed 1.0. The best value depends on the specific situation. You will
probably have to experiment to find the best setting for your problem.
Maximum Basis Functions. The maximum basis functions setting specifies how many
forward steps MARS takes to generate its maximal model. Usually each step adds two
basis functions, so a limit of 40 basis functions would be reached with 20 forward steps.
The default maximum basis functions is 15 and can be modified with the up/down arrows
or by entering a new value. See Chapter 4 for guidance on setting this parameter.
Maximum Interactions. The maximum interactions control limits the highest degree of
interaction MARS can consider. The default setting is 1, which disallows true interactions;
a setting of 2 would allow 2-way interactions, 3 would permit triple products of basis
functions, and so on. The interactions setting can be increased or decreased with the
up/down arrows or by entering a new value. See Chapter 4 for guidance on setting this
parameter.
Number of Records to Process. The number of records to process setting enables you
to analyze a subset of records rather than the entire data set initially read in by MARS
when the file was opened. The default setting of 0 indicates no limit. To impose a limit,
use the up/down arrows or simply enter the record limit in the box.
Minimum Observations Between Knots. The last control in the options and limits tab is
the number of observations required between knots, or window/bandwidth size. By default,
MARS allows a knot to be generated at every observed data value; this default allows the
MARS regression to change slope or direction anywhere and as often as the data dictate.
You can make MARS less locally adaptive by increasing the number of data points required
between knots. Setting this parameter to a value like 20, 50, or 100 can be very useful in
data mining applications. Use the up/down arrows to change the setting, or enter a new
value directly.
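The effect of requiring more observations between knots can be pictured with a simplified sketch. The Python below is not the rule MARS uses to place knots, which this section does not describe; it assumes, purely for illustration, that a candidate knot is allowed at every spacing-th ordered observation. The function name candidate_knots is hypothetical.

```python
def candidate_knots(values, spacing=1):
    """Return candidate knot locations, one per `spacing` ordered observations.

    spacing=1 allows a knot at every observed value; larger values thin out
    the candidates, which makes the fitted curve less locally adaptive.
    """
    if spacing < 1:
        raise ValueError("spacing must be at least 1")
    ordered = sorted(values)
    # Duplicated data values yield a single candidate knot.
    return sorted(set(ordered[::spacing]))


observations = list(range(1, 101))              # 100 distinct data values
dense = candidate_knots(observations)           # 100 candidate knots
sparse = candidate_knots(observations, 20)      # 5 candidate knots
```

With fewer candidate knots the regression can change slope in fewer places, which is why larger settings tend to give smoother, more conservative models on large data sets.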
Testing Dialog
The Testing dialog, displayed below, allows you to control the number of degrees of freedom
charged for knot optimization. The penalty is used to reflect the fact that MARS conducts
an extraordinarily intensive search to find the knots, creating a risk of overfitting. Penalizing
each knot helps MARS select an honest rather than overfit model. The degrees of freedom
can be fixed by you, or automatically computed using either cross validation or an
independent test sample.
The default degrees of freedom penalty setting is fixed at 3. This setting can be changed
to any non-negative number by entering a new value. Setting the penalty higher will tend
to favor a smaller optimal model.
You can ask MARS to use a test method to estimate a degrees of freedom penalty for
your data. If you select cross validation, the default v-fold factor is 10, which means that
MARS will conduct ten extra modeling runs just to select a defensible degrees of freedom
penalty. Cross validation can be quite time consuming and run times increase steadily
with the number of folds. 20-fold cross validation will take at least twice as long as 10-fold,
and 10-fold cross validation will take ten times as long as fixing the degrees of freedom
penalty manually.
If you opt to estimate the degrees of freedom using a test sample, MARS will randomly
set aside approximately every Nth observation. By default, MARS will use about half the
sample for testing, but you can choose 1/3, 1/4, 1/5, etc. The resulting test set is used in
a single test run to estimate the degrees of freedom.
As discussed earlier, the larger the degrees of freedom charged for each knot (or basis
function), the smaller the resulting MARS model. Thus, you can influence the size and
complexity of your model by setting this parameter higher or lower. If MARS is estimating
null models or models containing only a handful of basis functions, you can encourage
larger models by fixing the degrees of freedom parameter at a low number such as 1, 2 or
3. Conversely, if MARS is including too many parameters or knots for your liking, you can
force a smaller number by setting this parameter quite high. Using values as high as 10,
30 or even 200 may be necessary to estimate a model of suitable size.
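To see why a larger degrees of freedom charge favors a smaller model, it may help to look at a generalized cross validation (GCV) calculation. The Python sketch below uses the GCV criterion from Friedman's original MARS paper; this section of the guide does not state the formula, and the way the penalty enters effective_params (one for the constant, one per basis function, plus the DF charge per knot) is an assumption made for illustration. All function names and the two example models are hypothetical.

```python
def effective_params(n_basis_functions, n_knots, df_penalty):
    """Assumed form: constant + basis functions + DF charge for each knot."""
    return 1 + n_basis_functions + df_penalty * n_knots


def gcv(rss, n_obs, eff_params):
    """Generalized cross validation: mean squared error inflated by model size."""
    if eff_params >= n_obs:
        raise ValueError("effective parameters must be fewer than observations")
    return (rss / n_obs) / (1.0 - eff_params / n_obs) ** 2


def preferred_model(df_penalty, n_obs=100):
    """Compare a small and a large hypothetical model; return the lower-GCV one."""
    small = gcv(30.0, n_obs, effective_params(2, 1, df_penalty))
    large = gcv(20.0, n_obs, effective_params(10, 5, df_penalty))
    return "small" if small < large else "large"
```

In this made-up comparison the larger model fits better (residual sum of squares 20 versus 30), so it wins when the penalty is 1; raising the penalty to 10 inflates its GCV enough that the smaller model is preferred.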
If you allow MARS to determine the true degrees of freedom parameter, you will
arrive at a very defensible model. Lowering the DF penalty could yield a model
that will not stand up to new data. Increasing the DF will yield a deliberately cut-
back model that omits statistically defensible terms.
You always have the option of estimating All Models, which will allow you to
manually select different sized models using the backwards deletion sequence.
Regardless of which model you select manually, however, the official MARS optimal
model will have been determined by the DF penalty. See the discussion of the
MARS Model Selector for more details on manual model selection.
Edit Options
Before computing your model, you have the option of resetting default Report Preferences
settings, the random-number seed (used for cross validation and random test runs), and/
or the default directories. If you're in the Model Setup dialog and need to access this
dialog before computing a MARS model, click on [Continue] before selecting Options
from the Edit menu.
Reporting Dialog
The Reporting dialog allows you to control some classic output details. You can include
summary statistics for all model variables, summary plots, and the use of exponential
notation for values near zero. The default settings can be changed by clicking on the
radio button next to each item. You can change the settings for each MARS run as well
as save new default settings by clicking [Save As Defaults].
Random Number Dialog

The Random Number dialog allows you to set the random-number seed and to specify
whether the seed is to reset to its original value after any model is computed. The
random-number seed is used in cross validation and in randomly holding back data for
testing. Normally the seed is reset to its initial value on start up and after each model is
computed or after data are scored. The seed will not be reset if you click on the Retain
most recent values for succeeding run radio button. With the latter setting, conducting
two identical analyses involving the random number generator could result in somewhat
different results. There is nothing wrong with such an outcome; you just need to be
aware that it could happen.
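The difference between resetting and retaining the seed can be demonstrated with any seeded random number generator. The Python sketch below uses Python's own generator, not the one inside MARS, and the seed value 7, the function name draw_test_rows and the 50 percent test fraction are illustrative assumptions.

```python
import random


def draw_test_rows(rng, n_records, test_fraction=0.5):
    """Randomly flag roughly test_fraction of the records for the test sample."""
    return [row for row in range(n_records) if rng.random() < test_fraction]


# Seed reset before each run: both runs hold back exactly the same records.
run_a = draw_test_rows(random.Random(7), 200)
run_b = draw_test_rows(random.Random(7), 200)

# Seed retained: the second run continues from where the first one stopped,
# so it will almost certainly hold back a different set of records.
generator = random.Random(7)
first = draw_test_rows(generator, 200)
second = draw_test_rows(generator, 200)
```

Here run_a and run_b are identical, while first and second differ even though the two analyses were otherwise set up the same way.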
Directories Dialog
The Directories dialog allows you to set default locations for input (data, models, commands),
output (models, predicted data files, reports) and temporary files. All input and output files
are initially set to the directory housing MARS and the temporary directory is set to your
machine's temporary Windows directory. To change any of the default directories, click
on the [...] button next to the appropriate directory and specify a new directory in the
Select Default Directory dialog.
Saving a MARS Model

To begin the analysis, click on [Compute Model] or the Compute Model toolbar icon.
You can save the MARS optimal model for future scoring either before or after the model
is computed. To save the model prior to computation, click on [Auto-Save Model]
while viewing any of the Setup dialogs (or click on [Continue] and select Save MARS
Model from the File menu). In the Save Model to File dialog, specify a file name and
directory. By default, the model file will be saved with a .mdl extension.
To save the optimal model after the model is computed, select Save MARS Model
from the File menu to access the Save Model to File dialog and follow the same steps as
above. At present ONLY the optimal model can be saved to an MDL file for scoring. Basis
function code needs to be used to score other manually selected models.
Applying a MARS Model to Data

The MARS model can be applied to new data either directly in MARS or by using the
basis function code. To apply a MARS model to new data from within MARS, open the
data set and then select Apply Data to Model from the Model menu (or click on the
Apply Data toolbar icon) to access the Apply Data to a MARS Model dialog.
Both a data file and a model file must be active and all variables needed to construct the
basis functions must appear in the data file. The predicted values can be saved in one of
over 80 different file formats. To specify the file name and format, click on [Set Output
File] and specify the file name, directory and format in the Save Results to File dialog.
After the open, model and output files are specified, click on [OK] to compute the predicted
responses. The predicted response is saved in a column named ESTIMATE on the output
file. A summary dialog like the following appears:
The summary results dialog displays the data, model and output file names; the number
of records read, the number of records with a missing target variable, and the number of
records used; and the minimum, maximum, mean and standard deviation for the predicted
outcome variable ESTIMATE.
Two additional tabs are found in this results dialog, the Gains tab and the
Prediction Success tab. The results information contained in these tabs is
identical to that described earlier in Chapter 5. See the sections titled
Gains Chart and Prediction Success Table for further details.
Interactive Example
Let's now walk through a step-by-step example using data extracted from the March
Current Population Survey of 1985. (These data are available in the Sample Data folder
installed with MARS.) The target variable is the log of wage earned at the respondent's
current job. The independent variables are years of education (ED), dummy variables for
living in the South (SOUTH), race non-Caucasian and non-Hispanic (NONWH), race Hispanic
(HISP), female (FE), married (MARR), represented by a union (UNION), industry dummies
for manufacturing and construction (MANUF, CONSTR), occupational dummies for
managerial, sales, clerical, service and professional (MANAG, SALES, CLER, SERV,
PROF), years of work experience (EX) and age in years (AGE).
Follow the five steps below to estimate a MARS model with main effects only:
1) From the File menu, select Open. In the Open dialog, select files of type Systat
(.syd) in the Files of Type drop down. Browse for CPS85B.syd and click OPEN.
2) In the Variables tab of the Setup dialog, select LNWAGE as the target variable.
3) Next select ED, SOUTH, NONWH, HISP, FE, MARR, EX, UNION, AGE, MANUF,
CONSTR, MANAG, SALES, CLER, SERV, PROF as candidate predictors (total
of 16 predictors).
4) Specify SOUTH, NONWH, HISP, FE, MARR, UNION, MANUF, CONSTR,
MANAG, SALES, CLER, SERV, and PROF as categorical variables in the
Categorical column.
5) Click Compute Model.
By default, MARS searches for a maximum of 15 basis functions in the first stage of the
model building process and uses a degrees of freedom penalty of 3 in the second stage to
prune the potentially overfit model. The other key default settings include minspan (minimum
number of observations between knots) tuned automatically to the size of the data set
(the literal parameter setting is 0) and maximum interactions equal to 1 (i.e., no
interactions).
Let's start by examining the Forward Stepwise Knot Placement report shown below.
BasFn(s)    GCV  IndBsFns  EfPrms  Variable    Knot
       0  0.279       0.0     1.0
     2 1  0.239       2.0     5.0  ED        11.000
     4 3  0.214       4.0     9.0  EX        12.000
     6 5  0.201       5.0    12.0  FE            10
     8 7  0.197       6.0    15.0  UNION         10
    10 9  0.194       7.0    18.0  SERV          10
   12 11  0.194       8.0    21.0  MANAG         10
   14 13  0.192       9.0    24.0  PROF          10
      15  0.193      10.0    27.0  SOUTH         10
The constant is always entered into the model first as BF0. The next two basis functions,
BF1 and BF2, are mirror image basis functions for ED with the knot located at education
equal to 11 years. Similarly, the third and fourth basis functions are mirror image basis
functions for EX with the knot at 12 years of work experience. Next, dummies for FE,
UNION, SERV, MANAG and PROF and their complements are entered as BF5-BF14.
The final basis function entered, BF15, is the dummy for SOUTH=0. Note that its complement
(SOUTH=1) is not entered because MARS reached the maximum allowed number of basis
functions.
The final model selected, the one with the lowest GCV score, keeps eight of the 15 basis
functions added in the forward-stepping stage and has a GCV R-squared of 0.316.
The set of final basis functions and the regression equation for this model are shown
below:
BF1 = max(0, ED - 11.000);
BF4 = max(0, 12.000 - EX );
BF5 = ( FE = 0);
BF7 = ( UNION = 0);
BF9 = ( SERV = 0);
BF11 = ( MANAG = 0);
BF13 = ( PROF = 0);
BF15 = ( SOUTH = 0);
Y = 2.202 + 0.071 * BF1 - 0.040 * BF4 + 0.197 * BF5 - 0.222 * BF7
          + 0.170 * BF9 - 0.239 * BF11 - 0.177 * BF13
          + 0.102 * BF15;
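The basis function code above translates directly into other languages when you need to score records outside MARS. The Python sketch below is one such translation, offered as an illustration rather than as output produced by MARS; the function name score_main_effects and the use of a dictionary to hold one record are assumptions.

```python
def score_main_effects(record):
    """Predicted LNWAGE for one record under the main effects model."""
    bf1 = max(0.0, record["ED"] - 11.0)      # education above 11 years
    bf4 = max(0.0, 12.0 - record["EX"])      # experience short of 12 years
    bf5 = 1.0 if record["FE"] == 0 else 0.0
    bf7 = 1.0 if record["UNION"] == 0 else 0.0
    bf9 = 1.0 if record["SERV"] == 0 else 0.0
    bf11 = 1.0 if record["MANAG"] == 0 else 0.0
    bf13 = 1.0 if record["PROF"] == 0 else 0.0
    bf15 = 1.0 if record["SOUTH"] == 0 else 0.0
    return (2.202 + 0.071 * bf1 - 0.040 * bf4 + 0.197 * bf5 - 0.222 * bf7
            + 0.170 * bf9 - 0.239 * bf11 - 0.177 * bf13 + 0.102 * bf15)


# A made-up respondent: 16 years of education, 5 years of experience,
# male, union member, manager, not living in the South.
example = {"ED": 16, "EX": 5, "FE": 0, "UNION": 1,
           "SERV": 0, "MANAG": 1, "PROF": 0, "SOUTH": 0}
predicted_lnwage = score_main_effects(example)
```

For this record the prediction works out to about 2.569 on the log-wage scale. Any variable not referenced by a retained basis function (AGE, for example) has no influence on the score.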
Only the upper portion of the set of mirror image basis functions for ED (BF1) is retained
in the final model while the reverse is true for EX (BF4). The absence of mirror images
indicates that there is a zero slope for the lower portion of ED and the upper portion of EX,
as shown in the graphs below. Thus, education does not have a positive significant effect
on wages until after 11 years whereas the effect of additional years of work experience
ends at 12 years.
[Graphs: Basis Function 1: ED; Basis Function 4: EX]
The remaining six basis functions in the model are the complementary basis functions
created for dummy variables FE, UNION, SERV, MANAG, PROF and SOUTH. The
coefficients in the MARS model equation indicate whether each of these has a positive or
negative effect on wages. For example, BF5, an indicator for not female, is positively
related to wages with a coefficient of +0.197. Graphs are not created for categorical basis
functions.
Let's now see if we can improve the model by allowing MARS to include two-way interactions:
1) Open the Setup dialog again by selecting Set Up Model from the Model menu (or
clicking the Set Up Model toolbar icon).
2) In the Options and Limits dialog, increase Maximum Interactions to 2.
3) Click Compute Model.
As shown in the Forward Stepwise Knot Placement report below, the first eight basis functions
added to the model are identical to those added in the main effects model.
BasFn(s)    GCV  IndBsFns  EfPrms  Variable    Knot  Parent  BsF
       0  0.279       0.0     1.0
     2 1  0.240       2.0     6.0  ED        11.000
     4 3  0.216       4.0    11.0  EX        12.000
     6 5  0.203       5.0    15.0  FE            10
     8 7  0.200       6.0    19.0  UNION         10
    10 9  0.196       7.0    23.0  SERV          10  UNION     7
   12 11  0.195       9.0    28.0  ED        12.000  UNION     8
   14 13  0.194      10.0    32.0  MANUF         10  ED        1
      15  0.193      11.0    36.0  PROF          10  FE        6
Interaction terms do not enter the model until BF9 and BF10; here, MARS has determined
that adding an interaction between SERV and UNION=0 (BF7, a main effect already in the
model) results in the greatest reduction in GCV. Note that UNION=0 is interacted with
SERV=0 (BF9) and its complement, SERV=1 (BF10).
The remaining basis functions are also interaction terms. Interactions between
a new pair of mirror image basis functions for ED with a knot at 12 and UNION=1 (BF8) are
entered as BF11 and BF12. Next, interactions between the (ED-11.000)+ spline and the
pair of MANUF dummies enter the model followed by an interaction between PROF and
the complementary dummy for FE.
In the final model, the GCV R-squared is slightly higher and the number of basis functions
remains the same as in the main effects model but now includes five interaction terms.
The final set of basis functions is reproduced below:

BF1 = max(0, ED - 11.000);
BF4 = max(0, 12.000 - EX );
BF6 = ( FE = 1);
BF7 = ( UNION = 0);
BF8 = ( UNION = 1);
BF9 = ( SERV = 0) * BF7;
BF11 = max(0, ED - 12.000) * BF8;
BF12 = max(0, 12.000 - ED ) * BF8;
BF13 = ( MANUF = 0) * BF1;
BF15 = ( PROF = 0) * BF6;
Y = 2.362 + 0.150 * BF1 - 0.044 * BF4 - 0.592 * BF7 + 0.231 * BF9
          - 0.089 * BF11 - 0.119 * BF12 - 0.059 * BF13
          - 0.216 * BF15;
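Interaction terms are simply products of a new basis function and a parent basis function that is already in the model. The Python sketch below mirrors the listing above to make that structure explicit; it is an illustrative translation, and the function name score_with_interactions and the dictionary record layout are assumptions.

```python
def score_with_interactions(record):
    """Predicted LNWAGE for one record under the two-way interaction model."""
    # Parent (main effect) basis functions.
    bf1 = max(0.0, record["ED"] - 11.0)
    bf4 = max(0.0, 12.0 - record["EX"])
    bf6 = 1.0 if record["FE"] == 1 else 0.0
    bf7 = 1.0 if record["UNION"] == 0 else 0.0
    bf8 = 1.0 if record["UNION"] == 1 else 0.0
    # Interaction terms: each multiplies a new term by its parent.
    bf9 = (1.0 if record["SERV"] == 0 else 0.0) * bf7
    bf11 = max(0.0, record["ED"] - 12.0) * bf8
    bf12 = max(0.0, 12.0 - record["ED"]) * bf8
    bf13 = (1.0 if record["MANUF"] == 0 else 0.0) * bf1
    bf15 = (1.0 if record["PROF"] == 0 else 0.0) * bf6
    # BF6 and BF8 serve only as parents; they carry no coefficient of their own.
    return (2.362 + 0.150 * bf1 - 0.044 * bf4 - 0.592 * bf7 + 0.231 * bf9
            - 0.089 * bf11 - 0.119 * bf12 - 0.059 * bf13 - 0.216 * bf15)


# A made-up respondent: 16 years of education, 5 years of experience,
# female, union member, not in service, manufacturing or a profession.
example = {"ED": 16, "EX": 5, "FE": 1, "UNION": 1,
           "SERV": 0, "MANUF": 0, "PROF": 0}
predicted_lnwage = score_with_interactions(example)
```

For this record the prediction is about 1.937. Because BF9 is multiplied by BF7, the SERV term contributes only for non-union respondents, and the two ED terms with a knot at 12 contribute only for union members.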
Look at the graphs for the interactions of ED with UNION and MANUF. Because
the interaction terms involve a categorical basis function, the graphs are only 2-D and
display the effect on the dependent variable ONLY when the dummy condition is met. For
example, the interaction graph for ED and MANUF shows the effect of ED on LNWAGE
when MANUF=0. The small negative slope, -0.059, indicates that the positive effect of
ED on LNWAGE is not quite as strong when MANUF is equal to zero.
[Graphs: BFs 11 & 12, ED * UNION; BF 13, ED * MANUF]
MARS Report Writer

MARS now includes Report Writer, a report generator, word processor and text editor that
allows you to construct custom reports from result windows and diagrams as well as the
classic MARS output appearing in the MARS Output window.
How do you use the Report Writer? It's easy! One way is to copy certain reports and
diagrams to the Report window as you view the model in the results dialog or Selector
windows.
Once a model has been built, a Model results dialog appears allowing you to explore the
model and its performance with a variety of graphic reports and diagrams. Virtually any
graph, table, grid display, or set of basis functions can be copied to the Report Writer.
Simply right-click the item you wish to add to the Report Writer and it will appear at the
bottom of the Report window.
MARS also produces classic output for those users more comfortable with a text-based
summary of the MARS model and its performance. To add any (or all) of the MARS classic
output to the Report Writer window, highlight text in the classic output window, copy it to
the Windows clipboard (Ctrl+C), switch to the Report Writer window and paste (Ctrl+V)
at the point you want text inserted. This way you can combine the MARS elements
you find most useful, whether graphic in nature and originating in the Model results dialog
or textual in nature from the classic output, into a single custom report.
Default Options
In the Set Report Options dialog, the currently-selected reporting items and the Automatic
Report checkbox can be saved as a default group of settings for future MARS sessions
by clicking the [Set Default] button. These default options will then persist from session
to session since they are saved in the MARS.INI file. You may recall these settings any
time with the Use Default button.
Pre-configured Reports
Additionally, MARS can produce a stock report with the click of a button. Select
components of MARS output that are the most useful to you on the Report|Set Report
Options... dialog. The stock report will be the same for all models in the session until you
visit the Set Report Options dialog again. (In addition, the currently-open selectors are
listed and individual ones can be excluded or added to the list that will appear in the report
when Report|Report All is selected.)
You can then generate a stock report for the currently active (i.e., foreground) Model
results window or Selector by choosing Report|Report Current. If the active window is
not a Model results window or Selector window, Report Current will be disabled.
Furthermore, if you have several Selectors and their associated Model result windows
open, you can generate a report for all the models (in the order in which they were built) by
choosing Report|Report All.
Report|Report All and Report|Summary work only for Selectors (and
their child Model results windows). If you have grown a series of Best
Models, then you must add each one's results to the report separately by
bringing it to the foreground and selecting Report|Report Current.
If you want a report to be produced for every model that is built without having to explicitly
request it each time, check the Automatic Report box on the Set Report Options dialog.
From that point on, each model will have a stock report created for it as soon as it is built.
Printing and Saving Reports

Once you have generated a report you can print or preview it by using the Print, Print
Setup and Print Preview options on the File menu.
To save a report to a file, use the File|Save As... option. The contents of the Report
Window can be saved in four formats: Salford Systems Report (.ssr), rich text format (.rtf),
and text or text with line breaks (.txt). The .ssr format is the most compact but can only
be read by Salford Systems modules such as MARS. Rich text (.rtf) can be read by most
other word processors and maintains the integrity of any graphics embedded in the report.
The text formats do not retain graph and diagram images or table formatting.
It is possible to cut and paste to/from the Report Window and other Windows documents,
such as Microsoft Word, Notepad, Wordpad, etc. To select the entire report quickly and
drop it into another Windows application, use Ctrl+A (shortcut for Edit -> Select All),
Ctrl+C (copy to clipboard), move to the other application and paste.
The Data Viewer

If you have opened your dataset using DBMS/COPY, the MARS Data Viewer allows you to
view (but not edit or print) the data as a spreadsheet, which is handy for investigating data
anomalies or seeing the pattern of missing values. Since the Data Viewer is a facility of
DBMS/COPY, it is only available for data files that are opened using the DBMS/COPY translators.
To ensure this, remember to have File|Use DBMS/COPY checked.
The Data Viewer window is opened by selecting the View|View Data menu item or
clicking on the View Data toolbar icon (it looks like a little spreadsheet).
Only one data file can be displayed at a time.
On-line Help
The Help menu provides comprehensive on-line information concerning MARS menus,
commands, BASIC programming, and frequently-asked questions.
For an outline of the topics in the help file, select Index from the Help menu. Place the
mouse pointer over the topic of interest and press enter. A discussion of the topic is
displayed on the screen. To print the topic, click Print.
Alternatively, select Help Topics to see a detailed list of index entries. Type the first few
letters of the word you're looking for or use the scroll bar to review the list. For further
instructions on using on-line help, select Using Help from the Help menu.
The About MARS for Windows selection displays information about the version number,
preprocessor and tree-building work space, available disk space and free memory.
Chapter 7: Command-Line Control and Batch Mode
This chapter describes the situations in which a Windows user may want to take
advantage of the two alternative modes of control in MARS, command-line and batch,
and provides a guide to using these two control modes. For users running MARS on a
UNIX platform, this chapter contains a detailed guide to command syntax and options and
describes how the Windows version may assist you in learning the command-line language.
Alternative Control Modes in MARS for Windows

In addition to controlling MARS with the graphical user interface (GUI), you can control
the program via commands issued at the command prompt or via submission of a command
(.cmd) file. This built-in flexibility enables you to avoid repetition, create an audit trail, and
take advantage of the BASIC programming language.
Avoiding Repetition. You may need to interact with several dialogs to define your model
and set model estimation options. This is particularly true when a model has a large
number of variables or many categorical variables, or requires that more than just a few
options be set to build the desired model. Suppose that a series of runs are to be
accomplished, with little variation between each. A batch command file, containing the
commands that define the basic model and options, provides an easy way to perform
many MARS command functions in one user step. For each run in the series, the core
batch command file can be submitted to MARS, followed by the few graphical user interface
selections necessary for the particular run in question.
Creating an Audit Trail. The Command Log window can help you create an audit trail when
one is needed. Imagine not being able to reproduce a particular analysis track, perhaps
because the specific set of options used to create a model (e.g., the name of the data set
itself) was never recorded. The updated command log provides you with the entire
command set necessary to exactly reproduce your analysis, provided the input data do
not change.
Taking Advantage of the MARS Built-In Programming Language. MARS offers an integrated
BASIC programming language that allows the user to define new variables, modify existing
variables, access mathematical, statistical and probability distribution functions, and define
flexible criteria to control case deletion and the partitioning of data into learn and test
samples. BASIC commands are implemented through the command interface, either
interactively or via batch command files.
Small BASIC programs are defined near the beginning of your analysis session, after you
have opened your dataset but before you estimate (or apply) the model and usually before
defining the list of predictor variables. BASIC is powerful enough that in many cases
users do not need to resort to a stand-alone data manipulation program. See Appendix I
for more on BASIC.
Command-Line Mode
Choosing Command Prompt from the File menu allows you to enter commands directly
from the keyboard. Switching to the command-line mode also enables you to access the
integrated BASIC programming language. See Appendix I for a detailed description of the
BASIC programming language.
Command Log
Most GUI dialog and menu selections have command analogs that are automatically sent
to the Command Log and can be viewed, edited, resubmitted and saved via the Command
Log window. When the command log is first opened (by selecting Open Command
Log from the View menu), all the commands for the current MARS session are displayed.
Subsequently, by selecting Update Command Log from the View menu, the most
recent commands are added to the Command Log window.
After computing a MARS model, the entire set of commands can be archived by updating
the command log, highlighting and copying the commands to the clipboard (or saving
directly to a text file), then pasting into your text application. Alternatively, you can edit
the text commands, deleting or adding new commands, and then resubmit the analysis
by selecting either Submit Window or Submit Current Line to End from the File
menu.
Creating and Submitting Batch Files

The MARS Notepad can be used to create and edit command files. From the Notepad,
you can submit part or all of an open file. To submit a section of the command file, move
the cursor to the first line of the selected section and select Submit Current Line to End
from the File menu. To submit the entire command file, select Submit Window from the
File menu (or click on the Submit icon in the toolbar). After you submit the file, the
analysis proceeds as if you had clicked on Compute Model in the GUI: the progress
report window appears and, after the analysis is complete, the Results dialog.
To submit an existing batch file, choose Submit Command File from the File menu. In
the Submit a File dialog that appears, specify the ASCII text file from which command
input is to be read and then click on [Submit]. To facilitate multiple MARS runs, the
MARS results are directed only to the MARS report window in text form (i.e., the GUI
Results dialog does not appear).
Command-Line Syntax and Options

The current release of MARS for UNIX platforms is entirely command-line driven. Because
the Windows version has command analogs that are automatically displayed in the
Command Log window, users running MARS on UNIX platforms may find it instructive to
start with a Windows version. This will enable you to use the GUI to set up your MARS
models, view and learn the commands in the MARS Command Log, and save the command
file (which you can subsequently submit on the UNIX platform). In this manner, the
command log can be used as a tutor to answer the question: "What would the commands
have been to accomplish what I just did in the graphical user interface?"
The remainder of this chapter provides example command files for the Boston Housing
examples discussed in prior chapters and precise statements of each command's syntax
and options.
Example Command Files

The following command file was used to generate the basis functions and Forward Knot
Placement report discussed in Chapter 3:
LOPTIONS MEANS = YES, PLOTS = YES
FORMAT = 3/UNDERFLOW
USE BOSTON
KEEP
CATEGORY
LINEAR
ADDITIVE
REGRESSION = OLS
MODEL MV
KEEP INDUS, RAD
CATEGORY RAD
WEIGHT
ESTIMATE
The following command file generates the main effects model discussed in the Construction
of Interactions section of Chapter 3:
LOPTIONS MEANS = YES, PLOTS = YES
FORMAT = 3/UNDERFLOW
USE 'D:\MARS\BOSTON.SYS[SYSTAT]'
KEEP
CATEGORY
LINEAR
ADDITIVE
REGRESSION = OLS
MODEL MV
KEEP INDUS, CRIM, AGE, RM, DIS, TAX, PT, LSTAT
WEIGHT
BOPTIONS SPEED = 4, PENALTY = 0.000001, BASIS = 15,
INTERACTIONS = 1
LIMIT DATASET = 0
ESTIMATE
To generate the interactions model subsequently discussed, simply change:
INTERACTIONS = 1
to:
INTERACTIONS = 2.
Command Reference
ADDITIVE
Purpose
The ADDITIVE command specifies variables that can enter the model only as main effects
and not as interactions. A variable specified to enter the MARS model additively is not
allowed to enter into an interaction with any other variable. This applies to both ordinal
and categorical variables.
NOTE: Additive variables are still allowed to interact with missing value indicators. The
latter may be necessary for a variable to enter the model in any form.
Syntax
ADDITIVE <variable>, ...
To reset, issue the command with no variables:
ADDITIVE
This command is mutually exclusive with LINEAR. In other words, a variable should be
listed on one command or the other, not both.
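As a sketch of how ADDITIVE might be combined with an interactions model, the following command file allows two-way interactions but restricts one predictor to main effects. The variable names come from the Boston Housing examples; the choice of TAX as the additive variable is an assumption made purely for illustration.

```
REM Allow two-way interactions, but let TAX enter only as a main effect
USE BOSTON
MODEL MV
KEEP INDUS, CRIM, AGE, RM, DIS, TAX, PT, LSTAT
ADDITIVE TAX
BOPTIONS INTERACTIONS = 2
ESTIMATE
```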
APPLY
Purpose
The APPLY command applies the MARS model to new data. Both a USE file and MODEL
(.mdl) file must be active and all variables in the MARS model must appear in the USE file.
Syntax
USE <INPUT FILENAME>
SAVE <OUTPUT FILENAME> [ / SINGLE ]
RETRIEVE <MODEL FILENAME>
APPLY
BOPTIONS
Purpose
The BOPTIONS command allows several advanced parameters to be set.
Syntax
BOPTIONS INTERACTIONS=N, BASIS=N, CRASTER=N, SRASTER=N,
    PLOT=LINEAR|CUBIC, MINSPAN=N, SPEED=N, PENALTY=X, HEIGHT=N,
    WIDTH=N
INTERACTIONS  Sets the maximum allowable number of interactions between variables;
              default is 1.
BASIS         Sets the maximum number of basis functions; default is 15.
CRASTER       Sets the number of raster points for curves produced in text output.
SRASTER       Sets the number of raster points on each axis for surfaces in text
              output.
MINSPAN       Sets the minimum number of observations between each knot;
              default is 0.
SPEED         Sets the speed acceleration factor (1-5). Larger values progressively
              sacrifice optimization thoroughness for computational speed advantage.
              This usually results in a marked decrease in computing time with little or
              no effect on resulting approximation accuracy (especially useful for
              exploratory work). A value of 1 is no acceleration and 5 is maximum speed
              advantage; the default is 4.
PENALTY       Sets the fractional incremental penalty for increasing the number of
              variables in the MARS model. Sometimes useful with highly collinear
              designs as it may produce nearly equivalent models with fewer predictor
              variables, aiding in interpretation. Must take on nonnegative values.
              The default is 0; 0.05 is a moderate penalty and 0.10 is a heavy penalty.
HEIGHT        Sets the height, in lines, of the low-res response plots; allowable values
              are 17 to 57, inclusive.
WIDTH         Sets the width, in characters, of the low-res response plots; allowable
              values are 60 to 110, inclusive.
PLOT          Specifies whether the low-res plots should show the linear or cubic form of
              the model.
OLS           Specifies whether OLS regression models should be computed in
              addition to, or instead of, the MARS model. YES, the default,
              computes OLS statistics for the original predictors as well as the basis
              functions. ONLY causes MARS to compute OLS model statistics
              only. The covariance matrix is available when PRINT=LONG.
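As an illustration, the command below raises the interaction and basis function limits and applies the moderate penalty mentioned above. The option names are those listed in the syntax; the particular values are chosen only for this sketch and are not recommendations.

```
REM Illustrative settings only; tune the values for your own data
BOPTIONS INTERACTIONS = 2, BASIS = 30, SPEED = 4, PENALTY = 0.05
```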
CATEGORY
Purpose
The CATEGORY command identifies which predictors are categorical. MARS will
determine the number of unique levels for you - each unique value found in the data
generates a valid level. Just list the names of the categorical variables; for example:
CATEGORY RACE, EDUC
Syntax
CATEGORY <var1>, <var2>

CDF
Purpose
The CDF command evaluates one or more distribution, density, or inverse distribution
functions at specified values.
Syntax
For cumulative distribution functions the syntax is:
CDF [ NORMAL = z | T = t,dof | F = f,dof1,dof2 | CHI-SQUARE = chisq,dof |
    EXPONENTIAL = x | GAMMA = gamma,p | BETA = beta,p,q | LOGISTIC = x |
    STUDENTIZED = x,p,q | WEIBULL = x,p,q | BINOMIAL = x,p,q | POISSON = x,p ]
To generate density values, use the syntax above with the DENSITY option:
CDF DENSITY [ distribution_name = user-specified-value(s) ]
To generate inverse cdf values, specify an 'alpha' value between 0 and 1:
CDF INVERSE [ NORMAL = alpha | T = alpha,dof | POISSON = alpha,p |
    F = alpha,dof1,dof2 | CHI-SQUARE = alpha,dof | EXPONENTIAL = alpha |
    GAMMA = alpha,p | BETA = alpha,p,q | LOGISTIC = alpha |
    STUDENTIZED = alpha,p,q | WEIBULL = alpha,p,q | BINOMIAL = alpha,p,q ]
For example:
CDF NORMAL=-2.16, DENSITY NORMAL=-2.5, INVERSE CHISQ=.8,3

DESCRIPTIVE
Purpose
The DESCRIPTIVE command specifies what statistics are computed and printed during the
initial pass through the input data. The statistics will not appear in the output unless the
LOPTIONS MEANS=YES command is issued. By default, the mean, N, SD
and sum of each variable will appear when LOPTIONS MEANS=YES is used. To indicate
that only the N, MIN and MAX should appear in descriptive statistics tables, use the
commands:
DESCRIPTIVE N, MIN, MAX
LOPTIONS MEANS=YES
Syntax
DESCRIPTIVE MEAN=<YES/NO>, N=<YES/NO>, SD=<YES/NO>,
    SUM=<YES/NO>, MIN=<YES/NO>, MAX=<YES/NO>,
    MISSING=<YES/NO>, ALL
ALL will turn on all statistics and MISSING will produce the fraction of observations with
missing data.
Remarks
Also BOPTIONS MISSING will produce a special report summarizing which variables are
missing most often.
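Using the keyword form shown in the syntax, a request for only N, MIN and MAX could also be written with the default statistics switched off explicitly. This is a sketch that assumes each keyword accepts YES or NO exactly as the syntax line indicates.

```
REM Show only N, MIN and MAX; turn off the statistics printed by default
DESCRIPTIVE MEAN=NO, SD=NO, SUM=NO, N=YES, MIN=YES, MAX=YES
LOPTIONS MEANS=YES
```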
ECHO
Purpose
The ECHO command suppresses output to the screen.
Syntax
ECHO
ESTIMATE
Purpose
Reads the data, chooses the training and test samples (if any) and computes the MARS
model.
Syntax
ESTIMATE [ / ALL]
In most circumstances, MARS will report only the best model. If the ALL option is
specified, however, MARS will generate a full model at every backstep and produce a
model selector from which you can choose any model in the sequence. The process of
estimating a model at each backstep can increase compute time considerably.
If the SEQUENCE command is issued, the ALL option is implicitly in effect even if not
specified.
EXCLUDE
Purpose
Specifies a list of predictor variables to be excluded from the model.
Syntax
EXCLUDE <varlist>
<varlist> is a list of variables prohibited from entering the model. All other numeric variables
are candidates for entry into the model.
Examples
The following example excludes ID, SSN, and ATTITUDE from the candidate list of predictor
variables.
MODEL CHOICE
EXCLUDE ID, SSN, ATTITUDE
FORMAT
Purpose
The FORMAT command sets the precision to which most numerical output is printed.
Syntax
FORMAT=<number> [/UNDERFLOW]
Number is a whole number between 0 and 9, inclusive, representing the desired number of
digits to the right of the decimal point. The UNDERFLOW option prints very small numbers
in exponential notation, rather than rounding them off to zero. FORMAT with no arguments
sets the format to its default.
Remarks
The default is FORMAT=3.
Examples
Set the precision to 6, with numbers smaller than .000001 printed in exponential notation:
FORMAT=6/UNDERFLOW
HELP
Purpose
For the command-line (non-GUI) version of MARS, the HELP command provides brief,
on-line command descriptions and examples.
Syntax
HELP [<command_name>]
in which:
command_name is one of the MARS commands.
HELP with no arguments will give a command listing.
Remarks
MARS will use the descriptions contained in the files MARS.HLP when the HELP command
is used. These ASCII text files may be edited with a text editor.
The help files must reside in your working directory to be accessible.
Examples
Obtain a description of the BOPTIONS command:
HELP BOPTIONS
Note for GUI users
The Help menu provides comprehensive on-line information concerning MARS menus,
commands, BASIC programming, and frequently-asked questions.
For an outline of the topics in the help file, select Index from the Help menu. Place the
mouse pointer over the topic of interest and press enter. A discussion of the topic is
displayed on the screen. To print the topic, click Print.
Alternatively, select Help Topics to see a detailed list of index entries. Type the first few
letters of the word you're looking for or use the scroll bar to review the list. For further
instructions on using on-line help, select Using Help from the Help menu.
HISTOGRAM
Purpose
HISTOGRAM produces low-resolution density plots.
Syntax
HISTOGRAM <var1> [, <var2> , <var3> , ... / FULL, TICKS | GRID, WEIGHTED,
    NORMALIZED, BIG]
The plot is normally a half screen high; the FULL and BIG options will increase it
to a full screen (24 lines) or a full page (60 lines).
TICKS and GRID add two kinds of horizontal and vertical gridding.
WEIGHTED requests plots weighted by the WEIGHT command variable.
NORMALIZED scales the vertical axis to 0 to 1 (or -1 to 1).
Some examples are:
HISTOGRAM IQ / FULL, GRID
HISTOGRAM LEVEL(4-7) / NORMALIZED
Only numerical variables may be specified.

IDVAR
Purpose
The IDVAR command lists up to 50 variables that are to be included in the next dataset to
be SAVEd. These can be any numerical variables from the file being USEd; including them
facilitates merging of the SAVEd data with other files.
Syntax
If every record in your dataset has a unique identifier, say SSN, you could specify:
IDVAR SSN
SAVE WATER
The file WATER.SYS will include the variable SSN in addition to its normal contents.
INTERACT
Purpose
The INTERACT command specifies which variables are or are not allowed to interact in
the model.
Syntax
INTERACT ALLOW | DISALLOW <variable1> [ * variable2 ]
Some examples are:
INTERACT ALLOW = _ALL_ / DISALLOW = GENDER, AGE
INTERACT DISALLOW = _ALL_ / ALLOW = VALUE*REGION, SMOKER*AGE,
    CHECK
INTERACT DISALLOW = AGE, VALUE*NIGHT, VALUE*REGION / LIST
KEEP
Purpose
The KEEP command specifies a list of independent (predictor) variables.
Syntax
KEEP <indep_list>
in which <indep_list> is a list of potential predictor variables. Independent variables may
be separated by spaces, commas, or + signs.
See the MODEL and EXCLUDE commands for other ways to restrict the list of candidate
predictor variables.
KNOT
Purpose
The KNOT command controls the number of degrees of freedom for (unrestricted) knot
optimization. This can be fixed, or automatically computed via cross validation or a test
set.
Syntax
KNOT CROSS=<N> | TEST=<N> | FIXED=<X>
FIXED  directly sets the d.o.f. to X, a number greater than 0.0.
CROSS  uses N-fold cross validation to estimate the d.o.f. This will increase
       compute time by roughly a factor of N.
TEST   randomly sets aside every N'th observation. The resulting test set is
       used in a single validation pass to estimate the d.o.f. This will increase
       compute time by roughly a factor of 2.
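For illustration, each of the three forms might be issued as follows. The numeric values are arbitrary choices for this sketch, not recommendations, and only one form would normally be in effect for a given run.

```
REM Set the degrees of freedom for knot optimization directly
KNOT FIXED=3
REM Or estimate the d.o.f. by 10-fold cross validation
KNOT CROSS=10
REM Or set aside every 5th observation as a test set
KNOT TEST=5
```

Following the compute-time notes above, the CROSS=10 form would be expected to take roughly ten times as long as the FIXED form, and the TEST form roughly twice as long.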
LIMIT
Purpose
Allows model limitations to be set.
Syntax
The LIMIT command is:
LIMIT DATASET=<n>
DATASET limits the size of the sample used to build the MARS model. MARS extracts
the test sample from the first <n> records, thereby effectively reducing the size of the
learn sample. For example, if the command
LIMIT DATASET=10000
is issued, then records 1 to 10000 from the USE dataset will be read in and processed.
LINEAR
Purpose
The LINEAR command specifies variables that, if they are part of the model, will enter the
model only linearly. This applies to ordinal variables only.
Syntax
LINEAR <variable>, ...
To reset, issue the command with no variables:
LINEAR
This command is mutually exclusive with ADDITIVE. In other words, a variable should be
listed on one command or the other, not both. The LINEAR command applies only to ordinal
variables. Any categorical variable included in the LINEAR command will be ignored.
The LINEAR command is used to identify those variables which will not be transformed by
MARS via knot selection. MARS implements this by selecting a single knot value of
0.000 for such variables. A variable entering the model linearly (untransformed) can
participate in interactions with other variables.
LOPTIONS
Purpose
The LOPTIONS command toggles several logical options on and off.
Syntax
LOPTIONS MEANS=YES|NO, TIMING=YES|NO, NOPRINT,
    PREDICTION_SUCCESS=YES|NO, PLOTS = YES|NO / "<plot_character>"
MEANS               controls printing of summary stats for all model variables
TIMING              reports CPU time on selected platforms
NOPRINT             omits node specific output and prints only summary tables
PREDICTION_SUCCESS  requests the prediction success table
PLOTS               toggles summary plots and allows a user-specified plotting
                    symbol
To turn an option ON the '=YES' portion is not needed. For example,
LOPTIONS MEANS, TIMING (turn MEANS printing and CPU timing on)
LOPTIONS MEANS=NO (turn MEANS printing off)
MODEL
Purpose
The MODEL command specifies the dependent variable.
Syntax
MODEL <depvar> [ = <indep_list> ] [ BINARY / TABLE ]
in which <depvar> is the dependent variable and <indep_list> is an optional list of potential
predictor variables. If no <indep_list> is specified, all numeric variables are considered
(unless KEEP or EXCLUDE commands are used).
For example:
MODEL DIGIT (all non-character variables used in model building)
MODEL WAGE = AGE - IQ , EDUC, FACTOR(3-8) , RACE (selected variables)
MODEL CLASS = PRED(8) + VARA-VARZ + PRED(1-3)
See the KEEP and EXCLUDE commands for another way to restrict the list of candidate
predictor variables.
The BINARY option indicates that MARS should consider the target variable binary in
reporting dialogs. Typically this option is used if the target variable takes on values 0/1 or
1/2.
In addition to what the BINARY option does, the BINARY/TABLE option instructs MARS
to generate a 100-point table summarizing SENSITIVITY and SPECIFICITY as functions
of a moving threshold for the target.
NAMES
Purpose
The NAMES command lists the variables on the data set.
Syntax
NAMES
NEW
Purpose
The NEW command resets all options, as if MARS had been terminated and restarted. It
also clears out any data transformation (BASIC) statements that are in effect.
Syntax
NEW
OPTIONS
Purpose
The OPTIONS command displays current environment settings and options, input and
output devices, etc.
Syntax
OPTIONS
Remarks
The OPTIONS command will not list the settings of the LOPTIONS or BOPTIONS
commands.
OUTPUT
Purpose
The OUTPUT command directs output from MARS to the printer, the screen, or a file.
Syntax
OUTPUT [*|<filename>]
<filename> is given the default extension .DAT in the OUTPUT command. This
extension default may be overridden by placing quotes around the filename and explicitly
giving a file extension.
The default is to send the output to the console, OUTPUT *. Output will still appear on the
screen as it is being sent to a file.
Examples
Send output to the console (screen):
OUTPUT *
Send output to file RESULTS.DAT:
OUTPUT RESULTS
Send output to file RESULTS.PRN:
OUTPUT "RESULTS.PRN"
PAGE
Purpose
The PAGE command lets you choose wide or narrow output format.
Syntax
PAGE WIDE | NARROW
WIDE forces output to fit in 132 columns. NARROW forces output to fit in 80 columns.
PRINT
Purpose
The PRINT command controls the printing of output.
Syntax
PRINT=LONG|SHORT
LONG produces a small amount of additional output if cross validation
is used. Also, when OLS coefficients are estimated (with BOPTIONS OLS=YES or
BOPTIONS OLS=ONLY), PRINT=LONG causes the covariance matrix of the
coefficients to be printed. Often this is not needed.
Remarks
The default is PRINT=SHORT.

QUIT
Purpose
The QUIT command, when typed from MARS, terminates MARS and returns to the operating
system. When typed from ESTIMATE or APPLY, it ends the ESTIMATE or APPLY phase and
returns to MARS.
Syntax
QUIT
REGRESSION
Purpose
The REGRESSION command selects what type of regression is performed.
Syntax
The REGRESSION command is:
REGRESSION = OLS
MARS carries out OLS regression, using Salford Systems' 2SLS algorithms, after a MARS
regression. OLS results are presented for regressing the final basis functions against the
original dependent variable along with the standard R-squared statistics.
A future release of MARS will also carry out logistic regression using code from Salford
Systems' LOGIT program.
REM
Purpose
The REM command allows comments to be inserted in the command stream. It causes
no action by the program.
Syntax
REM <text>
The text is optional and no quotes are required.

RETRIEVE
Purpose
When applying a MARS model to new data, the RETRIEVE command specifies which file
stores the model information.
Syntax
RETRIEVE <filename>
For example, the following estimates a model and stores it in MYMODEL.MDL. A
subsequent application to a different dataset uses the model in MYMODEL.MDL and
stores the results in an output dataset.
USE LEARN1
MODEL TARGET
STORE MYMODEL
ESTIMATE
USE VALIDAT3
SAVE PREDICT
RETRIEVE MYMODEL
APPLY
SEED
Purpose
The SEED command allows you to set the random number seed and to specify whether
the seed is to remain in effect after the MARS model is computed. Normally the seed is
reset to 987654321 on start up and after each MARS model is computed (ESTIMATE) or
applied to new data (APPLY).
Syntax
SEED <X> [ RETAIN | NORETAIN ]
Legal values include integers between 1 and 2147483647. If RETAIN is not specified, the
seed will be reset to 987654321 after the current model is completed.
If RETAIN is specified, the seed will keep its latest value after the model is computed.
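A brief sketch of the two forms, assuming the seed value is written directly after the command name as the syntax line shows. The value 12345 is an arbitrary choice for illustration.

```
REM Set the seed; after the model is computed the seed keeps its latest value
SEED 12345 RETAIN
REM Set the seed for the next model only; afterwards it is reset to 987654321
SEED 12345 NORETAIN
```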
SELECT
Purpose
The SELECT command specifies selection criteria for computing a model based on a
subgroup of cases.
Syntax
SELECT <var1> <rel> <# ($)> [, <var2> <rel> <# ($)> <...>]
in which
<var> is a variable on the data set,
<rel> is a logical relation: =, <>, <, >, <=, =<, >=, =>,
<#> is a numerical constant, and
<($)> is a character constant. It must be enclosed in quotes.
Remarks
SELECT may be based on any variable appearing in the data set, whether or not that
variable is involved in the model.
SELECT may not be based on any variables defined on the fly by internal Data
Transformation statements.
Examples
SELECT AGE < 45, INCOME > 20000

SEQUENCE
Purpose
The SEQUENCE command specifies a selector file into which the model and selector
information is to be stored. Later, perhaps in another MARS session, the selector file can
be recalled and the selector information viewed in the GUI.
Syntax
SEQUENCE <filename>
Remarks
Using the SEQUENCE command implies that the model at each backstep is to be estimated,
as if the ESTIMATE / ALL command had been issued. This can significantly
increase the time required for the model to run.
STORE
Purpose
The STORE command creates files in which MARS model information is saved for later
viewing, printing, or application to new data.
Syntax
STORE <filename>
The model is automatically saved with .mdl extension.

SUBMIT
Purpose
The SUBMIT command specifies a file from which command input is to be read.
Syntax
SUBMIT <filename>
Remarks
Filename is given the extension .CMD in the SUBMIT command. This extension default
may be overridden by placing quotes around the filename and explicitly giving a file
extension.
The default is to read commands from the keyboard.
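By analogy with the OUTPUT examples, the extension rule might be exercised as follows. The file names are hypothetical and serve only to show the default and quoted forms.

```
REM Read commands from BOSTON.CMD (default extension applied)
SUBMIT BOSTON
REM Read commands from a file with an explicit, non-default extension
SUBMIT "BOSTON.TXT"
```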

USE
Purpose
The USE command opens a SYSTAT format file for analysis, and lists the variable names
in the file.
Syntax
USE <filename>
Remarks
Filename is given the extension .SYS in the USE command. This extension default may
be overridden by placing quotes around the filename and explicitly giving a file extension.
You will need to enclose lowercase file names in quotes on UNIX platforms. Within
MARS, all non-quoted names are read as uppercase.
WEIGHT
Purpose
The WEIGHT command identifies a case-weighting variable.
Syntax
WEIGHT=<variable>
in which variable is a variable present in the USE dataset. The WEIGHT variable can
contain any non-negative real values.
XYPLOT
Purpose
The XYPLOT command produces 2-D scatter plots, plotting one or more y variables
against an x variable in separate graphs.
Syntax
XYPLOT <yvar1> [, <yvar2> , <yvar3> ] * <xvar> [ / FULL, TICKS | GRID,
    WEIGHTED, NORMALIZED, BIG]
The plot is normally a half screen high: the FULL and BIG options will increase it
to a full screen (24 lines) or a full page (60 lines).
TICKS and GRID add two kinds of horizontal and vertical gridding.
WEIGHTED requests plots weighted by the WEIGHT command variable.
NORMALIZED scales the vertical axis to 0 to 1 (or -1 to 1).
Some examples are:
XYPLOT IQ*AGE / FULL, GRID
XYPLOT LEVEL(4-7)*INCOME / NORMALIZED
XYPLOT AGE,WAGE,INDIC*DEPVAR(2) / WEIGHTED
Only numerical variables may be specified.

Appendix I. BASIC Programming Language

MARS, CART, LOGIT, SURVIVAL, CTM, ASCII, and other Salford Systems modules
contain an integrated implementation of a complete BASIC programming language for
transforming variables, creating new variables, filtering cases, and database programming.
Because the programming language is directly accessible anywhere in MARS, you can
perform a number of database management functions without invoking the data step of
another program.
The BASIC transformation language allows you to modify your input files on the fly while
you are in an analysis module and to save permanent copies of your changed data in
ASCII. We expect users will find that they can accomplish almost any required data
manipulation involving a single data file.
Although this integrated version of BASIC is much more powerful than the simple variable
transformation functions sometimes found in other statistical procedures, it is not meant
to be a replacement for more comprehensive data steps found in general use statistics
packages. At present, integrated BASIC does not permit the merging or appending of
multiple files, nor does it allow processing across observations. In Salford Systems'
statistical analysis packages, the programming work space for BASIC is limited and is
intended for on-the-fly data modifications of 20 to 40 lines of code (though custom large
work space versions will accommodate larger BASIC programs). For more complex or
extensive data manipulation, we recommend you use the large work space for BASIC in
ASCII or your preferred database management software.
The remainder of this appendix describes what you can do with BASIC and provides
simple examples to get you started.
Getting Started
Your BASIC program will consist of a series of statements which all begin with a % sign.
These statements could comprise simple assignment statements that define new variables,
conditional statements that delete selected cases, iterative loops that repeatedly execute
a block of statements, and complex programs with the flow control provided by GOTO
statements and line numbers. Thus, somewhere before a HOT! Command such as
ESTIMATE or RUN in a Salford module, you might type:
% LET BESTMAN = WINNER
% IF MONTH=8 THEN LET GAMES = BEGIN
% ELSE IF MONTH>8 THEN LET GAMES = ENDED
% LET ABODE= LOG (CABIN)
% DIM COLORS(10)
% FOR I= 1 TO 10 STEP 2
% LET COLORS(I) = Y * I
% NEXT
% IF SEX$="MALE" THEN DELETE
The % symbol appears only once at the beginning of each line of BASIC code; it should
not be repeated anywhere else on the line. You can leave a space after the % symbol or
you can start typing immediately; BASIC will accept your code either way.
Our programming language uses standard statements found in many dialects of BASIC.
These include:
LET
Assigns a value to a variable. The form of the statement is:
% LET variable = expression
IF...THEN
Evaluates a condition, and if it is true, executes the statement following the THEN. The
form is:
% IF condition THEN statement
ELSE
Can immediately follow an IF...THEN statement to specify a statement to be executed
when the preceding IF condition is false. The form is:
% ELSE statement
Alternatively, ELSE may be combined with other IF...THEN statements:
% ELSE IF condition THEN statement
% ELSE IF condition THEN statement
% ELSE statement
FOR...NEXT
Allows for the execution of the statements between the FOR statement and a subsequent
NEXT statement as a block. The form of the simple FOR statement is:
% FOR
% statements
% NEXT
For example, you might execute a block of statements only if a condition is true, as in:
%IF WINE=COUNTRY THEN FOR
%LET FIRST=CABERNET
%LET SECOND=RIESLING
%NEXT
When an index variable is specified on the FOR statement, the statements between the
FOR and NEXT statements are looped through repeatedly while the index variable remains
between its lower and upper bounds:
% FOR [index variable and limits]
% statements
% NEXT
The index variable and limits form is:
FOR I= start-number TO stop-number [ STEP = stepsize ]
where I is an integer index variable that is increased from start-number to stop-number in
increments of stepsize. The statements in the block are processed first with I = start-number,
then with I = start-number + stepsize, and repeated until I >= stop-number. If
STEP=stepsize is omitted, the default is to step by 1. Nested FOR...NEXT loops are not
allowed.
DIM
Creates an array of subscripted variables. For example, a set of 5 scores could be set up
with:
% DIM SCORE(5)
This creates the variables SCORE(1), SCORE(2), ..., SCORE(5).
The size of the array must be specified with a literal integer up to a maximum size of 99;
variable names may not be used. You can use more than one DIM statement, but be
careful not to create so many large arrays that you exceed the maximum number of
variables allowed (currently 8019).
DELETE
Deletes the current case from the data set.
OPERATORS
The table below lists the operators that can be used in BASIC statement expressions.
Operators are evaluated in the order they are listed in each row with one exception: a
minus sign before a number (making it a negative number) is evaluated after exponentiation
and before multiplication or division. The "<>" is the "not equal" operator.
Numeric operators () ^ * / + -
Relational operators < <= <> = >= >
Logical operators AND OR NOT
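The exception for a leading minus sign can be illustrated with a short sketch. The variable names are hypothetical, and the results noted in the comments follow from the precedence rule as stated above rather than from a recorded run.

```
REM Exponentiation is applied before the leading minus sign: -(3^2) = -9
% LET NEGSQ = -3^2
REM Parentheses force the negation first: (-3)^2 = 9
% LET POSSQ = (-3)^2
```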
BUILT-IN SPECIAL VARIABLES

BASIC has five built-in variables available for every data set. You can use these variables
in flow control and create new variables from them. You should NEVER attempt to redefine
them or change their values directly.
Built-in
Variable  Definition                                  Values
CASE      observation number                          1 to maximum observation number
BOF       logical variable for beginning of file      1 if beginning of file, 0 otherwise
EOF       logical variable for end of file            1 if end of file, 0 otherwise
BOG       logical variable for beginning of by-group  1 if beginning of by-group, 0 otherwise
EOG       logical variable for end of by-group        1 if end of by-group, 0 otherwise
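As a sketch of how the built-in variables might be read (never assigned), the statements below filter on CASE and copy BOF into a new variable. The cutoff of 1000 and the name FIRSTREC are hypothetical.

```
REM Keep only the first 1000 observations
% IF CASE > 1000 THEN DELETE
REM FIRSTREC is 1 for the first record of the file and 0 otherwise
% LET FIRSTREC = BOF
```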
MATHEMATICAL AND STATISTICAL FUNCTIONS

Integrated BASIC also has a number of mathematical and statistical functions. The
statistical functions can take several variables as arguments and automatically adjust for
missing values. Only numeric variables may be used as arguments. The general form of
the function is:
FUNCTION(variable, variable, ...)
Function  Definition                Examples
AVG       arithmetic mean           % LET XMEAN = AVG(X1,X2,X3)
MAX       maximum                   % LET BEST=MAX(Y1,Y2,Y3,Y4,Y5)
MIN       minimum
MIS       number of missing values
STD       standard deviation
SUM       summation
Note: These statistical functions will automatically adjust for the presence of
missing values. Thus, if X1 is missing for a case, AVG(X1,X2,X3) is equal to (X2+X3)/2.
Function  Definition                Examples
ABS       absolute value            %LET VAL= ABS(X)
ACS       arccosine
ASN       arcsine
ATH       arc hyperbolic tangent
ATN       arctangent
COS       cosine
EXP       exponential function
LOG       natural logarithm         %LET LOGXY=LOG(X+Y)
SIN       sine
SQR       square root               %LET PRICESR=SQR(PRICE)
TAN       tangent
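The statistical functions can be combined with flow control. The following sketch, using hypothetical variables X1 to X3, counts missing values per case, drops cases with nothing observed, and averages whatever is present.

```
REM Count how many of the three scores are missing for each case
% LET NMISS = MIS(X1,X2,X3)
REM Drop cases where all three scores are missing
% IF NMISS = 3 THEN DELETE
REM AVG adjusts for missing values, so it averages only the scores present
% LET XMEAN = AVG(X1,X2,X3)
```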
Integrated BASIC also includes a collection of probability functions that can be used to
determine probabilities and confidence level critical values, and to generate random numbers.
The following table shows the distributions and any parameters that are needed to obtain
values for either the random draw, the cumulative distribution, the density function, or the
inverse density function.
Distribution  Key     Random        Cumulative (C)     Comments
              Letter  Draw          Density (D)        (alpha is the probability for
                                    Inverse (I)        inverse density functions)
Beta          B       BRN           BCF(beta,p,q)      beta = beta value
                                    BDF(beta,p,q)      p,q = beta parameters
                                    BIF(alpha,p,q)
Binomial      N       NRN(n,p)      NCF(x,n,p)         n = number of trials
                                    NDF(x,n,p)         p = prob of success in trial
                                    NIF(alpha,n,p)     x = binomial count
Chi-Squared   X       XRN(df)       XCF(chisq,df)      chisq = chi-squared value
                                    XDF(chisq,df)      df = degrees of freedom
                                    XIF(alpha,df)
Exponential   E       ERN           ECF(x)             x = exponential value
                                    EDF(x)
                                    EIF(alpha)
F             F       FRN(df1,df2)  FCF(F,df1,df2)     df1 and df2 = degrees of freedom
                                    FDF(F,df1,df2)     F = F-value
                                    FIF(alpha,df1,df2)
Gamma         G       GRN(p)        GCF(gamma,p)       p = shape parameter
                                    GDF(gamma,p)       gamma = gamma value
                                    GIF(alpha,p)
Logistic      L       LRN           LCF(x)             x = logistic value
                                    LDF(x)
                                    LIF(alpha)
Normal        Z       ZRN           ZCF(z)             z = normal z-score
(Standard)                          ZDF(z)
                                    ZIF(alpha)
Poisson       P       PRN(p)        PCF(x,p)           p = Poisson parameter
                                    PDF(x,p)           x = Poisson value
                                    PIF(alpha,p)
Studentized   S       SRN(k,df)     SCF(s,k,df)        k = parameter
                                    SDF(s,k,df)        df = degrees of freedom
                                    SIF(alpha,k,df)
t             T       TRN(df)       TCF(t,df)          df = degrees of freedom
                                    TDF(t,df)          t = t-statistic
                                    TIF(alpha,df)
Uniform       U       URN           UCF(x)             x = uniform value
                                    UDF(x)
                                    UIF(alpha)
Weibull       W       WRN(p,q)      WCF(x,p,q)         p = scale parameter
                                    WDF(x,p,q)         q = shape parameter
                                    WIF(alpha,p,q)
These functions are invoked with the arguments indicated in the table above (from none to
three, depending on the distribution and function), and return a single number, which is
either a random draw, a cumulative probability, a probability density, or a critical value
for the distribution.
We will illustrate the use of these functions with the chi-square distribution. To generate
10 random draws from a chi-square distribution with 35 degrees of freedom for each case
in your data set:
% DIM CHISQ(10)
% FOR I= 1 TO 10
% LET CHISQ(I)=XRN(35)
% NEXT
To evaluate the probability that a chi-square variable with 20 degrees of freedom exceeds
27.5:
%LET CHITAIL=1 - XCF(27.5, 20)
The chi-square density for the same chi-square value is obtained with:
%LET CHIDEN=XDF(27.5, 20)
Finally, the upper 5% point (the 95th percentile) of the chi-squared distribution with 20
degrees of freedom is calculated with:
%LET CHICRIT=XIF(.95, 20)
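The same pattern applies to the other distributions in the table. As a sketch using the standard normal functions ZCF and ZIF (the variable names are placeholders):

```
REM Upper-tail probability beyond a z-score of 1.96
% LET ZTAIL = 1 - ZCF(1.96)
REM Critical value with cumulative probability .975
% LET ZCRIT = ZIF(.975)
```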

Missing Values
The system missing value is stored internally as the largest negative number allowed.
Missing values in BASIC programs and printed output are represented with a period or dot
("."), and missing values can be generated and their values tested using standard expres-
sions.
Thus, you might type:
%IF NOSE=LONG THEN LET ANSWER=.
%IF STATUS=. THEN DELETE
Missing values are propagated so that most expressions involving variables that have
missing values will themselves yield missing values.
One important fact to note: because the missing value is technically a very large negative
number, the expression X < 0 will evaluate as true if X is missing.
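The consequence of storing missing values as a very large negative number can be sketched as follows. This is Python rather than MARS BASIC, and the sentinel value shown is a hypothetical stand-in, since the actual internal value is not given here. The point is only that a plain "less than zero" test also catches missing values unless missing is excluded first.

```python
# Hypothetical sentinel standing in for the system missing value.
MISSING = -1.0e36

def naive_is_negative(x):
    # Mirrors the bare test X < 0: a missing value passes it.
    return x < 0

def safe_is_negative(x):
    # Exclude missing first, then compare.
    return x != MISSING and x < 0

values = [-2.5, 0.0, 3.0, MISSING]
naive = [naive_is_negative(v) for v in values]
safe = [safe_is_negative(v) for v in values]
```

In a BASIC program the analogous guard would presumably test the variable against the missing value (.) before applying the comparison.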
BASIC statements included in your command stream are executed when a HOT! command
such as ESTIMATE, APPLY, or RUN is encountered; thus, they are processed before any
estimation or model building is attempted. This means that any new variables created in
BASIC are available for use in MODEL and KEEP statements, and any cases that are
DELETEd via BASIC will not be used in the analysis.
More Examples
It is easy to create new variables or change old variables using BASIC. The simplest
statements create a new variable from other variables already in the data set. For example:
% LET PROFIT = PRICE * QUANTITY2 * LOG(SQFTRENT) - 5*SQR(QUANTITY)
BASIC allows for easy construction of Boolean variables, which take a value of one if true
and zero if false. In the following statement, the variable XYZ would have a value of 1 if any
condition on the right-hand side is true, and 0 otherwise.
% LET XYZ = X1<.5 OR X2>17 OR X3=6
Suppose your data set contains variables for gender and age, and you want to create a
categorical variable with levels for male-senior, female-senior, male-non-senior, female-
non-senior. You might type:
% IF MALE = . OR AGE = . THEN LET NEWVAR = .
% ELSE IF MALE = 1 AND AGE < 65 THEN LET NEWVAR=1
% ELSE IF MALE = 1 AND AGE >= 65 THEN LET NEWVAR=2
% ELSE IF MALE = 0 AND AGE < 65 THEN LET NEWVAR=3
% ELSE LET NEWVAR = 4
If the measurement of several variables changed in the middle of the data period, conversions
can be easily made with the following:
% IF YEAR > 1986 OR MEASTYPE$="OLD" THEN FOR
% LET TEMP = (OLDTEMP-32)/1.80
% LET DIST = OLDDIST / .621
% NEXT
% ELSE FOR
% LET TEMP = OLDTEMP
% LET DIST = OLDDIST
% NEXT
If you would like to create powers of a variable (square, cube, etc.) as independent
variables in a polynomial regression, you could type something like:
% DIM AGEPWR(5)
% FOR I = 1 TO 5
% LET AGEPWR(I) = AGE^I
% NEXT
Filtering or Splitting the Data Set

Integrated BASIC can be used for flexibly filtering observations. To remove observations
with SSN missing try:
% IF SSN= . THEN DELETE
To delete the first 10 observations type:
% IF CASE <= 10 THEN DELETE
Because you can construct complex Boolean expressions with BASIC, using program-
ming logic combined with the DELETE statement gives you far more control than is
available with the simple SELECT statement. For example:
% IF AGE>50 OR INCOME<15000 OR (REGION=9 AND GOLF=.) THEN DELETE
It is often useful to draw a random sample from a data set to fit a problem into memory or
to speed up a preliminary analysis. By using the uniform random number generator in
BASIC, this is easily accomplished with a one-line statement:
% IF URN < .5 THEN DELETE
The data set can be divided into an analysis portion and a separate test portion distinguished
by the variable TEST:
% LET TEST= URN < .4
This sets TEST equal to 1 in approximately 40% of all cases and 0 in all other cases. The
following draws a stratified random sample taking 10% of the first stratum and 50% of all
other strata:
% IF DEPVAR = 1 AND URN > .1 THEN DELETE
% ELSE IF DEPVAR<>1 AND URN < .5 THEN DELETE
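The sampling logic in this section can be sketched in Python (not MARS BASIC) to show the proportions it should produce. The function name, the seed, and the synthetic data are assumptions for illustration; each call to the generator stands in for one evaluation of URN. The sketch keeps about 10% of the first stratum and 50% of the others, then flags about 40% of the retained cases as test cases.

```python
import random

def split_and_sample(depvars, seed=12345):
    """Keep ~10% of cases with DEPVAR = 1 and ~50% of the rest; flag ~40% as TEST."""
    rng = random.Random(seed)
    kept = []
    for dep in depvars:
        keep_prob = 0.10 if dep == 1 else 0.50
        if rng.random() < keep_prob:          # one fresh uniform draw per case
            test = 1 if rng.random() < 0.4 else 0
            kept.append((dep, test))
    return kept

# Synthetic data: 20,000 cases in the first stratum, 20,000 in the others.
cases = [1] * 20000 + [0] * 20000
sample = split_and_sample(cases)
n_first = sum(1 for dep, _ in sample if dep == 1)
n_other = sum(1 for dep, _ in sample if dep != 1)
test_share = sum(t for _, t in sample) / len(sample)
```

Because the draws are random, the retained counts are only approximately 10% and 50% of each stratum; fixing the seed makes a run repeatable.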
Advanced Programming Features

Integrated BASIC also allows statements to have line numbers that facilitate the use of
flow control with GOTO statements. Line numbers must be integers less than 32,000,
and we recommend that if you use any line numbers at all, all your BASIC statements
should be numbered. BASIC will execute the numbered statements in the order of the line
numbers, regardless of the order in which the statements are typed. Unnumbered BASIC
statements are executed before numbered statements.
Here is an example of using the GOTO:
%10 IF PARTY=GOP THEN GOTO 96
%20 LET NEWDEM=1
%30 LET VEEP$="GORE"
%40 GOTO 99
%96 LET VEEP$="KEMP"
%99 LET CAMPAIGN=1
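The execution rule for numbered statements (run in line-number order regardless of typing order, with GOTO changing the next line to run) can be sketched with a toy interpreter. This is Python rather than MARS BASIC; the function names are hypothetical, and PARTY is treated as a character value purely for illustration.

```python
def run_numbered(program, env):
    """Run {line_number: action}; an action returns a target line number or None."""
    order = sorted(program)                 # execution follows line numbers
    pos = 0
    while pos < len(order):
        target = program[order[pos]](env)
        if target is None:
            pos += 1                        # fall through to the next higher line
        else:
            pos = order.index(target)       # GOTO: jump to the named line
    return env

def make_program():
    # Entered out of order on purpose; the statements mirror the GOTO example.
    return {
        99: lambda e: e.update(CAMPAIGN=1),
        10: lambda e: 96 if e["PARTY"] == "GOP" else None,
        40: lambda e: 99,
        20: lambda e: e.update(NEWDEM=1),
        30: lambda e: e.update(VEEP="GORE"),
        96: lambda e: e.update(VEEP="KEMP"),
    }

gop = run_numbered(make_program(), {"PARTY": "GOP"})
dem = run_numbered(make_program(), {"PARTY": "DEM"})
```

In the first run, line 10 jumps straight to line 96, so NEWDEM is never set; in the second, lines 20 through 40 run and line 40 skips over line 96.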
IF...THEN Statement
Purpose
Evaluates a condition and, if it is true, executes the statement following the THEN.
Syntax
The form is:
% IF condition THEN statement
An IF...THEN may be combined with an ELSE statement in two ways. First, the ELSE
may be used simply to provide an alternative statement when the condition is not true:
% IF condition THEN statement1
% ELSE statement2
Second, the ELSE may be combined with an IF...THEN to link conditions:
% IF condition1 THEN statement1
% ELSE IF condition2 THEN statement2
To allow multiple statements to be conditionally executed, combine the IF...THEN with a
FOR...NEXT:
% IF condition THEN FOR
% statement
% statement
% NEXT
Examples
To remove outlier cases from the data set:
% IF ZCF(ABS((z-zmean)/zstd))>.95 THEN DELETE
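The outlier rule above deletes a case when the standard normal cumulative probability of its absolute standardized value exceeds .95, which corresponds to an absolute z-score above roughly 1.645. A minimal Python sketch of the same rule follows (not MARS BASIC). Here the mean and standard deviation are computed from the data, whereas the BASIC statement assumes zmean and zstd already exist; the sample data are invented for illustration.

```python
from statistics import NormalDist, mean, stdev

def drop_outliers(values, cutoff=0.95):
    """Keep values whose standardized absolute deviation has normal CDF <= cutoff."""
    m, s = mean(values), stdev(values)
    std_normal = NormalDist()               # standard normal, like ZCF
    return [v for v in values
            if std_normal.cdf(abs((v - m) / s)) <= cutoff]

data = [10, 11, 9, 10, 12, 8, 10, 11, 9, 40]
kept = drop_outliers(data)
```

Only the value 40 is far enough from the mean to be dropped in this sample.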

LET Statement
Purpose
Assigns a value to a variable.
Syntax
The form of the statement is:
% LET variable = expression
The expression can be any mathematical expression or a logical (Boolean) expression. If
the expression is Boolean, then the variable defined will take a value of one if the
expression is true, or zero if it is false. The expression may also contain logical operators
such as AND, OR and NOT.
Examples
% LET AGEMONTH = 12*(YEAR - BYEAR) + (MONTH - BMONTH)
% LET SUCCESS =(MYSPEED = MAXSPEED)
% LET COMPLETE = (OVER = 1 OR END=1)
ELSE Statement
Purpose
Follows an IF...THEN to specify statements to be executed when the condition following
a preceding IF is false.
Syntax
The simplest form is:
% IF condition THEN statement1
% ELSE statement2
The statement2 can be another IF...THEN statement, thus allowing IF...THEN statements
to be linked into more complicated structures. For more information, see the section for
IF...THEN.
Examples
% 5 IF TRUE=1 THEN GOTO 20
% 10 ELSE GOTO 30
% IF AGE <=2 THEN LET AGEDES$ = "baby"
% ELSE IF AGE <= 18 THEN LET AGEDES$ = "child"
% ELSE IF AGE < 65 THEN LET AGEDES$ = "adult"
% ELSE LET AGEDES$ = "senior"
DIM Statement
Purpose
Creates an array of subscripted variables.
Syntax
% DIM var(n)
where n is a literal integer. Variables of the array are then referenced by variable name and
subscript, such as var(1), var(2), etc.
In an expression, the subscript can be another variable, allowing these array variables to
be used in FOR...NEXT loop processing. See the section on the FOR...NEXT statement
for more information.
Examples
% DIM QUARTER(4)
% DIM MONTH(12)
% DIM REGION(9)
FOR...NEXT Statement
Purpose
Allows for the processing of steps between the FOR statement and an associated NEXT
statement as a block. When an optional index variable is specified, the statements are
looped through repetitively while the value of the index variable is in a specified range.
Syntax
The form is:
% FOR [index variable and limits]
% statements
% NEXT
The index variable and limits are optional but, if used, they are of the form
x = y TO z [STEP=s]
where x is an index variable that is increased from y to z in increments of s. The statements
are processed first with x = y, then with x = y + s, and so on until x = z. If STEP=s
is omitted, the default is to step by 1.
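The sequence of index values described above can be sketched in Python (not MARS BASIC). The helper name is hypothetical, and the sketch assumes a positive step; it only covers cases where the step lands exactly on the upper limit, since the behavior otherwise is not stated here.

```python
def index_values(y, z, s=1):
    """Values taken by the index in FOR x = y TO z STEP=s (positive s assumed)."""
    values = []
    x = y
    while x <= z:
        values.append(x)
        x += s
    return values

# Mirrors filling an array of powers, as in an AGEPWR(I) = AGE^I loop.
age = 3
agepwr = {i: age ** i for i in index_values(1, 5)}
```

With the default step of 1 the index runs 1, 2, 3, 4, 5; with y = 1, z = 10, and s = 3 it runs 1, 4, 7, 10.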
Remarks
Nested FOR...NEXT loops are not allowed, and a GOTO that is external to the loop may
not refer to a line within the FOR...NEXT loop. However, GOTOs may be used to leave a
FOR...NEXT loop or to jump from one line in the loop to another within the same loop.
Examples
To have an IF...THEN statement execute more than one statement if it is true:
% IF X<15 THEN FOR
% LET Y=X+4
% LET Z=X-2
% NEXT
DELETE Statement
Purpose
Drops the current case from the data set.
Syntax
% DELETE
% IF condition THEN DELETE
Examples
To keep a random sample of 75% of a data set for analysis:
% IF URN < .25 THEN DELETE

GOTO Statement
Purpose
Jumps to a specified numbered line in the BASIC program.
Syntax
The form for the statement is:
% GOTO ##
where ## is a line number within the BASIC program.
Remarks
This is often used with an IF...THEN statement to allow certain statements to be executed
only if a condition is met.
If line numbers are used in a BASIC program, all lines of the program should have a line
number. Line numbers must be positive integers less than 32000.
Examples
% 10 GOTO 20
% 20 STOP
% 10 IF X=. THEN GOTO 40
% 20 LET Z=X*2
% 30 GOTO 50
% 40 LET Z=0
% 50 STOP
STOP Statement
Purpose
Stops the processing of the BASIC program on the current observation. The observation is
kept but any BASIC statements following the STOP are not executed.
Syntax
% STOP
Examples
%10 IF X = 10 THEN GOTO 40
%20 ELSE STOP
%40 LET X = 15
Appendix II. Further Reading and References

To the best of our knowledge, as of the writing of this User's Guide, the documentation for
Salford Systems' MARS software constitutes the only extended discussion of MARS.
MARS is referenced in over 120 scientific publications, dating back to 1994. A list can be
downloaded from our website at http://www.salford-systems.com/MARSCITE.PDF.
Friedman's articles are challenging classics but definitely worth the effort.
Friedman, J.H. (1988), Fitting Functions to Noisy Data in High Dimensions. Proc.,
Twentieth Symposium on the Interface, Wegman, Gantz and Miller, eds., American
Statistical Association, Alexandria, VA, 3-43.
Friedman, J.H. (1991a), Multivariate Adaptive Regression Splines (with discussion), Annals
of Statistics, 19, 1-141 (March).
Friedman, J.H. (1991b), Estimating Functions of Mixed or Ordinal and Categorical
Variables Using Adaptive Splines. Department of Statistics, Stanford University,
Technical Report LCS108.
Friedman, J.H. and Silverman, B.W. (1989), Flexible Parsimonious Smoothing and
Additive Modeling (with discussion), Technometrics, 31, 3-39 (February).
DeVeaux et al. provide examples in which MARS outperforms neural networks:
DeVeaux R.D., Psichogios D.C., and Ungar L.H. (1993), A Comparison of Two
Nonparametric Estimation Schemes: MARS and Neural Networks, Computers &
Chemical Engineering, 17, 8.
Additional References
Belsley, D. A., E. Kuh, and R. Welsch, Regression Diagnostics, New York: Wiley,
1980.
Breiman, Leo, Jerome Friedman, Richard Olshen, and Charles Stone, Classification and
Regression Trees, Pacific Grove: Wadsworth, 1984.
Harrison, D. and D. Rubinfeld, Hedonic Housing Prices and Demand for Clean Air,
Journal of Environmental Economics and Management, 5, 81-102, 1978.
Scott, David W., Multivariate Density Estimation, New York: Wiley, 1992.
Steinberg, Dan and Phillip Colla, CART: Tree-Structured Non-parametric Data Analy-
sis, San Diego, CA: Salford Systems, 1995.
Please visit our web site for updates on new publications about MARS.
http://www.salford-systems.com