
Tutorial: Predicting with SAS Enterprise Miner

Matthew Beauregard
March 16, 2005

1 Important note
SAS is a very large, very complex piece of software. Enterprise Miner, while making up only a small part
of the SAS system, is itself very large and complex. Until you understand what you are doing, follow these
instructions carefully and in order. It is very possible to make a mistake from which you cannot recover,
except by building your system all over again.
Note to FIT Linux lab users: apart from not following the instructions, the two most common reasons
for strange errors in SAS are that you are out of free space on the network storage, or that you cancelled
the VMware login box after starting Windows. To check your free space, ssh to charlie and type
quota -v. Also, a helpdesk technician from Technical Services alleges that the Desktop is temporary local storage,
not network storage, so you might try using that as working space.

2 Startup
1. Download the datafiles from
http://www-staff.it.uts.edu.au/~mbeaureg/topics/prediction_in_sas/data
and extract the contents.
2. Run The SAS System and choose Solutions → Analysis → Enterprise Miner.
3. Choose File → New → Project, name it NN and click Create.
4. Right-click the empty pane on the right, choose Add Node and add an Input Data Source, a Data
Partition, a Replacement and a Neural Network. Arrange these left to right.
5. Hover your mouse cursor over the right edge of the Input Data Source until it becomes crosshairs.
Drag a connecting arrow to the left edge of the Data Partition. Repeat to connect the other nodes in
a line.
6. In the Explorer window (left), double-click Libraries, then right-click an empty area. Choose New.
7. Enter TUTORIAL as the name and click Browse. Navigate into the folder you extracted from the zipfile
and click OK, then OK again.
8. Double-click Input Data Source then click Select. Choose the TUTORIAL library and ORGANICS data.
Click OK.

3 Walkthrough: finding organics buyers


The ORGANICS dataset contains information about customers of a supermarket. The target variable is ORGYN,
a boolean that says whether or not the customer is interested in purchasing organic food products.

1. If the Input Data Source window is not open, double-click that node.

2. Choose the Variables tab and right-click input beside ORGYN. Choose Set Model Role → target.
Close the window. Save changes.
3. Double-click the Data Partition node. Set the percentages to 60% train, 20% validation and 20% test.
Close the window. Save changes.

4. Double-click the Neural Network node. Choose the Basic tab. Set the Runtime limit to 10 minutes.
5. Click the triangle beside Multilayer Perceptron. Choose the hidden neurons preset for Moderate noise
data. Choose OK, close the window, save changes. Call your model NN.
6. Ensure that the Neural Network node is selected (dotted outline) and choose Actions → Run.
Enterprise Miner will traverse all the preprocessing nodes before displaying a training/validation
performance graph. Training will cease after about 15 iterations because the model becomes perfect. Once
calculation is complete, view the results.
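
The 60/20/20 split chosen in the Data Partition node can be pictured as a shuffle followed by two cuts. The sketch below is only an illustration in Python, not what Enterprise Miner runs internally; the function name partition, the seed value and the use of a simple random shuffle are all assumptions made for the example.

```python
import random


def partition(rows, fractions=(0.6, 0.2, 0.2), seed=12345):
    """Shuffle rows and cut them into train, validation and test lists.

    Hypothetical helper: the fractions mirror the 60/20/20 setting in the
    walkthrough, and the seed is arbitrary.
    """
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed gives a repeatable split
    n_train = round(len(shuffled) * fractions[0])
    n_valid = round(len(shuffled) * fractions[1])
    train = shuffled[:n_train]
    valid = shuffled[n_train:n_train + n_valid]
    test = shuffled[n_train + n_valid:]  # whatever remains goes to test
    return train, valid, test


# Stand-in data: 1000 row identifiers instead of real customer records.
train, valid, test = partition(range(1000))
print(len(train), len(valid), len(test))
```

The model is fitted on the training rows, the validation rows are used to monitor it while it trains, and the test rows are held back for a final check.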

3.1 Questions
1. From the Tables tab, what is the misclassification rate on validation data?
2. Examine the training graph on the Plot tab. Is there any difference between performance on the
training and validation data sets?
3. What features in the data might lead to this perfect performance?

4 Walkthrough: organics buyers again


1. Close the Results window if it is open, and open the Input Data Source node (which may now be called
TUTORIAL.ORGANICS). In the Variables tab, set the Model Role for ORGANICS to rejected. Close the
window, save changes.
2. Run the Neural Network node again. When the Neural Network Monitor appears click Continue.
You may stop training after about 12 iterations.
3. View the results.

4.1 Questions
1. Describe the training performance plot. What do you think would have happened if we allowed training
to continue?

2. Is this a useful predictive model?

5 Walkthrough: multiple models


1. Add two Tree nodes, connected from your Data Partition to a Control Point.

2. Connect your neural network to the Control Point.


3. Add an Assessment node connected from the Control Point.
4. Configure one of the Trees to have a maximum of 4 branches rather than 2. Call this tree 4way Tree
when prompted. Close, save.

5. Click on the tree’s label in the diagram and rename it to 4way Tree.

6. Run your network from the Assessment node. Examine the results.
7. Highlight all the models and choose Tools → Lift Chart.
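
A lift chart compares the response rate among a model's highest-scored cases with the response rate over all cases. As a rough sketch of the idea in Python (assuming cumulative lift over equal-sized depth bins, which may differ from the exact statistics Enterprise Miner plots; the name cumulative_lift and the sample values are hypothetical):

```python
def cumulative_lift(actual, scores, n_bins=10):
    """Return the cumulative lift at each depth bin.

    actual: 1 for a responder, 0 otherwise.
    scores: predicted probability of response for the same cases.
    Hypothetical helper, written for illustration only.
    """
    if len(actual) != len(scores) or not actual:
        raise ValueError("actual and scores must be non-empty and equal length")
    overall_rate = sum(actual) / len(actual)
    if overall_rate == 0:
        raise ValueError("lift is undefined when there are no responders")
    # Rank cases from highest to lowest predicted probability.
    ranked = [a for _, a in sorted(zip(scores, actual),
                                   key=lambda pair: pair[0], reverse=True)]
    lifts = []
    for b in range(1, n_bins + 1):
        cutoff = max(1, round(len(ranked) * b / n_bins))
        top = ranked[:cutoff]
        lifts.append((sum(top) / len(top)) / overall_rate)
    return lifts


# Made-up example: ten cases, two responders, both ranked first by the model.
actual = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.0]
print(cumulative_lift(actual, scores, n_bins=5))
```

A lift well above 1 in the first bins means the model concentrates responders at the top of its ranking. At full depth the lift is always 1.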

6 Walkthrough: ensemble model


An ensemble model combines several other models.

1. Add an Ensemble connected from the Control Point to the Assessment.


2. Run your network from the Assessment node. Is the ensemble model different?
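
One common way to combine models is to average the probabilities they predict for each case. The Python sketch below assumes that approach purely for illustration; how the Ensemble node actually combines its inputs depends on its settings, and the function name and probability values here are hypothetical.

```python
def ensemble_average(model_probs):
    """Average predicted probabilities case by case across several models.

    model_probs: one list of probabilities per model, all the same length.
    Hypothetical helper; simple averaging is an assumption, not a
    description of the Ensemble node.
    """
    if not model_probs:
        raise ValueError("need at least one model")
    n_cases = len(model_probs[0])
    if any(len(probs) != n_cases for probs in model_probs):
        raise ValueError("every model must score the same cases")
    return [sum(case) / len(model_probs) for case in zip(*model_probs)]


# Made-up predictions from three models for the same four cases.
neural_net = [0.90, 0.20, 0.60, 0.10]
tree = [0.70, 0.40, 0.50, 0.30]
four_way_tree = [0.80, 0.30, 0.70, 0.20]
print(ensemble_average([neural_net, tree, four_way_tree]))
```

Averaging tends to smooth out the errors of individual models, which is why an ensemble can differ from, and sometimes outperform, each of its members.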

7 Walkthrough: loading your own data and making predictions


1. Load abalone into Excel and separate it into two files, known and production, with about 80% of the data
in known. Both files should contain the field headings in the first line.
2. Make a new library by right-clicking in the Explorer window and choosing New. Give it a name and
set the path to wherever your data is.
This will create an empty library, because your directory does not yet contain any SAS-formatted files.
3. Click in the Explorer window and choose File → Import Data. Follow the wizard to place the two
datafiles in your new library.
4. Add a Score node connected from a model or Control Point to an Insight node.
5. Add an Input Data Source for your production data, connected to Score. In the Data tab set Role
to SCORE.
6. In the Settings tab of your Score node, choose Apply training data score code to score data
set.
7. In the Data tab of your Insight node, choose Insight based on: Entire data set.
8. Choose Select... and expand the tree until you find the dataset beginning with EMDATA.SD. This set contains
the output of your Score node. Select it and choose OK. Close, save.
9. Run your network from the Insight node. Amongst all the extra variables will be one beginning with P_.
This is the prediction.
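
The known/production split in step 1 does not have to be done by hand in Excel. As an alternative sketch, assuming the abalone data is available as a CSV file with headings in its first line, the Python below writes two files that each repeat the headings. The file names, the function name and the random 80/20 shuffle are assumptions for the example.

```python
import csv
import random


def split_known_production(src, known_path, production_path,
                           known_fraction=0.8, seed=1):
    """Split a CSV file into 'known' and 'production' files.

    Both output files start with the heading row, as step 1 requires.
    Hypothetical helper; paths and seed are chosen by the caller.
    """
    with open(src, newline="") as f:
        reader = csv.reader(f)
        header = next(reader)  # first line holds the field headings
        rows = list(reader)
    random.Random(seed).shuffle(rows)  # avoid splitting on file order
    n_known = round(len(rows) * known_fraction)
    parts = ((known_path, rows[:n_known]),
             (production_path, rows[n_known:]))
    for path, part in parts:
        with open(path, "w", newline="") as out:
            writer = csv.writer(out)
            writer.writerow(header)
            writer.writerows(part)
    return n_known, len(rows) - n_known


# Example call (file names are placeholders):
# split_known_production("abalone.csv", "known.csv", "production.csv")
```

The two resulting files can then be imported into the new library with the wizard in step 3.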

7.1 Saving results into a SAS library, exporting


1. With your results sheet open, choose File → Save → Data.... Choose a destination library and give
your table a name.

2. Locate your table using the Explorer pane on the left, right-click it and choose Export. Follow the
wizard.
