Simca-P 11 PDF

Tutorial
SIMCA-P, SIMCA-P+
Version 11.0
By Umetrics AB
© 1992-2005 Umetrics AB
Information in this document is subject to change without notice
and does not represent a commitment on the part of Umetrics AB.
The software, which includes information contained in any
databases, described in this document is furnished under a license
agreement or non-disclosure agreement and may be used or copied
only in accordance with the terms of the agreement. It is against
the law to copy the software except as specifically allowed in the
license or nondisclosure agreement. No part of this manual may be
reproduced or transmitted in any form or by any means, electronic
or mechanical, including photocopying and recording, for any
purpose, without the express written permission of Umetrics AB.
SIMCA is a registered trademark of Umetrics; Windows is a
trademark of Microsoft Corporation.
Covers products:
SIMCA-P
SIMCA-P+
Manual edition date: May 16, 2005
UMETRICS AB
Box 7960
S-907 19 Umeå
Sweden
Tel. +46 (0)90 184800
Fax. +46 (0)90 184899
Email: info@umetrics.com
Home page: www.umetrics.com
Contents
How to get started with SIMCA
  Regular Project (non-Batch)
    General
    The Analysis cycle
    Import the primary data, create a new project
    View
    Pre-processing the data (Dataset menu)
    Prepare the data (Workset menu)
    Develop the model (Analysis menu)
    Fit the model
    Review the fit (Analysis menu)
    Predictions (Predictions menu)
    Plots/Lists
  Road map to SIMCA-P
  Batch Projects (SIMCA-P+ 10)
    General
    The Analysis cycle
Introduction
  General
    Plots and Lists
Foods
  Data
    Data table
  Objective
    Analysis Outline
  Define project
  Workset Wizard
  Analysis
    Scores and Loadings
    Third Component
    Summary
Mineral sorting at LKAB
  Introduction
    Data description
    Data table
  Objective
    Analysis outline
  Create the project
  Prepare the data
    Workset Wizard
  Analysis
    PC of Y
    Scores and Loadings
    PLS MODELING
    Refining the model
    Excluding observation 208 using the interactive tool box
    Removing some observations for a test set
    Observation Risk
    Predictions
    Summary
NIR
  Introduction
  Data
    Variables
    Observations
  Objective
  Analysis Outline
    The steps to follow in SIMCA-P are:
  Create the project
  Prepare the data
    Default Workset
    Transform the variables
  Analysis
    PLS model of all the samples
    Excluding sample 32
    Separate PLS models for the Sphagnum and Carex
    Sphagnum Model, class 2
    Model class 1 (Carex peat)
  Predictions
    Making a prediction Set
    Cooman's Plot
    Summary
    Plots and Lists
Hierarchical Models
  Introduction
  Data
  Objective
  Analysis Outline
    The steps to follow in SIMCA-P are:
  Create the project
  Summarizing the feed
    Workset
    Analysis
  Summarizing the reactor
    Workset
    Analysis
    Scores t1 vs. t2
    Loadings p1 and p3 (the 2 most important components)
  Summarizing the purification
    Workset
  Summarizing the less important Y's
    Workset
  Preparing for the hierarchical model
    Workset of the top level model
    Analysis
    The score plot (t1 vs. t2) of the top level model
    The w*c plot
    Coefficients
    Variable Importance (VIP)
    Observed vs. Predicted
  Predictions
    DModXPS
    Scores tPS1 vs. tPS2 colored by test set and training set
    Cusum Chart
  Conclusion
Spectral Filtering and Compression, including OPLS
  Introduction
  Data
  Objective
  Analysis Outline
    The steps to follow in SIMCA-P are:
  Create the project
  Plotting the Spectra
  Prepare the Data
    Workset
  Analysis
    PLS model
  Validating the Model 1
  Orthogonal Signal Correction and Wavelets Compression
  Model with the Signal corrected and compressed data
    Summary of the preprocessed project
    Change the default Scaling
  Validating the Model 2
  Conclusion OSC-Wavelets
  OPLS (Orthogonal PLS)
  Conclusions
Batch Modelling with SIMCA-P+
  Introduction
  Data
  Objectives
  Analysis Outline
    The steps in SIMCA-P are:
  Create the observation level project
  Analysis
    Batch Control charts (Training set)
  Monitoring new batches
    Import the secondary data set with the new batches
    Control Charts for new batches
    Prediction | Batch Control Charts | DModX
  Creating and Modelling the batch level project
    Analysis: Autofit
    Analysis: Scores
    Analysis | Batch Control Charts | Batch Variable Importance
  Predicting the quality of the new batches
    Predictions: T Predicted
    Predictions: Contribution Scores for batch 51
  Conclusion
Modelling of a Batch Digester
  Introduction
  Data
  Objectives
  Analysis Outline
    The steps in SIMCA-P are:
  Create the observation level project
  Specify the Workset
  Analysis
    Fitting All the Class models
    Scores Line plot of t1, t2 and t3
    Loadings p1, p2 and p3
    Batch Control charts (Training set)
  Monitoring new batches
    Creating the Prediction set: Complement of Workset
    Batch Control Chart of the Prediction set
    OOC plot
    Group Contribution plot
    Variable control chart
    Prediction | Batch Control Charts | DModX
  Creating and Modelling the batch level project
    Analysis: Autofit
    Analysis: Scores
    Analysis | Batch Variable Importance
  Predicting the quality of the prediction set batches
    Predictions: T Predicted
    Predictions: Contribution Scores for batch 28
    Predictions: Distance to the Model (DModX)
    Contribution Plot
  Conclusion
How to get started with SIMCA
Regular Project (non-Batch)

General
SIMCA-P is organized into projects. A project is a folder
containing the results of the analysis (unlimited number of models)
of a primary dataset.
You start a new project by importing its data (primary dataset).
Unfitted models are implicitly created by SIMCA-P when you
specify a Workset or with an existing Workset when you select
Active Model Type.
At the very beginning of a project, the default Workset consists of
all data with all variables centered and scaled to unit variance and
considered as X, and the model is a principal components model
(PC) of X.
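As a rough illustration of what "centered and scaled to unit variance" means for the default Workset, the sketch below autoscales each variable in plain Python. The function name `autoscale`, the toy variables and the use of the sample standard deviation (n - 1) are assumptions made for this example; they are not SIMCA-P commands or documented internals.

```python
import statistics

def autoscale(columns):
    """Center each variable and scale it to unit variance (UV scaling).

    `columns` maps a variable name to its list of observed values.
    Returns the scaled columns plus the (mean, sd) pair used per variable.
    """
    scaled, params = {}, {}
    for name, values in columns.items():
        mean = statistics.fmean(values)
        sd = statistics.stdev(values)  # assumption: sample sd with n - 1
        if sd == 0:
            raise ValueError(f"variable {name!r} is constant and cannot be scaled")
        scaled[name] = [(v - mean) / sd for v in values]
        params[name] = (mean, sd)
    return scaled, params

# Hypothetical toy data: two variables measured on four observations.
data = {"pH": [6.8, 7.0, 7.2, 7.4], "temp": [20.0, 25.0, 30.0, 35.0]}
scaled, params = autoscale(data)
```

After this step every variable has mean 0 and standard deviation 1, so variables measured in different units carry equal weight in a principal components model of X.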
The project window displays, for every model, one line
summarizing the model results.
The active model, the one you are working with, is also listed in a
list box to the left of the gray area (status bar) just beneath the
command menu bar.
To open a model, double click on it in the project window. This
opens a model window with the details (one line per component)
of the model results.
Another way to activate a model (if several are available) is to
select its name from the list box (upper left).
The Analysis cycle
1. Pre-processing and selection of data (Dataset and Workset
   menus).
2. The Dataset menu allows you to trim / winsorize your data,
   generate new variables, and perform spectral filtering or
   wavelet compression of the data.
   A model is developed from a Workset. The default Workset, to
   start, is the whole dataset with all variables as X and scaled
   to unit variance. This is also obtained by Workset | New.
   The Workset menu allows you to modify the starting Workset.
3. Specifying and fitting the model (Analysis menu).
4. Reviewing the results and performing diagnostics (Analysis
   menu).
5. Using the model for predictions (Predictions menu).
Import the primary data, create a new project

File: New
Select to import data from file or databases.
SIMCA-P imports files with the following format types:
DIF: Data interchange format (many applications can export DIF
files).
TXT: Standard delimited text file (one observation per line).
TXT: Free format text, with or without header.
MAT: Matlab version 4.0 files (binary).
XLS: All versions of EXCEL files.
Lotus 1-2-3: *.wk1 files
JCAMP-DX: *.jcm, *.dx, *.jdx
ANDI: Chromatography AIA files
NSAS: files
GRAM: Galactic *.spc files
Others (refer to chapter 4), including old SIMCA-P file types.
Select the Source file
Source Directory: The directory that contains the data file.
Name: Locate the source file, e.g., ENVIRO.DIF
Double click on the name of the source file.
Destination Directory: The directory in which to store the project,
e.g., C:\SIMDATA\ENVIRO.
You may also change the project directory (destination), if you
wish. By default SIMCA-P uses the source directory as the
destination directory.
Indicate file contents
Specify Primary and as many Secondary identifiers as desired for
both variables and observations.
Secondary datasets
Later you may import additional data (secondary datasets) for use
in predictions. You do that from the menu File | Import Secondary
Dataset.
View
Customize your display and specify the project-level and general
options.
Pre-processing the data (Dataset menu)

Plotting variables or observations from the dataset
Mark the variables or observations you want to plot, right click on
the marked objects and select the desired plots.
To plot all the X observations as a line plot, just right click on the
dataset and select Plot | Xobs.
Use the Dataset menus to view or modify a SIMCA-P dataset as
follows:
Quick Info
Interactive plots tied to the dataset displaying variables or
observations in the time or frequency domain.
Trimming / Winsorizing single, or all variables
Edit dataset
General Edit commands
Generate new variables
Generate new variables as functions of existing ones or from
model results
Spectral filter the dataset with:
Orthogonal Signal Correction (OSC)
Multiple Scatter Correction (MSC)
Standard Normal Variates (SNV)
1st and 2nd Derivatives
Wavelet transform and compression
PLS wavelet transform of time series
Decimation of time series
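Two of the operations listed above can be sketched in a few lines. This is a minimal illustration, assuming the common textbook definitions: winsorizing clamps the most extreme values to the nearest retained value, and Standard Normal Variates (SNV) centers and scales each spectrum by its own mean and standard deviation. The function names, the `fraction` parameter and the data are hypothetical, and SIMCA-P's own options may differ.

```python
import statistics

def winsorize(values, fraction=0.1):
    """Clamp the lowest and highest `fraction` of values to the nearest kept value."""
    if not 0 <= fraction < 0.5:
        raise ValueError("fraction must be in [0, 0.5)")
    n = len(values)
    k = int(n * fraction)  # number of values clamped at each end
    if k == 0:
        return list(values)
    ordered = sorted(values)
    low, high = ordered[k], ordered[n - k - 1]
    return [min(max(v, low), high) for v in values]

def snv(spectrum):
    """Standard Normal Variate: scale one spectrum by its own mean and sd."""
    mean = statistics.fmean(spectrum)
    sd = statistics.stdev(spectrum)
    return [(v - mean) / sd for v in spectrum]

# Hypothetical variable with one extreme reading, and a short made-up spectrum.
readings = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 100.0]
clamped = winsorize(readings, fraction=0.1)
corrected = snv([0.50, 0.52, 0.55, 0.61, 0.58])
```

Note the difference in direction: winsorizing works along one variable (a column), whereas SNV works along one observation (a row, that is, one spectrum).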
Prepare the data (Workset menu)

The default Workset, at the project start, is the whole dataset with
variables defined as Xs and Ys as specified at import, and scaled
to unit variance. The associated model (unfitted) is listed in the
active area. You are ready to fit a PLS model (default), or PC of X
or Ys, with all the data of the primary dataset. If this is what you
want, you can go directly to the Analysis menu.
To fit a different model, for example with excluded variables,
transformations, or different scaling, you must first modify the
Workset.
An unfitted model is generated by SIMCA-P when you specify the
Workset (select a starting Workset: New or New As Model).
Workset
New
Uses the whole original primary dataset with Xs and Ys as defined
at import
New As Model
Use the Workset of a selected model as starting point.
Modify the Workset as follows

Observations
Include / exclude observations or group them into classes for
classification.
Variables
Define X/Y variables, transformations, scaling, etc.
Transform
Transform variables.
Lag
Create lagged variables (SIMCA-P only).
Variables/Block
Select variables, and specify roles.
To select variables as X, Y or excluded, mark the variables as X, Y or
excluded and click on the Set button.
Expand
Expand the X matrix with cross terms, squares or cubes.
Scale
Select scaling base type (UV = centered, unit variance, Par = centered
and Pareto, etc.). A modifier can be selected (default = 1.0) that
changes the scaling of a variable relative to its base weight. Block
scaling can also be specified.
Trim / Winsorize variables
Trimming / Winsorizing the workset does not affect the dataset but
just that particular workset (refer to the workset chapter).
Options
Specify the model level options
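The Expand and Scale items above can be illustrated with a small sketch. It assumes that expansion simply appends squares and pairwise cross terms of the raw variables, that the base weight is 1/sd for UV and 1/sqrt(sd) for Pareto, and that the modifier multiplies the base weight. These are common conventions used here for illustration; the function names and column naming are hypothetical, not SIMCA-P's.

```python
import math
import statistics
from itertools import combinations

def expand(columns):
    """Return the columns plus their squares and pairwise cross terms."""
    expanded = dict(columns)
    for name, values in columns.items():
        expanded[f"{name}*{name}"] = [v * v for v in values]
    for (a, va), (b, vb) in combinations(columns.items(), 2):
        expanded[f"{a}*{b}"] = [x * y for x, y in zip(va, vb)]
    return expanded

def scale(values, base="UV", modifier=1.0):
    """Center a variable and multiply it by modifier * base weight.

    Assumed base weights: 1/sd for "UV", 1/sqrt(sd) for "Par" (Pareto).
    """
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    if base == "UV":
        weight = 1.0 / sd
    elif base == "Par":
        weight = 1.0 / math.sqrt(sd)
    else:
        raise ValueError(f"unknown scaling base {base!r}")
    return [modifier * weight * (v - mean) for v in values]

# Hypothetical two-variable X block.
x = {"x1": [1.0, 2.0, 3.0, 4.0], "x2": [2.0, 1.0, 4.0, 3.0]}
design = expand(x)
uv = scale(design["x1*x2"], base="UV")
par = scale(design["x1*x2"], base="Par", modifier=0.5)
```

With these assumptions a modifier below 1.0 shrinks a variable's influence relative to the others, and Pareto scaling leaves variables with large spread somewhat more influential than UV scaling does.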
Develop the model (Analysis menu)

Select model type
The default model is a PCX model if all your variables are defined as
X's, or a PLS model if you have defined both X's and Y's at import.
You can change the model type, and when the Workset specification
allows it, you can select among:
PCX
PC model of the X's.
PCY
PC model of the Y's.
PCAll
PC model of all included variables, X and Y.
PC Class
PC of a selected class when your observations are divided into
classes.
PLS
PLS analysis of X and Y
PLS Class
PLS of a selected class when your observations are divided into
classes.
PLSDA
PLS discriminant analysis when your observations are divided into
classes.
Fit the model

Autofit
Rule based fitting.
2 First Components
Calculate two components directly, often used to get a quick
overview of the data.
Next Component
Calculate one component at a time. Here it is possible to force
components to be calculated regardless of significance rules.
Remove Component
Remove the last component
Autofit Class Models

Autofits all class models, or calculates as many components as
specified for each class model.
Specify Hierarchical Models

Specify a model to be Base or Top Hierarchical
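The Next Component and Remove Component commands above treat a model as a sequence of components calculated one at a time. A minimal sketch of that idea for a principal components model is given below, using the NIPALS iteration followed by deflation. NIPALS is chosen here as an assumption for illustration; this section does not state which algorithm SIMCA-P uses. The function names, tolerance and data are likewise hypothetical.

```python
import math

def sum_of_squares(matrix):
    """Total sum of squared elements of a matrix given as a list of rows."""
    return sum(v * v for row in matrix for v in row)

def next_component(X, tol=1e-16, max_iter=500):
    """Extract one principal component from X by NIPALS.

    X is a list of rows, assumed already centered (and scaled if desired).
    Returns (scores t, unit-length loadings p, deflated residual matrix).
    """
    n_rows, n_cols = len(X), len(X[0])
    # Start from the column with the largest sum of squares.
    start = max(range(n_cols), key=lambda j: sum(row[j] ** 2 for row in X))
    t = [row[start] for row in X]
    for _ in range(max_iter):
        tt = sum(v * v for v in t)
        p = [sum(X[i][j] * t[i] for i in range(n_rows)) / tt for j in range(n_cols)]
        norm = math.sqrt(sum(v * v for v in p))
        p = [v / norm for v in p]
        t_new = [sum(X[i][j] * p[j] for j in range(n_cols)) for i in range(n_rows)]
        change = sum((a - b) ** 2 for a, b in zip(t_new, t))
        t = t_new
        if change <= tol * sum(v * v for v in t):
            break
    residual = [[X[i][j] - t[i] * p[j] for j in range(n_cols)] for i in range(n_rows)]
    return t, p, residual

# Hypothetical centered data: 5 observations, 3 variables.
X = [[-2.0, -1.9, 0.5], [-1.0, -1.1, -0.5], [0.0, 0.1, 0.2],
     [1.0, 0.9, -0.4], [2.0, 2.0, 0.2]]
total = sum_of_squares(X)
t1, p1, E1 = next_component(X)    # first component
t2, p2, E2 = next_component(E1)   # next component, fitted to the residuals
# Cumulative fraction of the variation in X explained after each component.
r2x = [1 - sum_of_squares(E1) / total, 1 - sum_of_squares(E2) / total]
```

In this sketch, removing the last component amounts to discarding `t2` and `p2` and going back to the residual matrix `E1`.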
Review the fit (Analysis menu)

After a fit, the whole spectrum of plots and lists is available for
model interpretation.
Summary of fit
1. Model Overview
2. X/Y overview: Cumulative Fit of all variables (Y only in PLS)
3. X/Y/Comp: The Fit of a Variable by Component.
4. Component Contribution: The contribution of a model component
   to the Fit.
5. Scores: t1 vs. t2, t1 vs. u1, etc.
6. Loadings: p1 vs. p2, w*c1 vs. w*c2, etc.
7. Coefficients (PLS)
8. VIP (PLS): Variable influence on projection
9. DMod (X or Y): Distance to the model (X or Y)
10. Observed vs. predicted (PLS)
11. Residual plots: Normal probability plot (for selected Y's)
12. Observation risk
Note: By default, in the Analysis menu, all plots and lists are
displayed for the last component. To select a different component
for display in the plots, and/or a different variable, click on the
right mouse button and select from the available options.
Select a New Model Type

You can, after fitting the model, select a new model type.
SIMCA-P then creates a new unfitted model with the selected model
type.
For example, if you have defined your Workset variables as X's
and Y's, you can first fit a PCY (PC of the responses), then change
the model type to PLS and fit a PLS model (another model) to the
same data.
Predictions (Predictions menu)

Building the Prediction Set
Use the menu Predictions | Specify Prediction set to build your
prediction set from the Primary or any secondary datasets. You can
display the Prediction set as a spreadsheet or just plot or list
results.
When you do not specify a prediction set, the prediction set is by
default the primary dataset with all the data.
You can build the prediction set from observations belonging to
the primary dataset or any secondary dataset that you have
imported. You can also enter the data in the prediction set through
the keyboard when you build the prediction set in the spreadsheet.
Displaying the predictions

All the prediction results (scores, y-values, etc.), computed with
the active model, are displayed as plots or lists.
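To make the idea of a predicted score concrete, the sketch below projects one new observation onto a single fitted component of a principal components model. It assumes the new observation is scaled with the means and standard deviations of the training data and then multiplied by a unit-length loading vector. The function name, parameter names and numbers are hypothetical, and the calculation is a simplified illustration rather than SIMCA-P's implementation.

```python
def predict_score(x_new, means, sds, loading):
    """Project one new observation onto a fitted component.

    The observation is scaled with the training means and standard
    deviations, then multiplied by the unit-length loading vector.
    Returns the predicted score and the unexplained residual.
    """
    scaled = [(x - m) / s for x, m, s in zip(x_new, means, sds)]
    t_pred = sum(z * p for z, p in zip(scaled, loading))
    residual = [z - t_pred * p for z, p in zip(scaled, loading)]
    return t_pred, residual

# Hypothetical training parameters for two variables and one component.
means, sds = [10.0, 5.0], [2.0, 1.0]
loading = [0.6, 0.8]  # unit length: 0.36 + 0.64 = 1
t_pred, residual = predict_score([12.0, 6.0], means, sds, loading)
```

The residual is the part of the new observation that the component does not describe. A large residual suggests that the observation is unlike the data used to fit the model.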
Plots/Lists
Under this menu you can find general plot and list routines. Here it
is possible to plot and list any data, and results from the analysis.
There are scatter, line, column, 3d scatter, histogram, contour,
response surface, normal probability plots, wavelets plots, control
charts and batch control charts available.
Note: Click on the right mouse button to display available
properties for an active plot or list. You can generate lists from
plots and plots from lists.
Road map to SIMCA-P

1. Start a project
File New
Read Data File
Specify Label Cols & Rows
2. Look at the data

Data set
Quick Info
Variables or Obs.
3. Prepare a work copy

Workset
variables
observations
4. Fit the model

Analysis
Autofit
or fast button
5. Plot results
Analysis
Scores, Loadings
Distance to Model
6. Outliers in scores
Polish data
Prepare new workset
Graphically or via Workset
6. No outliers in scores
Continue
Interpret model (plots)
Relate to Objective
7. New data
Predictions
Select Pred.set (observations)
T_pred, Y_pred, DModX, etc.
Batch Projects (SIMCA-P+ 10)

General
A SIMCA-P Batch project consists of two or more linked
projects: (a) the Observation level project, with several
observations per batch and the variables measured during the
evolution of the batch, and (b) the Batch level project(s), consisting
of the completed batches, with one batch being one observation
(matrix row). The variables of the Batch level project are the
scores, or original variables, of the Observation level at every time
point, folded out side-wise. Batches may be divided into phases.
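The side-wise folding described above can be pictured with a small sketch. It is only an illustration, not SIMCA-P's own routine: the function name unfold_batches and the toy numbers are hypothetical, and it assumes that every batch has already been aligned to the same number of time points.

```python
def unfold_batches(batches):
    """Turn each batch into one row for a batch level dataset.

    batches: list of batches; each batch is a list of time points,
    and each time point is a list of variable (or score) values.
    Assumption: all batches have the same number of time points.
    """
    n_times = len(batches[0])
    rows = []
    for batch in batches:
        if len(batch) != n_times:
            raise ValueError("all batches must have the same number of time points")
        # All values at time 1, then all values at time 2, and so on.
        rows.append([value for time_point in batch for value in time_point])
    return rows


# Two hypothetical batches, three time points, two variables per time point.
batches = [
    [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]],
    [[1.5, 11.0], [2.5, 21.0], [3.5, 31.0]],
]
batch_level = unfold_batches(batches)
print(batch_level)
```

Each completed batch becomes one matrix row with (time points x variables) columns, which is the shape the Batch level project works with.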
Observation level project

With Batch data, you start by importing the Observation level data
and create the Observation Level project.
In the data, you must have a Batch identifier, indicating the start
and end of the batch, and if phases are present, also a phase
identifier. You may also have a variable indicating the evolution of
the batch or phase and its end point. This variable can be Time or
Maturity. You can have different Maturity variables for different
phases.
Unfitted batch models are implicitly created by SIMCA-P. When
batches have phases, these are PLS class models, one for each phase,
with Time or Maturity as Y. By default all variables in a phase
are scaled to unit variance.
The project window displays, for every model, one line
summarizing the model results.
When batches have phases, the PLS Batch class models (one for
every phase) are grouped under an umbrella called MBxx, where xx is a
sequential number.
You can display the results of the analysis of the training set
batches in Control Charts, either as scores, DModX, predicted time
or maturity, or as individual variables.
Secondary datasets can be imported with new batches. These can
also be displayed in Control Charts in the same way.
Batch Level Project

The Batch level project is based on scores or original variables for
completed batches, obtained from the observation level project.
The Batch level project is a regular SIMCA-P project. Batch initial
conditions and quality variables, when present, are automatically
added to the batch level dataset. You can change the default model
type (PCA) to any desired model type allowed by the workset
specification.
The Analysis cycle

Observation level project
1. Pre-processing and selection of data (Dataset and Workset
menus).
The Dataset menu allows you to trim / Winsorize your
data, generate new variables, and perform spectral
filtering, or wavelet compression of the data.
A model is developed from the default Workset. The
default Workset consists of PLS Batch class models, one
for every phase.
2. Fitting the Observation level model (Analysis menu).
3. Reviewing the results and performing diagnostics
(Analysis menu).
4. Batch Control Charts for training set batches (Analysis
menu).
5. Importing a Secondary dataset with new batches and
using the model to display the new batches in the Control
Charts (Prediction | Batch Control Chart).
Batch Level Project

6. Creating the Batch level project (File | Create Batch Level
project).
7. Fitting the Batch level Project.
8. Interpretation using score plots, loading plots, DModX,
contribution plots, etc.
9. Predicting and interpreting results for new whole batches.
Introduction
General
This tutorial is just a brief introduction to using SIMCA-P on
selected data sets. The user is advised to go through the different
phases of modeling, import data, PC and PLS modeling, and look
at the results in graphs and lists. For a more detailed description of
how to use
SIMCA-P, the USERS GUIDE and the ON-LINE HELP system
(identical) are recommended.
There are seven examples in this tutorial.
The first example shows the strength of using projection methods
on food data.
The second example is from a real process at a mineral sorting
plant.
The third example is a multivariate calibration often performed in
analytical chemistry.
The fourth example illustrates hierarchical modeling.
The fifth demonstrates the use of Spectral filtering.
Examples six and seven show how to handle batch data,
without and with phases.
As a tutorial, this provides just a brief introduction to the main
functionalities and plots in SIMCA-P. We recommend that you
continue with your own data, and use the Manual for details. The
Help system contains the same information as the Manual, but
organized in a different way.
Plots and Lists

You can display the results of SIMCA-P in numerous graphs and
lists.
From the Analysis and the Prediction menu, results of the active
model are available as plots and lists. With the menu Plot/List, you
can plot or list the raw data and every
computed value from every model. You can even plot vectors from
different models against each other.
Auto and Cross Correlation plots as well as Power Spectrum are
available for all vectors.
In Dataset you can preprocess the data by trimming and winsorizing.
Quick info plots are available with all spreadsheets.
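As a rough illustration of the difference between trimming and Winsorizing, the sketch below treats the k smallest and k largest values of a variable as extremes. The function names and the choice of k are assumptions made for this example; how SIMCA-P sets its limits is described in the User's Guide, not here.

```python
def winsorize(values, k=1):
    """Replace the k smallest and k largest values by their nearest
    remaining neighbours (hypothetical helper, not a SIMCA-P routine)."""
    ordered = sorted(values)
    if 2 * k >= len(ordered):
        raise ValueError("k is too large for this many values")
    low, high = ordered[k], ordered[-k - 1]
    return [min(max(v, low), high) for v in values]


def trim(values, k=1):
    """Drop values outside the limits instead of replacing them.
    Values tied with a limit are kept."""
    ordered = sorted(values)
    if 2 * k >= len(ordered):
        raise ValueError("k is too large for this many values")
    low, high = ordered[k], ordered[-k - 1]
    return [v for v in values if low <= v <= high]


raw = [-50, 2, 3, 4, 100]      # made-up values with one extreme at each end
print(winsorize(raw))          # extremes pulled in to the nearest neighbours
print(trim(raw))               # extremes removed
```

Winsorizing keeps the number of observations unchanged, while trimming leaves gaps where the extreme values were.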
Foods
Data
Collected data are often presented as data tables, which are almost
useless when it comes to extracting information. A data table is much
better presented graphically. The example below illustrates the
principles of projection. The data in this example describe the
consumption of different food items in several European countries.
Variables
The selection of the variables reflects the different traditions and
cultural behavior of the countries.
Observations
16 European countries have been selected.
Data table
1
10
Grain_Coffee
Inst_Coffee
Tea
Sweet
Bisc Pa_Soup
Ti_Soup
In_Potat
Fro_Fish
Fro_Veg
21
Germany
90
49
88
19
57
51
19
21
27
Italy
82
10
60
55
41
France
88
42
63
76
53
11
23
11
Holland
96
62
98
32
62
67
43
14
14
Belgium
94
38
48
11
74
37
23
13
12
Luxembourg
97
61
86
28
79
73
12
26
23
England
27
86
99
22
91
55
76
17
20
24
Portugal
72
26
77
22
34
20
Austria
55
31
61
15
29
33
15
11
10
Switzerland
73
72
85
25
31
69
10
17
19
15
11
Sweden
97
13
93
31
43
43
39
54
45
12
Denmark
96
17
92
35
66
32
17
11
51
42
13
Norway
92
17
83
13
62
51
17
30
15
14
Finland
98
12
84
20
64
27
10
18
12
15
Spain
70
40
40
62
43
14
23
16
Ireland
30
52
99
80
75
18
11
11
12
13
14
15
18
19
20
Apples
Orang
Ti_Fruit
Jam
Garlic Butter
16
Margarine
Olive_Oil
Youg
Crisp_Bread
26
Germany
81
75
44
71
22
91
85
74
30
Italy
67
71
46
80
66
24
94
18
France
87
84
40
45
88
94
47
36
57
Holland
83
89
61
81
15
31
97
13
53
15
Belgium
76
76
42
57
29
84
80
83
20
Luxembourg
85
94
83
20
91
94
94
84
31
24
England
76
68
89
91
11
95
94
57
11
28
Portugal
22
51
16
89
65
78
92
Austria
49
42
14
41
51
51
72
28
13
11
10
Switzerland
79
70
46
61
64
82
48
61
48
30
11
Sweden
56
78
53
75
68
32
48
93
12
Denmark
81
72
50
64
11
92
91
30
11
34
13
Norway
61
72
34
51
11
63
94
28
62
14
Finland
50
57
22
37
15
96
94
17
15
Spain
59
77
30
38
86
44
51
91
16
13
16
Ireland
57
52
46
89
97
25
31
64
Objective
The objective of this study is to understand how the variation in food
consumption among a number of industrialized countries is related to
culture and tradition, and hence to find the similarities and dissimilarities
among the countries. Data have therefore been collected on 20 variables
and 16 countries. The data show the percentage of households
that use each of the 20 food items regularly.
Analysis Outline
The steps to follow in SIMCA-P are:
Import the data set.
Prepare the data (Workset menu).
Fit a PC model and review the fit (Analysis menu).
Interpret the results (Analysis menu).
Define project
Start SIMCA-P and create a new project from FILE | NEW
Select type of data (XLS) or ALL Supported Files (the default) and
find the data set (FOODS.XLS). Data can be imported from your
hard-disk or from a network drive. Data can be imported in
different formats, so select the one which is appropriate or All
Supported Files. In this example we have the data in an XLS file
created from Excel.
If the data set is on a floppy disk, we recommend that you first
copy the file to the hard disk.
If you want to leave open the current project, remove the check
mark from the box Close Current Project.
Note: The data set to import can be located anywhere on an
accessible directory. It does not have to be located where you have
defined the destination directory.
When you click on Open, SIMCA-P opens the Import Wizard.
With SIMCA-P+, mark the radio button SIMCA-P normal project.
SIMCA-P has recognized that this example has observation

numbers and names and variable names, and has correctly color
coded them.
When you click on Next, the Project specification page opens. You
can change the project name and a destination directory.
Mark the check box Use workset wizard and click on Finish.
Workset Wizard
The workset wizard opens to guide you through the creation of the
workset and the fitting of the model.
On the Variable page, select which variables are X or Y and which
variables to exclude.
If you mark variables and press Transform, the software checks
and applies transformation (Log Transform) when needed.
For this example, all variables are X and no transformation is
needed; click on Next.
In this page you exclude/include observations or set observations

into classes. The Set class from ObsID uses a selected part of any
observation ID to set classes automatically.
This example is a PCA to get an overview of the data table; all
observations are included and no classes are specified. Click on
Next to display a summary of the specifications and then click on
Finish to fit the model with cross validation.
Analysis
The plot with the summary of the fit of the model is displayed with
R2X(cum) (fraction of the variation of the data explained after
each component) and Q2(cum) (cross validated R2X(cum)).
Double-click on the model summary line. The summary of the fit of
the model is displayed with R2X (fraction of the variation of the
data explained by each component) and cumulative R2X(cum), Q2
and Q2(cum) (cross validated R2X and R2X(cum)) as well as the
eigenvalues. The food variables are, as expected, correlated, and
fairly well summarized by three new variables, the scores,
explaining 65% of the variation.
Scores and Loadings

Scores
Select Analysis | Scores | Scatter Plot or the fast button to
display the score plot of t1 vs. t2 (default). In the Label Types
page, make sure the secondary identifier Onam is selected.
The ellipse represents the Hotelling T2 with 95% confidence (see

statistical appendix).
The scores t1 and t2, one vector for components 1 and 2, are new
variables computed as linear combinations of all the original
variables to provide a good summary.
The weights combining the original variables are called loadings
(p1 and p2), see below.
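The relation between scores, loadings and R2X can be checked numerically outside SIMCA-P. The sketch below is an assumption-laden illustration: it uses a singular value decomposition from NumPy rather than SIMCA-P's own fitting algorithm, the data are random numbers (not the FOODS data), and it does not compute the cross validated Q2. The function name pca_summary is hypothetical.

```python
import numpy as np


def pca_summary(X, n_components):
    """Centre and scale X to unit variance, then extract principal components."""
    X = np.asarray(X, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # Singular value decomposition: Z = U * diag(s) * Vt
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    loadings = Vt[:n_components].T          # p1, p2, ... as columns
    scores = Z @ loadings                   # each score is a linear combination t = Z p
    r2x = (s[:n_components] ** 2) / np.sum(s ** 2)   # fraction of variation per component
    return Z, scores, loadings, r2x, np.cumsum(r2x)


# Hypothetical data: 16 observations, 5 variables, two of them correlated.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(16, 5))
X_demo[:, 1] += 2.0 * X_demo[:, 0]

Z, scores, loadings, r2x, r2x_cum = pca_summary(X_demo, n_components=3)
print("R2X per component:", np.round(r2x, 3))
print("R2X(cum):         ", np.round(r2x_cum, 3))
```

The first column of scores corresponds to t1 and the first column of loadings to p1, so plotting the first two columns of each reproduces the kind of score and loading plots discussed here.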
The score plot shows 3 groups of countries. One group with the
Scandinavian countries (the North), the second with countries from
the South of Europe, and a third more diffuse with countries from
Central Europe.
To color the observations (countries) by the values of a variable,

right click, and open the properties. Select color, by categories,
and in the combo box choose a variable (here garlic). In the split
range window, enter 4.
Change the split range as needed in the text boxes on the right.
Garlic clearly separates Northern Europe from Southern Europe.
Loadings
Select Analysis | Loadings | Scatter Plot to display the loadings p1
vs. p2.
The loadings are the weights with which the X-variables are
combined to form the X-scores, t (see above). This plot shows
which variables describe the similarity and dissimilarity between
countries.
Scandinavians eat crisp bread, frozen fish and vegetables, while in

southern Europe people use garlic and olive oil, and central
Europeans (in particular the French) consume a lot of yogurt.
Third Component
Plot the scores (t1 vs. t3) and loadings (p1 vs. p3). The third
component explains 13.8% of the variation in the data, and mainly
shows high consumption of Tea, Jam and canned soups, chiefly in
England and Ireland.
Summary
In conclusion, a three-component model of the data summarizes
the variation in three major latent variables, describing the main
variation of food consumption in the investigated European
countries.
This example shows a simple PC modeling to get an overview of a
data table. The user is encouraged to continue to play around with
the data set. Take away observations and/or variables, refit new
models, and interpret the results.
Mineral sorting at LKAB
Introduction
The following example is taken from a mineral sorting plant at
LKAB in Malmberget, Sweden. Research engineer Kent Tano at
LKAB was responsible for this investigation.
In this process, raw iron ore (TON_IN) is reduced to finer
material (<100 mm, 50% Fe) by passing through several grinders. After
grinding, the material is sorted and concentrated in several steps by
magnetic separators. The separation flow is divided in several
parallel lines and there are also feedback systems to get as high Fe
concentration as possible. The concentrated material is divided
into two products, one (PAR) which is sent to a flotation process
and another part (FAR, fines) which is sold as is. For both these
products high Fe content is important.
Twelve process factors were identified. Of these, three important
factors were used to set up a statistical design (RSM). The results
of each experiment were measured in 6 response variables. Several
observations were collected for each design point.
The process is equipped with an ABB Master system with a
SuperView 900 connected to the process data system. Data were
transferred from the ABB system to a personal computer with the
SIMCA-P software for modeling. Models were transferred back to
the SuperView system for on-line monitoring (predictions, score
and loading plots) of the process. The investigation was made in
1992. The multivariate on-line control of the process is still in
operation, with very good results concerning the quality of the products.
Data description
The following is a description of variables and observations.
Variables
Data from 18 variables were collected.
Process variables (X)
No.  Explanation                    Abbr.      RSM
1    Total load                     TON_IN     Design
2    Load of grinder 30             KR30_IN
3    Load of grinder 40             KR40_IN
4    PARmull                        PARM
5    Velocity of separator 1        HS_1       Design
6    Velocity of separator 2        HS_2       Design
7    Effect grinder 30              PKR_30
8    Effect grinder 40              PKR_40
9    Ore waste                      GBA
10   Load of separator 3            TON_S3
11   Waste from grinding            KRAV_F
12   Total waste                    TOTAVF
Responses (Y)
No.  Explanation                    Abbr.
13   Amount of concentrate type 1   PAR
14   Amount of concentrate type 2   FAR
15   Distribution of type 1 and 2   r_FAR
16   Iron (Fe) in FAR               %Fe_FAR
17   Phosphorus (P) in FAR          %P_FAR
18   Iron (Fe) in raw ore           %Fe_malm
Observations
A subset of 231 observations was used for modeling. Each
observation has a name referring to the date and time when data
were collected.
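If the observation names need to be used as real time stamps outside SIMCA-P, they can be decoded as sketched below. The layout is an assumption read off the names in Table 1 (year, month, day, hour, minute, second, followed by two further digits that are always 00 in the table); the helper name parse_obs_name is hypothetical.

```python
from datetime import datetime


def parse_obs_name(name):
    """Decode an observation name such as '1992030512300000'.

    Assumption: the first 14 digits are YYYYMMDDhhmmss; the last two
    digits are ignored here because their meaning is not documented.
    """
    return datetime.strptime(name[:14], "%Y%m%d%H%M%S")


first = parse_obs_name("1992030512300000")    # observation 91 in Table 1
second = parse_obs_name("1992030512310000")   # observation 92 in Table 1
print(first, second, second - first)
```

The difference between two consecutive names is one minute, which matches the logging interval of the process data.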
Data table
A subset of the data is shown in Table 1.
/ Sovr.XLS Last change 930818
/ 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
/Nr ID (logged time) Ton_in KR30_IN KR40_IN PARM HS_1 HS_2 PKR_30 PKR_40 GBA TON_S3 KRAV_F TOTAVF PAR FAR r_FAR %Fe_FAR %P_FAR %Fe_malm
ONUM ONAM Ton_in KR30_IN KR40_IN PARM HS_1 HS_2 PKR_30 PKR_40 GBA TON_S3 KRAV_F TOTAVF PAR FAR r_FAR %Fe_FAR %P_FAR %Fe_malm
91 1992030512300000 1271.81 275.81 190.88 62.98 90.41 79.52 57.44 41.35 163.8 203.94 75.34 383.71 307.88 591.75 65.78 66.2 0.24 47.9
92 1992030512310000 1290.56 278.55 208.58 58.08 90.41 79.52 56.85 43.57 156.38 203.64 79.49 384.28 314.63 601.5 65.66 66.2 0.24 47.9
93 1992030512320000 1267.39 278.55 207.38 63.19 90.41 79.52 51.38 42.2 188.57 200.64 81.99 398.21 312.19 585.06 65.21 66.2 0.24 47.9
94 1992030512330000 1250.44 278.06 204.53 57.48 90.41 79.52 54.02 41.22 155.1 206.27 75.79 384.56 298.63 576.75 65.89 66.2 0.24 47.9
95 1992030512340000 1265.51 279.56 190.43 49.31 90.41 79.52 54.74 42.66 169.35 214.82 79.99 415.38 304.13 591.75 66.05 66.2 0.24 47.9
96 1992030512350000 1268.18 276.11 194.63 62.88 90.41 79.52 52.29 42.66 169.8 212.27 80.69 403.71 310.88 592 65.57 66.2 0.24 47.9
97 1992030512360000 1284.3 272.55 211.28 58.68 90.41 79.52 48.03 41.74 174.83 206.57 80.89 405.33 293.13 575.5 66.25 66.2 0.24 47.9
98 1992030512370000 1284.41 275.4 208.28 50.48 90.41 79.52 59.11 43.77 182.25 208.67 76.49 420.53 304.13 580 65.6 66.2 0.24 47.9
99 1992030512380000 1272.79 274.35 207.53 62.68 90.41 79.52 59 44.36 181.5 201.92 74.53 394.57 300.13 598.25 66.59 66.2 0.24 47.9
100 1992030512390000 1317.11 269.81 192.23 56.18 90.41 79.52 56.25 42.59 185.93 199.44 79.49 409.68 311.13 579.75 65.08 66.2 0.24 47.9
101 1992030512400000 1273.16 264.71 195.38 49.56 90.41 79.52 54.5 42.53 186.15 193.82 79.14 405.47 291.63 585.25 66.74 66.2 0.24 47.9
102 1992030512410000 1239.15 264.86 209.93 56.78 90.41 79.52 62.6 44.88 173.78 207.24 83.94 393.18 300.63 601 66.66 66.2 0.24 47.9
103 1992030512420000 1290.86 272.21 201.08 62.83 90.41 79.52 56.28 42.66 159.38 211.52 79.79 389.36 325.63 605.5 65.03 66.2 0.24 47.9
104 1992030512430000 1272.64 267.6 201.38 65.58 90.41 79.52 53.66 40.89 163.8 209.04 78.89 377.68 304.13 592.75 66.09 66.2 0.24 47.9
105 1992030512440000 1285.58 264.26 203.78 55.43 90.41 79.52 52.33 43.38 168.83 213.17 75.69 400.21 314.44 594.56 65.41 66.2 0.24 47.9
106 1992030512450000 1263.75 267.9 187.13 63.58 90.41 79.52 50.4 42.85 176.55 196.74 81.44 390.11 314.88 587.75 65.12 66.2 0.24 47.9
107 1992030512460000 1289.36 264.86 212.63 53.03 90.41 79.52 52.3 42.72 175.65 190.52 76.59 406.36 300.38 593.5 66.4 66.2 0.24 47.9
108 1992030512470000 1309.05 272.55 194.78 50.93 90.41 79.52 50.01 45.47 172.35 200.64 76.54 392.13 297.88 596.25 66.69 66.2 0.24 47.9
109 1992030512480000 1282.01 271.16 209.48 61.83 90.41 79.52 47.49 46.85 197.94 193.59 75.99 397.38 315.88 576.5 64.6 66.2 0.24 47.9
110 1992030512490000 1288.91 264.26 222.83 51.88 90.41 79.52 60.47 44.36 193.07 193.29 77.19 419.35 305.63 577.5 65.39 66.2 0.24 47.9
111 1992030512500000 1289.96 262.46 210.38 45.11 90.41 79.52 51.09 47.77 195.77 189.24 79.14 419.04 310.13 566.25 64.61 66.2 0.24 47.9
131 1992030513100000 1062.49 217.61 170.63 32.41 94.52 74.7 41.08 40.82 130.71 153.08 59.13 310.49 242.56 477.19 66.3 67.2 0.2 51.2
132 1992030513110000 1024.8 218.06 178.43 43.46 94.52 74.7 38.12 38.27 132.58 152.7 62.28 294.07 261.13 505.5 65.94 67.2 0.2 51.2
133 1992030513120000 1070.74 215.06 165.34 38.51 94.52 74.7 40.92 39.25 140.01 147.88 57.33 304 248.56 501.75 66.87 67.2 0.2 51.2
134 1992030513130000 1054.65 216.08 176.03 31.41 94.52 74.7 44.23 39.19 135.88 148.48 59.33 318.05 249.56 523 67.7 67.2 0.2 51.2
135 1992030513140000 1072.05 214.73 166.24 41.08 94.52 74.7 42.43 37.09 127.71 149.7 61.14 302.51 263.56 514.5 66.13 67.2 0.2 51.2
136 1992030513150000 1056.71 224.63 177.68 35.31 94.52 74.7 46.42 39.25 117.51 143.68 57.73 285.5 252.88 522.25 67.38 67.2 0.2 51.2
137 1992030513160000 1025.7 216.68 174.68 29.61 94.52 74.7 46.51 36.76 117.81 148.93 57.13 294.25 257.63 513.25 66.58 67.2 0.2 51.2
138 1992030513170000 1045.91 215.03 171.23 43.26 94.52 74.7 43.41 39.19 129.58 148.63 52.23 290.73 253.63 499.94 66.34 67.2 0.2 51.2
139 1992030513180000 1044.15 219.08 166.88 36.01 94.52 74.7 47.06 39.58 127.41 136.56 56.43 279.88 237.06 508.25 68.19 67.2 0.2 51.2
140 1992030513190000 1106.14 219.08 175.58 27.11 94.52 74.7 41.08 41.94 130.11 149.38 55.28 312.23 243.81 519.25 68.05 67.2 0.2 51.2
141 1992030513200000 1079.55 222.83 199.58 37.91 94.52 74.7 48.49 38.92 121.41 148.33 56.03 285.54 259.88 521.5 66.74 67.2 0.2 51.2
142 1992030513210000 1024.35 213.23 184.43 36.41 94.52 74.7 45.05 42.33 140.53 136.41 57.68 296.2 257.13 515.5 66.72 67.2 0.2 51.2
143 1992030513220000 1071.49 213.41 174.83 27.91 94.52 74.7 44.67 40.17 125.21 141.9 59.73 300.23 266.88 515.5 65.89 67.2 0.2 51.2
144 1992030513230000 1069.65 220.76 192.38 36.41 94.52 74.7 40.41 39.71 122.76 145.63 55.03 287 254.63 531.25 67.6 67.2 0.2 51.2
145 1992030513240000 1055.14 217.76 182.03 36.96 94.52 74.7 38.3 39.51 128.16 142.71 52.73 284.99 265.38 530.5 66.66 67.2 0.2 51.2
146 1992030513250000 1087.2 225.11 157.88 34.98 94.52 74.7 41.79 38.73 131.03 140.03 56.09 292.16 256.38 527.25 67.28 67.2 0.2 51.2
173 1992030513520000 1059.75 224.33 176.48 6 94.52 48.19 46.19 39.58 104.81 99.11 55.68 249.03 238.31 591.75 71.29 64.3 0.39 50.9
174 1992030513530000 1056.15 229.01 164.93 6.5 94.52 48.19 43.1 41.81 115.03 95.66 54.03 258.22 232.81 557.75 70.55 64.3 0.39 50.9
175 1992030513540000 1032.9 221.33 173.48 0.05 94.52 48.19 46.29 39.25 112.56 95.14 57.43 260.62 230.31 564.75 71.03 64.3 0.39 50.9
176 1992030513550000 1059.04 237.26 173.93 6.5 94.52 48.19 45.42 39.38 115.78 96.56 55.93 263.47 223.31 587.5 72.46 64.3 0.39 50.9
177 1992030513560000 1008 228.11 166.28 9.86 94.52 48.19 42.97 41.22 105.58 102.06 57.34 257.74 248.38 567.75 69.57 64.3 0.39 50.9
178 1992030513570000 1079.29 223.61 170.03 9.65 94.52 48.19 46.19 38.07 119.68 102.11 56.03 263.05 225.31 615.75 73.21 64.3 0.39 50.9
179 1992030513580000 1096.84 225.41 177.68 3.15 94.52 48.19 42.26 40.69 120.28 99.71 57.38 278.64 205.06 600.25 74.54 64.3 0.39 50.9
180 1992030513590000 1057.65 225.11 174.98 11.1 94.52 48.19 43 39.97 109.31 98.36 56.13 252.7 216.56 562.75 72.21 64.3 0.39 50.9
181 1992030514000000 1073.29 223.01 168.04 5.65 94.52 48.19 42.33 37.81 120.88 101.14 55.58 268.34 227.56 604.5 72.65 64.3 0.39 50.9
182 1992030514010000 1094.55 224.03 163.24 2 94.52 48.19 45.65 38.27 106.16 102.26 53.58 260.15 221.56 611.5 73.4 64.3 0.39 50.9
183 1992030514020000 1033.5 219.98 168.98 10.9 94.52 48.19 41.81 39.12 111.49 99.64 53.38 265.02 226.06 576.25 71.82 64.3 0.39 50.9
184 1992030514030000 1061.29 214.73 167.44 4.61 94.52 48.19 42.85 39.05 111.56 101.53 58.43 269.5 223.06 571.81 71.94 64.3 0.39 50.9
185 1992030514040000 1037.4 217.43 167.14 1.25 94.52 48.19 42.2 39.19 100.84 99.79 54.83 254.2 218.81 595 73.11 64.3 0.39 50.9
186 1992030514050000 1048.65 219.98 174.53 8.4 94.52 48.19 42.1 37.29 109.84 102.79 52.23 258.65 220.56 571 72.14 64.3 0.39 50.9
198 1992030514170000 1059.9 221.93 152.63 12.9 86.3 74.7 39.63 36.76 101.89 127.78 50.63 267.39 238.31 579.5 70.86 66.3 0.28 51.9
199 1992030514180000 1043.25 215.48 159.98 20.7 86.3 74.7 37.7 37.02 94.16 124.26 49.06 246.78 227.81 595.75 72.34 66.3 0.28 51.9
200 1992030514190000 1054.8 211.43 168.53 17.2 86.3 74.7 39.45 37.4 94.01 124.78 46.86 256.11 217.81 557.25 71.9 66.3 0.28 51.9
201 1992030514200000 1037.85 219.23 168.04 11.4 86.3 74.7 40.28 39.97 90.41 130.86 47.26 262.16 228.56 590 72.08 66.3 0.28 51.9
202 1992030514210000 1062.49 225.56 155.63 19.7 86.3 74.7 41.25 35.65 105.56 137.61 49.91 273.38 228.81 572.25 71.44 66.3 0.28 51.9
203 1992030514220000 1036.05 208.13 163.43 17.2 86.3 74.7 40.86 37.88 83.29 135.06 50.88 258.99 229.31 570.75 71.34 66.3 0.28 51.9
204 1992030514230000 1059.15 210.98 155.03 11.95 86.3 74.7 37.32 35.5 102.19 135.81 51.78 280.93 227.81 586.75 72.03 66.3 0.28 51.9
205 1992030514240000 1033.99 213.11 169.58 18.15 86.3 74.7 38.06 38.14 102.49 128.68 47.26 259.08 218.81 575.75 72.46 66.3 0.28 51.9
206 1992030514250000 1035.04 215.96 161.03 21.4 86.3 74.7 40.17 38.4 109.01 130.33 50.63 261.36 235.81 577.75 71.01 66.3 0.28 51.9
207 1992030514260000 1029.3 208.73 163.09 20.7 86.3 74.7 40.86 36.83 102.11 127.48 48.36 249.91 228.56 568.5 71.32 66.3 0.28 51.9
208 1992030514270000 0 0 0 1.5 86.3 74.7 20.45 15.52 0 0 0 0 182.06 532 74.5 66.3 0.28 51.9
233 1992030514520000 1491.83 297.86 233.51 1.41 86.3 48.19 59.31 48.68 168.23 140.38 75.79 372.87 305.63 861.88 73.82 62.6 0.42 49.8
234 1992030514530000 1489.58 298.46 254.21 6.8 86.3 48.19 58.97 47.57 141.73 140.76 85.74 359.68 299.38 846.38 73.87 62.6 0.42 49.8
235 1992030514540000 1500.08 295.31 235.91 9.75 86.3 48.19 62.99 47.18 144.73 141.28 89.64 365.9 331.88 825.63 71.33 62.6 0.42 49.8
236 1992030514550000 1467.9 305.4 261.26 3.35 86.3 48.19 59.59 49.67 165.53 134.98 90.89 394.19 313.38 838.88 72.8 62.6 0.42 49.8
237 1992030514560000 1477.58 307.8 247.16 6.8 86.3 48.19 62.85 48.75 145.48 138.51 90.79 363.23 331.63 856.13 72.08 62.6 0.42 49.8
238 1992030514570000 1490.55 306.75 251.66 11.3 86.3 48.19 60.13 50.25 159.75 137.31 81.34 372.12 323.13 833.38 72.06 62.6 0.42 49.8
239 1992030514580000 1500.83 296.81 262.16 12.3 86.3 48.19 63.78 51.11 151.35 135.88 81.99 356.92 318.88 857.63 72.9 62.6 0.42 49.8
240 1992030514590000 1506.04 295.01 255.23 5.45 86.3 48.19 59.47 51.83 159.83 133.5 84.54 375.34 323.88 787.44 70.86 62.6 0.42 49.8
241 1992030515000000 1495.2 307.8 271.46 13.4 86.3 48.19 60.11 54.71 171.08 139.03 89.39 382.27 303.88 835.63 73.33 62.6 0.42 49.8
242 1992030515010000 1493.78 310.8 259.46 7.35 86.3 48.19 59.2 49.34 152.1 133.63 84.99 366.02 339.13 848.38 71.44 62.6 0.42 49.8
251 1992030515100000 1269.71 273.11 215.33 21 90.41 66.24 49.55 46.72 152.48 146.03 74.73 347.11 294.63 650.06 68.81 67.1 0.19 50.1
252 1992030515110000 1270.91 270.41 228.71 18.3 90.41 66.25 50.76 51.04 148.48 144.21 72.78 347.16 274.38 652.81 70.41 67.1 0.19 50.1
253 1992030515120000 1279.91 266.21 232.31 16.5 90.41 66.26 46.13 48.36 153 139.41 71.68 350.78 309.63 638.56 67.35 67.1 0.19 50.1
254 1992030515130000 1280.36 267.86 230.21 25.71 90.41 66.26 50.34 46.39 141.13 141.96 75.54 336.06 291.38 616.75 67.91 67.1 0.19 50.1
255 1992030515140000 1249.46 262.61 213.38 23.6 90.41 66.26 56.84 45.8 149.91 139.93 77.49 339.68 299.88 653.06 68.53 67.1 0.19 50.1
256 1992030515150000 1277.63 264.71 211.43 16.7 90.41 66.26 47.34 49.93 162.08 149.53 72.63 362.66 287.13 655.31 69.53 67.1 0.19 50.1
257 1992030515160000 1306.05 260.81 210.23 25.81 90.41 66.26 50.91 46.65 172.2 140.23 75.69 367.11 306.63 635.06 67.44 67.1 0.19 50.1
258 1992030515170000 1243.24 271.01 216.08 20.7 90.41 66.26 47.08 47.05 165.83 148.48 77.89 362.19 288.88 656.06 69.43 67.1 0.19 50.1
259 1992030515180000 1262.59 277.31 205.58 13.65 90.41 66.26 55.09 45.34 152.7 146.76 74.43 364.57 280.88 654.06 69.96 67.1 0.19 50.1
260 1992030515190000 1282.76 262.76 224.03 23.61 90.41 66.26 52.17 47.5 163.43 137.61 77.29 362.88 289.38 664.06 69.65 67.1 0.19 50.1
261 1992030515200000 1277.18 257.66 221.48 17.95 90.41 66.26 50.37 47.24 156.23 145.33 73.18 356.78 288.13 659.06 69.58 67.1 0.19 50.1
262 1992030515210000 1258.46 263.81 203.63 10.4 90.41 66.26 49.32 45.93 154.95 143.16 77.59 368.96 268.13 653.31 70.9 67.1 0.19 50.1
263 1992030515220000 1236.94 255.86 207.53 20.25 90.41 66.26 48.91 47.57 173.1 143.76 75.39 364.57 284.63 632.06 68.95 67.1 0.19 50.1
323 1992030516220000 1247.89 253.31 230.78 6.3 82.19 66.26 56.59 46.06 190.82 140.08 90.44 411.49 257.13 622.25 70.76 64.6 0.24 44
324 1992030516230000 1262.66 258.26 216.83 13.1 82.19 66.26 51.78 52.81 191.34 139.86 90.64 413.86 275.38 628.06 69.52 64.6 0.24 44
325 1992030516240000 1303.99 259.31 238.31 12.45 82.19 66.26 52.94 50.32 196.14 135.88 96.64 416.21 263.13 622.5 70.29 64.6 0.24 44
326 1992030516250000 1280.74 267.71 229.13 5.25 82.19 66.26 56.69 49.53 203.27 138.88 91.49 428.39 273.13 626.06 69.63 64.6 0.24 44
327 1992030516260000 1259.63 263.81 220.43 11.4 82.19 66.26 54.28 53.73 184.95 136.86 84.39 394.79 285.13 622.75 68.59 64.6 0.24 44
328 1992030516270000 1301.21 262.46 223.13 16.15 82.19 66.26 54.95 48.22 196.82 134.76 89.74 407.26 278.13 628.06 69.31 64.6 0.24 44
329 1992030516280000 1282.58 264.11 222.83 22.8 82.19 66.26 51.83 50.58 191.34 136.56 94.84 411.26 290.13 625 68.3 64.6 0.24 44
330 1992030516290000 1252.13 265.01 226.43 11.2 82.19 66.26 54.12 48.16 191.12 135.51 84.89 390.99 268.13 635.06 70.31 64.6 0.24 44
331 1992030516300000 1248.98 270.11 230.63 17 82.19 66.26 54.58 45.93 190.37 133.11 88.34 386.02 270.88 607.75 69.17 64.6 0.24 44
Table 1
Objective
The objective of this study is to investigate the relationship between
the process variables and the 6 output variables describing the quality
of the final product.
Analysis outline
An Overview of the Responses
A PC model of the responses is made to understand:
How the responses relate to each other and to the

observations.
The similarity and dissimilarity between the observations,

and if there are outliers.
The explanatory power of the variables.
Relating the process conditions to the responses
Understand and interpret the relationship between the

process variables and the responses.
Predict the output of new process conditions.
The steps to follow in SIMCA-P
Define the project: Import the primary data set.

Specify which variables are process variables (X) and
which are responses (Y).
Expand the X matrix with the squares and cross terms of
the 3 designed variables.
Fit the models, first PC-Y and then PLS, and review the
fit (Analysis menu).
Refine models if necessary by removing outliers

(Workset menu).
Use the PLS model for predictions (Prediction menu).
Create the project

Start SIMCA-P and import the data file from FILE | NEW
Find the data set (SOVR.XLS).

If you have SIMCA-P+, select the radio button to create a normal
SIMCA-P project and click on Next.
Click on Commands and create Index Variable to generate
variable numbers, and mark them as secondary IDs.
Mark the columns (Variables) PAR to the end, use the arrow on
one of the variables, and from the drop down menu, select them as
Ys. This selection becomes the default workset.
Click on Next.
The Import wizard opens. In the Project specification page, you

can change the project name and destination directory.
Make sure the check box Use workset wizard is marked and click
on Finish; the workset wizard opens.
Prepare the data

Workset Wizard
SIMCA-P's default workset consists of all the observations in the
primary data set with all variables, scaled to unit variance and
defined as X's or Ys as specified at import.
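For readers who want to see what scaling to unit variance does to a single variable, a minimal sketch follows. It assumes that the variable is also mean centered and that the sample standard deviation (n - 1 in the denominator) is used; the function name is hypothetical and the five values are the first five Ton_in readings of Table 1.

```python
from statistics import mean, stdev


def scale_unit_variance(column):
    """Centre a variable and divide by its standard deviation
    (illustrative helper, not a SIMCA-P routine)."""
    m = mean(column)
    s = stdev(column)          # sample standard deviation, n - 1 in the denominator
    if s == 0:
        raise ValueError("a constant variable cannot be scaled to unit variance")
    return [(value - m) / s for value in column]


ton_in = [1271.81, 1290.56, 1267.39, 1250.44, 1265.51]   # observations 91 to 95
scaled = scale_unit_variance(ton_in)
print([round(v, 3) for v in scaled])
```

After scaling, every variable has standard deviation 1, so variables measured in large units (such as Ton_in) no longer dominate variables measured in small units (such as %P_FAR).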
To expand the X matrix with squares and/or cross terms, press
Use Advanced Mode and click on Expand.
The three variables TON_IN, HS_1 and HS_2 were varied
according to a statistical design (RSM) supporting a full quadratic
model. We will expand the X matrix with the squares and
cross-terms of these 3 variables.
Mark TON_IN, HS_1, HS_2. Press the button Sq & Cross and the
squares and cross-terms of these 3 variables are displayed in the
expanded list.
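What the expansion adds to the X matrix can be sketched as follows. This is an illustration only: the term names (for example TON_IN*HS_1) and the helper function are hypothetical, the values are those of observation 91 in Table 1, and the sketch works on the raw values without any centering or scaling that SIMCA-P may apply before forming the terms.

```python
from itertools import combinations


def expand_squares_and_cross(row, names):
    """Return the squares and pairwise cross terms of the named variables.

    row: dict mapping variable name to value for one observation.
    """
    expanded = {}
    for name in names:                       # squares
        expanded[f"{name}*{name}"] = row[name] ** 2
    for a, b in combinations(names, 2):      # cross terms
        expanded[f"{a}*{b}"] = row[a] * row[b]
    return expanded


design_vars = ["TON_IN", "HS_1", "HS_2"]
obs_91 = {"TON_IN": 1271.81, "HS_1": 90.41, "HS_2": 79.52}
expanded = expand_squares_and_cross(obs_91, design_vars)
for term, value in expanded.items():
    print(term, round(value, 2))
```

Three design variables give three squares and three cross terms, so the expanded X matrix gains six columns, which is what a full quadratic model in three factors requires.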
Click on OK to exit the workset menu.
Analysis
To first get an overview of the responses, we fit a PC model of the
Y variables (PCY).
PC of Y
When you exit the workset window, an unfitted model (M1) is
created with model type PLS (The default for a workset with both
X's and Y's). Click on Analysis | Active Model Type and select
PCY. The model type changes to PC-Y. Click on Analysis | 2 First
Components to fit a PC model of the Y's with 2 components.
The model overview plot opens.
Click on the model summary line to open a table with the
summary of the fit of the model. This table displays R2X (fraction
of the variation of the data explained by each component) and
cumulative R2X(cum), as well as the eigenvalues and the Q2 and
Q2(cum) (cross validated R2). The six Y's are correlated, and are
summarized by two new variables, the scores t1 and t2, explaining
70.9% of their variation.
Scores and Loadings

Scores
Select Analysis | Scores | Line Plot to display the score plot of t1
vs. t2 with a line drawn between the points. In Label Types mark
Use identifier Obs ID (primary).
The scores t1 and t2, one vector for dimension 1 and 2, are the new
variables computed as linear combinations of the six responses and
summarizing Y.
The score plot shows that the observations cluster in different
groups. Each group represents a setting of the experimental design.
The process ran for a certain time at each of these settings (design
points) to reach stability. Measurements on the process (the
observations in the score plot) were recorded every minute. No
obvious outliers are present.
Loadings
Select Analysis | Loadings | Scatter Plot to display the loadings p1
vs. p2.
In Label Types mark Use Identifier Var ID (Primary) and click on
Save AS Default Options, to always display variable names.
The loadings are the weights with which the variables are
combined to form the scores, t. The loadings, p, for a selected PC
component, represent the importance of the variables in that
component and show the correlation structure between the
variables, here the responses Y.
In this plot we see that PAR, FAR and %P_FAR are positively
correlated with each other and negatively correlated with %Fe_FAR.
r_Far dominates the second component, is here negatively
correlated with PAR and has only a small correlation with the other
variables in component 2. %Fe-Malm is not correlated with any of
these variables in the first two components.
Click on Analysis | Next Component, and compute a third
component. Display the loadings p1 vs. p3. The third component
(explaining 22% of the variation of the data) is dominated by
%Fe-Malm. In the third component this variable has a small positive
correlation to %Fe-FAR, r_FAR and FAR and little to the others.
Summary of Overview of Responses

No outliers were detected. All of the responses participate in the
model, and are correlated to each other, with the exception of
%Fe-Malm, which is only slightly correlated to three of them.
PLS MODELING
The main objective is to develop a predictive model, relating the
process variables X's to the output measurements (responses) Y.
The experimental design in three of the process variables accounts
for an important part of the variation of the Y's.
New Model Type

Click on Analysis | Active Model type and select PLS.
Another unfitted model, M2, is created and you are ready to fit a
PLS model.
Autofit
Click on Analysis | Autofit, or the fast button, to fit a PLS
model with cross validation.
The Model Overview Plot displays R2Y(cum), the fraction of the
variation of Y (all the responses) explained by the model after each
component, and Q2(cum), the fraction of the variation of Y that
can be predicted by the model according to the cross-validation.
Values of R2Y(cum) and Q2Y(cum) close to 1.0 indicate an
excellent model.
Double click on the model summary line to display a list of the fit
of the model per component.
The present model is indeed excellent and explains 80% of the
variation of Y, with a predictive ability (Q2) of 76%.
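The difference between R2 (explained variation) and Q2 (predicted
variation according to cross validation) can be made concrete with a
small sketch. To keep it short, a one-variable least squares line
stands in for the PLS model, the data are invented, and the use of 7
cross-validation groups is an assumption of the example.

```python
def fit_line(x, y):
    """Least squares slope and intercept for one predictor."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx

def r2_q2(x, y, groups=7):
    """R2 from the full fit, Q2 from leaving out one group at a time."""
    n = len(y)
    my = sum(y) / n
    ss = sum((b - my) ** 2 for b in y)
    slope, icpt = fit_line(x, y)
    rss = sum((b - (icpt + slope * a)) ** 2 for a, b in zip(x, y))
    press = 0.0
    for g in range(groups):
        test = [i for i in range(n) if i % groups == g]
        train = [i for i in range(n) if i % groups != g]
        s, c = fit_line([x[i] for i in train], [y[i] for i in train])
        # Prediction error for observations the model has not seen
        press += sum((y[i] - (c + s * x[i])) ** 2 for i in test)
    return 1.0 - rss / ss, 1.0 - press / ss

# Hypothetical data: a nearly linear relation with small noise
noise = [0.3, -0.2, 0.1, -0.4, 0.2, 0.0, -0.1,
         0.4, -0.3, 0.1, 0.2, -0.2, 0.0, -0.1]
x = [float(i) for i in range(14)]
y = [2.0 * a + 1.0 + e for a, e in zip(x, noise)]
r2, q2 = r2_q2(x, y)
print(round(r2, 4), round(q2, 4))
```

Q2 cannot exceed R2 here, because left-out observations are never
predicted better than they are fitted.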
Summary: X/Y Overview

Click on Analysis | Summary | X/Y Overview | Plot and display
the cumulative R2Y and Q2Y for every response. With the
exception of %Fe-FAR and %P-FAR, all responses have an
excellent R2 and Q2.
Scores t1 vs. t2
Click on Scores | Scatter plot and t1 vs. t2. Use the marker to
label the outlying observation. Observation 208 lies far away in
the first component.
Scores t1 vs. u1
Right click and in properties select t1 vs. u1, and in Label Types
mark ObsID (Primary). We have a good relationship between the
first summary of the X's (t1), and the first summary of the Y's (u1),
with the exception of observation 208.
Contribution plot
To understand why observation 208 differs from the others in the
first score (t1), double click on observation 208 in the t1 vs. u1
plot.
This contribution plot displays the differences, in scaled units, for
all the terms in the model, between the outlying observation 208
and the normal (or average) observation, weighted by w1* (the
importance of the X-variables in component 1).
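The idea behind a score contribution can be sketched as follows. The
sketch assumes that the contribution of a variable is its deviation
from the average, in scaled units, multiplied by the absolute value of
its weight; the exact weighting used by SIMCA-P may differ, and the
data and weights are hypothetical.

```python
import statistics

def score_contribution(X, row, weights):
    """Scaled deviation from the average, weighted per variable."""
    cols = list(zip(*X))
    means = [statistics.mean(c) for c in cols]
    sds = [statistics.stdev(c) for c in cols]
    return [abs(w) * (X[row][j] - means[j]) / sds[j]
            for j, w in enumerate(weights)]

# Hypothetical data: the last observation has a zero in variable 1
X = [[10.0, 5.0], [11.0, 5.2], [9.0, 4.9], [10.5, 5.1], [0.0, 5.0]]
weights = [0.8, 0.1]   # stand-ins for the w1* values of two variables
contrib = score_contribution(X, 4, weights)
print([round(c, 2) for c in contrib])
```

A large negative bar, as for the first variable here, points to a
variable far below its average in the selected observation.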
The raw iron ore (TON_IN) as well as the load on the grinders and
the other variables were all far below average. Inspecting the data,
we find TON_IN and load on the grinders to be 0 for observation
208, obviously causing a process upset (an outlier) at time 14:27.
Refining the model

We will remove observation 208, set aside a few observations as a
Test set, and then refit the PLS model.
Excluding observation 208 using the interactive tool box
In the score plot t1 vs. u1, mark observation 208 and click on the
red arrow. SIMCA-P excludes observation 208 from the workset
and asks if you want to generate a new unfitted model M3. Say
Yes.
The workset bar opens with the workset for model M3.
Observation 208 is excluded. When you display the Dockable
window Observations, 208 is marked excluded.
Removing some observations for a test set

In the Workset bar, hold the Ctrl key and mark observations
140-146, 173-179, 350-379 and 551-555, then right click and select
Exclude.
The deleted observations are also marked on the plot.
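When the same split is reproduced outside SIMCA-P, the observation
ranges can be expanded with a small helper. The function parse_ranges
is hypothetical, and the ranges are assumed to be 140-146, 173-179,
350-379 and 551-555.

```python
def parse_ranges(spec):
    """Turn '140-146, 173-179' into a sorted list of observation numbers."""
    numbers = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            first, last = (int(p) for p in part.split("-"))
            numbers.update(range(first, last + 1))
        else:
            numbers.add(int(part))
    return sorted(numbers)

test_set = parse_ranges("140-146, 173-179, 350-379, 551-555")
excluded = set(test_set) | {208}   # 208 is the outlier removed earlier
print(len(test_set), "test-set observations")
```

The excluded set then holds the test set plus the outlier, that is,
everything kept out of the workset of the refitted model.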
Autofit
Click on Analysis | Autofit or the fast button, to refit the PLS
model.
The Summary | Model Overview plot is updated as the model is
fitted. Note the improvement in both R2Y(cum) and Q2(cum).

Click on Analysis | Summary | X/Y Overview | Plot to display the
cumulative R2Y and Q2Y for every response.
The responses PAR, FAR and %Fe-Malm are very well
explained (90% or better) and the others a little less well.
Scores t1 vs. t2
Click on Analysis | Scores | Scatter t1 vs. t2 and display the t1 vs.
t2 plot. We see the observations separated in groups, each group
representing a setting of the experimental design.
Scores t1 vs. u1
In the Properties change the Scores to t1 vs. u1.
We now have an excellent relationship between t1 and u1 with no
outliers.
Loadings w*c1 vs. w*c2

The w*'s are the weights that combine the original X variables (not
their residuals in contrast to w) to form the scores t. In the first
component w* is equal to w. The w*'s are related to the correlation
between the X variables and the Y scores u. X variables with large
values of w* (positive or negative) are highly correlated with u
(and thereby Y).
The c's are the weights used to combine the Y's (linearly) to form
the scores u. The c's express the correlation between the Y's and
the t's (X-scores).
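In matrix notation, the standard PLS relations behind these
definitions can be summarized as below. The symbols are assumptions of
this sketch and are not spelled out in the tutorial: X and Y are the
scaled and centered data, W holds the weights w (computed on the
deflated X), P the X loadings, C the Y weights c, and E and F the
residuals.

```latex
\begin{aligned}
T &= X W^{*}, & W^{*} &= W \,(P^{\top} W)^{-1},\\
X &= T P^{\top} + E, & Y &= T C^{\top} + F .
\end{aligned}
```

For the first component the correction factor equals 1, which is why
w* is equal to w there.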
In the first two components, PAR and FAR are positively
correlated with all the load variables and negatively correlated
with r_PAR, %Fe-FaR and %Fe_Malm. The model is almost
linear except for HS_2 and its squared term dominating the second
component.
Normal Probability plot of residuals

Click on Analysis | Residuals | Normal Probability Plot to display
the Normal probability plot of residuals.
Examining this plot, we see the residuals close to normally
distributed with no outliers. Right click and in the Properties page
shift between different Y variables and/or change options.
Coefficients
Click on Analysis | Coefficients | Plot to display the PLS
regression coefficients (for scaled and centered data) for PAR,
with confidence intervals (the default is 95%). The dominating
factors are TON_IN, KR30_in, KR40_in and Ton_S3 with a
positive effect. Use the Property bar to change responses or
components.
Variable Importance
Click on Analysis | Variable Importance. This plot shows the
importance of the terms in the model, as they correlate with Y (all
the responses) and approximate X.
Distance to the Model

Click on Analysis | Distance to the Model | XBlock to display the
distance to the model (how far away an observation is from the
model hyper-plane) in the X space.
These distances are in normalized units and are the same as the
row residual standard deviations.
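A simplified version of this calculation is sketched below. It assumes
that the absolute distance is the residual standard deviation of a
row, and that it is normalized by the pooled residual standard
deviation of the model; SIMCA-P applies further corrections that are
omitted here. The residual matrix E and the function name dmodx are
hypothetical.

```python
import math

def dmodx(E, n_comp):
    """Normalized row residual SDs for a residual matrix E (N x K)."""
    n, k = len(E), len(E[0])
    # Residual standard deviation of each observation (row)
    row_sd = [math.sqrt(sum(e * e for e in row) / (k - n_comp))
              for row in E]
    # Pooled residual standard deviation of the whole model
    dof = (n - n_comp - 1) * (k - n_comp)
    pooled = math.sqrt(sum(e * e for row in E for e in row) / dof)
    return [s / pooled for s in row_sd]

# Hypothetical residuals: 5 observations, 4 variables, 2 components
E = [[0.1, -0.1, 0.0, 0.2],
     [0.0, 0.1, -0.1, 0.0],
     [1.0, -0.8, 0.9, -1.1],
     [0.1, 0.0, 0.1, -0.1],
     [-0.2, 0.1, 0.0, 0.1]]
distances = dmodx(E, n_comp=2)
print([round(d, 2) for d in distances])
```

The third observation, with the largest residuals, lies farthest from
the model hyper-plane.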
Observation Risk
Click on Analysis | Observation Risk
This plot displays the Observation Risk for every Y and for the
pooled Ys.
Using the zoomer around observation 349 which has a large
observation risk we get the following plot:
Observation 349 for Y Far has a larger Y residual when not in the
training set model than when the observation is included in the
model; hence its prediction is uncertain, risky.
The following list displays the Y (Far) residual when observation
349 is and is not in the model.
Predictions
We can now use the model to predict the outcome of the process
for the Test set observations.
Click on Prediction | Specify Prediction set | Specify. Remove all
observations from Observations in the Prediction set. In the left
window select Workset Complement, click on Select All and use
the arrow to move all the observations to the left window.
Mark observation 208 and click on Remove to exclude it from the
prediction set. Click on Apply and close this dialog.
Click on Predictions | Y Predicted | Scatter plot. The observed vs.
predicted plot, for PAR, is displayed.
For PAR and FAR (select from Properties), we have excellent
predictions; they are less good for the other responses.
Also look at DModX (under prediction menu).
Summary
This example shows that statistical design in the dominating
process variables gives data with high quality that can be used to
develop good predictive process models. With multivariate
analysis we extract and display the information in the data.
NIR
Introduction
The following example originates from a research project on peat
in Sweden. Peat is formed by an aerobic microbiological
decomposition of plants followed by a slow anaerobic chemical
degradation. Peat in Sweden (northern hemisphere in general) is
mainly formed from two types of plants, Sphagnum mosses and
grass of Carex type. Within the main groups there is variation
among the species. Depending on location, climate etc. there are
several other plants involved in the peat forming process.
In the project many different types of chemical analyses were
performed to get detailed information about the material and to
investigate differences among different peat types. Chemical
analysis was performed according to traditional methods (GC,
HPLC, etc.) which often were laborious and time consuming. To
speed up the analysis of samples, Near Infrared Spectroscopy
(NIR) together with multivariate calibration was introduced. This
strategy was found to work very well and after the calibration
phase, samples were analyzed in minutes instead of weeks.
In this tutorial we selected a subset of samples, which represents
the typical variation of peat in Sweden.
Data
Variables
Variables 1-19 represent spectra from the NIR instrument, which
in this case was a 19 channel filter instrument. Spectra are
recorded as Log (Absorbance) and then scatter corrected by an
MSC procedure.
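Assuming that MSC here stands for multiplicative scatter correction,
the procedure can be sketched as follows: each spectrum is regressed
on the mean spectrum, and the fitted offset and slope are removed. The
spectra and the function name msc are hypothetical, and the
pre-processing actually applied to the peat data may differ in detail.

```python
def msc(spectra):
    """Regress each spectrum on the mean spectrum, remove offset and slope."""
    n, k = len(spectra), len(spectra[0])
    ref = [sum(s[j] for s in spectra) / n for j in range(k)]
    ref_mean = sum(ref) / k
    sxx = sum((r - ref_mean) ** 2 for r in ref)
    corrected = []
    for s in spectra:
        s_mean = sum(s) / k
        slope = sum((r - ref_mean) * (v - s_mean)
                    for r, v in zip(ref, s)) / sxx
        offset = s_mean - slope * ref_mean
        corrected.append([(v - offset) / slope for v in s])
    return corrected

base = [0.40, 0.55, 0.90, 0.70, 0.45]   # hypothetical 5-channel spectrum
spectra = [base,
           [1.2 * v + 0.10 for v in base],   # same sample, more scatter
           [0.8 * v - 0.05 for v in base]]   # same sample, less scatter
flat = msc(spectra)
print([round(v, 3) for v in flat[0]])
```

The three spectra differ only by an offset and a slope, so they
coincide after the correction; what remains in real data is the
chemical information.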
Variables 20-46 represent different chemical analyses, which the
NIR spectra can be calibrated against.
Var. No.  Type  Name           Explanation
1-19      NIR                  Log Absorbance
20              Rhamnos        Mono saccharide
21              Fucos          Mono saccharide
22              Arabinos       Mono saccharide
23              Xylos          Mono saccharide
24              Mannos         Mono saccharide
25              Galaktos       Mono saccharide
26              Glukos         Mono saccharide
27              Klason l       Klason Lignine
28              Bitumen        Bitumen
29              Aspargin       Amino acid
30              Threonin       Amino acid
31              Serin          Amino acid
32              Glutamin       Amino acid
33              Prolin         Amino acid
34              Glycin         Amino acid
35              Alanin         Amino acid
36              valin          Amino acid
37              Methionin      Amino acid
38              Isoleucin      Amino acid
39              leucin         Amino acid
40              Tyrosin        Amino acid
41              Fenylalanin    Amino acid
42              Histidin       Amino acid
43              Lysin          Amino acid
44              Aginin         Amino acid
45              Glucose-amin   Amino sugar
46              Galactos-amin  Amino sugar
Variable 27 (Klason l) is Klason Lignin (rest after hydrolysis) and
variable 28 is Bitumen, which represents carbohydrates soluble in
acetone.
Observations
From a huge number of peat samples 41 were selected,
representing the main variation of peat in Sweden. The sample
(observation) names are coded with 20 characters in all. Each
position in the names carries certain information. In the plots a
sub-string of two characters (positions 6 and 7) is often used.
Position 6 represents the degree of decomposition, L (low), M
(medium) and H (high). Position 7 represents peat type, S
(Sphagnum) and C (Carex).
Objective
The objective of this study is to model and predict different
constituents of samples of peat directly from their NIR spectra. 41
samples of peat, mainly of two types Sphagnum and Carex, were
subjected to NIR spectroscopy. The spectra were recorded at 19
wavelengths (19 filters) with a reflectance instrument (log(abs))
and scatter corrected before the analysis.
For this objective, we will now develop a PLS model relating the
X variables (NIR spectra) to the Y variables (peat constituent
concentrations measured by traditional analysis).
Analysis Outline
Make a PLS model relating the NIR spectra variables to the peat
constituents in order to:
Understand and interpret the relationship between the
spectra (X) and peat composition (Y variables).
Develop a separate PLS model for each type of peat
(Sphagnum and Carex), to:
1) Increase the precision of the calibration.
2) Be able to classify and predict peat types.

a) Specify which variables are process variables (X) and
which are responses (Y).
b) Transform the variables.
The responses are concentrations of the chemical
constituents of peat, and their variation is non-linear, so a
log transformation is warranted: log(Y + 0.1), with 0.1
added to make sure that all values are positive before the
transformation.
c) Group the observations in 2 classes by peat type,
Sphagnum and Carex.
Fit the model, a PLS of all the data (Analysis menu).
Fit a PLS model for each of the peat types, Sphagnum and
Carex.
Use the PLS models for predictions and classification
(Prediction menu).
Create the project

Start a new project. The data set name is now NIRKHAM.XLS.
Start SIMCA-P and create a new project from FILE | NEW.
If you have SIMCA-P+, select normal SIMCA-P project.
The import wizard opens.
The first two columns are correctly marked as observation
numbers and names, and the first row as variable names.
Mark the first row and click on Variable secondary IDs to have a
variable index.
Mark the variables from Rhamnos to the end, and from the
combo box select Y Variable.
SIMCA-P marks these variables as Y (response) variables.

Click on Next to open the Project specification page. You can
change, as desired, the destination folder, or the project name.
Click on Finish; the data set Nirkham is imported.
Prepare the data

Default Workset
SIMCA-P's default workset consists of all the observations in the
primary data set with all variables, scaled to unit variance and
defined as X's or Ys as specified at import. This is the starting
workset when you select Workset | New.
Transform the variables

Click on Workset | New and select the Transform tab.
Mark all the Y variables, select Log, with C1= 1 and C2=0.1
(some of the concentrations are 0.0), and click on Set.
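The effect of the constants can be checked with a short sketch. It
assumes that the Log transformation has the form log10(C1*Y + C2);
the base-10 logarithm, the function name log_transform and the sample
values are assumptions of the example.

```python
import math

def log_transform(value, c1=1.0, c2=0.1):
    """Return log10(c1 * value + c2)."""
    return math.log10(c1 * value + c2)

concentrations = [0.0, 0.9, 9.9]   # hypothetical Y values
transformed = [log_transform(v) for v in concentrations]
print([round(t, 3) for t in transformed])
```

With C2 = 0.1 a concentration of 0.0 is mapped to a finite value; with
C2 = 0 the logarithm would be undefined for that sample.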
Group observations in classes:

Select the Observations tab and display the secondary IDs.
Right click on the Primary IDs and select Observation label.
To group the observations in 2 classes, Carex and Sphagnum, click
on From Obs ID and select Obs Sec ID, then select start position 7
for length 1.
The Carex are set to class 1 and the Sphagnum to class 2.
One observation, 21 (not Sphagnum or Carex type), is set to class 3
as it belongs to neither group. Mark it and set it to no class.
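The class assignment from position 7 of the observation name can be
mimicked as follows. The observation names below are invented (only
positions 6 and 7 follow the coding described in the Data section),
and peat_class is a hypothetical helper, not a SIMCA-P function.

```python
def peat_class(obs_id, start=7, length=1):
    """Class from a 1-based substring of the ID: C -> 1, S -> 2, else None."""
    code = obs_id[start - 1:start - 1 + length]
    return {"C": 1, "S": 2}.get(code)

# Hypothetical 20-character names; position 6 = decomposition, 7 = type
ids = ["XXXXXLS0000000000000",
       "XXXXXHC0000000000000",
       "XXXXXMX0000000000000"]
classes = [peat_class(i) for i in ids]
print(classes)
```

A sample whose type code is neither C nor S gets no class, which
corresponds to setting observation 21 to no class.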
Click on OK to exit the Workset window.
Analysis
When you exit the Workset window, an unfitted model is
created with model type PLS class (the default for a workset with
both X's and Y's and classes). In Analysis | Active Model Type
change it to PLS. You are ready to fit a PLS model.
PLS model of all the samples

Autofit
Change the model type to PLS and click on Analysis | Autofit. The
model overview plot is updated as the model is fitted. This plot
displays R2Y cumulative by component and Q2Y cumulative by
component. R2Y is the fraction of the variation of Y (all the
responses) explained by the model after each component, and Q2Y
is the fraction of the variation of Y that can be predicted by the
model according to the cross-validation. Values of R2Y(cum) and
Q2Y(cum) close to 1.0 indicate an excellent model.
Double click on the Model Summary line to display the
corresponding list.
Multivariate calibration with NIR spectra often leads to many
components due to the high precision of the data. The present
model is indeed excellent and explains 88.2% of the variation of
Y, with a predictive ability (Q2) of 73.9%.

Click on Analysis | Summary | X/Y Overview | Plot and display
the cumulative R2Y and Q2Y for every response. Use the
Properties page to select variable labels and click on Save As
Default Options to always have variable names. With the
exception of Bitumen all responses have an excellent R2 and Q2.
Scores t1 vs. t2
Click on Analysis | Scores | t1 vs. t2 plot. Use the Marker to mark
the outlying observation, and then use the label button to label it.
Observation 32 lies far away in the second component, indicating
that sample 32 is different with respect to NIR spectra.
Comparing the spectra of observation 32 and 39

Mark both observations, right click and select Plot Xobs to display
the spectra of these 2 observations.
Scores t1 vs. u1
We have a good relationship between the first summary of the X's
(t1), and the first summary of the Y's (u1), with some spread in the
data.
To display informative labels, select in properties Obs Sec ID, start
in position 6 for length 2.
You can now distinguish two groups of observations, S Sphagnum
peat and C Carex peat.
Scores u1 vs. u2
The projection of the samples in the Y space (traditional chemical
analyses) does not show observation 32 as an outlier, as the Scores
plot does. NIR spectroscopy can detect very small changes in
chemical composition (PPM level) compared to the traditional
analyses, which typically have large measurement errors (3-50%).
With NIR spectroscopy one achieves better control of the samples.
Contribution plot
To understand why sample 32 differs from the others, double click
on observation 32 in the Scores t1 vs. t2 plot.
This contribution plot displays the differences, in scaled units, for
all the terms in the model, between the outlying observation 32
and the normal (or average) observation, weighted by w*1 and w*2
(the importance of the X-variables in components 1 and 2).
In the plot we see some spectral variables close to 8 standard
deviations, indicating some contamination in this sample. We shall
remove sample 32.
Loadings w*c1 vs. w*c2

The w*'s are the weights that combine the original X variables (not
their residuals in contrast to w) to form the scores t. In the first
component w* is equal to w. The w*'s are related to the correlation
between the X variables and the Y scores u. X variables with large
values of w* (positive or negative) are highly correlated with u
(and thereby Y).
The c's are the weights used to combine the Y's (linearly) to form
the scores u. The c's express the correlation between the Y's and
the t's (X-scores).
This plot shows how the different chemical compounds correlate
to the different parts of the NIR spectra. Plots displaying the
loadings, one component at a time, may be more informative.
Loadings: Column plot w*c1

Click on Analysis | Loadings | Column plot w*c1. This plot shows
the importance of different parts of the NIR spectra, in the first
component, to explain the variation among the constituents of the
peat.
Excluding sample 32
Display the Score plot (t1 vs. t2), mark observation 32 and click on
the red arrow. SIMCA-P excludes this sample from the workset and
creates a new unfitted model with model type PLS class(1).
Separate PLS models for the Sphagnum and Carex
Autofit class models
Click on the fast button Autofit class models to fit both classes.
Sphagnum Model, class 2

Note that the model for class 2, the Sphagnum, has only one
component, as the second component was not significant. When we
add the second and the third component, we find the third
significant. We continue adding components until they are no
longer significant. The model has 7 significant components.
The present model is excellent and explains 86% of the variation
of Y, with a predictive ability (Q2) of 41%.

Click on Analysis | Summary | X/Y Overview | Plot to display the
cumulative R2Y and Q2Y for every response. All responses have
excellent R2 and good Q2 values.
Scores t1 vs. u1 and t1 vs. t2

These plots do not show any outliers.
Model class 1 (Carex peat)

The model with 7 components is not as good as the preceding one,
and though it explains 84.4% of the variation of Y, it only has a
predictive ability (Q2) of 12.6%.
This is mainly due to the fact that the Carex peat is not as rich in
carbohydrates as the Sphagnum peat, and the variations in the
chemical constituents are small.
Scores t1 vs. u1 and t1 vs. t2

These plots do not show any outliers.
Predictions
We now have two good models describing the relation between
NIR spectra and chemical composition of peat, and they can be
used to classify peat samples as Sphagnum or Carex.
In this tutorial we do not have new peat samples. However, we
will use the data set and classify every sample with respect to the
two models. We first want to remove sample 32.
Making a prediction Set

By default the prediction set is all of the primary data set.
Coomans' Plot
Exclude sample 32 from the Prediction set (Predictions | Specify
Prediction set | Remove observation 32 from prediction set) and
display the Coomans' plot.
This plot displays the distance to the model of every observation
with respect to models M2 and M3, and shows a very good
separation between the Sphagnum and the Carex peat samples.
Sample 21 is correctly classified as being neither a Sphagnum
sample nor a Carex peat sample.
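The reading of such a plot can be expressed as a simple rule: a
sample belongs to a class when its distance to that class model is
below the critical distance of the model. The sketch below assumes
this rule; the distances and critical limits are invented, and
coomans_region is a hypothetical helper.

```python
def coomans_region(d_carex, d_sphagnum, crit_carex, crit_sphagnum):
    """Name the region of the plot a sample falls in."""
    in_carex = d_carex <= crit_carex
    in_sphagnum = d_sphagnum <= crit_sphagnum
    if in_carex and in_sphagnum:
        return "both"
    if in_carex:
        return "Carex"
    if in_sphagnum:
        return "Sphagnum"
    return "neither"

# Hypothetical distances to the Carex and Sphagnum class models
samples = {"A": (0.8, 3.5), "B": (4.1, 0.9), "C": (3.9, 4.4)}
regions = {name: coomans_region(d1, d2, crit_carex=1.5, crit_sphagnum=1.5)
           for name, (d1, d2) in samples.items()}
print(regions)
```

Sample C is far from both models and falls in the "neither" region,
the situation described for sample 21.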
Summary
As a tutorial, this provides just a brief introduction to the main
functionalities and plots in SIMCA-P. We recommend that you
continue with your own data, maybe another tutorial, and then
look in the Manual for details. The Help system contains the same
information as the Manual, but organized in a different way.
Plots and Lists

You can display the results of SIMCA-P in numerous graphs and
lists.
From the Analysis and Prediction menus, results of the active
model are available as quick plots and lists. With the menu
Plot/List, you have access to the raw data and every computed
value from every model. You can even plot coefficient vectors
from different models against each other.
Hierarchical Models
Introduction
This example illustrates the use of hierarchical multivariate
modeling (PCA and PLS), using a small set of process data.
Details of the process are not revealed for proprietary reasons, but
a general outline is given below.
Data
In this process, raw materials are combined and reacted to give a
product with certain properties measured by 8 y-variables. Two of
these, y6=impurity level, and y8=yield are the most important.
The feed is described by 7 input X-variables (x1-x7), and 18
intermediate process variables from steps such as reaction
(x8-x15) and purification (x16-x25) are also available.
The data are collected hourly, and comprise 92 observations. The
process functioned fairly well to around obs. 79, but then went out
of control and was closed down at point 92.
Objective
To understand the relationship between the two most important y
variables (y6= impurity, and y8= yield) and the three steps of the
process, feed (x1-x7), reactor (x8-x15), and purification and work
up (x16-x25).
We shall do the following, using obs. 1-79 as a training set:
1. PLS model of X= feed (x1-x7) with y6 and y8 (Block 1)
2. PLS model of X= reactor (x8-x15) with y6 and y8
(Block 2)
3. PLS model of X= purification (x16-x25) with y6 and y8
(Block 3)
4. PCA model of less important y's (y1 to y7 not including
y6) (Block 4)
5. Top level hierarchical model with scores of blocks 1-3 as
X and scores of block 4 plus y6 and y8 as Y.
The objective of Block Models 1 to 4 is to summarize the various
steps of the process by scores, to then be used as variables in the
top level model.
Analysis Outline
Create the project by importing the data set.
Generate and fit the three PLS models for X-blocks 1-3
(obs. 1-79), and mark them as base hierarchical.
Generate and fit the PC model for block 4, and mark it as
base hierarchical.
Generate and fit the top level hierarchical model.
Interpret the hierarchical model.
Validate the hierarchical model with the test set (obs. 80-92).
Create the project

Start a new project. The data set name is proc1a.dif.
Start SIMCA-P and create a new project from FILE | NEW.
If you have SIMCA-P+ make sure to select Create a SIMCA-P
normal project.
Holding the CTRL key, mark Y6+ and Y8+, click on X variables
and select Y variables, to make these 2 variables Ys.
Click on Command | Create Index | Variable and generate a
variable index.
Click on Next. Now you can change the project name and
destination directory. Click on Finish and the project is imported.
Summarizing the feed

Workset
Select Workset | New. In variable blocks, keep x1 to x7 as X's and
y6 and y8 as Y's, exclude all other variables. Right click on
variables name and mark variable secondary ID's to display
variable number.
In Observations, exclude observations 80 to end, for a test set.

Click on OK to exit the workset.
Analysis
Autofit the model; the model window opens and is updated as the
model fits. One component is significant. Take 3 more components,
as the objective here is to summarize the X block (feed).
Double click on the model title and call the model Feed.
Double click on the Summary line of the model to display its
details:
The model explains 65% of X, hence the scores of model M1 are a
good summary of the feed.
Scores t1 vs. t2
Click on the fast button to display the scores t1 vs. t2.
Observation 1 is an outlier. Double click on it with the
Contribution tool and SIMCA-P displays the Contribution plot.
Position the cursor on x6in to show its value in the data set (4.54).
The trend plot of that variable (double click on it) shows clearly an
abnormal value of that variable in observation 1.
We shall exclude observation 1 using the interactive marker and
the red arrow and refit the model.
Fitting the model without observation 1
Model M2 is very similar to model M1 (make sure you take 4
components) and explains 64% of the variation of the feed.
Loadings p1
The loading p1 is the vector of weights that combine the original X
variables to form the scores t1. The first dimension explains 32%
of the X, i.e., the feed. You can think of t1 as a new variable,
summarizing the feed and explaining 32% of their variation.
To display p1, click on Analysis | Loadings | Column plot and p1.
In the first dimension, all the feed variables, with the exception of
x4 and x5 are well summarized by t1.
Summarizing the reactor

Workset
Prepare a workset with the reactor variables as X (variables 16 to
23), and y6+ and y8+ as Y's. Select only observations 2 to 79.
Use the menus Workset | New as model M2, then select the
variables in Variable Blocks. The included observations will be 2
to 79.
The workset should look like this:

Analysis
Click on Autofit and take 2 extra components.
With 4 components model M3 is a good summary of the reactor,
explaining 76% of the variation of X. Call the model reactor.
Scores t1 vs. t2
This plot shows no serious outliers.
Loadings p1 and p3 (the 2 most important components)
Summarizing the purification

Workset
Prepare a workset with the purification variables as X (variables
24 to 33), and y6+ and y8+ as Y's. Select only observation 2 to 79.
Use the menus Workset | New as model M2, then select the
variables in Variable Blocks. The observations will be correct.
The workset should look like this:
Click OK to exit the workset and Autofit the model.
There are 4 significant components, explaining 65% of X.
The third component explains the largest part of X. Click on
Analysis | Loadings | Column and select p3.
Summarizing the less important Y's

Workset
In Workset | New, Variable Blocks, select as X variables, variables
8 to 12 and 14.
Select only observation 2 to 79. The workset will look like this:
Exit the workset and Autofit the PC model.

SIMCA-P will extract 0 components, as none are significant. Take
3 components.
Preparing for the hierarchical model

Right click on model M2 and check hierarchical base Model
Scores.
The scores of model M2 are added to the workset as new variables.
Do the same for models M3 to M5. All these Models are marked
B.
Workset of the top level model

In Workset | New, Variable Blocks, start by selecting All and
Exclude.
Select as X's, all the scores from models M2, M3 and M4.
Select as Y's, y6+ and y8+ and all the scores of model M5.
Select observation 2-79, start with New | As model M2.
(continued, upper window scrolled down)
Exit the workset.
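The structure of the top level X block can be pictured as a table in
which every base model contributes its score vectors as new
variables. The sketch below only illustrates this bookkeeping; the
score values, the variable names of the form "M2.t1" and the helper
top_level_block are hypothetical.

```python
def top_level_block(score_sets):
    """Merge per-model score columns into one table keyed by observation."""
    table = {}
    for model, scores in score_sets.items():
        for obs, values in scores.items():
            row = table.setdefault(obs, {})
            for a, value in enumerate(values, start=1):
                row[f"{model}.t{a}"] = value
    return table

# Hypothetical scores for two observations, two components per model
scores = {
    "M2": {2: [0.5, -1.2], 3: [0.1, 0.4]},    # feed
    "M3": {2: [1.1, 0.3], 3: [-0.7, 0.9]},    # reactor
    "M4": {2: [-0.2, 0.8], 3: [0.6, -0.5]},   # purification
}
x_top = top_level_block(scores)
print(sorted(x_top[2]))
```

Each observation is now described by a handful of scores per process
step instead of all the original variables.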
Analysis
Autofit the model.
There are 4 significant components explaining 53% of the Y's.
The Summary | X/Y Overview shows that the 2 most important
variables y6+ and y8+ are well explained and predicted.
The score plot (t1 vs. t2) of the top level model
This plot is colored by the values of y6+, the side product, in
model M6. To display the legend, use Plot Settings | Plot area from
the pop up menu. The process starts up to the right with high
values of y6+, moves down to the left with lower values, and then
is manipulated to give lower values of y6+ (upper left quadrant).
The process then becomes unstable and moves back to the right.
The w*c plot
The important y variable (y6+), the side product, is on the right of
the plot. Positively correlated to y6+ are the first component of the
feed, and the second component of the reactor and purification.
We also see that y6+ is negatively correlated to the first
component of model M5 (a summary of the less important Y's).
y8+, the yield, is negatively correlated to the first component of
the purification.
With the contribution tool, double click on any score variable
point, and the corresponding loadings plot opens. This plot shows
us the important original variables in that score.
For example M3 t2 was positively correlated with y6+. In the
loadings of M3 (reactor) second component, we see that variables
2, 3, and 8 are positively correlated with y6+, while variables 1, 4
and 7 are negatively correlated with y6+.
This gives us a zoom in zoom out picture. In the w*c plot we
understand the relationships between the 2 important y's and the
sections of the process, i.e. the feed, reactor and purification. In
the loadings plot we understand which variables in the feed,
reactor and purification dominate, and their relationship to the
y's.
Click on the other score variables to display their loadings.
Coefficients
Click on Analysis | Coefficients.
For y6+ the dominating variables are M2 (the feed) t1 (first
component), M3 (reactor) t2 and t4 (second and fourth component)
and
Use the contribution tool and double click on any of these
variables to open the corresponding loading plot.
For example double click on M3 (reactor) t4.
Again we see here the importance of variable 3 as being positively

correlated with y6+.
Variable Importance (VIP)
The most important variables, for all the y's, are t2 and t1 of the
reactor, and t1 and t2 of the purification. You can use the
contribution tool to display the corresponding loadings.
Observed vs. Predicted
Observation 41 is an outlier, and has a large residual.
Using the contribution tool, double click on observation 41.
t4 of purification is the culprit variable. Double click on it to
display the contribution with the original variables. This plot
points to variable 6 in the purification as being much too low.
The time series plot of this variable (double click on it) in the data
set shows an abnormal value for observation 41.
Predictions
Make sure Model M6 (the top level) is the active model.
To specify the prediction set click on Predictions | Specify
Predictions set | Specify and remove observation 1 (outlier).
Remember that obs. 2-79 actually comprise the training set. They
are still included below for comparison in the plots.
DModXPS
From observation 80 on, the process becomes unstable, with
DModXPS quickly increasing.
The contribution plot for observation 91 (large DModXPS) shows
the feed (t2, t3 and t4) as being the problem.
With the Contribution tool, double clicking on the feed in all 3
components points to variable 6.
The trend plot confirms the problem with this variable.
Scores tPS1 vs. tPS2 colored by test set and training set
The model was based on normal operation up to observation 79.
The predicted scores from observation 80 on, colored red, clearly
show that the process is going out of control.
Cusum Chart
Click on Predictions | Control charts | Cusum and select subgroup
1.
A contribution plot around observation 76 shows that both the feed
and purification were related to the problem.
Double clicking on both the feed and the purification shows the
culprit variables.
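How a Cusum chart accumulates small, sustained shifts can be sketched as follows. This is a minimal two-sided tabular CUSUM in Python, not SIMCA-P's implementation; the function name `tabular_cusum`, the slack value `k` and the toy series are assumptions made for illustration.

```python
def tabular_cusum(values, target, k):
    """Return the upper and lower cumulative sums of a two-sided tabular CUSUM.

    values: sequence of individual observations (subgroup size 1)
    target: the in-control mean
    k:      slack (allowance), in the same units as the values
    """
    hi, lo = 0.0, 0.0
    upper, lower = [], []
    for x in values:
        hi = max(0.0, hi + (x - target - k))   # accumulates upward drift
        lo = max(0.0, lo + (target - x - k))   # accumulates downward drift
        upper.append(hi)
        lower.append(lo)
    return upper, lower


# Toy series: in control at 0, then a sustained upward shift of 1 unit.
series = [0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0]
upper, lower = tabular_cusum(series, target=0.0, k=0.5)
```

The upper sum stays at zero while the series is on target and then grows steadily once the shift starts, which is why a Cusum chart can reveal a drift before individual values leave their limits.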
Conclusion
A hierarchical approach to multivariate analysis greatly enhances
our ability to understand complex problems.
The zoom in, zoom out capabilities allow us first to understand
complex relationships in terms of components of a process and
then zoom in on a single component to resolve the details in terms
of the process variables.
Spectral Filtering and Compression, including OPLS
Introduction
This example illustrates the use of spectral filtering and wavelet
compression with multivariate calibration. The recently added
OPLS approach (Orthogonal PLS) is also demonstrated.
The data set of this example was collected at Akzo Nobel,
Örnsköldsvik, in Sweden. The raw material for their cellulose
derivative process is delivered to the factory in form of cellulose
sheets. Before entering the process the cellulose sheets are
controlled by a viscosity measurement, which functions as a
steering parameter for that particular batch.
In this data set NIR spectra for 180 cellulose sheets were collected
after the sheets had been sent through a grinding process. Hence
the NIR spectra were measured on the cellulose raw material in
powder form. The calibration model was calculated from 160 sample
spectra, and a selection of 20 spectra was used for model
validation.
Data
The data consists of:
X: 1201 wavelengths in the VIS-NIR region
Y: Viscosity of cellulose powder.
Objective
The objective of this study is to develop a good calibration model
with the 160 samples and validate this model with the test set of 20
samples.
We will use orthogonal signal correction (OSC) to improve the
calibration model, and we will compress the X matrix, with
orthogonal wavelets, for efficiency and fast computation.
The results of the model after OSC and wavelets compression will
be compared to the results of the model with the original data.
Finally OPLS will be run on the same data.
Analysis Outline
Make a PLS model relating the NIR spectra variables to the viscosity with the original data.
Review and validate the calibration model with the test samples.
Apply OSC and wavelet compression to the X matrix.
Make a PLS model with the OSC and wavelet compressed data.
Review and validate this model and compare the results to the calibration model made with the original data.
Finally, run OPLS and compare the results to the previous analyses.

a) Specify which variables are process variables (X) and which are responses (Y).
b) Exclude 20 specific samples from the training set for a test set.
Fit the calibration model, and review the fit (Analysis menu).
Validate the model with the test set (Prediction menu).
Use Spectral Filter, OSC, followed by wavelet compression (Dataset menu).
Prepare the data (Workset menu).
Fit a PLS model on the spectral filtered data (Analysis menu).
Validate this model with the test set and compare the results (Prediction menu).
Return to the first project (original data) and change to OPLS (Analysis / Change model type), fit the model with Autofit, predict (use the same prediction set), and compare.
Create the project

Start a new project with the data set Malyx.mat
Start SIMCA-P and define a new project from FILE: NEW.
The file is a Matlab file. Note that there are no variable names
and no observation numbers or names.
The import wizard opens. If you are using SIMCA-P+, make sure to
select a normal SIMCA-P project.
Set the first column as Y: in the top cell of the first column,
click on the arrow and from the combo box select Y Variable
(Viscosity).
Click on Next to open the Project specification page. You can
change, as desired, the destination folder or the project name.
Click on Finish; the data set (we named it Malyx) is imported.
Plotting the Spectra

With the dataset open and active, right click and select Plot Xobs
to plot the spectra.
All spectra are plotted together:

[Figure: MALYX.DS1, Xobs plot of all spectra; x-axis: Num (variable number)]
Prepare the Data

Workset
SIMCA-P's default workset consists of all the observations in the
dataset, with the variables defined as X's or Y's as specified at
import. This is the starting workset when you select Workset | New.
Workset | New
The Workset window opens with the variable names, the variable
block X or Y, scaling (default UV), and the observations numbers.
Change the scaling of the X variables to CTR (centered only)
Click on Scale, mark variables 2 to 1202, select Ctr under Base and
click on Set. The X variables are now just centered, and not scaled.
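The difference between the two scaling choices can be sketched in a few lines of Python. This is only an illustration of centering (Ctr) versus unit variance (UV) scaling on one made-up column; the function names and the absorbance values are assumptions, and the code does not call SIMCA-P.

```python
import statistics


def center(column):
    """Ctr scaling: subtract the column mean, leave the spread untouched."""
    mean = statistics.fmean(column)
    return [x - mean for x in column]


def unit_variance(column):
    """UV scaling: subtract the mean and divide by the sample standard deviation."""
    mean = statistics.fmean(column)
    sd = statistics.stdev(column)
    return [(x - mean) / sd for x in column]


absorbance = [0.30, 0.50, 0.70]   # made-up values for one wavelength
ctr = center(absorbance)          # keeps the original spread of 0.2
uv = unit_variance(absorbance)    # forces the spread to 1
```

With centering only, wavelengths keep their original relative size, so regions with little signal are not inflated to the same variance as regions with strong signal.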
Exclude observations from the training set
Click on Observations and mark the following 20 observations:
4-5, 18-20, 30-34, 100-104, and 130-134, and click on Exclude.
All these observations are now excluded from the training set.
Click on OK and exit the workset menu.
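As a quick check of the split, the test-set ranges used in this example (4-5, 18-20, 30-34, 100-104, 130-134) can be expanded and counted. The helper name `expand_ranges` is an assumption for illustration only.

```python
def expand_ranges(spec):
    """Expand a string such as "4-5, 18-20" into a sorted list of integers."""
    numbers = set()
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            first, last = part.split("-")
            numbers.update(range(int(first), int(last) + 1))
        else:
            numbers.add(int(part))
    return sorted(numbers)


test_set = expand_ranges("4-5, 18-20, 30-34, 100-104, 130-134")
training_set = [i for i in range(1, 181) if i not in test_set]
```

Of the 180 spectra, this leaves 160 observations in the training set and 20 in the test set.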
Analysis
The workset is created with model type PLS (the default for a
workset with both X's and Y's). You are ready to fit a PLS model.
PLS model
Autofit
Click on Analysis | Autofit, or use the fast button.
The model overview plot updates as the model is fitted.

[Figure: MALYX.M1 (PLS), model overview; R2Y(cum) and Q2(cum) for components 1 to 7]
Double click in the project window on the model summary line to
display the details by component.
R2Y(cum), the fraction of the variation of Y explained by the
model after 7 components, is equal to 0.756, and Q2(cum), the
fraction of the variation of Y that can be predicted by the model
according to the cross-validation, is equal to 0.686. Values of
R2Y(cum) and Q2(cum) close to 1.0 indicate an excellent model.
For a calibration model, M1 is rather poor.
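The two statistics can be illustrated with a small sketch: R2 is computed from the residuals of the fitted model, Q2 from the residuals of observations that were left out during fitting. To stay self-contained, the sketch uses a straight-line fit instead of PLS; the function names, the use of 7 cross-validation groups and the toy data are assumptions, not SIMCA-P's code.

```python
def fit_line(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def r2_and_q2(x, y, groups=7):
    """R2 = 1 - RSS/SSY from the full fit; Q2 = 1 - PRESS/SSY from cross-validation."""
    n = len(y)
    my = sum(y) / n
    ssy = sum((yi - my) ** 2 for yi in y)

    a, b = fit_line(x, y)
    rss = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))

    press = 0.0
    for g in range(groups):
        # Leave out every `groups`-th observation, starting at index g.
        train = [i for i in range(n) if i % groups != g]
        held_out = [i for i in range(n) if i % groups == g]
        a_cv, b_cv = fit_line([x[i] for i in train], [y[i] for i in train])
        press += sum((y[i] - (a_cv + b_cv * x[i])) ** 2 for i in held_out)

    return 1 - rss / ssy, 1 - press / ssy


# Toy data: a straight line plus small made-up deviations.
x = [float(i) for i in range(14)]
noise = [0.4, -0.3, 0.1, -0.5, 0.2, 0.3, -0.2, 0.5, -0.4, 0.1, -0.1, 0.3, -0.3, 0.0]
y = [2.0 * xi + e for xi, e in zip(x, noise)]
r2, q2 = r2_and_q2(x, y)
```

Because the left-out observations are predicted by models that never saw them, Q2 is lower than R2; a large gap between the two is a warning sign.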
Scores t1 vs. u1
Click on Scores: t1 vs. u1 to display the t1 vs. u1 plot. The
relationship between t1 and u1 is not very good in particular for
the cluster of samples 162-165 etc.
[Figure: MALYX.M1 (PLS), t[1] vs. u[1] score plot with observation labels; regression line y=1*x-1.063e-008, R2=0.4213; R2X[1] = 0.280355]
Plotting the Spectra of selected observations
Hold down the CTRL key and mark the cluster of observations
around observation 168 (lower left) and the cluster around
observation 27 (upper right), then right click and select Plot Xobs
to plot the spectra in original units.
Zooming in on the plot, one clearly sees a separation between the
two groups of spectra.
Loadings
Click on Loadings | Line plot
Remove the series w*c1, select w* under Items and all
components (*), click on Add Series, then on OK.
[Figure: MALYX.M1 (PLS), w* line plot for components 1 to 7 against variable number (Num); R2X[1] = 0.280355, R2X[2] = 0.66347, R2X[3] = 0.0369157, R2X[4] = 0.00567819, R2X[5] = 0.00847137, R2X[6] = 0.0010421, R2X[7] = 0.0013327]
Components 1 to 3 capture almost 60% of the variation of Y. The
other components are small corrections. To display the first three
loadings, open the properties page, mark series 4 to 7 and click on
Remove and Apply.
[Figure: MALYX.M1 (PLS), w*[1], w*[2] and w*[3] line plot against variable number (Num); R2X[1] = 0.280355, R2X[2] = 0.66347, R2X[3] = 0.0369157]
The regions around 200, 400, 700-800, and 900 capture most of
the information.
Distance to the Model (DmodX)

[Figure: MALYX.M1 (PLS), DModX[7] (normalized) by observation number, with D-Crit(0.05); M1-D-Crit[7] = 1.163, 1 - R2X(cum)[7] = 0.002735]
Several samples have a distance to the model larger than the
critical distance, indicating data inhomogeneity.
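The idea behind the normalized distance to the model can be sketched as the residual standard deviation of one observation divided by the pooled residual standard deviation of the model. The degrees of freedom used below, and the omission of any correction factor for training-set observations and of the critical limit, are simplifying assumptions; the function name and the toy residuals are hypothetical.

```python
import math


def dmodx_normalized(residuals, n_components, centered=True):
    """Normalized distance to the model for each row of an X residual matrix.

    residuals: list of rows; each row holds the X residuals of one
               observation after n_components components
    """
    n = len(residuals)
    k = len(residuals[0])
    a = n_components
    a0 = 1 if centered else 0

    row_ss = [sum(e * e for e in row) for row in residuals]
    # Pooled residual standard deviation over all observations.
    pooled_sd = math.sqrt(sum(row_ss) / ((n - a - a0) * (k - a)))
    # Residual standard deviation of each observation, relative to the pool.
    return [math.sqrt(ss / (k - a)) / pooled_sd for ss in row_ss]


# Toy residuals: the last observation fits the model much worse than the rest.
toy_residuals = [[1.0, 0.0, 0.0], [1.0, 0.0, 0.0], [1.0, 0.0, 0.0], [3.0, 0.0, 0.0]]
distances = dmodx_normalized(toy_residuals, n_components=1)
```

An observation whose value is well above the others has X residuals that the model does not describe, which is what the plot flags against the critical distance.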

The predictions are poor, particularly for a cluster of samples, as
in the t1 vs. u1 plot. They can be labeled by marking them,
clicking on the selected item fast button, and selecting labels as
primary obs label.
[Figure: MALYX.M1 (PLS), observed YVar(Var_1) vs. predicted YPred[7](Var_1); regression line y=1*x-7.256e-006, R2=0.7563; RMSEE = 139.821]
A zoom in on these points shows their labels.
Validating the Model 1

Click on Predictions | Specify Prediction set | Complement Workset.
The prediction list opens. Click on Predictions | Y Predicted |
Scatter plot.
[Figure: MALYX.M1 (PLS), PS-Complement Model 1, observed YVarPS(Var_1) vs. predicted YPredPS[7](Var_1) for the test set; RMSEP = 110.255]
The predictions are reasonable, with an RMSEP of 110 compared
to the training set RMSEE of 140.
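The two error measures can be sketched as follows. The sketch assumes that RMSEE divides the residual sum of squares by N - 1 - A (N training observations, A components) while RMSEP divides by the number of test observations; these denominators, the function names and the toy numbers are assumptions for illustration.

```python
import math


def rmsee(observed, fitted, n_components):
    """Root mean square error of estimation for the training set."""
    n = len(observed)
    ss = sum((o - f) ** 2 for o, f in zip(observed, fitted))
    return math.sqrt(ss / (n - 1 - n_components))


def rmsep(observed, predicted):
    """Root mean square error of prediction for the test set."""
    n = len(observed)
    ss = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return math.sqrt(ss / n)


# Made-up viscosities: every prediction is off by exactly 100 units.
observed = [1000.0, 1200.0, 1400.0, 1600.0]
predicted = [1100.0, 1100.0, 1500.0, 1500.0]
error_test = rmsep(observed, predicted)
error_train = rmsee(observed, predicted, n_components=1)
```

Under these assumptions the same residuals give a larger RMSEE than RMSEP, because the training-set figure is penalized for the degrees of freedom used by the model.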
Orthogonal Signal Correction and Wavelets Compression

This rather poor model may indicate systematic variation in the X
block that is not related to the response Y. This is corroborated by
the similarities between w*2 and w*3 (see loading plot above).
We will apply Orthogonal Signal Correction (OSC) to the X block
(the NIR data) to remove the systematic variation in X not related
to Y, and then, for speed and efficiency, we will wavelet compress
the X block.
Click on Dataset | Spectral Filters | Combination | OSCWavelet
The first variable is marked as Y. Exclude the test set of
observations: 4-5, 18-20, 30-34, 100-104, 130-134, and click on
Next.
SIMCA starts OSC and extracts one component; click on Next to
extract the second component, as two components are usually
recommended.
The angle of both components was 90 degrees, indicating
orthogonality, and the remaining Sum of Squares after the second
component is 13%.
Hence, 87% of the variation in X was not related to Y and was
removed from X.
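The two diagnostics reported here, the angle between a component and Y and the remaining sum of squares, can be illustrated on toy data. The sketch does not search for OSC components; it takes a score vector that is orthogonal to y by construction, removes it from X, and reports the two numbers. All names and values are assumptions for illustration.

```python
import math


def angle_degrees(t, y):
    """Angle between a score vector t and the (centered) response y."""
    dot = sum(a * b for a, b in zip(t, y))
    norm_t = math.sqrt(sum(a * a for a in t))
    norm_y = math.sqrt(sum(b * b for b in y))
    cosine = max(-1.0, min(1.0, dot / (norm_t * norm_y)))
    return math.degrees(math.acos(cosine))


def remove_component(X, t):
    """Subtract the rank-one part t*p' from X, with loading p = X't / t't."""
    tt = sum(a * a for a in t)
    n_cols = len(X[0])
    p = [sum(t[i] * X[i][j] for i in range(len(X))) / tt for j in range(n_cols)]
    return [[X[i][j] - t[i] * p[j] for j in range(n_cols)] for i in range(len(X))]


def sum_of_squares(X):
    return sum(v * v for row in X for v in row)


# Toy centered data: column 0 follows y, column 1 is a large Y-orthogonal pattern.
y = [-1.0, -1.0, 1.0, 1.0]
t_orth = [1.0, -1.0, 1.0, -1.0]          # orthogonal to y by construction
X = [[yi, 3.0 * ti] for yi, ti in zip(y, t_orth)]

angle = angle_degrees(t_orth, y)
X_filtered = remove_component(X, t_orth)
remaining = 100.0 * sum_of_squares(X_filtered) / sum_of_squares(X)
```

In this toy case the angle is 90 degrees and 10% of the sum of squares remains, so 90% of the variation in X was unrelated to y and has been removed, while the column that follows y is untouched.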
Click on Next to perform the wavelet compression.
The wavelet window opens. Select the Daubechies 10 wavelet. Select
Variance as the compression method and DWT (Discrete Wavelet
Transform), as NIR signals are smooth and DWT is recommended
for low frequency signals. Click on Next.
The wavelet transform is performed, and SIMCA displays a plot of
the percentage of variance explained by the largest coefficients.
Enter 50 in the box to keep 50 coefficients; these 50 coefficients
explain 99.93% of the variation of the X matrix. Click on Next.
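Compression by variance can be sketched as: transform every spectrum, compute the variance of each wavelet coefficient across the spectra, and keep the coefficients with the largest variance. To stay self-contained the sketch uses the Haar wavelet rather than Daubechies 10 and requires a signal length that is a power of two; the function names and toy spectra are assumptions, and how SIMCA-P treats a 1201-point signal is not covered.

```python
import math


def haar_dwt(signal):
    """Full orthonormal Haar transform of a signal whose length is a power of two."""
    coeffs = []
    approx = list(signal)
    while len(approx) > 1:
        pairs = list(zip(approx[0::2], approx[1::2]))
        detail = [(a - b) / math.sqrt(2) for a, b in pairs]
        approx = [(a + b) / math.sqrt(2) for a, b in pairs]
        coeffs = detail + coeffs          # coarser details go in front
    return approx + coeffs


def select_by_variance(spectra, keep):
    """Keep the `keep` wavelet coefficients with the largest variance.

    Returns (indices of kept coefficients, percent of total variance explained).
    """
    transformed = [haar_dwt(s) for s in spectra]
    n = len(transformed)
    variances = []
    for j in range(len(transformed[0])):
        column = [row[j] for row in transformed]
        mean = sum(column) / n
        variances.append(sum((v - mean) ** 2 for v in column) / (n - 1))
    order = sorted(range(len(variances)), key=lambda j: variances[j], reverse=True)
    kept = sorted(order[:keep])
    explained = 100.0 * sum(variances[j] for j in kept) / sum(variances)
    return kept, explained


# Toy spectra: three flat signals that differ only in their level.
spectra = [[1.0] * 8, [2.0] * 8, [3.0] * 8]
kept, explained = select_by_variance(spectra, keep=1)
```

For these very smooth toy signals a single coefficient carries all of the variance, which mirrors why a few dozen coefficients can represent smooth NIR spectra with very little loss.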
SIMCA-P creates a new project with the OSC and wavelet
compressed data. You can change the default name of the project,
and select a different destination directory.
The test set (excluded observations) is automatically signal
corrected and wavelet compressed, in the same way as the
training set, and made into a prediction set. You can change the
default name of the prediction set and click on Finish.
You are switched to the new project.
Model with the Signal corrected and compressed data

Summary of the preprocessed project
Click on Dataset | Filter Summary to display a summary of the
preprocessing done on the project.
Change the default Scaling

Click on Workset | Edit model 1, select the Scale tab and mark all
the X variables 1 to 50 and change the scaling to Ctr, then exit the
workset window.
Fit the PLS Model
Click on Analysis | Autofit
The first component explains very little of the variation; the
second component is highly significant. Together the two components
explain 94% of Y, cross-validated to 93%. This is an excellent
model.
Scores t2 vs. u2
Display the t2 vs. u2 plot (t1 vs. u1 explained only 11%). This is
now a good relationship.
Loading plot w*2

[Figure: MALYX_OSCW.M1 (PLS), w*[2] line plot against variable number (Num); R2X[2] = 0.0786525]
This plot is reconstructed, by default, from the wavelet domain to
the original domain. It shows again that the information in the
spectra is located around wavelengths 400, 700-800 and 900, as
in the model with the original data.
The Observed vs. Predicted plot is greatly improved from the
previous model based on the unfiltered data.
Validating the Model 2

Click on Predictions | Specify Prediction set | Data set and select
your prediction set. Click on Predictions | Y Predicted | Scatter.
The predictions for the test set have greatly improved with the
OSC treated data. The RMSEP is now 87.
Conclusion OSC-Wavelets
This example illustrates how Orthogonal Signal Correction (OSC)
sometimes greatly improves the calibration model when the signal
contains large systematic variation not related to Y, such as
baseline shifts etc. Wavelet compression efficiently compresses
the signal from 1201 variables to 50 coefficients with very little
loss of information.
OPLS (Orthogonal PLS)

Return to the original project (Malyx), click on Analysis / Change
Model Type, and select OPLS.
Autofit gives 8 components compared with 7 for PLS:

[Figure: MALYX.M2 (OPLS), model overview; R2Y(cum) and Q2(cum) for one predictive component (Comp[1]P) and seven orthogonal components (Comp[2]O to Comp[8]O)]
Return to the original PLS model and add one component
(Analysis / Next component, or the corresponding fast button). Note
that the PLS and OPLS models now have the same R2Y and R2X,
but the OPLS model shows a higher Q2(cum).
The model window now shows the first line as the single Y-related
component, and the following lines are the Y-orthogonal
ones. The bottom line shows a summary of the model after all
components.
Scores u1 vs. t1
The t/u plot now looks much better than the PLS one, because
OPLS has rotated the solution to put all Y-related variation into the
first component.
Loadings, w1
The first component's PLS weights now look like the spectrum of
the ingredient related to y. This is one of the greatest advantages
of OPLS: it makes the loading interpretable!
Predictions
The prediction set remains the same unless you have changed
something in between. In such case, restore the prediction set to
what it was, and continue.
Under Predictions / Ypred / Scatter plot, the following is obtained:
The RMSEP (prediction SD) is now 111.3, precisely the same as for
the original PLS model. This shows that OPLS usually gives the same
predictions as PLS.
Conclusions
OSC, Wavelets, and OPLS are tools that have some additional
features beyond ordinary PLS, making these tools useful. OPLS
makes the PLS model easier to interpret: only one Y-related
component, and an interpretable loading plot. Wavelets compress the
spectra with little loss of information and sometimes, especially
in combination with OSC (OSC-Wavelets), even improve the
predictions somewhat.
Batch Modelling with SIMCA-P+
Introduction
The following example is taken from the article:
J.MacGregor and P.Nomikos, Multivariate SPC Charts for
Monitoring Batch Processes, Technometrics Vol. 37 No. 1 (1995)
41-57
The duration of a batch was 2 hours. During this period, 10
variables were measured every 1.2 minutes, for a total of 100
measurements. A quality variable was measured at the completion
of every batch.
Data were collected on 55 batches.
Batches 40 to 42 and 50 to 55 had their quality variable outside the
specification limits. The quality variable of batches 38, 45, 46 and
49 was on the boundary.
Data
Variables
The following 10 variables were measured at equally spaced
intervals during the evolution of a batch.
x1 to x3: Temperature inside the reactor
x6 and x7: Temperature inside the heating-cooling medium
x4, x8 and x9: Pressure variables
x5 and x10: Flow rates of material added to the reactor.
Objectives
1. Develop a model of the evolution of good batches (the
observation level model), and use it to monitor new
batches as they are evolving, in order to detect problems
as early as possible.
2. Make a model of the whole batch based on the scores of
the observation level model, and use this model to
classify the new batches as good or bad ones.
Analysis Outline
We will use 18 good batches (1800 observations) to model the
evolution of good batches. This is done by fitting a PLS model
relating Y, the relative batch time, to the 10 measured variables.
This observation level model is used to monitor the evolution of
the new batches, batches 30 to 33 (good batches) and 49 to 55 (bad
batches).
We will make a PCA model of the whole batch, with the unfolded
scores of the observation level as X-variables.
The steps in SIMCA-P are:

Create the observation level project, import the primary data set with the 18 good batches.
Fit the observation level model, a PLS with Y, the relative batch time, and X, the 10 measured variables (Analysis menu).
Display the control charts of the training set (Analysis | Batch | Control Charts menu).
Import the secondary data set with the new batches.
Monitor the evolution of the new batches (Prediction | Batch | Control Charts menu) and use contribution plots to interpret the problems seen.
Create the whole Batch project and fit a PCA model to the data.
Classify the new batches as good or bad using the distance to the model (DModX) and use contribution plots to interpret the results.
Create the observation level project

Start a new project. The data set name now is NOM18a.xls
Start SIMCA-P and create a new project from FILE | NEW.

Select the radio button SIMCA-P Batch project and click on Next.
The second column, labelled observation names, contains the batch
identifiers.
Both the Batch identifiers and the phase identifiers (when present)
can be located in any variable (column) in the spreadsheet.
Mark this second column and from the combo box (top of column)
select batch identifiers.
In this example you do not need to define phase identifiers, as the
batch process has only one phase.
The following window opens:
Click OK and Next.
The Batch page displays the list of batches in the dataset with the
number of observations in each batch.
The Conditional delete allows you to delete batches with fewer
observations than a selected number.
In this example we do not use the Conditional Delete.

Click on Next to display the project specification page and then
click on Finish.
The following message is displayed:
Click on OK.
Analysis
The workset M1 has been prepared with all the 10 measured
variables specified as Xs and the auto generated variable $Time
(relative batch time normalized) specified as Y, and all variables
centered and scaled to unit variance (UV). You are ready to fit the
PLS Batch model.
Click on Autofit.
SIMCA-P takes only 2 components as they explain 85% of X and
the third component explains less than 7%.
The Model window summarises the fit of the model per
component. We have an excellent model with 2 components,
explaining 87% of X and 98% of Y.
Scores Line plot of t1

Click on Scores|Line Plot| t1 to display the first summary variable
t1, summarizing all the 10 variables.
All the 18 batches are within the 2 standard deviation limit.
Loadings p1
Click on Analysis | Loadings | Column | p1.
With batches we are interested in summarizing the X variables, and
the loadings p1 are the weights that indicate the importance of the
original X's for t1.
We can see here that all the variables participate in forming t1 with
the first 3 variables having positive weights while the others have
negative weights.
Batch Control charts (Training set)

Analysis |Batch |Control Charts | Scores
The Batch Control charts show how t1 and t2 vary with time, for
good batches. A new good batch should evolve in the same way
and its trace should be inside the control limits.
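The construction of such a chart can be sketched as an average trajectory with limits of a few standard deviations at every time point, computed over the good batches. The function names, the default of 3 standard deviations and the toy traces are assumptions for illustration; SIMCA-P's own limit calculation may differ in detail.

```python
import statistics


def control_limits(trajectories, n_sigma=3.0):
    """Average trajectory and +/- n_sigma limits at each time point.

    trajectories: one list of score values per good batch, all of equal length
    """
    limits = []
    for values in zip(*trajectories):          # values of all batches at one time point
        mean = statistics.fmean(values)
        sd = statistics.stdev(values)
        limits.append((mean - n_sigma * sd, mean, mean + n_sigma * sd))
    return limits


def out_of_control(trajectory, limits):
    """Time indices at which a batch trajectory leaves the control limits."""
    return [i for i, (value, (low, _, high)) in enumerate(zip(trajectory, limits))
            if value < low or value > high]


good = [[0.0, 1.0, 2.0], [0.2, 1.2, 2.2], [-0.2, 0.8, 1.8]]   # toy t1 traces
limits = control_limits(good)
new_batch = [0.1, 3.0, 2.0]
flagged = out_of_control(new_batch, limits)
```

Here the new batch is flagged at the second time point only, which is the kind of early, time-localized signal the batch control chart is meant to give.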
Use the side arrows to move the stack of displayed batches
forward or backward by one batch. You can also use the property
bar.
Properties page
Use the up arrow to display the control chart of t2.
To display the Control chart in Normalized units, from the Limits
and Averages tab select Remove the average and normalize the
values, and click on Apply.
The plot is displayed in normalized units.
Batch Control Charts DModX, Variables, Hotelling T2 and Observed vs. predicted
The plots of the distance to the model (DmodX), Hotelling T2, and
Observed vs. Predicted time, with their control limits, are also
important monitoring charts for new batches.
Display univariate Batch Control charts when needed.
Monitoring new batches

Import the secondary data set with the new batches
Use the menu File | Import Secondary dataset, and import the file
Alpred.xls as a secondary data set.
Mark the 2nd column as the Batch IDs.
Creating a prediction set with the new batches
Click on Predictions | Specify Predictionset | Dataset | Alpred to
select the Alpred prediction set.
Control Charts for new batches

Predictions | Batch Control Charts | Scores
Click on Predictions | Batch | Control charts | Scores to display the
new batches in the control charts with the control limits derived
from the training set. Use the Properties page to include batches 50
to 55.
Use the Component tab to display the Control chart of t2.
In both of these control charts, batches 50 to 55 are out of the
control limits in the first time period (0 to 15). Batches 50 to 55
are also out of the control limits in t1 for the last time period
(90 to 100) of the polymerisation process.
Contribution plot
Using the Contribution tool, double click in the t1 control chart on
one of the outlying batches, 50 for example, at time point 4.
The Contribution plot clearly displays variable V-4 (pressure) as
being lower than the average trace.
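The logic of a contribution plot can be sketched as the scaled deviation of each variable from its average, weighted by the size of that variable's loading. This is a simplified form chosen for illustration; the function name, the weighting by the absolute loading and all numbers are assumptions rather than SIMCA-P's exact definition.

```python
def score_contributions(x_new, means, sds, weights):
    """Per-variable contribution: scaled deviation from the average times |weight|."""
    return [abs(w) * (x - m) / s
            for x, m, s, w in zip(x_new, means, sds, weights)]


# Made-up numbers for three variables; the second one is far below average.
means = [50.0, 10.0, 5.0]
sds = [2.0, 0.5, 1.0]
weights = [0.6, -0.7, 0.4]
observation = [51.0, 8.0, 5.0]

contrib = score_contributions(observation, means, sds, weights)
largest = max(range(len(contrib)), key=lambda k: abs(contrib[k]))
```

The variable with the largest absolute contribution, here the second one with a negative sign, is the one the plot points to as being lower than in the average batch.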
Control Chart of batch 49 and Contribution plot
Batch 49 is slightly out of the control limits around time period
55-60.
The Contribution plot around time point 59 shows variable V-10
slightly lower than in the average good batch.
Prediction | Batch Control Charts | DModX
Batches 50 to 55 are clearly out of the control limit for the time
period 0-20.
Contribution plot
The Contribution plot for any of these batches in that time period
shows again variable V4 (pressure) as being lower than in good
batches.
The Control chart of variable 4 (pressure), opened by double
clicking on it, clearly shows the problem with the pressure for
these 5 batches.
Creating and Modelling the batch level project

Select the menu File | Batch |Create batch level project, mark
scores, and the check box Bring secondary dataset to the batch
level.
In the batch level project, each row has the data from one batch
and consists of the unfolded scores, from the observation level
model, which describe the evolution of each batch.
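Unfolding can be sketched as follows: the score trajectory of each batch (time points by components) is laid out as a single row, so that a batch becomes one observation. The sketch assumes all batches have the same number of time points; the function name and the column naming are hypothetical and do not reproduce SIMCA-P's variable names.

```python
def unfold_scores(batch_scores):
    """Turn per-batch score trajectories into one row per batch.

    batch_scores: {batch_id: [[t1, t2, ...] at time 1, [t1, t2, ...] at time 2, ...]}
    Returns (column names, {batch_id: row}). Columns are grouped by component,
    then ordered by time point.
    """
    first = next(iter(batch_scores.values()))
    n_times, n_comp = len(first), len(first[0])
    names = [f"t{a + 1}_{j + 1}" for a in range(n_comp) for j in range(n_times)]
    rows = {
        batch: [traj[j][a] for a in range(n_comp) for j in range(n_times)]
        for batch, traj in batch_scores.items()
    }
    return names, rows


# Two toy batches with 3 time points and 2 components each.
scores = {
    "B1": [[0.1, 1.0], [0.2, 1.1], [0.3, 1.2]],
    "B2": [[0.0, 0.9], [0.2, 1.0], [0.4, 1.3]],
}
names, rows = unfold_scores(scores)
```

With 2 components and 100 time points per batch, as in this example, each batch would become a row of 200 score variables.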
This example has no initial conditions.
Analysis: Autofit
Click on Analysis | Autofit to fit a PC model. Simca extracts 4
components.
Analysis: Scores
Click on Analysis | Scores | t1 vs t2
The 18 good batches span the space with no outliers.
Analysis | Batch Control Charts | Batch Variable Importance
This plot, by combining the importance of the scores in the batch
level model with the weights w* derived from the observation
level model, displays the overall importance of the measured
variables in the whole batch model. Here we see that all the 10
variables are important (this is to be expected as the 10 measured
variables are highly correlated).
Predicting the quality of the new batches

In the menu Predictions | Specify Predictionset | Dataset select the
data set alpred as a prediction set.
It contains the data for batches 1, 30-33 and 49 to 55, one
observation per batch, and the predicted scores of the observation
level as X's.
Predictions: T Predicted
We clearly see that batches 50 to 55 (with the exception of 52) are
outside the Hotelling T2 ellipse and are outliers in the second
dimension.
Predictions: Contribution Scores for batch 51

Using the Contribution tool, double click on batch 51.
Double click on t2-M1:4; the score variable is resolved
with respect to the original variables and displays variable 4
(pressure) as the problem variable.
Predictions: Distance to the Model (DmodX)
Batches 50 to 55 have their distance to the model far above the
control limit, and batch 49 is also above the control limit. Clearly
these batches are different from the good ones.
Prediction: Contribution | Distance to the model

Using the Contribution tool, double click on batch 50.
Double click on the score t2-M1:3; the score variable is
resolved with respect to the original variables and displays
variable 4 (pressure) as the problem variable.
Conclusion
Modelling the evolution of a representative set of good batches
allowed us to construct control charts to monitor new batches
during their evolution. We detected problems in the evolution of
the bad batches and understood why these batches were outside the
control limits.
The model of the whole batch has allowed us to classify the new
batches as good or bad and understand why these batches had an
inferior quality.
Modelling of a Batch Digester
Introduction
The following example is derived from a batch digester.
Batch digesters are used in the pulp and paper industry to produce
pulp from wood chips.
The batch process has 5 phases: chip, acid, cook, blowback and
blow.
In the chip phase, the wood chips are fed into the digester and
steamed.
In the acid phase, the chips are impregnated with an acid.
They are then cooked at high temperature and pressure during the
cook phase. This is the most important phase, as this is where the
de-lignifications happen.
In the blowback phase, the pressure is released and thereby
brought back to atmospheric pressure. The temperatures also drop.
Finally, in the blow phase, the pulp is blown out of the digester.
The duration of a batch varies between 8 and 10 hours, and on the
average, is around 9.4 hours in the present data set.
27 variables (including the sampling time) were measured every 2
minutes during the batch evolution. Different variables are
meaningful in the different phases.
Data were collected on 52 batches. Of these, thirty good batches
are used to build the training set model.
Data
Variables
The following variables are meaningful in the following phases:
Chip and Acid phase:
State of the acid (2 variables)
State of the vent (2 variables)
State of Steam1 (2 variables)
State of Steam2 (2 variables)
Temperature4
Pressure2
Cook phase:
Pressure1
Steam
Temperature1
Temperature2
Temperature3
Temperature4
Temperature5
Pressure2
Temperature6
Pump
Blowback phase:
Pressure1
Temperature2
Temperature3
Temperature4
Temperature5
Relief valve
Blow1
Blow2
Pressure3
Pressure4
State of Dilution (2 variables)
Dilution flow
Objectives
1. To develop a model of the evolution of good batches (the
observation level model), and use the model to monitor
other batches as they are evolving, in order to detect
problems as early as possible.
2. Make a model of the whole batch based on the scores of
the observation level model, and use this model to
classify other batches as good or bad.
Analysis Outline
We will use 30 good batches to develop the model of the evolution
of good batches.
In the analysis, we will combine the chip and acid phase (they are
not meaningful alone) and delete the blow phase which has no
effect on the quality of the pulp.
We will fit 3 different PLS models relating Y, the relative batch
time, to the measured variables in the 3 relevant phases (chip+acid,
cook and blowback).
These observation level models are used to monitor the evolution
of the other batches, in this example those left out of the training
set.
We will make a PCA model of the good batches at the batch level,
with the unfolded scores of the observation level as X-variables.
The steps in SIMCA-P are:

Create the observation level project, import the primary data set with the 52 batches, merge phases chip and acid and delete the blow phase.
In menu Workset, select 30 specified good batches and select the variables relevant in each phase.
Fit the observation level models, one for each phase, by PLS with Y = relative batch time, and X = the relevant variables in each phase (Analysis menu).
Interpret the scores of the cook phase, and display the control charts of the training set (Analysis | Batch | Control Charts menu).
Select the complement of the workset (training set) and save it as a secondary data set (menu Prediction / Prediction Set).
Monitor the evolution of the batches left out of the training set (Prediction | Batch | Control Charts menu) and use contribution plots to interpret the problems with some of the batches.
Create the whole Batch project and fit a PCA model to the data (menu File / Create Batch Level Project).
Classify the prediction set batches as good or bad using the distance to the model (DModX), and use contribution plots to interpret the results (menu Prediction).
Create the observation level project

Start a new project (Menu File/New). The present data set is
DIGESTER.DIF
Start SIMCA-P and create a new project from FILE | NEW.

Select the radio button SIMCA-P Batch project and click on Next.
The second column, labelled observation names, contains the batch
and phase identifiers.
Both the Batch identifiers and the phase identifiers (when present)
can be located in the same variable (column) or in separate
variables in the spreadsheet.
Mark this second column and from the combo box (top of
column) select Batch/Phase identifiers.
The following window opens:
The batch identifiers are sequential numbers from 01 to 52 and the
phases are chip, acid, cook, blbk, and blow. Click OK.
The batch and phase ID are now in 2 separate variables.
Mark the last column with the sampling time variable and from the
drop down menu, select Y Variable (Time or Maturity) and click
on Next.
The Phase page displays the list of phases in the dataset with the
number of observations and batches in each. Under every phase is
the list of variables.
Using the CTRL key mark both the chip and the acid phases and
click on Merge. Mark the Blow phase and click on Delete.
We now have 3 phases left: chip+acid, cook and blbk. Click on
Next.
The Batch page opens listing all the batches with their numbers of
observations. Listed under every batch are the phases included in
the batch. In our example all the batches include all the phases.
The Conditional delete allows you to delete batches or phases or a
selected phase with fewer observations than a selected number.
In this example we do not use the Conditional Delete.

Click on Next to display the project specification page and then
click on Finish.
The following message is displayed and the new variable $Time is
created.
Specify the Workset

MB1 is an umbrella model which has been prepared with 3
unfitted models, one for every phase, and all the measured
variables specified as Xs and the relative sampling time as Y. All
variables are centered and scaled to unit variance (UV).
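
A minimal Python sketch of UV scaling for a single variable is
given here for orientation. It assumes the sample standard
deviation (n - 1 denominator); the temperature values are made up
and are not taken from the digester data.

```python
from statistics import mean, stdev


def uv_scale(column):
    """Center a variable to zero mean and scale it to unit variance."""
    m = mean(column)
    s = stdev(column)  # sample standard deviation (n - 1), an assumption
    if s == 0:
        raise ValueError("a constant variable cannot be scaled to unit variance")
    return [(value - m) / s for value in column]


temperatures = [140.0, 142.0, 138.0, 141.0, 139.0]  # made-up values
scaled = uv_scale(temperatures)
```

After scaling, every variable has mean 0 and standard deviation 1,
so variables measured in different units carry equal weight.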
We need to edit MB1 to include only the relevant variables in each
phase, and select the 30 good batches.
Click on Workset | Edit MB1 and select the Variables Tab.

Select the first 6 variables, click on the Configure Phases
button, and assign them to the first phase.

Continue and assign the variables to the respective phases as
specified in the Variables section. The Variables page should be as
follows:
Note that the Y variable, sampling time, is automatically shifted
to start at 0 for every phase and normalized for better alignment.
Normalizing the sampling time achieves linear time warping.
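
The idea of shifting and normalizing the sampling time can be
sketched as follows. This is an illustration only: the sketch
rescales each batch so that the phase runs from 0 to 1, whereas the
exact normalization constant applied by SIMCA-P+ is not stated in
this tutorial. The time values are made up.

```python
def shift_and_normalize(times):
    """Shift a phase's sampling times to start at 0 and rescale to end at 1."""
    start = times[0]
    shifted = [t - start for t in times]
    span = shifted[-1]
    if span <= 0:
        raise ValueError("sampling times must increase within a phase")
    return [t / span for t in shifted]


short_batch = [2.0, 2.5, 3.0, 3.5]       # phase lasts 1.5 hours (made up)
long_batch = [2.1, 2.9, 3.7, 4.5, 5.3]   # phase lasts 3.2 hours (made up)
short_warped = shift_and_normalize(short_batch)
long_warped = shift_and_normalize(long_batch)
```

Both batches now cover the same interval, so a slow batch is
linearly stretched or compressed onto the same time axis as a fast
one.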
Click on the Batch page to select the 30 good batches: 1, 4, 6 to
13, 16, 18, 21, 23, 25, 29, 31, 32, 34, 36 to 38, 40, 42, 43, 46 to 49
and 51.
To do this, first press Select All and Exclude. This excludes all
batches. Then use the CTRL key, mark the 30 good batches and
click on Include.
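
As a quick cross-check of the selection, the list of good batches
can be expanded in Python and compared with the 52 batches in the
data set. The expand helper is hypothetical and only illustrates
the bookkeeping.

```python
def expand(spec):
    """Expand a specification such as '1, 4, 6-13' into a sorted list of ints."""
    ids = []
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            low, high = (int(x) for x in part.split("-"))
            ids.extend(range(low, high + 1))
        else:
            ids.append(int(part))
    return sorted(ids)


good_batches = expand(
    "1, 4, 6-13, 16, 18, 21, 23, 25, 29, 31, 32, 34, 36-38, 40, 42, 43, 46-49, 51"
)
all_batches = list(range(1, 53))  # batch identifiers 01 to 52
good_set = set(good_batches)
remaining = [b for b in all_batches if b not in good_set]
```

The list contains 30 batches, which leaves 22 batches outside the
workset.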
Analysis
Fitting All the Class models
Click on Analysis | Autofit All Class Models. The Specify Autofit
window opens; click on OK.
The 3 class models are fitted and they all explain more than 80%
of X.
We will examine the cook phase, as it is the most important.
Scores Line plot of t1, t2 and t3

Double click on the cook model to examine its components.
The first three components are the most important, together
explaining 68% of the variation of X; t1 explains 47%, t2 13% and t3
7%.
Click on Scores | Line Plot | t1 to display the first summary
variable t1, summarizing all the variables of the Cook phase.
The 30 batches are all within the 3 sigma limits of t1.
Select t2 from the component combo box in the properties bar.
The 30 batches are within the 3 sigma limits of t2.
Select t3.
The score t3 displays more variability, as all of the batches have
some time points above the 3 sigma limits.
Loadings p1, p2 and p3

With batches we are interested in summarizing the X variables,
and the loadings p1, p2 and p3 are the weights that combine the
original X variables to form t1, t2 and t3.
To interpret the first three scores t1, t2 and t3 (new variables
summarizing all the X variables) we look at the loadings p1, p2
and p3.
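
The relation between scores and loadings, t = Xp, can be
illustrated with a small Python sketch that extracts one component
by a NIPALS-style iteration. The data matrix is made up (three
centered variables, five observations), and this section does not
state which algorithm SIMCA-P uses internally, so the code should be
read as an illustration of the concept only.

```python
import math


def nipals_component(X, max_iter=500, tol=1e-12):
    """Return (scores t, loadings p) for the first component of centered X."""
    n, k = len(X), len(X[0])
    t = [row[0] for row in X]  # initial guess: first column of X
    for _ in range(max_iter):
        tt = sum(v * v for v in t)
        # p = X't / t't, then normalize p to unit length
        p = [sum(X[i][j] * t[i] for i in range(n)) / tt for j in range(k)]
        norm = math.sqrt(sum(v * v for v in p))
        p = [v / norm for v in p]
        # t = Xp: each score is a weighted sum of that row's variables
        t_new = [sum(X[i][j] * p[j] for j in range(k)) for i in range(n)]
        change = sum((a - b) ** 2 for a, b in zip(t, t_new))
        t = t_new
        if change <= tol * sum(v * v for v in t):
            break
    return t, p


# Made-up centered data: the second variable is twice the first.
X = [[-2.0, -4.0, 0.1], [-1.0, -2.0, -0.1], [0.0, 0.0, 0.0],
     [1.0, 2.0, -0.1], [2.0, 4.0, 0.1]]
t1, p1 = nipals_component(X)
r2x = sum(v * v for v in t1) / sum(v * v for row in X for v in row)
```

Here p1 weights the first two variables in the ratio 1 to 2 and
gives almost no weight to the third, so t1 summarizes the first two
variables; r2x is the fraction of the variation of X explained by
this component.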
Click on Analysis | Loadings | Column plot | p1, and then p2 and
p3.
We can see that t1 consists mainly of the first 5 temperatures and
pressure 1.
The second score t2 is primarily pressure1, the steam and
temperature1.
The score t3 is again dominated by the pressures (1 and 2) with
steam, temp1 and temp6.
Batch Control charts (Training set)

Analysis | Batch | Control Charts | Scores
All the batches in the training set are aligned to the same length
with the same time points. Hence we can now, at each time point,
compute the average t1 with its standard deviation.
The Batch Control chart of t1 shows how this summary (the
temperature trace) varies during the evolution of the cook phase.
The green line is the average t1 computed from all good batches.
The red limits are the 3 sigma limits computed from the variation
of t1 around its average of all good batches.
This green line represents the fingerprint of the ideal good batch.
All new good batches should evolve in the same way and should
be inside the red control limits.
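
The construction of the average trace and its 3 sigma limits can be
sketched in Python as follows. The score traces are made up, and the
sample standard deviation (n - 1) is an assumption; the sketch only
shows the per-time-point calculation described above.

```python
from statistics import mean, stdev


def control_limits(traces, n_sigma=3.0):
    """Average and +/- n_sigma limits at each time point of aligned traces."""
    length = len(traces[0])
    if any(len(trace) != length for trace in traces):
        raise ValueError("all batches must be aligned to the same time points")
    average, lower, upper = [], [], []
    for values in zip(*traces):  # one tuple of batch values per time point
        m, s = mean(values), stdev(values)
        average.append(m)
        lower.append(m - n_sigma * s)
        upper.append(m + n_sigma * s)
    return average, lower, upper


# Made-up t1 traces of three good batches at three aligned time points.
good_traces = [[0.0, 1.0, 2.0], [0.2, 1.2, 2.2], [-0.2, 0.8, 1.8]]
average, lower, upper = control_limits(good_traces)
```

The list average corresponds to the green line and lower and upper
to the red limits of the chart.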
Individual batches can be included in this control chart; the first
training batch is included by default. More can be included in the
stack of displayed batches by using the Properties menu (after a
right click).
Use the side arrows to move the stack of displayed batches
forward or backward by one batch. You can also use the properties
bar to select the batches to display.
Properties page
Right click on the plot and from the pop-up menu open the
Properties page.
Mark all the batches you want to display and move them into the
selected window.
In this case the traces of all the good batches are within the red
control limits.
To display the Control chart in normalized units, from the Limits
and Averages tab (under Properties), select Remove the average
and Normalize the values, and click on Apply.
The plot is now displayed in normalized units.
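
One plausible reading of normalized units is that, at every time
point, the average of the good batches is subtracted and the result
is divided by their standard deviation, so that the control limits
become -3 and +3. The Python sketch below follows this reading; it
is an assumption, and the traces are made up.

```python
from statistics import mean, stdev


def normalize_trace(trace, good_traces):
    """Express a trace in standard deviations from the good-batch average."""
    normalized = []
    for value, column in zip(trace, zip(*good_traces)):
        normalized.append((value - mean(column)) / stdev(column))
    return normalized


good_traces = [[0.0, 1.0, 2.0], [0.2, 1.2, 2.2], [-0.2, 0.8, 1.8]]  # made up
new_trace = [0.1, 2.0, 2.0]                                         # made up
z = normalize_trace(new_trace, good_traces)
outside = [i for i, value in enumerate(z) if abs(value) > 3.0]
```

In this made-up case only the second time point lies outside the
limits, at 5 standard deviations above the average.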
In the component tab, select the 2nd component from the combo
box to display the Control Chart of t2.
Note that this plot is not in Normalized units.
Batch Control Charts DModX, Hotelling T2 and Observed vs. predicted
The plots of the distance to the model (DModX), Hotelling T2, and
Observed vs. Predicted time, with their control limits, are also
important monitoring charts for new batches.
Display univariate Batch Control charts when needed by selecting
the Variable Plot.
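
For orientation, the two multivariate statistics can be sketched in
Python in their textbook form. Hotelling T2 sums the squared scores
divided by their variances, and the residual distance is the root
mean square of the part of an observation that the model does not
reproduce. SIMCA's DModX applies further normalization and
correction factors that are not reproduced here, and the control
limits are not computed; all numbers are made up.

```python
import math


def hotelling_t2(scores, score_variances):
    """Sum over components of t_a^2 / s_a^2 for one observation."""
    return sum(t * t / v for t, v in zip(scores, score_variances))


def residual_distance(x, scores, loadings):
    """Root mean square residual of x after subtracting the model part."""
    k = len(x)
    x_hat = [sum(t * p[j] for t, p in zip(scores, loadings)) for j in range(k)]
    dof = k - len(scores)  # simple degrees-of-freedom choice, an assumption
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, x_hat)) / dof)


loadings = [[0.6, 0.8, 0.0]]  # one made-up component with unit length
x = [0.6, 0.8, 0.5]           # one made-up scaled observation
scores = [sum(a * b for a, b in zip(x, p)) for p in loadings]
t2 = hotelling_t2(scores, score_variances=[0.25])
dmod = residual_distance(x, scores, loadings)
```

T2 measures how far the observation is from the centre within the
model plane, while the residual distance measures how far it is
from the plane itself.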
Monitoring new batches

Creating the Prediction set: Complement of Workset
Use the menu Prediction | Specify prediction set | Specify.
Remove all batches from the prediction set (the right window),
select all batches from the left window (the Complement batches
of the Training set), move them to the right window and press OK.
From the Prediction menu, save them as a Secondary data set, give
it the name Pred1 and click OK.
Batch Control Chart of the Prediction set

For the cook phase, select Prediction | Batch Control Chart | Scores
and use the Properties page to include all the batches.
In the Control chart of t1, with the average and 3 sigma limits
computed from the good batches, we can see batch 28 far outside the
control limits.
OOC plot
Right click on the control chart and select the OOC plot.
This plot displays, for every batch, the percentage of the area
outside the limits relative to the total area inside the limits of
the control chart.
Hence batch 28 has 40% of its area outside the control limits.
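
A simple way to approximate this percentage is sketched below in
Python. It assumes equally spaced time points and sums the
excursions beyond the limits, which is only one possible reading of
the definition; the exact calculation used by SIMCA-P+ is not given
here, and the numbers are made up.

```python
def ooc_percent(trace, lower, upper):
    """Percent of area outside the limits relative to the area inside them."""
    outside = sum(
        max(0.0, value - up) + max(0.0, low - value)
        for value, low, up in zip(trace, lower, upper)
    )
    inside = sum(up - low for low, up in zip(lower, upper))
    return 100.0 * outside / inside


lower = [-1.0, -1.0, -1.0, -1.0]  # made-up lower control limit
upper = [1.0, 1.0, 1.0, 1.0]      # made-up upper control limit
trace = [0.0, 2.0, 3.0, 0.5]      # made-up batch trace
percent = ooc_percent(trace, lower, upper)
```

A batch that stays within the limits at all time points gets 0%.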
Group Contribution plot
Display batch 28, mark the time points outside the 3 sigma limits
and click on the action plot.
The Contribution plot shows pressure1 being 6 standard deviations
lower than the average batch for these time points, and
temperature2 to temperature5 also being lower than the average
at these time points.
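
The reading of such a plot can be illustrated with a small Python
sketch that computes, for each variable, the mean deviation from
the good-batch average over the marked time points, in standard
deviation units. This is an unweighted simplification; SIMCA's
contribution plots may apply additional weights. All values are
made up.

```python
from statistics import mean


def group_contribution(batch, average, sd, time_points):
    """Mean scaled deviation per variable over the marked time points."""
    return {
        name: mean([
            (batch[name][i] - average[name][i]) / sd[name][i]
            for i in time_points
        ])
        for name in batch
    }


batch = {"pressure1": [5.0, 2.0, 2.0], "temp2": [140.0, 138.0, 139.0]}
average = {"pressure1": [5.0, 5.0, 5.0], "temp2": [140.0, 140.0, 140.0]}
sd = {"pressure1": [0.5, 0.5, 0.5], "temp2": [2.0, 2.0, 2.0]}
contribution = group_contribution(batch, average, sd, time_points=[1, 2])
```

Negative values mean that the variable is lower than the average
batch over the marked time points.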
Variable control chart

Double click on Pressure1 to display the control chart of that
variable.
Prediction | Batch Control Charts | DModX
Batch 28 is clearly out of the control limit for the time period 1 to
2 hrs.
Contribution plot
The Contribution plot for batch 28 in that time period shows that
the problem is also associated with pressure 2 and temp6
(correlated with pressure 2).
Double click on pressure2 to display the control chart.
Creating and Modelling the batch level project

Select the menu File | Batch | Create batch level project, mark
scores and the check box Bring secondary dataset, and select the
prediction set Pred1. Click on Next, select the batch level name and
click OK.
In the batch level project, each row has the data from one batch
and consists of the unfolded score vectors from the observation
level models, which describe the evolution of each batch.
This example has no initial conditions.
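
The unfolding step can be pictured with the following Python
sketch, in which the score values of every time point of a batch
are laid out side by side in one row. The ordering of the columns
(time point first, then component) and the score values are
assumptions for illustration.

```python
def unfold_scores(scores_by_batch):
    """Turn {batch: [[t1, t2, ...] per time point]} into one flat row per batch."""
    rows = {}
    for batch_id, per_time_point in scores_by_batch.items():
        rows[batch_id] = [value for scores in per_time_point for value in scores]
    if len({len(row) for row in rows.values()}) > 1:
        raise ValueError("batches must be aligned to the same time points")
    return rows


# Made-up scores: 2 batches, 3 aligned time points, 2 components each.
observation_level = {
    "01": [[0.1, -0.2], [0.4, 0.0], [0.9, 0.3]],
    "04": [[0.0, -0.1], [0.5, 0.1], [1.0, 0.2]],
}
batch_level = unfold_scores(observation_level)
```

Each batch becomes a single observation with (time points x
components) variables, which is the table the batch level model is
fitted to.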
Analysis: Autofit
Click on Analysis | Autofit to fit a PC model. Simca extracts 5
components.
Analysis: Scores
Click on Analysis | Scores | t1 vs t2
Batch 6 is slightly out of the Hotelling T2 confidence interval.
Using the Contribution tool, clicking on batch 6 gives the
contribution plot.
The Contribution plot is coloured by phases, and shows that t1 in
the cook phase, at time 5.2 hours, is lower than the average by 6.5
standard deviations. With the Contribution tool, double click on
this bar to resolve this contribution into the original variables.
Temperature2, around time 5.2 hours, is lower than the average
of the good batches at the same time point.
Displaying the Control chart of temperature2, by double clicking
on it, we can see that temperature2 at time 5.36 hours is equal to
114.9 degrees and is slightly below the control limit. Temperature2
is equal to 141 degrees for the average of the good batches at this
time point.
Analysis | Batch Variable Importance

Considering that the different phases have different variables, one
must display the Batch Variable Importance separately for each
phase.
Select the Cook phase, as it is the most important.
This plot, by combining the importance of the scores in the batch
level model with the weights p derived from the observation level
model, displays the overall importance of the measured variables
for the whole batch model in the cook phase. Here we see that the
temperatures, pressure1 and the steam dominate.
Predicting the quality of the prediction set batches

In the menu Predictions | Specify, select both the training set and
the prediction set batches in Pred1.
Predictions: T Predicted
Select t1 vs. t3. Batches 28 and 26 are outside the Hotelling T2
ellipse.
Predictions: Contribution Scores for batch 28
With the Contribution tool, double click on batch 28.
What is causing batch 28 to be an outlier? The problem is clearly in
the cook phase. Double click on one of the scores with large
deviations, for example t1 at time 1.1 hours, to resolve the
contribution into original variables.
The resolved contribution plot shows pressure1 as being much
lower than average.
The Control chart of pressure1 confirms this and shows the
problem with batch 28.
Predictions: Distance to the Model (DModX)
Batches 28, 26, 33, 50 and 52 have the largest DModXPS.
Contribution Plot
Double click with the Contribution tool on batch 33 to display the
contribution plot.
The problem seems to be in t2 of the cook phase around time 0.4
hours (the beginning of that phase) and also in the chip+acid
phase.
Double clicking on a large score in the chip+acid phase, we see
that the problem was with the steam state.
The resolved contribution for the large t2 in the cook phase shows
both the pressure1 and the steam lower than average, probably due
to the problem with the steam state.
The Control charts for batch 33 of both pressure1 and steam
confirm this.
Conclusion
Modelling the evolution of a representative set of good batches
allowed us to construct control charts to monitor new batches
during their evolution. We detected problems in the evolution of
the bad batches and understood why these batches were outside the
control limits.
The model of the whole batch has allowed us to classify the new
batches as good or bad and understand why these batches had an
inferior quality.
