
Tutorial

SIMCA-P, SIMCA-P+
Version 11.0

By Umetrics AB

© 1992-2005 Umetrics AB
Information in this document is subject to change without notice
and does not represent a commitment on the part of Umetrics AB.
The software, which includes information contained in any
databases, described in this document is furnished under a license
agreement or non-disclosure agreement and may be used or copied
only in accordance with the terms of the agreement. It is against
the law to copy the software except as specifically allowed in the
license or nondisclosure agreement. No part of this manual may be
reproduced or transmitted in any form or by any means, electronic
or mechanical, including photocopying and recording, for any
purpose, without the express written permission of Umetrics AB.
SIMCA is a registered trademark of Umetrics; Windows is a
trademark of Microsoft Corporation.
Covers products:
SIMCA-P
SIMCA-P+
Manual edition date: May 16, 2005

UMETRICS AB
Box 7960
S-907 19 Umeå
Sweden
Tel. +46 (0)90 184800
Fax. +46 (0)90 184899
Email: info@umetrics.com
Home page: www.umetrics.com

Contents

How to get started with SIMCA
  Regular Project (non-Batch)
    General
    The Analysis cycle
    Import the primary data, create a new project
    View
    Pre-processing the data (Dataset menu)
    Prepare the data (Workset menu)
    Develop the model (Analysis menu)
    Fit the model
    Review the fit (Analysis menu)
    Predictions (Predictions menu)
    Plots/Lists
  Road map to SIMCA-P
  Batch Projects (SIMCA-P+ 10)
    General
    The Analysis cycle

Introduction
  General
    Plots and Lists

Foods
  Data
    Data table
  Objective
    Analysis Outline
  Define project
  Workset Wizard
  Analysis
    Scores and Loadings
    Third Component
    Summary

Mineral sorting at LKAB
  Introduction
    Data description
    Data table
  Objective
    Analysis outline
  Create the project
  Prepare the data
    Workset Wizard
  Analysis
    PC of Y
    Scores and Loadings
    PLS MODELING
    Refining the model
    Excluding observation 208 using the interactive tool box
    Removing some observations for a test set
    Observation Risk
    Predictions
    Summary

NIR
  Introduction
  Data
    Variables
    Observations
  Objective
  Analysis Outline
    The steps to follow in SIMCA-P are:
  Create the project
  Prepare the data
    Default Workset
    Transform the variables
  Analysis
    PLS model of all the samples
    Excluding sample 32
    Separate PLS models for the Sphagnum and Carex
    Sphagnum Model, class 2
    Model class 1 (Carex peat)
  Predictions
    Making a prediction Set
    Cooman's Plot
    Summary
    Plots and Lists

Hierarchical Models
  Introduction
  Data
  Objective
  Analysis Outline
    The steps to follow in SIMCA-P are:
  Create the project
  Summarizing the feed
    Workset
    Analysis
  Summarizing the reactor
    Workset
    Analysis
    Scores t1 vs. t2
    Loadings p1 and p3 (the 2 most important components)
  Summarizing the purification
    Workset
  Summarizing the less important Y's
    Workset
  Preparing for the hierarchical model
    Workset of the top level model
    Analysis
    The score plot (t1 vs. t2) of the top level model
    The w*c plot
    Coefficients
    Variable Importance (VIP)
    Observed vs. Predicted
  Predictions
    DModXPS
    Scores tPS1 vs. tPS2 colored by test set and training set
    Cusum Chart
  Conclusion

Spectral Filtering and Compression, including OPLS
  Introduction
  Data
  Objective
  Analysis Outline
    The steps to follow in SIMCA-P are:
  Create the project
  Plotting the Spectra
  Prepare the Data
    Workset
  Analysis
    PLS model
  Validating the Model 1
  Orthogonal Signal Correction and Wavelets Compression
  Model with the Signal corrected and compressed data
    Summary of the preprocessed project
    Change the default Scaling
  Validating the Model 2
  Conclusion OSC-Wavelets
  OPLS (Orthogonal PLS)
  Conclusions

Batch Modelling with SIMCA-P+
  Introduction
  Data
  Objectives
  Analysis Outline
    The steps in SIMCA-P are:
  Create the observation level project
  Analysis
    Batch Control charts (Training set)
  Monitoring new batches
    Import the secondary data set with the new batches
    Control Charts for new batches
    Prediction | Batch Control Charts | DModX
  Creating and Modelling the batch level project
    Analysis: Autofit
    Analysis: Scores
    Analysis | Batch Control Charts | Batch Variable Importance
  Predicting the quality of the new batches
    Predictions: T Predicted
    Predictions: Contribution Scores for batch 51
  Conclusion

Modelling of a Batch Digester
  Introduction
  Data
  Objectives
  Analysis Outline
    The steps in SIMCA-P are:
  Create the observation level project
  Specify the Workset
  Analysis
    Fitting All the Class models
    Scores Line plot of t1, t2 and t3
    Loadings p1, p2 and p3
    Batch Control charts (Training set)
  Monitoring new batches
    Creating the Prediction set: Complement of Workset
    Batch Control Chart of the Prediction set
    OOC plot
    Group Contribution plot
    Variable control chart
    Prediction | Batch Control Charts | DModX
  Creating and Modelling the batch level project
    Analysis: Autofit
    Analysis: Scores
    Analysis | Batch Variable Importance
  Predicting the quality of the prediction set batches
    Predictions: T Predicted
    Predictions: Contribution Scores for batch 28
    Predictions: Distance to the Model (DModX)
    Contribution Plot
  Conclusion

How to get started with SIMCA

Regular Project (non-Batch)


General
SIMCA-P is organized into projects. A project is a folder
containing the results of the analysis (unlimited number of models)
of a primary dataset.
You start a new project by importing its data (primary dataset).
SIMCA-P implicitly creates an unfitted model when you specify a
Workset or, for an existing Workset, when you select an Active
Model Type.
At the very beginning of a project, the default Workset consists of
all data with all variables centered and scaled to unit variance and
considered as X, and the model is a principal components model
(PC) of X.
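The default pre-treatment described above (every variable centered and scaled to unit variance) can be sketched in a few lines of plain Python. This is only an illustration: the function name and the example data are hypothetical, and the use of the sample standard deviation (n - 1 in the denominator) is an assumption, not something this tutorial specifies.

```python
from statistics import mean, stdev


def uv_scale(rows):
    """Center each column and scale it to unit variance.

    Assumes the sample standard deviation (n - 1) and that no column
    is constant (a constant column would cause a division by zero).
    """
    columns = list(zip(*rows))
    means = [mean(col) for col in columns]
    stds = [stdev(col) for col in columns]
    scaled = [
        [(value - m) / s for value, m, s in zip(row, means, stds)]
        for row in rows
    ]
    return scaled, means, stds


# Hypothetical dataset: 4 observations (rows) x 2 variables (columns).
X = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0]]
X_scaled, col_means, col_stds = uv_scale(X)
```

After scaling, every column has mean 0 and standard deviation 1, so all variables enter the principal components model with equal weight regardless of their original units.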
The project window displays, for every model, one line
summarizing the model results.
The active model, the one you are working with, is also listed in a
list box to the left of the gray area (status bar) just beneath the
command menu bar.
To open a model, double click on it in the project window. This
opens a model window with the details (one line per component)
of the model results.
Another way to activate a model (if several are available) is to
select its name from the list box (upper left).

The Analysis cycle

1. Pre-processing and selection of data (Dataset and Workset
   menus).
2. The Dataset menu allows you to trim / winsorize your data,
   generate new variables, and perform spectral filtering or
   wavelet compression of the data.
   A model is developed from a Workset. The default Workset, to
   start, is the whole dataset with all variables as X and scaled
   to unit variance. This is also obtained by Workset | New.
   The Workset menu allows you to modify the starting Workset.
3. Specifying and fitting the model (Analysis menu).
4. Reviewing the results and performing diagnostics (Analysis
   menu).
5. Using the model for predictions (Predictions menu).

Import the primary data, create a new project


File: New
Select to import data from file or databases.
SIMCA-P imports files with the following format types:
DIF: Data interchange format (many applications can export DIF
files).
TXT: Standard delimited text file (one observation per line).
TXT: Free format text, with or without header.
MAT: Matlab version 4.0 files (binary).
XLS: All versions of EXCEL files.
Lotus 1-2-3: *.wk1 files
JCAMP-DX: *.jcm, *.dx, *.jdx
ANDI: Chromatography AIA files
NSAS: files
GRAM: Galactic *.spc files
Others (refer to chapter 4), including old SIMCA-P file types.
Select the Source file
Source Directory: The directory that contains the data file.
Name: Locate the source file, e.g., ENVIRO.DIF
Double click on the name of the source file.
Destination Directory: The directory in which to store the project,
e.g., C:\SIMDATA\ENVIRO.
You may also change the project directory (destination), if you
wish. By default SIMCA-P uses the source directory as the
destination directory.
Indicate file contents
Specify Primary and as many Secondary identifiers as desired for
both variables and observations.
Secondary datasets
Later you may import additional data (secondary datasets) for use
in predictions. You do that in the menu File | Import Secondary
Dataset.


View
Customize your display and specify project-level and general
options.

Pre-processing the data (Dataset menu)


Plotting variables or observations from the dataset
Mark the variables or observations you want to plot, right click on
the marked objects and select the desired plots.
To plot all the X observations as a line plot, just right click on the
dataset and select Plot | Xobs.
Use the Dataset menus to view or modify a SIMCA-P dataset as
follows:
Quick Info
Interactive plots tied to the dataset displaying variables or
observations in the time or frequency domain.
Trimming / Winsorizing single, or all variables
Edit dataset
General Edit commands
Generate new variables
Generate new variables as functions of existing ones or from
model results
Spectral filter the dataset with:
Orthogonal Signal Correction (OSC)
Multiple Scatter Correction (MSC)
Standard Normal Variates (SNV)
1st and 2nd Derivatives
Wavelet transform and compression
PLS wavelet transform of time series
Decimation of time series
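Two of the operations listed above, winsorizing and the Standard Normal Variates (SNV) filter, are simple enough to sketch in plain Python. The function names, the explicit limits, and the example values below are hypothetical; how SIMCA-P chooses its limits and which standard deviation it uses are not stated here, so treat this as a minimal sketch under those assumptions.

```python
from statistics import mean, stdev


def winsorize(values, lower, upper):
    """Replace values outside [lower, upper] by the nearest limit."""
    return [min(max(v, lower), upper) for v in values]


def snv(spectrum):
    """Standard Normal Variate: center and scale ONE observation (row).

    Unlike unit-variance scaling, which works per variable (column),
    SNV uses the mean and standard deviation of the spectrum itself.
    The sample standard deviation (n - 1) is assumed here.
    """
    m = mean(spectrum)
    s = stdev(spectrum)
    return [(v - m) / s for v in spectrum]


# Hypothetical variable with one extreme value, and a short spectrum.
trimmed = winsorize([1.0, 5.0, 100.0], lower=2.0, upper=50.0)
corrected = snv([0.2, 0.4, 0.9, 0.5])
```

Winsorizing keeps every observation but pulls extreme values in to the limit, whereas trimming would remove them. SNV removes additive and multiplicative differences between spectra by standardizing each spectrum separately.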

Prepare the data (Workset menu)


The default Workset, at the project start, is the whole dataset with
variables defined as Xs and Ys as specified at import, and scaled
to unit variance. The associated model (unfitted) is listed in the
active area. You are ready to fit a PLS model (default), or PC of X
or Ys, with all the data of the primary dataset. If this is what you
want, you can go directly to the Analysis menu.
To fit a different model with maybe excluded variables, or
transformations, or different scaling, it is necessary to first modify
the Workset.
An unfitted model is generated by SIMCA-P when you specify the
Workset (select a starting Workset: New or New As Model).


Workset
New
Uses the whole original primary dataset with Xs and Ys as defined
at import

New As Model
Use the Workset of a selected model as starting point.

Modify the Workset as follows


Observations
Include / exclude observations or group them into classes for
classification.
Variables
Define X/Y variables, transformations, scaling, etc.
Transform
Transform variables.
Lag
Create lagged variables (SIMCA-P only).
Variables/Block
Select variables, and specify roles.
To select variables as X, Y or excluded, mark the variables as X, Y or
excluded and click on the Set button.
Expand
Expand the X matrix with cross terms, squares or cubes.
Scale
Select scaling base type (UV = centered, unit variance; Par = centered
and Pareto; etc.). A modifier can be selected (default = 1.0) that
changes the scaling of a variable relative to its base weight. Block
scaling can also be specified.
Trim / Winsorize variables
Trimming / Winsorizing the workset does not affect the dataset but
just that particular workset (refer to the workset chapter).

Options
Specify the model level options
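The Scale and Expand options above can be illustrated with a short plain-Python sketch. It assumes that the UV weight is 1/s, that the Pareto weight is 1/sqrt(s), where s is the sample standard deviation of the variable, and that the modifier simply multiplies the base weight. The function names are hypothetical and SIMCA-P's exact conventions may differ.

```python
from itertools import combinations
from math import sqrt
from statistics import mean, stdev


def scale_column(column, base="UV", modifier=1.0):
    """Center a variable and multiply it by modifier * base weight.

    Assumed base weights: UV -> 1/s, Par (Pareto) -> 1/sqrt(s),
    where s is the sample standard deviation of the column.
    """
    m = mean(column)
    s = stdev(column)
    if base == "UV":
        weight = 1.0 / s
    elif base == "Par":
        weight = 1.0 / sqrt(s)
    else:
        raise ValueError("unknown scaling base type: " + base)
    return [(v - m) * weight * modifier for v in column]


def expand_terms(row):
    """Append squares and pairwise cross terms to one observation."""
    squares = [v * v for v in row]
    cross = [a * b for a, b in combinations(row, 2)]
    return list(row) + squares + cross


# Hypothetical variable and observation.
column = [1.0, 2.0, 3.0, 4.0, 5.0]
uv = scale_column(column, base="UV")
par = scale_column(column, base="Par")
downweighted = scale_column(column, base="UV", modifier=0.5)
expanded = expand_terms([2.0, 3.0])
```

With Pareto scaling a variable keeps part of its original spread (its standard deviation becomes sqrt(s) rather than 1), which is why it sits between no scaling and UV scaling. A modifier below 1.0 reduces a variable's influence on the model, and a modifier above 1.0 increases it.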

Develop the model (Analysis menu)


Select model type
The default model is a PCX model if all your variables are defined as
X's, or a PLS model if you have defined both X's and Y's at import.
You can change the model type, and when the Workset specification
allows it, you can select among:

PCX
PC model of the X's.


PCY
PC model of the Y's.

PCAll
PC model of all included variables, X and Y.

PC Class
PC of a selected class when your observations are divided into
classes.

PLS
PLS analysis of X and Y

PLS Class
PLS of a selected class when your observations are divided into
classes.

PLSDA
PLS discriminant analysis when your observations are divided into
classes.

Fit the model


Autofit
Rule based fitting.

2 First Components
Calculate two components directly, often used to get a quick
overview of the data.

Next Component
Calculate one component at a time. Here it is possible to force
components to be calculated regardless of significance rules.

Remove Component
Remove the last component

Autofit Class Models


Autofit all class models, or fit a specified number of components
for each of them.

Specify Hierarchical Models


Specify a model to be Base or Top Hierarchical
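The idea behind Next Component and Remove Component, extracting one component at a time from what the previous components left unexplained, can be sketched with the NIPALS iteration for principal components. This section does not say which algorithm SIMCA-P uses internally, so NIPALS is an assumption here, and the function name, tolerances, and data are hypothetical.

```python
from math import sqrt


def next_component(X, iterations=500, tol=1e-12):
    """Extract one principal component from X with NIPALS.

    X must already be centered (and scaled) and must not be all zeros.
    Returns the scores t, the unit-length loadings p, and the residual
    matrix X - t p', from which the next component can be extracted.
    """
    n_rows, n_cols = len(X), len(X[0])
    # Start from the column with the largest sum of squares.
    start = max(range(n_cols), key=lambda j: sum(row[j] ** 2 for row in X))
    t = [row[start] for row in X]
    for _ in range(iterations):
        tt = sum(v * v for v in t)
        # Loadings: regress every column of X on the current scores.
        p = [sum(X[i][j] * t[i] for i in range(n_rows)) / tt
             for j in range(n_cols)]
        norm = sqrt(sum(v * v for v in p))
        p = [v / norm for v in p]
        # Scores: project every row of X onto the loadings.
        t_new = [sum(X[i][j] * p[j] for j in range(n_cols))
                 for i in range(n_rows)]
        change = sum((a - b) ** 2 for a, b in zip(t_new, t))
        t = t_new
        if change < tol * sum(v * v for v in t):
            break
    residual = [[X[i][j] - t[i] * p[j] for j in range(n_cols)]
                for i in range(n_rows)]
    return t, p, residual


# Hypothetical centered data: 5 observations x 2 variables.
X = [[-2.0, -1.0], [-1.0, -1.0], [0.0, 0.5], [1.0, 0.5], [2.0, 1.0]]
t1, p1, E1 = next_component(X)   # first component
t2, p2, E2 = next_component(E1)  # "next component", fitted to the residuals
```

Because each component is fitted to the residuals of the previous ones, removing the last component amounts to going back to the previous residual matrix (E1 instead of E2 in this sketch). With only two variables, two components reproduce X completely, so E2 is numerically zero.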

Review the fit (Analysis menu)


After a fit, the whole spectrum of plots and lists is available for
model interpretation.

Summary of fit
1. Model Overview
2. X/Y overview: Cumulative Fit of all variables (Y only in PLS).
3. X/Y/Comp: The Fit of a Variable by Component.
4. Component Contribution: The contribution of a model component
   to the Fit.
5. Scores: t1 vs. t2, t1 vs. u1, etc.
6. Loadings: p1 vs. p2, w*c1 vs. w*c2, etc.
7. Coefficients (PLS)
8. VIP (PLS): Variable influence on projection
9. DMod (X or Y): Distance to the model (X or Y)
10. Observed vs. predicted (PLS)
11. Residual plots: Normal probability plot (for selected Y's)
12. Observation risk
Note: By default, in the Analysis menu, all plots and lists are
displayed for the last component. To select a different component
for display in the plots, and/or a different variable, click on the
right mouse button and select from the available options.
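As a rough illustration of what a cumulative fit measures, the sketch below computes the explained fraction of the sum of squares, overall and per variable, from a data matrix and its residuals after fitting. The function names and the tiny example are hypothetical, and the exact definitions used in SIMCA-P's summary plots may include corrections not shown here.

```python
def sum_of_squares(matrix):
    """Sum of the squared elements of a matrix given as a list of rows."""
    return sum(v * v for row in matrix for v in row)


def r2x_cumulative(X, residuals):
    """Fraction of the sum of squares of X explained by the model so far."""
    return 1.0 - sum_of_squares(residuals) / sum_of_squares(X)


def r2_per_variable(X, residuals):
    """Explained fraction for each variable (column) separately."""
    result = []
    for x_col, e_col in zip(zip(*X), zip(*residuals)):
        ss_x = sum(v * v for v in x_col)
        ss_e = sum(v * v for v in e_col)
        result.append(1.0 - ss_e / ss_x)
    return result


# Hypothetical centered data and the residuals left after one component
# that happens to describe the first variable completely.
X = [[1.0, 0.0], [-1.0, 0.0], [0.0, 0.5], [0.0, -0.5]]
E = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.5], [0.0, -0.5]]
overall = r2x_cumulative(X, E)      # about 0.8 for this example
by_variable = r2_per_variable(X, E)
```

In this example the first variable is fully explained and the second not at all, which is the kind of contrast that a per-variable fit plot makes visible and that the overall figure hides.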

Select a New Model Type


You can, after fitting the model, select a new model type.
SIMCA-P then creates a new unfitted model with the selected model type.
For example, if you have defined your Workset variables as X's
and Y's, you can first fit a PCY (PC of the responses), then change
the model type to PLS and fit a PLS model (another model) to the
same data.

Predictions (Predictions menu)


Building the Prediction Set
Use the menu Predictions | Specify Prediction set to build your
prediction set from the Primary or any secondary datasets. You can
display the Prediction set as a spreadsheet or just plot or list
results.
When you do not specify a prediction set, the prediction set is by
default the primary dataset with all the data.
You can build the prediction set from observations belonging to
the primary dataset or any secondary dataset that you have
imported. You can also enter the data in the prediction set through
the keyboard when you build the prediction set in the spreadsheet.

Displaying the predictions


All the prediction results (scores, y-values, etc.), computed with
the active model, are displayed as plots or lists.
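
As a rough illustration of what a predicted score is, the sketch below projects one new observation onto the loadings of a fitted PC model. It assumes the model stores the training-set means and standard deviations and a set of orthonormal loading vectors; the function name `predict_scores` and all numbers are hypothetical and are not taken from SIMCA-P.

```python
def predict_scores(x_new, means, stds, loadings):
    """Project one new observation onto fitted PC loadings.

    means and stds are the centering and scaling values of the training set;
    loadings is a list of orthonormal loading vectors (p1, p2, ...).
    """
    # Scale the new observation exactly as the training data were scaled.
    scaled = [(x - m) / s for x, m, s in zip(x_new, means, stds)]
    # Each predicted score is the dot product of the scaled row with a loading.
    return [sum(v * p for v, p in zip(scaled, p_a)) for p_a in loadings]


# Hypothetical two-variable model; the numbers are for illustration only.
means = [10.0, 50.0]
stds = [2.0, 5.0]
loadings = [[0.6, 0.8], [-0.8, 0.6]]

t_pred = predict_scores([12.0, 55.0], means, stds, loadings)
print(t_pred)
```

The same scaling parameters must be reused for the prediction set; re-estimating them from the new data would place the new observations on a different scale than the model.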


Plots/Lists
Under this menu you can find general plot and list routines. Here it
is possible to plot and list any data, and results from the analysis.
There are scatter, line, column, 3d scatter, histogram, contour,
response surface, normal probability plots, wavelets plots, control
charts and batch control charts available.
Note: Click on the right mouse button to display available
properties for an active plot or list. You can generate lists from
plots and plots from lists.

Road map to SIMCA-P


1. Start a project
File New
Read Data File
Specify Label Cols & Rows

2. Look at the data


Data set
Quick Info
Variables or Obs.

3. Prepare a work copy


Workset
variables
observations

4. Fit the model


Analysis
Autofit
or fast button

5. Plot results
Analysis
Scores, Loadings
Distance to Model

6. Outliers in scores
Polish data
Prepare new workset
Graphically or via Workset

6. No outliers in scores
Continue
Interpret model (plots)
Relate to Objective

7. New data
Predictions
Select Pred.set (observations)
T_pred, Y_pred, DModX, etc.

Batch Projects (SIMCA-P+ 10)


General
A SIMCA-P Batch project consists of two or more linked
projects. (a) The Observation level project with several
observations per batch with the variables measured during the
evolution of the batch, and (b) the batch level project(s) consisting
of the completed batches, with one batch being one observation
(matrix row). The variables of the Batch level project are the
scores, or original variables of the observation level at every time
point folded out side-wise. Batches may be divided into phases.
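
The "folding out side-wise" described above can be sketched as follows. This is a minimal illustration that assumes every batch has been aligned to the same time points; the function name `unfold_batches`, the variable names `temp` and `ph`, and the values are hypothetical.

```python
def unfold_batches(observations, variables):
    """Turn observation-level rows into one row per batch.

    observations: list of (batch_id, time_index, {variable: value}) tuples.
    Returns (column_names, {batch_id: row}), with columns ordered by time
    point and then by variable.
    """
    times = sorted({t for _, t, _ in observations})
    columns = [f"{var}_t{t}" for t in times for var in variables]
    rows = {}
    for batch_id, t, values in observations:
        row = rows.setdefault(batch_id, [None] * len(columns))
        for var in variables:
            row[columns.index(f"{var}_t{t}")] = values[var]
    return columns, rows


# Two hypothetical batches, each observed at two aligned time points.
obs = [
    ("B1", 0, {"temp": 20.0, "ph": 7.0}),
    ("B1", 1, {"temp": 22.0, "ph": 6.8}),
    ("B2", 0, {"temp": 19.5, "ph": 7.1}),
    ("B2", 1, {"temp": 21.5, "ph": 6.9}),
]
columns, rows = unfold_batches(obs, ["temp", "ph"])
print(columns)
print(rows)
```

Each batch becomes one matrix row, which is the shape the batch level project works with. Real batches usually differ in length, so some alignment step is needed before unfolding; that step is not shown here.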


Observation level project


With Batch data, you start by importing the Observation level data
and create the Observation Level project.
In the data, you must have a Batch identifier, indicating the start
and end of the batch, and if phases are present, also a phase
identifier. You may also have a variable indicating the evolution of
the batch or phase and its end point. This variable can be Time or
Maturity. You can have different Maturity variables for different
phases.
Unfitted batch models are implicitly created by SIMCA-P. When
batches have phases, these are one PLS class model with Time or
Maturity as Y for each phase. By default all variables in a phase
are scaled to unit variance.
The project window displays, for every model, one line
summarizing the model results.
When batches have phases, the PLS Batch class models (one for
every phase) are grouped under an umbrella called MBxx, where xx is a
sequential number.
You can display the results of the analysis of the training set
batches in Control Charts, either as scores, DModX, predicted time
or maturity, or as individual variables.
Secondary datasets can be imported with new batches. These can
also be displayed in Control Charts in the same way.

Batch Level Project


The Batch level project is based on scores or original variables for
completed batches, obtained from the observation level project.
The Batch level project is a regular SIMCA-P project. Batch initial
conditions and quality variables, when present, are automatically
added to the batch level dataset. You can change the default model
type (PCA) to any desired model type allowed by the workset
specification.

The Analysis cycle


Observation level project
1. Pre-processing and selection of data (Dataset and Workset menu).
   The Dataset menu allows you to trim / Winsorize your data,
   generate new variables, and perform spectral filtering or
   wavelet compression of the data.
   A model is developed from the default Workset. The default
   Workset consists of PLS Batch class models, one for every phase.
2. Fitting the Observation level model (Analysis menu).
3. Reviewing the results and performing diagnostics (Analysis menu).
4. Batch Control Charts for training set batches (Analysis menu).
5. Importing a Secondary dataset with new batches and using the
   model to display the new batches in the Control Charts
   (Prediction | Batch Control Chart).

Batch Level Project

6. Creating the Batch level project (File | Create Batch Level
   project).
7. Fitting the Batch level project.
8. Interpretation using score plots, loading plots, DModX,
   contribution plots, etc.
9. Predicting and interpreting results for new whole batches.


Introduction

General
This tutorial is just a brief introduction to using SIMCA-P on
selected data sets. The user is advised to go through the different
phases of modeling (importing data, PC and PLS modeling) and to look
at the results in graphs and lists. For a more detailed description of
how to use SIMCA-P, the USERS GUIDE and the ON-LINE HELP system
(identical) are recommended.
There are seven examples in this tutorial.
The first example shows the strength of using projection methods
on food data.
The second example is from a real process at a mineral sorting
plant.
The third example is a multivariate calibration of the kind often
performed in analytical chemistry.
The fourth example illustrates hierarchical modeling.
The fifth demonstrates the use of spectral filtering.
Examples six and seven show how to handle batch data,
without and with phases.
As a tutorial, this provides just a brief introduction to the main
functionality and plots in SIMCA-P. We recommend that you
continue with your own data, and use the Manual for details. The
Help system contains the same information as the Manual, but
organized in a different way.

Plots and Lists


You can display the results of SIMCA-P in numerous graphs and
lists.
From the Analysis and the Prediction menus, results of the active
model are available as plots and lists. With the menu Plot/List, you
can plot or list the raw data and every computed value from every
model. You can even plot vectors from different models against each
other.
Auto and Cross Correlation plots as well as Power Spectrum are
available for all vectors.
In Dataset you can preprocess the data by trimming and winsorizing.
Quick info plots are available with all spreadsheets.
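
To make the difference between trimming and Winsorizing concrete, here is a small sketch. It assumes that Winsorizing replaces values beyond a lower and an upper limit by the limit itself, while trimming replaces them by a missing value (`None`). The limits are taken as simple rank-based percentiles. All of this is an illustrative assumption; the exact definitions used by SIMCA-P are given in the User's Guide.

```python
def percentile_bounds(values, lower_frac, upper_frac):
    """Return the values found at the given fractions of the sorted data."""
    ordered = sorted(values)
    n = len(ordered)
    lo = ordered[int(lower_frac * (n - 1))]
    hi = ordered[int(upper_frac * (n - 1))]
    return lo, hi


def winsorize(values, lower_frac=0.05, upper_frac=0.95):
    """Clamp values outside the limits to the nearest limit."""
    lo, hi = percentile_bounds(values, lower_frac, upper_frac)
    return [min(max(v, lo), hi) for v in values]


def trim(values, lower_frac=0.05, upper_frac=0.95):
    """Replace values outside the limits by None (treated as missing)."""
    lo, hi = percentile_bounds(values, lower_frac, upper_frac)
    return [v if lo <= v <= hi else None for v in values]


data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]  # hypothetical variable with one extreme value
print(winsorize(data, 0.1, 0.9))
print(trim(data, 0.1, 0.9))
```

Winsorizing keeps the observation but pulls the extreme value in; trimming removes the value altogether, so the model has to handle it as missing.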

Foods

Data
Collected data are often presented as data tables, which are almost
useless when it comes to extracting information. A data table is much
better presented graphically. The example below will illustrate the
principles of projection. The data in this example describe the
consumption of different food items in several European countries.

Variables
The selection of the variables reflects the different traditions and
cultural behavior of the countries.

Observations
16 European countries have been selected.

Data table
1

10

Grain_Coffee

Inst_Coffee

Tea

Sweet

Bisc Pa_Soup

Ti_Soup

In_Potat

Fro_Fish

Fro_Veg
21

Germany

90

49

88

19

57

51

19

21

27

Italy

82

10

60

55

41

France

88

42

63

76

53

11

23

11

Holland

96

62

98

32

62

67

43

14

14

Belgium

94

38

48

11

74

37

23

13

12

Luxembourg

97

61

86

28

79

73

12

26

23

England

27

86

99

22

91

55

76

17

20

24

Portugal

72

26

77

22

34

20

Austria

55

31

61

15

29

33

15

11

10

Switzerland

73

72

85

25

31

69

10

17

19

15

11

Sweden

97

13

93

31

43

43

39

54

45

12

Denmark

96

17

92

35

66

32

17

11

51

42

13

Norway

92

17

83

13

62

51

17

30

15

14

Finland

98

12

84

20

64

27

10

18

12

15

Spain

70

40

40

62

43

14

23

16

Ireland

30

52

99

80

75

18


11


11

12

13

14

15

18

19

20

Apples

Orang

Ti_Fruit

Jam

Garlic Butter

16

Margarine

Olive_Oil

Youg

Crisp_Bread
26

Germany

81

75

44

71

22

91

85

74

30

Italy

67

71

46

80

66

24

94

18

France

87

84

40

45

88

94

47

36

57

Holland

83

89

61

81

15

31

97

13

53

15

Belgium

76

76

42

57

29

84

80

83

20

Luxembourg

85

94

83

20

91

94

94

84

31

24

England

76

68

89

91

11

95

94

57

11

28

Portugal

22

51

16

89

65

78

92

Austria

49

42

14

41

51

51

72

28

13

11

10

Switzerland

79

70

46

61

64

82

48

61

48

30

11

Sweden

56

78

53

75

68

32

48

93

12

Denmark

81

72

50

64

11

92

91

30

11

34

13

Norway

61

72

34

51

11

63

94

28

62

14

Finland

50

57

22

37

15

96

94

17

15

Spain

59

77

30

38

86

44

51

91

16

13

16

Ireland

57

52

46

89

97

25

31

64

Objective
The objective of this study is to understand how the variation in food
consumption among a number of industrialized countries is related to
culture and tradition, and hence to find the similarities and dissimilarities
among the countries. To this end, data have been collected on 20 variables
and 16 countries. The data show the percentage of households that
regularly use each of the 20 food items.

Analysis Outline
The steps to follow in SIMCA-P are:

Import the data set.

Prepare the data (Workset menu).

Fit a PC model and review the fit (Analysis menu).

Interpret the results (Analysis menu).

Define project
Start SIMCA-P and create a new project from FILE | NEW


Select the type of data (XLS) or All Supported Files (the default) and
find the data set (FOODS.XLS). Data can be imported from your
hard disk or from a network drive. Data can be imported in
different formats, so select the appropriate one or All
Supported Files. In this example the data are in an XLS file
created from Excel.
If the data set is on a floppy disk, we recommend that you first
copy the file to the hard disk.

If you want to leave the current project open, remove the check
mark from the box Close Current Project.
Note: The data set to import can be located anywhere on an
accessible directory. It does not have to be located where you have
defined the destination directory.
When you click on Open, SIMCA-P opens the Import Wizard.
With SIMCA-P+, mark the radio button SIMCA-P normal project.

SIMCA-P has recognized that this example has observation


numbers and names and variable names, and has correctly color
coded them.

When you click on Next, the Project specification page opens. You
can change the project name and the destination directory.
Mark the check box Use workset wizard and click on Finish.

Workset Wizard
The workset wizard opens to guide you through the creation of the
workset and the fitting of the model.


On the Variable page, select which variables are X or Y and which
variables to exclude.
If you mark variables and press Transform, the software checks
and applies transformation (Log Transform) when needed.
For this example, all variables are X and no transformation is
needed; click on Next.

On this page you exclude or include observations, or assign observations
to classes. The Set class from ObsID uses a selected part of any
observation ID to set classes automatically.
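
The idea of setting classes from part of an observation ID can be sketched as below. The function name `classes_from_obs_id`, the ID layout, and the region prefixes are hypothetical; they only illustrate grouping observations by a chosen slice of their identifiers.

```python
def classes_from_obs_id(obs_ids, start, length):
    """Group observation IDs by the characters obs_id[start:start + length]."""
    classes = {}
    for obs_id in obs_ids:
        key = obs_id[start:start + length]
        classes.setdefault(key, []).append(obs_id)
    return classes


# Hypothetical IDs whose first character encodes a region.
obs_ids = ["N_Sweden", "N_Norway", "S_Italy", "S_Spain", "C_France"]
classes = classes_from_obs_id(obs_ids, start=0, length=1)
print(classes)
```

This only works when the IDs follow a consistent naming pattern, so it pays to decide on such a pattern before importing the data.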


This example is a PCA to get an overview of the data table: all
observations are included and no classes are specified. Click on
Next to display a summary of the specifications and then click on
Finish to fit the model with cross validation.

Analysis
The plot with the summary of the fit of the model is displayed with
R2X(cum) (fraction of the variation of the data explained after
each component) and Q2(cum) (cross validated R2X(cum)).
Double click on the model summary line. The summary of the fit of
the model is displayed with R2X (fraction of the variation of the
data explained by each component) and cumulative R2X(cum), Q2
and Q2(cum) (cross validated R2X and R2X(cum)) as well as the
eigenvalues. The food variables are, as expected, correlated, and
fairly well summarized by three new variables, the scores,
explaining 65% of the variation.
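
To show where numbers such as R2X and R2X(cum) come from, the sketch below extracts principal components one at a time with the NIPALS iteration and reports the fraction of the total sum of squares that each component explains. It is a simplified illustration: the data are hypothetical placeholders (not the FOODS data), the variables are centered and scaled to unit variance, and the cross-validated Q2 is not computed.

```python
import math


def autoscale(X):
    """Center each column and scale it to unit variance."""
    n = len(X)
    cols = list(zip(*X))
    means = [sum(c) / n for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / (n - 1))
            for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in X]


def sum_of_squares(X):
    return sum(v * v for row in X for v in row)


def nipals_component(X, tol=1e-12, max_iter=1000):
    """Extract one component (scores t, unit-length loadings p) by NIPALS."""
    t = [row[0] for row in X]  # start from the first column
    p = []
    for _ in range(max_iter):
        tt = sum(v * v for v in t)
        p = [sum(t[i] * X[i][j] for i in range(len(X))) / tt
             for j in range(len(X[0]))]
        norm = math.sqrt(sum(v * v for v in p))
        p = [v / norm for v in p]
        t_new = [sum(x * pj for x, pj in zip(row, p)) for row in X]
        diff = sum((a - b) ** 2 for a, b in zip(t_new, t))
        t = t_new
        if diff < tol * sum(v * v for v in t):
            break
    return t, p


def r2x_per_component(X, n_components):
    """Fraction of the scaled data's sum of squares explained per component."""
    residual = autoscale(X)
    total = sum_of_squares(residual)
    r2x = []
    for _ in range(n_components):
        before = sum_of_squares(residual)
        t, p = nipals_component(residual)
        # Deflate: remove the part of the data described by this component.
        residual = [[x - ti * pj for x, pj in zip(row, p)]
                    for row, ti in zip(residual, t)]
        r2x.append((before - sum_of_squares(residual)) / total)
    return r2x


# Hypothetical 5 x 3 table: columns 1 and 2 are strongly correlated.
X = [[2.0, 4.1, 1.0], [3.0, 6.2, 0.5], [4.0, 7.9, 1.5],
     [5.0, 10.1, 0.7], [6.0, 12.0, 1.2]]
r2x = r2x_per_component(X, 2)
print(r2x, sum(r2x))
```

R2X(cum) after A components is simply the running sum of the per-component values. Correlated variables are what allow a few components to explain a large share of the variation.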


Scores and Loadings


Scores
Select Analysis | Scores | Scatter Plot or the fast button to
display the score plot of t1 vs. t2 (default). In the Label Types
page, make sure the secondary identifier Onam is selected.

The ellipse represents the Hotelling T2 with 95% confidence (see
statistical appendix).
The scores t1 and t2, one vector for components 1 and 2, are new
variables computed as linear combinations of all the original
variables to provide a good summary.
The weights combining the original variables are called loadings
(p1 and p2), see below.
The score plot shows 3 groups of countries. One group with the
Scandinavian countries (the North), the second with countries from
the South of Europe, and a third more diffuse with countries from
Central Europe.


To color the observations (countries) by the values of a variable,
right click, and open the properties. Select color, by categories,
and in the combo box choose a variable (here garlic). In the split
range window, enter 4.

Change the split range as needed in the text boxes on the right.

Garlic separates clearly Northern Europe from Southern Europe.
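
Splitting a variable's range into four categories for coloring can be pictured as simple binning. The sketch below assumes equal-width intervals between the minimum and maximum; this is an assumption for illustration, since the interval limits can be edited in the dialog. The values are hypothetical, not the garlic column of the data table.

```python
def split_range(values, n_bins):
    """Assign each value to one of n_bins equal-width intervals (0-based)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    groups = []
    for v in values:
        # The maximum value falls on the upper edge, so keep it in the last bin.
        index = min(int((v - lo) / width), n_bins - 1)
        groups.append(index)
    return groups


garlic_like = [10, 20, 30, 50, 90]  # hypothetical percentages
groups = split_range(garlic_like, 4)
print(groups)
```

Each group index would then map to one color in the score plot.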

Loadings
Select Analysis | Loadings | Scatter Plot to display the loadings p1
vs. p2.
The loadings are the weights with which the X-variables are
combined to form the X-scores, t (see above). This plot shows
which variables describe the similarity and dissimilarity between
countries.


Scandinavians eat crisp bread, frozen fish and vegetables, while in
southern Europe people use garlic and olive oil, and central
Europeans (in particular the French) consume a lot of yogurt.

Third Component
Plot the scores (t1 vs. t3) and loadings (p1 vs. p3). The third
component explains 13.8% of the variation in the data, and mainly
shows the high consumption of Tea, Jam and canned soups in
England and Ireland.

Summary
In conclusion, a three-component model of the data summarizes
the variation in three major latent variables, describing the main
variation of food consumption in the investigated European
countries.
This example shows simple PC modeling to get an overview of a
data table. The user is encouraged to continue to play around with
the data set. Take away observations and/or variables, refit new
models, and interpret the results.


Mineral sorting at LKAB

Introduction
The following example is taken from a mineral sorting plant at
LKAB in Malmberget, Sweden. Research engineer Kent Tano at
LKAB was responsible for this investigation.
In this process, raw iron ore (TON_IN) is divided into finer
material (<100 mm, 50% Fe) by passing through several grinders. After
grinding, the material is sorted and concentrated in several steps by
magnetic separators. The separation flow is divided in several
parallel lines and there are also feedback systems to get as high Fe
concentration as possible. The concentrated material is divided
into two products, one (PAR) which is sent to a flotation process
and another part (FAR, fines) which is sold as is. For both these
products high Fe content is important.
Twelve process factors were identified. Of these, three important
factors were used to set up a statistical design (RSM). The results
of each experiment were measured in 6 response variables. Several
observations were collected for each design point.
The process is equipped with an ABB Master system with a
SuperView 900 connected to the process data system. Data were
transferred from the ABB system to a personal computer with the
SIMCA-P software for modeling. Models were transferred back to
the SuperView system for on-line monitoring (predictions, score
and loading plots) of the process. The investigation was made in
1992. The multivariate on-line control of the process is still in
operation, with very good results concerning the quality of the products.


Data description
The following is a description of variables and observations.

Variables
Data from 18 variables were collected.
Process variables (X)
No.  Explanation                    Abbr.      RSM
1    Total load                     TON_IN     Design
2    Load of grinder 30             KR30_IN
3    Load of grinder 40             KR40_IN
4    PARmull                        PARM
5    Velocity of separator 1        HS_1       Design
6    Velocity of separator 2        HS_2       Design
7    Effect grinder 30              PKR_30
8    Effect grinder 40              PKR_40
9    Ore waste                      GBA
10   Load of separator 3            TON_S3
11   Waste from grinding            KRAV_F
12   Total waste                    TOTAVF

Responses (Y)
No.  Explanation                    Abbr.
13   Amount of concentrate type 1   PAR
14   Amount of concentrate type 2   FAR
15   Distribution of type 1 and 2   r-FAR
16   Iron (Fe) in FAR               %Fe_FAR
17   Phosphorus (P) in FAR          %P_FAR
18   Iron (Fe) in raw ore           %Fe_malm

Observations
A subset of 231 observations was used for modeling. Each
observation has a name referring to the date and time when data
were collected.
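
Observation names in the data table look like 1992030512300000. The sketch below reads the first twelve digits as year, month, day, hour and minute. That reading is an assumption based on the pattern of the names; the meaning of the trailing digits is not stated, so they are ignored here.

```python
from datetime import datetime


def parse_obs_id(obs_id):
    """Interpret the first 12 digits as YYYYMMDDhhmm; ignore the rest."""
    return datetime.strptime(obs_id[:12], "%Y%m%d%H%M")


first = parse_obs_id("1992030512300000")
second = parse_obs_id("1992030512310000")
print(first, second, (second - first).total_seconds())
```

Under this reading, consecutive observations in the table are one minute apart.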

Data table
A subset of the data is shown in Table 1.


/ Sovr.XLS Last change 930818
/ 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
/Nr ID (logged time) Ton_in KR30_IN KR40_IN PARM HS_1 HS_2 PKR_30 PKR_40 GBA TON_S3 KRAV_F TOTAVF PAR FAR r_FAR %Fe_FAR %P_FAR %Fe_malm
ONUM ONAM Ton_in KR30_IN KR40_IN PARM HS_1 HS_2 PKR_30 PKR_40 GBA TON_S3 KRAV_F TOTAVF PAR FAR r_FAR %Fe_FAR %P_FAR %Fe_malm
91 1992030512300000 1271.81 275.81 190.88 62.98 90.41 79.52 57.44 41.35 163.8 203.94 75.34 383.71 307.88 591.75 65.78 66.2 0.24 47.9
92 1992030512310000 1290.56 278.55 208.58 58.08 90.41 79.52 56.85 43.57 156.38 203.64 79.49 384.28 314.63 601.5 65.66 66.2 0.24 47.9
93 1992030512320000 1267.39 278.55 207.38 63.19 90.41 79.52 51.38 42.2 188.57 200.64 81.99 398.21 312.19 585.06 65.21 66.2 0.24 47.9
94 1992030512330000 1250.44 278.06 204.53 57.48 90.41 79.52 54.02 41.22 155.1 206.27 75.79 384.56 298.63 576.75 65.89 66.2 0.24 47.9
95 1992030512340000 1265.51 279.56 190.43 49.31 90.41 79.52 54.74 42.66 169.35 214.82 79.99 415.38 304.13 591.75 66.05 66.2 0.24 47.9
96 1992030512350000 1268.18 276.11 194.63 62.88 90.41 79.52 52.29 42.66 169.8 212.27 80.69 403.71 310.88 592 65.57 66.2 0.24 47.9
97 1992030512360000 1284.3 272.55 211.28 58.68 90.41 79.52 48.03 41.74 174.83 206.57 80.89 405.33 293.13 575.5 66.25 66.2 0.24 47.9
98 1992030512370000 1284.41 275.4 208.28 50.48 90.41 79.52 59.11 43.77 182.25 208.67 76.49 420.53 304.13 580 65.6 66.2 0.24 47.9
99 1992030512380000 1272.79 274.35 207.53 62.68 90.41 79.52 59 44.36 181.5 201.92 74.53 394.57 300.13 598.25 66.59 66.2 0.24 47.9
100 1992030512390000 1317.11 269.81 192.23 56.18 90.41 79.52 56.25 42.59 185.93 199.44 79.49 409.68 311.13 579.75 65.08 66.2 0.24 47.9
101 1992030512400000 1273.16 264.71 195.38 49.56 90.41 79.52 54.5 42.53 186.15 193.82 79.14 405.47 291.63 585.25 66.74 66.2 0.24 47.9
102 1992030512410000 1239.15 264.86 209.93 56.78 90.41 79.52 62.6 44.88 173.78 207.24 83.94 393.18 300.63 601 66.66 66.2 0.24 47.9
103 1992030512420000 1290.86 272.21 201.08 62.83 90.41 79.52 56.28 42.66 159.38 211.52 79.79 389.36 325.63 605.5 65.03 66.2 0.24 47.9
104 1992030512430000 1272.64 267.6 201.38 65.58 90.41 79.52 53.66 40.89 163.8 209.04 78.89 377.68 304.13 592.75 66.09 66.2 0.24 47.9
105 1992030512440000 1285.58 264.26 203.78 55.43 90.41 79.52 52.33 43.38 168.83 213.17 75.69 400.21 314.44 594.56 65.41 66.2 0.24 47.9
106 1992030512450000 1263.75 267.9 187.13 63.58 90.41 79.52 50.4 42.85 176.55 196.74 81.44 390.11 314.88 587.75 65.12 66.2 0.24 47.9
107 1992030512460000 1289.36 264.86 212.63 53.03 90.41 79.52 52.3 42.72 175.65 190.52 76.59 406.36 300.38 593.5 66.4 66.2 0.24 47.9
108 1992030512470000 1309.05 272.55 194.78 50.93 90.41 79.52 50.01 45.47 172.35 200.64 76.54 392.13 297.88 596.25 66.69 66.2 0.24 47.9
109 1992030512480000 1282.01 271.16 209.48 61.83 90.41 79.52 47.49 46.85 197.94 193.59 75.99 397.38 315.88 576.5 64.6 66.2 0.24 47.9
110 1992030512490000 1288.91 264.26 222.83 51.88 90.41 79.52 60.47 44.36 193.07 193.29 77.19 419.35 305.63 577.5 65.39 66.2 0.24 47.9
111 1992030512500000 1289.96 262.46 210.38 45.11 90.41 79.52 51.09 47.77 195.77 189.24 79.14 419.04 310.13 566.25 64.61 66.2 0.24 47.9
131 1992030513100000 1062.49 217.61 170.63 32.41 94.52 74.7 41.08 40.82 130.71 153.08 59.13 310.49 242.56 477.19 66.3 67.2 0.2 51.2
132 1992030513110000 1024.8 218.06 178.43 43.46 94.52 74.7 38.12 38.27 132.58 152.7 62.28 294.07 261.13 505.5 65.94 67.2 0.2 51.2
133 1992030513120000 1070.74 215.06 165.34 38.51 94.52 74.7 40.92 39.25 140.01 147.88 57.33 304 248.56 501.75 66.87 67.2 0.2 51.2
134 1992030513130000 1054.65 216.08 176.03 31.41 94.52 74.7 44.23 39.19 135.88 148.48 59.33 318.05 249.56 523 67.7 67.2 0.2 51.2
135 1992030513140000 1072.05 214.73 166.24 41.08 94.52 74.7 42.43 37.09 127.71 149.7 61.14 302.51 263.56 514.5 66.13 67.2 0.2 51.2
136 1992030513150000 1056.71 224.63 177.68 35.31 94.52 74.7 46.42 39.25 117.51 143.68 57.73 285.5 252.88 522.25 67.38 67.2 0.2 51.2
137 1992030513160000 1025.7 216.68 174.68 29.61 94.52 74.7 46.51 36.76 117.81 148.93 57.13 294.25 257.63 513.25 66.58 67.2 0.2 51.2
138 1992030513170000 1045.91 215.03 171.23 43.26 94.52 74.7 43.41 39.19 129.58 148.63 52.23 290.73 253.63 499.94 66.34 67.2 0.2 51.2
139 1992030513180000 1044.15 219.08 166.88 36.01 94.52 74.7 47.06 39.58 127.41 136.56 56.43 279.88 237.06 508.25 68.19 67.2 0.2 51.2
140 1992030513190000 1106.14 219.08 175.58 27.11 94.52 74.7 41.08 41.94 130.11 149.38 55.28 312.23 243.81 519.25 68.05 67.2 0.2 51.2
141 1992030513200000 1079.55 222.83 199.58 37.91 94.52 74.7 48.49 38.92 121.41 148.33 56.03 285.54 259.88 521.5 66.74 67.2 0.2 51.2
142 1992030513210000 1024.35 213.23 184.43 36.41 94.52 74.7 45.05 42.33 140.53 136.41 57.68 296.2 257.13 515.5 66.72 67.2 0.2 51.2
143 1992030513220000 1071.49 213.41 174.83 27.91 94.52 74.7 44.67 40.17 125.21 141.9 59.73 300.23 266.88 515.5 65.89 67.2 0.2 51.2
144 1992030513230000 1069.65 220.76 192.38 36.41 94.52 74.7 40.41 39.71 122.76 145.63 55.03 287 254.63 531.25 67.6 67.2 0.2 51.2
145 1992030513240000 1055.14 217.76 182.03 36.96 94.52 74.7 38.3 39.51 128.16 142.71 52.73 284.99 265.38 530.5 66.66 67.2 0.2 51.2
146 1992030513250000 1087.2 225.11 157.88 34.98 94.52 74.7 41.79 38.73 131.03 140.03 56.09 292.16 256.38 527.25 67.28 67.2 0.2 51.2
173 1992030513520000 1059.75 224.33 176.48 6 94.52 48.19 46.19 39.58 104.81 99.11 55.68 249.03 238.31 591.75 71.29 64.3 0.39 50.9
174 1992030513530000 1056.15 229.01 164.93 6.5 94.52 48.19 43.1 41.81 115.03 95.66 54.03 258.22 232.81 557.75 70.55 64.3 0.39 50.9
175 1992030513540000 1032.9 221.33 173.48 0.05 94.52 48.19 46.29 39.25 112.56 95.14 57.43 260.62 230.31 564.75 71.03 64.3 0.39 50.9
176 1992030513550000 1059.04 237.26 173.93 6.5 94.52 48.19 45.42 39.38 115.78 96.56 55.93 263.47 223.31 587.5 72.46 64.3 0.39 50.9
177 1992030513560000 1008 228.11 166.28 9.86 94.52 48.19 42.97 41.22 105.58 102.06 57.34 257.74 248.38 567.75 69.57 64.3 0.39 50.9
178 1992030513570000 1079.29 223.61 170.03 9.65 94.52 48.19 46.19 38.07 119.68 102.11 56.03 263.05 225.31 615.75 73.21 64.3 0.39 50.9
179 1992030513580000 1096.84 225.41 177.68 3.15 94.52 48.19 42.26 40.69 120.28 99.71 57.38 278.64 205.06 600.25 74.54 64.3 0.39 50.9
180 1992030513590000 1057.65 225.11 174.98 11.1 94.52 48.19 43 39.97 109.31 98.36 56.13 252.7 216.56 562.75 72.21 64.3 0.39 50.9
181 1992030514000000 1073.29 223.01 168.04 5.65 94.52 48.19 42.33 37.81 120.88 101.14 55.58 268.34 227.56 604.5 72.65 64.3 0.39 50.9
182 1992030514010000 1094.55 224.03 163.24 2 94.52 48.19 45.65 38.27 106.16 102.26 53.58 260.15 221.56 611.5 73.4 64.3 0.39 50.9
183 1992030514020000 1033.5 219.98 168.98 10.9 94.52 48.19 41.81 39.12 111.49 99.64 53.38 265.02 226.06 576.25 71.82 64.3 0.39 50.9
184 1992030514030000 1061.29 214.73 167.44 4.61 94.52 48.19 42.85 39.05 111.56 101.53 58.43 269.5 223.06 571.81 71.94 64.3 0.39 50.9
185 1992030514040000 1037.4 217.43 167.14 1.25 94.52 48.19 42.2 39.19 100.84 99.79 54.83 254.2 218.81 595 73.11 64.3 0.39 50.9
186 1992030514050000 1048.65 219.98 174.53 8.4 94.52 48.19 42.1 37.29 109.84 102.79 52.23 258.65 220.56 571 72.14 64.3 0.39 50.9
198 1992030514170000 1059.9 221.93 152.63 12.9 86.3 74.7 39.63 36.76 101.89 127.78 50.63 267.39 238.31 579.5 70.86 66.3 0.28 51.9
199 1992030514180000 1043.25 215.48 159.98 20.7 86.3 74.7 37.7 37.02 94.16 124.26 49.06 246.78 227.81 595.75 72.34 66.3 0.28 51.9
200 1992030514190000 1054.8 211.43 168.53 17.2 86.3 74.7 39.45 37.4 94.01 124.78 46.86 256.11 217.81 557.25 71.9 66.3 0.28 51.9
201 1992030514200000 1037.85 219.23 168.04 11.4 86.3 74.7 40.28 39.97 90.41 130.86 47.26 262.16 228.56 590 72.08 66.3 0.28 51.9
202 1992030514210000 1062.49 225.56 155.63 19.7 86.3 74.7 41.25 35.65 105.56 137.61 49.91 273.38 228.81 572.25 71.44 66.3 0.28 51.9
203 1992030514220000 1036.05 208.13 163.43 17.2 86.3 74.7 40.86 37.88 83.29 135.06 50.88 258.99 229.31 570.75 71.34 66.3 0.28 51.9
204 1992030514230000 1059.15 210.98 155.03 11.95 86.3 74.7 37.32 35.5 102.19 135.81 51.78 280.93 227.81 586.75 72.03 66.3 0.28 51.9
205 1992030514240000 1033.99 213.11 169.58 18.15 86.3 74.7 38.06 38.14 102.49 128.68 47.26 259.08 218.81 575.75 72.46 66.3 0.28 51.9
206 1992030514250000 1035.04 215.96 161.03 21.4 86.3 74.7 40.17 38.4 109.01 130.33 50.63 261.36 235.81 577.75 71.01 66.3 0.28 51.9
207 1992030514260000 1029.3 208.73 163.09 20.7 86.3 74.7 40.86 36.83 102.11 127.48 48.36 249.91 228.56 568.5 71.32 66.3 0.28 51.9
208 1992030514270000 0 0 0 1.5 86.3 74.7 20.45 15.52 0 0 0 0 182.06 532 74.5 66.3 0.28 51.9
233 1992030514520000 1491.83 297.86 233.51 1.41 86.3 48.19 59.31 48.68 168.23 140.38 75.79 372.87 305.63 861.88 73.82 62.6 0.42 49.8
234 1992030514530000 1489.58 298.46 254.21 6.8 86.3 48.19 58.97 47.57 141.73 140.76 85.74 359.68 299.38 846.38 73.87 62.6 0.42 49.8
235 1992030514540000 1500.08 295.31 235.91 9.75 86.3 48.19 62.99 47.18 144.73 141.28 89.64 365.9 331.88 825.63 71.33 62.6 0.42 49.8
236 1992030514550000 1467.9 305.4 261.26 3.35 86.3 48.19 59.59 49.67 165.53 134.98 90.89 394.19 313.38 838.88 72.8 62.6 0.42 49.8
237 1992030514560000 1477.58 307.8 247.16 6.8 86.3 48.19 62.85 48.75 145.48 138.51 90.79 363.23 331.63 856.13 72.08 62.6 0.42 49.8
238 1992030514570000 1490.55 306.75 251.66 11.3 86.3 48.19 60.13 50.25 159.75 137.31 81.34 372.12 323.13 833.38 72.06 62.6 0.42 49.8
239 1992030514580000 1500.83 296.81 262.16 12.3 86.3 48.19 63.78 51.11 151.35 135.88 81.99 356.92 318.88 857.63 72.9 62.6 0.42 49.8
240 1992030514590000 1506.04 295.01 255.23 5.45 86.3 48.19 59.47 51.83 159.83 133.5 84.54 375.34 323.88 787.44 70.86 62.6 0.42 49.8
241 1992030515000000 1495.2 307.8 271.46 13.4 86.3 48.19 60.11 54.71 171.08 139.03 89.39 382.27 303.88 835.63 73.33 62.6 0.42 49.8
242 1992030515010000 1493.78 310.8 259.46 7.35 86.3 48.19 59.2 49.34 152.1 133.63 84.99 366.02 339.13 848.38 71.44 62.6 0.42 49.8
251 1992030515100000 1269.71 273.11 215.33 21 90.41 66.24 49.55 46.72 152.48 146.03 74.73 347.11 294.63 650.06 68.81 67.1 0.19 50.1
252 1992030515110000 1270.91 270.41 228.71 18.3 90.41 66.25 50.76 51.04 148.48 144.21 72.78 347.16 274.38 652.81 70.41 67.1 0.19 50.1
253 1992030515120000 1279.91 266.21 232.31 16.5 90.41 66.26 46.13 48.36 153 139.41 71.68 350.78 309.63 638.56 67.35 67.1 0.19 50.1
254 1992030515130000 1280.36 267.86 230.21 25.71 90.41 66.26 50.34 46.39 141.13 141.96 75.54 336.06 291.38 616.75 67.91 67.1 0.19 50.1
255 1992030515140000 1249.46 262.61 213.38 23.6 90.41 66.26 56.84 45.8 149.91 139.93 77.49 339.68 299.88 653.06 68.53 67.1 0.19 50.1
256 1992030515150000 1277.63 264.71 211.43 16.7 90.41 66.26 47.34 49.93 162.08 149.53 72.63 362.66 287.13 655.31 69.53 67.1 0.19 50.1
257 1992030515160000 1306.05 260.81 210.23 25.81 90.41 66.26 50.91 46.65 172.2 140.23 75.69 367.11 306.63 635.06 67.44 67.1 0.19 50.1
258 1992030515170000 1243.24 271.01 216.08 20.7 90.41 66.26 47.08 47.05 165.83 148.48 77.89 362.19 288.88 656.06 69.43 67.1 0.19 50.1
259 1992030515180000 1262.59 277.31 205.58 13.65 90.41 66.26 55.09 45.34 152.7 146.76 74.43 364.57 280.88 654.06 69.96 67.1 0.19 50.1
260 1992030515190000 1282.76 262.76 224.03 23.61 90.41 66.26 52.17 47.5 163.43 137.61 77.29 362.88 289.38 664.06 69.65 67.1 0.19 50.1
261 1992030515200000 1277.18 257.66 221.48 17.95 90.41 66.26 50.37 47.24 156.23 145.33 73.18 356.78 288.13 659.06 69.58 67.1 0.19 50.1
262 1992030515210000 1258.46 263.81 203.63 10.4 90.41 66.26 49.32 45.93 154.95 143.16 77.59 368.96 268.13 653.31 70.9 67.1 0.19 50.1
263 1992030515220000 1236.94 255.86 207.53 20.25 90.41 66.26 48.91 47.57 173.1 143.76 75.39 364.57 284.63 632.06 68.95 67.1 0.19 50.1
323 1992030516220000 1247.89 253.31 230.78 6.3 82.19 66.26 56.59 46.06 190.82 140.08 90.44 411.49 257.13 622.25 70.76 64.6 0.24 44
324 1992030516230000 1262.66 258.26 216.83 13.1 82.19 66.26 51.78 52.81 191.34 139.86 90.64 413.86 275.38 628.06 69.52 64.6 0.24 44
325 1992030516240000 1303.99 259.31 238.31 12.45 82.19 66.26 52.94 50.32 196.14 135.88 96.64 416.21 263.13 622.5 70.29 64.6 0.24 44
326 1992030516250000 1280.74 267.71 229.13 5.25 82.19 66.26 56.69 49.53 203.27 138.88 91.49 428.39 273.13 626.06 69.63 64.6 0.24 44
327 1992030516260000 1259.63 263.81 220.43 11.4 82.19 66.26 54.28 53.73 184.95 136.86 84.39 394.79 285.13 622.75 68.59 64.6 0.24 44
328 1992030516270000 1301.21 262.46 223.13 16.15 82.19 66.26 54.95 48.22 196.82 134.76 89.74 407.26 278.13 628.06 69.31 64.6 0.24 44
329 1992030516280000 1282.58 264.11 222.83 22.8 82.19 66.26 51.83 50.58 191.34 136.56 94.84 411.26 290.13 625 68.3 64.6 0.24 44
330 1992030516290000 1252.13 265.01 226.43 11.2 82.19 66.26 54.12 48.16 191.12 135.51 84.89 390.99 268.13 635.06 70.31 64.6 0.24 44
331 1992030516300000 1248.98 270.11 230.63 17 82.19 66.26 54.58 45.93 190.37 133.11 88.34 386.02 270.88 607.75 69.17 64.6 0.24 44

Table 1
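
The response r_FAR is described only as the "distribution of type 1 and 2". In the rows of Table 1 it appears to equal the FAR share of the two concentrates, 100 * FAR / (PAR + FAR). This relation is an observation from the tabulated values, not a definition given in this tutorial. The sketch below checks it for three rows copied from Table 1.

```python
rows = [
    # (observation number, PAR, FAR, r_FAR) copied from Table 1
    (91, 307.88, 591.75, 65.78),
    (92, 314.63, 601.50, 65.66),
    (173, 238.31, 591.75, 71.29),
]

shares = []
for num, par, far, r_far in rows:
    share = 100 * far / (par + far)  # FAR as a percentage of PAR + FAR
    shares.append(share)
    print(num, round(share, 2), r_far)
```

If the relation holds throughout, r_FAR carries no information beyond PAR and FAR, which is one reason the responses are correlated.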


Objective
The objective of this study is to investigate the relationship between
the process variables and the 6 output variables describing the quality
of the final product.

Analysis outline
An Overview of the Responses
A PC model of the responses is made to understand:

How the responses relate to each other and to the observations.

The similarity and dissimilarity between the observations, and if there are outliers.

The explanatory power of the variables.

Relating the process conditions to the responses

Understand and interpret the relationship between the process variables and the responses.

Predict the output of new process conditions.

The steps to follow in SIMCA-P

Define the project: Import the primary data set.

Prepare the data (Workset menu).
Specify which variables are process variables (X) and
which are responses (Y).
Expand the X matrix with the squares and cross terms of
the 3 designed variables.

Fit the models, first PC-Y and then PLS, and review the
fit (Analysis menu).

Refine models if necessary by removing outliers (Workset menu).

Use the PLS model for predictions (Prediction menu).

Create the project


Start SIMCA-P and import the data file from FILE | NEW

Find the data set (SOVR.XLS).


If you have SIMCA-P+, select the radio button to create a normal
SIMCA-P project and click on Next.


Click on Commands and create Index Variable to generate
variable numbers, and mark them as secondary IDs.

Mark the columns (variables) from PAR to the end, use the arrow on
one of the variables, and from the drop-down menu, select them as
Y's. This selection becomes the default workset.
Click on Next.

The Import wizard opens. In the Project specification page, you
can change the project name and destination directory.
Make sure the check box Use workset wizard is marked and click
on Finish; the workset wizard opens.

Prepare the data


Workset Wizard
SIMCA-P's default workset consists of all the observations in the
primary data set with all variables, scaled to unit variance and
defined as X's or Ys as specified at import.
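
As a rough illustration of what scaling to unit variance does to a single variable, the sketch below centers a column and divides it by its standard deviation. The function name `autoscale` and the sample values are ours, not SIMCA-P's, and we assume mean centering and the sample standard deviation (n - 1 in the denominator).

```python
from statistics import mean, stdev

def autoscale(column):
    """Center a column and scale it to unit variance (sample standard deviation)."""
    m = mean(column)
    s = stdev(column)  # n - 1 in the denominator (an assumption of this sketch)
    return [(v - m) / s for v in column]

# Hypothetical raw values for one variable.
scaled = autoscale([1.0, 2.0, 3.0])
print(scaled)
```

After this scaling every variable has the same variance, so no variable dominates the model merely because of its measurement unit.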


To expand the X matrix with squares and/or cross terms, press
Use Advanced Mode and click on Expand.

The three variables TON_IN, HS_1 and HS_2 were varied
according to a statistical design (RSM) supporting a full quadratic
model. We will expand the X matrix with the squares and
cross-terms of these 3 variables.
Mark TON_IN, HS_1, HS_2. Press the button Sq & Cross and the
squares and cross-terms of these 3 variables are displayed in the
expanded list.
Click on OK to exit the workset menu.
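
The expansion itself is simple arithmetic. The sketch below lists the squared and cross terms for the three designed variables and computes them for one observation. The term naming (`A*B`) and the numeric values are assumptions made for illustration; SIMCA-P performs the expansion on the variables as scaled in the workset.

```python
from itertools import combinations

def expanded_terms(names):
    """Names of the squared terms followed by the pairwise cross terms."""
    squares = [f"{n}*{n}" for n in names]
    crosses = [f"{a}*{b}" for a, b in combinations(names, 2)]
    return squares + crosses

def expand_row(row, names):
    """Values of the squared and cross terms for one observation (dict name -> value)."""
    values = {f"{n}*{n}": row[n] ** 2 for n in names}
    for a, b in combinations(names, 2):
        values[f"{a}*{b}"] = row[a] * row[b]
    return values

designed = ["TON_IN", "HS_1", "HS_2"]
terms = expanded_terms(designed)

# Hypothetical values for one observation.
row_terms = expand_row({"TON_IN": 2.0, "HS_1": 3.0, "HS_2": 4.0}, designed)
print(terms)
```

Three designed variables give three squares and three cross terms, so the X matrix gains six columns.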

Analysis
To first get an overview of the responses, we fit a PC model of the
Y variables (PCY).

PC of Y
When you exit the workset window, an unfitted model (M1) is
created with model type PLS (The default for a workset with both
X's and Y's). Click on Analysis | Active Model Type and select
PCY. The model type changes to PC-Y. Click on Analysis | 2 First
Components to fit a PC model of the Y's with 2 components.
The model overview plot opens.

Click on the model summary line to open a table with the
summary of the fit of the model. This table displays R2X (fraction
of the variation of the data explained by each component) and
cumulative R2X(cum), as well as the eigenvalues and the Q2 and
Q2(cum) (cross validated R2). The six Y's are correlated, and are
summarized by two new variables, the scores t1 and t2, explaining
70.9% of their variation.
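
To make R2X and R2X(cum) concrete, here is a minimal sketch of a principal components fit by the NIPALS iteration on a small, already centered data table. The data are invented, the function name `nipals_pca` is ours, and nothing here is claimed to match SIMCA-P's internal implementation in detail.

```python
def nipals_pca(X, n_components, tol=1e-12, max_iter=500):
    """PCA by NIPALS on an already centered/scaled matrix X (list of rows).

    Returns (scores, loadings, r2x), one entry per component.
    Assumes each requested component still has variation left to explain.
    """
    E = [row[:] for row in X]                     # residual matrix, deflated per component
    n, k = len(E), len(E[0])
    ss_total = sum(v * v for row in E for v in row)
    scores, loadings, r2x = [], [], []
    for _ in range(n_components):
        # Start from the residual column with the largest sum of squares.
        start = max(range(k), key=lambda j: sum(E[i][j] ** 2 for i in range(n)))
        t = [E[i][start] for i in range(n)]
        for _ in range(max_iter):
            tt = sum(v * v for v in t)
            p = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(k)]
            norm = sum(v * v for v in p) ** 0.5
            p = [v / norm for v in p]             # loadings scaled to unit length
            t_new = [sum(E[i][j] * p[j] for j in range(k)) for i in range(n)]
            change = sum((a - b) ** 2 for a, b in zip(t_new, t))
            t = t_new
            if change <= tol * sum(v * v for v in t):
                break
        ss_before = sum(v * v for row in E for v in row)
        E = [[E[i][j] - t[i] * p[j] for j in range(k)] for i in range(n)]
        ss_after = sum(v * v for row in E for v in row)
        scores.append(t)
        loadings.append(p)
        r2x.append((ss_before - ss_after) / ss_total)
    return scores, loadings, r2x

# Hypothetical centered data: 4 observations, 2 variables.
X = [[2.0, 0.0], [-2.0, 0.0], [0.0, 1.0], [0.0, -1.0]]
scores, loadings, r2x = nipals_pca(X, 2)
r2x_cum = [sum(r2x[: a + 1]) for a in range(len(r2x))]
print(r2x, r2x_cum)
```

Each R2X value is the share of the total sum of squares removed by that component, and R2X(cum) is the running sum.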


Scores and Loadings


Scores
Select Analysis | Scores | Line Plot to display the score plot of t1
vs. t2 with a line drawn between the points. In Label Types mark
Use identifier Obs ID (primary).


The scores t1 and t2, one vector for each of dimensions 1 and 2, are
the new variables computed as linear combinations of the six
responses and summarizing Y.
The score plot shows that the observations cluster in different
groups. Each group represents a setting of the experimental design.
The process ran for a certain time at each of these settings (design
points) to reach stability. Measurements on the process (the
observations in the score plot) were recorded every minute. No
obvious outliers are present.

Loadings
Select Analysis | Loadings | Scatter Plot to display the loadings p1
vs. p2.
In Label Types mark Use Identifier Var ID (Primary) and click on
Save As Default Options, to always display variable names.


The loadings are the weights with which the variables are
combined to form the scores, t. The loadings, p, for a selected PC
component, represent the importance of the variables in that
component and show the correlation structure between the
variables, here the responses Y.
In this plot we see that PAR, FAR and %P_FAR are positively
correlated with each other and negatively correlated with %Fe_FAR.
r_Far dominates the second component, is here negatively correlated
with PAR and has only a small correlation with the other variables in
component 2. %Fe-Malm is not correlated with any of these variables
in the first two components.
Click on Analysis | Next Component, and compute a third
component. Display the loadings p1 vs. p3. The third component
(explaining 22% of the variation of the data) is dominated by
%Fe-Malm. In the third component this variable has a small positive
correlation with %Fe-FAR, r_FAR and FAR and little with the others.


Summary of Overview of Responses


No outliers were detected. All of the responses participate in the
model, and are correlated to each other, with the exception of
%Fe-Malm, which is only slightly correlated to three of them.

PLS MODELING
The main objective is to develop a predictive model, relating the
process variables X's to the output measurements (responses) Y.
The experimental design in three of the process variables accounts
for an important part of the variation of the Y's.

New Model Type


Click on Analysis | Active Model type and select PLS.
Another unfitted model, M2, is created and you are ready to fit a
PLS model.

Autofit
Click on Analysis | Autofit, or the fast button, to fit a PLS
model, with cross validation.

The Model Overview Plot displays R2Y(cum), the fraction of the
variation of Y (all the responses) explained by the model after each
component, and Q2(cum), the fraction of the variation of Y that
can be predicted by the model according to the cross-validation.
Values of R2Y(cum) and Q2Y(cum) close to 1.0 indicate an
excellent model.
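
The difference between an explained fraction (R2) and a cross-validated, predicted fraction (Q2) can be sketched with a deliberately simple model. Below, a straight line stands in for the PLS model, the data are invented, and the observations are left out in 7 groups. The group count, the way groups are formed and the formula Q2 = 1 - PRESS/SS are assumptions of this sketch, not a description of how SIMCA-P computes its values per component.

```python
from statistics import mean

def fit_line(x, y):
    """Least-squares intercept and slope for y = a + b*x."""
    xm, ym = mean(x), mean(y)
    b = sum((xi - xm) * (yi - ym) for xi, yi in zip(x, y)) / sum((xi - xm) ** 2 for xi in x)
    return ym - b * xm, b

def r2_and_q2(x, y, n_groups=7):
    """Return (R2, Q2): fitted and cross-validated fraction of the variation of y."""
    ym = mean(y)
    ss = sum((yi - ym) ** 2 for yi in y)
    a, b = fit_line(x, y)
    rss = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    press = 0.0
    for g in range(n_groups):
        # Every n_groups-th observation is left out, the model is refitted,
        # and the left-out observations are predicted.
        train = [i for i in range(len(x)) if i % n_groups != g]
        left_out = [i for i in range(len(x)) if i % n_groups == g]
        a_g, b_g = fit_line([x[i] for i in train], [y[i] for i in train])
        press += sum((y[i] - (a_g + b_g * x[i])) ** 2 for i in left_out)
    return 1 - rss / ss, 1 - press / ss

# Hypothetical data: a linear trend with alternating disturbances of +/- 0.5.
x = [float(i) for i in range(14)]
y = [2.0 * xi + 1.0 + (0.5 if i % 2 else -0.5) for i, xi in enumerate(x)]
r2, q2 = r2_and_q2(x, y)
print(round(r2, 4), round(q2, 4))
```

Because the left-out observations do not influence the refitted model, Q2 cannot exceed R2 here; a large gap between the two is a warning that the model fits better than it predicts.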


Double click on the model summary line to display a list of the fit
of the model per component.
The present model is indeed excellent and explains 80% of the
variation of Y, with a predictive ability (Q2) of 76%.

Summary: X/Y Overview


Click on Analysis | Summary | X/Y Overview | Plot and display
the cumulative R2Y and Q2Y for every response. With the
exception of %Fe-FAR and %P-FAR, all responses have an
excellent R2 and Q2.

Scores t1 vs. t2
Click on Scores | Scatter plot and t1 vs. t2. Use the marker
to label the outlying observation. Observation 208 lies far
away in the first component.


Scores t1 vs. u1
Right click and in properties select t1 vs. u1, and in Label Types
mark ObsID (Primary). We have a good relationship between the
first summary of the X's (t1), and the first summary of the Y's (u1),
with the exception of observation 208.


Contribution plot
To understand why observation 208 differs from the others in the
first score (t1), in the t1 vs. u1 plot double click on observation 208.

This contribution plot displays the differences, in scaled units, for
all the terms in the model, between the outlying observation 208
and the normal (or average) observation, weighted by w1* (the
importance of the X-variables in component 1).
The raw iron ore (TON_IN) as well as the load on the grinders and
the other variables were all far below average. Inspecting the data,
we find TON_IN and load on the grinders to be 0 for observation
208, obviously causing a process upset (an outlier) at time 14:27.
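
The quantity shown in such a contribution plot can be sketched as follows: for each variable, the deviation of the observation from the average is expressed in scaled units and multiplied by the weight of that variable. All numbers below (means, standard deviations, weights and the observation itself) are hypothetical, and whether the software applies the weights with or without their sign is not stated here, so treat this as an illustration of the idea only.

```python
def score_contributions(obs, means, stdevs, weights):
    """Weighted, scaled deviation of one observation from the average, per variable."""
    return {
        name: weights[name] * (obs[name] - means[name]) / stdevs[name]
        for name in obs
    }

# Hypothetical numbers, for illustration only.
means = {"TON_IN": 400.0, "HS_1": 10.0}
stdevs = {"TON_IN": 50.0, "HS_1": 2.0}
w1 = {"TON_IN": 0.5, "HS_1": 0.25}
obs_208 = {"TON_IN": 0.0, "HS_1": 9.0}

contrib = score_contributions(obs_208, means, stdevs, w1)
print(contrib)
```

A variable that is far from its average and also carries a large weight, like TON_IN in this made-up case, gives the tallest bar.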

Refining the model


We will remove observation 208, set aside a few observations as a
Test set, and then refit the PLS model.

Excluding observation 208 using the interactive tool box
In the score plot t1 vs. u1, mark observation 208 and click on the
red arrow. SIMCA-P excludes observation 208 from
the workset and asks if you want to generate a new unfitted model
M3. Say Yes.


The workset bar opens with the workset for model M3.
Observation 208 is excluded. When you display the Dockable
window Observations, 208 is marked excluded.

Removing some observations for a test set


In the Workset bar, hold the Ctrl key and mark observations
140-146, 173-179, 350-379, 551-555, then right click and select
Exclude.
The deleted observations are also marked on the plot.
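
For readers who keep track of the split outside SIMCA-P, the sketch below turns the observation ranges into a set of test set numbers, reading the first range as 140-146. The helper name `parse_ranges` is ours.

```python
def parse_ranges(spec):
    """Expand a string such as '3-5, 9' into the set {3, 4, 5, 9}."""
    ids = set()
    for part in spec.split(","):
        low, _, high = part.strip().partition("-")
        ids.update(range(int(low), int(high or low) + 1))
    return ids

test_set = parse_ranges("140-146, 173-179, 350-379, 551-555")
excluded = test_set | {208}   # the test set plus the outlier removed earlier
print(len(test_set), len(excluded))
```

The four ranges give 49 test set observations; together with observation 208, 50 observations are left out of the refitted model.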

Autofit
Click on Analysis | Autofit or the fast button, to refit the PLS
model.
The Summary | Model Overview plot is updated as the model is
fitted. Note the improvement in both R2Y(cum) and Q2(cum).

Summary: X/Y Overview


Click on Analysis | Summary |X/Y Overview | Plot to display the
cumulative R2Y and Q2Y for every response.

The responses PAR, FAR and %FE_malm are very well
explained (90% or better) and the others a little less well.

Scores t1 vs. t2
Click on Analysis | Scores | Scatter t1 vs. t2 and display the t1 vs.
t2 plot. We see the observations separated in groups, each group
representing a setting of the experimental design.


Scores t1 vs. u1
In the Properties change the Scores to t1 vs. u1.

We now have an excellent relationship between t1 and u1 with no
outliers.

Loadings w*c1 vs. w*c2


The w*'s are the weights that combine the original X variables (not
their residuals in contrast to w) to form the scores t. In the first
component w* is equal to w. The w*'s are related to the correlation
between the X variables and the Y scores u. X variables with large
values of w* (positive or negative) are highly correlated with u
(and thereby Y).
The c's are the weights used to combine the Y's (linearly) to form
the scores u. The c's express the correlation between the Y's and
the t's (X-scores).
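
In matrix form, the relations described above are commonly written as follows. This is the usual PLS notation, with W the matrix of weights w, P the X-loadings, T the X-scores, C the Y-weights, and E and F the residuals; it is given as a reminder and not as a statement of SIMCA-P's internal computations.

```latex
\begin{aligned}
T &= X W^{*}, & W^{*} &= W\,(P^{\mathsf T} W)^{-1},\\
X &= T P^{\mathsf T} + E, & Y &= T C^{\mathsf T} + F .
\end{aligned}
```

For the first component $p_1^{\mathsf T} w_1 = 1$, so $w_1^{*} = w_1$, in agreement with the text.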

In the first two components, PAR and FAR are positively
correlated with all the load variables and negatively correlated
with r_PAR, %Fe-FaR and %Fe_Malm. The model is almost
linear except for HS_2 and its squared term dominating the second
component.

Normal Probability plot of residuals


Click on Analysis | Residuals | Normal Probability Plot to display
the Normal probability plot of residuals.

Examining this plot, we see the residuals close to normally
distributed with no outliers. Right click and in the Properties page
shift between different Y variables and/or change options.

Coefficients

Click on Analysis | Coefficients | Plot to display the PLS
regression coefficients (for scaled and centered data) for PAR,
with confidence intervals (the default is 95%). The dominating
factors are TON_IN, KR30_in, KR40_in and Ton_S3 with a
positive effect. Use the Property bar to change responses or
components.


Variable Importance
Click on Analysis | Variable Importance. This plot shows the
importance of the terms in the model, as they correlate with Y (all
the responses) and approximate X.

Distance to the Model


Click on Analysis | Distance to the Model | XBlock to display the
distance to the model (how far away an observation is from the
model hyper-plane) in the X space.
These distances are in normalized units and are the same as the
row residual standard deviations.
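
As a hedged sketch of what a normalized distance to the model can look like, the code below computes the residual standard deviation of each observation (row) and divides it by the pooled residual standard deviation of the whole table. The degrees of freedom used, K - A per row and (N - A - 1)(K - A) for the pooled value, are assumptions of this sketch, and any correction factor the software may apply to workset observations is left out. The residual values are invented.

```python
def dmodx(residuals, n_components, centered=True):
    """Normalized row residual standard deviations for an X residual matrix."""
    n, k = len(residuals), len(residuals[0])
    a0 = 1 if centered else 0
    dof_row = k - n_components
    row_sd = [(sum(e * e for e in row) / dof_row) ** 0.5 for row in residuals]
    pooled_ss = sum(e * e for row in residuals for e in row)
    pooled_sd = (pooled_ss / ((n - n_components - a0) * dof_row)) ** 0.5
    return [s / pooled_sd for s in row_sd]

# Hypothetical residuals after a one-component model: 4 observations, 3 variables.
E = [[1.0, 1.0, 0.0], [0.0, 0.0, 0.0], [2.0, 0.0, 0.0], [0.0, 1.0, 1.0]]
d = dmodx(E, n_components=1)
print(d)
```

An observation that the model reproduces perfectly gets distance 0, and values well above the others point to observations that do not fit the X model.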


Observation Risk
Click on Analysis | Observation Risk

This plot displays the Observation Risk for every Y and for the
pooled Ys.
Using the zoomer around observation 349 which has a large
observation risk we get the following plot:

For the response FAR, observation 349 has a larger Y residual when
it is left out of the training set than when it is included in the
model; hence its prediction is uncertain (risky).

The following list displays the Y (Far) residual when observation
349 is and is not in the model.

Predictions
We can now use the model to predict the outcome of the process
for the Test set observations.
Click on Prediction | Specify Prediction set | Specify. Remove all
observations from Observations in the Prediction set. In the left
window select Workset Complement, click on Select All and use
the arrow to move all the observations to the right window.
Mark observation 208 and click on Remove to exclude it from the
prediction set. Click on Apply and Close this dialog.


Click on Predictions | Y Predicted | Scatter plot. The observed vs.
predicted plot, for PAR, is displayed.

For PAR and FAR (select from Properties), we have excellent
predictions; they are less good for the other responses.
Also look at DModX (under the Prediction menu).

Summary
This example shows that statistical design in the dominating
process variables gives data with high quality that can be used to
develop good predictive process models. With multivariate
analysis we extract and display the information in the data.


NIR

Introduction
The following example originates from a research project on peat
in Sweden. Peat is formed by an aerobic microbiological
decomposition of plants followed by a slow anaerobic chemical
degradation. Peat in Sweden (northern hemisphere in general) is
mainly formed from two types of plants, Sphagnum mosses and
grass of Carex type. Within the main groups there is variation
among the species. Depending on location, climate etc. there are
several other plants involved in the peat forming process.
In the project many different types of chemical analyses were
performed to get detailed information about the material and to
investigate differences among different peat types. Chemical
analysis was performed according to traditional methods (GC,
HPLC, etc.) which often were laborious and time consuming. To
speed up the analysis of samples, Near Infrared Spectroscopy
(NIR) together with multivariate calibration was introduced. This
strategy was found to work very well and after the calibration
phase, samples were analyzed in minutes instead of weeks.
In this tutorial we selected a subset of samples, which represents
the typical variation of peat in Sweden.

Data
Variables
Variables 1-19 represent spectra from the NIR instrument, which
in this case was a 19 channel filter instrument. Spectra are
recorded as Log (Absorbance) and then scatter corrected by an
MSC procedure.
Variables 20-46 represent different chemical analyses, which the
NIR spectra can be calibrated against.


Var. No.  Type  Name           Explanation
1-19      NIR                  Log Absorbance
20              Rhamnos        Monosaccharide
21              Fucos          Monosaccharide
22              Arabinos       Monosaccharide
23              Xylos          Monosaccharide
24              Mannos         Monosaccharide
25              Galaktos       Monosaccharide
26              Glukos         Monosaccharide
27              Klason l       Klason Lignin
28              Bitumen        Bitumen
29              Aspargin       Amino acid
30              Threonin       Amino acid
31              Serin          Amino acid
32              Glutamin       Amino acid
33              Prolin         Amino acid
34              Glycin         Amino acid
35              Alanin         Amino acid
36              valin          Amino acid
37              Methionin      Amino acid
38              Isoleucin      Amino acid
39              leucin         Amino acid
40              Tyrosin        Amino acid
41              Fenylalanin    Amino acid
42              Histidin       Amino acid
43              Lysin          Amino acid
44              Aginin         Amino acid
45              Glucose-amin   Amino sugar
46              Galactos-amin  Amino sugar

Variable 27 (Klason l) is Klason Lignin (rest after hydrolysis) and
variable 28 is Bitumen, which represents carbohydrates soluble in
acetone.

Observations
From a huge number of peat samples 41 were selected,
representing the main variation of peat in Sweden. The sample
(observation) names are coded using all 20 characters. Each position
in the names carries certain information. In the plots a sub-string
of two characters (positions 6 and 7) is often used. Position 6
represents the degree of decomposition, L (low), M (medium) and
H (high). Position 7 represents peat type, S (Sphagnum) and C
(Carex).

Objective
The objective of this study is to model and predict different
constituents of samples of peat directly from their NIR spectra. 41
samples of peat, mainly of two types Sphagnum and Carex, were
subjected to NIR spectroscopy. The spectra were recorded at 19
wavelengths (19 filters) with a reflectance instrument (log(abs))
and scatter corrected before the analysis.
For this objective, we will now develop a PLS model relating the
X variables (NIR spectra) to the Y variables (peat constituent
concentrations measured by traditional analysis).

Analysis Outline

Making a PLS model relating the NIR spectra variables to
the peat constituents in order to:
Understand and interpret the relationship between the
spectra (X) and peat composition (Y variables).

Develop a separate PLS model for each type of peat
(Sphagnum and Carex), to:
1) Increase the precision of the calibration.
2) Be able to classify and predict peat types.

The steps to follow in SIMCA-P are:

Define the project: Import the primary data set.

Prepare the data (Workset menu).
a) Specify which variables are process variables (X) and
which are responses (Y).
b) Transform the variables.
The responses are concentrations of the chemical
constituents of peat, and their variation is non-linear, so a
Log transformation is warranted. We use log(Y + 0.1); the
constant 0.1 makes sure that all values are positive before
the transformation.
c) Group the observations in 2 classes for peat type,
Sphagnum and Carex.

Fit the model, a PLS of all the data (Analysis menu).

Fit a PLS model for each of the peat types, Sphagnum and
Carex.

Use the PLS model for predictions and classification
(Prediction menu).


Create the project


Start a new project. The data set name now is NIRKHAM.XLS.
Start SIMCA-P and create a new project from FILE | NEW.
If you have SIMCA-P+, select normal SIMCA-P project.
The import wizard opens.

The first two columns are correctly marked as observation
numbers and names and the first row is variable names.
Mark the first row and click on Variable secondary Ids to have a
variable index.
Mark the variables starting with Rhamnos to the end, and from the
combo box select Y Variable.


SIMCA-P marks these variables as Y (response) variables.
Click on Next to open the Project specification page. You can
change, as desired, the destination folder, or the project name.
Click on Finish; the data set Nirkham is imported.

Prepare the data


Default Workset
SIMCA-P's default workset consists of all the observations in the
primary data set with all variables, scaled to unit variance and
defined as X's or Ys as specified at import. This is the starting
workset when you select Workset | New.

Transform the variables


Click on Workset | New and select the Transform tab.
Mark all the Y variables, select Log, with C1= 1 and C2=0.1
(some of the concentrations are 0.0), and click on Set.
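
The effect of the constants can be checked with a few numbers. The sketch below applies log(C1*y + C2) with C1 = 1 and C2 = 0.1 to some invented concentrations. We assume a base-10 logarithm here; the function name `log_transform` is ours.

```python
import math

def log_transform(values, c1=1.0, c2=0.1):
    """Apply log10(c1 * y + c2) to every value; c2 > 0 keeps zeros valid."""
    return [math.log10(c1 * v + c2) for v in values]

# Hypothetical concentrations, including a zero.
transformed = log_transform([0.0, 0.9, 9.9])
print(transformed)
```

Without the constant C2, the zero concentration would have no logarithm; with it, the zero maps to log10(0.1) = -1.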


Group observations in classes:


Select the Observations tab and display the secondary ID's.
Right click on the Primary ID's and select Observation label.

To group observations in 2 classes, the Carex and Sphagnum, click
on From Obs ID and select Obs Sec ID.


Then select start position 7 for length 1.

The Carex are set to class 1 and the Sphagnum to class 2.
One observation, 21 (not Sphagnum or Carex type), is set to class 3
as it belongs to neither group. Mark it and set to no class.
Click on OK to exit the Workset window.
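
The grouping rule, one character at position 7 of the observation name, can be sketched as below. The observation names are invented (only positions 6 and 7 follow the coding described earlier), and the function name `classify_by_name` is ours.

```python
def classify_by_name(names, position=7, length=1):
    """Map each observation name to a class using a sub-string of the name.

    position is 1-based, as in the dialog. Codes other than C or S give None (no class).
    """
    class_of_code = {"C": 1, "S": 2}   # Carex -> class 1, Sphagnum -> class 2
    start = position - 1
    return {name: class_of_code.get(name[start:start + length]) for name in names}

# Hypothetical 20-character names, padded with zeros.
names = [prefix.ljust(20, "0") for prefix in ("AB001LS", "AB002HC", "AB003MX")]
classes = classify_by_name(names)
print(classes)
```

The third name carries neither C nor S at position 7 and therefore ends up without a class, like observation 21 in the tutorial.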

Analysis
When you exit the workset window, an unfitted model (M1) is
created with model type PLS class (The default for a workset with
both X's and Y's and classes). In Analysis Model Type change it to
PLS. You are ready to fit a PLS model.

PLS model of all the samples


Autofit
Change the model type to PLS and click on Analysis | Autofit. The
model overview plot is updated as the model is fitted. This plot
displays R2Y cumulative by component and Q2Y cumulative by
component. R2Y is the fraction of the variation of Y (all the
responses) explained by the model after each component, and Q2Y
is the fraction of the variation of Y that can be predicted by the
model according to the cross-validation. Values of R2Y(cum) and
Q2Y(cum) close to 1.0 indicate an excellent model.

Double click on the Model Summary line to display the
corresponding list.

Multivariate calibration with NIR spectra often leads to many
components due to the high precision of the data. The present
model is indeed excellent and explains 88.2% of the variation of
Y, with a predictive ability (Q2) of 73.9%.

Summary: X/Y Overview


Click on Analysis | Summary | X/Y Overview | Plot and display
the cumulative R2Y and Q2Y for every response. Use the
Properties page to select variable labels and click on Save As
default Options to always have variable names. With the
exception of Bitumen all responses have an excellent R2 and Q2.


Scores t1 vs. t2
Click on Analysis | Scores | t1 vs. t2 plot. Use the Marker to mark
the outlying observation, and then use the label button to label it.
Observation 32 lies far away in the second component, indicating
that sample 32 is different with respect to NIR spectra.

Comparing the spectra of observation 32 and 39


Mark both observations, right click and select Plot Xobs to display
the spectra of these 2 observations.


Scores t1 vs. u1
We have a good relationship between the first summary of the X's
(t1), and the first summary of the Y's (u1), with some spread in the
data.

To display informative labels, select in properties Obs Sec ID, start
in position 6 for length 2.


You can now distinguish two groups of observations, S (Sphagnum
peat) and C (Carex peat).

Scores u1 vs. u2
The projection of the samples in the Y space (traditional chemical
analyses) does not show observation 32 as an outlier as in the Scores
plot. NIR spectroscopy can detect very small changes in chemical
composition (PPM level) compared to the traditional analyses
which typically have large measurement errors (3-50%). With
NIR spectroscopy one achieves better control of the samples.

Contribution plot
To understand why sample 32 differs from the others, double click
on observation 32 in the Scores t1 vs. t2.


This contribution plot displays the differences, in scaled units, for
all the terms in the model, between the outlying observation 32
and the normal (or average) observation, weighted by w*1 w*2 (the
importance of the X-variables in components 1 and 2).
In the plot we see some spectral variables close to 8 standard
deviations, indicating some contamination in this sample. We shall
remove sample 32.

Loadings w*c1 vs. w*c2


The w*'s are the weights that combine the original X variables (not
their residuals in contrast to w) to form the scores t. In the first
component w* is equal to w. The w*'s are related to the correlation
between the X variables and the Y scores u. X variables with large
values of w* (positive or negative) are highly correlated with u
(and thereby Y).
The c's are the weights used to combine the Y's (linearly) to form
the scores u. The c's express the correlation between the Y's and
the t's (X-scores).


This plot shows how the different chemical compounds correlate
to the different parts of the NIR spectra. Plots displaying the
loadings, one component at a time, may be more informative.

Loadings: Column plot w*c1


Click on Analysis | Loadings | Column plot w*c1. This plot shows
the importance of different parts of the NIR spectra, in the first
component, to explain the variation among the constituents of the
peat.

Excluding sample 32
Display the Score plot (t1 vs. t2), mark observation 32 and click on
the red arrow to exclude this sample from the workset. SIMCA-P
excludes this sample from the workset and creates a new unfitted
model with model type PLS class(1).


Separate PLS models for the Sphagnum and Carex
Autofit class models

Click on the fast button Autofit class models to fit both classes.

Sphagnum Model, class 2


Note that the model for class 2, the Sphagnum, has only one
component after Autofit, as the second component was not
significant. When we compute the second and third components,
the third is found significant. We continue adding components until
they are no longer significant. The model has 7 significant
components.

The present model is excellent and explains 86% of the variation
of Y, with a predictive ability (Q2) of 41%.


Summary: X/Y Overview


Click on Analysis | Summary | X/Y Overview | Plot to display the
cumulative R2Y and Q2Y for every response. All responses have
excellent R2 and good Q2 values.

Scores t1 vs. u1 and t1 vs. t2


These plots do not show any outliers.

Model class 1 (Carex peat)


The model with 7 components is not as good as the preceding one,
and though it explains 84.4% of the variation of Y, it only has a
predictive ability (Q2) of 12.6%.

This is mainly due to the fact that the Carex peat is not as rich in
carbohydrates as the Sphagnum peat, and the variations in the
chemical constituents are small.

Scores t1 vs. u1 and t1 vs. t2


These plots do not show any outliers.

Predictions
We now have two good models describing the relation between
NIR spectra and Chemical composition of peat and they can be
used to classify peat samples as Sphagnum or Carex.


In this tutorial we do not have new peat samples. However, we
will use the data set and classify every sample with respect to the
two models. We first want to remove sample 32.

Making a prediction Set


By default the prediction set is all of the primary data set.

Coomans' Plot
Exclude sample 32 from the Prediction set (Predictions | Specify
Prediction set | Remove observation 32 from prediction set) and
display the Coomans' plot.
This plot displays the Distance to the model of every observation
with respect to model M2 and M3, and shows a very good
separation between the Sphagnum and the Carex peat samples.

Sample 21 is correctly classified as being neither a Sphagnum
sample nor a Carex peat sample.
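
The reading of such a plot can be summarized as a small decision rule: an observation is assigned to a class when its distance to that class model is below the critical distance of the model. The distances and critical limits below are invented, and the function name `coomans_region` is ours.

```python
def coomans_region(d1, d2, crit1, crit2):
    """Classify one observation from its distances to two class models."""
    in1, in2 = d1 <= crit1, d2 <= crit2
    if in1 and in2:
        return "both"
    if in1:
        return "class 1"
    if in2:
        return "class 2"
    return "neither"

# Hypothetical distances (to model 1, to model 2) and critical limits.
crit1, crit2 = 1.5, 1.5
samples = {"carex-like": (0.8, 3.2), "sphagnum-like": (2.9, 0.7), "other": (3.5, 4.1)}
regions = {name: coomans_region(d1, d2, crit1, crit2) for name, (d1, d2) in samples.items()}
print(regions)
```

An observation far from both models, like the third one here, falls in the region belonging to neither class, which is where a sample of a different peat type is expected.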

Summary
As a tutorial, this provides just a brief introduction to the main
functionalities and plots in SIMCA-P. We recommend that you
continue with your own data, maybe another tutorial, and then
look in the Manual for details. The Help system contains the same
information as the Manual, but organized in a different way.

Plots and Lists


You can display the results of SIMCA-P in numerous graphs and
lists.
From the Analysis and Prediction menus, results of the active
model are available as quick plots and lists. With the menu
Plot/List, you have access to the raw data and every computed
value from every model. You can even plot coefficient vectors
from different models against each other.


Hierarchical Models

Introduction
This example illustrates the use of hierarchical multivariate
modeling (PCA and PLS), using a small set of process data.
Details of the process are not revealed for proprietary reasons, but
a general outline is given below.

Data
In this process, raw materials are combined and reacted to give a
product with certain properties measured by 8 y-variables. Two of
these, y6=impurity level, and y8=yield are the most important.
The feed is described by 7 input X-variables (x1-x7), and 18
intermediate process variables from steps such as reaction
(x8-x15) and purification (x16-x25) are also available.
The data are collected hourly, and comprise 92 observations. The
process functioned fairly well to around obs. 79, but then went out
of control and was closed down at point 92.

Objective
To understand the relationship between the two most important y
variables (y6 = impurity, and y8 = yield) and the three steps of the
process, feed (x1-x7), reactor (x8-x15), and purification and work
up (x16-x25).
We shall do the following, using obs. 1-79 as a training set:
1. PLS model of X = feed (x1-x7) with y6 and y8 (Block 1)
2. PLS model of X = reactor (x8-x15) with y6 and y8
(Block 2)
3. PLS model of X = purification (x16-x25) with y6 and y8
(Block 3)
4. PCA model of less important y's (y1 to y7 not including
y6) (Block 4)
5. Top level hierarchical model with scores of blocks 1-3 as
X and scores of block 4 plus y6 and y8 as Y.
The objective of Block Models 1 to 4 is to summarize the various
steps of the process by scores to then be used as variables in the
top level model.
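
The bookkeeping behind the top level model can be sketched as follows: the score columns of the base models are placed side by side, one row per observation, to form the X block of the top level model. The score values, the model names and the column naming (`model.t1`) are invented for this illustration; in SIMCA-P the scores are added to the workset by the program.

```python
def top_level_block(base_scores):
    """Join the score columns of several base models into one table.

    base_scores maps a model name to its score matrix (list of rows, one per observation).
    Returns (column_names, rows).
    """
    names, columns = [], []
    for model, rows in base_scores.items():
        n_comp = len(rows[0])
        for a in range(n_comp):
            names.append(f"{model}.t{a + 1}")
            columns.append([row[a] for row in rows])
    n_obs = len(columns[0])
    table = [[col[i] for col in columns] for i in range(n_obs)]
    return names, table

# Hypothetical scores for two observations: two feed components, one reactor component.
base = {
    "feed": [[0.1, 0.2], [0.3, 0.4]],
    "reactor": [[1.0], [2.0]],
}
names, rows = top_level_block(base)
print(names)
print(rows)
```

Each block of original variables is thus replaced by a few score variables, which keeps the top level model small and easy to interpret.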

Analysis Outline
The steps to follow in SIMCA-P are:

Create the project by importing the data set

Generate and fit the three PLS models for X-blocks 1-3
(obs. 1-79), and mark them as base hierarchical.

Generate and fit the PC model for block 4, and mark it as
base hierarchical.

Generate and fit the top level hierarchical model.

Interpret the hierarchical model.

Validate the hierarchical model with the test set (obs. 80-92).

Create the project


Start a new project. The data set name is proc1a.dif.
Start SIMCA-P and create a new project from FILE | NEW.
If you have SIMCA-P+ make sure to select Create a SIMCA-P
normal project.

Holding the CTRL key, mark Y6+ and Y8+, click on X variables
and select Y variables, to make these 2 variables Ys.


Click on Command | Create Index | Variable and generate a
variable index.
Click on Next. Now you can change the project name and
destination directory. Click on Finish and the project is imported.

Summarizing the feed


Workset
Select Workset | New. In variable blocks, keep x1 to x7 as X's and
y6 and y8 as Y's, exclude all other variables. Right click on
variables name and mark variable secondary ID's to display
variable number.

In Observations, exclude observations 80 to end, for a test set.


Click on OK to exit the workset.


Analysis
Autofit the model; the model window opens and is updated as the
model fits. One component is significant. Take 3 more components
as the objective here is to summarize the X block (feed).
Double click on the model title and call the model Feed.
Double click on the Summary line of the model to display its
details:

The model explains 65% of X, hence the scores of model M1 are a
good summary of the feed.

Scores t1 vs. t2
Click on the fast button to display the scores t1 vs. t2.

Observation 1 is an outlier. Double click on it with the
Contribution tool and SIMCA-P displays the Contribution plot.


Position the cursor on x6in to show its value in the data set (4.54).
The trend plot of that variable (double click on it) clearly shows an
abnormal value of that variable in observation 1.
We shall exclude observation 1 using the interactive marker and
the red arrow and refit the model.

Fitting the model without observation 1

Model M2 is very similar to model M1 (make sure you take 4
components) and explains 64% of the variation of the feed.

Loadings p1
The loading p1 is the vector of weights that combine the original X
variables to form the scores t1. The first dimension explains 32%
of the X, i.e., the feed. You can think of t1 as a new variable,
summarizing the feed and explaining 32% of their variation.
To display p1, click on Analysis | Loadings | Column plot and p1.


In the first dimension, all the feed variables, with the exception of
x4 and x5 are well summarized by t1.

Summarizing the reactor


Workset
Prepare a workset with the reactor variables as X (variables 16 to
23), and y6+ and y8+ as Y's. Select only observations 2 to 79.
Use the menus Workset | New as model M2, then select the
variables in Variable Blocks. The included observations will be 2
to 79.
The workset should look like this:

Click on OK to exit the workset.



Analysis
Click on Autofit and take 2 extra components.

With 4 components model M2 is a good summary of the reactor,
explaining 76% of the variation of X. Call the model reactor.

Scores t1 vs. t2
This plot shows no serious outliers.


Loadings p1 and p3 (the 2 most important components)

Summarizing the purification


Workset
Prepare a workset with the purification variables as X (variables
24 to 33), and y6+ and y8+ as Y's. Select only observation 2 to 79.
Use the menus Workset | New as model M2, then select the
variables in Variable Blocks. The observations will be correct.
The workset should look like this:


Click OK to exit the workset and Autofit the model.


There are 4 significant components, explaining 65% of X.

The third component explains the most of X. Click on Analysis |
Loadings | Column and select p3.


Summarizing the less important Y's


Workset
In Workset | New, Variable Blocks, select as X variables, variables
8 to 12 and 14.
Select only observations 2 to 79. The workset will look like this:

Exit the workset and Autofit the PC model.


SIMCA-P will extract 0 components, as none are significant. Take
3 components.


Preparing for the hierarchical model


Right click on model M2 and check hierarchical base Model
Scores.
The scores of model M2 are added to the workset as new variables.

Do the same for models M3 to M5. All these Models are marked
B.

Workset of the top level model


In Workset | New, Variable Blocks, start by selecting All and
clicking Exclude.
Select as X's all the scores from models M2, M3 and M4.
Select as Y's y6+ and y8+ and all the scores of model M5.
To select observations 2-79, start with New | As model M2.
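
The way the top level workset is put together can be sketched in a few lines of Python. This is a hedged illustration of the data layout only: the function build_top_level, the column names such as "M2.t1", and the toy numbers are hypothetical, and SIMCA-P does this bookkeeping itself when the base models are marked. The scores of the feed, reactor and purification models become X variables, while y6+, y8+ and the scores of model M5 become Y variables.

```python
def build_top_level(base_scores, y_columns, x_models, y_models):
    """Assemble top-level X and Y blocks from base-model scores.

    base_scores: {model_name: list of score rows, one row per observation}
    y_columns:   {variable_name: list of values, one per observation}
    """
    n_obs = len(next(iter(y_columns.values())))
    x_names, y_names = [], list(y_columns)
    X = [[] for _ in range(n_obs)]
    Y = [[y_columns[name][i] for name in y_names] for i in range(n_obs)]

    for model in x_models + y_models:
        rows = base_scores[model]
        n_comp = len(rows[0])
        names = [f"{model}.t{a + 1}" for a in range(n_comp)]
        # Scores of X-side base models go to X, the others to Y.
        if model in x_models:
            target, target_names = X, x_names
        else:
            target, target_names = Y, y_names
        target_names.extend(names)
        for i in range(n_obs):
            target[i].extend(rows[i])
    return x_names, X, y_names, Y

# Hypothetical scores for 3 observations: 4 components for M2 to M4, 3 for M5.
n = 3
base = {m: [[0.1 * (i + 1) * (a + 1) for a in range(4)] for i in range(n)]
        for m in ("M2", "M3", "M4")}
base["M5"] = [[0.0, 0.0, 0.0] for _ in range(n)]
ys = {"y6+": [1.0, 2.0, 3.0], "y8+": [0.5, 0.4, 0.3]}

x_names, X, y_names, Y = build_top_level(base, ys, ["M2", "M3", "M4"], ["M5"])
print(len(x_names), "X variables;", y_names)
```

With the component counts used in this example the top level model has 12 score variables as X and 5 variables as Y.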


(continued, upper window scrolled down)

Exit the workset.

Analysis
Autofit the model.
There are 4 significant components explaining 53% of the Y's.


The Summary | X/Y Overview shows that the 2 most important
variables, y6+ and y8+, are well explained and predicted.

The score plot (t1 vs. t2) of the top level model


This plot is colored by the values of y6+, the side product, in
model M6. To display the legend, use Plot Settings | Plot area from
the pop up menu. The process starts up to the right with high
values of y6+, moves down to the left with lower values, and then
is manipulated to give lower values of y6+ (upper left quadrant).
The process then becomes unstable and moves back to the right.

The w*c plot

The important y variable (y6+), the side product, is on the right of
the plot. Positively correlated to y6+ are the first component of the
feed, and the second component of the reactor and purification.
We also see that y6+ is negatively correlated to the first
component of model M5 (a summary of the less important Y's).
y8+, the yield, is negatively correlated to the first component of
the purification.
With the contribution tool, double click on any score variable
point, and the corresponding loadings plot opens. This plot shows us
the important original variables in that score.
For example, M3 t2 was positively correlated with y6+. In the
loadings of the M3 (reactor) second component, we see that variables
2, 3, and 8 are positively correlated with y6+, while variables 1, 4
and 7 are negatively correlated with y6+.


This gives us a zoom in, zoom out picture. In the w*c plot we
understand relationships between the 2 important y's and the
sections of the process, i.e. the feed, reactor, and purification. In
the loadings plot we understand which variables in the feed, reactor,
and purification dominate and how they relate to the y's.
Click on the other score variables to display their loadings.

Coefficients
Click on Analysis | Coefficients.

For y6+ the dominating variables are M2 (the feed) t1 (first
component), M3 (reactor) t2 and t4 (second and fourth components)
and
Use the contribution tool and double click on any of these
variables to open the corresponding loading plot.
For example double click on M3 (reactor) t4.


Again we see here the importance of variable 3 as being positively
correlated with y6+.

Variable Importance (VIP)

The most important variables, for all the y's, are t2 and t1 of the
reactor, and t1 and t2 of the purification. You can use the
contribution tool to display the corresponding loadings.
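
For readers who want to see what a VIP value summarizes, the sketch below uses the definition commonly given in the PLS literature; we assume, without confirmation from this tutorial, that the plot is based on the same quantity. The function name vip and the numbers are hypothetical. Each variable's squared weights are averaged over the components, weighted by the amount of Y each component explains.

```python
import math

def vip(weights, ssy_explained):
    """Variable importance in the projection.

    weights[a][k]: weight of variable k in component a
                   (each component's weight vector has unit length).
    ssy_explained[a]: sum of squares of Y explained by component a.
    """
    n_comp = len(weights)
    n_vars = len(weights[0])
    total = sum(ssy_explained)
    values = []
    for k in range(n_vars):
        acc = sum(ssy_explained[a] * weights[a][k] ** 2 for a in range(n_comp))
        values.append(math.sqrt(n_vars * acc / total))
    return values

# Hypothetical model: 4 variables, 2 components.
w = [
    [0.5, 0.5, 0.5, 0.5],
    [math.sqrt(0.5), -math.sqrt(0.5), 0.0, 0.0],
]
ssy = [3.0, 1.0]
vip_values = vip(w, ssy)
print([round(v, 3) for v in vip_values])
```

With unit-length weight vectors the squared VIP values sum to the number of variables, so values above 1 mark variables of above-average importance.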


Observed vs. Predicted

Observation 41 is an outlier, and has a large residual.
Using the contribution tool, double click on observation 41.

t4 of purification is the culprit variable. Double click on it to
display the contribution with the original variables. This plot
points to variable 6 in the purification as being much too low.


The time series plot of this variable in the data set (double click
on it) shows an abnormal value for observation 41.

Predictions
Make sure Model M6 (the top level) is the active model.
To specify the prediction set click on Predictions | Specify
Predictions set | Specify and remove observation 1 (outlier).
Remember that obs. 2-79 actually comprise the training set. They
are still included below for comparison in the plots.


DModXPS

From observation 80 on, the process becomes unstable, with
DModXPS quickly increasing.
The contribution plot for observation 91 (large DModXPS) shows
the feed (t2, t3 and t4) as being the problem.


With the Contribution tool, double clicking on the feed points, in
all 3 components, to variable 6.

The trend plot confirms the problem with this variable.


Scores tPS1 vs. tPS2 colored by test set and training set

The model was based on normal operation up to observation 79.
The predicted scores from observation 80 on, colored red, clearly
show that the process is going out of control.

Cusum Chart
Click on Predictions | Control charts | Cusum and select subgroup
1.
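
As background to the Cusum chart, the sketch below shows a standard tabular CUSUM for individual observations (subgroup size 1). It is a generic illustration, not a description of SIMCA-P's internal calculation: the allowance k, the decision interval h, the target and sigma values, and the function name cusum are all assumptions made for the example.

```python
def cusum(values, target, sigma, k=0.5, h=4.0):
    """Tabular CUSUM for individual observations (subgroup size 1).

    k (allowance) and h (decision interval) are in units of sigma.
    Returns the upper sums, the lower sums and the indices that signal.
    """
    upper, lower, signals = [], [], []
    c_hi = c_lo = 0.0
    for i, x in enumerate(values):
        z = (x - target) / sigma
        # Accumulate only deviations larger than the allowance k.
        c_hi = max(0.0, c_hi + z - k)
        c_lo = max(0.0, c_lo - z - k)
        upper.append(c_hi)
        lower.append(c_lo)
        if c_hi > h or c_lo > h:
            signals.append(i)
    return upper, lower, signals

# Hypothetical score trace: stable at first, then drifting upwards.
scores = [0.1, -0.2, 0.0, 0.3, -0.1, 1.2, 1.5, 1.8, 2.1, 2.4]
upper, lower, signals = cusum(scores, target=0.0, sigma=0.5)
print("first signal at index", signals[0])
```

Because the sums accumulate small persistent deviations, a Cusum chart can react to a drift before any single observation falls outside ordinary control limits.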

A contribution plot around observation 76 shows that both the feed
and purification were related to the problem.


Double clicking on both the feed and the purification shows the
culprit variables.


Conclusion
The hierarchical approach to multivariate analysis greatly enhances
our ability to understand complex problems.
The zoom in, zoom out capability allows us first to understand
complex relationships in terms of components of a process and
then zoom in on a single component to resolve the details in terms
of the process variables.


Spectral Filtering and Compression, including OPLS

Introduction
This example illustrates the use of spectral filtering and wavelet
compression with multivariate calibration. The recently added
OPLS approach (Orthogonal PLS) is also demonstrated.
The data set of this example was collected at Akzo Nobel,
Örnsköldsvik, in Sweden. The raw material for their cellulose
derivative process is delivered to the factory in the form of
cellulose sheets. Before entering the process the cellulose sheets are
controlled by a viscosity measurement, which functions as a
steering parameter for that particular batch.
In this data set NIR spectra for 180 cellulose sheets were collected
after the sheets had been sent through a grinding process. Hence
the NIR spectra were measured on the cellulose raw material in
powder form. For calculation of a calibration model 160 sample
spectra were used. A selection of 20 spectra was used for model
validation.

Data
The data consists of:
X: 1201 wavelengths in the VIS-NIR region
Y: Viscosity of cellulose powder.

Objective
The objective of this study is to develop a good calibration model
with the 160 samples and validate this model with the test set of 20
samples.
We will use orthogonal signal correction (OSC) to improve the
calibration model, and we will compress the X matrix, with
orthogonal wavelets, for efficiency and fast computation.
The results of the model after OSC and wavelets compression will
be compared to the results of the model with the original data.
Finally OPLS will be run on the same data.


Analysis Outline

Make a PLS model relating the NIR spectra variables to
the viscosity with the original data.

Review and validate the calibration model with the test
samples.

Apply OSC and wavelet compression to the X matrix.

Make a PLS model with the OSC and wavelet
compressed data.

Review and validate this model and compare the results
to the calibration model made with the original data.

Finally, OPLS is run and the results are compared to the
previous analyses.

The steps to follow in SIMCA-P are:

Define the project: Import the primary data set.

Prepare the data (Workset menu).
a) Specify which variables are process variables (X) and
which are responses (Y).
b) Exclude 20 specific samples from the training set for a
test set.

Fit the calibration model, and review the fit (Analysis
menu).

Validate the model with the test set (Prediction menu).

Use Spectral Filter, OSC, followed by wavelet
compression (Dataset menu).

Prepare the data (Workset menu).

Fit a PLS model on the spectral filtered data (Analysis
menu).

Validate this model with the test set and compare the
results (Prediction menu).

Return to the first project (original data) and change to
OPLS (Analysis / Change model type), fit the model with
Autofit, predict (use the same prediction set), and compare.

Create the project


Start a new project with the data set Malyx.mat.
Start SIMCA-P and define a new project from FILE: NEW.
The file is a Matlab file. Note that there are no variable names and
no observation numbers or names.


The import wizard opens. Make sure to select a normal SIMCA-P
project if you are using SIMCA-P+.
Make the first column Y: in the top cell of the first column, click
on the arrow and from the combo box select Y Variable (Viscosity).

Click on Next to open the Project specification page. You can
change, as desired, the destination folder or the project name.
Click on Finish; the data set (we named it Malyx) is
imported.

Plotting the Spectra


With the dataset open and active, right click and select Plot Xobs
to plot the spectra.


All spectra are plotted together:


[Figure: MALYX.DS1, spectra of all observations plotted against variable number (Num, 0 to 1200).]

Prepare the Data


Workset
SIMCA-P's default workset consists of all the observations in the
primary data set with all variables, scaled to unit variance and
defined as X's or Y's as specified at import. This is the starting
workset when you select Workset | New.

Workset | New
The Workset window opens with the variable names, the variable
block X or Y, scaling (default UV), and the observation numbers.

Change the scaling of the X variables to CTR (centered only)
Click on Scale, mark variables 2 to 1202, select in Base Ctr and
click on Set. The X variables are now just centered, and not scaled.
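
The difference between Ctr and UV scaling can be illustrated with a short Python sketch. It is a generic illustration with hypothetical names (scale_columns, spectra) and toy numbers, not SIMCA-P code. With Ctr only the column mean is removed, so a wavelength that varies little keeps its small influence; with UV every column is also divided by its standard deviation, so all wavelengths get the same weight.

```python
import math

def scale_columns(X, mode="UV"):
    """Center each column; with mode 'UV' also divide by its standard deviation."""
    n, k = len(X), len(X[0])
    out = [[0.0] * k for _ in range(n)]
    for j in range(k):
        col = [row[j] for row in X]
        mean = sum(col) / n
        sd = math.sqrt(sum((v - mean) ** 2 for v in col) / (n - 1))
        divisor = sd if (mode == "UV" and sd > 0) else 1.0
        for i in range(n):
            out[i][j] = (col[i] - mean) / divisor
    return out

# Hypothetical absorbances at two wavelengths for three samples.
spectra = [[0.50, 0.90], [0.52, 1.00], [0.54, 1.10]]
ctr = scale_columns(spectra, mode="Ctr")
uv = scale_columns(spectra, mode="UV")
print("Ctr:", ctr)
print("UV: ", uv)
```

In the centered data the second wavelength still varies five times more than the first, while after UV scaling both columns look identical. For spectra, where all variables share the same unit, keeping the original relative variation is often preferred.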

Exclude observations for the Training set


Click on Observations and mark the following 20 observations: 4-5, 18-20, 30-34, 100-104, and 130-134 and click on Exclude.

All these observations are now excluded from the training set.
Click on OK and exit the workset menu.
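
As a quick check of the bookkeeping, the sketch below expands the excluded ranges and confirms that they give 20 test observations and 160 training observations out of 180. The list is read as 4-5, 18-20, 30-34, 100-104 and 130-134, as given in the OSC-wavelet step of this example. The helper name expand_ranges is hypothetical.

```python
def expand_ranges(spec):
    """Turn a string such as '4-5, 18-20' into [4, 5, 18, 19, 20]."""
    numbers = []
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            first, last = (int(p) for p in part.split("-"))
            numbers.extend(range(first, last + 1))
        else:
            numbers.append(int(part))
    return numbers

test_ids = expand_ranges("4-5, 18-20, 30-34, 100-104, 130-134")
excluded = set(test_ids)
train_ids = [i for i in range(1, 181) if i not in excluded]
print(len(test_ids), "test observations,", len(train_ids), "training observations")
```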

Analysis
When you exit the workset window, an unfitted model (M1) is
created with model type PLS (the default for a workset with both
X's and Y's). You are ready to fit a PLS model.

PLS model
Autofit
Click on Analysis | Autofit, or use the fast button.

The model overview plot updates as the model is fitted.


[Figure: MALYX.M1 (PLS), model overview showing R2Y(cum) and Q2(cum) for components 1 to 7.]

Double click in the project window on the model summary line to
display the details by component.

R2Y(cum), the fraction of the variation of Y explained by the
model after 7 components, is equal to 0.756, and Q2(cum), the
fraction of the variation of Y that can be predicted by the model
according to the cross-validation, is equal to 0.686. Values of
R2Y(cum) and Q2(cum) close to 1.0 indicate an excellent model.
For a calibration model, model M1 is a rather poor model.
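
The two statistics can be written out in a few lines. The sketch below uses the basic definitions, R2 = 1 - SSresidual/SStotal and Q2 = 1 - PRESS/SStotal, where PRESS is built from predictions made for samples that were left out during cross-validation. SIMCA-P accumulates these quantities component by component, so its cumulative values may be computed somewhat differently; the function name and the numbers are hypothetical.

```python
def r2_and_q2(y_obs, y_fit, y_cv):
    """Return (R2, Q2) for one response.

    y_fit: fitted values from the model built on all training samples.
    y_cv:  predictions for each sample made while it was left out
           (cross-validation).
    """
    mean = sum(y_obs) / len(y_obs)
    ss_total = sum((y - mean) ** 2 for y in y_obs)
    ss_resid = sum((y - f) ** 2 for y, f in zip(y_obs, y_fit))
    press = sum((y - p) ** 2 for y, p in zip(y_obs, y_cv))
    return 1.0 - ss_resid / ss_total, 1.0 - press / ss_total

# Hypothetical viscosity values.
y_obs = [1000.0, 1200.0, 1400.0, 1600.0, 1800.0]
y_fit = [1050.0, 1180.0, 1390.0, 1620.0, 1760.0]
y_cv = [1080.0, 1160.0, 1380.0, 1640.0, 1730.0]
r2, q2 = r2_and_q2(y_obs, y_fit, y_cv)
print("R2 =", r2, "Q2 =", q2)
```

Q2 is normally lower than R2 because left-out samples are harder to predict than samples the model has seen. A large gap between the two suggests overfitting.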

Scores t1 vs. u1
Click on Scores: t1 vs. u1 to display the t1 vs. u1 plot. The
relationship between t1 and u1 is not very good, in particular for
the cluster of samples 162-165 etc.


[Figure: MALYX.M1 (PLS), u[1] vs. t[1] with observation labels; fitted line y = 1*x - 1.063e-008, R2 = 0.4213; R2X[1] = 0.280355.]

Plotting the Spectra of selected observations


Hold down the CTRL key and mark the clusters of observations
around observation 168 (lower left) and observation 27 (upper
right), then right click and select Plot Xobs to plot the spectra in
original units.

Zooming in on the plot, one clearly sees a separation between the
two groups of spectra.


Loadings
Click on Loadings | Line plot.

Remove the series w*c1, select under Items w* and all
components (*) and click on Add Series, then on OK.


[Figure: MALYX.M1 (PLS), w* for components 1 to 7 vs. variable number (Num). R2X[1] = 0.280355, R2X[2] = 0.66347, R2X[3] = 0.0369157, R2X[4] = 0.00567819, R2X[5] = 0.00847137, R2X[6] = 0.0010421, R2X[7] = 0.0013327.]

Components 1 to 3 capture almost 60% of the variation of Y. The
other components are small corrections. To display the first three
loadings, open the properties page, mark series 4 to 7 and click on
Remove and Apply.
[Figure: MALYX.M1 (PLS), w*[1], w*[2] and w*[3] vs. variable number (Num). R2X[1] = 0.280355, R2X[2] = 0.66347, R2X[3] = 0.0369157.]

The regions around 200, 400, 700 -- 800, and 900 capture most of
the information.

Distance to the Model (DmodX)


[Figure: MALYX.M1 (PLS), DModX[7] (normalized) per observation with the D-Crit(0.05) line; M1-D-Crit[7] = 1.163, 1 - R2X(cum)[7] = 0.002735.]

Several samples have a distance to the model larger than the
critical distance, indicating data inhomogeneity.

Observed vs. Predicted


The predictions are poor particularly for a cluster of samples as in
the t1 vs. u1 plot. They can be labeled by marking them and then
clicking on the selected item fast button, and selecting labels as
primary obs label.
[Figure: MALYX.M1 (PLS), observed viscosity YVar(Var_1) vs. predicted YPred[7](Var_1); fitted line y = 1*x - 7.256e-006, R2 = 0.7563, RMSEE = 139.821; labeled points 136, 153, 162, 163, 164 and 168.]

A zoom in on these points shows their labels.

Validating the Model 1


Click on Predictions | Specify Prediction set | Complement Workset.
The prediction list opens. Click on Predictions | Y Predicted |
Scatter plot.


[Figure: MALYX.M1 (PLS), PS-Complement Model 1, observed viscosity YVarPS(Var_1) vs. predicted YPredPS[7](Var_1) for the 20 test samples; RMSEP = 110.255.]

The predictions are reasonable with an RMSEP of 110 compared
to the training set RMSEE of 140.
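
The two error measures can be sketched as follows. We assume here, as is common practice, that RMSEE (training set) carries a degrees-of-freedom correction for the number of fitted components while RMSEP (test set) is a plain root mean square error; the tutorial itself does not state the formulas, so treat the denominators as assumptions. The function names and numbers are hypothetical.

```python
import math

def rmsee(y_obs, y_fit, n_components):
    """Root mean square error of estimation for the training set.

    Assumes the divisor N - 1 - A, with A the number of components.
    """
    n = len(y_obs)
    ss = sum((y - f) ** 2 for y, f in zip(y_obs, y_fit))
    return math.sqrt(ss / (n - 1 - n_components))

def rmsep(y_obs, y_pred):
    """Root mean square error of prediction for a test set (divisor N)."""
    ss = sum((y - p) ** 2 for y, p in zip(y_obs, y_pred))
    return math.sqrt(ss / len(y_obs))

# Hypothetical observed and predicted viscosities.
y_obs = [1000.0, 1200.0, 1400.0, 1600.0, 1800.0]
y_hat = [1050.0, 1180.0, 1390.0, 1620.0, 1760.0]
ee = rmsee(y_obs, y_hat, n_components=1)
ep = rmsep(y_obs, y_hat)
print("RMSEE =", round(ee, 2), "RMSEP =", round(ep, 2))
```

Both values are in the units of Y, here viscosity, so they can be compared directly with the spread of the measured values.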

Orthogonal Signal Correction and Wavelets Compression


This rather poor model may indicate systematic variation in the X
block that is not related to the response Y. This is corroborated by
the similarities between w*2 and w*3 (see loading plot above).
We will apply Orthogonal Signal Correction (OSC) to the X block
(the NIR data) to remove the systematic variation in X not related
to Y, and then, for speed and efficiency, we will wavelet compress
the X block.
Click on Dataset | Spectral Filters | Combination | OSCWavelet.

The first variable is marked as Y. Exclude the test set of
observations: 4-5, 18-20, 30-34, 100-104, 130-134, and click on
Next.


SIMCA starts OSC and extracts one component; click on Next to
extract the second component, as two components are usually
recommended.
The angle of both components was 90 degrees, indicating
orthogonality, and the remaining Sum of Squares after the second
component is 13%.
Hence, 87% of the variation in X was not related to Y and was
removed from X.
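
The idea of removing Y-orthogonal variation can be illustrated with a deliberately simplified sketch. It is not the OSC algorithm used by SIMCA-P: it takes the first principal component score of X, makes it orthogonal to y, and deflates X by that single component. All names and numbers are hypothetical, and both X and y are assumed to be mean-centered.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def remove_y_orthogonal_component(X, y, n_iter=200):
    """Remove one component of X whose scores are orthogonal to y.

    Returns (X_filtered, t_orth, fraction of the X sum of squares remaining).
    """
    n, k = len(X), len(X[0])
    # Dominant direction of X by power iteration (first principal component).
    p = [1.0 / math.sqrt(k)] * k
    for _ in range(n_iter):
        t = [dot(row, p) for row in X]
        p = [sum(X[i][j] * t[i] for i in range(n)) for j in range(k)]
        norm = math.sqrt(dot(p, p))
        p = [v / norm for v in p]
    t = [dot(row, p) for row in X]

    # Make the score vector orthogonal to y.
    b = dot(y, t) / dot(y, y)
    t_orth = [ti - b * yi for ti, yi in zip(t, y)]

    # Deflate X by the part described by the orthogonal scores.
    tt = dot(t_orth, t_orth)
    load = [sum(X[i][j] * t_orth[i] for i in range(n)) / tt for j in range(k)]
    X_f = [[X[i][j] - t_orth[i] * load[j] for j in range(k)] for i in range(n)]

    ss_before = sum(v * v for row in X for v in row)
    ss_after = sum(v * v for row in X_f for v in row)
    return X_f, t_orth, ss_after / ss_before

# Hypothetical data: a large baseline shift unrelated to y plus a small signal.
y = [-2.0, -1.0, 0.0, 1.0, 2.0]
X = [
    [4.8, 5.0, 5.2],
    [-5.1, -5.0, -4.9],
    [0.0, 0.0, 0.0],
    [-4.9, -5.0, -5.1],
    [5.2, 5.0, 4.8],
]
X_f, t_orth, remaining = remove_y_orthogonal_component(X, y)
print("remaining sum of squares:", round(100 * remaining, 2), "%")
```

In this toy case almost all of the variation in X is a baseline shift, so very little sum of squares remains, yet the part of X that varies with y is left intact. This mirrors, on a small scale, the situation in the NIR data where most of the X variation was unrelated to the viscosity.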
Click on Next to perform the wavelet compression.

The wavelet window opens. Select the Daubechies 10 wavelet. Select
Variance as compression method and DWT (Discrete Wavelet
Transform), as NIR signals are smooth and DWT is recommended
for low frequency signals. Click on Next.
The wavelet transform is performed, and SIMCA displays a plot of
the percentage of variance explained by the largest coefficients.

We shall keep 50 coefficients (enter 50 in the box); these 50
coefficients explain 99.93% of the variation of the X matrix. Click
on Next.
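
The principle of variance-based wavelet compression can be sketched in Python. Note the assumptions: the simple Haar wavelet is swapped in for the Daubechies 10 wavelet used above, the signal length must be a power of two, and the function names and toy spectra are hypothetical. Each spectrum is transformed, the variance of every coefficient across the spectra is computed, and only the coefficients with the largest variance are kept.

```python
import math

def haar_dwt(signal):
    """Full orthonormal Haar transform; len(signal) must be a power of two."""
    out = list(signal)
    n = len(out)
    while n > 1:
        half = n // 2
        approx = [(out[2 * i] + out[2 * i + 1]) / math.sqrt(2) for i in range(half)]
        detail = [(out[2 * i] - out[2 * i + 1]) / math.sqrt(2) for i in range(half)]
        out[:n] = approx + detail
        n = half
    return out

def haar_idwt(coeffs):
    """Inverse of haar_dwt."""
    out = list(coeffs)
    n = 1
    while n < len(out):
        merged = []
        for a, d in zip(out[:n], out[n:2 * n]):
            merged += [(a + d) / math.sqrt(2), (a - d) / math.sqrt(2)]
        out[:2 * n] = merged
        n *= 2
    return out

def select_by_variance(coeff_rows, keep):
    """Indices of the `keep` coefficients with the largest variance across spectra."""
    n, k = len(coeff_rows), len(coeff_rows[0])
    var = []
    for j in range(k):
        col = [row[j] for row in coeff_rows]
        m = sum(col) / n
        var.append(sum((v - m) ** 2 for v in col) / (n - 1))
    order = sorted(range(k), key=lambda j: var[j], reverse=True)
    kept = sorted(order[:keep])
    explained = sum(var[j] for j in kept) / sum(var)
    return kept, explained

# Hypothetical smooth "spectra" with 8 points each.
spectra = [
    [1.0, 1.1, 1.2, 1.3, 1.3, 1.2, 1.1, 1.0],
    [2.0, 2.1, 2.3, 2.4, 2.4, 2.3, 2.1, 2.0],
    [0.5, 0.6, 0.6, 0.7, 0.7, 0.6, 0.6, 0.5],
]
coeffs = [haar_dwt(s) for s in spectra]
kept, explained = select_by_variance(coeffs, keep=2)
print("kept coefficients:", kept, "explained variance:", round(explained, 4))
```

Because smooth signals concentrate their variation in a few low-frequency coefficients, a small number of coefficients can carry nearly all of the variance, which is why 50 coefficients suffice for the 1201 wavelengths in this example.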


SIMCA-P creates a new project with the OSC and wavelet
compressed data. You can change the default name of the project,
and select a different destination directory.
The test set (excluded observations) is automatically signal
corrected and wavelet compressed, in the same way as the
training set, and made into a prediction set. You can change the
default name of the prediction set and click on Finish.

You are switched to the new project.

Model with the Signal corrected and compressed data


Summary of the preprocessed project
Click on Dataset | Filter Summary to display a summary of the
preprocessing done on the project.


Change the default Scaling


Click on Workset | Edit model 1, select the Scale tab and mark all
the X variables 1 to 50 and change the scaling to Ctr, then exit the
workset window.
Fit the PLS Model
Click on Analysis | Autofit

The first component explains very little of the variation, the second
component is highly significant. Together the two components
explain 94% of Y, cross-validated to 93%. This is an excellent
model.

Scores t2 vs. u2
Display the t2 vs. u2 plot (t1 vs. u1 explained only 11%). This is
now a good relationship.


Loading plot w*2


[Figure: MALYX_OSCW.M1 (PLS), w*[2] vs. variable number (Num); R2X[2] = 0.0786525.]

This plot is reconstructed, by default, from the wavelet domain to
the original domain. It shows again that the information in the
spectra is located around 400, 700 -- 800, and 900 wavelength, as
in the model with the original data.

Observed vs. Predicted


The Observed vs. Predicted plot is greatly improved from the
previous model based on the unfiltered data.

Validating the Model 2


Click on Predictions | Specify Prediction set | Data set and select
your prediction set. Click on Predictions | Y Predicted | Scatter

The predictions for the test set have greatly improved with the
OSC treated data. The RMSEP is now 87.


Conclusion OSC-Wavelets
This example illustrates how Orthogonal signal correction (OSC)
sometimes greatly improves the calibration model when the signal
contains large systematic variation not related to Y, such as
baseline shifts etc. Wavelet compression efficiently compresses
the signal from 1201 variables to 50 coefficients with very little
loss of information.

OPLS (Orthogonal PLS)


Return to the original project (Malyx), click on Analysis / Change
Model Type, and select OPLS.

Autofit gives 8 components compared with 7 for PLS:


[Figure: MALYX.M2 (OPLS), model overview showing R2Y(cum) and Q2(cum); component 1 is predictive (P), components 2 to 8 are orthogonal (O).]

Return to the original PLS model and add one component
(Analysis / next component, or corresponding fast button). Note
that the PLS and OPLS models now have the same R2Y and R2X,
but the OPLS model shows a higher Q2(cum).
The model window (below) now shows the first line as the
Y-related single component, and the following are the Y-orthogonal
ones. The bottom line shows a summary of the model after all
components.

Scores u1 vs. t1
The t/u plot now looks much better than the PLS one, because
OPLS has rotated the solution to put all Y-related variation into the
first component.

Loadings, w1
The first component's PLS weights now look like the spectrum of
the ingredient related to y. This is one of the greatest advantages
of OPLS: it makes the loading interpretable!


Predictions
The prediction set remains the same unless you have changed
something in between. In that case, restore the prediction set to
what it was, and continue.
Under Predictions / Ypred / scatter plot, the following is obtained:

The RMSEP (prediction SD) is now 111.3, precisely the same as
the original PLS model. This shows that usually OPLS.

Conclusions
OSC, Wavelets, and OPLS are tools that have some additional
features beyond ordinary PLS, making these tools useful. OPLS
makes the PLS model easier to interpret: only one Y-related
component, and an interpretable loading plot. Wavelets compress the
spectra with little loss of information, and sometimes, especially in
combination with OSC (OSC-Wavelets), even improve the
predictions somewhat.

Batch Modelling with SIMCA-P+

Introduction
The following example is taken from the article:
J.MacGregor and P.Nomikos, Multivariate SPC Charts for
Monitoring Batch Processes, Technometrics Vol. 37 No. 1 (1995)
41-57
The duration of a batch was 2 hours. During this period, 10
variables were measured every 1.2 minutes, for a total of 100
measurements. A quality variable was measured at the completion
of every batch.
Data were collected on 55 batches.
Batches 40 to 42 and 50 to 55 had their quality variable outside the
specification limits. The quality variable of batches 38, 45, 46 and
49 was on the boundary.

Data
Variables
The following 10 variables were measured at equally spaced
intervals during the evolution of a batch.
x1 to x3: Temperature inside the reactor
x6 and x7: Temperature inside the heating-cooling medium
x4, x8 and x9: Pressure variables
x5 and x10: Flow rates of material added to the reactor.

Objectives
1. Develop a model of the evolution of good batches (the
observation level model), and use it to monitor new
batches as they are evolving, in order to detect problems
as early as possible.
2. Make a model of the whole batch based on the scores of
the observation level model, and use this model to
classify the new batches as good or bad ones.


Analysis Outline
We will use 18 good batches (1800 observations) to model the
evolution of good batches. This is done by fitting a PLS model
relating Y, the relative batch time, to the 10 measured variables.
This observation level model is used to monitor the evolution of
the new batches, batches 30 to 33 (good batches) and 49 to 55 (bad
batches).
We will make a PCA model of the whole batch, with the unfolded
scores of the observation level as X-variables.
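
Unfolding means that the score traces of each batch are laid out side by side so that one batch becomes one row. The Python sketch below shows the idea for 18 batches, 100 time points and 2 score vectors. The function name, the toy values and the column order (all time points of t1, then all time points of t2) are assumptions made for the illustration; SIMCA-P may order the columns differently.

```python
def unfold_scores(batch_ids, scores):
    """Unfold observation-level scores into one row per batch.

    batch_ids[i]: batch of observation i (observations in time order).
    scores[i]:    score values (t1, t2, ...) of observation i.
    Returns {batch_id: [t1 at time 1..J, then t2 at time 1..J, ...]}.
    """
    per_batch = {}
    for bid, row in zip(batch_ids, scores):
        per_batch.setdefault(bid, []).append(row)
    unfolded = {}
    for bid, rows in per_batch.items():
        n_comp = len(rows[0])
        unfolded[bid] = [rows[t][a] for a in range(n_comp) for t in range(len(rows))]
    return unfolded

# Hypothetical scores: 18 batches x 100 time points, 2 components each.
batch_ids = [b for b in range(1, 19) for _ in range(100)]
scores = [[float(i % 100), -float(i % 100)] for i in range(1800)]
unfolded = unfold_scores(batch_ids, scores)
print(len(unfolded), "rows of", len(unfolded[1]), "columns")
```

The 1800 observations of the observation level thus become 18 rows with 200 columns each, which is the X matrix of the batch level PCA model.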

The steps in SIMCA-P are:


Create the observation level project, import the primary data
set with the 18 good batches.

Fit the observation level model, a PLS with Y, the relative
batch time, and X, the 10 measured variables (Analysis menu).

Display the control charts of the training set (Analysis | Batch |
Control Charts menu).

Import the secondary data set with the new batches.

Monitor the evolution of the new batches (Prediction | Batch |
Control Charts menu) and use contribution plots to interpret
the problems seen.

Create the whole Batch project and fit a PCA model to the data.

Classify the new batches as good or bad using the distance to
the model (DmodX) and use contribution plots to interpret the
results.

Create the observation level project


Start a new project. The data set name now is NOM18a.xls
Start SIMCA-P and create a new project from FILE | NEW.

The import wizard opens.


Select the radio button SIMCA-P Batch project and click on Next.

The second column, labelled observation names, contains the batch
identifiers.
Both the Batch identifiers and the phase identifiers (when present)
can be located in any variable (column) in the spreadsheet.
Mark this second column and from the combo box (top of column)
select batch identifiers.
In this example you do not need to define phase identifiers, as the
batch process has only one phase.

The following window opens:


Click OK and Next.

The Batch page displays the list of batches in the dataset with the
number of observations in each batch.
The Conditional delete allows you to delete batches with fewer
observations than a selected number.

In this example we do not use the Conditional Delete.
Click on Next to display the project specification page and then
click on Finish.


The following message is displayed:

Click on OK.

Analysis
The workset M1 has been prepared with all the 10 measured
variables specified as X's and the auto generated variable $Time
(relative batch time normalized) specified as Y, and all variables
centered and scaled to unit variance (UV). You are ready to fit the
PLS Batch model.
Click on Autofit.


SIMCA-P takes only 2 components as they explain 85% of X and
the third component explains less than 7%.
The Model window summarises the fit of the model per
component. We have an excellent model with 2 components,
explaining 87% of X and 98% of Y.

Scores Line plot of t1


Click on Scores|Line Plot| t1 to display the first summary variable
t1, summarizing all the 10 variables.


All the 18 batches are within the 2 standard deviation limit.

Loadings p1
Click on Analysis | Loadings | Column | p1.

With batches we are interested in summarizing the X variables, and
the loadings p1 are the weights that indicate the importance of the
original X's for t1.
We can see here that all the variables participate in forming t1, with
the first 3 variables having positive weights while the others have
negative weights.

Batch Control charts (Training set)


Analysis |Batch |Control Charts | Scores
The Batch Control charts show how t1 and t2 vary with time, for
good batches. A new good batch should evolve in the same way
and its trace should be inside the control limits.
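
How such control limits can be derived from the good batches is sketched below. At every time point the average and standard deviation of the score over the training batches are computed, and limits are placed a fixed number of standard deviations from the average. The choice of 3 standard deviations, the function names and the toy traces are assumptions for the illustration, not a statement of SIMCA-P's settings.

```python
import math

def control_limits(traces, n_sigma=3.0):
    """Average trace and control limits from aligned good-batch traces.

    traces: one list per good batch, all with the same time points.
    Returns (average, lower, upper), one value per time point.
    """
    n = len(traces)
    avg, lower, upper = [], [], []
    for values in zip(*traces):
        m = sum(values) / n
        sd = math.sqrt(sum((v - m) ** 2 for v in values) / (n - 1))
        avg.append(m)
        lower.append(m - n_sigma * sd)
        upper.append(m + n_sigma * sd)
    return avg, lower, upper

def out_of_control(trace, lower, upper):
    """Time indices at which a new batch leaves the control limits."""
    return [i for i, v in enumerate(trace) if v < lower[i] or v > upper[i]]

# Hypothetical t1 traces of three good batches at three time points.
good = [[0.0, 1.0, 2.0], [0.2, 1.1, 2.1], [-0.2, 0.9, 1.9]]
avg, lower, upper = control_limits(good)

new_batch = [0.1, 1.6, 2.0]
print("out of control at time indices:", out_of_control(new_batch, lower, upper))
```

Subtracting the average trace and dividing by the standard deviation at each time point gives the same chart in normalized units, with the limits as horizontal lines.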


Use the side arrows to move the stack of displayed batches
forward or backward by one batch. You can also use the property
bar.
Properties page
Use the up arrow to display the control chart of t2.

To display the Control chart in Normalized units, from the Limits
and Averages tab, select Remove the average and normalize the
values and click on Apply.

The plot is displayed in normalized units.

Batch Control Charts DModX, Variables, Hotelling T2 and Observed vs. predicted
The plots of the distance to the model (DmodX), Hotelling T2, and
Observed vs. Predicted time, with their control limits, are also
important monitoring charts for new batches.
Display univariate Batch Control charts when needed.

Monitoring new batches


Import the secondary data set with the new batches
Use the menu File | Import Secondary dataset, and import the file
Alpred.xls as a secondary data set.

Mark the 2nd column as the Batch IDs.

Creating a prediction set with the new batches

Click on Predictions | Specify Predictionset | Dataset | Alpred to
select the Alpred prediction set.

Control Charts for new batches


Predictions | Batch Control Charts | Scores


Click on Predictions | Batch | Control charts | Scores to display the
new batches in the control charts with the control limits derived
from the training set. Use the Properties page to include batches 50
to 55.
Use the Component tab to display the Control chart of t2.

In both of these control charts, batches 50 to 55 are out of the
control limits in the first time period (0 - 15). Batches 50 to 55 are
also out of the control limits in t1, for the last time period (90 to
100) of the polymerisation process.

Contribution plot
Using the Contribution tool, double click in the t1 control chart on
one of the outlying batches, 50 for example, at time point 4.

The Contribution plot clearly displays variable V-4 (pressure) as
being lower than the average trace.


Control Chart of batch 49 and Contribution plot

Batch 49 is slightly out of the control limits around time period 55-60.
The Contribution plot around time point 59 shows variable V-10
slightly lower than in average good batches.


Prediction | Batch Control Charts | DModX

Batches 50 to 55 are clearly out of the control limit for the time
period 0-20.

Contribution plot
The Contribution plot for any of these batches in that time period
shows again variable V4 (pressure) as being lower than in good
batches.

The Control chart of variable 4 (pressure), opened by double
clicking on it, clearly shows the problem with the pressure for
these 5 batches.


Creating and Modelling the batch level project


Select the menu File | Batch |Create batch level project, mark
scores, and the check box Bring secondary dataset to the batch
level.
In the batch level project, each row has the data from one batch
and consists of the unfolded scores, from the observation level
model, which describe the evolution of each batch.
This example has no initial conditions.
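
The unfolding described above can be pictured with a small sketch.
It assumes that the observation level scores of each batch are held
as a list of aligned time points with one value per component. The
function name unfold_scores, the column order (all time points of
t1, then all of t2) and the toy numbers are assumptions made for
illustration and are not taken from SIMCA-P+.

```python
def unfold_scores(batch_scores):
    """Turn {batch: [[t1, t2, ...] per time point]} into one row per batch."""
    rows = {}
    for batch, trajectory in batch_scores.items():
        n_components = len(trajectory[0])
        row = []
        # Assumed column order: all time points of t1, then all of t2, ...
        for a in range(n_components):
            row.extend(point[a] for point in trajectory)
        rows[batch] = row
    return rows


# Toy data: 2 batches, 3 aligned time points, 2 score components.
scores = {
    "B1": [[0.1, 1.0], [0.2, 1.1], [0.3, 1.2]],
    "B2": [[0.0, 0.9], [0.2, 1.0], [0.4, 1.3]],
}
batch_level = unfold_scores(scores)
print(batch_level["B1"])  # [0.1, 0.2, 0.3, 1.0, 1.1, 1.2]
```

Each dictionary entry corresponds to one row of the batch level
data set, so every batch is described by the same number of columns.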

Analysis: Autofit
Click on Analysis | Autofit to fit a PC model. Simca extracts 4
components.

Analysis: Scores
Click on Analysis | Scores | t1 vs t2

The 18 good batches span the space with no outliers.

Analysis | Batch Control Charts | Batch Variable Importance

This plot, by combining the importance of the scores in the batch
level model with the weights w* derived from the observation
level model, displays the overall importance of the measured
variables in the whole batch model. Here we see that all the 10
variables are important (this is to be expected as the 10 measured
variables are highly correlated).

Predicting the quality of the new batches


In the menu Predictions | Specify Predictionset | Dataset select the
data set alpred as a prediction set.
It contains the data for batches 1, 30-33 and 49 to 55, one
observation per batch, with the predicted scores of the observation
level as X-variables.


Predictions: T Predicted

We clearly see that batches 50 to 55 (with the exception of 52) are
outside the Hotelling T2 ellipse and are outliers in the second
dimension.

Predictions: Contribution Scores for batch 51


Using the Contribution tool, double click on batch 51.

Double click on t2-M1:4; the score variable is resolved with
respect to the original variables and displays variable 4
(pressure) as the problem variable.


Predictions: Distance to the Model (DModX)

Batches 50 to 55 have their distance to the model far above the
control limit, and batch 49 is also above the control limit. Clearly
these batches are different from the good ones.

Prediction: Contribution | Distance to the model


Using the Contribution tool, double click on batch 50.
Double click on the score t2-M1:3 and the score variable is
resolved with respect to original variables and displays variable 4
(pressure) as the problem variable.

Conclusion
Modelling the evolution of a representative set of good batches
allowed us to construct control charts to monitor new batches
during their evolution. We detected problems in the evolution of
the bad batches and understood why these batches were outside the
control limits.
The model of the whole batch has allowed us to classify the new
batches as good or bad and understand why these batches had an
inferior quality.


Modelling of a Batch Digester

Introduction
The following example is derived from a batch digester.
Batch digesters are used in the pulp and paper industry to produce
pulp from wood chips.
The batch process has 5 phases: chip, acid, cook, blowback and
blow.
In the chip phase, the wood chips are fed into the digester and
steamed.
In the acid phase, the chips are impregnated with an acid.
They are then cooked at high temperature and pressure during the
cook phase. This is the most important phase, as this is where the
delignification takes place.
In the blowback phase, the pressure is released and thereby
brought back to atmospheric pressure. The temperatures also drop.
Finally, in the blow phase, the pulp is blown out of the digester.
The duration of a batch varies between 8 and 10 hours and, on
average, is around 9.4 hours in the present data set.
27 variables (including the sampling time) were measured every 2
minutes during the batch evolution. Different variables are
meaningful in the different phases.
Data were collected on 52 batches. Of these, thirty good batches
are used to build the training set model.

Data
Variables
The following variables are meaningful in the following phases:
Chip and Acid phase:
State of the acid (2 variables)
State of the vent (2 variables)
State of Steam1 (2 variables)
State of Steam2 (2 variables)
Temperature4
Pressure2

Cook phase:
Pressure1
Steam
Temperature1
Temperature2
Temperature3
Temperature4
Temperature5
Pressure2
Temperature6
Pump
Blowback phase:
Pressure1
Temperature2
Temperature3
Temperature4
Temperature5
Relief valve
Blow1
Blow2
Pressure3
Pressure4
State of Dilution (2 variables)
Dilution flow

Objectives
1. To develop a model of the evolution of good batches (the
observation level model), and use the model to monitor
other batches as they are evolving, in order to detect
problems as early as possible.
2. To make a model of the whole batch based on the scores of
the observation level model, and use this model to
classify other batches as good or bad.

Analysis Outline
We will use 30 good batches to develop the model of the evolution
of good batches.
In the analysis, we will combine the chip and acid phase (they are
not meaningful alone) and delete the blow phase which has no
effect on the quality of the pulp.

We will fit 3 different PLS models relating Y, the relative batch
time, to the measured variables in the 3 relevant phases
(chip+acid, cook and blowback).
These observation level models are used to monitor the evolution
of the other batches, in this example those left out of the training
set.
We will make a PCA model of the good batches at the batch level,
with the unfolded scores of the observation level as X-variables.

The steps in SIMCA-P are:


Create the observation level project, import the primary data
set with the 52 batches, merge phases chip and acid and delete
the blow phase.
In the Workset menu, select the 30 specified good batches and
select the variables relevant in each phase.
Fit the observation level models, one for each phase, by PLS
with Y= relative batch time, and X = the relevant variables in
each phase. (Analysis menu).
Interpret the scores of the cook phase, and display the control
charts of the training set. (Analysis | Batch | Control Charts
menu)

Select the complement of the workset (training set) and save it as
a secondary data set (Menu Prediction/Prediction Set).

Monitor the evolution of the batches left out of the training set
(Prediction | Batch | Control Charts menu) and use
contribution plots to interpret the problems with some of the
batches.
Create the whole Batch project and fit a PCA model to the
data. (Menu File/Create Batch Level Project)
Classify the prediction set batches as good or bad using the
distance to the model (DModX), and use contribution plots to
interpret the results (Menu Prediction).

Create the observation level project


Start SIMCA-P and create a new project from File | New. The present
data set is DIGESTER.DIF.


The import wizard opens.


Select the radio button SIMCA-P Batch project and click on Next.

The second column, labelled observation names, contains the batch
and phase identifiers.
Both the Batch identifiers and the phase identifiers (when present)
can be located in the same variable (column) or in separate
variables in the spreadsheet.


Mark this second column and from the combo box (top of
column) select Batch/Phase identifiers.

The following window opens:


The batch identifiers are sequential numbers from 01 to 52 and the
phases are chip, acid, cook, blbk, and blow. Click OK.
The batch and phase IDs are now in 2 separate variables.
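
As an illustration of what separating the identifiers amounts to,
the sketch below splits a combined observation name into a batch
number and a phase name. The exact layout of the names in
DIGESTER.DIF is not shown here, so the pattern (a two-digit batch
number followed by the phase name) and the function
split_identifier are hypothetical.

```python
import re

PHASES = ("chip", "acid", "cook", "blbk", "blow")
# Hypothetical layout: two-digit batch number followed by the phase name.
PATTERN = re.compile(r"^(\d{2})\s*(" + "|".join(PHASES) + r")$")


def split_identifier(name):
    """Return (batch id, phase) for a combined observation name."""
    match = PATTERN.match(name.strip())
    if match is None:
        raise ValueError(f"unrecognised identifier: {name!r}")
    return match.group(1), match.group(2)


print(split_identifier("01chip"))   # ('01', 'chip')
print(split_identifier("52 blbk"))  # ('52', 'blbk')
```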

Mark the last column with the sampling time variable and from the
drop down menu, select Y Variable (Time or Maturity) and click
on Next.
The Phase page displays the list of phases in the dataset with the
number of observations and batches in each. Under every phase is
the list of variables.
Using the CTRL key mark both the chip and the acid phases and
click on Merge. Mark the Blow phase and click on Delete.


We now have 3 phases left: chip+acid, cook and blbk. Click on Next.
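
The effect of merging chip and acid and deleting blow can be
sketched outside SIMCA-P as a relabelling of observations. The
mapping PHASE_MAP, the function regroup_phases and the toy
observations are assumptions made for this illustration only.

```python
# Mapping from original phase label to the label used in the analysis.
# "blow" is deliberately absent, which is how this sketch deletes it.
PHASE_MAP = {
    "chip": "chip+acid",
    "acid": "chip+acid",
    "cook": "cook",
    "blbk": "blbk",
}


def regroup_phases(observations):
    """Merge chip and acid into one phase and drop unmapped phases."""
    regrouped = []
    for batch, phase, values in observations:
        new_phase = PHASE_MAP.get(phase)
        if new_phase is None:
            continue  # phase not kept (here: blow)
        regrouped.append((batch, new_phase, values))
    return regrouped


# Toy observations: (batch id, phase, measured values); numbers are invented.
observations = [
    ("01", "chip", [1.0]),
    ("01", "acid", [1.1]),
    ("01", "cook", [1.2]),
    ("01", "blbk", [1.3]),
    ("01", "blow", [1.4]),
]
regrouped = regroup_phases(observations)
print(sorted({phase for _, phase, _ in regrouped}))
# ['blbk', 'chip+acid', 'cook']
```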
The Batch page opens listing all the batches with their numbers of
observations. Listed under every batch are the phases included in
the batch. In our example all the batches include all the phases.

The Conditional delete allows you to delete batches or phases or a
selected phase with fewer observations than a selected number.


In this example we do not use the Conditional Delete.


Click on Next to display the project specification page and then
click on Finish.

The following message is displayed and a new variable $Time is
created.

Specify the Workset


MB1 is an umbrella model which has been prepared with 3
unfitted models, one for every phase, and all the measured
variables specified as Xs and the relative sampling time as Y. All
variables are centered and scaled to unit variance (UV).
We need to edit MB1 to include only the relevant variables in each
phase, and select the 30 good batches.

Click on Workset | Edit MB1 and select the Variables Tab.


Select the first 6 variables, click on the Configure Phases
button, and assign them to the first phase.


Continue and assign the variables to the respective phases as
specified in the Variables section. The Variables page should be as
follows:

Note that the Y variable, sampling time, will automatically be
shifted to start at 0 for every phase and normalized for better
alignment. Normalizing the sampling time achieves linear time
warping.
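
A minimal sketch of this shift and normalization is given below. It
assumes that normalizing means dividing the shifted time by the
duration of the phase in that batch, so that every phase runs from
0 to 1; the exact formula used by SIMCA-P+ is not stated here, and
the function relative_time and the time stamps are illustrative.

```python
def relative_time(sampling_times):
    """Shift a phase's sampling times to start at 0 and scale them to end at 1."""
    start = sampling_times[0]
    duration = sampling_times[-1] - start
    if duration <= 0:
        raise ValueError("a phase needs at least two increasing time stamps")
    return [(t - start) / duration for t in sampling_times]


# Two batches whose phase lasts 8 and 10 minutes (2-minute sampling).
short_batch = relative_time([120, 122, 124, 126, 128])
long_batch = relative_time([200, 202, 204, 206, 208, 210])
print(short_batch)     # [0.0, 0.25, 0.5, 0.75, 1.0]
print(long_batch[-1])  # 1.0
```

Both batches now share the same 0 to 1 time axis although their
durations differ, which is the sense in which the warping is linear.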
Click on the Batch page to select the 30 good batches: 1, 4, 6 to
13, 16, 18, 21, 23, 25, 29, 31, 32, 34, 36 to 38, 40, 42, 43, 46 to
49 and 51.
To do this, first press Select All and Exclude. This excludes all
batches. Then use the CTRL key, mark the 30 good batches and
click on Include.
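
The list of good batches can be checked with a short script before
marking them in the dialog. The sketch expands the ranges given
above, confirms that they add up to 30 batches, and lists the
complement, i.e. the batches left out of the training set. The
helper expand is an illustrative name.

```python
def expand(spec):
    """Expand a spec such as '1, 4, 6-13' into a sorted list of batch numbers."""
    batches = set()
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            first, last = (int(x) for x in part.split("-"))
            batches.update(range(first, last + 1))
        else:
            batches.add(int(part))
    return sorted(batches)


GOOD = expand(
    "1, 4, 6-13, 16, 18, 21, 23, 25, 29, 31, 32, 34, "
    "36-38, 40, 42, 43, 46-49, 51"
)
ALL = range(1, 53)  # batches 01 to 52
complement = [b for b in ALL if b not in GOOD]
print(len(GOOD), len(complement))  # 30 22
```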


Click on OK to exit the workset.

Analysis
Fitting All the Class models
Click on Analysis | Autofit All Class Models. The Specify Autofit
window opens; click on OK.


The 3 class models are fitted and they all explain more than 80%
of X.
We will examine the cook phase as it is the most important.

Scores Line plot of t1, t2 and t3


Double click on the cook model to examine its components.

The first three components are the most important, explaining
together 68% of the variation of X; t1 explains 47%, t2 13% and t3
7%.
Click on Scores | Line Plot | t1 to display the first summary
variable t1, summarizing all the variables of the Cook phase.

The 30 batches are all within the 3 sigma limit of t1.
Select t2 from the component combo box in the properties bar.
The 30 batches are within the 3 sigma limit of t2.
Select t3.

The score t3 displays more variability, as all of the batches have
some time points above the 3 sigma limits.

Loadings p1, p2 and p3


With batches we are interested in summarizing the X variables,
and the loadings p1, p2 and p3 are the weights that combine the
original X variables to form t1, t2 and t3.
To interpret the first three scores t1, t2 and t3 (new variables
summarizing all the X variables) we look at the loadings p1, p2
and p3.
Click on Analysis | Loadings | Column plot | p1, and then p2 and
p3.

We can see that t1 consists mainly of the first 5 temperatures and
pressure1.
The second score t2 is primarily pressure1, the steam and
temperature1.


The score t3 is again dominated by the pressures (1 and 2) with
steam, temp1 and temp6.

Batch Control charts (Training set)


Analysis | Batch | Control Charts | Scores
All the batches in the training set are aligned to the same length
with the same time points. Hence we can now, at each time point,
compute the average t1 with its standard deviation.
The Batch Control chart of t1 shows how this summary (the
temperature trace) varies during the evolution of the cook phase.
The green line is the average t1 computed from all good batches.
The red limits are the 3 sigma limits computed from the variation
of t1 around its average of all good batches.
This green line represents the fingerprint of the ideal good batch.
All new good batches should evolve in the same way and should
be inside the red control limits.
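
The construction of the average trace and its limits can be
sketched as follows. The sketch assumes the t1 traces of the good
batches are already aligned to the same time points and uses the
sample standard deviation across batches at each time point; the
function control_limits and the toy traces are illustrative, and
SIMCA-P+ may compute the standard deviation differently.

```python
from statistics import mean, stdev


def control_limits(trajectories, n_sigma=3.0):
    """Return (average, lower, upper) per time point for aligned score traces."""
    average, lower, upper = [], [], []
    for values in zip(*trajectories):  # one tuple of batch values per time point
        m = mean(values)
        s = stdev(values)  # sample standard deviation across batches
        average.append(m)
        lower.append(m - n_sigma * s)
        upper.append(m + n_sigma * s)
    return average, lower, upper


# Toy t1 traces of three good batches at three aligned time points.
good_t1 = [
    [0.0, 1.0, 2.0],
    [0.2, 1.2, 2.2],
    [-0.2, 0.8, 1.8],
]
avg, low, high = control_limits(good_t1)
print([round(x, 2) for x in avg])  # [0.0, 1.0, 2.0]

# A new batch is inside the limits if every point lies between low and high.
new_t1 = [0.1, 1.1, 3.5]
print([lo <= x <= hi for x, lo, hi in zip(new_t1, low, high)])
# [True, True, False]
```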


Individual batches can be included in this control chart; the first
training batch is included by default. More can be included in the
stack of displayed batches by using the Properties menu (after
right click).
Use the side arrows to move the stack of displayed batches
forward or backward by one batch. You can also use the properties
bar to select the batches to display.

Properties page
Right click on the plot and from the pop-up menu open the
Properties page.


Mark all the batches you want to display and move them into the
selected window.
In this case the traces of all the good batches are within the red
control limits.

To display the Control chart in Normalized units, from the Limits
and Averages tab (under Properties), select Remove the average
and Normalize the values, and click on Apply.


The plot is now displayed in normalized units.

In the component tab, select the 2nd component from the combo
box to display the Control Chart of t2.


Note that this plot is not in Normalized units.

Batch Control Charts DModX, Hotelling T2 and Observed vs. Predicted

The plots of the distance to the model (DModX), Hotelling T2, and
Observed vs. Predicted time, with their control limits, are also
important monitoring charts for new batches.
Display univariate Batch Control charts when needed by selecting
the Variable Plot.


Monitoring new batches


Creating the Prediction set: Complement of Workset
Use the menu Prediction | Specify prediction set | Specify.
Remove all batches from the prediction set (the right window),
select all batches from the left window (the Complement batches
of the Training set), move them to the right window and press OK.
From the Prediction menu, save them as a Secondary data set, give
it the name Pred1 and click OK.


Batch Control Chart of the Prediction set


For the cook phase, select Prediction | Batch Control Chart | Scores
and use the Properties page to include all the batches.

In the Control chart of t1 with the average and 3 sigma computed
from the good batches, we can see batch 28 far outside the control
limits.

OOC plot

Right click on the control chart and


This plot displays for every batch the percent of the area outside
the limits relative to the total area inside the limits of the control
chart.

Hence batch 28 has 40% of its area outside the control limits.
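
One way to read this definition is sketched below: the area by
which a batch trace exceeds the limits is divided by the area
enclosed between the limits and expressed as a percentage. Equally
spaced time points are assumed, so areas reduce to sums; the
function ooc_percent and the numbers are illustrative and do not
reproduce the exact SIMCA-P+ calculation.

```python
def ooc_percent(trace, lower, upper):
    """Percent of area outside the limits relative to the area inside them."""
    outside = 0.0
    inside = 0.0
    for x, lo, hi in zip(trace, lower, upper):
        # Distance by which the point exceeds either limit (0 if inside).
        outside += max(0.0, x - hi) + max(0.0, lo - x)
        inside += hi - lo
    return 100.0 * outside / inside


# Toy limits of -1 and +1 at four time points; two points fall below.
lower = [-1.0] * 4
upper = [1.0] * 4
trace = [0.0, -3.0, -2.0, 0.5]
print(ooc_percent(trace, lower, upper))  # 37.5
```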

Group Contribution plot

Display batch 28, mark the time points outside the 3 sigma and
click on the action plot.

The Contribution plot shows pressure1 being 6 standard deviations
lower than the average batch for these time points, and
temperature2 to temperature5 as also being lower than the average
at these time points.
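
The idea of expressing a deviation in standard deviations can be
sketched with a simplified calculation. The sketch compares one
value of a new batch with the values of the good batches at the
same time point. It is not the SIMCA-P+ contribution formula, which
also involves the model weights, and all names and numbers below
are invented for illustration.

```python
from statistics import mean, stdev


def scaled_deviation(good_values, new_value):
    """Deviation of new_value from the good-batch average, in SD units."""
    return (new_value - mean(good_values)) / stdev(good_values)


# Invented values of two variables in five good batches at one time point.
good = {
    "pressure1": [100.0, 102.0, 98.0, 101.0, 99.0],
    "temperature2": [140.0, 141.0, 142.0, 141.0, 141.0],
}
new = {"pressure1": 90.0, "temperature2": 140.0}

deviations = {name: scaled_deviation(good[name], new[name]) for name in good}
for name, value in sorted(deviations.items(), key=lambda kv: kv[1]):
    print(f"{name}: {value:+.1f} SD")
# pressure1: -6.3 SD
# temperature2: -1.4 SD
```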

Variable control chart


Double click on Pressure1 to display the control chart of that
variable.

Prediction | Batch Control Charts | DModX

Batch 28 is clearly out of the control limit for the time period 1 to
2 hrs.

Contribution plot
The Contribution plot for batch 28 in that time period shows that
the problem is also associated with pressure2 and temp6
(correlated with pressure2).


Double click on pressure2 to display the control chart.

Creating and Modelling the batch level project


Select the menu File | Batch | Create batch level project, mark
Scores, mark the check box Bring secondary dataset and select the
prediction set Pred1. Click on Next, select the batch level name
and click OK.
In the batch level project, each row has the data from one batch
and consists of the unfolded score vectors from the observation
level models, which describe the evolution of each batch.
This example has no initial conditions.


Analysis: Autofit
Click on Analysis | Autofit to fit a PC model. Simca extracts 5
components.

Analysis: Scores
Click on Analysis | Scores | t1 vs t2

Batch 6 is slightly out of the Hotelling T2 confidence interval.


Using the Contribution Tool, clicking on batch 6 gives the
contribution plot.


The Contribution plot is coloured by phases, and shows that t1 in
the cook phase, at time 5.2 hours, is lower than the average by 6.5
standard deviations. With the Contribution tool, double click on
this bar to resolve this contribution into the original variables.

Temperature2, around time 5.2 hours, is lower than the average
of the good batches at the same time point.
Displaying the Control chart of temperature2, by double clicking
on it, we can see that temperature2 at time 5.36 hours is equal to
114.9 degrees and is slightly below the control limit. Temperature2
is equal to 141 degrees for the average of the good batches at this
time point.


Analysis | Batch Variable Importance


Considering that the different phases have different variables, one
must display the Batch Variable Importance separately for each
phase.
Select the Cook phase, as it is the most important.

This plot, by combining the importance of the scores in the batch
level model with the weights p derived from the observation level
model, displays the overall importance of the measured variables
for the whole batch model in the cook phase. Here we see that the
temperatures, pressure1 and the steam dominate.

Predicting the quality of the prediction set batches


In the menu Predictions | Specify, select both the training set and
the prediction set batches in Pred1.


Predictions: T Predicted

Select t1 vs. t3. Batches 28 and 26 are outside the Hotelling T2
ellipse.

Predictions: Contribution Scores for batch 28.


Using the Contribution tool, double click on batch 28.

What is causing batch 28 to be an outlier? The problem clearly is
the cook phase. Double click on one of the scores with large
deviations, for example t1 at time 1.1 hours, to resolve the
contribution into original variables.


The resolved contribution plot shows pressure1 as being much
lower than average.
The Control chart of pressure1 confirms this and shows the
problem with batch 28.


Predictions: Distance to the Model (DModX)

Batches 28, 26, 33, 50 and 52 have the largest DModXPS.

Contribution Plot
Double click with the contribution tool on batch 33 to display the
contribution plot

The problem seems to be in t2 of the cook phase around time 0.4
hours (the beginning of that phase) and also in the chip+acid
phase.
Double clicking on a large score in the chip+acid phase, we see
that the problem was with the steam state.


The resolved contribution for the large t2 in the cook phase shows
both the pressure1 and the steam lower than average, probably due
to the problem with the steam state.

The Control charts for batch 33 of both pressure1 and steam
confirm this.


Conclusion
Modelling the evolution of a representative set of good batches
allowed us to construct control charts to monitor new batches
during their evolution. We detected problems in the evolution of
the bad batches and understood why these batches were outside the
control limits.
The model of the whole batch has allowed us to classify the new
batches as good or bad and understand why these batches had an
inferior quality.
