Please read carefully. For more detailed instructions, with figures, see the document in
the \Program Files\ParLeS\Help directory of your computer.
Example of format for calibration file, containing labels, a single response variable
(OC) and NIR spectra (700-2500nm):
Note: you can have more than 1 response variable in your files. You will be asked to
select the y-variable you want to model or test in the appropriate sections of the
software.
If you have more than one response variable, place them in the second, third, etc.
columns, after the sample labels and before the predictor variables (i.e. before the
X-data).
Example of format for calibration file, containing labels, three response variables (OC,
pH and N) and NIR spectra (700-2500nm):
If you want to test your models with independent test data then your file format
will be as in a. above, i.e. including the response variable data to be used to test the
models. Remember to also include headers as in a. above.
Prepare your data files for import into ParLeS and save them as
tab delimited (ASCII) text files.
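
The layout described above can be sketched in a few lines of Python. This is only an illustration, not part of ParLeS: the function name read_calibration, the header names and the sample values are all hypothetical, and the sketch assumes one header row, sample labels in the first column, then the response variables, then the spectra.

```python
import csv
import io

def read_calibration(text, n_y):
    """Split a tab-delimited table into labels, y columns and spectra.

    Assumed layout (hypothetical): one header row, labels in column 1,
    n_y response columns next, then the predictor (X) columns.
    """
    rows = list(csv.reader(io.StringIO(text), delimiter="\t"))
    header, body = rows[0], rows[1:]
    y_names = header[1:1 + n_y]
    x_names = header[1 + n_y:]
    labels = [r[0] for r in body]
    y = [[float(v) for v in r[1:1 + n_y]] for r in body]
    x = [[float(v) for v in r[1 + n_y:]] for r in body]
    return labels, y_names, y, x_names, x

# Made-up example: three response variables, three wavelengths.
sample = (
    "label\tOC\tpH\tN\t700\t710\t720\n"
    "s1\t1.2\t5.5\t0.10\t0.31\t0.32\t0.33\n"
    "s2\t2.4\t6.1\t0.21\t0.28\t0.29\t0.30\n"
)
labels, y_names, y, x_names, x = read_calibration(sample, n_y=3)
```

Passing the wrong n_y shifts response columns into the spectra (or the reverse), which is the kind of mistake that shows up as badly plotted spectra.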
(ii) by checking the box labelled 'Check to join files from a directory', you can merge
multiple spectroscopic files in x,y format (e.g. where x is frequency and y is
reflectance) into a single file.
You may then run the program using the 'IMPORT DATA FOR MODELLING' button.
Once the software has run, a sample of the merged spectra will be displayed. This may
take some time depending on the number of files that you have. If the sample spectra do
not appear to plot properly, then an error has occurred and you should check that you
have the correct directory or that you have the correct file extension.
The merged file may be saved by checking the 'SAVE MERGED FILE' control or it
may be further analysed in ParLeS (see below).
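
Conceptually, merging x,y files means lining up the y columns of each file against a shared x axis. The sketch below shows the idea on in-memory text. It is not how ParLeS does it: merge_xy is a hypothetical name, and the requirement that every file shares the same x values is an assumption made for the example.

```python
def merge_xy(files):
    """Merge x,y spectra into one table.

    files: dict mapping a sample name to its text, one 'x<TAB>y' pair per line.
    Returns the shared x axis and a dict of sample name -> list of y values.
    """
    merged = {}
    x_ref = None
    for name, text in sorted(files.items()):
        pairs = [line.split("\t") for line in text.strip().splitlines()]
        xs = [float(p[0]) for p in pairs]
        ys = [float(p[1]) for p in pairs]
        if x_ref is None:
            x_ref = xs                      # first file defines the x axis
        elif xs != x_ref:
            raise ValueError(f"{name}: x axis differs from the first file")
        merged[name] = ys
    return x_ref, merged

# Made-up example with two tiny spectra.
files = {
    "a.txt": "700\t0.31\n710\t0.32\n",
    "b.txt": "700\t0.28\n710\t0.29\n",
}
x_axis, merged = merge_xy(files)
```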
For the prediction of 'unknowns' the file requires a column of zeroes replacing the y
variable. In this case your 'Total number of y variables' will be '1' and the 'Select y
variable for prediction' will also be '1'.
If the file format is incorrect or you have incorrectly identified the total number of y
variables in your data file then you will be able to see this in the sample data windows
and more than likely your spectra will not plot correctly.
Using the drop-down menus you can perform the following transformations and
preprocessing:
o Data transformation - transform diffuse reflectance (R) data to Log(1/R) or
Kubelka-Munk units, K/S = (1-R)^2/(2R). You may also transform from Log(1/R) to R.
o Light scatter and baseline corrections - correct data for light scattering effects,
etc. using Multiplicative Signal Correction (MSC), Standard Normal Variate (SNV),
SNV with quadratic detrending, Wavelet de-trending or SNV with wavelet detrending.
The wavelet de-trending level specifies the number of levels of the wavelet
decomposition, which is approximately (1 - trend level) * log2(Ls), where Ls is the signal
length. When the trend level is zero, the estimated trend is zero and the detrended signal
is identical to the input signal. Wavelet de-trending may be thought of as a form of
baseline correction.
o De-noising/Smoothing - de-noise data using a Median filter, a Savitzky-Golay
filter or Wavelet de-noising. For the Median filter, select the rank to be used in the
filtering. For the Savitzky-Golay filter, first select the number of data points used to
fit the curve and then the order of the polynomial you wish to fit. For the Wavelet
de-noising, select the desired wavelet scale for de-noising. ParLeS uses a Daubechies
wavelet with 4 vanishing moments.
o Differentiation - correct the data for baseline, particle size, etc. using first or
second derivatives together with the desired sampling interval.
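
To make some of these options concrete, the sketch below shows what the Log(1/R) and Kubelka-Munk transformations, SNV and a first derivative compute on a single spectrum. It is not ParLeS code. The function names are hypothetical, the base-10 logarithm is an assumption, and the derivative is a plain finite difference over a chosen gap, which may differ from the method ParLeS uses.

```python
import math
from statistics import mean, stdev

def to_log_inverse_r(r):
    # Log(1/R); the base-10 logarithm is an assumption made here.
    return [math.log10(1.0 / v) for v in r]

def to_kubelka_munk(r):
    # K/S = (1 - R)^2 / (2R)
    return [(1.0 - v) ** 2 / (2.0 * v) for v in r]

def snv(spectrum):
    # Standard Normal Variate: centre and scale each spectrum by its own
    # mean and (sample) standard deviation.
    m, s = mean(spectrum), stdev(spectrum)
    return [(v - m) / s for v in spectrum]

def first_derivative(spectrum, gap=1):
    # Finite difference between points `gap` positions apart
    # (not divided by the wavelength spacing).
    return [spectrum[i + gap] - spectrum[i] for i in range(len(spectrum) - gap)]

# Made-up reflectance values between 0 and 1.
reflectance = [0.20, 0.25, 0.40, 0.50]
absorbance = to_log_inverse_r(reflectance)
km = to_kubelka_munk(reflectance)
corrected = snv(absorbance)
derivative = first_derivative(corrected, gap=1)
```

Reflectance must be strictly positive for both transformations, so values stored as percentages should be divided by 100 first.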
The software also offers a number of methods for pretreating the predictor data. Using
the drop-down menu you can select which data pretreatment (or enhancement) to use
before you move onto the multivariate modelling. The choices include:
- Mean centre,
- Variance scale,
- Mean centre & variance scale
NOTE: it is common practice, although not imperative, to 'Mean Centre' your data
before PCA and PLSR.
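
The three pretreatment choices act on each predictor column (each wavelength) across all samples. A minimal sketch, assuming the usual definitions (subtract the column mean, divide by the column sample standard deviation); the function name pretreat and the data are hypothetical, and ParLeS may differ in detail:

```python
from statistics import mean, stdev

def pretreat(X, centre=True, scale=False):
    """Column-wise pretreatment of a samples-by-wavelengths table.

    centre only       -> 'Mean centre'
    scale only        -> 'Variance scale'
    centre and scale  -> 'Mean centre & variance scale'
    """
    cols = list(zip(*X))
    means = [mean(c) if centre else 0.0 for c in cols]
    sds = [stdev(c) if scale else 1.0 for c in cols]
    return [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in X]

# Made-up data: two samples, two wavelengths.
X = [[1.0, 10.0], [3.0, 30.0]]
centred = pretreat(X, centre=True, scale=False)
autoscaled = pretreat(X, centre=True, scale=True)
```

A column that is constant across all samples has zero standard deviation and cannot be variance scaled; this sketch would raise a ZeroDivisionError in that case.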
Once the particular combination is selected, press the 'RUN SELECTION' button. The
first graph will show your raw data and the graph on the bottom part of the ParLeS
window will show you the combined transformed, preprocessed and pre-treated spectra.
You may investigate the effect of each algorithm separately by selecting it and then
pressing the 'RUN SELECTION' button.
For example, if you have diffuse reflectance data you may choose to transform these to
Log(1/R); correct for light scattering effects using MSC; de-noise your signal using
wavelet de-noising at scale = 2; take the first derivative; and mean centre your data
before you perform PCA or PLSR.
You can save the manipulated data to a file using the 'SAVE MANIPULATED DATA'
control (called 'SAVE PREPROCESSED DATA' in earlier versions of ParLeS). The saved
file will be a tab delimited text file.
Note that in ParLeS version 3.1 you can interact with the scores vs. scores and loadings
vs. loadings plots. Glide your mouse over the data points and click on the point that you
want to identify. The point will change colour and its label will be briefly displayed on
the graph.
The PCA scores and loadings can be saved to tab delimited text files by checking the
'SAVE PCA SCORES & LOADINGS' check box. Two separate dialogues will appear
once you check to save: the first will ask you to give a name for the scores file and the
second will ask you to provide a name for the loadings file.
With large data sets, leave-one-out cross validation may be too computationally
expensive, so you could instead use, for example, leave-ten-out. To do this, type the
number of samples 'n' to leave out. To help you decide, the total number of samples in
your dataset is given in the numeric indicator 'No. Samples'.
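
The saving in computation comes from fitting fewer models: leaving out n samples at a time needs roughly (No. Samples / n) fits instead of one fit per sample. The sketch below generates such splits. It assumes contiguous blocks of samples are left out in turn, which may not be how ParLeS selects them, and the function name is hypothetical.

```python
def leave_n_out_splits(n_samples, n_out):
    """Yield (train, test) index lists, leaving out n_out samples at a time.

    Assumption: segments are contiguous blocks in file order; the last
    block may be smaller than n_out.
    """
    indices = list(range(n_samples))
    for start in range(0, n_samples, n_out):
        test = indices[start:start + n_out]
        train = indices[:start] + indices[start + n_out:]
        yield train, test

# 25 samples, leave-ten-out: three models instead of 25.
splits = list(leave_n_out_splits(25, 10))
```

With n_out=1 the same function gives ordinary leave-one-out.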
To start the cross validation, press the 'RUN X-VAL' button. The progress bar indicates
how much of the data has been cross validated.
The results of the cross validation are displayed in the following graphics:
- the root mean squared error of cross validation (RMSE) vs. the number of factors
- R2 and Q2 statistics vs. the number of factors
- the Akaike Information Criterion (AIC) vs. the number of factors. Note that the AIC
penalises additional factors and so favours parsimonious models.
- the observed vs. cross validation predictions for a selected number of factors, where
the user may select the cross validation predictions to plot using the numeric control
'Select X-Val model to plot'. The fitted line and its equation are also given. For this cross
validated model, various assessment statistics are given: R2, R2adjusted, RMSE, mean
error (ME), the standard deviation of the error (SDE) and the RPD.
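
For readers unfamiliar with these statistics, the sketch below computes RMSE, ME, SDE and RPD using commonly used definitions. These definitions are assumptions, not taken from ParLeS: the error is taken as predicted minus observed, SDE is the sample standard deviation of the errors, and RPD is the standard deviation of the observed values divided by the RMSE. The function name and data are hypothetical.

```python
import math
from statistics import mean, stdev

def assess(observed, predicted):
    """Common prediction-assessment statistics (assumed definitions)."""
    errors = [p - o for o, p in zip(observed, predicted)]
    rmse = math.sqrt(mean([e * e for e in errors]))   # overall inaccuracy
    me = mean(errors)                                 # bias
    sde = stdev(errors)                               # imprecision
    rpd = stdev(observed) / rmse                      # spread relative to error
    return {"RMSE": rmse, "ME": me, "SDE": sde, "RPD": rpd}

# Made-up example: every prediction is 0.5 too high.
stats = assess([1.0, 2.0, 3.0, 4.0], [1.5, 2.5, 3.5, 4.5])
```

In this example the error is pure bias: ME equals RMSE and SDE is zero, which illustrates how the three statistics separate bias from imprecision.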
The cross validation results can be saved by checking the 'SAVE X-VAL RESULTS'
check box. Two separate dialogs will appear once you check to save: the first will ask
you to give a name for the assessment statistics file and the second will ask you to
provide a name for the observed vs. cross validation predictions for the selected number
of factors.
Note: if you do not need to cross-validate, proceed to the PLSR modelling tab.
Once the number of factors to model is selected, run the software using the 'RUN
PLSR MODELLING' button. Results from the PLSR modelling are shown in a number
of graphs:
- Scores vs. scores plot
- Scores vs. y plot
- Regression coefficients (B) vs. wavelength/wavenumber plot
- Spectral loadings (P) and loading weights (W) vs. wavelength/wavenumber plot
- Variable importance for projection (VIP) vs. wavelength/wavenumber plot
- Sorted VIP and wavelength/wavenumber table
- the percentage of variation in the predictor data and in the response data that is
explained by each factor in the PLSR model
Note that in ParLeS version 3.1 you can interact with the scores vs. scores; scores vs. y
plot; regression coefficients vs. wavelength/wavenumber plot and the VIP vs.
wavelength/wavenumber plot. Glide your mouse over the data points and click on the
point that you want to identify. The point will change colour and its label will be briefly
displayed on the graph.
The PLSR model (scores, regression coefficients (b), the intercept (b0), spectral
loadings and loading weights) as well as the VIP results can be saved to tab delimited
text files by checking the 'SAVE SCORES; b, b0, p, w; and VIP' check box. Three
separate dialogues will appear once you check to save: the first will ask you to give a
name for the PLSR scores file; the second will ask you to provide a name for the
regression coefficients file; and the third for the VIP results file.
7. Prediction
To make PLSR predictions, press 'RUN PREDICTIONS'. The predictions use the model
selected in the 'PLSR Model' tab (see 6. above). The program will run, and the results
and assessment statistics will be displayed.
The results from the PLSR predictions are displayed in a number of graphics and
assessed using various statistics:
- a sample of the spectra used for predictions
- the predicted values
- when using a test data set, the residuals (observed - predicted)
- when using a test data set, the observed vs. predicted and the fitted line, also showing
its equation
- the following assessment statistics: R2, R2adjusted, RMSE and confidence intervals,
mean error (ME), the standard deviation of the error (SDE) and the RPD
- a histogram of the predicted values and their descriptive statistics
The predictions can be saved to a file using the 'SAVE PREDICTIONS' check-box.
The results from bagging-PLSR are displayed in a number of graphics and assessed
using various statistics:
- the observed vs. predicted from the bootstraps
- the out-of-bag statistics, which may also be used to evaluate the models
- a plot of the predicted values and their 95% confidence intervals
- the descriptive statistics of the predictions
- the observed vs. predicted and the fitted line, also showing its equation
- the following assessment statistics: R2, R2adjusted, RMSE and confidence intervals,
mean error (ME), the standard deviation of the error (SDE) and the RPD
The bagging-PLSR predictions and confidence intervals can be saved to a file using the
'SAVE BAGGED' check-box.
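
The idea behind bagging can be sketched as follows: the model is refitted many times on bootstrap resamples of the calibration data, and the spread of the resulting predictions gives an interval for each new sample. The sketch is only illustrative. It uses a simple least-squares line as a stand-in model (NOT PLSR), percentile-based 95% intervals (an assumption; ParLeS may compute its intervals differently), and hypothetical names and data.

```python
import random
from statistics import mean

def fit_line(xs, ys):
    """Stand-in model (not PLSR): least-squares line through (xs, ys)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:  # degenerate resample: fall back to a constant model
        return lambda x: my
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return lambda x: my + slope * (x - mx)

def bagged_predictions(xs, ys, new_xs, n_boot=200, seed=1):
    """Return (mean, lower, upper) for each new x from n_boot bootstrap fits."""
    rng = random.Random(seed)
    n = len(xs)
    runs = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]   # sample with replacement
        model = fit_line([xs[i] for i in idx], [ys[i] for i in idx])
        runs.append([model(x) for x in new_xs])
    summary = []
    for column in zip(*runs):                        # one column per new sample
        ordered = sorted(column)
        lo = ordered[int(0.025 * (n_boot - 1))]      # approx. 2.5th percentile
        hi = ordered[int(0.975 * (n_boot - 1))]      # approx. 97.5th percentile
        summary.append((mean(column), lo, hi))
    return summary

# Made-up calibration data and two 'unknowns'.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [2.1, 3.9, 6.2, 8.1, 9.8, 12.2]
summary = bagged_predictions(xs, ys, [2.5, 7.0])
```

In each bootstrap round, the samples that were not drawn are the 'out-of-bag' samples; predicting those with that round's model is what yields out-of-bag statistics.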
Once finished, you can exit ParLeS using the 'EXIT PROGRAM' button.
9. Errors
If the file format is incorrect, the software will not run or will run incorrectly.
You may not use the software for commercial purposes, unless you have obtained
permission, in writing, from Raphael VISCARRA ROSSEL
(r.viscarra-rossel@usyd.edu.au or tel. +61 413 326 457).
If ParLeS is used in research, you agree to cite the following reference: