How to correctly use the GPML Matlab code for an actual (non-demo) problem?

I have downloaded the most recent GPML Matlab code and I have read the documentation and run the regression demo
without any problems. However, I am having difficulty understanding how to apply it to a regression problem that I am faced with.

The regression problem is defined as follows:

Let $x_i \in \mathbb{R}^{20}$ be an input vector and $y_i \in \mathbb{R}^{25}$ its corresponding target. The set of $M$ inputs is arranged into a matrix
$X = [x_1, \dots, x_M]$ and their corresponding targets are stored in a matrix $Y = [y_1 - \bar{y}, \dots, y_M - \bar{y}]$, with $\bar{y}$ being the mean target
value in $Y$.

I wish to train a GPR model $\mathcal{G} = \{X, Y, \theta\}$ using the squared exponential function:

$$k(x_i, x_j) = \sigma_f^2 \exp\left(-\frac{(x_i - x_j)^2}{2\ell^2}\right) + \sigma_n^2\,\delta_{ij},$$

where $\delta_{ij}$ equals 1 if $i = j$ and 0 otherwise. The hyperparameters are $\theta = (\sigma_f, \ell, \sigma_n)$, with $\sigma_n$ being the assumed noise level in the training data
and $\ell$ the length-scale.
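
For reference, the GPML toolbox stores all of these hyperparameters on a log scale; here is a minimal sketch of the mapping, assuming the covSEiso/likGauss pair used in the code below (the numeric values are just hypothetical starting points):

% GPML convention: covSEiso takes [log(ell); log(sf)], likGauss takes log(sn),
% where ell is the length-scale, sf the signal standard deviation and sn the
% noise standard deviation. The values below are hypothetical starting points.
ell = 1;  sf = 1;  sn = 0.1;
hyp0.cov = log([ell; sf]);
hyp0.lik = log(sn);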

To train the model, I need to minimise the negative log marginal likelihood with respect to the hyperparameters:

$$-\log p(Y \mid X, \theta) = \tfrac{1}{2}\operatorname{tr}\left(Y^{\top} K^{-1} Y\right) + \tfrac{1}{2}\log|K| + c,$$

where $c$ is a constant and the matrix $K$ is a function of the hyperparameters (see the equation for $k(x_i, x_j)$ above).
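
To connect this expression to the toolbox calls below, here is a hedged sketch of evaluating it directly for a single target column y (an M x 1 vector), assuming the inputs are stored as the rows of X and using hypothetical hyperparameter values; gp.m computes the equivalent quantity (more carefully) when called without test inputs:

% Negative log marginal likelihood for one output dimension, with the
% squared exponential kernel plus noise defined above.
ell = 1;  sf = 1;  sn = 0.1;                                    % assumed hyperparameter values
M  = size(X, 1);                                                % X assumed M x 20, one input per row
sq = bsxfun(@plus, sum(X.^2, 2), sum(X.^2, 2)') - 2*(X*X');     % pairwise squared distances
K  = sf^2 * exp(-sq / (2*ell^2)) + sn^2 * eye(M);
L  = chol(K, 'lower');                                          % Cholesky factorisation for stability
alpha = L' \ (L \ y);                                           % K^{-1} y
nlml  = 0.5*(y'*alpha) + sum(log(diag(L))) + 0.5*M*log(2*pi);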

Based on the demo given on the GPML website, my attempt at implementing this using the GPML Matlab code is below.

covfunc = @covSEiso;                    % squared exponential covariance
likfunc = @likGauss;                    % Gaussian likelihood
sn = 0.1;
hyp.lik = log(sn);                      % (unused; hyp2 below is what actually gets optimised)
hyp2.cov = [0; 0];                      % initial [log(ell); log(sf)]
hyp2.lik = log(0.1);                    % initial log noise standard deviation
% optimise the hyperparameters by minimising the negative log marginal likelihood
hyp2 = minimize(hyp2, @gp, -100, @infExact, [], covfunc, likfunc, X1, Y1(:, n));
exp(hyp2.lik)                           % learned noise standard deviation
% negative log marginal likelihood at the optimised hyperparameters
nlml2 = gp(hyp2, @infExact, [], covfunc, likfunc, X1, Y1(:, n));
% predictive mean and variance at the test inputs
[m, s2] = gp(hyp2, @infExact, [], covfunc, likfunc, X1, Y1(:, n), X2);
Y2r(:, n) = m;

X1 contains the training inputs

X2 contains the test inputs

Y1 contains the training targets

Y2r are the estimates from applying the model

n is the index used to regress each element in the output vector

Given the problem, is this the correct way to train and apply the GPR model? If not, what do I need to change?

regression machine-learning matlab gaussian-process

asked Jun 26 '11 at 18:39 by Josh

2 Answers

The GP does a good job on your problem's training data. However, it's not so great on the test
data. You've probably already run something like the following yourself:

load('../XYdata_01_01_ab.mat');

for N = 1 : 25

% normalize
m = mean(Y1(N,:));
s = std(Y1(N,:));
Y1(N,:) = 1/s * (Y1(N,:) - m);
Y2(N,:) = 1/s * (Y2(N,:) - m);

covfunc = @covSEiso;
ell = 2;
sf = 1;
hyp.cov = [ log(ell); log(sf)];

likfunc = @likGauss;
sn = 1;
hyp.lik = log(sn);

% learn the hyperparameters, then predict on the training inputs
hyp = minimize(hyp, @gp, -100, @infExact, [], covfunc, likfunc, X1', Y1(N,:)');
[m s2] = gp(hyp, @infExact, [], covfunc, likfunc, X1', Y1(N,:)', X1');
figure;
subplot(2,1,1); hold on;
title(['N = ' num2str(N)]);
f = [m+2*sqrt(s2); flipdim(m-2*sqrt(s2),1)];
x = [1:length(m)];
fill([x'; flipdim(x',1)], f, [7 7 7]/8);
plot(Y1(N,:)', 'b');
plot(m, 'r');
mse_train = mse(Y1(N,:)' - m);

% predict on the test inputs
[m s2] = gp(hyp, @infExact, [], covfunc, likfunc, X1', Y1(N,:)', X2');


subplot(2,1,2); hold on;
f = [m+2*sqrt(s2); flipdim(m-2*sqrt(s2),1)];
x = [1:length(m)];
fill([x'; flipdim(x',1)], f, [7 7 7]/8);
plot(Y2(N,:)', 'b');
plot(m, 'r');
mse_test = mse(Y2(N,:)' - m);

disp(sprintf('N = %d -- train = %5.2f test = %5.2f', N, mse_train, mse_test));
end

By tuning the hyperparameters manually instead of using the minimize function, it is possible to
balance the training and test error somewhat, but tuning the method by looking at the test error is
not what you're supposed to do. I think what's happening is heavy overfitting to the three
subjects that generated your training data. No method will do a good job here out of the box,
and how could it? You provide the training data, so the method tries to fit the training data as well as possible
without overfitting. And in fact, it doesn't overfit in the classical sense: it
doesn't overfit to the data, but it overfits to the three training subjects. For example, cross-validating within
the training set would tell us that there's no overfitting. Still, your test set will be explained
poorly.
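
To make the subject-level point concrete, here is a hedged sketch of a leave-one-subject-out check (not code from the original answer), assuming a hypothetical 1 x M vector subj that labels which training subject produced each column of X1 and Y1, and reusing hyp, covfunc, likfunc and N from the code above:

% Train on two of the three subjects and evaluate on the held-out one. Good
% within-subject error but poor held-out-subject error would indicate
% overfitting to subjects rather than to individual samples.
for s = unique(subj)
    tr = (subj ~= s);                    % columns from the other subjects
    te = (subj == s);                    % columns from the held-out subject
    hyp_s = minimize(hyp, @gp, -100, @infExact, [], covfunc, likfunc, X1(:,tr)', Y1(N,tr)');
    [mu s2] = gp(hyp_s, @infExact, [], covfunc, likfunc, X1(:,tr)', Y1(N,tr)', X1(:,te)');
    fprintf('held-out subject %d: mse = %.3f\n', s, mean((Y1(N,te)' - mu).^2));
end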

What you can do is:

1. Get data from more subjects for training. This way your fourth person will be less likely to
look like an "outlier" as it does currently. Also, you have just one sequence of each person,
right? Maybe it would help to record the sequence multiple times.

2. Somehow incorporate prior knowledge about your task that would keep a method from
overfitting to specific subjects. In a GP that could be done via the covariance function (see the
sketch after this list), but it's probably not that easy to do ...

3. If I'm not mistaken, the sequences are in fact time-series. Maybe it would make sense to
exploit the temporal relations, for instance using recurrent neural networks.
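
As a hedged illustration of how point 2 might look in GPML (only an example of composing covariance functions, not a worked-out prior for this task), one can sum two squared exponential terms with different length-scales, e.g. to separate fast subject-specific variation from slower shared structure; all values are hypothetical:

% covSum concatenates the hyperparameters of its parts in order:
% [log(ell1); log(sf1); log(ell2); log(sf2)] for two covSEiso terms.
covfunc2 = {@covSum, {@covSEiso, @covSEiso}};
hypc.cov = log([0.5; 1; 5.0; 1]);    % hypothetical short- and long-range terms
hypc.lik = log(0.1);                 % noise standard deviation for likGauss, as before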

There's most definitely more, but those are the things I can think of right now.

answered Jun 28 '11 at 19:29 by ahans (edited Jun 30 '11 at 23:06)

I am assuming a zero-mean Gaussian process. Since the targets do not have a zero mean, I centre them by
subtracting their mean. You're right about the redundancy; I have removed those two lines. I don't believe the
covariance function is correct given the problem, and I'm not confident about the initialisation of the
hyperparameters. My doubts come from the results: the residuals are practically the same as those
for ridge regression, and my data is known to be highly nonlinear. Josh Jun 29 '11 at 9:36

You're right about the subtraction; it shouldn't hurt in any case. I'd normally add that to the covariance
function, like covfunc = { 'covSum', { 'covSEiso' } }. I don't quite see how this takes care of noisy
data now; it seems the toolbox has changed quite a lot since I last used it, will have a closer look at it later.
ahans Jun 29 '11 at 22:02

What do you know about your problem that makes you think that covSEiso isn't a reasonable choice?
And the ridge regression you use, is that a kernelized one or linear? If you use kernels, it's not that surprising
that you get similar results. ahans Jun 29 '11 at 22:05

Can you provide sample data of your problem? That would make things a bit easier, perhaps with just one
target dimension. ahans Jun 29 '11 at 22:07


I think the problem may be one of model mis-specification. If your targets are angles wrapped
to ±180 degrees, then the "noise process" for your data may be sufficiently non-Gaussian that
the Bayesian evidence is not a good way to optimise the hyper-parameters. For instance,
consider what happens when "noise" causes the signal to wrap around. In that case it may be
wise to perform model selection by minimising the cross-validation error (there is a public-domain
implementation of the Nelder-Mead simplex method here if you don't have the
optimisation toolbox). The cross-validation estimate of performance is not so sensitive to model
mis-specification, as it is a direct estimate of test performance, whereas the marginal likelihood
of the model is the evidence in support of the model given that the model assumptions are
correct. See the discussion starting on page 123 of Rasmussen and Williams' book.
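
As a hedged sketch of that cross-validation route (not code from this answer), one could wrap a k-fold cross-validation error in a function and minimise it with MATLAB's built-in Nelder-Mead optimiser fminsearch; folds is a hypothetical 1 x M vector of fold labels, and the helper cv_mse would live in its own file:

% Pick hyperparameters by minimising cross-validated MSE instead of the evidence.
loghyp0 = [log(1); log(1); log(0.1)];    % initial [log ell; log sf; log sn], hypothetical
loghyp  = fminsearch(@(lh) cv_mse(lh, X1', Y1(N,:)', folds, covfunc, likfunc), loghyp0);

function err = cv_mse(loghyp, X, y, folds, covfunc, likfunc)
% Hypothetical helper (cv_mse.m): mean held-out MSE over folds for fixed hyperparameters.
hyp.cov = loghyp(1:2);  hyp.lik = loghyp(3);
err = 0;  labels = unique(folds);
for f = labels
    tr = (folds ~= f);  te = (folds == f);
    mu = gp(hyp, @infExact, [], covfunc, likfunc, X(tr,:), y(tr), X(te,:));
    err = err + mean((y(te) - mu).^2);
end
err = err / numel(labels);
end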

Another approach would be to re-code the outputs so that a Gaussian noise model is more
appropriate. One thing you could do is some form of unsupervised dimensionality reduction:
there are non-linear relationships between your targets (as there is only a limited number of ways in
which a body can move), so there will be a lower-dimensional manifold that your targets live on,
and it would be better to regress the co-ordinates of that manifold rather than the angles
themselves (there may be fewer targets that way as well).
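
As a hedged, purely linear sketch of that idea (PCA via SVD standing in for a non-linear manifold method), one could regress a few principal-component coordinates of the targets instead of the 25 angles; Y1 is assumed to be (number of targets) x M as in the code above, and k is a hypothetical number of retained components:

k = 5;                                           % hypothetical number of manifold coordinates
Ymean = mean(Y1, 2);
[U, S, V] = svd(bsxfun(@minus, Y1, Ymean), 'econ');
Z1 = U(:,1:k)' * bsxfun(@minus, Y1, Ymean);      % k x M latent coordinates to regress instead of Y1
% ... train one GP per row of Z1, predict latent coordinates Z2r for the test inputs ...
% Y2r = bsxfun(@plus, U(:,1:k) * Z2r, Ymean);    % map predictions back to the original targets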

Also some sort of Procrustes analysis might be a good idea to normalise the differences
between subjects before training the model.
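
If the Statistics Toolbox is available, its procrustes function could be a starting point for that normalisation; a hedged sketch with hypothetical (frames x coordinates) matrices Xref and Xsubj of matching size:

% Align one subject's data to a reference subject; d is the dissimilarity and
% Z the transformed (translated/rotated/scaled) version of Xsubj.
[d, Z] = procrustes(Xref, Xsubj);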

You may find some of the work done by Neil Lawrence on human pose recovery of interest. I
remember seeing a demo of this at a conference a few years ago and was very impressed.

answered Jul 1 '11 at 6:54 by Dikran Marsupial

From my analysis, I have noticed that the discontinuities in the output space cause a number of problems. I
have considered using joint locations rather than joint angles to overcome this problem. By dimensionality
reduction, did you have a particular method in mind? Unlike image-based approaches, I don't see how the
differences in subjects (other than their movement patterns) would affect the training of the model, given that
I am using orientations of IMU sensors which are consistently placed and post-processed to be aligned
between subjects. Josh Jul 2 '11 at 16:19

I have come across Lawrence's paper before. Since only one sequence was mentioned in the paper, it seems
that some form of k-fold CV was performed, in which case the problem becomes almost trivial.
For same-subject mappings of an activity, particularly one that is cyclical, it is typically straightforward to obtain
decent pose estimates. I have compared same-subject and inter-subject mappings, and the difference is very
significant. Unfortunately, research in this area is basically incomparable because everyone uses their own
regression framework, mocap data, error metrics, and input/output structures. Josh Jul 2 '11 at 16:32
