PLS regression method
2025-08-12
PLS_method.Rmd
## What is PLS regression?

PLS is above all a dimension reduction method comparable to PCA, since it also creates new components, but unlike PCA it is a supervised method. PLS can be adapted to regression (in which case it is referred to as PLS1), to classification (PLS-DA), and can also be multivariate (PLS2).
We then consider the dataset divided into two centered and standardized data matrices $X$ and $Y$, of sizes $n \times p$ and $n \times q$ respectively. The $H$ new components created are denoted $t_1, t_2, \dots, t_H$ for the $X$ dataset and $s_1, s_2, \dots, s_H$ for the $Y$ dataset. They are a linear combination of the variables $X_1, \dots, X_p$ as well as of the variables $Y_1, \dots, Y_q$, respectively.
```{r}
library(sgPLSdevelop)

data <- data.create(n = 50, q = 5)
X <- data$X
Y <- data$Y
```
## How to compute the first component?
Here are the expressions for the first components of $X$ and $Y$:

$$t_1 = X u_1 \qquad \text{and} \qquad s_1 = Y v_1,$$

with $u_1 = (u_{1,1}, \dots, u_{1,p})^\top$ and $v_1 = (v_{1,1}, \dots, v_{1,q})^\top$ the associated weight vectors.

These weights are therefore obtained by maximizing the covariance between $t_1$ and $s_1$, that is, by determining:

$$(u_1, v_1) = \underset{u,\, v}{\arg\max}\ \mathrm{Cov}(X u, Y v)$$

under the constraint $\|u_1\|_2 = \|v_1\|_2 = 1$.

To do this, one must decompose the matrix product $X^\top Y$ into singular values:

$$X^\top Y = U \Delta V^\top.$$

- $\Delta$ is the diagonal matrix containing the singular values of $X^\top Y$, i.e., the square roots of the eigenvalues of $(X^\top Y)^\top (X^\top Y)$.
- $U$ is the matrix containing the eigenvectors of $(X^\top Y)(X^\top Y)^\top$.
- $V$ is the matrix containing the eigenvectors of $(X^\top Y)^\top (X^\top Y)$.

The vectors $u_1$ and $v_1$ that maximize this covariance are respectively the first columns of the matrices $U$ and $V$.
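This SVD step can be sketched with base R only. The data below are simulated standard normal matrices, not the output of the package's `data.create()`, so the dimensions and variable names are illustrative assumptions:

```r
# Sketch: first PLS component via SVD of X^T Y (base R, simulated data)
set.seed(1)
n <- 50; p <- 10; q <- 5
X <- scale(matrix(rnorm(n * p), n, p)) # centered and standardized X
Y <- scale(matrix(rnorm(n * q), n, q)) # centered and standardized Y

M   <- crossprod(X, Y) # X^T Y, a p x q matrix
dec <- svd(M)          # M = U Delta V^T
u1  <- dec$u[, 1]      # first column of U
v1  <- dec$v[, 1]      # first column of V

t1 <- X %*% u1 # first component of X
s1 <- Y %*% v1 # first component of Y
# cov(t1, s1) equals delta_1 / (n - 1), the largest covariance attainable
# with unit-norm weight vectors
```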
## How to compute all components?
From the second component onward, $h$ becomes greater than $1$, and new matrices and new weights are defined; it is therefore necessary to introduce new notations. For $h = 1, \dots, H$, we thus define the deflated matrices $X_{h-1}$ and $Y_{h-1}$ (with $X_0 = X$ and $Y_0 = Y$), as well as new weight vectors $u_h$ and $v_h$, such that the expressions become:

$$t_h = X_{h-1} u_h \qquad \text{and} \qquad s_h = Y_{h-1} v_h.$$

The weights are thus obtained by maximizing the covariance between $t_h$ and $s_h$, that is, by determining:

$$(u_h, v_h) = \underset{u,\, v}{\arg\max}\ \mathrm{Cov}(X_{h-1} u, Y_{h-1} v)$$

with the constraint $\|u_h\|_2 = \|v_h\|_2 = 1$.

To do so, the matrix product $X_{h-1}^\top Y_{h-1}$ is decomposed as:

$$X_{h-1}^\top Y_{h-1} = U_h \Delta_h V_h^\top.$$
### Computation of $X_h$ and $Y_h$
To determine $X_h$ and $Y_h$, we first compute the vectors $c_h$ and $d_h$ using the formulas:

$$c_h = \frac{X_{h-1}^\top t_h}{t_h^\top t_h} \qquad \text{and} \qquad d_h = \frac{Y_{h-1}^\top t_h}{t_h^\top t_h}.$$

The matrices are then deflated in regression mode as:

$$X_h = X_{h-1} - t_h c_h^\top \qquad \text{and} \qquad Y_h = Y_{h-1} - t_h d_h^\top.$$
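As a minimal sketch (base R only, simulated data, regression-mode deflation), the weight extraction and deflation steps translate to:

```r
# Sketch: weight extraction and deflation for h = 1..H (regression mode)
set.seed(1)
n <- 50; p <- 10; q <- 5; H <- 2
X <- scale(matrix(rnorm(n * p), n, p))
Y <- scale(matrix(rnorm(n * q), n, q))

Xh <- X # X_0
Yh <- Y # Y_0
for (h in 1:H) {
  dec <- svd(crossprod(Xh, Yh), nu = 1, nv = 1)  # SVD of X_{h-1}^T Y_{h-1}
  uh  <- dec$u                                   # weight vector u_h
  th  <- Xh %*% uh                               # t_h = X_{h-1} u_h
  ch  <- crossprod(Xh, th) / drop(crossprod(th)) # c_h = X_{h-1}^T t_h / t_h^T t_h
  dh  <- crossprod(Yh, th) / drop(crossprod(th)) # d_h = Y_{h-1}^T t_h / t_h^T t_h
  Xh  <- Xh - th %*% t(ch)                       # X_h = X_{h-1} - t_h c_h^T
  Yh  <- Yh - th %*% t(dh)                       # Y_h = Y_{h-1} - t_h d_h^T
}
# After deflation, X_h and Y_h are orthogonal to the last component t_h
```

The deflation makes each new component orthogonal to the previous ones, since the part of $X_{h-1}$ explained by $t_h$ has been removed.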
Remark:

- In the case of PLS1, since the method is univariate for $Y$, we have $v_h = 1$ and $s_h = Y_{h-1}$.
- In the canonical mode of PLS, the matrix deflation expressions are:

$$X_h = X_{h-1} - t_h c_h^\top \qquad \text{and} \qquad Y_h = Y_{h-1} - s_h e_h^\top, \qquad \text{with } e_h = \frac{Y_{h-1}^\top s_h}{s_h^\top s_h}.$$
These vectors and matrices (except the deflated matrices $X_h$ and $Y_h$) can be found with the `PLS` function.
```{r}
model <- PLS(X, Y, ncomp = 10, mode = "regression")
mat.t <- model$variates$X
mat.s <- model$variates$Y
mat.c <- model$mat.c
mat.d <- model$mat.d
mat.e <- model$mat.e # "NA" printed because of regression mode
```
## How to make predictions?
Predictions are given by:

$$\hat{Y}_{\text{new}} = X_{\text{new}} B_{\text{PLS}}, \qquad B_{\text{PLS}} = U (C^\top U)^{-1} D^\top,$$

with:

→ $U = (u_1, \dots, u_H)$, the matrix of weight vectors
→ $C = (c_1, \dots, c_H)$, the coefficients matrix of $X$ on the components $t_h$
→ $D = (d_1, \dots, d_H)$, the coefficients matrix of $Y$ on the components $t_h$.

We hence find the columns of $\hat{Y}_{\text{new}}$ by the following expression:

$$\hat{Y}_{\text{new}} = X_{\text{new}} U (C^\top U)^{-1} D^\top.$$

It is also possible to make predictions for the new components by calculating:

$$T_{\text{new}} = X_{\text{new}} U (C^\top U)^{-1}.$$

We also have the relation:

$$\hat{Y}_{\text{new}} = T_{\text{new}} D^\top.$$

NB: the new data are scaled according to the attributes (means and standard deviations) of the training set, so the predictions have to be unscaled using these attributes; $X$ and $Y$ are parts of the training set in this context.
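A hedged base-R sketch of these prediction formulas, under the same simulated setting as above (for simplicity the scaled training matrix itself stands in for $X_{\text{new}}$, so the unscaling step is not shown):

```r
# Sketch: PLS prediction from weights U, loadings C and coefficients D
set.seed(1)
n <- 50; p <- 10; q <- 5; H <- 2
X <- scale(matrix(rnorm(n * p), n, p))
Y <- scale(matrix(rnorm(n * q), n, q))

Umat <- matrix(0, p, H); Cmat <- matrix(0, p, H); Dmat <- matrix(0, q, H)
Tmat <- matrix(0, n, H)
Xh <- X; Yh <- Y
for (h in 1:H) {
  dec <- svd(crossprod(Xh, Yh), nu = 1, nv = 1)
  th  <- Xh %*% dec$u
  Umat[, h] <- dec$u                                   # u_h
  Cmat[, h] <- crossprod(Xh, th) / drop(crossprod(th)) # c_h
  Dmat[, h] <- crossprod(Yh, th) / drop(crossprod(th)) # d_h
  Tmat[, h] <- th                                      # t_h
  Xh <- Xh - th %*% t(Cmat[, h])
  Yh <- Yh - th %*% t(Dmat[, h])
}

W.star <- Umat %*% solve(t(Cmat) %*% Umat) # U (C^T U)^{-1}
B.pls  <- W.star %*% t(Dmat)               # B_PLS = U (C^T U)^{-1} D^T
X.new  <- X                                # stand-in for scaled new data
T.new  <- X.new %*% W.star                 # components for the new data
Y.hat  <- X.new %*% B.pls                  # predictions, still on the scaled scale
# T.new recovers the training components Tmat, and Y.hat = T.new %*% t(Dmat)
```

Applied to the training rows, $X U (C^\top U)^{-1}$ reproduces the components $t_1, \dots, t_H$ exactly, which is what makes the closed-form coefficient matrix $B_{\text{PLS}}$ valid for new observations.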