
What is PLS regression?

PLS is above all a dimension reduction method, similar to PCA, based on the construction of $H$ new components; unlike PCA, however, it is a supervised method. PLS can be adapted to regression (in which case it is referred to as PLS1), to classification (PLS-DA), and can also be multivariate (PLS2).

We then consider the dataset as split into two centered and standardized data matrices $X$ and $Y$, of sizes $n \times p$ and $n \times q$ respectively.
The $H$ new components are denoted $t_1, t_2, \ldots, t_H$ for the $X$ dataset and $s_1, s_2, \ldots, s_H$ for the $Y$ dataset.
They are linear combinations of the variables $X_1, \ldots, X_p$ and of the variables $Y_1, \ldots, Y_q$, respectively.

library(sgPLSdevelop)
# Simulate a dataset with n = 50 observations and q = 5 response variables
data <- data.create(n = 50, q = 5)
X <- data$X
Y <- data$Y

How to compute the first component?

Here are the expressions for the first components of $X$ and $Y$:

$$t_1 = \sum_{j=1}^{p} u_j X_j = Xu$$

$$s_1 = \sum_{j=1}^{q} v_j Y_j = Yv$$

with $u_1, \ldots, u_p$ and $v_1, \ldots, v_q$ the associated weights.

These weights are obtained by maximizing the covariance between $t_1$ and $s_1$, that is, by determining:

$$\operatorname*{argmax}_{u,v} \; \operatorname{Cov}(Xu, Yv)$$

under the constraint $\|u\| = \|v\| = 1$.

To do this, one computes the singular value decomposition of the matrix product $M = X^T Y$:

$$M = U \Delta V^T$$

where:

  • $\Delta$ is the diagonal matrix containing the singular values of $M$, i.e., the square roots of the eigenvalues of $M M^T$.
  • $U$ is the matrix containing the eigenvectors of $M M^T$.
  • $V$ is the matrix containing the eigenvectors of $M^T M$.

The vectors $u$ and $v$ that maximize this covariance are respectively the first columns of the matrices $U$ and $V$.

# SVD of M = X^T Y: the first left and right singular vectors give the weights
svd <- svd(t(X) %*% Y)
u <- svd$u[, 1]
v <- svd$v[, 1]
# First components of X and Y
t <- X %*% u
s <- Y %*% v
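
A quick sanity check on the simulated data above: the weight vectors returned by the SVD have unit norm, and the covariance reached by the first pair of components can be inspected directly.

sum(u^2)  # equals 1: u has unit norm
sum(v^2)  # equals 1: v has unit norm
cov(t, s) # covariance reached by the first pair of components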

How to compute all components?

From the second component onward, $h$ becomes greater than $1$, and new matrices and new weights are defined; it is therefore necessary to introduce new notations.
$\forall h \in \{1, \ldots, H\}$, the expressions become:

$$t_h = \sum_{j=1}^{p} u^{(h)}_j X^{(h)}_j = X^{(h)} u^{(h)}$$

$$s_h = \sum_{j=1}^{q} v^{(h)}_j Y^{(h)}_j = Y^{(h)} v^{(h)}$$

We thus define the matrices $X^{(1)}, \ldots, X^{(H)}$ and $Y^{(1)}, \ldots, Y^{(H)}$, as well as new weight vectors, such that:

  • $X^{(1)} = X$
  • $X^{(h+1)} = X^{(h)} - t_h c^T_h$
  • $u^{(1)} = u$
  • $Y^{(1)} = Y$
  • $Y^{(h+1)} = Y^{(h)} - t_h d^T_h$
  • $v^{(1)} = v$

The weights are thus obtained by maximizing the covariance between $t_h$ and $s_h$, that is, by determining:

$$\operatorname*{argmax}_{u^{(h)},v^{(h)}} \; \operatorname{Cov}(X^{(h)} u^{(h)}, Y^{(h)} v^{(h)})$$

with the constraint $\| u^{(h)} \| = \| v^{(h)} \| = 1$.

To do so, the matrix product is decomposed as:

$$M^{(h)} = X^{(h)T} Y^{(h)} = U^{(h)} \Delta V^{(h)T}$$

Computation of $X^{(h)}$ and $Y^{(h)}$

To determine $X^{(h)}$ and $Y^{(h)}$, we first compute the vectors $c_{h-1}$ and $d_{h-1}$ using the formulas:

$$c_h = \frac{X^{(h)T} t_h}{t^T_h t_h}$$

$$d_h = \frac{Y^{(h)T} t_h}{t^T_h t_h}$$
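
To make the algorithm explicit, here is a minimal hand-made sketch of the full loop in regression mode, alternating the SVD step and the deflation step described above. It reuses the simulated X and Y; the number of components H is an arbitrary choice, and the matrix names (Tmat, Smat, Umat, Vmat, Cmat, Dmat) are introduced only for this illustration.

# Hand-made PLS loop (regression mode), for illustration only
H <- 3                                   # arbitrary number of components
Xh <- scale(X)                           # centered and standardized X^(1)
Yh <- scale(Y)                           # centered and standardized Y^(1)
Tmat <- Smat <- matrix(0, nrow(X), H)    # components t_h and s_h
Umat <- Cmat <- matrix(0, ncol(X), H)    # weights u^(h) and vectors c_h
Vmat <- Dmat <- matrix(0, ncol(Y), H)    # weights v^(h) and vectors d_h
for (h in 1:H) {
  dec <- svd(crossprod(Xh, Yh), nu = 1, nv = 1)     # SVD of M^(h) = X^(h)T Y^(h)
  u <- dec$u                                        # weights u^(h)
  v <- dec$v                                        # weights v^(h)
  t.h <- Xh %*% u                                   # component t_h
  s.h <- Yh %*% v                                   # component s_h
  c.h <- crossprod(Xh, t.h) / drop(crossprod(t.h))  # c_h
  d.h <- crossprod(Yh, t.h) / drop(crossprod(t.h))  # d_h (regression mode)
  Xh <- Xh - tcrossprod(t.h, c.h)                   # deflation of X
  Yh <- Yh - tcrossprod(t.h, d.h)                   # deflation of Y
  Tmat[, h] <- t.h; Smat[, h] <- s.h
  Umat[, h] <- u;   Vmat[, h] <- v
  Cmat[, h] <- c.h; Dmat[, h] <- d.h
}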

Remark:

  • In the case of PLS1, since the method is univariate for $Y$ (i.e., $q = 1$), we have $v = 1$ and $H = 1$.
  • In the canonical mode of PLS, $Y$ is deflated with its own component $s_h$ rather than with $t_h$ (a sketch of the corresponding code is given below), and the deflation expressions become:

$$Y^{(h+1)} = Y^{(h)} - s_h e^T_h$$

$$e_h = \frac{Y^{(h)T} s_h}{s^T_h s_h}$$
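
In the hand-made loop above, switching to canonical mode would only change the deflation of $Y$ (a sketch under the same assumptions):

# Canonical mode: Y is deflated with its own component s_h instead of t_h
e.h <- crossprod(Yh, s.h) / drop(crossprod(s.h))   # e_h
Yh  <- Yh - tcrossprod(s.h, e.h)                   # deflation of Y (canonical mode)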

These vectors and matrices (except the deflated matrices $X^{(h)}$ and $Y^{(h)}$ for $h > 1$) can be obtained with the PLS function.

model <- PLS(X, Y, ncomp = 10, mode = "regression")
mat.t <- model$variates$X   # components t_h
mat.s <- model$variates$Y   # components s_h
mat.c <- model$mat.c        # vectors c_h
mat.d <- model$mat.d        # vectors d_h
mat.e <- model$mat.e        # "NA" printed because of regression mode (e_h only used in canonical mode)
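
As a check, the first component returned by PLS can be compared with the one computed in the hand-made loop above; agreement is expected up to sign, assuming the function uses the same centering and scaling conventions.

# Compare the first X-component of PLS with the hand-made one (up to sign)
cor(mat.t[, 1], Tmat[, 1])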

How to make predictions?

Predictions are given by:

$$\hat{Y}_{new} = X_{new} B' = X_{new} U (C^T U)^{-1} B$$

with:

$U = (u_1 \,|\, u_2 \,|\, \ldots \,|\, u_H)$

$C = (c_1 \,|\, c_2 \,|\, \ldots \,|\, c_H)$, the $p \times H$ matrix of regression coefficients of $X$ on $T$

$B = (T^T T)^{-1} T^T Y$, the $H \times q$ matrix of regression coefficients of $Y$ on $T$.

The columns of $C$ are hence given by the following expression:

$$c_h = \frac{X^{(h)T} t_h}{t^T_h t_h}$$

It is also possible to predict the components for the new observations by computing:

$$\hat{T}_{new} = X_{new} U (C^T U)^{-1}$$

We also have the relation: $\hat{Y}_{new} = \hat{T}_{new} B$.

NB: the new data $X_{new}$ are scaled using the attributes (column means and standard deviations) of $X$, and the predictions $\hat{Y}_{new}$ must then be unscaled using the attributes of $Y$. Here $X$ and $Y$ denote the training set.
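
As an illustration, the prediction formulas can be applied by hand with the matrices Umat, Cmat and Tmat built in the loop above; the new observations X.new are hypothetical (here simply a few training rows), and the scaling attributes are taken to be the column means and standard deviations of the training X and Y.

# Hypothetical new observations (a few training rows, for illustration only)
X.new <- X[1:5, , drop = FALSE]

# Scale the new data with the training attributes of X
X.new.sc <- scale(X.new, center = colMeans(X), scale = apply(X, 2, sd))

# B: regression coefficients of the (scaled) training Y on the components T
B <- solve(crossprod(Tmat)) %*% crossprod(Tmat, scale(Y))

# Predicted components and predictions on the scale of the standardized Y
T.new <- X.new.sc %*% Umat %*% solve(crossprod(Cmat, Umat))
Y.new.sc <- T.new %*% B

# Unscale the predictions with the training attributes of Y
Y.new <- sweep(Y.new.sc, 2, apply(Y, 2, sd), "*")
Y.new <- sweep(Y.new, 2, colMeans(Y), "+")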