
What is PLS-DA?

As the Discriminant Analysis ending of its acronym suggests, PLS-DA is a classification method that combines dimension reduction with discriminant analysis. It often gives good results when the categories (also called classes) of the variable $Y$ can be discriminated using the variables in $X$.

This time, we consider a dataset $(X, Y)$ where the single variable $Y$ is categorical, with $k$ classes. $Y$ is then one-hot encoded and represented as a matrix of size $n \times k$:

$$\tilde{Y} = (\tilde{Y}_1 \mid \tilde{Y}_2 \mid \dots \mid \tilde{Y}_k)$$
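As an illustration, here is a minimal NumPy sketch of this one-hot encoding (the helper name `one_hot` and the label vector `y` are hypothetical, not from any particular package):

```python
import numpy as np

def one_hot(y):
    """Encode a length-n categorical vector y as an n x k indicator matrix."""
    classes = np.unique(y)                      # the k class labels, sorted
    Y_tilde = (y[:, None] == classes[None, :])  # n x k boolean indicators
    return Y_tilde.astype(float), classes

y = np.array(["a", "b", "a", "c"])
Y_tilde, classes = one_hot(y)  # Y_tilde has one column per class
```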

The latent variables $t^{(h)}$ and $s^{(h)}$ are determined using the same procedure as for PLS, this time using the one-hot encoded matrix $\tilde{Y}$. In particular, we have $s^{(h)} = \tilde{Y}^{(h)} u^{(h)}$, $\forall h \in \{1, \dots, H\}$.
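As a reminder of that procedure, here is a NIPALS-style sketch of the extraction of one component, under the assumption that the implementation follows the classical PLS2 iteration (normalization conventions vary between variants):

```python
import numpy as np

def pls_component(X, Y_tilde, tol=1e-12, max_iter=500):
    """One PLS component via a NIPALS-style iteration (illustrative sketch).
    Returns X-weights w, X-score t, Y-weights u and Y-score s."""
    s = Y_tilde[:, [0]].copy()         # initialize the Y-score with a column of Y~
    for _ in range(max_iter):
        w = X.T @ s / (s.T @ s)        # X-weights
        w /= np.linalg.norm(w)
        t = X @ w                      # latent variable t = X w
        u = Y_tilde.T @ t / (t.T @ t)  # Y-weights
        s_new = Y_tilde @ u            # latent variable s = Y~ u
        if np.linalg.norm(s_new - s) < tol:
            return w, t, u, s_new
        s = s_new
    return w, t, u, s

# Between components, X is deflated: X <- X - t c_h^T with c_h = X^T t / (t^T t).
```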

How to make predictions?

The $H$-component predictions are given by $\hat{\tilde{Y}}_{new} = X_{new} U (C^T U)^{-1} B$ with:

  • $U$ the $p \times H$ matrix of component weights in $X$

  • $C$ the $p \times H$ matrix of coefficients of $X$ on $T$

  • $B$ the $H \times k$ matrix of coefficients of $T$ on $\tilde{Y}$

We therefore find the columns of $C$ by the following expression:

$$c_h = \frac{X^{(h)T} t_h}{t_h^T t_h}$$

We also have the expression:

$$B = (T^T T)^{-1} T^T \tilde{Y}$$

Remark

Note that the same expression $\hat{Y}_{new} = X_{new} U (C^T U)^{-1} B$ appears in the PLS method, except that there $Y$ and $\hat{Y}_{new}$ are not one-hot encoded.
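A minimal sketch of this prediction step, assuming the matrices `U`, `C` and `B` have already been estimated:

```python
import numpy as np

def plsda_scores(X_new, U, C, B):
    """Compute Y~_hat_new = X_new U (C^T U)^{-1} B (an n_new x k score matrix)."""
    T_hat = X_new @ U @ np.linalg.inv(C.T @ U)  # predicted components T_hat_new
    return T_hat @ B                            # one score per individual and class
```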

Since the matrix $\hat{\tilde{Y}}_{new}$ is of size $n_{new} \times k$, the classes of $\hat{Y}_{new}$ are then assigned using one of three distance-based rules:

  • the Maximum distance

  • the Centroid distance

  • the Mahalanobis distance

Maximum distance

The maximum distance is the simplest. We start from $\hat{\tilde{Y}}_{new}$, whose values represent scores for each individual and each class. The predicted class is then the one with the highest score. In other words, we have:

$$\hat{c}_i = \operatorname*{argmax}_{l} \, (\hat{\tilde{Y}}_{new})_{i,l} \quad \forall i \in \{1, \dots, n_{new}\}$$

with $i$ the row index and $l$ the column index of $(\hat{\tilde{Y}}_{new})_{i,l}$.
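In code, this rule is a row-wise argmax over the score matrix (a sketch; `classes` is assumed to hold the class labels in column order):

```python
import numpy as np

def predict_max(Y_tilde_hat, classes):
    """Assign each individual to the class with the highest score."""
    idx = np.argmax(Y_tilde_hat, axis=1)  # index l of the best-scoring column
    return classes[idx]
```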

Centroid and Mahalanobis distances

The centroid and Mahalanobis distances are based on the fitted latent variables $t^{(h)}$ and on the predicted latent variables $\hat{t}^{(1)}_{new} = X^{(1)}_{new} u^{(1)}$ through $\hat{t}^{(H)}_{new} = X^{(H)}_{new} u^{(H)}$.

Concretely, this involves calculating a distance (Euclidean or Mahalanobis) between the class centers of gravity of $t^{(h)}$ and the predicted latent variables. The class for which the distance is smallest is then selected.

Let $T_c$ denote the submatrix of $T$ containing only the individuals of class $c$, and let $\hat{T} = \hat{T}_{new} = X_{new} U (C^T U)^{-1}$ denote the predicted component matrix.

Centroid distance

  • we therefore calculate $T_c$ for each class $c$
  • we then calculate the centroid vector $G_c = \frac{1}{n_c} \mathbb{1}_{n_c}^T T_c = \left( \frac{1}{n_c} \; \frac{1}{n_c} \; \cdots \; \frac{1}{n_c} \right) T_c$, with $n_c$ the number of individuals in class $c$
  • we form the matrix $G$ whose rows are the vectors $G_c$
  • for each row vector of $\hat{T}_{new}$, we calculate the Euclidean distance to each of the row vectors of $G$, and finally assign the class $c$ for which the distance is smallest (sketched in code after the formula below). In other words:

$$\hat{Y}_i = \operatorname*{argmin}_{1 \leq c \leq k} \operatorname{dist}(\hat{T}_i, G_c) = \operatorname*{argmin}_{1 \leq c \leq k} \sqrt{\sum_{h=1}^{H} (\hat{T}_{i,h} - G_{c,h})^2}$$

$\forall i \in \{1, \dots, n_{new}\}$
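A sketch of this rule, assuming `T` and the label vector `y` come from the fitting step and `T_hat` from the prediction step:

```python
import numpy as np

def predict_centroid(T, y, T_hat, classes):
    """Assign each row of T_hat to the class with the nearest centroid
    (Euclidean distance)."""
    G = np.stack([T[y == c].mean(axis=0) for c in classes])        # k x H centroids
    d = np.linalg.norm(T_hat[:, None, :] - G[None, :, :], axis=2)  # n_new x k distances
    return classes[np.argmin(d, axis=1)]
```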

Mahalanobis distance

For this distance, we calculate the matrix $G$ as before; then, for each row vector of $\hat{T}$, we calculate the Mahalanobis distance to each of the row vectors of $G$.

$$\hat{Y}_i = \operatorname*{argmin}_{1 \leq c \leq k} \operatorname{dist}(\hat{T}_i, G_c) = \operatorname*{argmin}_{1 \leq c \leq k} \sqrt{(\hat{T}_i - G_c)^T \mathrm{Cov}^{-1} (\hat{T}_i - G_c)}$$

$\forall i \in \{1, \dots, n_{new}\}$, with $\mathrm{Cov}^{-1}$ the inverse of the covariance matrix $\mathrm{Cov}$ applied to the vector $(\hat{T}_i - G_c)$.
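A corresponding sketch for the Mahalanobis rule; here `Cov` is estimated from the fitted components `T`, which is one possible convention (an implementation could also use per-class covariances):

```python
import numpy as np

def predict_mahalanobis(T, y, T_hat, classes):
    """Assign each row of T_hat to the class with the nearest centroid
    in Mahalanobis distance."""
    G = np.stack([T[y == c].mean(axis=0) for c in classes])  # k x H centroids
    cov_inv = np.linalg.inv(np.cov(T, rowvar=False))         # H x H inverse covariance
    diff = T_hat[:, None, :] - G[None, :, :]                 # n_new x k x H differences
    d2 = np.einsum("nch,hj,ncj->nc", diff, cov_inv, diff)    # squared distances
    return classes[np.argmin(d2, axis=1)]
```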