PLS-DA method
2025-08-13
PLSDA_method.Rmd
## What is PLS-DA?
As the Discriminant Analysis ending of its acronym indicates, PLS-DA is a classification method combining dimension reduction and discriminant analysis. It often gives good results when the categories (also called classes) of the qualitative variable $Y$ can be discriminated based on the variables in $X$.
This time, we consider a dataset where the only response variable $Y$ is categorical, composed of $K$ classes. $Y$ is then recoded into a one-hot model and represented as a matrix of size $n \times K$.
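As a concrete illustration, here is a minimal one-hot recoding sketch; the helper `one_hot` and the example labels are hypothetical, not taken from the original text:

```python
import numpy as np

def one_hot(y):
    """Recode a categorical vector y (length n) into an n x K indicator matrix."""
    classes = sorted(set(y))              # the K class labels
    Y = np.zeros((len(y), len(classes)))
    for i, label in enumerate(y):
        Y[i, classes.index(label)] = 1.0  # 1 in the column of the individual's class
    return Y, classes

Y, classes = one_hot(["a", "b", "a", "c"])
# Y has one row per individual and one column per class
```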
The latent variables $t_h$ and $u_h$ are determined using the same procedure as for PLS, this time using the matrix $Y$ in one-hot encoding. We therefore have in particular $t_h = X_{h-1} w_h$ and $u_h = Y_{h-1} c_h$.
## How to make predictions?
The $H$-component predictions are given by $\hat{Y} = X_{new} W^{*} C'$ with:

→ $W^{*} = W(P'W)^{-1}$ the matrix of component weights expressed in the original variables of $X$

→ $P$ the matrix of coefficients of $X$ on the components

→ $C$ the matrix of coefficients of $Y$ on the components
We therefore find the columns of $\hat{Y}$ by the following expression:

$$\hat{Y}_{\cdot k} = X_{new}\, W^{*} c_k'$$

where $c_k$ denotes the $k$-th row of $C$. We also have the expression:

$$\hat{Y} = T_{new}\, C' \quad \text{with} \quad T_{new} = X_{new}\, W^{*}$$
We recognize the same expression as in the PLS method, except that there $Y$ and $\hat{Y}$ are not one-hot encoded.
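The prediction formula can be sketched as plain matrix algebra. In practice $W$, $P$ and $C$ come from a fitted PLS model; the random matrices below are stand-ins used only to illustrate the shapes and the computation:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, K, H = 10, 5, 3, 2              # individuals, predictors, classes, components

# Random stand-ins for the quantities a fitted PLS model would provide (assumption)
W = rng.normal(size=(p, H))           # X-weights
P = rng.normal(size=(p, H))           # coefficients of X on the components
C = rng.normal(size=(K, H))           # coefficients of Y on the components

W_star = W @ np.linalg.inv(P.T @ W)   # W* = W (P'W)^{-1}
X_new = rng.normal(size=(n, p))
T_new = X_new @ W_star                # predicted components
Y_hat = T_new @ C.T                   # H-component prediction: one score per class
```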
Since the matrix $\hat{Y}$ is of size $n \times K$ and contains real values rather than 0/1 indicators, the assignment of the classes of $\hat{Y}$ is done using a distance-based decision rule; there are three of them:
- the Maximum distance
- the Centroid distance
- the Mahalanobis distance
### Maximum distance
The maximum distance is the simplest. We start from $\hat{Y}$, whose values represent scores for each individual and for each class. The class to predict is therefore the one with the highest score. In other words, we have:

$$\hat{y}_i = \operatorname*{arg\,max}_{k} \hat{Y}_{ik}$$

with $i$ the row index and $k$ the column index of $\hat{Y}$.
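A minimal sketch of this rule, assuming a small hypothetical score matrix `Y_hat`:

```python
import numpy as np

# Hypothetical score matrix Y_hat (n x K) and class labels
Y_hat = np.array([[0.8, 0.1, 0.1],
                  [0.2, 0.3, 0.5]])
classes = np.array(["a", "b", "c"])

# For each row, predict the class whose column holds the highest score
pred = classes[np.argmax(Y_hat, axis=1)]
# pred -> ['a', 'c']
```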
### Centroid and Mahalanobis distances
The centroid and Mahalanobis distances are based on the latent variables $T$ obtained during training and on the predicted latent variables $t_1^{new}$ to $t_H^{new}$.
Concretely, this involves calculating a Euclidean distance between the center of gravity of each class of $T$ and the predicted latent variables. The class for which the distance is the smallest is then selected.
$T^{(k)}$ denotes the portion of the matrix $T$ containing only the individuals of class $k$. The predicted component matrix is denoted $T_{new}$.
#### Centroid distance
- we therefore calculate $T^{(k)}$ for each class $k$
- we then calculate the centroid vector $G_k = \frac{1}{n_k} \sum_{t_i \in T^{(k)}} t_i$, with $n_k$ the frequency of class $k$
- we find the matrix $G$ composed of the row vectors $G_1, \dots, G_K$
- for each row vector of $T_{new}$, we calculate the Euclidean distance with each of the row vectors of $G$. Finally, we assign the class for which the distance is the smallest. In other words:

$$\hat{y}_i = \operatorname*{arg\,min}_{k} \left\lVert t_i^{new} - G_k \right\rVert$$
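The steps above can be sketched as follows, with small hypothetical matrices standing in for the training components $T$ (with known classes) and the predicted components $T_{new}$ (`T_pred` in the code):

```python
import numpy as np

# Hypothetical training components T (n x H), their classes y,
# and predicted components T_pred for two new individuals
T = np.array([[0.0, 0.0], [0.2, 0.1], [2.0, 2.0], [1.8, 2.2]])
y = np.array(["a", "a", "b", "b"])
T_pred = np.array([[0.1, 0.0], [1.9, 2.1]])

classes = np.array(["a", "b"])
# centroid G_k: mean of the component rows belonging to class k
G = np.vstack([T[y == k].mean(axis=0) for k in classes])

# Euclidean distance of every predicted row to every centroid,
# then assign the class with the smallest distance
d = np.linalg.norm(T_pred[:, None, :] - G[None, :, :], axis=2)
pred = classes[np.argmin(d, axis=1)]
# pred -> ['a', 'b']
```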