gPLS performance

Introduction

This page illustrates how to assess the performance of a gPLS model. As with the gPLS method itself, the predictions depend on the number of components selected in the model. The goal is to choose the best number of components in gPLS regression, both to obtain the best possible predictions and to select the best number of variables. To do so, we use two datasets:

one dataset with 15 $X$ variables and 12 $Y$ variables
one dataset with 200 $X$ variables and 100 $Y$ variables

To access the predefined functions of the sgPLSdevelop package and create these datasets, run the following lines:

library(sgPLSdevelop)

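# First dataset: 15 X variables and 12 Y variables
data1 <- data.create(p = 15, q = 12)
# Second dataset: 200 X variables and 100 Y variables
data2 <- data.spls.create(p100 = 2, q100 = 1)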

## [1] "First dataset dimensions : 40 x 27"

## [1] "Second dataset dimensions : 100 x 300"

Next, we train a gPLS model on each dataset.

ncomp.max <- 2

# First model: keep 2 groups of X and 2 groups of Y on each component
X <- data1$X
Y <- data1$Y
model1 <- gPLS(X, Y, mode = "regression", ncomp = ncomp.max,
               keepX = c(2, 2), keepY = c(2, 2),
               ind.block.x = seq(3, 12, 3), ind.block.y = seq(3, 9, 3))

# Second model: group structure supplied by the simulated dataset
X <- data2$X
Y <- data2$Y
model2 <- gPLS(X, Y, mode = "regression", ncomp = ncomp.max,
               keepX = c(2, 2), keepY = c(2, 2),
               ind.block.x = data2$ind.block.x, ind.block.y = data2$ind.block.y)
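To make the group arguments of the first model concrete: as we read the sgPLS documentation, `ind.block.x` lists the index of the last variable of each group except the final one, so `seq(3, 12, 3)` would split the 15 $X$ variables into 5 groups of 3. The helper `block.labels` below is hypothetical (it is not part of sgPLS or sgPLSdevelop); it is a minimal sketch that expands such a vector into one group label per variable, under that assumption.

```r
# Hypothetical helper (not part of sgPLS/sgPLSdevelop).
# Assumes ind.block holds the index of the last variable of each group
# except the final group, which runs up to variable p.
block.labels <- function(ind.block, p) {
  bounds <- c(0, ind.block, p)   # group boundaries
  sizes <- diff(bounds)          # number of variables in each group
  stopifnot(all(sizes > 0))      # boundaries must be strictly increasing
  rep(seq_along(sizes), times = sizes)
}

groups.x <- block.labels(seq(3, 12, 3), p = 15)  # 5 groups of 3 X variables
groups.y <- block.labels(seq(3, 9, 3), p = 12)   # 4 groups of 3 Y variables
table(groups.x)
table(groups.y)
```

Under this reading, `keepX = c(2, 2)` keeps 2 of the 5 $X$ groups on each component, and `keepY = c(2, 2)` keeps 2 of the 4 $Y$ groups.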

gPLS performance assessment using MSEP

A good way to assess the performance of such a model is the mean squared error of prediction ($MSEP$) criterion, computed as follows:

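$MSEP = \frac{1}{nq} \sum_{i=1}^{n} \sum_{j=1}^{q} (Y_{i,j} - \hat{Y}_{i,j})^2$

where $n$ is the number of observations, $q$ the number of response variables, $Y_{i,j}$ the observed value of response $j$ for observation $i$, and $\hat{Y}_{i,j}$ its prediction by the model.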
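The formula translates directly into base R. The function name `msep` is a hypothetical helper for illustration, not a function from sgPLS or sgPLSdevelop; it only assumes that the observed responses and the predictions are given as two $n \times q$ matrices.

```r
# Hypothetical helper: MSEP between observed responses Y and predictions Y.hat,
# both n x q matrices. Equivalent to mean((Y - Y.hat)^2).
msep <- function(Y, Y.hat) {
  Y <- as.matrix(Y)
  Y.hat <- as.matrix(Y.hat)
  stopifnot(identical(dim(Y), dim(Y.hat)))
  n <- nrow(Y)
  q <- ncol(Y)
  sum((Y - Y.hat)^2) / (n * q)
}

# Toy example with n = 2 observations and q = 2 responses
Y.toy     <- matrix(c(1, 2, 3, 4), nrow = 2)
Y.hat.toy <- matrix(c(1, 2, 3, 6), nrow = 2)
msep(Y.toy, Y.hat.toy)  # squared errors 0, 0, 0, 4, so MSEP = 4 / 4 = 1
```

To compare numbers of components fairly, `Y.hat` should come from observations not used to fit the model (for example via cross-validation), since errors computed on the training data tend to be optimistic.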

2025-09-03
