gPLS performance
2025-09-03
gPLS_perf.Rmd
Introduction
This page applies the gPLS performance assessment. A gPLS model gives a different prediction for each number of components it retains, so the goal is to choose the number of components that yields the best predictions, and also to select the best number of variables. To do so, we use two datasets:
one dataset with 15 predictor variables (X) and 12 response variables (Y);
one dataset with 200 predictor variables (X) and 100 response variables (Y).
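To make the selection criterion concrete, here is a minimal base-R sketch of M-fold cross-validation. It computes the mean squared error of prediction (MSEP) for each number of components and keeps the number with the smallest average error. The helpers `cv_msep()` and `pcr_fit_predict()` are hypothetical, and the stand-in model is principal component regression, not gPLS. The performance functions of sgPLSdevelop may use a different resampling scheme or criterion.

```r
# Minimal sketch: M-fold cross-validated MSEP for 1..ncomp.max components.
# cv_msep() and pcr_fit_predict() are hypothetical helpers, not sgPLSdevelop functions.
cv_msep <- function(X, Y, ncomp.max, fit_predict, folds = 5, seed = 1) {
  set.seed(seed)
  n <- nrow(X)
  # Randomly assign each observation to one of `folds` folds
  fold.id <- sample(rep(seq_len(folds), length.out = n))
  msep <- matrix(0, nrow = ncomp.max, ncol = ncol(Y))
  for (k in seq_len(folds)) {
    test <- fold.id == k
    for (h in seq_len(ncomp.max)) {
      pred <- fit_predict(X[!test, , drop = FALSE], Y[!test, , drop = FALSE],
                          X[test, , drop = FALSE], h)
      # Accumulate squared prediction errors, one column per response
      msep[h, ] <- msep[h, ] + colSums((Y[test, , drop = FALSE] - pred)^2)
    }
  }
  msep / n  # average over all held-out observations
}

# Stand-in predictor (NOT gPLS): principal component regression on h components
pcr_fit_predict <- function(Xtr, Ytr, Xte, h) {
  pca <- prcomp(Xtr, center = TRUE, scale. = TRUE)
  Ttr <- pca$x[, seq_len(h), drop = FALSE]
  Tte <- predict(pca, Xte)[, seq_len(h), drop = FALSE]
  B <- qr.solve(cbind(1, Ttr), Ytr)  # least-squares fit with an intercept
  cbind(1, Tte) %*% B
}

# Hypothetical data: responses driven by the first three predictors plus noise
set.seed(42)
n <- 40
Xsim <- matrix(rnorm(n * 15), n, 15)
Ysim <- Xsim[, 1:3] %*% matrix(rnorm(3 * 12), 3, 12) +
  matrix(rnorm(n * 12, sd = 0.5), n, 12)
msep <- cv_msep(Xsim, Ysim, ncomp.max = 5, fit_predict = pcr_fit_predict)
best.ncomp <- which.min(rowMeans(msep))  # components with the lowest average MSEP
best.ncomp
```

Rows of `msep` correspond to the number of components and columns to the responses; averaging over responses gives a single criterion per number of components.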
To access the predefined functions of the sgPLSdevelop package and create these datasets, run the following lines:
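library(sgPLSdevelop)
# Simulated data: 15 predictors (X) and 12 responses (Y)
data1 <- data.create(p = 15, q = 12)
# Simulated data: 200 predictors (X) and 100 responses (Y)
data2 <- data.spls.create(p100 = 2, q100 = 1)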
## [1] "First dataset dimensions : 40 x 27"
## [1] "Second dataset dimensions : 100 x 300"
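The printed dimensions appear to read as observations x (X variables + Y variables): 40 x (15 + 12) and 100 x (200 + 100). For readers without sgPLSdevelop, the sketch below builds hypothetical stand-in data of the same shapes in base R, where a few groups of 3 predictors drive the responses. `simulate_grouped_data()` is a hypothetical helper and is not how `data.create()` or `data.spls.create()` generate their data.

```r
# Hypothetical stand-in for the package simulators (NOT data.create()).
simulate_grouped_data <- function(n, p, q, group.size = 3, n.active = 2, seed = 1) {
  set.seed(seed)
  X <- matrix(rnorm(n * p), nrow = n, ncol = p)
  # Only the first `n.active` groups of `group.size` predictors carry signal
  active <- seq_len(n.active * group.size)
  B <- matrix(0, nrow = p, ncol = q)
  B[active, ] <- rnorm(length(active) * q)
  Y <- X %*% B + matrix(rnorm(n * q, sd = 0.5), nrow = n, ncol = q)
  list(X = X, Y = Y)
}

sim1 <- simulate_grouped_data(n = 40, p = 15, q = 12)
sim2 <- simulate_grouped_data(n = 100, p = 200, q = 100)
print(paste("First dataset dimensions :", nrow(sim1$X), "x", ncol(sim1$X) + ncol(sim1$Y)))
# [1] "First dataset dimensions : 40 x 27"
print(paste("Second dataset dimensions :", nrow(sim2$X), "x", ncol(sim2$X) + ncol(sim2$Y)))
# [1] "Second dataset dimensions : 100 x 300"
```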
Next, we train a gPLS model on each dataset.
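ncomp.max <- 2  # maximum number of components
# First model: 15 X variables in 5 groups of 3, 12 Y variables in 4 groups of 3
X <- data1$X
Y <- data1$Y
# keepX / keepY: number of groups kept on each of the 2 components
model1 <- gPLS(X, Y, mode = "regression", ncomp = ncomp.max,
               keepX = c(2, 2), keepY = c(2, 2),
               ind.block.x = seq(3, 12, 3), ind.block.y = seq(3, 9, 3))
# Second model: group structure provided with the simulated data
X <- data2$X
Y <- data2$Y
model2 <- gPLS(X, Y, mode = "regression", ncomp = ncomp.max,
               keepX = c(2, 2), keepY = c(2, 2),
               ind.block.x = data2$ind.block.x, ind.block.y = data2$ind.block.y)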
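Assuming sgPLSdevelop follows the convention of the CRAN sgPLS package, where `ind.block.x` lists the index of the last variable of every group except the final one, `seq(3, 12, 3)` splits the 15 predictors of the first dataset into 5 groups of 3, and `seq(3, 9, 3)` splits the 12 responses into 4 groups of 3. The hypothetical helper `block_membership()` below makes that mapping explicit in base R.

```r
# Hypothetical helper (not part of sgPLSdevelop): map an ind.block vector,
# i.e. the last index of every group except the final one, to one group
# label per variable.
block_membership <- function(ind.block, n.var) {
  ends <- c(ind.block, n.var)
  starts <- c(1, ind.block + 1)
  rep(seq_along(ends), times = ends - starts + 1)
}

gx <- block_membership(seq(3, 12, 3), n.var = 15)  # groups of the 15 X variables
gy <- block_membership(seq(3, 9, 3), n.var = 12)   # groups of the 12 Y variables
table(gx)               # 5 groups of 3 predictors
table(gy)               # 4 groups of 3 responses
split(seq_len(15), gx)  # variable indices in each X group
```

Under the same assumption, `keepX` and `keepY` count groups rather than single variables, so `keepX = c(2, 2)` keeps 2 of the 5 X-groups (6 of the 15 predictors) on each of the two components.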