Introduction

This page presents an application of gPLS performance assessment. A gPLS model yields a different set of predictions for each number of components retained in the model. The goal is to choose the number of components in gPLS regression that gives the best possible predictions, and also to select the best number of variables. For that, we will use two datasets:

  • one dataset with 15 X variables and 12 Y variables

  • one dataset with 200 X variables and 100 Y variables

To access the predefined functions of the sgPLSdevelop package and create these datasets, run these lines:

library(sgPLSdevelop)

data1 <- data.create(p = 15, q = 12)
data2 <- data.spls.create(p100 = 2, q100 = 1)
## [1] "First dataset dimensions : 40 x 27"
## [1] "Second dataset dimensions : 100 x 300"

Now, it’s time to fit a gPLS model on each of these datasets.

ncomp.max <- 2

# First model
X <- data1$X
Y <- data1$Y
model1 <- gPLS(X, Y, mode = "regression", ncomp = ncomp.max,
               keepX = c(2, 2), keepY = c(2, 2),
               ind.block.x = seq(3, 12, 3), ind.block.y = seq(3, 9, 3))

# Second model
X <- data2$X
Y <- data2$Y
model2 <- gPLS(X, Y, mode = "regression", ncomp = ncomp.max,
               keepX = c(2, 2), keepY = c(2, 2),
               ind.block.x = data2$ind.block.x, ind.block.y = data2$ind.block.y)
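The ind.block.x argument of the first model can be unpacked with a few lines of base R. This is only a sketch: it assumes that sgPLSdevelop follows the convention documented for the sgPLS package, where each entry of ind.block.x is the index of the last variable of a group and the final group runs up to the last column. The names bounds, group.sizes and group.id are illustrative and are not part of the package.

```r
# Sketch only: assumes the sgPLS convention for ind.block.x
# (each value is the index of the last variable of a group).
p <- 15                          # number of X variables in the first dataset
ind.block.x <- seq(3, 12, 3)     # 3 6 9 12

# Add the implicit start (0) and end (p) to obtain every group boundary
bounds <- c(0, ind.block.x, p)

# Size of each group and group membership of each variable
group.sizes <- diff(bounds)
group.id <- rep(seq_along(group.sizes), times = group.sizes)

print(group.sizes)
print(group.id)
```

Under that assumption, the 15 X variables of the first dataset are split into 5 groups of 3 variables each.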

gPLS performance assessment using MSEP

A good way to assess the performance of such a model is to use the MSEP criterion (mean squared error of prediction). With n observations and q response variables, the MSEP is computed as follows:

$$MSEP = \frac{1}{nq} \sum_{i=1}^{n} \sum_{j=1}^{q} (Y_{i,j} - \hat{Y}_{i,j})^2$$
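The formula above can be sketched in base R as follows. The function msep and the toy matrices are hypothetical and are not part of sgPLSdevelop; the sketch only assumes that the observed and predicted responses are available as two n x q matrices.

```r
# Hypothetical helper, not a function of sgPLSdevelop:
# MSEP between an observed matrix Y and a predicted matrix Y.hat (both n x q).
msep <- function(Y, Y.hat) {
  Y <- as.matrix(Y)
  Y.hat <- as.matrix(Y.hat)
  stopifnot(all(dim(Y) == dim(Y.hat)))
  # The mean over all n * q cells equals 1 / (n * q) times the double sum
  mean((Y - Y.hat)^2)
}

# Toy example with n = 2 observations and q = 2 responses
Y.toy <- matrix(c(1, 2, 3, 4), nrow = 2)
Y.hat.toy <- matrix(c(1.5, 2.5, 2.5, 3.5), nrow = 2)
msep.toy <- msep(Y.toy, Y.hat.toy)
print(msep.toy)  # 0.25: every prediction is off by 0.5
```

To compare numbers of components, such a value would typically be computed once per component count, on observations that were not used to fit the model.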