Introduction

This page applies the sPLSDA performance assessment. PLS is a somewhat unusual method: the model gives a different prediction for each number of components retained, and the same holds for sPLSDA. The goal is therefore to choose the number of components that yields the best predictions. To do so, we use two datasets:

  • the first has only five predictor variables, X = (X1, X2, X3, X4, X5), and two classes.

  • the second has forty predictor variables, X = (X1, X2, ..., X40), and three classes. With p > n, this dataset approaches realistic conditions for PLS training.

To load the predefined functions of the sgPLSdevelop package and create these datasets, run the following lines:

library(sgPLSdevelop)

data1 <- data.cl.create(p = 5, list = TRUE) # 2 classes by default
data2 <- data.cl.create(n = 30, p = 40, classes = 3, list = TRUE)

Now it is time to train an sPLSDA model on each of the two datasets.

ncomp.max <- 5
# keepX <- rep(4, ncomp.max)

# First model
X <- data1$X
Y <- as.factor(data1$Y)
model1 <- sPLSda(X,Y, ncomp = ncomp.max)

# Second model
X <- data2$X
Y <- as.factor(data2$Y)
model2 <- sPLSda(X,Y, ncomp = ncomp.max)

Leave-one-out CV

First model

perf.res1 <- perf.sPLSda(model1)

h.best <- perf.res1$h.best
keepX.best <- perf.res1$keepX.best

With leave-one-out CV, perf.sPLSda returns an optimal number of components H = 1, so we suggest keeping 1 component in the first model. The function also suggests selecting 5 variables for each component.
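As a rough illustration of what an "optimal number of components" means, the sketch below picks the number of components with the lowest cross-validated error. It assumes the criterion is a misclassification rate, which may differ from what perf.sPLSda actually uses. The error values are made-up placeholders, not output from sgPLSdevelop, and the names cv_error and h_best are hypothetical.

```r
# Hypothetical cross-validated misclassification rates, one per number of
# components (placeholder values for illustration only, not package output).
cv_error <- c(h1 = 0.20, h2 = 0.25, h3 = 0.22, h4 = 0.30, h5 = 0.28)

# Keep the number of components with the smallest error.
h_best <- unname(which.min(cv_error))
h_best
```

Because which.min returns the first minimum, ties resolve towards the model with fewer components.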

Second model

perf.res2 <- perf.sPLSda(model2)

h.best <- perf.res2$h.best
keepX.best <- perf.res2$keepX.best

With leave-one-out CV, perf.sPLSda returns an optimal number of components H = 1, so we suggest keeping 1 component in the second model. The function also suggests selecting 40 variables for each component.
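Leave-one-out CV can be seen as K-fold CV with K equal to the number of observations. The base R sketch below shows how observations could be split into test folds under each scheme. It is only an illustration: make_folds is a hypothetical helper, not a function of sgPLSdevelop, and the way perf.sPLSda builds its folds internally may differ. The value n = 30 matches the second dataset.

```r
# Hypothetical helper: assign n observations to K test folds of
# (nearly) equal size and return the test indices of each fold.
make_folds <- function(n, K, seed = 1) {
  stopifnot(K >= 2, K <= n)
  set.seed(seed)
  # Repeat the fold labels 1..K up to length n, then shuffle them.
  fold_id <- sample(rep(seq_len(K), length.out = n))
  split(seq_len(n), fold_id)
}

n <- 30
loo_folds  <- make_folds(n, K = n)   # leave-one-out: one observation per fold
ten_folds  <- make_folds(n, K = 10)  # 10-fold
five_folds <- make_folds(n, K = 5)   # 5-fold

# Number of test observations in each fold, per scheme.
lengths(loo_folds)
lengths(ten_folds)
lengths(five_folds)
```

Each observation appears in exactly one test fold, so every observation is predicted once by a model that did not see it during training.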

10-fold CV

First model

perf.res1 <- perf.sPLSda(model1, K = 10)

h.best <- perf.res1$h.best
keepX.best <- perf.res1$keepX.best

With 10-fold CV, perf.sPLSda returns an optimal number of components H = 1, so we suggest keeping 1 component in the first model. The function also suggests selecting 5 variables for each component.

Second model

perf.res2 <- perf.sPLSda(model2, K = 10)

h.best <- perf.res2$h.best
keepX.best <- perf.res2$keepX.best

With 10-fold CV, perf.sPLSda returns an optimal number of components H = 1, so we suggest keeping 1 component in the second model. The function also suggests selecting 40 variables for each component.

5-fold CV

First model

perf.res1 <- perf.sPLSda(model1, K = 5)

h.best <- perf.res1$h.best
keepX.best <- perf.res1$keepX.best

With 5-fold CV, perf.sPLSda returns an optimal number of components H = 1, so we suggest keeping 1 component in the first model. The function also suggests selecting 5 variables for each component.

Second model

perf.res2 <- perf.sPLSda(model2, K = 5)

h.best <- perf.res2$h.best
keepX.best <- perf.res2$keepX.best

With 5-fold CV, perf.sPLSda returns an optimal number of components H = 1, so we suggest keeping 1 component in the second model. The function also suggests selecting 40 variables for each component.