Sparse Group Partial Least Squares Discriminant Analysis (sgPLS-DA)

Function to perform sparse group Partial Least Squares to classify samples (supervised analysis) and select variables.

Usage

sgPLSda(X, Y, ncomp = 2, keepX = rep(ncol(X), ncomp),
        max.iter = 500, tol = 1e-06, ind.block.x,
        alpha.x, upper.lambda = 10 ^ 5)

Arguments

X: numeric matrix of predictors. NAs are allowed.
Y: a factor or a class vector for the discrete outcome.
ncomp: the number of components to include in the model (see Details).
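keepX: numeric vector of length ncomp, the number of variables to keep in \(X\)-loadings. By default all variables are kept in the model. Note that in the Examples section, keepX = c(2, 2, 2) retains two groups of variables on each component, so in practice the value appears to act as a count of groups.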
max.iter: integer, the maximum number of iterations.
tol: a positive real, the tolerance used in the iterative algorithm.
ind.block.x: a vector of integers describing the grouping of the \(X\)-variables (see the example in the Details section).
alpha.x: the mixing parameter (value between 0 and 1) related to the sparsity within group for the \(X\) dataset.
upper.lambda: a large value specifying the upper bound of the interval of lambda values searched to find the value of the tuning parameter (lambda) corresponding to a non-zero group of variables. By default upper.lambda = 10 ^ 5.
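
For orientation, sparse group penalties are commonly written in the form below. This is an assumption about how alpha.x and lambda enter the criterion, not a statement of the package's exact implementation. Here \(u\) is an \(X\)-loading vector, \(u^{(k)}\) is its sub-vector for group \(k\), \(p_k\) is the size of that group, \(K\) is the number of groups, \(\alpha\) plays the role of alpha.x and \(\lambda\) is the tuning parameter:

\[ P(u) = (1 - \alpha)\,\lambda \sum_{k=1}^{K} \sqrt{p_k}\,\lVert u^{(k)} \rVert_2 + \alpha\,\lambda\,\lVert u \rVert_1 \]

Under this form, \(\alpha = 0\) leaves only the group-wise term (whole groups are kept or dropped), while values of \(\alpha\) close to 1 put most of the weight on the element-wise term and so encourage sparsity within the selected groups.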

Details

The sgPLSda function fits sgPLS models with \(1, \ldots,\) ncomp components to the factor or class vector Y. The appropriate indicator (dummy) matrix is created from Y.
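
As a rough illustration of what an indicator (dummy) matrix looks like, the base R sketch below builds one from a small factor. The helper name make_indicator is hypothetical and not part of the package, and the construction used internally by sgPLSda may differ.

```r
# Hypothetical helper (not part of sgPLS): one column per class,
# with a 1 where the sample belongs to that class and 0 elsewhere.
make_indicator <- function(y) {
  y <- as.factor(y)
  # Compare each sample's level code with every level index
  ind <- outer(as.integer(y), seq_len(nlevels(y)), FUN = "==") * 1
  colnames(ind) <- levels(y)
  ind
}

y <- factor(c("a", "b", "a", "c"))
ind.mat <- make_indicator(y)
ind.mat
```

Each row contains exactly one 1, in the column of the class the sample belongs to.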

For example, ind.block.x <- c(3, 10, 15) means that \(X\) is structured into 4 groups: X1 to X3, X4 to X10, X11 to X15, and X16 to X\(p\), where \(p\) is the number of variables in the \(X\) matrix.
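
The grouping rule described here can be sketched in base R as follows. The helper block_membership and the value p = 20 are assumptions made for illustration only; neither comes from the package.

```r
# Hypothetical helper (not part of sgPLS): map each column index of X
# to a group number, given the last index of every group except the final one.
block_membership <- function(ind.block.x, p) {
  # left.open = TRUE makes each boundary index belong to the group it closes
  findInterval(seq_len(p), ind.block.x, left.open = TRUE) + 1L
}

# p = 20 is an arbitrary number of variables chosen for this sketch
groups <- block_membership(c(3, 10, 15), p = 20)
table(groups)  # group sizes: 3, 7, 5, 5
```

Variables 1 to 3 fall in group 1, 4 to 10 in group 2, 11 to 15 in group 3, and 16 to 20 in group 4, which matches the description above.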

Value

sgPLSda returns an object of class "sPLSda", a list that contains the following components:

X: the centered and standardized original predictor matrix.
Y: the centered and standardized indicator response vector or matrix.
ind.mat: the indicator matrix.
ncomp: the number of components included in the model.
keepX: number of \(X\) variables kept in the model on each component.
mat.c: matrix of coefficients to be used internally by predict.
variates: list containing the variates.
loadings: list containing the estimated loadings for the X and Y variates.
names: list containing the names to be used for individuals and variables.
tol: the tolerance used in the iterative algorithm, used for subsequent S3 methods.
max.iter: the maximum number of iterations, used for subsequent S3 methods.
iter: the number of iterations of the algorithm for each component.
ind.block.x: a vector of integers describing the grouping of the X variables.
alpha.x: the mixing parameter related to the sparsity within group for the \(X\) dataset.
upper.lambda: the upper bound of the interval of lambda values searched to find the value of the tuning parameter (lambda) corresponding to a non-zero group of variables.

References

Liquet Benoit, Lafaye de Micheaux Pierre, Hejblum Boris, Thiebaut Rodolphe (2016). A group and Sparse Group Partial Least Square approach applied in Genomics context. Bioinformatics.

On sPLS-DA: Le Cao, K.-A., Boitard, S. and Besse, P. (2011). Sparse PLS Discriminant Analysis: biologically relevant feature selection and graphical displays for multiclass problems. BMC Bioinformatics 12:253.

Author

Benoit Liquet and Pierre Lafaye de Micheaux.

Examples

data(simuData)
X <- simuData$X
Y <- simuData$Y
ind.block.x <- seq(100, 900, 100)
# Move the second group boundary to add some noise in the second group
ind.block.x[2] <- 250
model <- sgPLSda(X, Y, ncomp = 3, ind.block.x = ind.block.x,
                 keepX = c(2, 2, 2), alpha.x = c(0.5, 0.5, 0.99))
result.sgPLSda <- select.sgpls(model)
result.sgPLSda$group.size.X
#>    size comp1 comp2 comp3
#> 1   100     0   100     0
#> 2   150     0     0   101
#> 3    50     0     0     0
#> 4   100   100     0     0
#> 5   100     0     0     0
#> 6   100     0   100     0
#> 7   100     0     0   100
#> 8   100     0     0     0
#> 9   100   100     0     0
#> 10  100     0     0     0
## res <- perf(model, criterion = "all", validation = "loo")
## res$error.rate