Sparse Group Sparse Partial Least Squares Discriminant Analysis (sPLS-DA)
sgPLSda.Rd
Function to perform sparse group Partial Least Squares to classify samples (supervised analysis) and select variables.
Arguments
- X
numeric matrix of predictors.
NAs are allowed.
- Y
a factor or a class vector for the discrete outcome.
- ncomp
the number of components to include in the model (see Details).
- keepX
numeric vector of length ncomp, the number of variables to keep in the \(X\)-loadings. By default all variables are kept in the model.
- max.iter
integer, the maximum number of iterations.
- tol
a positive real, the tolerance used in the iterative algorithm.
- ind.block.x
a vector of integers describing the grouping of the \(X\)-variables (see the example in the Details section).
- alpha.x
The mixing parameter (value between 0 and 1) related to the sparsity within group for the \(X\) dataset.
- upper.lambda
a large value specifying the upper bound of the interval of lambda values searched to find the value of the tuning parameter (lambda) corresponding to a non-zero group of variables. By default upper.lambda=10 ^ 5.
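For intuition about alpha.x, the sketch below writes out the usual sparse group lasso form of a penalty on the \(X\)-loading vector \(u\). This is an assumption about the form of the penalty, not something stated on this page: the symbols \(\lambda\), \(K\), \(p_k\) and \(u^{(k)}\) (the sub-vector of \(u\) for group \(k\), which has \(p_k\) variables) are introduced here purely for illustration.

```latex
P(u) = (1 - \alpha)\,\lambda \sum_{k=1}^{K} \sqrt{p_k}\,\lVert u^{(k)} \rVert_2
       + \alpha\,\lambda\,\lVert u \rVert_1
```

Under this assumed form, a value of alpha.x close to 0 tends to keep or drop whole groups, while a value close to 1 behaves more like a plain lasso and allows sparsity within a selected group.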
Details
The sgPLSda function fits sgPLS models with \(1, \ldots ,\) ncomp components to the factor or class vector Y. The appropriate indicator (dummy) matrix is created.
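To illustrate what an indicator (dummy) matrix looks like, here is a minimal base R sketch. It is not the package's internal code; the toy factor and the construction via sapply are assumptions chosen for illustration.

```r
# Toy class vector with three classes (hypothetical data)
Y <- factor(c("a", "b", "a", "c"))

# One column per class level: 1 if the sample belongs to that class, 0 otherwise
ind.mat <- sapply(levels(Y), function(l) as.numeric(Y == l))
ind.mat
```

Each row contains exactly one 1, in the column of the sample's class.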
ind.block.x <- c(3,10,15) means that \(X\) is structured into 4 groups: X1 to X3, X4 to X10, X11 to X15, and X16 to X\(p\), where \(p\) is the number of variables in the \(X\) matrix.
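The grouping rule above can be checked with a short base R sketch. It does not call the package; the value p = 20 is a hypothetical number of columns, and findInterval is used here only to reproduce the rule that each breakpoint closes its group.

```r
ind.block.x <- c(3, 10, 15)
p <- 20  # hypothetical number of variables in X

# left.open = TRUE makes each breakpoint the last member of its group,
# so variable 3 falls in group 1 and variable 4 in group 2
group <- findInterval(seq_len(p), ind.block.x, left.open = TRUE) + 1

# List the variable indices that make up each group
split(seq_len(p), group)
```

With these values the four groups contain variables 1 to 3, 4 to 10, 11 to 15 and 16 to 20.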
Value
sgPLSda returns an object of class "sPLSda", a list that contains the following components:
- X
the centered and standardized original predictor matrix.
- Y
the centered and standardized indicator response vector or matrix.
- ind.mat
the indicator matrix.
- ncomp
the number of components included in the model.
- keepX
number of \(X\) variables kept in the model on each component.
- mat.c
matrix of coefficients to be used internally by predict.
- variates
list containing the variates.
- loadings
list containing the estimated loadings for the X and Y variates.
- names
list containing the names to be used for individuals and variables.
- tol
the tolerance used in the iterative algorithm, used for subsequent S3 methods.
- max.iter
the maximum number of iterations, used for subsequent S3 methods.
- iter
number of iterations of the algorithm for each component.
- ind.block.x
a vector of integers describing the grouping of the X variables.
- alpha.x
The mixing parameter related to the sparsity within group for the \(X\) dataset.
- upper.lambda
The upper bound of the interval of lambda values searched to find the value of the tuning parameter (lambda) corresponding to a non-zero group of variables.
References
Liquet Benoit, Lafaye de Micheaux Pierre, Hejblum Boris, Thiebaut Rodolphe (2016). A group and Sparse Group Partial Least Square approach applied in Genomics context. Bioinformatics.
On sPLS-DA: Le Cao, K.-A., Boitard, S. and Besse, P. (2011). Sparse PLS Discriminant Analysis: biologically relevant feature selection and graphical displays for multiclass problems. BMC Bioinformatics 12:253.
Examples
library(sgPLS)
data(simuData)
X <- simuData$X
Y <- simuData$Y
ind.block.x <- seq(100, 900, 100)
ind.block.x[2] <- 250
# To add some noise in the second group
model <- sgPLSda(X, Y, ncomp = 3, ind.block.x = ind.block.x, keepX = c(2, 2, 2),
                 alpha.x = c(0.5, 0.5, 0.99))
result.sgPLSda <- select.sgpls(model)
result.sgPLSda$group.size.X
#>    size comp1 comp2 comp3
#> 1   100     0   100     0
#> 2   150     0     0   101
#> 3    50     0     0     0
#> 4   100   100     0     0
#> 5   100     0     0     0
#> 6   100     0   100     0
#> 7   100     0     0   100
#> 8   100     0     0     0
#> 9   100   100     0     0
#> 10  100     0     0     0
##perf(model,criterion="all",validation="loo") -> res
##res$error.rate
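As a cross-check on the size column printed above, the group sizes implied by ind.block.x can be recomputed in base R without the package. The value p = 1000 is an assumption inferred from the fact that the reported sizes sum to 1000; it is not read from simuData here.

```r
ind.block.x <- seq(100, 900, 100)
ind.block.x[2] <- 250
p <- 1000  # assumed number of columns in X (the reported sizes sum to 1000)

# Differences between consecutive breakpoints, with 0 and p as outer bounds
group.size <- diff(c(0, ind.block.x, p))
group.size
```

Moving the second breakpoint from 200 to 250 is what makes group 2 larger (150 variables) and group 3 smaller (50 variables) than the other groups.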