Group Partial Least Squares Discriminant Analysis (gPLS-DA)
gPLSda.Rd
Function to perform group Partial Least Squares to classify samples (supervised analysis) and select variables.
Arguments
- X
numeric matrix of predictors. NAs are allowed.
- Y
a factor or a class vector for the discrete outcome.
- ncomp
the number of components to include in the model (see Details).
- keepX
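numeric vector of length ncomp, the number of groups of variables to keep in the \(X\)-loadings on each component (in the example below, keepX = c(2, 2, 2) selects two groups per component). By default all variables are kept in the model.
- max.iter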
integer, the maximum number of iterations.
- tol
a positive real, the tolerance used in the iterative algorithm.
- ind.block.x
a vector of integers describing the grouping of the \(X\)-variables (see the example in the Details section).
Details
The gPLSda function fits gPLS models with \(1, \ldots,\) ncomp components to the factor or class vector Y. The appropriate indicator (dummy) matrix is created.
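As an illustration of what an indicator (dummy) matrix looks like, the sketch below builds one by hand in base R. The class labels are hypothetical, and this is only the conventional construction (one 0/1 column per class), not necessarily the code gPLSda uses internally.

```r
# Hypothetical class labels for four samples
Y <- factor(c("a", "b", "a", "c"))

# One column per class level; entry is 1 when the sample belongs to that class
ind.mat <- sapply(levels(Y), function(lev) as.numeric(Y == lev))

ind.mat
```

Each row contains exactly one 1, so the matrix encodes the same information as the factor Y in a numeric form suitable for a PLS regression.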
ind.block.x <- c(3,10,15) means that \(X\) is structured into 4 groups: X1 to X3, X4 to X10, X11 to X15, and X16 to X\(p\), where \(p\) is the number of variables in the \(X\) matrix.
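The mapping from breakpoints to groups described above can be checked with a short base R sketch. The value p = 20 is a hypothetical number of variables chosen for illustration, and the helper names group and group.sizes are not part of the package.

```r
# Breakpoints as in the text; p = 20 is a hypothetical number of variables
ind.block.x <- c(3, 10, 15)
p <- 20

# Group label for each variable, using intervals (0,3], (3,10], (10,15], (15,p]
group <- cut(seq_len(p), breaks = c(0, ind.block.x, p), labels = FALSE)

# Number of variables falling in each of the four groups
group.sizes <- as.integer(table(group))
```

Each entry of ind.block.x is the index of the last variable in a group, so a vector of k breakpoints defines k + 1 groups.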
Value
gPLSda returns an object of class "gPLSda", a list that contains the following components:
- X
the centered and standardized original predictor matrix.
- Y
the centered and standardized indicator response vector or matrix.
- ind.mat
the indicator matrix.
- ncomp
the number of components included in the model.
- keepX
number of \(X\) variables kept in the model on each component.
- mat.c
matrix of coefficients to be used internally by predict.
- variates
list containing the variates.
- loadings
list containing the estimated loadings for the X and Y variates.
- names
list containing the names to be used for individuals and variables.
- tol
the tolerance used in the iterative algorithm, kept for use by subsequent S3 methods.
- max.iter
the maximum number of iterations, kept for use by subsequent S3 methods.
- iter
the number of iterations of the algorithm for each component.
- ind.block.x
a vector of integers describing the grouping of the X variables.
References
Liquet Benoit, Lafaye de Micheaux Pierre, Hejblum Boris, Thiebaut Rodolphe (2016). A group and Sparse Group Partial Least Square approach applied in Genomics context. Bioinformatics.
On sPLS-DA: Le Cao, K.-A., Boitard, S. and Besse, P. (2011). Sparse PLS Discriminant Analysis: biologically relevant feature selection and graphical displays for multiclass problems. BMC Bioinformatics 12:253.
Examples
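library(sgPLS)  # package providing gPLSda, select.sgpls and simuData

data(simuData)
X <- simuData$X
Y <- simuData$Y
# Breakpoints at 100, 200, ..., 900 split the variables into ten groups of 100
ind.block.x <- seq(100, 900, 100)
# Keep two groups of variables on each of the three components
model <- gPLSda(X, Y, ncomp = 3, ind.block.x = ind.block.x, keepX = c(2, 2, 2))
# Size of each group and number of its variables selected per component
result.gPLSda <- select.sgpls(model)
result.gPLSda$group.size.X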
#>    size comp1 comp2 comp3
#> 1   100     0   100     0
#> 2   100     0     0   100
#> 3   100     0     0     0
#> 4   100   100     0     0
#> 5   100     0     0     0
#> 6   100     0   100     0
#> 7   100     0     0   100
#> 8   100     0     0     0
#> 9   100   100     0     0
#> 10  100     0     0     0
# res <- perf(model, criterion = "all", validation = "loo")
# res$error.rate