Function to perform sparse Partial Least Squares (sPLS). The sPLS approach performs integration and variable selection simultaneously on two data sets in a one-step strategy.

Usage

sPLS(X, Y, ncomp, mode = "regression",
     max.iter = 500, tol = 1e-06, keepX = rep(ncol(X), ncomp),
     keepY = rep(ncol(Y), ncomp), scale = TRUE)

Arguments

X

Numeric matrix of predictors.

Y

Numeric vector of responses, or a numeric matrix for multi-response models.

ncomp

The number of components to include in the model (see Details).

mode

Character string. What type of algorithm to use, (partially) matching one of "regression" or "canonical". See Details.

max.iter

Integer, the maximum number of iterations.

tol

A positive real, the tolerance used in the iterative algorithm.

keepX

Numeric vector of length ncomp, the number of variables to keep in \(X\)-loadings. By default all variables are kept in the model.

keepY

Numeric vector of length ncomp, the number of variables to keep in \(Y\)-loadings. By default all variables are kept in the model.

scale

Logical indicating whether the original data sets should be scaled. Defaults to scale = TRUE.

Details

The sPLS function fits sPLS models with \(1, \ldots,\) ncomp components. Multi-response models are fully supported.

The type of algorithm to use is specified with the mode argument. Two sPLS algorithms are available: sPLS regression ("regression") and sPLS canonical analysis ("canonical") (see References).
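A minimal sketch (not run) illustrating the choice of mode, using small simulated matrices (not from this package's examples); the call signature follows the Usage section above.

```r
library(sgPLS)
set.seed(1)
X <- matrix(rnorm(50 * 10), nrow = 50)
Y <- matrix(rnorm(50 * 4),  nrow = 50)

## Regression mode: Y is modelled as a response to X (asymmetric roles)
fit.reg <- sPLS(X, Y, ncomp = 2, mode = "regression",
                keepX = c(5, 5), keepY = c(2, 2))

## Canonical mode: X and Y play symmetric roles
fit.can <- sPLS(X, Y, ncomp = 2, mode = "canonical",
                keepX = c(5, 5), keepY = c(2, 2))
```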

Value

sPLS returns an object of class "sPLS", a list that contains the following components:

X

The centered and standardized original predictor matrix.

Y

The centered and standardized original response vector or matrix.

ncomp

The number of components included in the model.

mode

The algorithm used to fit the model.

keepX

Number of \(X\) variables kept in the model on each component.

keepY

Number of \(Y\) variables kept in the model on each component.

mat.c

Matrix of coefficients to be used internally by predict.

variates

List containing the variates.

loadings

List containing the estimated loadings for the \(X\) and \(Y\) variates.

names

List containing the names to be used for individuals and variables.

tol

The tolerance used in the iterative algorithm, stored for use by subsequent S3 methods.

max.iter

The maximum number of iterations, stored for use by subsequent S3 methods.
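The components above can be inspected directly on a fitted object. A hypothetical sketch (not run), assuming `fit` was returned by sPLS():

```r
fit$ncomp              # number of components in the model
fit$mode               # "regression" or "canonical"
fit$keepX              # number of X variables kept on each component
head(fit$variates$X)   # X-variates (scores) for the first individuals
fit$loadings$X[, 1]    # sparse X-loading vector of component 1
```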

References

Liquet Benoit, Lafaye de Micheaux Pierre, Hejblum Boris, Thiebaut Rodolphe. A group and Sparse Group Partial Least Square approach applied in Genomics context. Submitted.

Le Cao, K.-A., Martin, P.G.P., Robert-Granié, C. and Besse, P. (2009). Sparse canonical methods for biological data integration: application to a cross-platform study. BMC Bioinformatics 10:34.

Le Cao, K.-A., Rossouw, D., Robert-Granié, C. and Besse, P. (2008). A sparse PLS for variable selection when integrating Omics data. Statistical Applications in Genetics and Molecular Biology 7, article 35.

Shen, H. and Huang, J. Z. (2008). Sparse principal component analysis via regularized low rank matrix approximation. Journal of Multivariate Analysis 99, 1015-1034.

Tenenhaus, M. (1998). La régression PLS: théorie et pratique. Paris: Éditions Technip.

Wold H. (1966). Estimation of principal components and related models by iterative least squares. In: Krishnaiah, P. R. (editors), Multivariate Analysis. Academic Press, N.Y., 391-420.

Author

Benoit Liquet and Pierre Lafaye de Micheaux.

See also

gPLS, sgPLS, predict, perf and functions from mixOmics package: summary, plotIndiv, plotVar, plot3dIndiv, plot3dVar.

Examples


## Simulation of datasets X and Y with group variables
## rmvnorm() below requires the mvtnorm package
library(mvtnorm)
n <- 100
sigma.gamma <- 1
sigma.e <- 1.5
p <- 400
q <- 500
theta.x1 <- c(rep(1, 15), rep(0, 5), rep(-1, 15), rep(0, 5),
      rep(1.5, 15), rep(0, 5), rep(-1.5, 15), rep(0, 325))
theta.x2 <- c(rep(0, 320), rep(1, 15), rep(0, 5), rep(-1, 15),
      rep(0, 5), rep(1.5, 15), rep(0, 5), rep(-1.5, 15),
      rep(0, 5))

theta.y1 <- c(rep(1, 15), rep(0, 5), rep(-1, 15), rep(0, 5), 
      rep(1.5, 15), rep(0, 5), rep(-1.5, 15), rep(0, 425))
theta.y2 <- c(rep(0, 420), rep(1, 15), rep(0, 5), rep(-1, 15),
      rep(0, 5), rep(1.5, 15), rep(0, 5), rep(-1.5, 15),
      rep(0, 5))


Sigmax <- matrix(0, nrow = p, ncol = p)
diag(Sigmax) <- sigma.e ^ 2
Sigmay <- matrix(0, nrow = q, ncol = q)
diag(Sigmay) <- sigma.e ^ 2

set.seed(125)

gam1 <- rnorm(n)
gam2 <- rnorm(n)

X <- matrix(c(gam1, gam2), ncol = 2, byrow = FALSE) %*% matrix(c(theta.x1, theta.x2),
     nrow = 2, byrow = TRUE) + rmvnorm(n, mean = rep(0, p), sigma =
     Sigmax, method = "svd")
Y <- matrix(c(gam1, gam2), ncol = 2, byrow = FALSE) %*% matrix(c(theta.y1, theta.y2),
     nrow = 2, byrow = TRUE) + rmvnorm(n, mean = rep(0, q), sigma =
     Sigmay, method = "svd")


ind.block.x <- seq(20, 380, 20)
ind.block.y <- seq(20, 480, 20)


#### sPLS model
model.sPLS <- sPLS(X, Y, ncomp = 2, mode = "regression", keepX = c(60, 60), 
                     keepY = c(60, 60))
result.sPLS <- select.spls(model.sPLS)
result.sPLS$select.X
#> [[1]]
#>  X1  X2  X3  X4  X5  X6  X7  X8  X9 X10 X11 X12 X13 X14 X15 X21 X22 X23 X24 X25 
#>   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  21  22  23  24  25 
#> X26 X27 X28 X29 X30 X31 X32 X33 X34 X35 X41 X42 X43 X44 X45 X46 X47 X48 X49 X50 
#>  26  27  28  29  30  31  32  33  34  35  41  42  43  44  45  46  47  48  49  50 
#> X51 X52 X53 X54 X55 X61 X62 X63 X64 X65 X66 X67 X68 X69 X70 X71 X72 X73 X74 X75 
#>  51  52  53  54  55  61  62  63  64  65  66  67  68  69  70  71  72  73  74  75 
#> 
#> [[2]]
#> X321 X322 X323 X324 X325 X326 X327 X328 X329 X330 X331 X332 X333 X334 X335 X341 
#>  321  322  323  324  325  326  327  328  329  330  331  332  333  334  335  341 
#> X342 X343 X344 X345 X346 X347 X348 X349 X350 X351 X352 X353 X354 X355 X361 X362 
#>  342  343  344  345  346  347  348  349  350  351  352  353  354  355  361  362 
#> X363 X364 X365 X366 X367 X368 X369 X370 X371 X372 X373 X374 X375 X381 X382 X383 
#>  363  364  365  366  367  368  369  370  371  372  373  374  375  381  382  383 
#> X384 X385 X386 X387 X388 X389 X390 X391 X392 X393 X394 X395 
#>  384  385  386  387  388  389  390  391  392  393  394  395 
#> 
result.sPLS$select.Y
#> [[1]]
#>  Y1  Y2  Y3  Y4  Y5  Y6  Y7  Y8  Y9 Y10 Y11 Y12 Y13 Y14 Y15 Y21 Y22 Y23 Y24 Y25 
#>   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  21  22  23  24  25 
#> Y26 Y27 Y28 Y29 Y30 Y31 Y32 Y33 Y34 Y35 Y41 Y42 Y43 Y44 Y45 Y46 Y47 Y48 Y49 Y50 
#>  26  27  28  29  30  31  32  33  34  35  41  42  43  44  45  46  47  48  49  50 
#> Y51 Y52 Y53 Y54 Y55 Y61 Y62 Y63 Y64 Y65 Y66 Y67 Y68 Y69 Y70 Y71 Y72 Y73 Y74 Y75 
#>  51  52  53  54  55  61  62  63  64  65  66  67  68  69  70  71  72  73  74  75 
#> 
#> [[2]]
#> Y421 Y422 Y423 Y424 Y425 Y426 Y427 Y428 Y429 Y430 Y431 Y432 Y433 Y434 Y435 Y441 
#>  421  422  423  424  425  426  427  428  429  430  431  432  433  434  435  441 
#> Y442 Y443 Y444 Y445 Y446 Y447 Y448 Y449 Y450 Y451 Y452 Y453 Y454 Y455 Y461 Y462 
#>  442  443  444  445  446  447  448  449  450  451  452  453  454  455  461  462 
#> Y463 Y464 Y465 Y466 Y467 Y468 Y469 Y470 Y471 Y472 Y473 Y474 Y475 Y481 Y482 Y483 
#>  463  464  465  466  467  468  469  470  471  472  473  474  475  481  482  483 
#> Y484 Y485 Y486 Y487 Y488 Y489 Y490 Y491 Y492 Y493 Y494 Y495 
#>  484  485  486  487  488  489  490  491  492  493  494  495 
#>
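A hypothetical follow-up (not run) using the predict method listed under "See also". The shape of the returned list (a `$predict` array indexed by component, following the mixOmics convention) is an assumption here, not documented above:

```r
X.new <- X[1:10, ]                       # pretend these are new samples
pred  <- predict(model.sPLS, newdata = X.new)
dim(pred$predict)                        # assumed: n.new x q x ncomp
pred$predict[, 1:5, 2]                   # predictions using 2 components
```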