Sparse Partial Least Squares (sPLS)
sPLS.Rd
Function to perform sparse Partial Least Squares (sPLS). The sPLS approach performs integration and variable selection simultaneously on two data sets in a one-step strategy.
Arguments
- X
Numeric matrix of predictors.
- Y
Numeric vector or matrix of responses (a matrix for multi-response models).
- ncomp
The number of components to include in the model (see Details).
- mode
Character string indicating the type of algorithm to use, (partially) matching one of "regression" or "canonical". See Details.
- max.iter
Integer, the maximum number of iterations.
- tol
A positive real, the tolerance used in the iterative algorithm.
- keepX
Numeric vector of length ncomp, the number of variables to keep in \(X\)-loadings. By default all variables are kept in the model.
- keepY
Numeric vector of length ncomp, the number of variables to keep in \(Y\)-loadings. By default all variables are kept in the model.
- scale
A logical indicating whether the original data sets need to be scaled. By default scale = TRUE.
Details
The sPLS function fits sPLS models with \(1, \ldots,\) ncomp components.
Multi-response models are fully supported.
The type of algorithm to use is specified with the mode
argument. Two sPLS
algorithms are available: sPLS regression ("regression")
and sPLS canonical analysis
("canonical")
(see References).
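A minimal sketch of calling both modes (the data here are simulated purely for illustration, the keepX/keepY values are arbitrary, and the sketch assumes the package providing sPLS() is attached):

```r
## Simulated toy data (illustrative only)
set.seed(1)
X <- matrix(rnorm(40 * 12), nrow = 40)
Y <- matrix(rnorm(40 * 6), nrow = 40)

## sPLS regression: Y is predicted from X
fit.reg <- sPLS(X, Y, ncomp = 2, mode = "regression",
                keepX = c(4, 4), keepY = c(3, 3))

## sPLS canonical analysis: X and Y play symmetric roles
fit.can <- sPLS(X, Y, ncomp = 2, mode = "canonical",
                keepX = c(4, 4), keepY = c(3, 3))
```

The choice of mode does not change the sparsity settings, only the deflation scheme used between components.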
Value
sPLS returns an object of class "sPLS", a list that contains the following components:
- X
The centered and standardized original predictor matrix.
- Y
The centered and standardized original response vector or matrix.
- ncomp
The number of components included in the model.
- mode
The algorithm used to fit the model.
- keepX
Number of \(X\) variables kept in the model on each component.
- keepY
Number of \(Y\) variables kept in the model on each component.
- mat.c
Matrix of coefficients used internally by predict.
- variates
List containing the variates.
- loadings
List containing the estimated loadings for the \(X\) and \(Y\) variates.
- names
List containing the names to be used for individuals and variables.
- tol
The tolerance used in the iterative algorithm, stored for subsequent S3 methods.
- max.iter
The maximum number of iterations, stored for subsequent S3 methods.
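The components listed above can be inspected directly on the fitted object; a minimal sketch (simulated data, illustrative keepX/keepY values, assuming the package providing sPLS() is attached):

```r
## Fit a small model on simulated data, then inspect its components
set.seed(42)
X <- matrix(rnorm(40 * 12), nrow = 40)
Y <- matrix(rnorm(40 * 6), nrow = 40)
fit <- sPLS(X, Y, ncomp = 2, mode = "regression",
            keepX = c(4, 4), keepY = c(3, 3))
fit$ncomp            # number of components included in the model
fit$keepX            # number of X variables kept on each component
str(fit$loadings)    # estimated loadings for the X and Y variates
head(fit$variates$X) # latent variates for X
```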
References
Liquet Benoit, Lafaye de Micheaux Pierre, Hejblum Boris, Thiebaut Rodolphe. A group and Sparse Group Partial Least Square approach applied in Genomics context. Submitted.
Le Cao, K.-A., Martin, P.G.P., Robert-Granié, C. and Besse, P. (2009). Sparse canonical methods for biological data integration: application to a cross-platform study. BMC Bioinformatics 10:34.
Le Cao, K.-A., Rossouw, D., Robert-Granié, C. and Besse, P. (2008). A sparse PLS for variable selection when integrating Omics data. Statistical Applications in Genetics and Molecular Biology 7, article 35.
Shen, H. and Huang, J. Z. (2008). Sparse principal component analysis via regularized low rank matrix approximation. Journal of Multivariate Analysis 99, 1015-1034.
Tenenhaus, M. (1998). La régression PLS: théorie et pratique. Paris: Editions Technip.
Wold H. (1966). Estimation of principal components and related models by iterative least squares. In: Krishnaiah, P. R. (editors), Multivariate Analysis. Academic Press, N.Y., 391-420.
Examples
## Simulation of datasets X and Y with group variables
library(mvtnorm) # for rmvnorm()
n <- 100
sigma.gamma <- 1
sigma.e <- 1.5
p <- 400
q <- 500
theta.x1 <- c(rep(1, 15), rep(0, 5), rep(-1, 15), rep(0, 5),
rep(1.5, 15), rep(0, 5), rep(-1.5, 15), rep(0, 325))
theta.x2 <- c(rep(0, 320), rep(1, 15), rep(0, 5), rep(-1, 15),
rep(0, 5), rep(1.5, 15), rep(0, 5), rep(-1.5, 15),
rep(0, 5))
theta.y1 <- c(rep(1, 15), rep(0, 5), rep(-1, 15), rep(0, 5),
rep(1.5, 15), rep(0, 5), rep(-1.5, 15), rep(0, 425))
theta.y2 <- c(rep(0, 420), rep(1, 15), rep(0, 5), rep(-1, 15),
rep(0, 5), rep(1.5, 15), rep(0, 5), rep(-1.5, 15),
rep(0, 5))
Sigmax <- matrix(0, nrow = p, ncol = p)
diag(Sigmax) <- sigma.e ^ 2
Sigmay <- matrix(0, nrow = q, ncol = q)
diag(Sigmay) <- sigma.e ^ 2
set.seed(125)
gam1 <- rnorm(n)
gam2 <- rnorm(n)
X <- matrix(c(gam1, gam2), ncol = 2, byrow = FALSE) %*% matrix(c(theta.x1, theta.x2),
nrow = 2, byrow = TRUE) + rmvnorm(n, mean = rep(0, p), sigma =
Sigmax, method = "svd")
Y <- matrix(c(gam1, gam2), ncol = 2, byrow = FALSE) %*% matrix(c(theta.y1, theta.y2),
nrow = 2, byrow = TRUE) + rmvnorm(n, mean = rep(0, q), sigma =
Sigmay, method = "svd")
ind.block.x <- seq(20, 380, 20)
ind.block.y <- seq(20, 480, 20)
#### sPLS model
model.sPLS <- sPLS(X, Y, ncomp = 2, mode = "regression", keepX = c(60, 60),
keepY = c(60, 60))
result.sPLS <- select.spls(model.sPLS)
result.sPLS$select.X
#> [[1]]
#> X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11 X12 X13 X14 X15 X21 X22 X23 X24 X25
#> 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 21 22 23 24 25
#> X26 X27 X28 X29 X30 X31 X32 X33 X34 X35 X41 X42 X43 X44 X45 X46 X47 X48 X49 X50
#> 26 27 28 29 30 31 32 33 34 35 41 42 43 44 45 46 47 48 49 50
#> X51 X52 X53 X54 X55 X61 X62 X63 X64 X65 X66 X67 X68 X69 X70 X71 X72 X73 X74 X75
#> 51 52 53 54 55 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75
#>
#> [[2]]
#> X321 X322 X323 X324 X325 X326 X327 X328 X329 X330 X331 X332 X333 X334 X335 X341
#> 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 341
#> X342 X343 X344 X345 X346 X347 X348 X349 X350 X351 X352 X353 X354 X355 X361 X362
#> 342 343 344 345 346 347 348 349 350 351 352 353 354 355 361 362
#> X363 X364 X365 X366 X367 X368 X369 X370 X371 X372 X373 X374 X375 X381 X382 X383
#> 363 364 365 366 367 368 369 370 371 372 373 374 375 381 382 383
#> X384 X385 X386 X387 X388 X389 X390 X391 X392 X393 X394 X395
#> 384 385 386 387 388 389 390 391 392 393 394 395
#>
result.sPLS$select.Y
#> [[1]]
#> Y1 Y2 Y3 Y4 Y5 Y6 Y7 Y8 Y9 Y10 Y11 Y12 Y13 Y14 Y15 Y21 Y22 Y23 Y24 Y25
#> 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 21 22 23 24 25
#> Y26 Y27 Y28 Y29 Y30 Y31 Y32 Y33 Y34 Y35 Y41 Y42 Y43 Y44 Y45 Y46 Y47 Y48 Y49 Y50
#> 26 27 28 29 30 31 32 33 34 35 41 42 43 44 45 46 47 48 49 50
#> Y51 Y52 Y53 Y54 Y55 Y61 Y62 Y63 Y64 Y65 Y66 Y67 Y68 Y69 Y70 Y71 Y72 Y73 Y74 Y75
#> 51 52 53 54 55 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75
#>
#> [[2]]
#> Y421 Y422 Y423 Y424 Y425 Y426 Y427 Y428 Y429 Y430 Y431 Y432 Y433 Y434 Y435 Y441
#> 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 441
#> Y442 Y443 Y444 Y445 Y446 Y447 Y448 Y449 Y450 Y451 Y452 Y453 Y454 Y455 Y461 Y462
#> 442 443 444 445 446 447 448 449 450 451 452 453 454 455 461 462
#> Y463 Y464 Y465 Y466 Y467 Y468 Y469 Y470 Y471 Y472 Y473 Y474 Y475 Y481 Y482 Y483
#> 463 464 465 466 467 468 469 470 471 472 473 474 475 481 482 483
#> Y484 Y485 Y486 Y487 Y488 Y489 Y490 Y491 Y492 Y493 Y494 Y495
#> 484 485 486 487 488 489 490 491 492 493 494 495
#>