Partial Least Squares (PLS)
PLS.Rd
A function for performing the partial least squares (PLS) method on two data sets with more columns than rows. The PLS approach is therefore useful for data sets where ordinary least squares (OLS) regression is not possible.
Arguments
- X
numeric matrix of predictors.
- Y
numeric vector or matrix of responses (for multi-response models).
- ncomp
the number of components to include in the model (see Details).
- mode
character string. The type of algorithm to use, (partially) matching one of "regression" or "canonical". See Details.
- scale
a logical indicating whether the original data sets should be scaled. By default scale = TRUE.
Details
The PLS function fits PLS models with \(1, \ldots,\) ncomp components.
Multi-response models are fully supported.
The type of algorithm to use is specified with the mode
argument. Two PLS
algorithms are available: PLS regression ("regression")
and PLS canonical analysis
("canonical")
(see References).
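As a minimal sketch of selecting between the two algorithms (assuming the sgPLSdevelop package is installed; the toy matrices below are illustrative only):

```r
library(sgPLSdevelop)

# Toy data: 20 observations, with more columns than rows in X and Y
set.seed(1)
X <- matrix(rnorm(20 * 50), nrow = 20)
Y <- matrix(rnorm(20 * 30), nrow = 20)

# PLS regression: models Y from X (asymmetric deflation)
fit.reg <- PLS(X, Y, ncomp = 2, mode = "regression")

# PLS canonical analysis: treats X and Y symmetrically
fit.can <- PLS(X, Y, ncomp = 2, mode = "canonical")
```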
Value
PLS
returns an object of class "PLS"
, a list
that contains the following components:
- X
the centered and standardized original predictor matrix.
- Y
the centered and standardized original response vector or matrix.
- ncomp
the number of components included in the model.
- mode
the algorithm used to fit the model.
- mat.c
matrix of coefficients to be used internally by predict.
- variates
list containing the variates.
- loadings
list containing the estimated loadings for the \(X\) and \(Y\) variates.
- names
list containing the names to be used for individuals and variables.
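The components listed above can be inspected directly on a fitted object. A brief sketch (assuming the sgPLSdevelop package is installed and a model fitted on toy data):

```r
library(sgPLSdevelop)

set.seed(1)
X <- matrix(rnorm(20 * 50), nrow = 20)
Y <- matrix(rnorm(20 * 30), nrow = 20)
model <- PLS(X, Y, ncomp = 2, mode = "regression")

model$ncomp            # number of components included in the model
model$mode             # algorithm used ("regression" here)
str(model$variates)    # list containing the variates
str(model$loadings)    # estimated loadings for the X and Y variates
```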
References
Liquet Benoit, Lafaye de Micheaux Pierre, Hejblum Boris, Thiebaut Rodolphe. Group and sparse group partial least square approaches applied in genomics context. Bioinformatics.
Examples
# Simulation of datasets X and Y with group variables
library(sgPLSdevelop)
library(mvtnorm) # for rmvnorm()
## First example
### parameters
n <- 100
sigma.gamma <- 1
sigma.e <- 1.5
p <- 400
q <- 500
theta.x1 <- c(rep(1, 15), rep(0, 5), rep(-1, 15), rep(0, 5), rep(1.5,15),
rep(0, 5), rep(-1.5, 15), rep(0, 325))
theta.x2 <- c(rep(0, 320), rep(1, 15), rep(0, 5), rep(-1, 15), rep(0, 5),
rep(1.5, 15), rep(0, 5), rep(-1.5, 15), rep(0, 5))
theta.y1 <- c(rep(1, 15), rep(0, 5), rep(-1, 15), rep(0, 5), rep(1.5, 15),
rep(0, 5), rep(-1.5, 15), rep(0, 425))
theta.y2 <- c(rep(0, 420), rep(1, 15), rep(0, 5), rep(-1, 15), rep(0, 5),
rep(1.5, 15), rep(0, 5), rep(-1.5, 15), rep(0, 5))
### covariance matrices
Sigmax <- matrix(0, nrow = p, ncol = p)
diag(Sigmax) <- sigma.e ^ 2
Sigmay <- matrix(0, nrow = q, ncol = q)
diag(Sigmay) <- sigma.e ^ 2
set.seed(125)
gam1 <- rnorm(n)
gam2 <- rnorm(n)
GAM <- matrix(c(gam1, gam2), ncol = 2, byrow = FALSE)
Thetax <- matrix(c(theta.x1, theta.x2), nrow = 2, byrow = TRUE)
Thetay <- matrix(c(theta.y1, theta.y2), nrow = 2, byrow = TRUE)
E1 <- rmvnorm(n, mean = rep(0, p), sigma = Sigmax, method = "svd")
E2 <- rmvnorm(n, mean = rep(0, q), sigma = Sigmay, method = "svd")
X <- GAM %*% Thetax + E1
Y <- GAM %*% Thetay + E2
### PLS model
model.PLS <- PLS(X, Y, ncomp = 2, mode = "regression")
## Second example
train <- 1:40
test <- 41:50
n.test <- length(test)
d <- data.create(n = 50, p = 10, q = 2, list = TRUE)
X <- d$X[train,]
Y <- d$Y[train,]
X.test <- d$X[test,]
Y.test <- d$Y[test,]
ncompmax <- 10
model.pls <- PLS(X = X, Y = Y, ncomp = ncompmax, mode = "regression")
pred <- predict.PLS(model.pls, newdata = X.test)$predict
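The held-out responses can then be used to choose ncomp; a hedged sketch, assuming (as is common for PLS predict methods) that pred is an array whose third dimension indexes the number of components:

```r
# Mean squared error of prediction for each number of components,
# assuming pred is an array with one slice per component
msep <- apply(pred, 3, function(Y.hat) mean((Y.test - Y.hat)^2))
which.min(msep) # suggested number of components
```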