A function for performing the partial least squares (PLS) method on two data sets with more columns than rows. The PLS approach is therefore useful for data sets where ordinary least squares (OLS) regression is not possible.

Usage

PLS(X, Y, ncomp, mode = "regression", scale = TRUE)

Arguments

X

numeric matrix of predictors.

Y

numeric vector or matrix of responses (a matrix for multi-response models).

ncomp

the number of components to include in the model (see Details).

mode

character string indicating the type of algorithm to use, (partially) matching one of "regression" or "canonical". See Details.

scale

a logical indicating whether the original data sets should be scaled. Defaults to scale = TRUE.

Details

The PLS function fits PLS models with \(1, \ldots,\) ncomp components. Multi-response models are fully supported.

The type of algorithm to use is specified with the mode argument. Two PLS algorithms are available: PLS regression ("regression") and PLS canonical analysis ("canonical") (see References).
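As a minimal sketch (assuming the sgPLSdevelop package is installed and exports PLS as documented here), the two modes can be fitted on the same small simulated data set:

```r
library(sgPLSdevelop)

set.seed(1)
X <- matrix(rnorm(20 * 6), nrow = 20)  # 20 observations, 6 predictors
Y <- matrix(rnorm(20 * 4), nrow = 20)  # 4 responses

## same data, each of the two available algorithms
fit.reg <- PLS(X, Y, ncomp = 2, mode = "regression")
fit.can <- PLS(X, Y, ncomp = 2, mode = "canonical")
```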

Value

PLS returns an object of class "PLS", a list that contains the following components:

X

the centered and standardized original predictor matrix.

Y

the centered and standardized original response vector or matrix.

ncomp

the number of components included in the model.

mode

the algorithm used to fit the model.

mat.c

matrix of coefficients to be used internally by predict.

variates

list containing the variates.

loadings

list containing the estimated loadings for the \(X\) and \(Y\) variates.

names

list containing the names to be used for individuals and variables.
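The components listed above can be inspected directly on the fitted object; the sketch below assumes the component names are exactly those documented in this Value section:

```r
library(sgPLSdevelop)

set.seed(1)
X <- matrix(rnorm(20 * 6), nrow = 20)
Y <- matrix(rnorm(20 * 4), nrow = 20)
fit <- PLS(X, Y, ncomp = 2, mode = "regression")

str(fit$variates)    # list of X and Y variates
fit$loadings$X       # estimated loadings for the X block
fit$ncomp            # number of components included in the model
```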

References

Liquet, B., Lafaye de Micheaux, P., Hejblum, B. P., Thiébaut, R. (2016). Group and sparse group partial least square approaches applied in genomics context. Bioinformatics, 32(1), 35-42.

Author

Daniel FLORES.

See also

sPLS, gPLS, sgPLS, predict, perf, cim and functions from mixOmics package: summary, plotIndiv, plotVar, plot3dIndiv, plot3dVar.

Examples

# Simulation of datasets X and Y with group variables

library(sgPLSdevelop)
library(mvtnorm)   # for rmvnorm(), used to simulate the noise matrices

## First example 

### parameters
n <- 100
sigma.gamma <- 1
sigma.e <- 1.5
p <- 400
q <- 500

theta.x1 <- c(rep(1, 15), rep(0, 5), rep(-1, 15), rep(0, 5), rep(1.5,15), 
              rep(0, 5), rep(-1.5, 15), rep(0, 325))
theta.x2 <- c(rep(0, 320), rep(1, 15), rep(0, 5), rep(-1, 15), rep(0, 5),
              rep(1.5, 15), rep(0, 5), rep(-1.5, 15), rep(0, 5))

theta.y1 <- c(rep(1, 15), rep(0, 5), rep(-1, 15), rep(0, 5), rep(1.5, 15),
              rep(0, 5), rep(-1.5, 15), rep(0, 425))
theta.y2 <- c(rep(0, 420), rep(1, 15), rep(0, 5), rep(-1, 15), rep(0, 5),
              rep(1.5, 15), rep(0, 5), rep(-1.5, 15), rep(0, 5))                            

### covariance matrices
Sigmax <- matrix(0, nrow = p, ncol = p)
diag(Sigmax) <- sigma.e ^ 2
Sigmay <- matrix(0, nrow = q, ncol = q)
diag(Sigmay) <- sigma.e ^ 2

set.seed(125)

gam1 <- rnorm(n)
gam2 <- rnorm(n)

GAM <- matrix(c(gam1, gam2), ncol = 2, byrow = FALSE)
Thetax <- matrix(c(theta.x1, theta.x2), nrow = 2, byrow = TRUE)
Thetay <- matrix(c(theta.y1, theta.y2), nrow = 2, byrow = TRUE)
E1 <- rmvnorm(n, mean = rep(0, p), sigma = Sigmax, method = "svd")
E2 <- rmvnorm(n, mean = rep(0, q), sigma = Sigmay, method = "svd")
  
X <- GAM %*% Thetax + E1                                                
Y <- GAM %*% Thetay + E2

### PLS model
model.PLS <- PLS(X, Y, ncomp = 2, mode = "regression")


## Second example

train <- 1:40
test <- 41:50
n.test <- length(test)

d <- data.create(n = 50, p = 10, q = 2, list = TRUE)

X <- d$X[train,]
Y <- d$Y[train,]
X.test <- d$X[test,]
Y.test <- d$Y[test,]

ncompmax <- 10
model.pls <- PLS(X = X, Y = Y, ncomp = ncompmax, mode = "regression")
pred <- predict.PLS(model.pls, newdata = X.test)$predict
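The test-set predictions can then be used to pick a number of components; the sketch below assumes pred is an array with one slice per component (observations x responses x components), as suggested by the call above:

```r
## mean squared prediction error for each number of components
## (assumes pred has dimensions n.test x q x ncompmax)
mse <- apply(pred, 3, function(Yhat) mean((Y.test - Yhat)^2))
best.ncomp <- which.min(mse)
```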