Dataset simulation: data.create and data.cl.create (sgPLS)

These functions generate a dataset with a linear dependence between \(Y\) and \(X\). data.create is used for a quantitative response, while data.cl.create is used for a qualitative (categorical) response.

Usage

data.create(n = 40, p = 10, q = 1, list = TRUE)
data.cl.create(n = 40, p = 10, classes = 2, list = TRUE)

Arguments

n: number of rows (observations) in the dataset
p: number of \(X\) variables
q: number of \(Y\) variables (data.create only)
classes: number of classes to generate (data.cl.create only)
list: if TRUE (the default), a list including the data frame is returned; if FALSE, only a data frame is returned.

Details

By default, the sample size is \(n = 40\), which is intended to be close to realistic conditions. With the default \(p = 10\), we therefore have \(p < n\).

With data.create, \(Y\) is a linear combination of the Gaussian variables \(X_j\) (the columns of \(X\)). The response is computed by a matrix product, \(Y = XB + E\), where \(B\) is the \(p \times q\) weight (coefficient) matrix and \(E\) is an \(n \times q\) matrix of Gaussian noise. The matrix \(B\) is included in the list returned by the function (when list = TRUE).
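The model \(Y = XB + E\) can be illustrated with a few lines of base R. This is a minimal sketch, not the sgPLS implementation: the helper name simulate_linear_data, the uniform weights in \([-1, 1]\) and the noise_sd parameter are hypothetical choices made for illustration; only the shapes of \(X\), \(B\), \(E\) and \(Y\) follow from the formula above.

```r
# Hypothetical helper (not part of sgPLS): simulate Y = XB + E
simulate_linear_data <- function(n = 40, p = 10, q = 1, noise_sd = 1) {
  # X: n x p matrix of independent standard Gaussian variables X_j
  X <- matrix(rnorm(n * p), nrow = n, ncol = p)
  # B: p x q weight matrix; uniform draws in [-1, 1] are an arbitrary assumption
  B <- matrix(runif(p * q, min = -1, max = 1), nrow = p, ncol = q)
  # E: n x q matrix of Gaussian noise (assumed standard deviation noise_sd)
  E <- matrix(rnorm(n * q, sd = noise_sd), nrow = n, ncol = q)
  # The response is the matrix product XB plus noise
  Y <- X %*% B + E
  list(X = X, Y = Y, B = B)
}

set.seed(1)
sim <- simulate_linear_data(n = 20, p = 5, q = 2)
dim(sim$Y)  # 20 rows, 2 response variables
```

Setting noise_sd = 0 makes \(Y\) exactly equal to \(XB\), which is a quick way to check that the recovered \(B\) matches the one used to build the data.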

With data.cl.create, the class of each observation in \(Y\) is linked to the values of \(X\). The returned list also contains Y.f, a version of \(Y\) stored as a factor.
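The documentation does not state how data.cl.create ties the classes to \(X\), so the sketch below shows only one plausible mechanism, assumed for illustration: each class gets its own column of weights, a noisy linear score is computed per class, and each observation is assigned to the class with the highest score. The helper simulate_class_data and its internals are hypothetical; the factor copy Y.f mirrors the element described above.

```r
# Hypothetical helper (not part of sgPLS): one way to link class labels to X
simulate_class_data <- function(n = 40, p = 10, classes = 2) {
  # X: n x p matrix of independent standard Gaussian variables
  X <- matrix(rnorm(n * p), nrow = n, ncol = p)
  # One column of weights per class (assumed uniform in [-1, 1])
  B <- matrix(runif(p * classes, min = -1, max = 1), nrow = p, ncol = classes)
  # Noisy linear score of each observation for each class
  scores <- X %*% B + matrix(rnorm(n * classes, sd = 0.5), nrow = n)
  # Assign each observation to the class with the highest score
  Y <- max.col(scores, ties.method = "first")
  # Same labels stored as a factor, playing the role of Y.f
  Y.f <- factor(Y, levels = seq_len(classes))
  list(X = X, Y = Y, Y.f = Y.f)
}

set.seed(2)
sim.cl <- simulate_class_data(n = 20, p = 5, classes = 3)
table(sim.cl$Y.f)  # number of observations in each class
```

Because the labels come from linear scores in \(X\), a discriminant method such as a PLS-DA variant has real signal to recover, which is the point of simulating the data this way.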

Examples

library(sgPLSdevelop, warn.conflicts = FALSE, verbose = FALSE, quietly = TRUE)

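# data.create: quantitative response (20 rows, 5 X variables, 2 Y variables)
data <- data.create(n = 20, p = 5, q = 2, list = TRUE)
X <- data$X
Y <- data$Y

# data.cl.create: qualitative response with 3 classes
data.cl <- data.cl.create(n = 20, p = 5, classes = 3, list = TRUE)
X <- data.cl$X
Y <- data.cl$Y
Y.f <- data.cl$Y.f  # Y stored as a factor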