Dataset simulation
These functions generate a dataset with a linear dependence between \(Y\) and \(X\).
data.create is used for a quantitative response, while data.cl.create is used for a qualitative response.
Usage
data.create(n = 40, p = 10, q = 1, list = TRUE)
data.cl.create(n = 40, p = 10, classes = 2, list = TRUE)
Details
By default, the number of observations is set to \(n=40\), which is close to realistic conditions. With the default \(p=10\), we therefore have \(p<n\).
With the data.create function, \(Y\) is a linear combination of the Gaussian variables \(X_j\) of \(X\).
The function computes the response through a matrix product: \(Y = XB+E\), where \(B\) is the weight (coefficient) matrix and \(E\) is the Gaussian noise matrix.
The matrix \(B\) is included in the list returned by the function (if list = TRUE).
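The relation \(Y = XB+E\) can be illustrated with a short base R sketch. This is not the code of data.create: the way the weights are drawn (uniform on \([-1, 1]\)), the noise standard deviation, and the names of the list elements are assumptions made for illustration only.

```r
# Illustrative sketch only; NOT the implementation of data.create().
set.seed(1)                     # reproducible draws
n <- 40; p <- 10; q <- 1        # sizes matching the documented defaults

# Gaussian predictors, one column per variable X_j
X <- matrix(rnorm(n * p), nrow = n, ncol = p)

# Hypothetical weight matrix B (the uniform draw is an assumption)
B <- matrix(runif(p * q, min = -1, max = 1), nrow = p, ncol = q)

# Gaussian noise matrix E (the standard deviation 0.1 is an assumption)
E <- matrix(rnorm(n * q, sd = 0.1), nrow = n, ncol = q)

# Response: linear combination of the columns of X plus noise
Y <- X %*% B + E

# Bundle the pieces in a list (element names are illustrative)
sim <- list(X = X, Y = Y, B = B)
dim(sim$Y)
```

Keeping \(B\) alongside \(X\) and \(Y\) makes it possible to compare the coefficients estimated by a model with the ones used in the simulation.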
With the data.cl.create function, there is a link between \(X\) and the classes of \(Y\).
The returned list also contains Y.f, another version of Y stored as a factor.
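As a rough sketch of what a link between \(X\) and the classes of \(Y\) can look like, the base R code below shifts the mean of every predictor by the class label and then builds a factor version of the labels. The shift mechanism is an assumption chosen for illustration; it is not taken from data.cl.create, and only the name Y.f mirrors the text above.

```r
# Illustrative sketch only; NOT the implementation of data.cl.create().
set.seed(2)                     # reproducible draws
n <- 40; p <- 10; classes <- 2  # sizes matching the documented defaults

# Integer class labels, roughly balanced and randomly ordered
Y <- sample(rep_len(seq_len(classes), n))

# Class-dependent mean shift: each column of `shift` repeats the labels
# (this particular link between X and Y is an assumption)
shift <- matrix(Y, nrow = n, ncol = p)
X <- matrix(rnorm(n * p), nrow = n, ncol = p) + shift

# Factor version of the labels, as described for Y.f
Y.f <- factor(Y)

# Mean of the first predictor within each class
tapply(X[, 1], Y.f, mean)
```

The factor version is convenient for functions that expect a categorical response, while the integer labels remain available for numeric operations.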
Examples
library(sgPLSdevelop, warn.conflicts = FALSE, verbose = FALSE, quietly = TRUE)
#> Registered S3 methods overwritten by 'sgPLSdevelop':
#>   method          from
#>   predict.sPLS    sgPLS
#>   predict.gPLS    sgPLS
#>   predict.sgPLS   sgPLS
#>   predict.sPLSda  sgPLS
#>   predict.gPLSda  sgPLS
#>   predict.sgPLSda sgPLS
#>   perf.sPLS       sgPLS
#>   perf.gPLS       sgPLS
#>   perf.sgPLS      sgPLS
#>   perf.sPLSda     sgPLS
#>   perf.gPLSda     sgPLS
#>   perf.sgPLSda    sgPLS
# data.create
data <- data.create(n = 20, p = 5, q = 2, list = TRUE)
X <- data$X
Y <- data$Y
# data.cl.create
data.cl <- data.cl.create(n = 20, p = 5, classes = 3, list = TRUE)
X <- data.cl$X
Y <- data.cl$Y