
What is the sPLS method?

The sPLS method (sparse Partial Least Squares) follows the principle of PLS but additionally applies a Lasso penalty to the weight vectors in order to eliminate some variables. In the set $X$, for each component $t_h$, the variable $X^{(h)}_j$ is eliminated if $u^{(h)}_j = 0$. In the set $Y$, for each component $s_h$, the variable $Y^{(h)}_j$ is eliminated if $v^{(h)}_j = 0$.

When $h = 1$, the minimization problem therefore becomes:

$$(u,v) = \operatorname*{argmin}_{u,v} \; \|M - uv^T\|^2_F + P_\lambda(u) + P_\mu(v)$$

$\lambda$ and $\mu$ are the regularization parameters; $\mu$ is sometimes denoted $\lambda_2$. These parameters control the degree of sparsity, that is, the proportion of eliminated variables. In particular, the larger $\lambda$ and $\mu$ are, the more variables are eliminated from $X$ and $Y$, respectively.
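To illustrate this sparsity effect, here is a minimal numerical sketch (not from the original text): it applies the soft-thresholding update derived in the next sections to an arbitrary vector `a`, which stands in for $Mv$, and counts how many coefficients survive as $\lambda$ grows.

```python
import numpy as np

# Minimal illustration of how lambda controls sparsity, using the
# soft-thresholding update u_i = sign(a_i) * max(|a_i| - lambda, 0)
# derived in the next sections; 'a' is a stand-in for the vector Mv.
rng = np.random.default_rng(0)
a = rng.normal(size=50)

def soft(a, lam):
    """Soft-thresholding operator applied componentwise."""
    return np.sign(a) * np.maximum(np.abs(a) - lam, 0.0)

for lam in [0.0, 0.5, 1.0, 2.0]:
    u = soft(a, lam)
    print(f"lambda = {lam:>4}: {np.sum(u != 0)} variables kept out of {a.size}")
```

As expected, the number of non-zero (kept) coefficients decreases as $\lambda$ increases.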

The objective function is biconvex, so it cannot be minimized simultaneously in $u$ and $v$. The solution is therefore to proceed alternately: we fix $u$ to minimize in $v$, then vice versa.

The function can also be expanded as follows:

\begin{align*}
f(u,v) &= \|M - uv^T\|^2_F + P_\lambda(u) + P_\mu(v) \\
&= \sum_{i=1}^{p} \sum_{j=1}^{q} (m_{i,j} - u_i v_j)^2 + P_\lambda(u) + P_\mu(v) \\
&= \sum_{i=1}^{p} \sum_{j=1}^{q} m_{i,j}^2 - 2\sum_{i=1}^{p} \sum_{j=1}^{q} m_{i,j} u_i v_j + \sum_{i=1}^{p} \sum_{j=1}^{q} u_i^2 v_j^2 + P_\lambda(u) + P_\mu(v) \\
&= \sum_{i=1}^{p} \sum_{j=1}^{q} m_{i,j}^2 - 2\sum_{i=1}^{p} u_i \sum_{j=1}^{q} m_{i,j} v_j + \sum_{i=1}^{p} u_i^2 \sum_{j=1}^{q} v_j^2 + P_\lambda(u) + P_\mu(v) \\
&= \sum_{i=1}^{p} \sum_{j=1}^{q} m_{i,j}^2 + \sum_{i=1}^{p} u_i^2 \sum_{j=1}^{q} v_j^2 - 2\sum_{i=1}^{p} u_i \sum_{j=1}^{q} m_{i,j} v_j + 2\lambda \sum_{i=1}^{p} |u_i| + 2\mu \sum_{j=1}^{q} |v_j|
\end{align*}

with the following relations:

$$P_\lambda(u) = 2\lambda \sum_{i=1}^{p} |u_i|$$

$$P_\mu(v) = 2\mu \sum_{j=1}^{q} |v_j|$$

When $h > 1$, we replace the expressions $m_{i,j}$, $u_i$, and $v_j$ with $m^{(h)}_{i,j}$, $u^{(h)}_i$, and $v^{(h)}_j$, respectively.
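As a quick sanity check of this expansion, the following sketch (with arbitrary random $M$, $u$, $v$, $\lambda$, $\mu$; not part of the original text) compares the direct Frobenius form of $f(u,v)$ with the fully expanded form.

```python
import numpy as np

# Numerical check that the direct and expanded forms of f(u, v) agree.
rng = np.random.default_rng(1)
p, q = 6, 4
M = rng.normal(size=(p, q))
u = rng.normal(size=p)
v = rng.normal(size=q)
lam, mu = 0.3, 0.7

P_lam = 2 * lam * np.sum(np.abs(u))
P_mu = 2 * mu * np.sum(np.abs(v))

f_direct = np.linalg.norm(M - np.outer(u, v), "fro") ** 2 + P_lam + P_mu
f_expanded = (np.sum(M ** 2)
              + np.sum(u ** 2) * np.sum(v ** 2)
              - 2 * u @ (M @ v)
              + P_lam + P_mu)

print(np.isclose(f_direct, f_expanded))  # True
```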

How to solve the minimization problem?

Shen & Huang lemma

We must therefore solve:

\begin{align*}
\tilde{u} &= \operatorname*{argmin}_{u} \|M - uv^T\|^2_F + P_\lambda(u) \\
&= \operatorname*{argmin}_u \left( \sum_{i=1}^{p} u_i^2 - 2\sum_{i=1}^{p} u_i \sum_{j=1}^{q} m_{i,j} v_j + 2\lambda \sum_{i=1}^{p} |u_i| \right) \\
\implies \tilde{u}_i &= \operatorname*{argmin}_{u_i} \; u^2_i - 2u_i \sum_{j=1}^{q} m_{i,j} v_j + 2\lambda |u_i| \\
&= \operatorname*{argmin}_{u_i} \; u^2_i - 2u_i (Mv)_i + 2\lambda |u_i|
\end{align*}

Using the first lemma of Shen & Huang, which will be proved a little later, we arrive at the following result:

$$\tilde{u}_i = \text{sign}((Mv)_i) \times \max(|(Mv)_i| - \lambda, 0) = \text{soft}((Mv)_i, \lambda)$$

In the same way,

\begin{align*}
\tilde{v} &= \operatorname*{argmin}_v \|M - uv^T\|^2_F + P_\mu(v) \\
&= \operatorname*{argmin}_v \left( \sum_{j=1}^{q} v_j^2 - 2\sum_{i=1}^{p} u_i \sum_{j=1}^{q} m_{i,j} v_j + 2\mu \sum_{j=1}^{q} |v_j| \right) \\
\implies \tilde{v}_j &= \operatorname*{argmin}_{v_j} \; v^2_j - 2v_j \sum_{i=1}^{p} m_{i,j} u_i + 2\mu |v_j| \\
&= \text{sign}((M^Tu)_j) \times \max(|(M^Tu)_j| - \mu, 0) \\
&= \text{soft}((M^Tu)_j, \mu)
\end{align*}

Remarks:

  • in the problem in $u$, the factor $\sum_{j=1}^{q} v_j^2$ disappears because of the constraint $\|v\| = \sqrt{\sum_{j=1}^{q} v_j^2} = 1$. The same holds for the problem in $v$, where $\sum_{i=1}^{p} u_i^2$ disappears since $\|u\| = 1$.

  • the term $\sum_{i=1}^{p} \sum_{j=1}^{q} m_{i,j}^2$ disappears because neither $\tilde{u}$ nor $\tilde{v}$ depends on it.

  • in the expressions for $\tilde{u}_i$ and $\tilde{v}_j$, the sums $\sum_{i=1}^{p}$ and $\sum_{j=1}^{q}$ respectively disappear, because the objective separates component by component.

  • the Shen & Huang lemma states:

$$\operatorname*{argmin}_x \; x^2 - 2ax + 2b|x| = \text{sign}(a) \times \max(|a| - b, 0), \quad \forall a \in \mathbb{R},\; b > 0$$

The right-hand expression is also called the soft-thresholding function applied to $a$, denoted $\text{soft}(a, b)$.

Proof of the lemma

Let $f(x) = x^2 - 2ax + 2b|x|$.
We want to find $x^*$ such that $f'(x^*) = 0$, using the subdifferential of $|x|$ at $x = 0$.

$$f'(x) \begin{cases} = 2x - 2a + 2b & \text{if } x > 0, \\ \in \; ]2x - 2a - 2b \, ; \, 2x - 2a + 2b[ & \text{if } x = 0, \\ = 2x - 2a - 2b & \text{if } x < 0. \end{cases}$$

$$f'(x^*) = 0 \iff \begin{cases} 2x^* - 2a + 2b = 0 & \text{if } x^* > 0, \\ 2x^* - 2a - 2b = 0 & \text{if } x^* < 0 \end{cases} \iff \begin{cases} x^* - a + b = 0 & \text{if } x^* > 0, \\ x^* - a - b = 0 & \text{if } x^* < 0 \end{cases}$$

$$\iff x^* = \begin{cases} a - b & \text{if } x^* > 0, \\ 0 & \text{if } x^* = 0, \\ a + b & \text{if } x^* < 0 \end{cases} \iff x^* = \begin{cases} a - b & \text{if } a > b, \\ 0 & \text{if } |a| < b, \\ a + b & \text{if } a < -b \end{cases} \iff x^* = \text{sign}(a) \times \max(|a| - b, 0)$$

Here $x^*$, $a$, and $b$ play the roles of $u_i$, $(Mv)_i$, and $\lambda$ on the one hand, and of $v_j$, $(M^Tu)_j$, and $\mu$ on the other.
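The lemma can also be checked numerically. The sketch below (with arbitrarily chosen values of $a$ and $b$, not from the original text) minimizes $f(x) = x^2 - 2ax + 2b|x|$ over a fine grid and compares the grid minimizer with the closed form $\text{sign}(a) \times \max(|a| - b, 0)$.

```python
import numpy as np

# Brute-force check of the Shen & Huang lemma on a few (a, b) pairs.
def soft_scalar(a, b):
    """Closed-form minimizer sign(a) * max(|a| - b, 0)."""
    return np.sign(a) * max(abs(a) - b, 0.0)

xs = np.linspace(-5, 5, 200001)
for a, b in [(2.0, 0.5), (0.3, 1.0), (-1.5, 0.4)]:
    f = xs ** 2 - 2 * a * xs + 2 * b * np.abs(xs)
    x_grid = xs[np.argmin(f)]
    print(f"a={a}, b={b}: grid {x_grid:.3f} vs closed form {soft_scalar(a, b):.3f}")
```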

Convergence algorithm

To solve the minimization problem, we first perform an SVD as before and take the first columns of the matrices $U$ and $V$. The vectors found are not yet the solutions, so an iterative convergence algorithm must be applied for each component $h$.

We first define $\tilde{u}^{(h)}_{old} = \tilde{u}^{(h)}$ and $\tilde{v}^{(h)}_{old} = \tilde{v}^{(h)}$.

We must therefore compute, $\forall h \in \{1,...,H\}$, $\forall i \in \{1,...,p\}$, $\forall j \in \{1,...,q\}$:

\begin{align*}
\tilde{u}^{(h)}_{new} &= \text{soft}(M^{(h)}v^{(h)}_{old}, \lambda_h) \implies \tilde{u}^{(h)}_{new,i} = \text{soft}((M^{(h)}v^{(h)}_{old})_i, \lambda_h) \\
\tilde{v}^{(h)}_{new} &= \text{soft}(M^{(h)T}u^{(h)}_{old}, \mu_h) \implies \tilde{v}^{(h)}_{new,j} = \text{soft}((M^{(h)T}u^{(h)}_{old})_j, \mu_h)
\end{align*}

We then normalize the weights:

$$u^{(h)}_{new} = \frac{\tilde{u}^{(h)}_{new}}{\|\tilde{u}^{(h)}_{new}\|_2} \; ; \quad v^{(h)}_{new} = \frac{\tilde{v}^{(h)}_{new}}{\|\tilde{v}^{(h)}_{new}\|_2}$$

We accept $u^{(h)}_{new}$ and $v^{(h)}_{new}$ as the solutions when $\|u^{(h)}_{new} - u^{(h)}_{old}\| < \text{eps}$ and $\|v^{(h)}_{new} - v^{(h)}_{old}\| < \text{eps}$, respectively.

While $\|u^{(h)}_{new} - u^{(h)}_{old}\| > \text{eps}$ or $\|v^{(h)}_{new} - v^{(h)}_{old}\| > \text{eps}$:

  • assign the following values: $u^{(h)}_{old} = u^{(h)}_{new}$ and $v^{(h)}_{old} = v^{(h)}_{new}$;

  • repeat one more iteration (a sketch of the full loop is given after this list).
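The following Python sketch puts the convergence algorithm together for a single component $h$. It is an illustration under the notation above, not a reference implementation: the helper `spls_component` is hypothetical, $M$ is assumed to be the cross-product matrix used throughout (for instance $M^{(h)} = X^{(h)T} Y^{(h)}$), and $\lambda_h$, $\mu_h$ are assumed small enough that the thresholded vectors are not entirely zero.

```python
import numpy as np

def soft(a, lam):
    """Componentwise soft-thresholding: sign(a) * max(|a| - lam, 0)."""
    return np.sign(a) * np.maximum(np.abs(a) - lam, 0.0)

def spls_component(M, lam, mu, eps=1e-6, max_iter=500):
    # Initialization with the first left/right singular vectors of M.
    U, _, Vt = np.linalg.svd(M)
    u_old, v_old = U[:, 0], Vt[0, :]
    for _ in range(max_iter):
        # Sparse updates by soft-thresholding, then normalization
        # (assumes lam/mu small enough that the vectors are not all zero).
        u_new = soft(M @ v_old, lam)
        u_new /= np.linalg.norm(u_new)
        v_new = soft(M.T @ u_old, mu)
        v_new /= np.linalg.norm(v_new)
        if (np.linalg.norm(u_new - u_old) < eps
                and np.linalg.norm(v_new - v_old) < eps):
            break
        u_old, v_old = u_new, v_new
    return u_new, v_new

# Example use on a random matrix standing in for M^(h).
rng = np.random.default_rng(2)
u, v = spls_component(rng.normal(size=(10, 8)), lam=0.2, mu=0.2)
print(np.sum(u != 0), np.sum(v != 0))
```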

Special case of PLS1

In the case of PLS1, the minimization problem in $u$ becomes, considering $m = X^Ty$:

\begin{align*}
\tilde{u} &= \operatorname*{argmin}_{u} \|m - u\|^2_2 + P_\lambda(u) \\
&= \operatorname*{argmin}_u \left( \sum_{i=1}^{p} u_i^2 - 2\sum_{i=1}^{p} u_i m_i + 2\lambda \sum_{i=1}^{p} |u_i| \right) \\
\implies \tilde{u}_i &= \operatorname*{argmin}_{u_i} \; u^2_i - 2u_i m_i + 2\lambda |u_i| \\
&= \text{sign}(m_i) \times \max(|m_i| - \lambda, 0) = \text{soft}(m_i, \lambda)
\end{align*}
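A minimal sketch of this PLS1 special case (random $X$, $y$ and an arbitrary $\lambda$; not from the original text): the sparse weight vector is obtained in one step by soft-thresholding $m = X^Ty$, with no inner loop needed for $u$.

```python
import numpy as np

# PLS1 case: the sparse weights come directly from soft-thresholding X^T y.
rng = np.random.default_rng(3)
X = rng.normal(size=(30, 10))
y = rng.normal(size=30)
lam = 1.0

m = X.T @ y
u_tilde = np.sign(m) * np.maximum(np.abs(m) - lam, 0.0)  # soft(m, lambda)
u = u_tilde / np.linalg.norm(u_tilde)                    # normalized weights
print(np.sum(u == 0), "variables of X eliminated")
```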

We can now compute (in PLS1 as in PLS2):

$$t_{h} = \sum_{j=1}^{p} u^{(h)}_j X^{(h)}_{j} = X^{(h)}u^{(h)}$$

$$s_{h} = \sum_{j=1}^{q} v^{(h)}_j Y^{(h)}_{j} = Y^{(h)}v^{(h)}$$
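As a short illustration (with arbitrary random data and already-normalized weights, purely for shape-checking), the components are plain matrix-vector products:

```python
import numpy as np

# Once the sparse weights u and v are available, the components t_h and s_h
# are linear combinations of the columns of the (deflated) data matrices.
rng = np.random.default_rng(4)
X = rng.normal(size=(30, 10))   # stands in for X^(h)
Y = rng.normal(size=(30, 4))    # stands in for Y^(h)
u = rng.normal(size=10)
u /= np.linalg.norm(u)
v = rng.normal(size=4)
v /= np.linalg.norm(v)

t = X @ u   # t_h = X^(h) u^(h)
s = Y @ v   # s_h = Y^(h) v^(h)
print(t.shape, s.shape)
```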

The principle of matrix deflation is the same as for the PLS method.