This page provides a brief introduction, installation instructions, and two usage examples for the R package rNMF. For the detailed algorithm, theory, and more examples, see [J. Sun, Y. Xu, K. Lopiano and S. Young 2013].

Robust nonnegative matrix factorization (rNMF)

rNMF is a modern alternative to principal component analysis (PCA) for extracting clean low-dimensional structure from nonnegative high-dimensional data sets while detecting and separating corruptions (outliers). It decomposes a p by n nonnegative matrix $$X$$ into $$X\approx WH$$ by minimizing a penalized objective function

$$\|X-WH\|_{\gamma}^2 + \alpha\|W\|_F + \beta\sum_{j=1}^{n}\|H_{\cdot j}\|_1^2$$

Here the rows of $$X$$ are variables and the columns are observations. Both $$W$$ and $$H$$ are nonnegative. $$W$$ is p by k and $$H$$ is k by n, with $$k<\min\{p,n\}$$. $$\alpha$$ and $$\beta$$ control the magnitude of $$W$$ and the sparsity of $$H$$, respectively, and $$\gamma$$ is the proportion of $$X$$ that is trimmed (for example, $$\gamma = 0.03$$ trims 3%).
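To make the penalty terms concrete, the following base-R sketch evaluates the objective for a given pair of factors in the untrimmed case (gamma = 0), where the trimmed norm is assumed to reduce to the ordinary Frobenius norm. The function name `penalized_objective` is hypothetical and is not part of the rNMF package.

```r
# Illustrative only: evaluates the objective as written above for gamma = 0.
# `penalized_objective` is a hypothetical helper, not an rNMF function.
penalized_objective <- function(X, W, H, alpha = 0, beta = 0) {
  stopifnot(ncol(W) == nrow(H), nrow(X) == nrow(W), ncol(X) == ncol(H))
  fit_term <- sum((X - W %*% H)^2)           # ||X - WH||^2 (no trimming)
  w_term   <- alpha * sqrt(sum(W^2))         # alpha * ||W||_F
  h_term   <- beta * sum(colSums(abs(H))^2)  # beta * sum_j ||H_.j||_1^2
  fit_term + w_term + h_term
}

# Tiny example: p = 3 variables, n = 2 observations, k = 1
W <- matrix(c(1, 2, 2), nrow = 3)  # 3 by 1, Frobenius norm 3
H <- matrix(c(1, 3), nrow = 1)     # 1 by 2, column L1 norms 1 and 3
X <- W %*% H                       # exact factorization, so the fit term is 0
obj <- penalized_objective(X, W, H, alpha = 1, beta = 0.5)
obj  # 0 + 1 * 3 + 0.5 * (1 + 9) = 8
```

With an exact factorization the value consists only of the two penalties, which shows how $$\alpha$$ and $$\beta$$ trade fit against the size of $$W$$ and the column-wise L1 norms of $$H$$.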

rNMF lets the user specify one of three types of potential outliers in $$X$$: cell-wise, row-wise, or column-wise.

A major advantage of rNMF over regular NMF is that it detects outliers in $$X$$ and removes them from the objective function being fitted.
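The idea of removing outliers from the fit can be illustrated with a small base-R sketch. It assumes that cell-wise trimming flags the proportion gamma of cells with the largest squared residuals, and that row-wise and column-wise trimming do the same with rows or columns ranked by their total squared residual. The function `flag_outliers` and the mode names "row" and "col" are hypothetical; the paper cited above describes how rNMF actually trims.

```r
# Illustrative only: flag the gamma fraction with the largest squared residuals.
# `flag_outliers` is a hypothetical helper, not part of the rNMF package.
flag_outliers <- function(X, fit, gamma, mode = c("cell", "row", "col")) {
  mode <- match.arg(mode)
  R2 <- (X - fit)^2                       # squared residuals
  mask <- matrix(FALSE, nrow(X), ncol(X))
  if (mode == "cell") {
    n_trim <- round(gamma * length(X))
    idx <- order(R2, decreasing = TRUE)[seq_len(n_trim)]  # linear indices
    mask[idx] <- TRUE
  } else if (mode == "row") {
    n_trim <- round(gamma * nrow(X))
    rows <- order(rowSums(R2), decreasing = TRUE)[seq_len(n_trim)]
    mask[rows, ] <- TRUE
  } else {
    n_trim <- round(gamma * ncol(X))
    cols <- order(colSums(R2), decreasing = TRUE)[seq_len(n_trim)]
    mask[, cols] <- TRUE
  }
  mask
}

# Example: a constant 4 by 5 matrix with one corrupted cell at row 2, column 3
fit <- matrix(1, nrow = 4, ncol = 5)
X <- fit
X[2, 3] <- 10
cell_mask <- flag_outliers(X, fit, gamma = 0.05, mode = "cell")  # 1 of 20 cells
row_mask  <- flag_outliers(X, fit, gamma = 0.25, mode = "row")   # 1 of 4 rows
col_mask  <- flag_outliers(X, fit, gamma = 0.20, mode = "col")   # 1 of 5 columns
which(cell_mask, arr.ind = TRUE)  # the flagged cell is row 2, column 3
```

Cells flagged in the mask would then be left out of the residual sum, so a large corruption no longer pulls the factors toward it.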

After downloading the package file “rnmf_0.5.tar.gz” from [GitHub repo address here], put it in your preferred working directory and run both of the following lines (remove the ‘##’ from the first line):

## install.packages("rnmf_0.5.tar.gz", repos = NULL, type = "source")
library(rNMF)
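If the installation fails, a common cause is that the package file is not in the current working directory. A minimal sketch of a guard around the install call follows; the wrapper `install_if_present` is hypothetical and uses only base R functions.

```r
# Illustrative only: install from the local source file if it can be found.
# `install_if_present` is a hypothetical helper, not part of the rNMF package.
install_if_present <- function(tarball = "rnmf_0.5.tar.gz") {
  if (!file.exists(tarball)) {
    message("Package file not found in ", getwd(), ": ", tarball)
    return(invisible(FALSE))
  }
  install.packages(tarball, repos = NULL, type = "source")
  invisible(TRUE)
}

# A file that does not exist is reported instead of raising an install error
found <- install_if_present("no_such_file.tar.gz")
```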

Examples

rNMF can be used on any nonnegative data matrix. For illustration purposes, we give two examples with image data.

The first example demonstrates the decomposition and outlier extraction of multiple corrupted images by rNMF. The second example demonstrates the compression of a single corrupted image by rNMF.

Example 1

First load the built-in data set ‘Symbols_c’, a 5625 by 30 matrix where each column contains a vectorized 75 by 75 image.

data(Symbols_c)
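To clarify what "vectorized" means here, the sketch below builds a synthetic matrix with the same layout and recovers one image from its column. It assumes column-major vectorization, which is R's default for `as.vector()` and `matrix()`; whether ‘Symbols_c’ was built that way is an assumption, and the random images are stand-ins for the real data.

```r
# Illustrative only: synthetic stand-in for a matrix of vectorized images.
# Column-major vectorization (R's default) is an assumption here.
side <- 75
n_images <- 30
set.seed(1)
images <- lapply(seq_len(n_images), function(i) matrix(runif(side^2), side, side))

# Stack each image as one column: 75 * 75 = 5625 rows, one column per image
X <- sapply(images, as.vector)
dim(X)  # 5625 30

# Recover the first image from its column
first_image <- matrix(X[, 1], nrow = side, ncol = side)
```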

The function ‘see()’ shows the corrupted images:

see(Symbols_c, title = "Corrupted data set")

Solid boxes in the images are corruptions. In the following we compare the decomposition results by regular NMF and rNMF.

Regular NMF decomposition (gamma = 0, k = 4) gives the following result.

res <- rnmf(Symbols_c, k = 4, showprogress = FALSE, my.seed = 100)
## Done. Time used:
##    user  system elapsed
##   4.540   0.096   4.713
## No trimming.
## Input matrix dimension: 5625 by 30
## Left matrix: 5625 by 4. Right matrix: 4 by 30
## alpha = 0. beta = 0
## Number of max iterations = 20
## Number of iterations = 16

res$fit gives the reconstruction:

see(res$fit, title = "Regular NMF reconstruction with k = 4")

The basis vectors of the regular NMF decomposition are given by res$W:

see(res$W, title = "Regular NMF basis", layout = c(1,4))

Shadows in the images above indicate that the corruptions were not removed and contaminated the decomposition.
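For readers unfamiliar with regular NMF, the untrimmed and unpenalized problem can be sketched in a few lines of base R. The sketch uses the Lee and Seung multiplicative updates for the squared Frobenius error, a standard NMF algorithm that is not necessarily what rNMF uses internally. The function `nmf_multiplicative` and its arguments are hypothetical.

```r
# Illustrative only: plain NMF via Lee-Seung multiplicative updates.
# `nmf_multiplicative` is a hypothetical helper, not the rNMF algorithm.
nmf_multiplicative <- function(X, k, maxit = 200, seed = 1, eps = 1e-9) {
  stopifnot(all(X >= 0), k >= 1)
  set.seed(seed)
  p <- nrow(X)
  n <- ncol(X)
  W <- matrix(runif(p * k), p, k)  # random nonnegative start
  H <- matrix(runif(k * n), k, n)
  err <- numeric(maxit)
  for (it in seq_len(maxit)) {
    # Elementwise updates keep W and H nonnegative; eps avoids division by zero
    H <- H * (t(W) %*% X) / (t(W) %*% W %*% H + eps)
    W <- W * (X %*% t(H)) / (W %*% H %*% t(H) + eps)
    err[it] <- sum((X - W %*% H)^2)
  }
  list(W = W, H = H, fit = W %*% H, error = err)
}

# Example on a synthetic nonnegative 20 by 15 matrix built from 4 components
set.seed(42)
X <- matrix(runif(20 * 4), 20, 4) %*% matrix(runif(4 * 15), 4, 15)
res <- nmf_multiplicative(X, k = 4)
```

Every cell contributes to the squared error in this scheme, which is why a few heavily corrupted cells can leave visible traces in the basis vectors.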

Now the same data is decomposed by robust NMF (rNMF) with 3% trimming (gamma = 0.03, k = 4):

res2 <- rnmf(Symbols_c, k = 4, gamma = 0.03, showprogress = FALSE,
             my.seed = 100, tol = 0.0001, maxit = 50)
## Done. Time used:
##    user  system elapsed
##  21.682   0.487  22.423
##
##  Trimming mode = "cell". Proportion trimmed: 0.03
## Input matrix dimension: 5625 by 30
## Left matrix: 5625 by 4. Right matrix: 4 by 30
## alpha = 0. beta = 0
## Number of max iterations = 50
## Number of iterations = 23

rNMF reconstruction:

see(res2$fit, title = "rNMF reconstruction with k = 4")

rNMF basis vectors:

see(res2$W, title = "rNMF basis vectors", layout = c(1,4))

The results are much cleaner. The outliers are extracted as follows:

outliers <- matrix(0, nrow = nrow(Symbols_c), ncol = ncol(Symbols_c))
outliers[res2$trimmed[[res2$niter]]] <- 1
see(outliers, title = "Outliers extracted by rNMF")

Example 2

In this example we compare the compression of a single corrupted image by regular NMF and by rNMF. First we load a built-in corrupted face image.

data(face)

Let’s look at the corrupted face image:

see(face, title = "Corrupted face image", col = "grey", input = "single")

Regular NMF compression (trim = FALSE)

A compression by the regular NMF is shown in the following:

res <- rnmf(face, k = 10, showprogress = FALSE, my.seed = 100)
## Done. Time used:
##    user  system elapsed
##   0.268   0.006   0.277
## No trimming.
## Input matrix dimension: 192 by 168
## Left matrix: 192 by 10. Right matrix: 10 by 168
## alpha = 0. beta = 0
## Number of max iterations = 20
## Number of iterations = 8
see(res$fit, title = "NMF compression", col = "grey", input = "single")

The compression is heavily contaminated.

rNMF compression with trimming

Now we compress the corrupted image with rNMF:

res2 <- rnmf(face, k = 10, gamma = 0.025, showprogress = FALSE, my.seed = 100)
## Done. Time used:
##    user  system elapsed
##   0.881   0.010   0.898
##
##  Trimming mode = "cell". Proportion trimmed: 0.025
## Input matrix dimension: 192 by 168
## Left matrix: 192 by 10. Right matrix: 10 by 168
## alpha = 0. beta = 0
## Number of max iterations = 20
## Number of iterations = 7

see(res2$fit, title = "rNMF compression", col = "grey", input = "single")

Much better, isn’t it?
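The compression in these examples comes from storing the two factors instead of the full matrix: a p by k matrix plus a k by n matrix takes p * k + k * n numbers rather than p * n. The short calculation below uses the dimensions reported in the output of both examples; the helper `storage_ratio` is hypothetical.

```r
# Storage cost of the two factors relative to the full matrix.
# `storage_ratio` is a hypothetical helper, not an rNMF function.
storage_ratio <- function(p, n, k) {
  (p * k + k * n) / (p * n)
}

# Dimensions as reported in the rnmf() output of the two examples
face_ratio    <- storage_ratio(p = 192,  n = 168, k = 10)  # 3600 / 32256
symbols_ratio <- storage_ratio(p = 5625, n = 30,  k = 4)   # 22620 / 168750
round(c(face = face_ratio, symbols = symbols_ratio), 3)
```

For the face image the factors need about 11% of the original storage, and for the symbols data set about 13%.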