NMFk: Nonnegative Matrix Factorization with $k$-means clustering

NMFk

NMFk is a code within the SmartTensors framework for unsupervised, supervised and physics-informed (scientific) machine learning (ML) and artificial intelligence (AI) (web source).

NMFk

Here, an example problem demonstrating how NMFk can be applied to solve a blind source separation problem.

The goal is to extract unknown signatures embedded (mixed) in unknown fashion in analyzed datasets.

NMFk also identifies how many are the unknown (hidden, latent) signatures.

NMFk also estimates the mixing ratios at each sensor.

The extracted signatures can be also unknown sources, signals or features depending on the analyzed dataset

This type of analysis is also called feature extraction.

In summary, NMFk automatically:

NMFk installation

If NMFk is not installed, first execute in the Julia REPL:

import Pkg
Pkg.add("NMFk")
Pkg.add("Mads")
Pkg.add("Cairo")
Pkg.add("Fontconfig")
Pkg.add("Gadfly")

Loading NMFk in Julia

Synthetic problem setup

Let us generate 3 random signals:

The singals look like this:

We can collect the 3 signal vectors into a signal matrix W:

Now we can mix the signals in matrix W to produce a data matrix X representing data collected at 5 sensors (e.g., measurement devices or montoring wells at different locations).

Each of the 5 sensors is observing some mixture of the signals in W.

The way the 3 signals are mixed at the sensors is represented by the mixing matrix H.

Let us define the mixing matrix H as:

Each column of the H matrix defines how the 3 signals are represented in each sensors.

For example, the first sensor (column 1 above) detects only Signals 1 and 3.

Signal 2 is missing because H[2,1] is equal to zero.

The second sensor (column 2 above) detects Signals 1 and 2.

Signal 3 is missing because H[3,2] is equal to zero.

The entries of H matrix also define the proportions at which the signals are mixed.

For example, the first sensor (column 1 above) detects Signal 3 three times stronger than Signal 1.

The data matrix X is formed by multiplying W and H matrices.

X defines the actual dataset observed at the 4 sensors.

The data matrix X looks like this:

NMFk analysis

Now, we can assume that we only know the data matrix X.

The W and H matrices are assumed to be unknown and will be estimated by NMFk.

NMFk analysis of the data matrix X will automatically:

This can be done based only on the information in X:

NMFk returns the estimated optimal number of signals kopt which in this case, as expected, is equal to 3.

The optimal number of signals is estimated the following graph showing the quality (fit) and robustness of the soultion vs. number of signals:

NMFk also returns estimates of matrices W and H.

Here, the estimates of matrices W and H are stored as We and He objects.

We[kopt] and He[kopt] are scaled versions of the original W and H matrices:

The extracted signals are ordered by their expected importance.

The most dominant is the first signal, which is captured by Column 1 of We[kopt] and Row 1 of He[kopt].

The least dominant is the third (last) signal, which is captured by Column 3 of We[kopt] and Row 3 of He[kopt].

Note that the order of columns ('signals') in W and We[kopt] are not expected to match.

In the same way, the order of rows ('sensors') in H and He[kopt] are also not expected to match.

In general, the estimated order of 'signals' may be slightly different every time the code is executed due to randomness of the processes.

Below are plots providing comparisons between the original and estimated W an H matrices.

NMFk Pros

NMFk Cons

Math behind NMF

NMF factorizes (splits up) a non-negative input matrix ($\mathbf{X}$) into two smaller rank matrices $\mathbf{W}$ and $\mathbf{H}$.

To achieve this, NMF minimizes the following function:

$$ \Vert \mathbf{X} - \mathbf{W} \times \mathbf{H} \Vert_2 $$

NMF starts with either random or specified initialization of $\mathbf{W}$ and $\mathbf{H}$ matrices.

NMF estimates $\mathbf{W}$ and $\mathbf{H}$ that approximate $\mathbf{X}$.