Saturday, June 02, 2012

Sparse Matrix Multiplication

I want the Math.NET Numerics developers to know their work is great, they put together an easy to use, astoundingly well documented numerical library for .NET. Please know this little criticism comes from a place of respect. It could even be that the code has been updated since your last release and what I'm going to point out is no longer a problem.

I really don't know much about calculus and mathematics at that level. I barely passed A-level maths, and the only time I've used any of the knowledge gained therein was when I had to calculate the first derivative of 1-e-x at university. My mathematics skills are weak (sadly). So, when, in mid-April, I was asked at work to implement some maths heavy algorithms, I felt suitably challenged. Thankfully the scientist who was feeding me the algorithms understood them really well and was on hand to explain things to me over and over again until we finally got things working yesterday. Yay!

Some of what we did relied on sparse matrices, something I had heard of, but never used. So my first thought was that I needed a third party library to do these calculations. The library we are currently using is the bluebit .NET matrix library, it's not perfect and we'll have to replace it with something faster, but for the moment it makes the code testable. This matrix was not my first choice, ideally I wanted something we didn't have to pay for. My first stop was the Math.NET Numerics library. This, unfortunately proved to be too slow. I also tried out Extreme Optimization, but this library was also too slow. Other libraries I looked at were ILNumerics, IMSL.NET and Center Space NMath. I looked but I did not test these last three because each library's API and help were so bad I couldn't figure out how to do what I needed to do. I don't have time to figure out matrix maths, this is why I'm looking for a library. If you want me to choose yours, make it easy to use.

So that was the bulk of the outcome of my foray in numerical libraries. Bluebit is my current choice, but I will have to change it for something faster. This is not the only thing I learned. I learned something that I hope, if they haven't already, the Math.NET developers will be able to use in their code. I've not time to dive into the project, and patch it myself — as I've said, my understanding of the maths is not great — so feel free to take the code here and fix it to work in the library.

At work I'm dealing with quite large matrices. The stuff I've been testing with is 8K x 8K points, and the real data will probably be up to 32K x 32K. But these are sparse matrices, so working with them should not be too processor and memory intensive. The major things I need to do are transposition, multiplication and inversion. Inversion is the killer, and understanding it is currently over my head. It's the place where Extreme Optimization fell down, and where bluebit struggles. I need the algorithms to run in a few seconds. Currently, with 16K x 16K points and bluebit, it's taking 2 minutes. The algorithm did not complete with the other two libraries. I waited for over half an hour, and still nothing, and that was with 8K data.

The first problem that Math.NET encountered was with the multiplication of the matrices. This is what I hope I've optimised. All I've done is profile their code and change the bit that took forever - assigning data to a point in the matrix

My first step was to write these two tests, to make sure I was multiplying the matrices correctly:

[Test]
public void MatrixMultiplication()
{
    var leftM = new double[,] {{4, 5, 6, 7, 8, 1, 2}, {3, 9, 6, 7, 3, 3, 1}, {2, 2, 8, 4, 1, 8, 1}, {1, 9, 9, 4, 3, 1, 2}};
    var rightM = new double[,] {{1, 8, 1}, {2, 6, 2}, {3, 4, 1}, {4, 2, 2}, {5, 1, 1}, {6, 3, 2}, {7, 5, 1}};
    var expectedM = new double[,] {{120, 121, 46}, {107, 133, 51}, {106, 98, 40}, {97, 122, 43}};

    var sm = new SparseMatrix();

    var resultM = sm.MultiplyMatrices(leftM, rightM);

    Assert.AreEqual(expectedM.Rank, resultM.Rank);
    Assert.AreEqual(expectedM.GetLength(0), resultM.GetLength(0));
    Assert.AreEqual(expectedM.GetLength(1), resultM.GetLength(1));

    for(int row = 0; row < 4; row++)
    {
        for(int col = 0; col < 3; col++)
        {
            Assert.AreEqual(expectedM[row, col], resultM[row, col]);
        }
    }
}

[Test]
public void SparseMatrixMultiplication()
{
    var leftM = new double[,] {{1,2,3,0,0,0,0,0,0,0}, {0,0,0,0,0,1,2,0,0,0}, {1,0,4,0,0,5,0,0,0,0}, {0,4,0,5,0,6,0,0,7,0}, {9,0,0,0,0,0,8,0,0,0}};
    var rightM = new double[,] {{0,2,0,4,0}, {1,0,0,1,1}, {3,0,1,3,0}, {4,0,0,0,0}, {0,5,6,0,0}, {0,9,0,6,0}, {0,1,0,3,0}, {0,0,8,0,9}, {0,0,0,0,7}, {0,1,0,0,5}};
    var expectedM = new double[,] {{11,2,3,15,2}, {0,11,0,12,0}, {12,47,4,46,0}, {24,54,0,40,53}, {0,26,0,60,0}};
 
    var sm = new SparseMatrix();

    var resultM = bc.MultiplyMatrices(leftM, rightM);

    for (int row = 0; row < 4; row++)
    {
        for (int col = 0; col < 3; col++)
        {
            Assert.AreEqual(expectedM[row, col], resultM[row, col]);
        }
    }
}

(SparseMatrix isn't really the name of the class, I put the multiplication into the class that was handling the algorithm, but I'm not allowed to talk about that!)

Then I spent ages struggling (because of my ignorance, the code is easy to read) with the Math.NET code to try and understand sparse matrix multiplication - how it could be faster than normal matrix multiplication, and how I could implement it faster. It took a couple of days. I spent a couple of days, rather than giving up and finding a proprietary library right away, because I thought that Math.NET would do the business when it came to inversion. Sadly this isn't the case. Anyway, this is my optimised sparse matrix multiplication method:

private IEnumerable GetNonZeroIndicesForMatrixColumn(double[,] matrix, long col, int rowcount)
{
    for (int row = 0; row < rowcount; row++)
    {
        if (matrix[row, col] != 0)
        {
            yield return row;
        }
    }
}

private IEnumerable GetNonZeroIndicesForMatrixRow(double[,] matrix, int row, int colcount)
{
    for (int col = 0; col < colcount; col++)
    {
        if (matrix[row, col] != 0)
        {
            yield return col;
        }
    }
}
        
/// <summary>
/// Matrix multiplication optimised for sparse matrices
/// </summary>
/// <param name="matrix1">Matrix on the left of the multiplication</param>
/// <param name="matrix2">Matrix on the right of the multiplication</param>
/// <returns>A matrix that is the multiplication of the two passed in</returns>
public double[,] MultiplyMatrices(double[,] matrix1, double[,] matrix2)
{
    int j = matrix1.GetLength(1);
    if (j != matrix2.GetLength(0))
    {
        throw new ArgumentException("matrix1 must have the same number of columns as matrix2 has rows.");
    }

    int m1Rows = matrix1.GetLength(0);
    int m2Cols = matrix2.GetLength(1);
    double[,] result = new double[m1Rows, m2Cols];

    var nonZeroRows = new List[m1Rows];

    Parallel.For(0, m1Rows, row =>
    {
        nonZeroRows[row] = GetNonZeroIndicesForMatrixRow(matrix1, row, j).ToList();
    });

    var nonZeroColumns = new List[m2Cols];

    Parallel.For(0, m2Cols, col =>
    {
        nonZeroColumns[col] = GetNonZeroIndicesForMatrixColumn(matrix2, col, j).ToList();
    });



    Parallel.For(0, m1Rows , row =>
    {
        Parallel.For(0, m2Cols, column =>
        {
            var ns = nonZeroColumns[column].Intersect(nonZeroRows[row]);
            double sum = ns.Sum(n => matrix1[row, n] * matrix2[n, column]);
            result[row, column] = sum;
        });
    });

    return result;
}

As you can see, there is a lot of reliance on the parallel methods that come with .NET 4. That, coupled with the trick of getting the intersection of the non-zeros in the rows of the left matrix with the columns of the right matrix, seems to be the major advantage of my method over Math.NET, because their assignments can't be done in parallel. This could be to do with Silverlight compatibility issues, I don't know. I don't have to worry about Silverlight.

I have run a benchmark for my code. I created a 5000 x 5000 point matrix and filled it at random points with random data (well, pseudo-random). I benchmarked at 5, 50, 150 and 500 non-zero items per row. I ran the test 10 times, to get a mean. The table shows the results:

Number of non-zeros per rowMean seconds taken to multiplyStandard Deviation
56.244657160.1037383251
5051.109723320.8521258197
15093.2973362977.751344564
50013.184354116.4991175895

I find it strange that the standard deviation for the 150 condition is so high. If anyone can see a problem in my code, I'd be really happy to hear it! The full test is below:

toggle test code
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Threading.Tasks;

namespace Math.NetBenchmark
{
    class Program
    {
        private static Random _r = new Random();

        static void Main(string[] args)
        {
            const int rows = 5000;
            const int cols = 5000;
            var nonzerosPerRow = new [] {5, 50, 150, 500};

            Console.WriteLine("started");
            using (var sw = new StreamWriter("MyMX10.results"))
            {
                sw.WriteLine("Number of non-zeros,Time taken");
                foreach (var nzpr in nonzerosPerRow)
                {
                    Console.Write(nzpr+" - making left");
                    var left = MakeMatrix(rows, cols, nzpr);
                    Console.Write("making right");
                    var right = MakeMatrix(rows, cols, nzpr);
                    Console.Write("multiplying...");
                    var startTime = DateTime.Now;
                    MultiplyMatrices(left, right);
                    var endTime = DateTime.Now;
                    var diff = endTime - startTime;
                    sw.WriteLine(nzpr + "," + diff.TotalSeconds);
                    Console.WriteLine("done");
                }
            }

            Console.WriteLine("done");
        }

        private static double[,] MakeMatrix(int rows, int cols, int nonzerosPerRow)
        {
            var result = new double[rows, cols];
            var colsPoss = Enumerable.Range(0, cols).ToArray();
            Parallel.For(0, rows, iRow =>
            {
                var posleft = colsPoss;
                Console.Write(".");
                for (int i = 0; i < nonzerosPerRow; i++)
                {
                    int posindex = _r.Next(posleft.Length);
                    int index = posleft[posindex];
                    result[iRow, index] = 1+_r.NextDouble();
                    posleft = posleft.Take(index).Concat(posleft.Skip(index+1)).ToArray();
                }
            });
            return result;
        }

        private static IEnumerable GetNonZeroIndicesForMatrixColumn(double[,] matrix, long col, int rowcount)
        {
            for (int row = 0; row < rowcount; row++)
            {
                if (matrix[row, col] != 0)
                {
                    yield return row;
                }
            }
        }

        private static IEnumerable GetNonZeroIndicesForMatrixRow(double[,] matrix, int row, int colcount)
        {
            for (int col = 0; col < colcount; col++)
            {
                if (matrix[row, col] != 0)
                {
                    yield return col;
                }
            }
        }

        public static double[,] MultiplyMatrices(double[,] matrix1, double[,] matrix2)
        {
            int j = matrix1.GetLength(1);
            if (j != matrix2.GetLength(0))
            {
                throw new ArgumentException("matrix1 must have the same number of columns as matrix2 has rows.");
            }

            int m1Rows = matrix1.GetLength(0);
            int m2Cols = matrix2.GetLength(1);
            double[,] result = new double[m1Rows, m2Cols];

            var nonZeroRows = new List[m1Rows];

            Parallel.For(0, m1Rows, row =>
            {
                nonZeroRows[row] = GetNonZeroIndicesForMatrixRow(matrix1, row, j).ToList();
            });

            var nonZeroColumns = new List[m2Cols];

            Parallel.For(0, m2Cols, col =>
            {
                nonZeroColumns[col] = GetNonZeroIndicesForMatrixColumn(matrix2, col, j).ToList();
            });

            Parallel.For(0, m1Rows, row =>
            {
                Parallel.For(0, m2Cols, column =>
                {
                    var ns = nonZeroColumns[column].Intersect(nonZeroRows[row]);
                    double sum = ns.Sum(n => matrix1[row, n] * matrix2[n, column]);
                    result[row, column] = sum;
                });
            });

            return result;
        }
    }
}