Showing posts with label algorithms. Show all posts
Showing posts with label algorithms. Show all posts

Saturday, June 02, 2012

Sparse Matrix Multiplication

I want the Math.NET Numerics developers to know their work is great, they put together an easy to use, astoundingly well documented numerical library for .NET. Please know this little criticism comes from a place of respect. It could even be that the code has been updated since your last release and what I'm going to point out is no longer a problem.

I really don't know much about calculus and mathematics at that level. I barely passed A-level maths, and the only time I've used any of the knowledge gained therein was when I had to calculate the first derivative of 1-e-x at university. My mathematics skills are weak (sadly). So, when, in mid-April, I was asked at work to implement some maths heavy algorithms, I felt suitably challenged. Thankfully the scientist who was feeding me the algorithms understood them really well and was on hand to explain things to me over and over again until we finally got things working yesterday. Yay!

Some of what we did relied on sparse matrices, something I had heard of, but never used. So my first thought was that I needed a third party library to do these calculations. The library we are currently using is the bluebit .NET matrix library, it's not perfect and we'll have to replace it with something faster, but for the moment it makes the code testable. This matrix was not my first choice, ideally I wanted something we didn't have to pay for. My first stop was the Math.NET Numerics library. This, unfortunately proved to be too slow. I also tried out Extreme Optimization, but this library was also too slow. Other libraries I looked at were ILNumerics, IMSL.NET and Center Space NMath. I looked but I did not test these last three because each library's API and help were so bad I couldn't figure out how to do what I needed to do. I don't have time to figure out matrix maths, this is why I'm looking for a library. If you want me to choose yours, make it easy to use.

So that was the bulk of the outcome of my foray in numerical libraries. Bluebit is my current choice, but I will have to change it for something faster. This is not the only thing I learned. I learned something that I hope, if they haven't already, the Math.NET developers will be able to use in their code. I've not time to dive into the project, and patch it myself — as I've said, my understanding of the maths is not great — so feel free to take the code here and fix it to work in the library.

At work I'm dealing with quite large matrices. The stuff I've been testing with is 8K x 8K points, and the real data will probably be up to 32K x 32K. But these are sparse matrices, so working with them should not be too processor and memory intensive. The major things I need to do are transposition, multiplication and inversion. Inversion is the killer, and understanding it is currently over my head. It's the place where Extreme Optimization fell down, and where bluebit struggles. I need the algorithms to run in a few seconds. Currently, with 16K x 16K points and bluebit, it's taking 2 minutes. The algorithm did not complete with the other two libraries. I waited for over half an hour, and still nothing, and that was with 8K data.

The first problem that Math.NET encountered was with the multiplication of the matrices. This is what I hope I've optimised. All I've done is profile their code and change the bit that took forever - assigning data to a point in the matrix

My first step was to write these two tests, to make sure I was multiplying the matrices correctly:

[Test]
public void MatrixMultiplication()
{
    var leftM = new double[,] {{4, 5, 6, 7, 8, 1, 2}, {3, 9, 6, 7, 3, 3, 1}, {2, 2, 8, 4, 1, 8, 1}, {1, 9, 9, 4, 3, 1, 2}};
    var rightM = new double[,] {{1, 8, 1}, {2, 6, 2}, {3, 4, 1}, {4, 2, 2}, {5, 1, 1}, {6, 3, 2}, {7, 5, 1}};
    var expectedM = new double[,] {{120, 121, 46}, {107, 133, 51}, {106, 98, 40}, {97, 122, 43}};

    var sm = new SparseMatrix();

    var resultM = sm.MultiplyMatrices(leftM, rightM);

    Assert.AreEqual(expectedM.Rank, resultM.Rank);
    Assert.AreEqual(expectedM.GetLength(0), resultM.GetLength(0));
    Assert.AreEqual(expectedM.GetLength(1), resultM.GetLength(1));

    for(int row = 0; row < 4; row++)
    {
        for(int col = 0; col < 3; col++)
        {
            Assert.AreEqual(expectedM[row, col], resultM[row, col]);
        }
    }
}

[Test]
public void SparseMatrixMultiplication()
{
    var leftM = new double[,] {{1,2,3,0,0,0,0,0,0,0}, {0,0,0,0,0,1,2,0,0,0}, {1,0,4,0,0,5,0,0,0,0}, {0,4,0,5,0,6,0,0,7,0}, {9,0,0,0,0,0,8,0,0,0}};
    var rightM = new double[,] {{0,2,0,4,0}, {1,0,0,1,1}, {3,0,1,3,0}, {4,0,0,0,0}, {0,5,6,0,0}, {0,9,0,6,0}, {0,1,0,3,0}, {0,0,8,0,9}, {0,0,0,0,7}, {0,1,0,0,5}};
    var expectedM = new double[,] {{11,2,3,15,2}, {0,11,0,12,0}, {12,47,4,46,0}, {24,54,0,40,53}, {0,26,0,60,0}};
 
    var sm = new SparseMatrix();

    var resultM = bc.MultiplyMatrices(leftM, rightM);

    for (int row = 0; row < 4; row++)
    {
        for (int col = 0; col < 3; col++)
        {
            Assert.AreEqual(expectedM[row, col], resultM[row, col]);
        }
    }
}

(SparseMatrix isn't really the name of the class, I put the multiplication into the class that was handling the algorithm, but I'm not allowed to talk about that!)

Then I spent ages struggling (because of my ignorance, the code is easy to read) with the Math.NET code to try and understand sparse matrix multiplication - how it could be faster than normal matrix multiplication, and how I could implement it faster. It took a couple of days. I spent a couple of days, rather than giving up and finding a proprietary library right away, because I thought that Math.NET would do the business when it came to inversion. Sadly this isn't the case. Anyway, this is my optimised sparse matrix multiplication method:

private IEnumerable GetNonZeroIndicesForMatrixColumn(double[,] matrix, long col, int rowcount)
{
    for (int row = 0; row < rowcount; row++)
    {
        if (matrix[row, col] != 0)
        {
            yield return row;
        }
    }
}

private IEnumerable GetNonZeroIndicesForMatrixRow(double[,] matrix, int row, int colcount)
{
    for (int col = 0; col < colcount; col++)
    {
        if (matrix[row, col] != 0)
        {
            yield return col;
        }
    }
}
        
/// <summary>
/// Matrix multiplication optimised for sparse matrices
/// </summary>
/// <param name="matrix1">Matrix on the left of the multiplication</param>
/// <param name="matrix2">Matrix on the right of the multiplication</param>
/// <returns>A matrix that is the multiplication of the two passed in</returns>
public double[,] MultiplyMatrices(double[,] matrix1, double[,] matrix2)
{
    int j = matrix1.GetLength(1);
    if (j != matrix2.GetLength(0))
    {
        throw new ArgumentException("matrix1 must have the same number of columns as matrix2 has rows.");
    }

    int m1Rows = matrix1.GetLength(0);
    int m2Cols = matrix2.GetLength(1);
    double[,] result = new double[m1Rows, m2Cols];

    var nonZeroRows = new List[m1Rows];

    Parallel.For(0, m1Rows, row =>
    {
        nonZeroRows[row] = GetNonZeroIndicesForMatrixRow(matrix1, row, j).ToList();
    });

    var nonZeroColumns = new List[m2Cols];

    Parallel.For(0, m2Cols, col =>
    {
        nonZeroColumns[col] = GetNonZeroIndicesForMatrixColumn(matrix2, col, j).ToList();
    });



    Parallel.For(0, m1Rows , row =>
    {
        Parallel.For(0, m2Cols, column =>
        {
            var ns = nonZeroColumns[column].Intersect(nonZeroRows[row]);
            double sum = ns.Sum(n => matrix1[row, n] * matrix2[n, column]);
            result[row, column] = sum;
        });
    });

    return result;
}

As you can see, there is a lot of reliance on the parallel methods that come with .NET 4. That, coupled with the trick of getting the intersection of the non-zeros in the rows of the left matrix with the columns of the right matrix, seems to be the major advantage of my method over Math.NET, because their assignments can't be done in parallel. This could be to do with Silverlight compatibility issues, I don't know. I don't have to worry about Silverlight.

I have run a benchmark for my code. I created a 5000 x 5000 point matrix and filled it at random points with random data (well, pseudo-random). I benchmarked at 5, 50, 150 and 500 non-zero items per row. I ran the test 10 times, to get a mean. The table shows the results:

Number of non-zeros per rowMean seconds taken to multiplyStandard Deviation
56.244657160.1037383251
5051.109723320.8521258197
15093.2973362977.751344564
50013.184354116.4991175895

I find it strange that the standard deviation for the 150 condition is so high. If anyone can see a problem in my code, I'd be really happy to hear it! The full test is below:

toggle test code
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Threading.Tasks;

namespace Math.NetBenchmark
{
    class Program
    {
        private static Random _r = new Random();

        static void Main(string[] args)
        {
            const int rows = 5000;
            const int cols = 5000;
            var nonzerosPerRow = new [] {5, 50, 150, 500};

            Console.WriteLine("started");
            using (var sw = new StreamWriter("MyMX10.results"))
            {
                sw.WriteLine("Number of non-zeros,Time taken");
                foreach (var nzpr in nonzerosPerRow)
                {
                    Console.Write(nzpr+" - making left");
                    var left = MakeMatrix(rows, cols, nzpr);
                    Console.Write("making right");
                    var right = MakeMatrix(rows, cols, nzpr);
                    Console.Write("multiplying...");
                    var startTime = DateTime.Now;
                    MultiplyMatrices(left, right);
                    var endTime = DateTime.Now;
                    var diff = endTime - startTime;
                    sw.WriteLine(nzpr + "," + diff.TotalSeconds);
                    Console.WriteLine("done");
                }
            }

            Console.WriteLine("done");
        }

        private static double[,] MakeMatrix(int rows, int cols, int nonzerosPerRow)
        {
            var result = new double[rows, cols];
            var colsPoss = Enumerable.Range(0, cols).ToArray();
            Parallel.For(0, rows, iRow =>
            {
                var posleft = colsPoss;
                Console.Write(".");
                for (int i = 0; i < nonzerosPerRow; i++)
                {
                    int posindex = _r.Next(posleft.Length);
                    int index = posleft[posindex];
                    result[iRow, index] = 1+_r.NextDouble();
                    posleft = posleft.Take(index).Concat(posleft.Skip(index+1)).ToArray();
                }
            });
            return result;
        }

        private static IEnumerable GetNonZeroIndicesForMatrixColumn(double[,] matrix, long col, int rowcount)
        {
            for (int row = 0; row < rowcount; row++)
            {
                if (matrix[row, col] != 0)
                {
                    yield return row;
                }
            }
        }

        private static IEnumerable GetNonZeroIndicesForMatrixRow(double[,] matrix, int row, int colcount)
        {
            for (int col = 0; col < colcount; col++)
            {
                if (matrix[row, col] != 0)
                {
                    yield return col;
                }
            }
        }

        public static double[,] MultiplyMatrices(double[,] matrix1, double[,] matrix2)
        {
            int j = matrix1.GetLength(1);
            if (j != matrix2.GetLength(0))
            {
                throw new ArgumentException("matrix1 must have the same number of columns as matrix2 has rows.");
            }

            int m1Rows = matrix1.GetLength(0);
            int m2Cols = matrix2.GetLength(1);
            double[,] result = new double[m1Rows, m2Cols];

            var nonZeroRows = new List[m1Rows];

            Parallel.For(0, m1Rows, row =>
            {
                nonZeroRows[row] = GetNonZeroIndicesForMatrixRow(matrix1, row, j).ToList();
            });

            var nonZeroColumns = new List[m2Cols];

            Parallel.For(0, m2Cols, col =>
            {
                nonZeroColumns[col] = GetNonZeroIndicesForMatrixColumn(matrix2, col, j).ToList();
            });

            Parallel.For(0, m1Rows, row =>
            {
                Parallel.For(0, m2Cols, column =>
                {
                    var ns = nonZeroColumns[column].Intersect(nonZeroRows[row]);
                    double sum = ns.Sum(n => matrix1[row, n] * matrix2[n, column]);
                    result[row, column] = sum;
                });
            });

            return result;
        }
    }
}

Saturday, August 28, 2010

Learning to see patterns in my own behaviour

So, a week and a half ago I was looking at a question on Stack Overflow (Algorithm to calculate the number of combinations to form 100 ). I set about solving it in Haskell, and came up against a block to my success:

Given a list of numbers xs and another number n, generate a list of all the possible combinations lists of length n that contain the numbers from xs.

So, given the list [1,2] and the number 3, the function should generate this list of lists: [[1,1,1],[1,1,2],[1,2,1],[1,2,2],[2,1,1],[2,1,2],[2,2,1],[2,2,2]]

I was pretty sure that this had been done before, but because I'm trying to get better at deducing algorithms, I'm stubborn, and I'm doing this for fun I decided to figure out the algorithm for myself.

It wasn't as easy as it seemed.

I sat down and wrote out the outputs for a few different sets of inputs, I looked at them, I looked some more. I could see a couple of patterns, namely that (length of xs)n is the length of the final output and that you could create a rectangle of answers with width length of xs and height (length of xs - 1)n. Neither of these were helpful.

I left the problem alone for a while, hoping that time would give me perspective. I was surprised how hard I was finding it to find the pattern.

Today I came back to it with a fresh brain and time to kill. I took a walk to the park, sat down, started to write out the output where the input is a list of length 3, and n as 3. As I was writing, I had the realisation that the way to solve this was to figure out the algorithm of how to write it down. The problem in my previous examples of output was that I hadn't written it in a good enough pattern. I started writing out the output for a different input a list of length 4, with n of 4 (256 items, for those keeping count). This time I was very systematic about how I wrote out the output. I got to the 44th list in the list and stopped to see if I could see it yet. I could: the last element in the individual lists was repeating every 4 items.

I stood up and, as is my wont when I am thinking, I started pacing. I must have looked a little unhinged, as I was pacing in a small circle around my bag.

It took me a few minutes, but eventually I figured out how to represent what I was seeing in my written output as an algorithm: the first time through, each item of xs is appended to an empty list, for each subsequent time through, each item in xs is appended to each list in the list of lists.

In Haskell, I came up with this function to do the work:

makeallsets :: Integral a => [a] -> a -> [[a]] makeallsets xs n = mas (addtoonelist [] xs) xs (n - 1) where mas yss _ 0 = yss mas yss xs (n + 1) = mas (addtoeachlist yss xs) xs n where addtoeachlist [] xs = [] addtoeachlist (ys:yss) xs = (addtoonelist ys xs) ++ (addtoeachlist yss xs) addtoonelist ys [] = [] addtoonelist ys (x:xs) = (x : ys) : (addtoonelist ys xs)

This allowed me to create an answer to the Stack Overflow problem. (Although there's no point posting it for 3 very good reasons: 1. it's not in the target language (which is Scala); 2. It uses the brute force approach; 3. There is already a better answer.)

Score 1 for perseverance!

P.s. if anyone would like to show me a better way, I'd be very glad to hear it.