Efficient in-place sparse matrix transpose [Taken]

A sparse matrix is a two-dimensional array where most of the elements are zero. If a very large sparse matrix is stored in normal array format, a huge amount of memory will be needed to store values that are almost all zero. Instead, large sparse matrices are normally stored in a compacted form, where only the non-zero values are stored. Coordinate format is a very simple representation of a sparse matrix, where each non-zero value is stored alongside its row and column index. Perhaps the most widely-used sparse matrix representation is compressed sparse row (CSR), where non-zero values from the same row are grouped together, and each non-zero value is stored with its corresponding column index.

A fundamental operation on matrices is transpose, where the row and column indices of each non-zero value are swapped. The transpose of a sparse matrix in CSR format can be computed extremely efficiently if it is computed out-of-place, that is the values are copied from the original matrix to a new matrix in a different memory location. However, sparse matrices that are used to solve systems of equations are often extremely large, and there is often not enough memory for both the original and transposed copy. There are big advantages of any approach that can do an in-place transpose, that is transpose the matrix within the memory of the original sparse matrix.

For an N x N sparse matrix in CSR format with Z non-zero values, Saad’s algorithm can compute an in-place sparse matrix transpose in O(Z + N) time, and using O(Z + N) additional temporary space. Crosbie and Gregg’s algorithm computes the in-place transpose in O(Z + N) time and requires O(N) additional temporary space. The goal of the project is to evaluate algorithmic improvements for the in-place transpose that reduce the execution time.

A key aspect of this project is measuring the practical reductions in execution time due to algorithmic improvements. It is therefore important that the project be implemented in a compiled programming language where there is a relatively simple relationship from source code to execution time. The obvious choices of language are C or C++, but a compiled language such as Rust might also work.