Understanding Sparse Matrix Storage

Sparse matrices are powerful tools that can be used to save memory when processing and storing matrices that contain a lot of zero values. Within a sparse matrix, only the non-zero values are stored which can mean significant memory savings. For example, a 501-by-501 tri-diagonal matrix contains 251,001 values as a full matrix while it only contains 1,500 non-zero values when stored as sparse. There is some overhead to the sparse storage format but in general it is small relative to the savings of not storing the zero values.1

3-Tuple storage

>> sparse(diag(0:3))
ans =

   (2,2)        1
   (3,3)        2
   (4,4)        3

As is evident from the display format for a sparse matrix, the non-zero values in the matrix are stored as (i,j,value) tuples. The zero elements are implied — anything not stored is treated as a zero.

The find function and the sparse function work directly with these tuples

  • [i,j,v] = find(S) returns the truple row, column and value values as separate arrays.
  • S = sparse(i,j,v) reconstructs the sparse array from the tuple values.

and can be used to switch back and forth between the tuple and the sparse matrix. The tuple format is a particularly compact way of representing sparse data and can be easily written to files that can be read by MATLAB and other data analysis packages.

Column oriented storage

The sparse storage format is optimized for column-oriented operations which can be important to understand when crafting m-files that use the sparse matrix. This can be especially true when working with huge matrices where the orientation of a matrix has a big effect on processing speed or memory requirements. The following figure illustrates the sparse storage approach used within MATLAB.

MATLAB sparse matrix storage scheme

A column spine contains pointers to the elements in each column. In addition, the elements in the column are stored in row sorted order to improve the search time for a particular (row,col) element. Columns without elements contain a pointer to an empty column list.

This has several implications when working with the sparse matrix.

  1. The column size of the matrix size(S,2) is significant since a column pointer is created for a column even if there are no elements in that column.
  2. The row size of the matrix size(S,1) has no impact on the amount of memory used by the sparse matrix.
  3. Rows that contain only zeros take up no space.
  4. Accessing the columns of a sparse matrix is faster than accessing the rows of a sparse matrix.

The ramifications of these implications are that we should prefer algorithms that operate on the columns of sparse matrices and that orient the matrix so that there are less columns than rows.

1You can use the whos command to determine the actual memory allocated to a sparse matrix.