WIP: Add tutorials about ragged tensors. #823

Open
wants to merge 7 commits into
base: master

csukuangfj
Collaborator

No description provided.

@csukuangfj
Collaborator Author

A preview can be found at

https://csukuangfj.github.io/k2/python_tutorials/ragged/basics.html#

@danpovey
Collaborator

danpovey commented Sep 11, 2021 via email

@GNroy

GNroy commented Sep 13, 2021

@csukuangfj Thanks for this tutorial!
Could you please clarify how ragged tensors relate to, say, PyTorch sparse matrices? They look quite similar.

@csukuangfj
Collaborator Author

> @csukuangfj Thanks for this tutorial!
> Could you please clarify how ragged tensors relate to, say, PyTorch sparse matrices? They look quite similar.

TensorFlow has sparse matrices and ragged tensors, see

PyTorch also has sparse matrices and nested tensors, see

We use the same terminology as tf.RaggedTensor (row splits, row ids, etc.), though ragged tensors in k2 were designed independently by @danpovey. We were told only later that TensorFlow uses the same ideas.
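
To make the two terms concrete, here is a minimal pure-Python sketch of how row splits and row ids describe the same structure. The helper names `row_splits_to_row_ids` and `row_ids_to_row_splits` are made up for this illustration; they are not k2 or TensorFlow functions.

```python
from itertools import accumulate


def row_splits_to_row_ids(row_splits):
    """Expand row boundaries into one row index per element."""
    row_ids = []
    for row in range(len(row_splits) - 1):
        count = row_splits[row + 1] - row_splits[row]
        row_ids.extend([row] * count)
    return row_ids


def row_ids_to_row_splits(row_ids, num_rows):
    """Count the elements in each row, then take a running sum starting at 0."""
    counts = [0] * num_rows
    for r in row_ids:
        counts[r] += 1
    return [0] + list(accumulate(counts))


# A ragged tensor shaped like [[a, b], [], [c, d, e]]: 3 rows, 5 elements.
row_splits = [0, 2, 2, 5]
row_ids = row_splits_to_row_ids(row_splits)
print(row_ids)                                     # [0, 0, 2, 2, 2]
print(row_ids_to_row_splits(row_ids, num_rows=3))  # [0, 2, 2, 5]
```

Note that `num_rows` has to be passed explicitly when going from row ids to row splits, because trailing empty rows leave no trace in the row ids.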


A ragged tensor with 2 axes looks similar to a sparse matrix in CSR format, but they are different.

From https://en.wikipedia.org/wiki/Sparse_matrix#Compressed_sparse_row_(CSR,_CRS_or_Yale_format) , a sparse matrix in CSR format has the following components:

  • ROW_INDEX
  • COL_INDEX
  • V
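
As a rough illustration of these three arrays, the following pure-Python sketch builds them from a small dense matrix. The function name `dense_to_csr` is an assumption for this example and does not come from any library.

```python
def dense_to_csr(matrix):
    """Return (row_index, col_index, v) for a dense list-of-lists matrix."""
    row_index, col_index, v = [0], [], []
    for row in matrix:
        for col, x in enumerate(row):
            if x != 0:
                col_index.append(col)  # column of each stored entry
                v.append(x)            # the stored entry itself
        row_index.append(len(v))       # where the next row starts in v
    return row_index, col_index, v


dense = [
    [5, 0, 0, 0],
    [0, 8, 0, 0],
    [0, 0, 3, 0],
    [0, 6, 0, 0],
]
row_index, col_index, v = dense_to_csr(dense)
print(row_index)  # [0, 1, 2, 3, 4]
print(col_index)  # [0, 1, 2, 1]
print(v)          # [5, 8, 3, 6]
```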

ROW_INDEX is called row_splits in k2, and V is called values. That is why I said a ragged tensor in k2 shares some similarities with sparse matrices.

However, there is no COL_INDEX in ragged tensors. We do not view a ragged tensor as a ragged matrix.
For a ragged tensor with 2 axes, what we care about is the number of elements in each row; we don't assign a column index to the entries in a row.
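
A small sketch of that point, in plain Python rather than the k2 API: a 2-axis ragged tensor is fully described by row splits and values, and the row lengths fall out of the row splits. The helper `to_ragged` is hypothetical and exists only for this example.

```python
def to_ragged(rows):
    """Flatten a list of lists into (row_splits, values); no column indexes."""
    row_splits, values = [0], []
    for row in rows:
        values.extend(row)
        row_splits.append(len(values))
    return row_splits, values


row_splits, values = to_ragged([[5, 8], [], [3, 6, 1]])
print(row_splits)  # [0, 2, 2, 5]
print(values)      # [5, 8, 3, 6, 1]

# The number of elements in each row is the difference of adjacent splits.
row_lengths = [end - begin for begin, end in zip(row_splits, row_splits[1:])]
print(row_lengths)  # [2, 0, 3]
```

An entry is identified only by its row and its position within that row; nothing records which "column" it would occupy in a matrix.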

PyTorch's sparse matrices use COO format by default. Either way, they are still matrices with row indexes and column indexes.
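
For comparison, here is a minimal pure-Python sketch of the COO idea: every stored entry carries both a row index and a column index. It does not use PyTorch, and the names `dense_to_coo` and `coo_to_dense` are assumptions for this example.

```python
def dense_to_coo(matrix):
    """Return (rows, cols, vals) listing each nonzero entry with its position."""
    rows, cols, vals = [], [], []
    for r, row in enumerate(matrix):
        for c, x in enumerate(row):
            if x != 0:
                rows.append(r)
                cols.append(c)
                vals.append(x)
    return rows, cols, vals


def coo_to_dense(rows, cols, vals, shape):
    """Rebuild the dense matrix; this needs both the row and column indexes."""
    num_rows, num_cols = shape
    dense = [[0] * num_cols for _ in range(num_rows)]
    for r, c, x in zip(rows, cols, vals):
        dense[r][c] = x
    return dense


dense = [
    [0, 0, 7],
    [0, 0, 0],
    [4, 0, 9],
]
rows, cols, vals = dense_to_coo(dense)
print(rows)  # [0, 2, 2]
print(cols)  # [2, 0, 2]
print(vals)  # [7, 4, 9]
print(coo_to_dense(rows, cols, vals, shape=(3, 3)) == dense)  # True
```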


Also, ragged tensors in k2 are not designed for linear algebra operations: there are no matrix-vector or matrix-matrix multiplications. Instead, they are designed for efficiently manipulating irregular data structures on GPUs.
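
To give a flavor of the kind of operation meant here, the sketch below computes a per-row sum and a per-row maximum over rows of different lengths. It is sequential pure Python for illustration only; it is not how k2 implements these operations on GPU, and `sum_per_row` and `max_per_row` are hypothetical names, not k2 functions.

```python
def sum_per_row(row_splits, values):
    """Sum each row's slice of the flat values list; an empty row sums to 0."""
    return [sum(values[b:e]) for b, e in zip(row_splits, row_splits[1:])]


def max_per_row(row_splits, values, default=None):
    """Take the maximum of each row; empty rows yield `default`."""
    return [
        max(values[b:e], default=default)
        for b, e in zip(row_splits, row_splits[1:])
    ]


# Rows: [1, 3], [], [2, 5, 4]
row_splits = [0, 2, 2, 5]
values = [1, 3, 2, 5, 4]
print(sum_per_row(row_splits, values))  # [4, 0, 11]
print(max_per_row(row_splits, values))  # [3, None, 5]
```

Each row's result depends only on its own slice of `values`, which is the property that lets this kind of reduction be parallelized across rows.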

@GNroy

GNroy commented Sep 14, 2021

Many thanks for the clarification!

A humble suggestion: you might consider including this information in the tutorial because I am hardly the last person to ask questions like this.

3 participants