
Performance loss when matrix size becomes a significant fraction of physical RAM #5

Open
alugowski opened this issue Jul 13, 2023 · 1 comment

Comments

@alugowski

I've been benchmarking a few matrix loaders and noticed a performance degradation in PIGO when the matrix being loaded occupies a large fraction of the available RAM.

See https://github.com/alugowski/sparse-matrix-io-comparison

The machine has 16 GiB of RAM (it's a laptop). The 1 GiB file shows amazing read and write performance from PIGO, but the 10 GiB file is about an order of magnitude slower in both. While experimenting I noticed that the performance drop is gradual and depends on the memory fraction: an 8 GiB file shows less degradation than a 10 GiB one, but more than a 6 GiB one.

(The generated MatrixMarket files and code are defined such that the file size is roughly equal to the memory requirement of the matrix.)

I noticed the PIGO paper used a 1 TB machine to load files of at most ~30 GiB, so this may or may not be important.

@alugowski (Author)

My first suspicion was that PIGO reads the mmapped region twice, with an access pattern such that, if the entire file does not fit in RAM, the second pass will not find its data in the page cache left by the first pass.

There must be more going on, though, because that alone would explain a ~2x slowdown, not a ~10x one. Perhaps OS caching behavior comes into play as well.
