Blog post on UseGalaxy.eu's rustus migration #2056

Merged
merged 8 commits into from
Jun 28, 2023

Conversation

kysrpex
Collaborator

@kysrpex kysrpex commented Jun 23, 2023

A blog post on the recent migration from tusd to rustus on UseGalaxy.eu and the reasons behind it.

The blog post is still a draft only because we need to find a good source explaining why storing tens of thousands of files in the same directory is disadvantageous. The sentence in question:

> Given the high volume of uploads on our server, this is advantageous for scaling our service, because we can avoid storing tens of thousands of files in the same directory.

A table that compares the features of tusd and rustus has also been included in the PR even though it will not be displayed in the blog post (just for our own reference).

@kysrpex kysrpex added the blog label Jun 23, 2023
@kysrpex kysrpex self-assigned this Jun 23, 2023
@kysrpex kysrpex changed the title Blog post on rustus migration Blog post on UseGalaxy.eu's rustus migration Jun 23, 2023
@bgruening
Member

> A table that compares the features of tusd and rustus has also been included in the PR even though it will not be displayed in the blog post (just for our own reference).

I guess there is nothing wrong with putting this into the blog post as well.

@kysrpex
Collaborator Author

kysrpex commented Jun 26, 2023

I actually do not see why having hundreds of thousands of files in the same directory is inherently bad. I have checked different sources and found nothing that backs up this general claim.

As far as I understand, old filesystems had this problem, but modern filesystems use B or B+ trees [1], so it becomes a non-issue. Some people have even benchmarked flat and deep directory structures [2], and the outcome is that, provided you know the path of the file to be read and/or written, the deep directory structure is even counterproductive.
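To make the two layouts being compared concrete, here is a minimal sketch of how a file path could be derived from an upload ID in each case. It is not taken from the linked benchmark, tusd, or rustus: the function names, the SHA-256 prefix scheme, and the `/data/uploads` root are assumptions chosen only for illustration.

```python
import hashlib
from pathlib import PurePosixPath


def flat_path(root: str, upload_id: str) -> PurePosixPath:
    """Flat layout: every upload sits directly under `root`."""
    return PurePosixPath(root) / upload_id


def deep_path(root: str, upload_id: str, levels: int = 2, width: int = 2) -> PurePosixPath:
    """Deep layout (hypothetical scheme): nest uploads under hash-prefix directories."""
    digest = hashlib.sha256(upload_id.encode("utf-8")).hexdigest()
    # Take `levels` consecutive chunks of `width` hex characters as directory names.
    parts = [digest[i * width:(i + 1) * width] for i in range(levels)]
    return PurePosixPath(root).joinpath(*parts, upload_id)


if __name__ == "__main__":
    print(flat_path("/data/uploads", "abc123"))
    print(deep_path("/data/uploads", "abc123"))
```

In both cases the path is computed from the ID alone, so a read or write never needs to list a directory. The deep layout only adds extra path components for the filesystem to resolve.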

I can understand that for operations other than simple reads or writes, such as listing the contents of a directory or removing old uploads, there is definitely an advantage in having the directory structure.
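As an illustration of why cleanup can benefit from a directory structure, here is a rough sketch that removes old uploads by deleting whole per-day directories instead of inspecting every file. The `YYYY-MM-DD` directory naming, the function names, and the retention period are hypothetical; this does not describe how UseGalaxy.eu, tusd, or rustus actually arranges or cleans up uploads.

```python
import shutil
from datetime import date, timedelta
from pathlib import Path


def expired_day_dirs(root: Path, today: date, max_age_days: int) -> list[Path]:
    """Return day directories (named YYYY-MM-DD) that are older than the cutoff."""
    cutoff = today - timedelta(days=max_age_days)
    expired = []
    for entry in root.iterdir():
        if not entry.is_dir():
            continue
        try:
            day = date.fromisoformat(entry.name)
        except ValueError:
            continue  # skip directories that do not follow the assumed layout
        if day < cutoff:
            expired.append(entry)
    return sorted(expired)


def remove_expired(root: Path, today: date, max_age_days: int) -> int:
    """Delete expired day directories and return how many were removed."""
    dirs = expired_day_dirs(root, today, max_age_days)
    for directory in dirs:
        shutil.rmtree(directory)
    return len(dirs)
```

With a layout like this, the cleanup job only lists the top-level directory, which holds one entry per day. With a flat layout it would have to check the age of every single upload.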

I do not know how we are cleaning up old uploads at the moment, so I cannot claim that the deep directory structure is either an advantage or a disadvantage for us. Information is welcome.

[1] - What are the performance implications for millions of files in a modern file system? - https://serverfault.com/a/796696
[2] - Benchmark: Deep directory structure vs. flat directory structure to store millions of files on ext4 - https://medium.com/@hartator/benchmark-deep-directory-structure-vs-flat-directory-structure-to-store-millions-of-files-on-ext4-cac1000ca28

@kysrpex
Collaborator Author

kysrpex commented Jun 27, 2023

> Information is welcome.

Thanks @bgruening and @sj213 for the help!

@kysrpex kysrpex marked this pull request as ready for review June 27, 2023 12:42
@kysrpex kysrpex requested a review from bgruening June 27, 2023 12:44
@bgruening bgruening merged commit b2d791c into galaxyproject:master Jun 28, 2023
@bgruening
Member

Cool, thanks.

@kysrpex kysrpex deleted the blog-post-rustus-migration branch June 28, 2023 11:11
paulzierep pushed a commit to paulzierep/galaxy-hub that referenced this pull request Jul 26, 2023
* Draft blog post on rustus migration

* Update comparison table

* Add detailed explanation on how avoiding to store all uploads in the same folder helps with scalability

* Update blog post date

* Fix typo

* Fix typo

* Add table comparing rustus and tusd features as a drop-down

* Update blog post date