Using Arrow to further speed up raw data I/O #189

Open
2 of 5 tasks
ghiggi opened this issue Jun 6, 2023 · 0 comments
Labels: enhancement (New feature or request)


Prework

  • Read and agree to the code of conduct.
  • If there is already a relevant issue, whether open or closed, comment on the existing thread instead of posting a new issue.
  • Post a minimal reproducible example so the maintainer can troubleshoot the problems you identify. A reproducible example is:
      • Runnable
      • Minimal: reduce runtime wherever possible and remove complicated details that are irrelevant to the issue at hand.

Description

Evaluate the benefits of using:

  • the engine="pyarrow" option of pd.read_csv, which can parse the raw data using multithreading,
  • the PyArrow dtype backend (dtype_backend="pyarrow") introduced in pandas 2.0, to decrease the memory usage of string columns in a pd.DataFrame.
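
As a starting point for the evaluation, here is a minimal sketch of how the two options could be compared on a small in-memory CSV. It assumes pandas >= 2.0 and falls back to the default reader when pyarrow is not installed. The column names and the helpers make_raw_csv and read_raw are hypothetical and not part of DISDRODB; the synthetic data only imitates a raw file with one long string field per row.

```python
import io

import pandas as pd

try:
    import pyarrow  # noqa: F401  (optional dependency of pandas)
    HAS_PYARROW = True
except ImportError:
    HAS_PYARROW = False


def make_raw_csv(n_rows=1000):
    """Build a small synthetic CSV in memory (hypothetical column names)."""
    lines = ["rainfall_count,raw_drop_number"]
    for i in range(n_rows):
        # One long string field per row, imitating a raw spectrum record.
        spectrum = ";".join(["000"] * 32)
        lines.append(f"{i},{spectrum}")
    return "\n".join(lines).encode("utf-8")


def read_raw(data, use_arrow):
    """Read CSV bytes, optionally with the pyarrow parser and Arrow-backed dtypes."""
    if use_arrow and HAS_PYARROW:
        return pd.read_csv(
            io.BytesIO(data),
            engine="pyarrow",         # multithreaded CSV parser
            dtype_backend="pyarrow",  # Arrow-backed columns (pandas >= 2.0)
        )
    return pd.read_csv(io.BytesIO(data))


data = make_raw_csv()
df_default = read_raw(data, use_arrow=False)
df_arrow = read_raw(data, use_arrow=True)

# Report the deep memory footprint and the resulting dtypes of each variant.
for name, df in [("default", df_default), ("arrow", df_arrow)]:
    n_bytes = int(df.memory_usage(deep=True).sum())
    print(f"{name}: {n_bytes} bytes, dtypes={df.dtypes.astype(str).to_dict()}")
```

The printed sizes depend on the pandas and pyarrow versions in use, so they should be measured on real raw files rather than taken from this toy input. Read time can be measured the same way by wrapping read_raw with time.perf_counter.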

Please describe the performance issue.

Benchmarks

How poorly does DISDRODB perform?

ghiggi self-assigned this on Jun 6, 2023
ghiggi changed the title from "Using arrow to further speed up raw data I/O" to "[ENHANCEMENT] Using arrow to further speed up raw data I/O" on Jun 6, 2023
ghiggi added the enhancement (New feature or request) label on Nov 1, 2023
ghiggi changed the title from "[ENHANCEMENT] Using arrow to further speed up raw data I/O" to "Using Arrow to further speed up raw data I/O" on Nov 5, 2023