[feature request] Saving a file index to avoid recomputing each time #4
Comments
Would you recommend pickling the index, possibly in the same directory as the file? A possible file extension could be "*.idx".
@carvetighter Pickling could work. Is there some fact about the index structure that could let it be compressed more?
@zjijz I don't know about compressing the index; it was just an idea. Do you want to access the index information quickly? I'm looking at the linereader code, and it's interesting how he counts the lines and makes every line in the index file the same length by padding with spaces at the end. It's always hard reading someone else's code, and I don't understand why he does some things, like the index entries, which are an integer followed by a lot of spaces (e.g. '32 ...a bunch of spaces... \n'). It just seems odd to me. If you pickle the index, then you can just load it and use it easily.
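For concreteness, here is a minimal sketch of the pickling idea under discussion: scan the file once, record the byte offset of each line start, and write that list to a `<filename>.idx` sidecar file. The function names and the `.idx` convention are illustrative assumptions, not part of this package's existing API.

```python
# Hypothetical sketch: persist a list of line-start byte offsets next to the
# data file as "<filename>.idx" using pickle. Names are illustrative only.
import os
import pickle

def build_index(path):
    """Scan the file once and record the byte offset of each line start."""
    offsets = []
    with open(path, "rb") as f:
        pos = 0
        for line in f:
            offsets.append(pos)
            pos += len(line)
    return offsets

def save_index(path, offsets):
    with open(path + ".idx", "wb") as f:
        pickle.dump(offsets, f)

def load_index(path):
    """Load a saved index if one exists; otherwise build and save it."""
    idx_path = path + ".idx"
    if os.path.exists(idx_path):
        with open(idx_path, "rb") as f:
            return pickle.load(f)
    offsets = build_index(path)
    save_index(path, offsets)
    return offsets
```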
A pull request would be well received. If neither of you have the time for it, maybe I can put something basic together.
Hey, sorry about the delay. I was working on a school project that would use this feature, but the class ended and some other workloads piled up. Do you have a date you would want a version of this done by?
You requested it, so you tell me!
@jegesh Could we add the ability to save a file index to disk, so the file doesn't have to be re-indexed each time it is opened? I've been using this package for machine learning batches, and re-indexing the file on every training run adds noticeable overhead.
A similar package called linereader saves the index automatically.
I can help implement this too.
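To illustrate what "avoid re-indexing on every run" could look like end to end, here is a hedged, self-contained sketch of a cached loader: it reuses a pickled `<filename>.idx` sidecar when it is newer than the data file, and re-scans otherwise. `cached_line_offsets`, `get_line`, and the mtime-based staleness check are assumptions for illustration, not this package's API and not how linereader actually stores its index.

```python
# Hypothetical sketch: cache line offsets across runs, rebuilding the cache
# only when the data file has changed. All names here are illustrative.
import os
import pickle

def cached_line_offsets(path):
    """Load offsets from <path>.idx if it is newer than the data file;
    otherwise re-scan the file and refresh the cache."""
    idx_path = path + ".idx"
    if os.path.exists(idx_path) and os.path.getmtime(idx_path) >= os.path.getmtime(path):
        with open(idx_path, "rb") as f:
            return pickle.load(f)
    offsets, pos = [], 0
    with open(path, "rb") as f:
        for line in f:
            offsets.append(pos)
            pos += len(line)
    with open(idx_path, "wb") as f:
        pickle.dump(offsets, f)
    return offsets

def get_line(path, lineno, offsets):
    """Return line `lineno` (1-based) by seeking straight to its byte offset."""
    with open(path, "rb") as f:
        f.seek(offsets[lineno - 1])
        return f.readline().decode("utf-8")

# Example usage (file name is hypothetical):
#   offsets = cached_line_offsets("train.csv")
#   print(get_line("train.csv", 1000, offsets))
```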