Only extract a specified namelist #39
Comments
Yes, a good idea I think, but I wonder how to do it in an effective way. Most of the time seems to be spent parsing and constructing the tokens (via [...]). It would let you exit immediately after reading the namelist, rather than going through the whole file, which might help in some cases. A more intelligent tokenizer (#30) might be a way forward here. Or maybe it's time to just dump the entire namelist (or file) into memory and dice it up into pieces. (Maybe I should have done that from the beginning...)
Anything to speed things up has my support. Unfortunately it's only moral support right now. :)
FYI: I have a simple experiment related to this here. I was testing splitting up a file of multiple namelists into chunks, reading them separately with multiprocessing, and then stitching the results back together at the end. Even with only one thread, it's still faster than a default read. Also related to #30.
Thanks, useful info! Splitting the namelists would generally be more difficult, but it shows there's value in splitting up the work. At the least, splitting the namelist into groups before parsing them individually is probably a better approach. I think that I do something like this in the new parser (which has lagged unfortunately) but will make it a priority when I get back to it. BTW I'm in the process of relocating my family to a new job overseas, so no idea when I'll get time to think about this.
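A rough sketch of the "split into groups first" idea discussed above, in plain Python. This is not the project's parser or API: `split_groups` is a hypothetical helper, and it assumes every group starts with `&name` and ends with a `/` that sits outside quotes and `!` comments (the older `&end` terminator is not handled).

```python
import re


def split_groups(text):
    """Split namelist text into a list of (group_name, raw_group_text).

    Hypothetical helper for illustration only. Assumes groups look like
    '&name ... /' and that '!' starts a comment outside of strings.
    """
    groups = []
    i, n = 0, len(text)
    quote = None   # quote character we are currently inside, if any
    start = None   # index of the '&' that opened the current group
    while i < n:
        c = text[i]
        if quote:
            if c == quote:
                quote = None
        elif c in "'\"":
            quote = c
        elif c == "!":
            # Skip the comment; resume at the newline (or end of text).
            j = text.find("\n", i)
            i = n if j == -1 else j
            continue
        elif c == "&" and start is None:
            start = i
        elif c == "/" and start is not None:
            raw = text[start:i + 1]
            m = re.match(r"&\s*(\w+)", raw)
            if m:
                groups.append((m.group(1).lower(), raw))
            start = None
        i += 1
    return groups


sample = "&a\n x = 1 ! c / &\n s = 'a/b'\n/\n&b y(1:2) = 1, 2 /\n"
for name, raw in split_groups(sample):
    print(name, len(raw))
```

Each raw chunk could then be handed to the existing parser on its own, serially or from a worker pool, and the results stitched back together in file order.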
FYI: I noticed something else. If the keys contain array notation, the parsing is dramatically slower. See the examples here.

'files/test.nml'   # 112 namelists -- short keys [8 sec]
'files/test4b.nml' # 112 namelists -- longer keys no arrays [9 sec]
'files/test4c.nml' # 112 namelists -- longer keys w/ types [12 sec]
'files/test4.nml'  # 112 namelists -- longer keys w/ array [42 sec]

This is killing me since all my namelists have many arrays. :)
Idea... Say there was a file with multiple namelists, and you only wanted to read a specified one. Maybe this could be an option, where you specify the one you want, rather than having to parse the whole file. (Related to #30: this would help when the file is very large, parsing is a bottleneck, and you only want certain info from the file.)
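To make the request concrete, here is a minimal sketch of pulling out one named group and stopping as soon as it ends. `extract_group` is a hypothetical name, not an existing option of the library. It assumes the `&name` header is the first thing on its line, that the group ends with a `/` outside quotes and `!` comments, and that the header text does not also appear at the start of a line inside another group's string.

```python
import io
import re


def extract_group(lines, name):
    """Return the raw text of the first group called `name`, or None.

    Hypothetical helper for illustration only. `lines` is any iterable of
    text lines (for example an open file), so nothing after the closing
    '/' of the requested group is read.
    """
    header = re.compile(r"\s*&\s*" + re.escape(name) + r"\b", re.IGNORECASE)
    buf = []
    inside = False
    quote = None   # quote character we are currently inside, if any
    for line in lines:
        if not inside:
            if not header.match(line):
                continue
            inside = True
        for pos, c in enumerate(line):
            if quote:
                if c == quote:
                    quote = None
            elif c in "'\"":
                quote = c
            elif c == "!":
                break  # the rest of this line is a comment
            elif c == "/":
                buf.append(line[:pos + 1])
                return "".join(buf)
        buf.append(line)
    return None


src = (
    "&first\n x = 1\n/\n"
    "&second\n path = 'a/b' ! note /\n n(1:2) = 3, 4\n/\n"
    "&third\n z = 0\n/\n"
)
print(extract_group(io.StringIO(src), "second"))
```

The returned text could then be passed to the normal parser, so only the requested group pays the tokenizing cost.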