Package/module structure, code style, performance - maybe go for version 2 #46

Open
MothNik opened this issue Feb 1, 2024 · 5 comments


MothNik commented Feb 1, 2024

Hello,

As a developer of infrared spectroscopy software, I really like working with the functionality that HITRAN offers (Perkin Elmer even built a patent with it for water vapour and CO2 correction of IR spectra), but from time to time one has to look into the code, which tries to pack everything into a single .py file (global constants, data-specific definitions, calculation routines, a 10-minute Python tutorial). This makes it hard to understand and maintain, and it also increases the likelihood of errors a lot (there are still some pull requests and issues open regarding this).

Before tackling the issues, I think a first step could be a version 2.0 that actually makes more use of Python built-in functionalities like

  • using Python conventions for global constants (UPPER_CASE_CONSTANT)
  • making more use of Enums and dataclasses (or even pydantic models) rather than relying on global-constants-dict-combinations which are hard to track back in scope
  • not checking types with type(var) == some_type but rather with isinstance
  • splitting the package up into modules, e.g. models for the global constants and data models involved, database_io for reading from the database, data for the hard-coded data in the module, lineshapes for the computation of lineshapes, environment for the environment specification part, misc for something like the tutorial, ... (just a first example structure I could think of)
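
To make the first three points concrete, here is a minimal sketch. All names in it (DEFAULT_WAVENUMBER_STEP, LineProfile, Environment, describe) are hypothetical and chosen for illustration only; none of them is taken from hapi.py.

```python
from dataclasses import dataclass
from enum import Enum

# Hypothetical module-level constant, named in UPPER_CASE per PEP 8.
DEFAULT_WAVENUMBER_STEP = 0.01  # cm^-1, illustrative value only


class LineProfile(Enum):
    """Hypothetical enumeration replacing free-form profile strings."""

    LORENTZ = "lorentz"
    DOPPLER = "doppler"
    VOIGT = "voigt"


@dataclass(frozen=True)
class Environment:
    """Hypothetical container replacing a {'T': ..., 'p': ...} dict."""

    temperature_k: float = 296.0
    pressure_atm: float = 1.0


def describe(env: Environment, profile: LineProfile = LineProfile.VOIGT) -> str:
    """Return a short description of a calculation setup."""
    # isinstance also accepts subclasses, which type(env) == Environment does not.
    if not isinstance(env, Environment):
        raise TypeError(f"expected Environment, got {type(env).__name__}")
    return f"{profile.value} at {env.temperature_k} K, {env.pressure_atm} atm"


print(describe(Environment()))
```

With such a structure, a misspelled field or profile name fails at the point of use instead of silently propagating through a dictionary lookup.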

Besides, the code contains some parts that could be improved

  • black formatting
  • linting (ruff), which would uncover problems like the one reported in "Basic coding errors in HAPI.py: calculate_parameter_NuVC" (#37) like a charm
  • adding type hints
    these 3 steps would not require a lot of effort, but would make the code much more of a joy to read and maintain
  • not using Python loops in heavy-duty numerics, e.g., switching to Numba, Cython, or even Rust with multiprocessing/threading could reduce the computation time quite a bit (this is currently a limiting factor for me)
  • in some parts SciPy could also be beneficial as a dependency that most developers in that field have installed anyway
  • adding automated tests with pytest (how is the module currently tested for correctness?) that could then be run whenever somebody makes a push or pull request
  • ...
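
As a rough illustration of the points on type hints, Python loops, and pytest, the sketch below sums Lorentzian line profiles once with explicit loops and once with NumPy broadcasting, and compares the two in a pytest-style test. The function names and the test values are hypothetical and not taken from hapi.py; NumPy is used here only because it is the simplest option to show, while Numba or Cython would be alternatives.

```python
import numpy as np


def lorentz_sum_loop(
    nu: np.ndarray, centers: np.ndarray, strengths: np.ndarray, gamma: float
) -> np.ndarray:
    """Reference version: explicit Python loops over grid points and lines."""
    out = np.zeros(nu.size)
    for i in range(nu.size):
        for j in range(centers.size):
            out[i] += strengths[j] * gamma / (
                np.pi * ((nu[i] - centers[j]) ** 2 + gamma**2)
            )
    return out


def lorentz_sum_vectorized(
    nu: np.ndarray, centers: np.ndarray, strengths: np.ndarray, gamma: float
) -> np.ndarray:
    """Same sum via broadcasting: one (grid points, lines) array, no Python loop."""
    diff = nu[:, np.newaxis] - centers[np.newaxis, :]
    profiles = gamma / (np.pi * (diff**2 + gamma**2))
    # Matrix-vector product sums the weighted profiles over all lines.
    return profiles @ strengths


def test_vectorized_matches_loop() -> None:
    """pytest-style check that both implementations agree."""
    nu = np.linspace(2000.0, 2010.0, 201)
    centers = np.array([2002.5, 2005.0, 2007.25])
    strengths = np.array([1.0, 0.5, 2.0])
    np.testing.assert_allclose(
        lorentz_sum_vectorized(nu, centers, strengths, 0.1),
        lorentz_sum_loop(nu, centers, strengths, 0.1),
    )
```

Note that the broadcast array grows with the number of grid points times the number of lines, so for large line lists the lines would have to be processed in chunks.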

Then, it would be way easier to incorporate the changes required to resolve open issues/pull requests in less time.

Please don't see this as criticism, I would like to join as a contributor to this package and help 😄
However, I see a lot of open issues and pull requests and therefore wanted to ask if the project is still active and if there is interest in such big changes.

Thanks for your time.

@MothNik MothNik changed the title Module structure, code style, performance - maybe go for version 2 Package/module structure, code style, performance - maybe go for version 2 Feb 1, 2024

jmmelko commented Apr 16, 2024

Personally, I totally agree with you (I was shocked, almost choked, by the non-Pythonic nature of hapi.py the first time I saw it) except that:

  • there is a HAPI2 (or HAPIest I can’t remember) around, so I am not sure which version to support
  • code modularity must not be done at the expense of speed (but maybe using numpy functions would balance the loss of speed due to calling modules)


erwanp commented Apr 16, 2024

By the way, while waiting for HAPI2: we have an object-oriented, automatically tested HITRAN/ExoMol LBL code at https://github.com/radis/radis from which some structural ideas could be taken for HAPI2 (that's the point of open source)

Author

MothNik commented Apr 16, 2024

@jmmelko I think the overhead from calling a module becomes negligible given that line-by-line computations of the spectra can take some time, but that's more of a detail I would say.
You are probably referring to hapi2, where it is stated that the line profile computations are more efficient due to just-in-time compilation.
Now I'm quite confused 😵 because Hitran Online points to hapi and not hapi2.

@erwanp Thanks! I will definitely have a look into it! Especially the GPU acceleration looks nice 😸

Author

MothNik commented Apr 16, 2024

If hapi is still the interface that many users rely on, maybe a "facelift" would still be nice?
Because for hapi2, the last commit was in mid-2023 as well.


jmmelko commented Apr 17, 2024

@MothNik yes, the fact that Hitran Online still uses HAPI is very disturbing.
@erwanp radis looks cool, I will definitely try it when I have some time.


3 participants