[REVIEW]: Dynamax: A Python package for probabilistic state space modeling with JAX #7069

Open
editorialbot opened this issue Aug 4, 2024 · 26 comments
Assignees
Labels
review Track: 5 (DSAIS) Data Science, Artificial Intelligence, and Machine Learning

Comments

@editorialbot
Collaborator

editorialbot commented Aug 4, 2024

Submitting author: @slinderman (Scott Linderman)
Repository: https://github.com/probml/dynamax
Branch with paper.md (empty if default branch): paper
Version: v0.1.4
Editor: @osorensen
Reviewers: @thomaspinder, @gdalle
Archive: Pending

Status

Status badge code:

HTML: <a href="https://joss.theoj.org/papers/aaec4098e71833c94a74dbe1ff785d9e"><img src="https://joss.theoj.org/papers/aaec4098e71833c94a74dbe1ff785d9e/status.svg"></a>
Markdown: [![status](https://joss.theoj.org/papers/aaec4098e71833c94a74dbe1ff785d9e/status.svg)](https://joss.theoj.org/papers/aaec4098e71833c94a74dbe1ff785d9e)

Reviewers and authors:

Please avoid lengthy details of difficulties in the review thread. Instead, please create a new issue in the target repository and link to those issues (especially acceptance-blockers) by leaving comments in the review thread below. (For completists: if the target issue tracker is also on GitHub, linking the review thread in the issue or vice versa will create corresponding breadcrumb trails in the link target.)

Reviewer instructions & questions

@thomaspinder & @gdalle, your review will be checklist based. Each of you will have a separate checklist that you should update when carrying out your review.
First of all you need to run this command in a separate comment to create the checklist:

@editorialbot generate my checklist

The reviewer guidelines are available here: https://joss.readthedocs.io/en/latest/reviewer_guidelines.html. Any questions/concerns please let @osorensen know.

Please start on your review when you are able, and be sure to complete your review in the next six weeks, at the very latest.

Checklists

📝 Checklist for @gdalle

📝 Checklist for @thomaspinder

@editorialbot
Collaborator Author

Hello humans, I'm @editorialbot, a robot that can help you with some common editorial tasks.

For a list of things I can do to help you, just type:

@editorialbot commands

For example, to regenerate the paper pdf after making changes in the paper's md or bib files, type:

@editorialbot generate pdf

@editorialbot
Collaborator Author

Software report:

github.com/AlDanial/cloc v 1.90  T=0.15 s (764.3 files/s, 298530.4 lines/s)
-------------------------------------------------------------------------------
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
Python                          67           3037           4303           8814
Jupyter Notebook                30              0          24675           3907
Markdown                        10             99              0            282
TeX                              1             25              0            199
reStructuredText                 2            152            208            153
YAML                             5             20              7            147
DOS Batch                        1              8              1             26
make                             1              4              7              9
TOML                             1              1              0              4
-------------------------------------------------------------------------------
SUM:                           118           3346          29201          13541
-------------------------------------------------------------------------------

Commit count by author:

   344	Scott Linderman
   178	Peter G. Chang
   135	xinglong
   131	Kevin P Murphy
    83	karalleyna
    79	gileshd
    69	Gerardo Duran-Martin
    60	petergchang
    26	Caleb Weinreb
    18	libby
    14	kostastsa
    13	slinderman
    10	Elizabeth DuPre
     8	Kevin Murphy
     6	andrewwarrington
     6	davidzoltowski
     6	patel-zeel
     4	Ravin Kumar
     3	Aleyna Kara
     2	Yixiu Zhao
     2	Zeel B Patel
     2	partev
     1	Collin Schlager
     1	Jake VanderPlas
     1	Jason Davies
     1	RaulPL
     1	Xinglong
     1	Xinglong Li
     1	xinglong-li

@editorialbot
Collaborator Author

Paper file info:

📄 Wordcount for paper.md is 960

✅ The paper includes a Statement of need section

@editorialbot
Collaborator Author

License info:

✅ License found: MIT License (Valid open source OSI approved license)

@editorialbot
Collaborator Author

Reference check summary (note 'MISSING' DOIs are suggestions that need verification):

OK DOIs

- 10.1146/annurev-neuro-092619-094115 is OK
- 10.1017/CBO9781139344203 is OK
- 10.48550/arXiv.2305.16543 is OK
- 10.48550/arXiv.2306.03291 is OK
- 10.48550/arXiv.2305.19535 is OK
- 10.1038/s41592-024-02318-2 is OK
- 10.25080/majora-92bf1922-011 is OK
- 10.1017/cbo9780511790492 is OK
- 10.1016/j.tree.2007.10.009 is OK
- 10.1198/073500102753410408 is OK
- 10.3402/tellusa.v56i5.14462 is OK
- 10.1145/355656.355657 is OK
- 10.1109/TAC.2020.2976316 is OK
- 10.1109/TSP.2021.3103338 is OK

MISSING DOIs

- No DOI given, and none found for title: Probabilistic Machine Learning: Advanced Topics
- No DOI given, and none found for title: JAX: composable transformations of Python+NumPy pr...
- No DOI given, and none found for title: PyHSMM: Bayesian inference in HSMMs and HMMs
- No DOI given, and none found for title: Code Companion for Bayesian Filtering and Smoothin...
- No DOI given, and none found for title: SSM: Bayesian Learning and Inference for State Spa...
- No DOI given, and none found for title: JSL: JAX State-Space models (SSM) Library
- No DOI given, and none found for title: hmmlearn
- No DOI given, and none found for title: Structural Time Series (STS) in JAX

INVALID DOIs

- None

@editorialbot
Collaborator Author

👉📄 Download article proof 📄 View article proof on GitHub 📄 👈

@gdalle

gdalle commented Aug 4, 2024

Review checklist for @gdalle

Conflict of interest

  • I confirm that I have read the JOSS conflict of interest (COI) policy and that: I have no COIs with reviewing this work or that any perceived COIs have been waived by JOSS for the purpose of this review.

Code of Conduct

General checks

  • Repository: Is the source code for this software available at the https://github.com/probml/dynamax?
  • License: Does the repository contain a plain-text LICENSE or COPYING file with the contents of an OSI approved software license?
  • Contribution and authorship: Has the submitting author (@slinderman) made major contributions to the software? Does the full list of paper authors seem appropriate and complete?
  • Substantial scholarly effort: Does this submission meet the scope eligibility described in the JOSS guidelines
  • Data sharing: If the paper contains original data, data are accessible to the reviewers. If the paper contains no original data, please check this item.
  • Reproducibility: If the paper contains original results, results are entirely reproducible by reviewers. If the paper contains no original results, please check this item.
  • Human and animal research: If the paper contains original research data on human subjects or animals, does it comply with JOSS's human participants research policy and/or animal research policy? If the paper contains no such data, please check this item.

Functionality

  • Installation: Does installation proceed as outlined in the documentation?
  • Functionality: Have the functional claims of the software been confirmed?
  • Performance: If there are any performance claims of the software, have they been confirmed? (If there are no claims, please check off this item.)

Documentation

  • A statement of need: Do the authors clearly state what problems the software is designed to solve and who the target audience is?
  • Installation instructions: Is there a clearly-stated list of dependencies? Ideally these should be handled with an automated package management solution.
  • Example usage: Do the authors include examples of how to use the software (ideally to solve real-world analysis problems).
  • Functionality documentation: Is the core functionality of the software documented to a satisfactory level (e.g., API method documentation)?
  • Automated tests: Are there automated tests or manual steps described so that the functionality of the software can be verified?
  • Community guidelines: Are there clear guidelines for third parties wishing to 1) Contribute to the software 2) Report issues or problems with the software 3) Seek support

Software paper

  • Summary: Has a clear description of the high-level functionality and purpose of the software for a diverse, non-specialist audience been provided?
  • A statement of need: Does the paper have a section titled 'Statement of need' that clearly states what problems the software is designed to solve, who the target audience is, and its relation to other work?
  • State of the field: Do the authors describe how this software compares to other commonly-used packages?
  • Quality of writing: Is the paper well written (i.e., it does not require editing for structure, language, or writing quality)?
  • References: Is the list of references complete, and is everything cited appropriately that should be cited (e.g., papers, datasets, software)? Do references in the text use the proper citation syntax?

@thomaspinder

thomaspinder commented Aug 5, 2024

Review checklist for @thomaspinder

Conflict of interest

  • I confirm that I have read the JOSS conflict of interest (COI) policy and that: I have no COIs with reviewing this work or that any perceived COIs have been waived by JOSS for the purpose of this review.

Code of Conduct

General checks

  • Repository: Is the source code for this software available at the https://github.com/probml/dynamax?
  • License: Does the repository contain a plain-text LICENSE or COPYING file with the contents of an OSI approved software license?
  • Contribution and authorship: Has the submitting author (@slinderman) made major contributions to the software? Does the full list of paper authors seem appropriate and complete?
  • Substantial scholarly effort: Does this submission meet the scope eligibility described in the JOSS guidelines
  • Data sharing: If the paper contains original data, data are accessible to the reviewers. If the paper contains no original data, please check this item.
  • Reproducibility: If the paper contains original results, results are entirely reproducible by reviewers. If the paper contains no original results, please check this item.
  • Human and animal research: If the paper contains original research data on human subjects or animals, does it comply with JOSS's human participants research policy and/or animal research policy? If the paper contains no such data, please check this item.

Functionality

  • Installation: Does installation proceed as outlined in the documentation?
  • Functionality: Have the functional claims of the software been confirmed?
  • Performance: If there are any performance claims of the software, have they been confirmed? (If there are no claims, please check off this item.)

Documentation

  • A statement of need: Do the authors clearly state what problems the software is designed to solve and who the target audience is?
  • Installation instructions: Is there a clearly-stated list of dependencies? Ideally these should be handled with an automated package management solution.
  • Example usage: Do the authors include examples of how to use the software (ideally to solve real-world analysis problems).
  • Functionality documentation: Is the core functionality of the software documented to a satisfactory level (e.g., API method documentation)?
  • Automated tests: Are there automated tests or manual steps described so that the functionality of the software can be verified?
  • Community guidelines: Are there clear guidelines for third parties wishing to 1) Contribute to the software 2) Report issues or problems with the software 3) Seek support

Software paper

  • Summary: Has a clear description of the high-level functionality and purpose of the software for a diverse, non-specialist audience been provided?
  • A statement of need: Does the paper have a section titled 'Statement of need' that clearly states what problems the software is designed to solve, who the target audience is, and its relation to other work?
  • State of the field: Do the authors describe how this software compares to other commonly-used packages?
  • Quality of writing: Is the paper well written (i.e., it does not require editing for structure, language, or writing quality)?
  • References: Is the list of references complete, and is everything cited appropriately that should be cited (e.g., papers, datasets, software)? Do references in the text use the proper citation syntax?

@osorensen
Member

👋 @gdalle, @thomaspinder, could you please update us on how it's going with your reviews?

@thomaspinder

@osorensen expecting to be done by September 1st.

@gdalle

gdalle commented Aug 23, 2024

Haven't started it yet so I will probably need two more weeks.

@thomaspinder

Dynamax Review

Firstly, I'd like to congratulate the authors and contributors of Dynamax for creating a nicely designed package in JAX. Dynamax enables practitioners to easily fit state-space models (SSMs), whilst simultaneously allowing researchers to implement their own custom SSM approaches. The repo's corresponding paper is well written and provides a clear and concise summary of the package. Finally, I enjoyed reading the documentation of Dynamax; the large number of examples and use-cases is great for new users of the package.

I have broken my review up into two sections. The first section lists the blocking issues I have that prevent me from being able to mark each item of the reviewer's checklist as complete. In the second section, I suggest some things that I would advise the authors to do in Dynamax; however, I do not consider these as blocking concerns.

Significant issues

  • Failing tests
  • Docstring coverage
  • Incomplete contribution guidelines
  • Missing typing

Failing tests

For me, running

git clone git@github.com:probml/dynamax.git
cd dynamax
pip install -e '.[dev]'
pytest dynamax

as per the docs, threw errors on a Mac M1. I was using Python 3.10 and a fresh virtual environment.

On this topic though, I see that your GitHub testing workflow uses a pinned Python version and machine. It could be good to run your tests on all supported Python versions and on both Mac and Linux machines. I have opened a PR with a suggestion that reflects this comment.

Docstring coverage

The docstrings within Dynamax are inconsistent. Taking dynamax.generalized_gaussian_ssm.inference as an example, there are no docstrings for the methods or arguments of the public classes, yet there are comprehensive docstrings for the private functions. This is unconventional, and I would encourage the authors to document the classes, methods, and functions of Dynamax more widely so as to make the package easier to use. Additionally, parameter descriptions such as the following, from dynamax.generalized_gaussian_ssm.models, are not particularly descriptive.

:param initial_mean: $m$
:param initial_covariance: $S$

It would be good to see more comprehensive documentation. To rigorously test documentation thresholds in the future, you may consider using interrogate. This will measure the number of objects for which a docstring is supplied and raise errors/warnings when the coverage decreases. I have opened a suggestive PR for how this can be achieved.
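To make the request concrete, here is a minimal sketch of what more descriptive parameter documentation could look like. The class name InitialStateParams is hypothetical and not part of dynamax; only the field names initial_mean and initial_covariance come from the snippet above, and reading m and S as the mean and covariance of the initial-state distribution is our assumption.

```python
from typing import NamedTuple

import numpy as np


class InitialStateParams(NamedTuple):
    """Gaussian prior on the first latent state, p(z_1) = N(z_1 | m, S).

    Hypothetical example class for illustration (not the dynamax API).

    Attributes:
        initial_mean: Mean ``m`` of the initial state distribution,
            an array of shape ``(state_dim,)``.
        initial_covariance: Covariance ``S`` of the initial state
            distribution, a symmetric positive-definite array of shape
            ``(state_dim, state_dim)``.
    """

    initial_mean: np.ndarray
    initial_covariance: np.ndarray


# Usage: a 2-dimensional latent state with a standard-normal prior.
params = InitialStateParams(initial_mean=np.zeros(2), initial_covariance=np.eye(2))
```

Stating the meaning, shape, and constraints of each field in the docstring lets a reader use the class without already knowing the notation of a particular textbook.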

33.1% of classes/methods/functions are covered by docstrings. It would be good to see this increased. This was calculated by running interrogate dynamax.
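For a rough sense of what this coverage figure measures, here is a minimal sketch using only the standard library's ast module: it counts the classes and functions in a source string and reports the fraction that have a docstring. The function name docstring_coverage is hypothetical, and interrogate's real accounting (module docstrings, private and magic methods, configuration options) is not reproduced here.

```python
import ast


def docstring_coverage(source: str) -> float:
    """Return the fraction of classes and functions in `source` with a docstring.

    Simplified illustration only; it ignores module docstrings and any
    filtering options a real coverage tool would offer.
    """
    tree = ast.parse(source)
    nodes = [
        node
        for node in ast.walk(tree)
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef))
    ]
    if not nodes:
        return 1.0  # nothing to document
    documented = sum(ast.get_docstring(node) is not None for node in nodes)
    return documented / len(nodes)


example = '''
class Model:
    """Documented public class."""

    def fit(self, data):
        return data


def _predict(m, P):
    """Documented private helper."""
    return m, P
'''

print(docstring_coverage(example))  # 2 of 3 objects documented
```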

Broken documentation

The documentation is broken, and many notebooks do not render (see Example). My suggestion would be to use something like nbsphinx to execute the notebooks each time the documentation is built, and to fail the build when a notebook does not execute top-to-bottom.
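A minimal sketch of the suggested setup, assuming the documentation is built with Sphinx and the nbsphinx extension (we have not checked which notebook tooling the dynamax docs currently use). The settings below are nbsphinx configuration values placed in the Sphinx conf.py; the timeout value is an arbitrary placeholder.

```python
# docs/conf.py (fragment), assuming a Sphinx build with nbsphinx installed.

extensions = [
    "nbsphinx",  # render .ipynb notebooks as documentation pages
]

# Re-execute every notebook on each build instead of reusing stored outputs.
nbsphinx_execute = "always"

# Abort the build if any notebook cell raises an exception.
nbsphinx_allow_errors = False

# Per-cell execution timeout in seconds (placeholder value for slow model fits).
nbsphinx_timeout = 600
```

Run in CI, this means a notebook that fails to execute top-to-bottom breaks the docs build rather than publishing an unrendered page.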

Incomplete contribution guidelines

Your contribution guidelines are incomplete. To push a change, one would need to add and commit the change before pushing - this is missing from the document.

Missing typing

The package has no typing. Additionally, many functions use variable names which are very hard to interpret e.g., def _predict(m, P, f, Q, u, g_ev, g_cov) in dynamax.generalized_gaussian_ssm.inference. The combination of no typing and terse variable naming makes the code challenging to read, particularly for new users and/or those new to SSMs. I would strongly encourage the authors to either use more descriptive variable names, or add typing to their code. JaxTyping is excellent for this.
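As an illustration of the suggestion (not the dynamax implementation), here is what a descriptively named, annotated predict step might look like. It is deliberately reduced to the linear-Gaussian case: the _predict signature quoted above also takes f, u, g_ev, and g_cov, which we assume handle nonlinear dynamics, inputs, and moment approximation, and this sketch omits them. The function name predict_state is hypothetical. With jaxtyping, the shape comments could instead be annotations such as Float[Array, " state_dim"].

```python
from typing import Tuple

import numpy as np


def predict_state(
    prior_mean: np.ndarray,       # shape (state_dim,)
    prior_cov: np.ndarray,        # shape (state_dim, state_dim)
    dynamics_matrix: np.ndarray,  # shape (state_dim, state_dim)
    dynamics_cov: np.ndarray,     # shape (state_dim, state_dim)
) -> Tuple[np.ndarray, np.ndarray]:
    """Linear-Gaussian Kalman predict step.

    Pushes the belief N(prior_mean, prior_cov) through the dynamics
    z_t = F z_{t-1} + noise, noise ~ N(0, Q), giving
    mean F m and covariance F P F^T + Q.
    """
    pred_mean = dynamics_matrix @ prior_mean
    pred_cov = dynamics_matrix @ prior_cov @ dynamics_matrix.T + dynamics_cov
    return pred_mean, pred_cov


# Usage: constant-velocity model with state (position, velocity).
F = np.array([[1.0, 1.0], [0.0, 1.0]])
mean, cov = predict_state(np.array([0.0, 1.0]), np.eye(2), F, 0.1 * np.eye(2))
```

Even without type annotations, names like prior_cov and dynamics_matrix tell a newcomer what each argument is; annotations then add the shapes.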

Incorrect typing

Some of the type annotations are incorrect. Take ParamsGGSSM.initial_mean in dynamax.generalized_gaussian_ssm.models as an example: the annotation on initial_mean is incorrect, as outlined in the JaxTyping documentation. A change to the effect of

- initial_mean: Float[Array, "state_dim"]
+ initial_mean: Float[Array, " state_dim"]

should be made.
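As we understand the jaxtyping FAQ, the reason for the leading space is a static-analysis quirk rather than a runtime problem: a single-word shape string such as "state_dim" is also valid Python (a bare name), so linters that treat string annotations as forward references may flag it as an undefined name, whereas " state_dim" no longer parses as a name. The hypothetical helper classify_dim_string below uses ast.parse to show that distinction; it is a simplification, not how flake8 or type checkers are implemented.

```python
import ast


def classify_dim_string(dims: str) -> str:
    """Report how Python's parser sees a jaxtyping-style shape string.

    Simplified illustration of why a lone dimension name can be mistaken
    for a forward reference by static analysis tools.
    """
    try:
        module = ast.parse(dims)
    except SyntaxError:  # IndentationError from a leading space is a subclass
        return "not parseable as code"
    body = module.body
    if (
        len(body) == 1
        and isinstance(body[0], ast.Expr)
        and isinstance(body[0].value, ast.Name)
    ):
        return "bare name (looks like a forward reference)"
    return "other expression"


for dims in ["state_dim", " state_dim", "batch state_dim"]:
    print(repr(dims), "->", classify_dim_string(dims))
```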

Suggested improvements

  • Large repo size
  • Tighter Python bounds
  • Package management

Large repo size

You have some very large git pack files:

❯ du -ah dynamax | sort -rh | head -n 10

224M	dynamax
209M	dynamax/.git/objects/pack
209M	dynamax/.git/objects
209M	dynamax/.git
208M	dynamax/.git/objects/pack/pack-aaa194639683ed252462b75ee0ec2203edc5c09f.pack

You may consider running git gc to clean these up unless you need them.

Tighter Python bounds

I would suggest tighter Python bounds. Currently your lower bound is 3.6, a version that has reached end-of-life according to the Python Developer's Guide: https://devguide.python.org/versions/#unsupported-versions
As an aside, I appreciate that it's specified in your setup.py, but explicitly stating the supported Python version(s) in your project's README is nice.

Package management

It is strongly advised to migrate setup.py and setup.cfg files to a pyproject.toml. I would encourage the authors to do this in order to future-proof Dynamax.

@osorensen
Member

Thanks a lot for your thorough review, @thomaspinder. @slinderman, you're welcome to start addressing the issues whenever you like.

@slinderman

Thank you, @osorensen and @thomaspinder! We will start working on these issues and suggestions ASAP, and I'll keep you posted.

@gdalle

gdalle commented Sep 12, 2024

Meanwhile I'm making my way through my own review. I will gather my remarks about code and documentation in two issues on the dynamax repo:

I might be limited by my very ancient Python skills for installation and testing of dynamax, so I'd appreciate a hand in figuring out the errors I observe.

As for the paper itself, it is very clear and does a good job of introducing dynamax. I have three minor suggestions:

Dynamax supports canonical SSMs and allows the user to construct bespoke models as needed.

Can you give more details on how this is implemented API-wise? For instance, how generic can observation distributions be?
This question was a major motivation for my own take on HMMs, coded in Julia (https://joss.theoj.org/papers/10.21105/joss.06436). Hopefully I didn't misrepresent dynamax in the state of the art there.

Dynamax provides a unique combination of low-level inference algorithms and high-level modeling objects that can support a wide range of research applications in JAX.

Would it be possible to provide concrete examples of what is missing from the previous libraries? Obviously JAX support is a big aspect, since I know that some of them are coded in NumPy (hmmlearn) or PyTorch (pomegranate).

Parallel message passing routines that leverage GPU or TPU acceleration to perform message passing in sublinear time.

What do you mean by sublinear time? Isn't it just (roughly speaking) total sequential time divided by amount of parallelism?
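One way to read "sublinear" here, offered as an interpretation rather than the authors' answer: the parallel routines cited in the paper's references (temporal parallelization of Bayesian smoothers and of HMM inference) rest on associative prefix scans, whose parallel depth is O(log T) given enough processors, even though total work stays at least O(T). The pure-Python sketch below simulates a Hillis-Steele scan for a toy scalar recursion x_t = a_t x_{t-1} + b_t (a stand-in for the filtering recursions, not dynamax code) and counts the rounds. The names compose, sequential_filter, and parallel_scan are hypothetical; in JAX the same pattern is available as jax.lax.associative_scan.

```python
import math
from typing import List, Tuple

Affine = Tuple[float, float]  # (a, b) represents the map x -> a * x + b


def compose(first: Affine, second: Affine) -> Affine:
    """Apply `first`, then `second`; this operation is associative."""
    a1, b1 = first
    a2, b2 = second
    return a2 * a1, a2 * b1 + b2


def sequential_filter(x0: float, steps: List[Affine]) -> List[float]:
    """Baseline O(T) recursion x_t = a_t * x_{t-1} + b_t."""
    xs, x = [], x0
    for a, b in steps:
        x = a * x + b
        xs.append(x)
    return xs


def parallel_scan(steps: List[Affine]) -> Tuple[List[Affine], int]:
    """Hillis-Steele inclusive scan over affine maps.

    Returns the prefix compositions and the number of rounds. Within a
    round every update reads only the previous round's values, so with
    T processors each round takes constant time.
    """
    prefix = list(steps)
    rounds, offset = 0, 1
    while offset < len(prefix):
        prefix = [
            compose(prefix[i - offset], prefix[i]) if i >= offset else prefix[i]
            for i in range(len(prefix))
        ]
        offset *= 2
        rounds += 1
    return prefix, rounds


steps = [(0.9, float(t)) for t in range(1, 11)]  # T = 10 toy steps
prefix, rounds = parallel_scan(steps)
x0 = 2.0
parallel_xs = [a * x0 + b for a, b in prefix]  # x_t = A_t * x0 + B_t
print(rounds, math.ceil(math.log2(len(steps))))
```

This Hillis-Steele variant does O(T log T) total work; work-efficient scans (e.g. Blelloch's) keep total work at O(T) with logarithmic depth. So the "sequential time divided by parallelism" intuition holds when the number of processors is small relative to T, and the logarithmic bound applies only when parallelism scales with the sequence length.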

@gdalle
Copy link

gdalle commented Sep 13, 2024

I'm halfway through the documentation and a significant number of code examples are broken by the NumPy 2.0 release (about 1 per page). I think it would be a good idea to fix them before I make a second (complete) pass.

@osorensen
Member

@slinderman, ref the post by @gdalle, could you please follow up and ping us here when done?

@slinderman

Hi @osorensen, thanks for the reminder. @gdalle, sorry for the rendering issue. I thought we had fixed this by pinning to NumPy < 2.0, but somehow this slipped through the cracks. I will fix it and let you know when it's ready for another pass.

@osorensen
Member

@slinderman any updates on this?

@slinderman

slinderman commented Oct 28, 2024

Hi @osorensen, sorry for the delay. We pinned NumPy < 2.0 and rebuilt the documentation, and now almost all of the rendering issues should be fixed. There is a known issue with the HMC notebook (see probml/dynamax#384), but the rest look good.

Regarding the overall review progress, we have begun work on the feedback we received from @thomaspinder. It is taking more time than anticipated to fix the typing issues, but we aim to wrap that up in the coming weeks. We greatly appreciate the reviewers' feedback, and we appreciate their patience while we work to address their comments.

@osorensen
Member

Thanks for the update, @slinderman

@gdalle

gdalle commented Nov 1, 2024

@slinderman thanks for the fixes. Can you tag a new release containing them, so that running the documentation examples does not require cloning the repo?

@slinderman

Hi @gdalle, of course. Please see https://github.com/probml/dynamax/releases/tag/0.1.5, which is now available on PyPI as well.

@gdalle

gdalle commented Nov 8, 2024

Thank you for fixing the versions, I am now able to install and run tests smoothly.

I have made a second pass on the documentation¹, and now that the examples actually run, your tutorials show a huge effort in terms of coding and visualization! As stated in the issue, I feel like the user experience could be further improved with two additions:

  • a general overview of the package and its components (in order to not have to dig into the full API reference when I'm wondering how to do something)
  • verbose explanations in each tutorial (with words and not just code) + better axis labeling on the plots

Of course I also have more specific remarks which are listed in the issues below:

Do you think such changes are reasonable requests? I know that this review has been going on for a while but I really think this will make the package much more user-friendly.

Footnotes

  1. I only skipped the last section (Generalized Gaussian SSMs) and the API reference.

@osorensen
Member

Thanks a lot for your review, @gdalle.

Do you think such changes are reasonable requests?

As editor, I suggest @slinderman addresses as many of the issues raised here as possible. If certain things would be too much work or out of scope, please let us know in this thread, and we can discuss them.

@slinderman

Thank you @osorensen and @gdalle! We are starting to work on these issues and hope to have a response in a couple weeks. We greatly appreciate your suggestions.

5 participants