-
Notifications
You must be signed in to change notification settings - Fork 4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor (library installation fixes and new static site generator) #247
Conversation
@adarshp @Free-Quarks @cl4yton @vincentraymond-ua , example refactoring in #233 quietly broke Rust tests by removing gromet JSON from the repository. I suspect this has to do with the triggers for the steps:
- name: test_rust_components
image: rust:1.67
commands:
- apt update && apt install -y cmake openssl libclang-13-dev
- cd skema/skema-rs
- cargo test --verbose --all
when:
paths:
- .drone.yml
- skema/skema-rs/** This should probably include steps:
- name: test_rust_components
image: rust:1.67
commands:
- apt update && apt install -y cmake openssl libclang-13-dev
- cd skema/skema-rs
- cargo test --verbose --all
when:
paths:
- .drone.yml
- skema/skema-rs/**
- data/gromet/** The motivation for removing this was to minimize committed gromet JSON, as updates to gromet will cause large diffs and bloat the repo's history. Perhaps we can consider generating the gromet JSON at test time? Maybe we can discuss this after the team meeting on 6/20. |
Thanks @myedibleenso for taking on this large refactoring! A few comments/questions:
|
Re: generating GroMET JSON at test time, I can see both pros and cons. I won't be able to attend the 6/20 meeting due to a prior conflict, but I'll go with whatever route is chosen. Pros:
Cons:
|
name="skema" | ||
authors=[ | ||
name = "skema" | ||
authors = [ | ||
{name="Enrique Noriega", email="[email protected]"}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@myedibleenso We should add Keith here too
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kwalcock , if you'd like to be added to the list of authors in this PR, please post your preferred email here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for thinking of me. If it isn't specifically just for the Python code, which I haven't worked on, then it would be OK. LUM would probably like to have it be [email protected]. If it is, then credit to where it is due, of course.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kwalcock is being too humble. Of course he should be listed
pyproject.toml
Outdated
"skema.program_analysis" = "skema/program_analysis" | ||
"skema.skema_py" = "skema/skema_py" | ||
# re-map skema/text_reading/python to skema.text_reading | ||
"skema.text_reading" = "skema/text_reading/python" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@myedibleenso : Let's keep in mind that the python code that lives under mention_linking
will be deprecated in favor of model_linker
introduced in #246
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@enoriega , if you want to open a PR for this branch that replaces that code, feel free to pull these changes into another branch and do so. Otherwise we can take care of it in a subsequent PR.
@adarshp , thanks for your feedback. multiple .dockerignore files
Ah, thanks for reminding me to include a section for docker builds in the documentation. The weird naming convention relates to a somewhat obscure docker feature that has been around for a bit now. I can't recall where I first learned about it, but here is a comment that may have first announced it: moby/moby#12886 (comment) This is a feature supported by Buildkit (the default builder for Docker Desktop and Docker Engine since v23). By specifying a Dockerfile with Most newer Docker installs support this out of the box, but here is an example for explicitly invoking use of buildkit (which includes this feature): DOCKER_BUILDKIT=1 docker build -f "Dockerfile.skema-py" -t "lumai/askem-skema-py:test" . You can verify the new dockerignore is being applied here by building locally using this branch with the following command: DOCKER_BUILDKIT=1 docker build --progress=plain --no-cache -f "Dockerfile.skema-py" -t "lumai/askem-skema-py:test" . (I left a More info on enabling buildkit: https://docs.docker.com/build/buildkit/#getting-started Python version
Looking forward to getting out of that corner! Rust dependency and
|
@myedibleenso - thanks for clarifying about the multiple .dockerignore files and the graspologic issue! |
Hi @myedibleenso and @adarshp As for the gromet JSON's in the repo, I would not be opposed to generating them during test time, however my rust components that are being tested on the JSON's often lag behind in version support from the code2fn module, so if we always generate with the most recent version of code2fn, it would likely cause test failures until I update my end for the most recent version, which isn't a huge deal and might be good for people to know when there is version mismatch. but this is why there are so many JSON's currently and the different versions. It is a bit of a pain to update the JSON too since I basically manually curate the JSON's on the repo from their sourcing being the google drive: https://drive.google.com/drive/folders/1RnNVTifUESFXXkNh3tc8J6kot0xink07?usp=drive_link , so I am completely open to suggestions on how to manage the JSON's and their tests. |
Thanks, @Free-Quarks !
I actually think this would be a good thing, as it would immediately alert us of drift. I'll discuss generating gromet JSON at test time with the PA team and circle back with you. |
skema/text_reading/python/mention_linking -> skema/text_reading/mention_linking/
…247) Resolves #251 This PR is a larger than originally intended. It started as a switch to a different static site generator that led to a series of other changes to correct broken imports and streamline installation. ## Summary of changes - [**documentation**]: Move from Haskell-based doc generation to mkdocs (Python-based static site generator) - new site will include API docs for python and rust modules (scala components are not included for the time being) - [**documentation**]: Publish code coverage reports for Python module (Scala and Rust TBD) - [**misc**]: Refactor module and corresponding `pyproject.toml` to account for missing project deps and broken references - [**misc**] fixed link in `skema_py/README.md` - [**misc**] Added `.unused` suffix to `program_analysis` files that are unused and soon to be removed - [**integration** / @adarshp @jastier ]: Update Dockerfile to include all skema python module components - This removes the need for a separate dockerfile for `img2mml`; for the time being, we still publish a `lumai/askem-skema-img2mml` image (which now mirrors `skema-py`) - [**img2mml**]: `img2mml` model (`cnn_xfmer_OMML-90K_best_model_RPimage.pt` from `https://kraken.sista.arizona.edu/skema/img2mml/models/` is now retrieved at image build time and included in the published docker image (NOTE: we'll probably want to revisit this in the future) - [**text reading** / @enoriega ]: `text_reading` python utilities are included as another sub-module (at the cost of some additional dependencies noted in `pyproject.toml`) - docker-compose files are updated with new invocations reflecting new package structure (ex. `uvicorn skema.skema_py.server:app --host 0.0.0.0 --port 8000`) - [**code2fn** / @vincentraymond-ua ]: corrected a bug related to naming of `tree-sitter-fortran` grammar in [`build_tree_sitter_fortran.py`](https://github.com/ml4ai/skema/blob/7c3c8959df3a909ce758313f035efd91c93938af/skema/program_analysis/TS2CAST/build_tree_sitter_fortran.py#L9) (`build, my-languages.so` -> `build/ts-fortran.so`) ## Notes - ISA's inclusion means the `skema-py` Docker image will now include a Rust installation. We may want to consider combining this with the `skema-rs` components to generate a single image. This might simplify calling things like the code comment extractor from Python for a unified pipeline. fe5f565
This PR is a larger than originally intended. It started as a switch to a different static site generator that led to a series of other changes to correct broken imports and streamline installation.
Summary of changes
pyproject.toml
to account for missing project deps and broken referencesskema_py/README.md
.unused
suffix toprogram_analysis
files that are unused and soon to be removedimg2mml
; for the time being, we still publish alumai/askem-skema-img2mml
image (which now mirrorsskema-py
)img2mml
model (cnn_xfmer_OMML-90K_best_model_RPimage.pt
fromhttps://kraken.sista.arizona.edu/skema/img2mml/models/
is now retrieved at image build time and included in the published docker image (NOTE: we'll probably want to revisit this in the future)text_reading
python utilities are included as another sub-module (at the cost of some additional dependencies noted inpyproject.toml
)uvicorn skema.skema_py.server:app --host 0.0.0.0 --port 8000
)tree-sitter-fortran
grammar inbuild_tree_sitter_fortran.py
(build, my-languages.so
->build/ts-fortran.so
)Notes
skema-py
Docker image will now include a Rust installation. We may want to consider combining this with theskema-rs
components to generate a single image. This might simplify calling things like the code comment extractor from Python for a unified pipeline.