Request to be added as maintainer #655

Closed
NickCrews opened this issue Feb 6, 2024 · 5 comments

Comments

@NickCrews

Hi! I would love to

  • go through and merge the obvious, easy PRs that are open
  • close/deal with easy issues
  • add a GitHub bot that tags a release every week/month. This would give people a recent, still-stable version to build against. It would also allow for an official Homebrew formula to ease installation.

Can you give me permissions to be able to do this? Appreciate it!

@NickCrews
Author

@albarrentine @missinglink

@missinglink
Contributor

Sounds good to me. My basic knowledge of C has prevented me from handling much beyond administrative duties.

@albarrentine
Contributor

@NickCrews sounds good. I just reverted a few changes the Senzing team had been making recently without pull requests and spoke to them about it offline, so I removed the last version tag. I don't think that affects any of the existing PRs.

Would love to get some of the issues/PRs resolved. If it's minor, it's OK to merge, but nothing for now that changes the API.

Re: builds, the only thing I'd caution about for the moment is rebuilding data files. Since the parser and, e.g., the abbreviations dictionary are not entirely independent, the data files are no longer built automatically on merge as was once the case, because a rebuild might require parser retraining, which is quite involved. That means we should technically be able to merge most of the abbreviations, etc. that folks contributed; they just may not have an impact on results at present. I am experimenting in https://github.com/goodcleanfun with reworking the C codebase into modular, independent components that use clib (which is like copy/paste, not a huge clunky package manager or dependency nightmare, no corporate megacode), making them suitable for other areas of NLP/ML. The transparency level in that world of late leaves much to be desired, so I am quietly building out some ideas which will find their way back into libpostal.

On things like DuckDB integrations and whatnot, I'm opting not to proceed with the ideas that have been suggested so far around thread safety, etc. I have something else entirely in mind around model storage that renders some of it moot.

@NickCrews
Author

NickCrews commented Feb 6, 2024

Roger. I must admit I don't really want the responsibility of in-depth changes, so limiting myself to minor fixups sounds great :)

Honestly, just going through and triaging issues, and assigning a bad-result label so we can ignore them, would be great.

Re model storage: an overhaul sounds great if you want to do it; that has been a PITA for all the other language bindings. In particular, I was looking into compiling this into WASM, so that all other languages can link against that and the .wasm file can be a standalone release artifact. The filesystem can be tricky for WASM as I understand it, so other implementations that are WASM-friendly would be great.

@albarrentine
Contributor

Awesome, should be added, and many thanks!
