Convert numpy.floating values in meta.json #13644
Closed
* Add workflow files for cibuildwheel
* Add config for cibuildwheel
* Set version for experimental prerelease
* Try updating cython
* Skip 32-bit windows builds
* Revert "Try updating cython" This reverts commit c1b794a.
* Try to import cibuildwheel settings from previous setup
Implemented a foundational Scottish Gaelic (gd) language option with tokenizer_exceptions and stop_words files.
* Add Kurdish Kurmanji language
* Add lex_attrs
Add a context manager, nlp.memory_zone(), which will begin memory_zone() blocks on the vocab, string store, and potentially other components.

Example usage:

```
with nlp.memory_zone():
    for doc in nlp.pipe(texts):
        do_something(doc)
# do_something(doc) <-- Invalid
```

Once the memory_zone() block expires, spaCy will free any shared resources that were allocated for the text processing that occurred within the memory zone. If you create Doc objects within a memory zone, it's invalid to access them once the memory zone has expired.

The purpose of this is that spaCy creates and stores Lexeme objects in the Vocab that can be shared between multiple Doc objects. It also interns strings. Normally, spaCy can't know when all Doc objects using a Lexeme are out of scope, so new Lexemes accumulate in the vocab, causing memory pressure.

Memory zones solve this problem by telling spaCy "okay, none of the documents allocated within this block will be accessed again". This lets spaCy free all new Lexeme objects and other data that were created during the block.

The mechanism is general, so memory_zone() context managers can be added to other components that could benefit from them, e.g. pipeline components.

I experimented with adding memory zone support to the tokenizer as well, for its cache. However, this seems unnecessarily complicated. It makes more sense to just put a limit on the cache size. This lets spaCy make better use of the cache's efficiency advantage, because we can maintain a (bounded) cache even if only small batches of documents are being processed.
Co-authored-by: marinelay <[email protected]>
Co-authored-by: Sofie Van Landeghem <[email protected]> Co-authored-by: Ines Montani <[email protected]>
Co-authored-by: Halvani <>
Co-authored-by: Ines Montani <[email protected]>
Ports over a numpy v2 compatibility change from v3.8
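As a rough illustration of the kind of conversion the title describes, the sketch below replaces non-builtin numeric scalars with plain `int`/`float` values before a meta dictionary is serialized to JSON. It is not the patch from this pull request: the helper name `to_jsonable` and the `meta` keys are hypothetical, and the approach relies on NumPy registering its scalar types with the standard `numbers` ABCs, so the helper itself needs only the standard library.

```python
import json
import numbers


def to_jsonable(value):
    """Recursively replace non-builtin numeric scalars with int/float.

    Hypothetical helper for illustration; not spaCy's implementation.
    """
    if isinstance(value, dict):
        return {key: to_jsonable(item) for key, item in value.items()}
    if isinstance(value, (list, tuple)):
        return [to_jsonable(item) for item in value]
    # Builtin scalars are already JSON-friendly; leave them untouched.
    if value is None or type(value) in (bool, int, float, str):
        return value
    # NumPy integer scalars are registered as numbers.Integral.
    if isinstance(value, numbers.Integral):
        return int(value)
    # NumPy floating scalars (e.g. float32) are registered as numbers.Real.
    if isinstance(value, numbers.Real):
        return float(value)
    return value


# Illustrative meta dictionary; the keys are assumptions, not taken from the PR.
meta = {"name": "demo", "performance": {"acc": 0.5}}
try:
    import numpy as np

    # json.dumps raises TypeError on a raw numpy.float32 value.
    meta["performance"]["acc"] = np.float32(0.5)
except ImportError:
    pass  # without numpy, meta already holds a builtin float

serialized = json.dumps(to_jsonable(meta))
print(serialized)
```

Converting at the point of serialization keeps the rest of the code free to work with NumPy scalars, at the cost of one extra pass over the dictionary.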