umarbutler / semchunk Public

Releases: umarbutler/semchunk

v2.2.0

12 Jul 11:29

umarbutler

v2.2.0 Latest

Changed

Switched chunkerify() from outputting a function to returning an instance of the new Chunker() class. This should not alter functionality in any way but allows type hints to be preserved, fixing #7.
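The difference between returning a closure and returning a callable object can be sketched as follows. This is a minimal illustration, not semchunk's implementation: the Chunker class body, the greedy word-packing logic, and the count_words helper are all assumptions made for the example. The point is only that a class with an annotated __call__ gives editors and type checkers a concrete type to inspect.

```python
from __future__ import annotations

from typing import Callable


class Chunker:
    """Hypothetical stand-in for a callable chunker; not semchunk's actual class."""

    def __init__(self, chunk_size: int, token_counter: Callable[[str], int]) -> None:
        self.chunk_size = chunk_size
        self.token_counter = token_counter

    def __call__(self, text: str) -> list[str]:
        # Naive greedy word packing, used here purely for illustration.
        chunks: list[str] = []
        current: list[str] = []
        for word in text.split():
            candidate = " ".join(current + [word])
            if current and self.token_counter(candidate) > self.chunk_size:
                chunks.append(" ".join(current))
                current = [word]
            else:
                current.append(word)
        if current:
            chunks.append(" ".join(current))
        return chunks


def chunkerify(token_counter: Callable[[str], int], chunk_size: int) -> Chunker:
    # Returning a Chunker (rather than an untyped inner function) keeps the
    # annotated __call__ signature visible to tooling.
    return Chunker(chunk_size, token_counter)


def count_words(text: str) -> int:
    # Assumed toy token counter: one token per whitespace-separated word.
    return len(text.split())


chunker = chunkerify(count_words, chunk_size=2)
chunks = chunker("one two three four five")
```

Calling the object works exactly like calling a function, so existing call sites would not need to change under this design.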


v2.1.0

20 Jun 02:50

umarbutler


Fixed

Ceased memoizing chunk() (but not token counters) because cached outputs of memoized functions are shallow rather than deep copies of the original outputs. If one chunked a text, chunked that same text again, and then modified one of the chunks returned by the first call, the chunks returned by the second call would also be modified. This behaviour is unexpected and therefore undesirable. The memoization of token counters is not affected because they return immutable objects, namely integers.
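The aliasing problem can be reproduced with the standard library alone. The sketch below uses functools.cache as a stand-in memoizer (an assumption; the release note does not say which memoizer semchunk used) and a hypothetical split_words function in place of chunk(). With functools.cache, a cache hit returns the very same list object, so a mutation through one reference is visible through the other.

```python
from __future__ import annotations

from functools import cache


@cache
def split_words(text: str) -> list[str]:
    # Hypothetical stand-in for chunk(): returns a mutable list.
    return text.split()


@cache
def count_words(text: str) -> int:
    # Token counters return integers, which are immutable, so caching is safe.
    return len(text.split())


first = split_words("a b c")
second = split_words("a b c")  # cache hit: same list object as `first`

first[0] = "CHANGED"           # mutate the result of the first call

aliased = second is first      # both names point at the cached list
leaked = second[0]             # the mutation is visible via the second call
```

Returning a fresh copy on every call, or not caching mutable results at all, avoids the problem; the release took the second route.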


v2.0.0

19 Jun 06:08

umarbutler


Added

Added support for multiprocessing through the processes argument passable to chunkers constructed by chunkerify().

Removed

No longer guaranteed that semchunk is pure Python.


v1.0.1

02 Jun 11:44

umarbutler


Fixed

Documented the progress argument in the docstring for chunkerify() and its type hint in the README.


v1.0.0

02 Jun 11:41

umarbutler


Added

Added a progress argument to the chunker returned by chunkerify() that, when set to True and multiple texts are passed, displays a progress bar.


v0.3.2

01 Jun 06:31

umarbutler


Fixed

Fixed a bug where a ZeroDivisionError would be raised when a token counter returned zero tokens when called from merge_splits(), courtesy of @jcobol (#5) (7fd64eb), fixing #4.


v0.3.1

18 May 12:13

umarbutler


Fixed

Fixed a typo in the error messages of chunkerify(), which referred to the function as make_chunker().


v0.3.0

18 May 12:06

umarbutler


Added

Introduced the chunkerify() function, which constructs a chunker from a tokenizer or token counter that can be reused and can also chunk multiple texts in a single call. The resulting chunker speeds up chunking by 40.4% thanks, in large part, to a token counter that avoids having to count the number of tokens in a text when the number of characters in the text exceeds a certain threshold, courtesy of @R0bk (#3) (337a186).
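One way such a short-circuiting token counter could work is sketched below. The function name make_fast_counter, the chars_per_token_guess and max_token_chars parameters, and their default values are assumptions for illustration, not values confirmed by the release note. The idea is that when a text is far longer than a chunk could plausibly be, counting only a bounded prefix is enough to establish that the text does not fit. This relies on the token counter never reporting fewer tokens for a longer text.

```python
from typing import Callable, List


def make_fast_counter(
    token_counter: Callable[[str], int],
    chunk_size: int,
    chars_per_token_guess: int = 6,  # assumed heuristic, not from the source
    max_token_chars: int = 20,       # assumed longest token length in characters
) -> Callable[[str], int]:
    """Wrap a token counter so very long texts are only partially tokenized."""
    threshold = chunk_size * chars_per_token_guess

    def fast_counter(text: str) -> int:
        if len(text) > threshold:
            # Count only a prefix; the extra slack avoids cutting a token in half
            # in a way that would undercount.
            prefix = text[: threshold + max_token_chars]
            if token_counter(prefix) > chunk_size:
                # The exact count is irrelevant: the text is already too large.
                return chunk_size + 1
        return token_counter(text)

    return fast_counter


seen_lengths: List[int] = []


def count_words(text: str) -> int:
    # Assumed toy token counter that also records how much text it was given.
    seen_lengths.append(len(text))
    return len(text.split())


fast = make_fast_counter(count_words, chunk_size=2)
short_count = fast("a b")
long_text = " ".join("abcdefghijklmnopqrstuvwxyz")  # 51 characters, 26 words
long_count = fast(long_text)
```

In this sketch the long text is never tokenized in full: the counter only sees its first 32 characters and reports chunk_size + 1 as an "over the limit" signal.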


v0.2.4

13 May 11:34

umarbutler


Changed

Improved chunking performance with larger chunk sizes by switching from linear to binary search for the identification of optimal chunk boundaries, courtesy of @R0bk (#3) (1e3ddb9).
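The switch from linear to binary search can be illustrated with a small sketch. The function name longest_fitting_prefix, the joiner parameter, and the word-based token counter are assumptions for the example rather than semchunk's actual code. It finds how many leading splits can be merged without exceeding the chunk size, assuming the token count does not decrease as more splits are added.

```python
from typing import Callable, List, Sequence


def longest_fitting_prefix(
    splits: Sequence[str],
    chunk_size: int,
    token_counter: Callable[[str], int],
    joiner: str = " ",
) -> int:
    """Return the largest n such that joining splits[:n] fits within chunk_size."""
    low, high = 0, len(splits)
    while low < high:
        # Bias the midpoint upwards so the loop always makes progress.
        mid = (low + high + 1) // 2
        if token_counter(joiner.join(splits[:mid])) <= chunk_size:
            low = mid       # splits[:mid] fits; try a longer prefix
        else:
            high = mid - 1  # splits[:mid] is too large; try a shorter prefix
    return low


calls: List[int] = []


def count_words(text: str) -> int:
    # Assumed toy token counter that records each invocation.
    calls.append(1)
    return len(text.split())


splits = [str(i) for i in range(1000)]
boundary = longest_fitting_prefix(splits, chunk_size=300, token_counter=count_words)
```

A linear scan would call the token counter once per candidate boundary (hundreds of times here), while the binary search needs about log2(1000), roughly ten, calls.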


v0.2.3

11 Mar 04:29

umarbutler


Fixed

Ensured that memoization does not overwrite chunk()'s function signature.
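A common way to keep a memoized function's signature intact is functools.wraps, shown in the sketch below. The memoize decorator and the toy chunk() body here are assumptions for illustration; the release note does not say how semchunk fixed the issue. functools.wraps copies the name and docstring and sets __wrapped__, which inspect.signature follows to report the original parameters.

```python
from __future__ import annotations

import functools
import inspect
from typing import Any, Callable


def memoize(func: Callable[..., Any]) -> Callable[..., Any]:
    """Hypothetical memoizer that preserves the wrapped function's metadata."""
    results: dict = {}

    @functools.wraps(func)  # copies __name__, __doc__ and sets __wrapped__
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        key = (args, tuple(sorted(kwargs.items())))
        if key not in results:
            results[key] = func(*args, **kwargs)
        return results[key]

    return wrapper


@memoize
def chunk(text: str, chunk_size: int) -> tuple[str, ...]:
    """Toy chunker: split text into fixed-width pieces (illustration only)."""
    # A tuple is returned so that cached results cannot be mutated by callers.
    return tuple(text[i : i + chunk_size] for i in range(0, len(text), chunk_size))


signature = inspect.signature(chunk)
parameter_names = list(signature.parameters)
pieces = chunk("abcdefg", 3)
```

Without the functools.wraps line, inspect.signature would report the wrapper's generic (*args, **kwargs) parameters instead of text and chunk_size.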

