support batch_size>1 for some operators #406

Merged
24 commits merged on Sep 23, 2024

Cathy0908
Collaborator

No description provided.

@Cathy0908 Cathy0908 changed the title from "[WIP] support batch_size" to "support batch_size for part ops" Sep 3, 2024
@Cathy0908 Cathy0908 changed the title from "support batch_size for part ops" to "support batch_size>1 for part operators" Sep 3, 2024
@Cathy0908 Cathy0908 changed the title from "support batch_size>1 for part operators" to "support batch_size>1 for some operators" Sep 3, 2024
@yxdyc yxdyc requested review from yxdyc, drcege, HYLcool, BeachWang and pan-x-c and removed request for yxdyc and drcege September 10, 2024 07:05
Collaborator

@drcege drcege left a comment

There are too many ops involved; everyone's effort is needed for the review!
@zhijianma @HYLcool @pan-x-c

data_juicer/ops/filter/alphanumeric_filter.py (resolved)
data_juicer/ops/filter/alphanumeric_filter.py (outdated, resolved)
data_juicer/ops/base_op.py (resolved)
@drcege
Collaborator

drcege commented Sep 12, 2024

I greatly appreciate the contributions made in this PR. Please sync the latest changes from the main branch and carefully resolve any conflicts, particularly in core/data.py.

@drcege drcege added the enhancement (New feature or request) and dj:op (issues/PRs about some specific OPs) labels Sep 12, 2024
Collaborator

@yxdyc yxdyc left a comment

TODO:
[ ] Raise the default batch size (bs_size) from 1 to a larger, suitable value
[ ] Check for redundant _batched_op=True in the process functions
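
For readers unfamiliar with the idea, here is a minimal sketch of what processing with batch_size>1 can look like for a filter-style operator. It is not the Data-Juicer implementation: the class `AlnumRatioFilterSketch`, the helpers `iter_batches` and `run_filter`, the `"text"` field, and the default of 1000 are all hypothetical. Only the `_batched_op` flag name comes from the TODO above, and its semantics here are assumed.

```python
from typing import Dict, Iterator, List


class AlnumRatioFilterSketch:
    """Hypothetical batched filter; NOT the actual Data-Juicer operator."""

    # Flag name taken from the TODO above; what it controls is an assumption.
    _batched_op = True

    def __init__(self, min_ratio: float = 0.5, batch_size: int = 1000):
        self.min_ratio = min_ratio
        # Assumed default; the TODO only asks for "a larger suitable value".
        self.batch_size = batch_size

    def keep_mask(self, batch: Dict[str, List[str]]) -> List[bool]:
        """Return one keep/drop decision per sample in a column-style batch."""
        mask = []
        for text in batch["text"]:
            # Share of alphanumeric characters; empty text counts as 0.0.
            ratio = sum(ch.isalnum() for ch in text) / len(text) if text else 0.0
            mask.append(ratio >= self.min_ratio)
        return mask


def iter_batches(samples: List[str], batch_size: int) -> Iterator[Dict[str, List[str]]]:
    """Yield dict-of-lists batches holding at most batch_size samples each."""
    for start in range(0, len(samples), batch_size):
        yield {"text": samples[start:start + batch_size]}


def run_filter(op: AlnumRatioFilterSketch, samples: List[str]) -> List[str]:
    """Apply the filter batch by batch and collect the kept samples."""
    kept = []
    for batch in iter_batches(samples, op.batch_size):
        mask = op.keep_mask(batch)
        kept.extend(text for text, keep in zip(batch["text"], mask) if keep)
    return kept
```

With `batch_size=1` the operator is called once per sample; a larger value amortizes the per-call overhead across many samples, which is the motivation behind the first TODO item.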

@yxdyc yxdyc merged commit 97d70e1 into modelscope:main Sep 23, 2024
2 checks passed
4 participants