Need clarification of Gopher in Step 2 #172

Open
mihara-bot opened this issue Jun 18, 2024 · 0 comments

Dear authors,
I am trying to reproduce the Dolma-Web pipeline described in your paper.
However, in Step 2, when using the dolma toolkit, I found that the Gopher implementation in this repo differs from the original Gopher rules at http://arxiv.org/abs/2112.11446.
Specifically, the current code at /python/dolma/taggers.py does not compute 'Duplicate paragraph fraction' or 'Duplicate paragraph character fraction', both of which are listed in Table A1 of the Gopher paper.

Is this a bug, or is there no need to compute these metrics? I look forward to your reply.
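
For concreteness, here is a minimal sketch of how I would expect these two metrics to be computed. It is not taken from the dolma code or from the original Gopher implementation. The function name `duplicate_paragraph_stats`, the blank-line paragraph splitting, and the choice to count every repeat after the first occurrence as a duplicate are all my own assumptions.

```python
import re


def duplicate_paragraph_stats(text: str) -> tuple[float, float]:
    """Return (duplicate paragraph fraction, duplicate paragraph character fraction).

    Hypothetical helper for illustration only; not part of dolma.
    """
    # Assumption: paragraphs are separated by one or more blank lines.
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    if not paragraphs:
        return 0.0, 0.0

    seen = set()
    dup_count = 0  # number of paragraphs that repeat an earlier paragraph
    dup_chars = 0  # characters contained in those repeated paragraphs
    for p in paragraphs:
        if p in seen:
            # Assumption: the first occurrence is not counted as a duplicate.
            dup_count += 1
            dup_chars += len(p)
        else:
            seen.add(p)

    total_chars = sum(len(p) for p in paragraphs)
    return dup_count / len(paragraphs), dup_chars / total_chars
```

If I read Table A1 correctly, a document would then be dropped when either value exceeds the corresponding threshold given there.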

Best regards,
Xinlin Zhuang
