
Wording sentencepiece.cpp #1435

Closed
wants to merge 11 commits into from

Conversation

mikekgfb (Contributor)

Move the `.` before the newline, and reformat the file name in the message so it is clear the `.` is not part of the filename.

pytorch-bot bot commented Dec 20, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/torchchat/1435

Note: Links to docs will display an error until the docs builds have been completed.

⏳ No Failures, 4 Pending

As of commit 8455d58 with merge base 019f76f (image):
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Meta Open Source bot. label Dec 20, 2024
List tokenizer file to make sure it's present
perform ls for debug only when loading tokenizer model fails
```diff
@@ -38,7 +40,13 @@ void SPTokenizer::load(const std::string& tokenizer_path) {
   // read in the file
   const auto status = _processor->Load(tokenizer_path);
   if (!status.ok()) {
     fprintf(stderr, "couldn't load %s\n. If this tokenizer artifact is for llama3, please pass `-l 3`.", tokenizer_path.c_str());
     // Execute 'ls -al' on the tokenizer path
```
Contributor:
Thanks for adding the print, great for debugging.

Looks like the `ls` is spitting out the root torchchat directory instead of the tokenizer path, which is curious.

Contributor Author:

That would explain why the tokenizer can't be loaded, and the AOTI tests keep failing. #1429

Contributor Author:

I added `set -x` to the command to echo the `ls`, to make absolutely sure the path is not getting corrupted somehow (not sure how it would be, but belts and suspenders).

Contributor:

Neutral signal: looks like the arg is not being picked up by `ls` (which would explain why it just shows the working directory).

Contributor Author:

That is so weird! Want to add a print of the command and rerun? Maybe there's some magic character that causes indigestion for the shell running `ls` and the tokenizer model load?

Contributor Author:

Added print of command before execution

Contributor Author:

Split the C-style string conversion and the C++ concatenation into a convert followed by an append.

add `set -x` to debug output to get command with tokenizer path echoed
Explicitly convert C-style string constant to std::string
Update to C++11 ABI for AOTI, similar to ET
@Jack-Khuu (Contributor)

Thanks again for helping with the debug.

@larryliu0820 was able to get his tokenizer changes back in so we can either rebase or close this one
#1443

@mikekgfb (Contributor Author) commented Jan 3, 2025

> Thanks again for helping with the debug.
>
> @larryliu0820 was able to get his tokenizer changes back in so we can either rebase or close this one #1443

Awesome. I suggest we close this one.

@mikekgfb mikekgfb closed this Jan 3, 2025