Wording sentencepiece.cpp #1435

mikekgfb · 2024-12-20T17:52:26Z

Move the `.` before the newline. Reformat the file name to make clear that the `.` is not part of the filename.

pytorch-bot · 2024-12-20T17:52:29Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/torchchat/1435

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

⏳ No Failures, 4 Pending

As of commit 8455d58 with merge base 019f76f:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

List tokenizer file to make sure it's present

perform ls for debug only when loading tokenizer model fails

Jack-Khuu · 2024-12-23T06:48:03Z

tokenizer/sentencepiece.cpp

@@ -38,7 +40,13 @@ void SPTokenizer::load(const std::string& tokenizer_path) {
  // read in the file
  const auto status = _processor->Load(tokenizer_path);
  if (!status.ok()) {
-    fprintf(stderr, "couldn't load %s\n. If this tokenizer artifact is for llama3, please pass `-l 3`.", tokenizer_path.c_str());
+    // Execute 'ls -al' on the tokenizer path


Thanks for adding the print, great for debugging

Looks like the ls is spitting out the root torchchat directory instead of the tokenizer path, which is curious

That would explain why the tokenizer can't be loaded, and the AOTI tests keep failing. #1429

I added set -x to the command to echo the ls to make absolutely sure the path is not getting corrupted somehow (not sure how it would, but belts and suspenders)

Neutral signal: it looks like the arg is not being picked up by ls (which would explain why it just shows PWD)

That is sooooo weird! Want to add a print of the command and rerun? Maybe there's some magic character that causes indigestion for both the shell running ls and the tokenizer model load?
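The debugging approach discussed above (print the command, then run `ls -al` on the tokenizer path) can be sketched roughly as follows. This is not the code from the PR, which is not shown in full here; the helper names `shell_quote`, `build_ls_command`, and `debug_list_tokenizer` are hypothetical, and quoting the path is an added assumption meant to rule out the "magic character" theory.

```cpp
#include <cstdio>
#include <cstdlib>
#include <string>

// Hypothetical helper: wrap a path in single quotes so the shell treats it
// as one literal argument. An embedded single quote is emitted as '\''.
std::string shell_quote(const std::string& path) {
  std::string quoted = "'";
  for (char c : path) {
    if (c == '\'') {
      quoted += "'\\''";
    } else {
      quoted += c;
    }
  }
  quoted += "'";
  return quoted;
}

// Hypothetical helper: build the debug command. `set -x` makes the shell
// echo the command it actually runs, including the expanded path.
std::string build_ls_command(const std::string& tokenizer_path) {
  return "set -x; ls -al " + shell_quote(tokenizer_path);
}

// Print the exact command string first, then hand it to the shell.
// Returns whatever std::system reports for the command.
int debug_list_tokenizer(const std::string& tokenizer_path) {
  const std::string cmd = build_ls_command(tokenizer_path);
  std::fprintf(stderr, "running: %s\n", cmd.c_str());
  return std::system(cmd.c_str());
}
```

If the printed command already lacks the path, the problem is in how the string was built rather than in the shell.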

Added print of command before execution

Split C-style string conversion and C++ const into a conversion followed by an append

add `set -x` to debug output to get command with tokenizer path echoed

Add print

Fix typo.

Explicitly convert C-style string constant to std::string

Update to C++11 ABI for AOTI, similar to ET
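The commit messages above about converting the C-style string constant to `std::string` before appending describe a common C++ pattern. A minimal sketch of it follows; the function names are hypothetical, and the PR's exact code is not shown in this thread.

```cpp
#include <string>

// Note: an expression such as
//   "ls -al " + tokenizer_path.c_str()
// does not compile, because it tries to add two pointers. Converting the
// literal to std::string first makes operator+ / operator+= well defined.

// Convert first, then append, as the commit message describes.
std::string make_ls_command(const std::string& tokenizer_path) {
  std::string cmd = std::string("ls -al ");
  cmd += tokenizer_path;
  return cmd;
}

// The same idea when only a C-style string is available.
std::string make_ls_command_from_cstr(const char* tokenizer_path) {
  return std::string("ls -al ") + tokenizer_path;
}
```

Both functions produce the same command text; splitting the conversion and the append only makes each step explicit.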

Jack-Khuu · 2025-01-03T03:35:51Z

Thanks again for helping with the debug.

@larryliu0820 was able to get his tokenizer changes back in so we can either rebase or close this one
#1443

mikekgfb · 2025-01-03T04:15:46Z

> Thanks again for helping with the debug.
>
> @larryliu0820 was able to get his tokenizer changes back in so we can either rebase or close this one #1443

Awesome. I suggest we close this one.

Wording sentencepiece.cpp

6cac9ea

. before newline. Reformat file name to make clear . is not part of filename

facebook-github-bot added the CLA Signed label Dec 20, 2024

Jack-Khuu approved these changes Dec 20, 2024

mikekgfb added 3 commits December 21, 2024 16:44

Merge branch 'pytorch:main' into patch-32

ad5ba1e

List tokenizer file in sentencepiece.cpp

8195034

List tokenizer file to make sure it's present

Update sentencepiece.cpp

f7b3df6

perform ls for debug only when loading tokenizer model fails

Jack-Khuu reviewed Dec 23, 2024

mikekgfb added 7 commits December 22, 2024 23:48

Update sentencepiece.cpp

e5635fd

add `set -x` to debug output to get command with tokenizer path echoed

Merge branch 'pytorch:main' into patch-32

fb24e0b

Update sentencepiece.cpp

871a874

Add print

Update sentencepiece.cpp

9f6b198

Fix typo.

Update sentencepiece.cpp

ece7b45

Explicitly convert C-style string constant to std::string

Merge branch 'pytorch:main' into patch-32

55c48b5

Update build_native.sh

8455d58

Update to C++11 ABI for AOTI, similar to ET
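For context on the ABI commit: with libstdc++, the `_GLIBCXX_USE_CXX11_ABI` macro selects between two layouts of `std::string`, so code built with different settings should not pass `std::string` objects to each other. Whether that was the cause of the failure in this PR is not established in the thread. The sketch below only reports which setting a translation unit was compiled with; the function name `cxx11_abi_status` is hypothetical.

```cpp
#include <string>

// Report the libstdc++ dual-ABI setting seen by this translation unit.
// The macro is defined by libstdc++ headers; other standard libraries
// do not define it.
const char* cxx11_abi_status() {
#if defined(_GLIBCXX_USE_CXX11_ABI)
  return _GLIBCXX_USE_CXX11_ABI ? "libstdc++, C++11 ABI"
                                : "libstdc++, pre-C++11 ABI";
#else
  return "macro undefined (not libstdc++)";
#endif
}
```

Printing this from both the runner and any separately built library is a quick way to check that they agree.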

mikekgfb closed this Jan 3, 2025
