Update Caching logic to only trigger on the first inference sample #1369

Merged on Nov 13, 2024 (3 commits)

Jack-Khuu (Contributor) commented Nov 13, 2024

When the model cache is already set up, there is no need to call setup_caches each time a sample is passed in.
The redundant call is normally harmless, but torchtune is noisy (as it should be) when setup_caches is called unnecessarily.

This PR adds a check so that cache setup only runs for the first sample.
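
To illustrate the idea, here is a minimal sketch of a first-sample check. It is not the code from this PR: `ToyModel`, `run_samples`, and the simplified `generate` below are hypothetical stand-ins, and only the `skip_cache_setup` flag and the `setup_caches(max_batch_size=..., max_seq_length=...)` call shape are taken from the diff.

```python
class ToyModel:
    """Hypothetical stand-in for a model with KV caches (not torchchat's class)."""

    def __init__(self):
        self.setup_calls = 0

    def setup_caches(self, max_batch_size, max_seq_length):
        # A real model would allocate KV caches here; this one only counts calls.
        self.setup_calls += 1


def generate(model, prompt, max_seq_length, skip_cache_setup=False):
    # Simplified, hypothetical generate(): only the cache guard is modeled.
    if not skip_cache_setup:
        model.setup_caches(max_batch_size=1, max_seq_length=max_seq_length)
    return f"output for {prompt!r}"


def run_samples(model, prompt, num_samples, max_seq_length=128):
    # Set up caches on the first sample only; skip on every later sample.
    return [
        generate(model, prompt, max_seq_length, skip_cache_setup=(i > 0))
        for i in range(num_samples)
    ]


model = ToyModel()
outputs = run_samples(model, "What's in this image?", num_samples=2)
```

With this guard, `setup_caches` runs once no matter how many samples are requested, so the "already setup" warning never has a reason to fire.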

Warnings that no longer appear

Key value caches are already setup. You cannot call ``setup_caches()`` twice. Skipping.
Key value caches are already setup. You cannot call ``setup_caches()`` twice. Skipping.
Key value caches are already setup. You cannot call ``setup_caches()`` twice. Skipping.
Key value caches are already setup. You cannot call ``setup_caches()`` twice. Skipping.

Generation after fix (no warning)

python torchchat.py generate llama3.2-11B --prompt "What's in this image?" --image-prompt assets/dog.jpg  --num-samples 2

Note: NumExpr detected 22 cores but "NUMEXPR_MAX_THREADS" not set, so enforcing safe limit of 16.
NumExpr defaulting to 16 threads.
PyTorch version 2.6.0.dev20241002+cu121 available.
lm_eval is not installed, GPTQ may not be usable
Using device=cuda NVIDIA PG509-210
Loading model...
Time to load model: 10.45 seconds
-----------------------------------------------------------
What's in this image?The image features a dog sitting on a skateboard with its tongue out, sporting sunglasses. The dog has a white chest with brown ears and a brown patch of fur between its eyes and nose. It wears a blue collar and red sunglasses. The skateboard is red and yellow, with two yellow wheels on either side, and the dog appears to be sitting on top of it while facing the camera. The background of the image is blurry but seems to feature a paved road lined with green grass and trees.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Generated 99 tokens
Time for inference 1: 17.3737 sec total
Time to first token: 2.5819 sec with parallel prefill.

      Total throughput: 5.7558 tokens/sec, 0.1737 s/token
First token throughput: 0.3873 tokens/sec, 2.5819 s/token
 Next token throughput: 6.6929 tokens/sec, 0.1494 s/token

Bandwidth achieved: 122.55 GB/s
*** This first iteration will include cold start effects for dynamic import, hardware caches. ***

========================================

What's in this image?The image depicts a medium-sized white dog sitting on a red skateboard on an asphalt path. The dog has brown ears and a tan patch over one eye, giving it a slightly inquisitive appearance. Its tongue is protruding slightly from its mouth, which is slightly open, suggesting that the dog may be panting or playing along with the photo.

The dog is wearing red-framed sunglasses with black lenses, an alternative to a pair of goggles, and a blue collar. The skateboard features yellow wheels and has the word "CRAZ" written on the underside. The dog's body is facing forward, but it's looking toward the camera with its head turned slightly to the side, as if posing.

The background of the image shows a green grassy area and a hedge or bush behind it. The overall atmosphere suggests that the dog is enjoying a fun day out, possibly on a sunny day, and is ready to take a ride on its skateboard. The image is likely intended to be
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Generated 199 tokens
Time for inference 2: 32.2208 sec total
Time to first token: 1.3305 sec with parallel prefill.

      Total throughput: 6.2072 tokens/sec, 0.1611 s/token
First token throughput: 0.7516 tokens/sec, 1.3305 s/token
 Next token throughput: 6.4421 tokens/sec, 0.1552 s/token

Bandwidth achieved: 132.16 GB/s

========================================
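
For readers comparing the two runs, the throughput lines above appear to follow from the token count and timings. The sketch below reproduces them under one assumption inferred from the numbers, not from torchchat's source: total throughput counts the first token plus the reported generated tokens. The function name `throughput_summary` is hypothetical.

```python
def throughput_summary(generated_tokens, total_sec, first_token_sec):
    """Recompute the logged throughput figures (tokens/sec).

    Assumption (inferred from the log, not from torchchat's code):
    the total counts the first token plus `generated_tokens` more.
    """
    total = (generated_tokens + 1) / total_sec
    first = 1 / first_token_sec
    # Remaining tokens are produced in the time left after the first token.
    nxt = generated_tokens / (total_sec - first_token_sec)
    return {"total": total, "first": first, "next": nxt}


run1 = throughput_summary(99, 17.3737, 2.5819)
run2 = throughput_summary(199, 32.2208, 1.3305)
```

Under that assumption, both runs match the logged values to within rounding.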


pytorch-bot bot commented Nov 13, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/torchchat/1369

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 7047d79 with merge base 93f713f (image):
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

facebook-github-bot added the "CLA Signed" label (managed by the Meta Open Source bot) on Nov 13, 2024

max_batch_size=1,
max_seq_length=max_seq_length,
)
if not skip_cache_setup:

Contributor Author

This is the only change in this block; the rest is whitespace.

Contributor

Is there any way to tell the cache status directly from the model, instead of passing in a new argument from outside?

Contributor Author

Not off the top of my head, but definitely worth baking into our model abstraction in the future
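
As a rough sketch of what "baking it into the model abstraction" could look like: the model tracks its own cache state and `generate` asks the model instead of taking an extra flag. Everything here is hypothetical, including `ToyModel`, the `caches_are_setup()` method, and the simplified `generate`; it is not torchchat's or torchtune's actual API.

```python
import warnings


class ToyModel:
    """Hypothetical model wrapper that tracks its own KV-cache state."""

    def __init__(self):
        self._caches_ready = False
        self.setup_calls = 0

    def caches_are_setup(self) -> bool:
        # Hypothetical status query; the caller no longer needs a flag.
        return self._caches_ready

    def setup_caches(self, max_batch_size, max_seq_length):
        if self._caches_ready:
            # Mimics a noisy library when setup is requested twice.
            warnings.warn("Key value caches are already setup. Skipping.")
            return
        self._caches_ready = True
        self.setup_calls += 1


def generate(model, prompt, max_seq_length=128):
    # The guard lives next to the model, so generate() needs no new argument.
    if not model.caches_are_setup():
        model.setup_caches(max_batch_size=1, max_seq_length=max_seq_length)
    return f"output for {prompt!r}"


model = ToyModel()
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    results = [generate(model, "hello") for _ in range(3)]
```

The trade-off is that the model class has to expose its cache state, but the `generate` signature stays unchanged.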

@@ -591,6 +591,7 @@ def generate(
Dict[str, Any]
] = None, # List of Image prompt tensors for multimodal models
start_pos: int = 0,
skip_cache_setup: bool = False,

Gasoonjia (Contributor), Nov 13, 2024

I'm OK with this for now, but adding new inputs to the generate function might trigger my nightmare 😣 and move us farther from our target.
I would rather delegate this to the model side to suppress the warning messages.

Contributor Author

I agree, it's not great. Luckily it's light so we can abstract it easily later on

Jack-Khuu merged commit 6eae887 into main on Nov 13, 2024
52 checks passed
3 participants