Add benchmarking scripts #1030

Draft: wants to merge 20 commits into base main.
655 changes: 655 additions & 0 deletions llama31-1213/cpu_aoti_4.txt
223 changes: 223 additions & 0 deletions llama31-1213/cpu_aoti_8.txt
225 changes: 225 additions & 0 deletions llama31-1213/cpu_aoti_b16.txt
668 changes: 668 additions & 0 deletions llama31-1213/cpu_aoti_pt2_4.txt
236 changes: 236 additions & 0 deletions llama31-1213/cpu_aoti_pt2_8.txt
223 changes: 223 additions & 0 deletions llama31-1213/cpu_aoti_pt2_b16.txt
299 changes: 299 additions & 0 deletions llama31-1213/cpu_compile_4.txt
(Large diffs for the files above are not rendered by default.)
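The file names appear to encode the backend and weight precision (cpu_{compile|aoti|aoti_pt2}_{4|8|b16}). As a dry-run sketch, a wrapper like the one below could emit the two --compile invocations that appear verbatim in the rendered logs; the loop itself is hypothetical and not part of this PR, and it only echoes the commands rather than running them. The quantize JSON strings are copied exactly from the logged commands.

```shell
#!/bin/sh
# Hypothetical driver (not part of the PR): print the two torchchat benchmark
# invocations that appear verbatim in the cpu_compile_8 / cpu_compile_b16 logs.
for mode in 8 b16; do
  if [ "$mode" = "8" ]; then
    # int8 weight-only quantization, bf16 precision (config copied from the log)
    q='{"linear:int8": {"groupsize": 0}, "precision": {"dtype":"bfloat16"}, "executor":{"accelerator":"cpu"}}'
  else
    # plain bf16 baseline (config copied from the log)
    q='{"precision": {"dtype":"bfloat16"}, "executor":{"accelerator":"cpu"}}'
  fi
  # echo instead of exec: this is a dry run that reconstructs the command line
  echo OMP_NUM_THREADS=16 numactl --cpunodebind=0 --membind=0 \
    python3 torchchat.py generate llama3.1 --quantize "'$q'" \
    --prompt '"Once upon a time,"' --max-new-tokens 256 --compile --num-samples 3
done
```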
81 changes: 81 additions & 0 deletions llama31-1213/cpu_compile_8.txt
@@ -0,0 +1,81 @@

OMP_NUM_THREADS=16 numactl --cpunodebind=0 --membind=0 python3 torchchat.py generate llama3.1 --quantize '{"linear:int8": {"groupsize": 0}, "precision": {"dtype":"bfloat16"}, "executor":{"accelerator":"cpu"}}' --prompt "Once upon a time," --max-new-tokens 256 --compile --num-samples 3
PyTorch version 2.6.0.dev20241213+cu124 available.
Unabled to import torchao experimental quant_api with error: [Errno 2] No such file or directory: '/home/jackkhuu/oss/torchchat/torchao-build/src/ao/torchao/experimental/quant_api.py'
Using device=cpu Intel(R) Xeon(R) Platinum 8339HC CPU @ 1.80GHz
Loading model...
Time to load model: 0.11 seconds
Quantizing the model with: {'linear:int8': {'groupsize': 0}, 'precision': {'dtype': 'bfloat16'}, 'executor': {'accelerator': 'cpu'}}
Time to quantize model: 29.28 seconds
-----------------------------------------------------------
Once upon a time, I was sitting in a coffee shop, surrounded by the bustling sounds of city living. As I sipped on my coffee, I noticed a peculiar-looking man walking in, wearing a stylish three-piece suit. He was an older gentleman, with a kind face and wispy hair.

The man approached the counter and ordered his coffee, engaging in a warm conversation with the barista. When he received his drink, he sat down in a nearby seat, his eyes scanning the room as if searching for someone. Our eyes met, and he smiled kindly, nodding at me.

Intrigued, I decided to strike up a conversation. "Excuse me," I said, "but you seem like a man with a great story. Would you like to hear one of mine?"

He chuckled, his eyes lighting up with interest. "That sounds intriguing," he replied. "I'm always eager to hear a good tale."

I launched into a story about a peculiar occurrence in my childhood, about a dream that I had experienced as a young boy. The man listened attentively, his expression growing more and more intrigued.

As I finished my story, the man leaned in, his voice barely above a whisper. "I'm glad you shared that with me," he said
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Generated 255 tokens
Time for inference 1: 167.6127 sec total
Time to first token: 0.3494 sec with parallel prefill.

Total throughput: 1.5273 tokens/sec, 0.6547 s/token
First token throughput: 2.8620 tokens/sec, 0.3494 s/token
Next token throughput: 1.5245 tokens/sec, 0.6559 s/token

Bandwidth achieved: 13.07 GB/s
*** This first iteration will include cold start effects for dynamic import, hardware caches, JIT compilation. ***
just-in-time compilation time (incl run time): 1.7e+02 seconds

========================================

Once upon a time, in a far-off kingdom, there was a beautiful and kind-hearted princess named Sophia. She was loved by all who knew her, and her smile could light up the darkest of rooms.
One day, a wise old wizard named Zephyr came to the kingdom, seeking refuge from the dangers of the outside world. He was a master of magic, and the princess was immediately drawn to his wisdom and kindness.
As the days passed, Sophia and Zephyr grew closer, and they began to talk about their dreams and aspirations. Sophia confided in Zephyr about her desire to help those in need, to make a difference in the world, and to bring joy and happiness to all whom she met.
Zephyr, seeing the good in Sophia's heart, decided to share a secret with her. He told her that he had a magical amulet that would grant her wishes, but warned her that it came with a great responsibility.
The amulet, Zephyr explained, was a powerful tool that could change the course of history, and it was up to Sophia to use it wisely. He told her that with this amulet, she could bring peace and prosperity to the kingdom, but she must be careful not to abuse its power.
Sophia was both
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Generated 255 tokens
Time for inference 2: 38.5396 sec total
Time to first token: 0.3094 sec with parallel prefill.

Total throughput: 6.6425 tokens/sec, 0.1505 s/token
First token throughput: 3.2318 tokens/sec, 0.3094 s/token
Next token throughput: 6.6701 tokens/sec, 0.1499 s/token

Bandwidth achieved: 56.85 GB/s

========================================

Once upon a time, we were young, fun-loving, and carefree. We had high school sweethearts, prom dates, and graduation ceremonies. But, as time went by, life took its toll. We got older, got married, got kids, and got busy. The romance faded, and the excitement began to wane. But, that was before we discovered a secret to rekindle the flame.
One day, while browsing through an online blog, we stumbled upon an article about a magical way to bring back the spark. It mentioned a simple yet powerful technique: couples’ date nights. That's right! Regularly scheduled dates with your partner can reignite the flames of passion, intimacy, and love.
At first, it sounded too simple, too cheesy, or too old-fashioned. But, we were desperate to revive our relationship, so we decided to give it a try. We started with a weekly dinner date, just the two of us. We would pick a restaurant, plan a fun activity, and make it a priority to spend quality time together.
As we began to incorporate date nights into our busy schedules, we noticed a significant change. We started to look forward to our alone time together, and the excitement began to build again. We would laugh, reminis
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Generated 255 tokens
Time for inference 3: 35.3439 sec total
Time to first token: 0.3349 sec with parallel prefill.

Total throughput: 7.2431 tokens/sec, 0.1381 s/token
First token throughput: 2.9861 tokens/sec, 0.3349 s/token
Next token throughput: 7.2838 tokens/sec, 0.1373 s/token

Bandwidth achieved: 61.99 GB/s

========================================


Warning: Excluding compile in calculations
Average tokens/sec (total): 6.94
Average tokens/sec (first token): 3.11
Average tokens/sec (next tokens): 6.98

Memory used: 0.00 GB
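The per-run numbers above are internally consistent: total throughput counts the 255 generated tokens plus the first token against the whole wall time, while next-token throughput excludes the prefill. A small check using the values from inference run 2 of this int8 log; the relationships are inferred from the reported numbers, not taken from torchchat source.

```python
# Values copied from "Time for inference 2" of the int8 + --compile log above.
new_tokens = 255           # "Generated 255 tokens"
total_time = 38.5396       # seconds, whole inference
first_token_time = 0.3094  # "Time to first token"

# Total throughput counts the first token as well: 256 tokens overall.
total_tps = (new_tokens + 1) / total_time
# Next-token throughput excludes the prefill time and the first token.
next_tps = new_tokens / (total_time - first_token_time)
# First-token throughput is the reciprocal of time-to-first-token.
first_tps = 1.0 / first_token_time

print(f"total={total_tps:.4f} next={next_tps:.4f} first={first_tps:.4f}")
# total=6.6425 and next=6.6701 match the log exactly; first comes out ~3.2321
# versus the logged 3.2318, likely because the log used an unrounded timing.
```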
73 changes: 73 additions & 0 deletions llama31-1213/cpu_compile_b16.txt
@@ -0,0 +1,73 @@

OMP_NUM_THREADS=16 numactl --cpunodebind=0 --membind=0 python3 torchchat.py generate llama3.1 --quantize '{"precision": {"dtype":"bfloat16"}, "executor":{"accelerator":"cpu"}}' --prompt "Once upon a time," --max-new-tokens 256 --compile --num-samples 3
PyTorch version 2.6.0.dev20241213+cu124 available.
Unabled to import torchao experimental quant_api with error: [Errno 2] No such file or directory: '/home/jackkhuu/oss/torchchat/torchao-build/src/ao/torchao/experimental/quant_api.py'
Using device=cpu Intel(R) Xeon(R) Platinum 8339HC CPU @ 1.80GHz
Loading model...
Time to load model: 0.12 seconds
Quantizing the model with: {'precision': {'dtype': 'bfloat16'}, 'executor': {'accelerator': 'cpu'}}
Time to quantize model: 0.01 seconds
-----------------------------------------------------------
Once upon a time, in a small village nestled in the rolling hills of the countryside, there lived a young girl named Sophia. Sophia was a curious and adventurous child who loved to explore the world around her. She spent most of her days playing outside, chasing after butterflies, and watching the clouds roll by.
One day, while wandering through the village, Sophia stumbled upon a small, mysterious shop tucked away on a quiet street. The sign above the door read "Curios and Wonders," and the windows were filled with all sorts of strange and fascinating objects. Sophia's curiosity was piqued, and she pushed open the door to venture inside.
The shop was dimly lit, and the air was thick with the scent of old books and dust. Sophia's eyes adjusted slowly to the darkness, and she saw rows upon rows of shelves stacked high with peculiar items. There were strange artifacts, rare minerals, and even a few taxidermied animals peeking out from behind a velvet curtain.
Sophia wandered through the shop, running her fingers over the various objects on display. She picked up a delicate crystal pendant, a vintage locket, and a small, leather-bound book. As she touched each item, she felt a strange sensation, as if the object was imbuing her with
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Generated 255 tokens
Time for inference 1: 196.0270 sec total
Time to first token: 0.6336 sec with parallel prefill.

Total throughput: 1.3059 tokens/sec, 0.7657 s/token
First token throughput: 1.5783 tokens/sec, 0.6336 s/token
Next token throughput: 1.3051 tokens/sec, 0.7662 s/token

Bandwidth achieved: 20.97 GB/s
*** This first iteration will include cold start effects for dynamic import, hardware caches, JIT compilation. ***
just-in-time compilation time (incl run time): 2e+02 seconds

========================================

Once upon a time, there was a small, rural town nestled in the heart of a vast and mysterious forest. The town was called Ravenswood, and it was a place where time seemed to stand still. The residents of Ravenswood lived simple lives, relying on the land for their livelihood and honoring the traditions of their ancestors.
In the center of Ravenswood stood an ancient, gnarled tree, its branches twisted and knotted with age. The townspeople believed that this tree held mystical powers, and they would often gather around its base to pray, tell stories, and seek guidance.
One day, a young girl named Aria wandered into the forest, searching for a rare herb for her mother's healing potions. As she walked deeper into the woods, the trees seemed to grow taller, and the shadows grew darker. Aria felt a strange, tingling sensation in her fingers, as if the tree was calling to her.
She approached the ancient tree, feeling a sense of wonder and awe wash over her. As she reached out to touch the trunk, a low, rumbling voice spoke to her, echoing in her mind.
"Aria, child of Ravenswood, I have been waiting for you. You have come to seek the secrets of the forest, and I shall grant them
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Generated 255 tokens
Time for inference 2: 74.2544 sec total
Time to first token: 0.4688 sec with parallel prefill.

Total throughput: 3.4476 tokens/sec, 0.2901 s/token
First token throughput: 2.1333 tokens/sec, 0.4688 s/token
Next token throughput: 3.4560 tokens/sec, 0.2894 s/token

Bandwidth achieved: 55.37 GB/s

========================================

Once upon a time, in a world not so different from our own, there lived a young girl named Sophia. Sophia loved to dream big and chase her passions. She was a curious and ambitious individual who always sought to learn more and push beyond her boundaries. One day, Sophia stumbled upon an old, mysterious-looking book hidden in the attic of her family's old mansion. The book was bound in a strange, glowing material that seemed to pulse with an otherworldly energy.
As Sophia opened the book, she discovered that it contained ancient knowledge and secrets that had been hidden for centuries. The book spoke of magical realms, hidden dimensions, and mystical creatures that existed beyond the veil of the mundane world. Sophia was both fascinated and terrified by the secrets revealed within the book's pages.
As she delved deeper into the book, Sophia began to notice strange occurrences happening around her. Objects would move on their own, and she would hear whispers in the dead of night. It was as if the book was trying to communicate with her, drawing her into a world of wonder and magic. Sophia's heart pounded with excitement as she realized that she had stumbled upon something much bigger than herself.
With every passing day, Sophia became more and more enthralled by the mystical world within the book. She would
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Generated 255 tokens
Time for inference 3: 75.9722 sec total
Time to first token: 0.3684 sec with parallel prefill.

Total throughput: 3.3697 tokens/sec, 0.2968 s/token
First token throughput: 2.7142 tokens/sec, 0.3684 s/token
Next token throughput: 3.3728 tokens/sec, 0.2965 s/token

Bandwidth achieved: 54.12 GB/s

========================================


Warning: Excluding compile in calculations
Average tokens/sec (total): 3.41
Average tokens/sec (first token): 2.42
Average tokens/sec (next tokens): 3.41

Memory used: 0.00 GB
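The summary averages drop the first sample, as the "Excluding compile in calculations" warning indicates, since run 1 absorbs the JIT cold start. Dividing reported bandwidth by throughput also gives the bytes moved per generated token, which lines up with the weight footprint: roughly 16 GB for bf16 and half that for int8. A quick sanity check on the numbers from the two --compile logs; the bytes-per-token reading is an inference from the figures, not a documented torchchat formula.

```python
# Per-run total throughputs from the bf16 --compile log above (runs 1-3).
bf16_runs = [1.3059, 3.4476, 3.3697]  # tokens/sec
# Run 1 carries JIT-compile cold-start cost, so the summary excludes it.
avg_bf16 = sum(bf16_runs[1:]) / len(bf16_runs[1:])
print(f"avg excluding compile: {avg_bf16:.2f} tokens/sec")  # 3.41, as reported

# Bandwidth / throughput ~ GB moved per generated token (weight footprint).
bf16_gb_per_tok = 55.37 / 3.4476  # bf16 run 2: ~16 GB, consistent with
                                  # an ~8B-parameter model at 2 bytes/weight
int8_gb_per_tok = 56.85 / 6.6425  # int8 run 2: ~8.6 GB, roughly half
print(f"bf16 ~{bf16_gb_per_tok:.1f} GB/token, int8 ~{int8_gb_per_tok:.1f} GB/token")
```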