Jagged tensor micro-benchmarks

Summary:
X-link: facebookresearch/FBGEMM#250

- Add jagged tensor micro-benchmarks

```
(foo) bash-5.1$ python -W ignore jagged_tensor_benchmark.py device --embedding-dim 512
INFO:root:######## Jagged (2D) to Dense ########
INFO:root:FBGEMM JaggedTensor: 5.746198445558548e-05 sec 438.11657809101143 GB/s
INFO:root:PyTorch NestedTensor: 6.370197981595993e-05 sec 395.1842010676863 GB/s
INFO:root:
INFO:root:######## Dense to Jagged (2D) ########
INFO:root:FBGEMM JaggedTensor: 3.12004815787077e-05 sec 806.880109734599 GB/s
INFO:root:PyTorch NestedTensor: 0.0014418727159500122 sec 17.459249850229323 GB/s
INFO:root:
INFO:root:######## Jagged (x) Dense -> Jagged ########
INFO:root:(+) FBGEMM JaggedTensor: 4.031049832701683e-05 sec 624.9347699856205 GB/s
INFO:root:(+) PyTorch NestedTensor: 0.001540895700454712 sec 16.348564015439923 GB/s
INFO:root:(*) FBGEMM JaggedTensor: 4.03628796339035e-05 sec 624.1237550068162 GB/s
INFO:root:(*) PyTorch NestedTensor: 0.0015746270418167114 sec 15.998348390445281 GB/s
INFO:root:
INFO:root:######## Jagged + Dense + Dense -> Jagged ########
INFO:root:FBGEMM JaggedTensor: 5.2013471722602845e-05 sec 645.7602403302756 GB/s
INFO:root:PyTorch NestedTensor: 0.0028932960033416747 sec 11.608985724656774 GB/s
INFO:root:
INFO:root:######## Jagged (1D) to Dense ########
INFO:root:FBGEMM JaggedTensor: 1.526080071926117e-05 sec 6.511322821651443 GB/s
INFO:root:PyTorch NestedTensor: 3.976528346538544e-05 sec 2.4729108264901147 GB/s
INFO:root:
INFO:root:######## Dense to Jagged (1D) ########
INFO:root:FBGEMM JaggedTensor: 1.5250975266098977e-05 sec 6.51551774665078 GB/s
INFO:root:PyTorch NestedTensor: 0.0014563246965408326 sec 0.06752340342340878 GB/s
INFO:root:
(foo) bash-5.1$
```

Differential Revision: D59973955
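
The log above gives absolute times only. As a reading aid, the sketch below derives the relative speedup of FBGEMM JaggedTensor over PyTorch NestedTensor for each benchmark. It is not part of the commit: the `timings` dictionary and the `speedup` helper are hypothetical names introduced for illustration, and the numbers are copied verbatim from the log.

```python
# Times in seconds, copied from the benchmark log in the commit message.
# Each entry is (FBGEMM JaggedTensor, PyTorch NestedTensor).
# The dictionary and helper below are illustrative, not part of the commit.
timings = {
    "Jagged (2D) to Dense": (5.746198445558548e-05, 6.370197981595993e-05),
    "Dense to Jagged (2D)": (3.12004815787077e-05, 0.0014418727159500122),
    "Jagged (+) Dense -> Jagged": (4.031049832701683e-05, 0.001540895700454712),
    "Jagged (*) Dense -> Jagged": (4.03628796339035e-05, 0.0015746270418167114),
    "Jagged + Dense + Dense -> Jagged": (5.2013471722602845e-05, 0.0028932960033416747),
    "Jagged (1D) to Dense": (1.526080071926117e-05, 3.976528346538544e-05),
    "Dense to Jagged (1D)": (1.5250975266098977e-05, 0.0014563246965408326),
}


def speedup(fbgemm_sec: float, nested_sec: float) -> float:
    """Return how many times faster the FBGEMM run was than the NestedTensor run."""
    return nested_sec / fbgemm_sec


speedups = {name: speedup(*pair) for name, pair in timings.items()}

for name, ratio in speedups.items():
    print(f"{name}: {ratio:.1f}x")
```

On these numbers the two implementations are close for the 2D jagged-to-dense conversion, while the gap is much larger in the dense-to-jagged and elementwise cases. These are single runs on one unspecified machine, so the ratios are indicative only.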
pytorch · Sep 19, 2024 · 50b724e
1 parent ebbebd4
commit 50b724e

Showing 2 changed files with 278 additions and 62 deletions.
diff --git a/fbgemm_gpu/bench/bench_utils.py b/fbgemm_gpu/bench/bench_utils.py
@@ -35,6 +35,8 @@ def benchmark_torch_function(  # noqa: C901
     f,
     # pyre-fixme[2]: Parameter must be annotated.
     args,
+    # pyre-fixme[2]: Parameter must be annotated.
+    kwargs={},
     flush_gpu_cache_size_mb: int = 40,
     iters: int = 10,
     num_warmups: int = 2,
@@ -43,11 +45,11 @@ def benchmark_torch_function(  # noqa: C901
     num_threads: int = 1,
     copy_f_for_multi_thread_test: bool = False,
 ) -> Tuple[float, torch.Tensor]:
-    logging.info(f"Start to benchmark {name}...")
+    logging.debug(f"Start to benchmark {name}...")
     if device != "cpu" and device != "" and device != "cuda":
         torch.cuda.set_device(device)
     for _ in range(num_warmups):
-        output = f(*args)
+        output = f(*args, **kwargs)
 
     assert num_threads > 0
     if device != "cpu" and torch.cuda.is_available() and (num_threads == 1):