run_vit_b_quant.py runs slower than run_bit_b.py #898

jerryzh168 · 2024-09-17T02:14:00Z

run_vit_b_quant.py
elapsed_time: 11.0519150390625 milliseconds

run_bit_b.py
elapsed_time: 1.2272755432128906 milliseconds

this is with int8_dynamic_activation_int8_weight

jerryzh168 · 2024-09-17T03:19:26Z

it seems torch==2.4.0 does not have the drop (with unwrap_tensor_subclass)

run_vit_b_quant.py
elapsed_time: 1.288721923828125 milliseconds

run_bit_b.py
elapsed_time: 1.561510772705078 milliseconds

int8_weight_only
run_vit_b_quant.py
elapsed_time: 1.3892197265625 milliseconds

run_bit_b.py
elapsed_time: 1.543534698486328 milliseconds

bdhirsh · 2024-09-17T15:10:45Z

hmm. This might be a good example of "subclass runtime overhead" given the fact that you're pointing out that you see the slowdown goes away on 2.4.0 when using unwrap_tensor_subclasses. But it would be nice to have a profiler trace that actually shows us that most of the time is spent in python overhead and not e.g. compile generating a slower artifact. @jerryzh168 any chance you can get a profile output? Also cc @IvanKobzarev
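A rough sketch of how such a profile could be collected with `torch.profiler`. The small `Sequential` model and input shape are stand-ins chosen so the snippet runs anywhere; they are not the quantized ViT-B from the tutorial scripts, and the real run would profile the compiled model instead.

```python
import torch
from torch.profiler import ProfilerActivity, profile

# Stand-in model (assumption): the thread profiles a quantized ViT-B instead.
model = torch.nn.Sequential(
    torch.nn.Linear(64, 64),
    torch.nn.ReLU(),
    torch.nn.Linear(64, 8),
).eval()
inputs = torch.randn(16, 64)

# Always record CPU activity; add CUDA activity only when a GPU is present.
activities = [ProfilerActivity.CPU]
if torch.cuda.is_available():
    model, inputs = model.cuda(), inputs.cuda()
    activities.append(ProfilerActivity.CUDA)

with torch.no_grad():
    for _ in range(3):  # warm-up iterations, excluded from the profile
        model(inputs)
    with profile(activities=activities) as prof:
        for _ in range(5):
            model(inputs)

# Per-operator summary, sorted by time spent in the op itself on the CPU side.
table = prof.key_averages().table(sort_by="self_cpu_time_total", row_limit=10)
print(table)
```

If most of the time lands in CPU-side entries while the GPU kernels stay short, that points at host overhead; long kernels would point at a slower generated artifact. `prof.export_chrome_trace("trace.json")` writes a trace that can be inspected visually.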

IvanKobzarev · 2024-09-17T15:14:50Z

Yeah, will try to repro and profile it.

jerryzh168 · 2024-09-18T00:27:57Z

we'll be able to cherry-pick the change until 9/30

HDCharles · 2024-09-20T17:45:12Z

Quoting @bdhirsh: "hmm. This might be a good example of "subclass runtime overhead" given the fact that you're pointing out that you see the slowdown goes away on 2.4.0 when using unwrap_tensor_subclasses. But it would be nice to have a profiler trace that actually shows us that most of the time is spent in python overhead and not e.g. compile generating a slower artifact. @jerryzh168 any chance you can get a profile output? Also cc @IvanKobzarev"

@bdhirsh i thought compile would trace through the subclass, you're saying there's still a bunch of overhead for subclasses even after compile?

IvanKobzarev · 2024-09-20T20:06:27Z

Found the problem. The main regression comes from dynamo failing to compile with fullgraph=True; as a result it compiles the model partially, with a graph break on every MultiHeadAttention call, and that causes the bad perf.

The compilation fails because the compile path picks the multi-head attention "fast path".

https://github.com/pytorch/pytorch/blob/main/torch/nn/modules/activation.py#L1286 - there is a manual check to avoid fastpath (native_multi_head_attention) if one of the arguments has torch_function handling.

But this check fails for subclasses during compilation, so compilation tries to compile the fast path via aten.native_multi_head_attention, which results in NYI (not yet implemented) for subclasses.
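For reference, the eager-mode behaviour of that kind of check can be sketched with `torch.overrides.has_torch_function`. `LoggingTensor` is a hypothetical subclass used only for illustration; it is not an AO class, and the snippet does not reproduce the compile-time failure described above.

```python
import torch
from torch.overrides import has_torch_function


class LoggingTensor(torch.Tensor):
    """Hypothetical subclass standing in for a quantized tensor subclass."""

    @classmethod
    def __torch_function__(cls, func, types, args=(), kwargs=None):
        # Defer to the default behaviour; only the presence of the override matters here.
        return super().__torch_function__(func, types, args, kwargs or {})


plain = torch.randn(4, 4)
wrapped = torch.randn(4, 4).as_subclass(LoggingTensor)

# A plain tensor has no override, so nothing here would block the fast path.
plain_has_override = has_torch_function((plain,))
# A subclass with __torch_function__ is detected, which in eager mode steers
# MultiheadAttention away from the fast path.
wrapped_has_override = has_torch_function((wrapped,))
print(plain_has_override, wrapped_has_override)
```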

If the non-fast path is taken during compilation, the benchmark for me is back to 1.21 ms.

So there is no significant runtime overhead for subclasses, just a compilation issue with MultiHeadAttention when one of its parameters is a subclass.

Now thinking about the fix: how to make AO subclasses take only the non-fast path for MultiHeadAttention during compilation.

cpuhrsch · 2024-09-21T00:25:56Z

@IvanKobzarev - You should be able to use https://pytorch.org/docs/main/backends.html#torch.backends.mha.set_fastpath_enabled to disable the fast path.

IvanKobzarev · 2024-09-23T12:30:01Z

@cpuhrsch Thanks, this helps.
@jerryzh168, I've verified that adding torch.backends.mha.set_fastpath_enabled(False) at the top of run_vit_b_quant.py restores performance without unwrap_tensor_subclasses:

elapsed_time:  1.216195556640625  milliseconds

I will leave it to you where to put torch.backends.mha.set_fastpath_enabled(False) in AO, so that AO-quantized models do not take the MHA fast path.

cpuhrsch · 2024-09-23T21:30:01Z

I'd add this setting into the run_vit_b_quant.py example script. We might also want to consider adding a warning to PyTorch when the fast path is enabled and subclasses are used (i.e. one of the arguments has a torch_function).

…#926)
Add temporary workaround to recover the perf for quantized vit under torch.compile

Summary: Recently we found a perf drop in quantized vit due to #898 (comment). This PR adds a temp fix until we figure out the longer-term fix. I think ideally we should figure out why the tensor subclass check failed in torch.compile (https://github.com/pytorch/pytorch/blob/e4d294221b140fdbb49a64f297bc60c9fcc2f80e/torch/nn/modules/activation.py#L1286) and fix that.

Test Plan: python tutorials/quantize_vit/run_vit_b_quant.py

jerryzh168 mentioned this issue Sep 18, 2024

Enable non-safetensor ser/deser for TorchAoConfig quantized model 🔴 huggingface/transformers#33456

Merged

IvanKobzarev self-assigned this Sep 20, 2024

jerryzh168 mentioned this issue Sep 24, 2024

Add workaround to recover the perf for quantized vit in torch.compile #926

Merged

IvanKobzarev mentioned this issue Sep 24, 2024

[mha] Disable native_mha(fast_path) in dynamo compilation pytorch/pytorch#136542

Open
