performance w.r.t W8A8 without groups #37

GuoYi0 · 2024-12-12T03:45:36Z

Thanks very much for the great work. I notice that the W4A4 scheme uses group-wise quantization, so the GEMM accumulation may not be executed entirely inside the tensor cores with an int32 accumulator. How does the performance of this method compare to W8A8 without groups? Could you please provide some statistics?
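To make the concern concrete, here is a minimal scalar sketch in Python. It is my own illustration, not this repository's kernel, and it assumes symmetric round-to-nearest quantization and a hypothetical group size of 128. With one scale per operand (W8A8 without groups), the whole K-length dot product stays in a single integer accumulator and is rescaled once at the end. With per-group scales (W4A4), each group's integer partial sum has to be converted to float and rescaled, which means K/G floating-point multiply-adds per output. On a GPU these would presumably run outside the tensor core's int32 accumulation. The helper names `quantize_symmetric`, `dot_w8a8` and `dot_w4a4_grouped` are hypothetical.

```python
from typing import List, Tuple


def quantize_symmetric(values: List[float], bits: int) -> Tuple[List[int], float]:
    """Symmetric round-to-nearest quantization of one block to signed ints (assumed scheme)."""
    qmax = 2 ** (bits - 1) - 1  # 127 for 8-bit, 7 for 4-bit
    amax = max(abs(v) for v in values) or 1.0
    scale = amax / qmax
    codes = [max(-qmax - 1, min(qmax, round(v / scale))) for v in values]
    return codes, scale


def dot_w8a8(a: List[float], w: List[float]) -> Tuple[float, int]:
    """One scale per operand: the whole K-length sum stays in one integer accumulator."""
    qa, sa = quantize_symmetric(a, 8)
    qw, sw = quantize_symmetric(w, 8)
    acc = sum(x * y for x, y in zip(qa, qw))  # int32-style accumulation over all of K
    return acc * sa * sw, 1  # a single float rescale at the very end


def dot_w4a4_grouped(a: List[float], w: List[float], group: int) -> Tuple[float, int]:
    """One scale per group: each group's integer partial sum is rescaled in float."""
    out, rescales = 0.0, 0
    for start in range(0, len(a), group):
        qa, sa = quantize_symmetric(a[start:start + group], 4)
        qw, sw = quantize_symmetric(w[start:start + group], 4)
        acc = sum(x * y for x, y in zip(qa, qw))  # integer partial sum for this group only
        out += acc * sa * sw  # float multiply-add outside the integer accumulator
        rescales += 1
    return out, rescales


if __name__ == "__main__":
    K, GROUP = 256, 128  # hypothetical sizes, chosen only for illustration
    a = [float((i % 15) - 7) for i in range(K)]
    w = [float((i * 3 % 15) - 7) for i in range(K)]
    ref = sum(x * y for x, y in zip(a, w))
    print("float reference:", ref)
    print("W8A8 (value, rescales):", dot_w8a8(a, w))
    print("W4A4 g128 (value, rescales):", dot_w4a4_grouped(a, w, GROUP))
```

With K = 256 and a group size of 128, the grouped version does two float rescales per output versus one for W8A8. My question is essentially how much this extra per-group work costs in real kernel throughput.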
Thanks very much
