How does sparsity help on GPUs? #1449
Unanswered
Nick-infinity asked this question in Q&A
Replies: 0 comments
I am assuming the CSR format is used to store the sparse tensors.
For a 7B model that is unstructured-pruned to 70% sparsity, the model will have 2.1B non-zero parameters.
The CSR format increases the size of the stored tensor to at least 2.5x the non-zero values alone (non-zero values, column indices, and row pointers).
I.e., the 7B model effectively becomes a 5.25B (2.1 x 2.5) model. The speedups will be small because GPU LLM inference is memory-bound: all 5.25B stored values must be read by the GPU for every generated token.
It would be really helpful if someone could help me understand the DeepSparse inference gains better.
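The storage arithmetic above can be sketched in a few lines. Note that the exact overhead factor depends on the dtypes chosen for values and indices, which the post does not specify; fp16 values and int32 column indices below are illustrative assumptions, and row pointers (one entry per matrix row) are ignored as negligible next to the non-zero count.

```python
# Rough sketch of CSR storage cost for an unstructured
# 70%-pruned 7B-parameter model. Dtype choices are assumptions:
# fp16 (2-byte) values, int32 (4-byte) column indices.
n_params = 7_000_000_000             # dense parameter count
sparsity = 0.70                      # fraction of weights pruned to zero
nnz = int(n_params * (1 - sparsity)) # 2.1B non-zero values

bytes_val = 2   # fp16 weight value
bytes_idx = 4   # int32 column index
# Row pointers add one entry per matrix row, which is tiny
# compared to nnz, so they are omitted here.

dense_bytes = n_params * bytes_val
csr_bytes = nnz * (bytes_val + bytes_idx)

print(f"dense: {dense_bytes / 1e9:.1f} GB")   # 14.0 GB
print(f"CSR:   {csr_bytes / 1e9:.1f} GB")     # 12.6 GB
print(f"CSR / dense: {csr_bytes / dense_bytes:.2f}x")
```

Under these assumed dtypes each stored non-zero costs 6 bytes versus 2 bytes for a dense fp16 weight, i.e. a 3x cost per stored element; the bytes the GPU must read per token shrink only modestly despite 70% of the weights being zero, which is the memory-bound concern raised above.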