feat: add `GGML_UNARY_OP_ARGMAX` Metal kernel #1019

PABannier · 2024-11-16T15:03:31Z

This PR implements the Metal kernel used for the GGML_UNARY_OP_ARGMAX operation.

It is necessary for Encodec.cpp to run on the Metal backend.

ggerganov · 2024-11-16T18:44:05Z

Thanks, will merge these soon. I just need to sync ggerganov/llama.cpp#10238 from llama.cpp first, to avoid resolving conflicts manually.

ggerganov · 2024-11-18T09:01:04Z

src/ggml-metal/ggml-metal.m

+                [encoder setBuffer:id_dst  offset:offs_dst         atIndex:1];
+                [encoder setBytes:&ne00    length:sizeof( int64_t) atIndex:2];
+
+                [encoder dispatchThreadgroups:MTLSizeMake(1, 1, 1) threadsPerThreadgroup:MTLSizeMake(nrows, 1, 1)];


Threadgroup sizes are typically limited to a maximum of 1024 threads, so this implementation will stop working when nrows is larger than 1024. Instead, we should launch multiple threadgroups and distribute the rows across them.
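To illustrate the suggestion, here is a minimal CPU-side sketch in plain C of how the rows could be split across threadgroups. The names `dispatch_plan`, `plan_rows` and `count_covered_rows` are hypothetical and not part of ggml. The limit of 1024 is passed in as a parameter because it is device-dependent; on Metal the actual value would presumably come from the pipeline's `maxTotalThreadsPerThreadgroup`.

```c
#include <assert.h>
#include <stdint.h>

// Hypothetical helper type (not part of ggml): describes a 1-D dispatch.
typedef struct {
    int64_t n_threadgroups;    // number of threadgroups to launch
    int64_t threads_per_group; // threads in each threadgroup
} dispatch_plan;

// Split nrows across threadgroups so that no threadgroup exceeds max_threads.
static dispatch_plan plan_rows(int64_t nrows, int64_t max_threads) {
    assert(nrows > 0 && max_threads > 0);
    dispatch_plan p;
    p.threads_per_group = nrows < max_threads ? nrows : max_threads;
    // Ceiling division: the last threadgroup may be only partially used.
    p.n_threadgroups = (nrows + p.threads_per_group - 1) / p.threads_per_group;
    return p;
}

// CPU emulation of the launch. Each (threadgroup, thread) pair maps to one row.
// Surplus threads in the last threadgroup are skipped, which is the bounds
// check the kernel itself would need. Returns the number of rows processed.
static int64_t count_covered_rows(dispatch_plan p, int64_t nrows) {
    int64_t covered = 0;
    for (int64_t tg = 0; tg < p.n_threadgroups; ++tg) {
        for (int64_t t = 0; t < p.threads_per_group; ++t) {
            const int64_t row = tg * p.threads_per_group + t;
            if (row >= nrows) {
                continue; // out-of-range thread: nothing to do
            }
            ++covered;
        }
    }
    return covered;
}
```

With such a plan, the dispatch call would pass `n_threadgroups` as the threadgroup count and `threads_per_group` as the threadgroup size, and the kernel would derive its row from the threadgroup index and the thread index within the group.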

Also, the implementation currently assumes src0 is contiguous, so either add a GGML_ASSERT for that or extend the implementation to take the strides into account.
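For the second option, a stride-aware version could look roughly like the following CPU reference in plain C. This is only a sketch under assumptions: the function name `argmax_rows_strided` is hypothetical, the parameter names mirror ggml's `ne00`/`nb00`/`nb01` convention (element counts and byte strides), the source is assumed to hold `float` values, the output is assumed to be one 32-bit index per row, and ties are resolved in favor of the first maximum.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical reference (not part of ggml): argmax over each row of a 2-D
// float tensor that is addressed through byte strides instead of being
// assumed contiguous.
//   ne00: elements per row     nb00: byte stride between elements in a row
//   ne01: number of rows       nb01: byte stride between rows
static void argmax_rows_strided(const void *src,
                                int64_t ne00, int64_t ne01,
                                size_t nb00, size_t nb01,
                                int32_t *dst) {
    assert(ne00 > 0);
    for (int64_t r = 0; r < ne01; ++r) {
        const char *row = (const char *) src + (size_t) r * nb01;
        float   best   = *(const float *) row;
        int32_t best_i = 0;
        for (int64_t i = 1; i < ne00; ++i) {
            const float v = *(const float *) (row + (size_t) i * nb00);
            if (v > best) { // strict comparison keeps the first maximum
                best   = v;
                best_i = (int32_t) i;
            }
        }
        dst[r] = best_i;
    }
}
```

For a contiguous tensor, `nb00` is `sizeof(float)` and `nb01` is `ne00 * sizeof(float)`, so the same code covers both cases. A Metal kernel would do the equivalent address arithmetic after receiving the strides through additional `setBytes` arguments.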

implemented argmax kernel

71487a6

ggerganov requested changes Nov 18, 2024
