Multiplication Ops for int16 #1601
pablogranolabar started this conversation in Ideas
So I am in the process of releasing a research paper on representing int32 and float32 values within int16 variable space. The method is in a similar vein as a time/memory tradeoff attack: instead of traditional addition-based bitwise operators, multiplication operators are used within the same int16 memory space to provide a continuous int32+ or float32+ representation, at the expense of front-end computational resources. That overhead shouldn't be such a big deal soon, given AMD's decision to expand AVX-512 acceleration primitives while Intel is shelving them, so in theory this method could be CPU-accelerated at the tensor level and even plugged into PyTorch using an ATen subclass.
The core idea is that an int16 variable describes a sequence of flags which are combined with multiplication operators to represent a continuous space larger than int32/float32. The POC library will be released in a similar fashion to GNU MP Bignum, the multiprecision library used to wrangle 2048+ bit numbers for things like cryptographic key material generation.
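To make that concrete, here is a minimal sketch in C of one possible interpretation, assuming the int16 flags split into a sign bit, a small power-of-two scale field, and a mantissa. The field layout and the decode_i16 name are hypothetical and almost certainly differ from the encoding in the paper; the point is just that a multiply at decode time lets a single 16-bit word cover a range wider than int16 itself.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical layout, for illustration only (not the paper's encoding):
 *   bit 15      sign flag
 *   bits 11-14  4-bit power-of-two scale
 *   bits 0-10   11-bit mantissa
 * Decoding costs one multiply per value, trading front-end compute for
 * a representable range well beyond plain int16. */
static inline float decode_i16(uint16_t packed) {
    const int32_t mantissa = packed & 0x07FF;
    const int32_t scale    = (packed >> 11) & 0x0F;
    const int32_t sign     = (packed & 0x8000) ? -1 : 1;
    return (float)(sign * mantissa) * (float)(1u << scale);
}

int main(void) {
    /* scale 2^12, mantissa 1500 -> 6144000.0, far outside int16 range */
    const uint16_t packed = (uint16_t)((12u << 11) | 1500u);
    printf("decoded: %.1f\n", decode_i16(packed));
    return 0;
}
```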
The thought would be to first refactor the ggml/llama weight conversion scripts to accommodate the smaller int16 representation, then integrate the float32 decode functions into llama/ggml inference, and then explore the AVX-512 acceleration idea from there.
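On the inference side, the entry point could look something like the per-row dequantization functions ggml already uses for its quantized formats. The sketch below is only a shape: the name and signature are hypothetical, not an actual ggml API, and it reuses the decode_i16 helper from the earlier sketch. The inner loop is also where an AVX-512 path would slot in later (vectorized 16-bit loads, widen, multiply per lane).

```c
#include <stdint.h>

float decode_i16(uint16_t packed); /* helper from the earlier sketch */

/* Hypothetical per-row dequantization hook, loosely modeled on the shape
 * of ggml's existing dequantize-row functions; the name and signature are
 * illustrative, not an actual ggml API. */
void dequantize_row_i16mul(const uint16_t * restrict x, float * restrict y, int k) {
    for (int i = 0; i < k; ++i) {
        y[i] = decode_i16(x[i]); /* one multiply per weight at load time */
    }
}
```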
Thoughts?