GPT-2 Flavor Released!
Details
Models were evaluated on two main metrics: speed and output quality. In my testing, 5_1 quantization is the fastest setting at which there is no noticeable quality degradation; anything smaller gets a little dumber. So I'm making 5_1 the default from now on for models below 2.7B parameters; models at 2.7B and above will keep using 4_0. So cool that the result is a 1 GB terminal app you can use to power your own apps!
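For illustration, here's a minimal sketch of that default-selection rule. The function, constant name, and return strings are my own for the example; they are not part of the app's actual code:

```python
# Hypothetical sketch of the quantization default described above.
# The helper and its names are illustrative, not the app's real API.

Q5_1_PARAM_THRESHOLD = 2_700_000_000  # 2.7B parameters

def default_quantization(num_params: int) -> str:
    """Pick the default quantization type for a model.

    Models under 2.7B parameters default to 5_1 (fastest setting with
    no noticeable quality loss in testing); larger models stay on 4_0.
    """
    return "5_1" if num_params < Q5_1_PARAM_THRESHOLD else "4_0"

# Example: GPT-2 (124M params) gets 5_1; a 6.7B model gets 4_0.
print(default_quantization(124_000_000))    # -> 5_1
print(default_quantization(6_700_000_000))  # -> 4_0
```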