GPT-2 low quality responses #38

Open
ljaniszewski00 opened this issue Mar 13, 2024 · 2 comments

@ljaniszewski00

I'm trying to develop an iOS app that uses your distilgpt2-64-6.mlmodel, but I'm getting strange answers to my questions.
I configured the model the same way you did in the attached ViewController: strategy: .topK(40) and nTokens: 50.
I'm attaching some screenshots of my conversation with the model (the question, labeled You, is at the top, and the model's answer, labeled Device, is right below it).
What could be causing this behaviour?

[Screenshots: IMG_1538, IMG_1537]

@pcuenca
Member

pcuenca commented Mar 14, 2024

Hi @ljaniszewski00! GPT-2 is just a language model and hasn't been trained to sustain chat conversations. It's trained to continue a text sequence with plausible text that may come after the prompt, and this task does not usually lend itself well to question answering. For example, instead of "What is the result of 2+2" you could potentially get better results with "2+2 is " (I haven't tested it).
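To illustrate the rephrasing idea, here is a minimal sketch of turning a question into a prefix the model can continue. `completionPrompt(forQuestion:)` is a hypothetical helper, not something this repository provides, and the two rewrite rules are only assumptions picked to match the example above. Like the example, it is untested against the model.

```swift
import Foundation

/// Hypothetical helper (not part of this repository): rewrites a question
/// into a text prefix that a plain language model can continue.
func completionPrompt(forQuestion question: String) -> String {
    var text = question.trimmingCharacters(in: .whitespacesAndNewlines)
    // Drop trailing question marks so the prompt reads like an unfinished statement.
    while text.hasSuffix("?") { text.removeLast() }
    text = text.trimmingCharacters(in: .whitespaces)

    // Assumed rewrite rules: "What is X" becomes "X is ".
    // The longer prefix is listed first so it wins over the shorter one.
    let prefixes = ["What is the result of ", "What is "]
    for prefix in prefixes where text.lowercased().hasPrefix(prefix.lowercased()) {
        let subject = String(text.dropFirst(prefix.count))
        return subject + " is "
    }

    // Fallback: a question/answer frame the model may continue after "A:".
    return "Q: \(text)\nA:"
}
```

The returned string would then be passed to the model as the prompt in place of the raw question.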

This project is currently in maintenance mode, so I'd recommend you take a look at swift-transformers instead. That project uses the latest features in Core ML, which should give you better performance, and it provides more tokenizers and tools. In addition, we are internally working on some exciting optimization features for language models.
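For context on the `.topK(40)` strategy mentioned in the question: top-k sampling generally draws the next token at random from the k most likely candidates, so two runs with the same prompt can produce different text. Below is a generic sketch of the technique, not the code used in this repository. The names `topKDistribution` and `sampleTopK` are made up for illustration.

```swift
import Foundation

/// Keeps the k highest-scoring token indices and renormalizes them with a softmax.
/// Illustrative only; names and signature are assumptions.
func topKDistribution(logits: [Double], k: Int) -> [(index: Int, probability: Double)] {
    let kept = logits.enumerated()
        .sorted { $0.element > $1.element }   // highest logit first
        .prefix(max(0, k))
    guard let maxLogit = kept.first?.element else { return [] }
    // Subtract the maximum before exponentiating for numerical stability.
    let weights = kept.map { exp($0.element - maxLogit) }
    let total = weights.reduce(0, +)
    return zip(kept, weights).map { (index: $0.0.offset, probability: $0.1 / total) }
}

/// Draws one token index from the top-k distribution.
/// `uniform` is a random number in [0, 1); it is a parameter so tests can fix it.
func sampleTopK(logits: [Double], k: Int,
                uniform: Double = Double.random(in: 0..<1)) -> Int? {
    let candidates = topKDistribution(logits: logits, k: k)
    var cumulative = 0.0
    for candidate in candidates {
        cumulative += candidate.probability
        if uniform < cumulative { return candidate.index }
    }
    // Guards against rounding error; nil when there are no candidates.
    return candidates.last?.index
}
```

Calling `sampleTopK(logits:k:)` without the `uniform` argument draws a fresh random number each time, which is why repeated generations from the same prompt are not expected to match.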

@ljaniszewski00
Author

@pcuenca Thanks for the response. This explains a lot. However, as can be seen in the first screenshot, I ran the same query as the demo in this repository's README, but the output is drastically different.

My second question: do you have any .mlmodel that was created specifically for chatting about various topics?
