Add a LanguageModel class that implements the Llama2 architecture via llama2.c and Emscripten #26

Open: wants to merge 15 commits from model-llama2 into main
Conversation

@gohai (Member) commented Aug 1, 2023

This makes it possible to run inference on toy models (15M, 42M, and 110M parameters) using the Llama2 architecture, as implemented in llama2.c.

The included Emscripten build artifacts came from this tree: https://github.com/gohai/llama2.c-emscripten

TODO:

  • CDN for models & tokenizer.bin
  • not firm about naming of things ("LanguageModel"?, "manual*"?, "tokens"?)
  • would be nice to come up with a Colab notebook to do training/fine-tuning of custom models

@gohai gohai requested review from shiffman and ziyuan-linn August 1, 2023 09:33
@gohai gohai force-pushed the model-llama2 branch 3 times, most recently from 53bd1f6 to e18b755 Compare August 2, 2023 09:31
@shiffman (Member) left a comment

Wow, this is amazing @gohai!

First, as we work on documenting this, a wonderful reference is Let Us Show You How GPT Works by @aatishb. Hi Aatish, tagging you mostly to say hi, but would of course welcome your input!

I added a few comments @gohai. In my experience with teaching the previous charRNN models with ml5.js, the main pain points were:

  • Generating character by character or token by token was very confusing for students and should probably be reserved for more advanced use cases (though great to include in ml5.js if we can?)
  • What students often want to do is train their own model, but I was never able to successfully pull this off with all the steps required to train in python then convert to JS. A clear and easy to use colab notebook could absolutely work though!

My other "worry" relates to treading into this territory and providing "out of the box" models. LLMs, as is well documented, have all sorts of bias and other issues. It would be wonderful to collaborate with the team members who are working on educational materials and documentation to think about how we guide users to be aware of the limitations and possible dangers of certain models and datasets. The code of conduct can be a good reference as well for considering use cases and applications. This could be a great discussion for the full group! cc @sproutleaf

  createCanvas(400, 400);
  background(0);

  lm = ml5.languageModel(onModelLoaded);
Member commented:

Agreed it would be great to support preload().

Also, I wonder if we should require a "string" that references a specific model to load. Even though this is perhaps an extra, unnecessary step, it emphasizes to the user that this isn't magic, and requires an acknowledgement of the specific model they are using. Some options:

  lm = ml5.languageModel("TinyStories", onModelLoaded);
  lm = ml5.languageModel("TinyLlamas", onModelLoaded);

Referencing the model name is probably more important, but it's also important to emphasize the specific dataset that was used for training.
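To make the design question concrete, here is a minimal sketch of how a model-name string could resolve to a checkpoint inside ml5.languageModel(). The registry names and URLs below are placeholders for illustration, not the actual values used in this PR; the object-with-URL form mirrors the custom-model option added later in the branch.

```javascript
// Hypothetical model registry: names and URLs are illustrative only.
const MODELS = {
  TinyStories: { modelUrl: "https://example.com/models/stories15M.bin" },
  TinyLlamas: { modelUrl: "https://example.com/models/tinyllamas110M.bin" },
};

function resolveModel(nameOrOptions) {
  // Accept either a registered model name or an object with an explicit
  // URL, so the user always acknowledges which model they are loading.
  if (typeof nameOrOptions === "string") {
    const entry = MODELS[nameOrOptions];
    if (!entry) throw new Error(`Unknown model: ${nameOrOptions}`);
    return entry;
  }
  if (nameOrOptions && nameOrOptions.modelUrl) return nameOrOptions;
  throw new Error("Pass a model name or an object with a modelUrl");
}
```

Failing loudly on an unknown name keeps the "this isn't magic" property: there is no silent default model.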

  let options = {
    temperature: 0.9
  };
  lm.generate(prompt, options, onToken);
Member commented:

I love the idea of an onToken() event, but for beginners it would be incredibly useful to have an API that mirrors the HuggingFace inference API. In this case, you pass a prompt and a desired number of tokens and you get the entire result back. Something along the lines of:

  let options = {
    temperature: 0.9,
    maxTokens: 100
  };
  lm.generate(prompt, options, gotText);

I might also consider just the form prompt, maxTokens, callback, and then a JS object with more properties if you wanted to set temperature and other options, maybe:

  lm.generate(prompt, 100, gotText);

and then:

  let options = {
    prompt: "How are you?",
    temperature: 0.9,
    maxTokens: 100
  };
  lm.generate(options, gotText);

Though maybe it's good to always have the prompt separate.
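The two call forms discussed above could coexist behind one argument-normalization step. This is a hedged sketch of that design question, not the code in this PR; the helper name and the default values (temperature 0.9, maxTokens 100) are assumptions taken from the examples in this thread.

```javascript
// Normalize generate(prompt, maxTokens, cb) and
// generate(prompt, options, cb) into one settings object.
function normalizeGenerateArgs(prompt, optionsOrMaxTokens, callback) {
  const defaults = { temperature: 0.9, maxTokens: 100 };
  if (typeof optionsOrMaxTokens === "number") {
    // shorthand form: the number is the token budget
    return { prompt, ...defaults, maxTokens: optionsOrMaxTokens, callback };
  }
  // options-object form: user overrides win over defaults
  return { prompt, ...defaults, ...optionsOrMaxTokens, callback };
}
```

Keeping prompt as its own positional argument, as suggested, avoids ambiguity about which field is required.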

@gohai (Member, Author) commented Aug 4, 2023

@aatishb I so love this article, particularly how it raises the transparency/black-box issues with OpenAI to a general audience! (I believe that's also at the core of why we bother with toy models: not to suggest a similar level of performance is achievable in the browser, but to unpack this technology, and how it got built/trained, as widely as possible.)

Thank you for taking the time to think through this, @shiffman!

  • The suggestion of requiring the user to spell out the model name makes so much sense! (implemented this in 4eb89d9) I feel similarly about what you wrote about the dangers of whatever the "out-of-the-box" experience is going to be. Looking forward to working with the larger group on this!
  • Following your suggestion, I changed the main callback to be on-completion rather than on-token and simplified the most basic example. (b544b24)
  • I'll look into training of custom models. From scratch on the TinyStories dataset seems to take a "couple of hours" on four A100s. This is probably a step too far for Colab 😄, but perhaps there will be improvements in the implementation. There are people working on implementing fine-tuning using LoRA; this might be doable!
  • Preload now works (hackishly, d962efe)

  lm.generate(prompt, options, gotText);
  }

  function gotText(out, lm) {
Member commented:

I like how simple it is to have just a single string that comes back, but it might better match how other models in ml5.js work if we instead wrap everything inside an object, for example:

  function gotText(results) {
    console.log(results.text);
    console.log(results.words);
  }

where results might look like:

  {
    prompt: "How are you",
    text: "I am doing fine.",
    words: ["I", "am", "doing", "fine"]
  }

I could also imagine returning a prompt property. Sometimes students want to include the prompt in what they display back and sometimes they don't. Keeping track of it in a global variable can be awkward (especially if there are multiple calls to generate()), so having it available in the results can be helpful!
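A tiny sketch of the proposed results shape, assuming the field names from the comment above (prompt, text, words); the word-splitting rule here is a naive placeholder, not necessarily how the tokenizer would actually segment output.

```javascript
// Wrap generated output into one results object for the callback,
// so the prompt travels with the text it produced.
function makeResults(prompt, text) {
  return {
    prompt,
    text,
    // naive whitespace split, stripping trailing punctuation per word
    words: text
      .split(/\s+/)
      .filter(Boolean)
      .map((w) => w.replace(/[.,!?]+$/, "")),
  };
}
```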

@gohai (Member, Author) commented Aug 5, 2023

@shiffman words and out are currently properties of the instance. Would passing the instance as the result make sense, or would you prefer a "result" object value, which the user is able to mutate as they like? (And text over out for the literal output?) I'll add prompt, although currently it's strictly the beginning of the response, more of a "prefix" really.
Thank you for your review! I'll try it out.

@gohai gohai force-pushed the model-llama2 branch 4 times, most recently from 2e5365d to 347f5b4 Compare August 9, 2023 11:48
@shiffman shiffman mentioned this pull request Aug 19, 2023
gohai added 15 commits August 24, 2023 17:01
Besides providing a model name, the user can also pass an object containing the URL to a custom model. In both cases, they're explicit about the model they're exploring. As suggested by @shiffman
…-token

As suggested by @shiffman. This makes the most basic example easier to understand.
Previously this sometimes only passed the LanguageModel instance. Instead, always pass the best possible "value" as the first, and the instance as an (optional) second.
This drops all examples that use async/await.
Unsure if calling _{inc,de}crementPreload() manually is the best way to accomplish this. I first tried the p5PreloadHelper from the old repo, but this never made window._preloadCount go above zero for me. (Maybe @ziyuan-linn has some idea?)
…r top-p sampling (used by default)

From @karpathy's commit message: Quick note on sampling, the recommendation for good results is to use `-t 1.0 -p 0.9`, i.e. top-p sampling at 0.9 with temperature 1.0 (this is the default). To control the diversity of samples use either the temperature (i.e. vary `-t` between 0 and 1 and keep top-p off with `-p 0`) or the top-p value (i.e. vary `-p` between 0 and 1 and keep `-t 1`), but not both. Nice explainers on LLM sampling strategies include [this](https://peterchng.com/blog/2023/05/02/token-selection-strategies-top-k-top-p-and-temperature/), [this](https://docs.cohere.com/docs/controlling-generation-with-top-k-top-p) or [this](https://huggingface.co/blog/how-to-generate).
… topp

This automatically picks a reasonable default value for the other parameter, if not explicitly specified, and prints a message if both are.
This matches upstream llama2.c, and prevents a confusing message with the basic example, which specifies a temperature (thus disabling the default top-p sampling).
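The top-p behavior described in the commit messages above can be sketched as follows. This is a generic illustration of nucleus sampling (keep the smallest set of highest-probability tokens whose cumulative probability exceeds p, then sample within it), not the llama2.c implementation; the function name and the injectable rand parameter are inventions for this sketch.

```javascript
// Top-p (nucleus) sampling over a probability distribution.
// probs: array of token probabilities summing to ~1; p: cutoff in (0, 1].
function sampleTopP(probs, p, rand = Math.random) {
  // sort token indices by descending probability
  const indexed = probs
    .map((prob, index) => ({ prob, index }))
    .sort((a, b) => b.prob - a.prob);

  // find the cutoff where cumulative probability first exceeds p
  let cumulative = 0;
  let cutoff = indexed.length;
  for (let i = 0; i < indexed.length; i++) {
    cumulative += indexed[i].prob;
    if (cumulative > p) {
      cutoff = i + 1;
      break;
    }
  }

  // renormalize within the nucleus and sample proportionally
  const nucleus = indexed.slice(0, cutoff);
  const total = nucleus.reduce((sum, t) => sum + t.prob, 0);
  let r = rand() * total;
  for (const t of nucleus) {
    r -= t.prob;
    if (r <= 0) return t.index;
  }
  return nucleus[nucleus.length - 1].index;
}
```

With a small p the nucleus shrinks toward greedy decoding; with p = 1 it degenerates to plain temperature sampling, which is why the commit advises tuning one of the two knobs but not both.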