detective needle test

Note

the x-axis is in characters, not tokens. tokenizing before each run is a future endeavour, but I also like how lightweight this keeps things: there's no need to rely on tokenizers, and it gives an idea of how much text a token budget maps to in reality (e.g. a 32k context being roughly 128k characters, which makes summarizing multiple news articles extremely viable!)
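
to make the arithmetic in the note concrete, here's a minimal sketch of the character/token conversion. the ratio of 4 characters per token is only the rule of thumb implied by the 32k / 128k example above; it's an assumption that varies by tokenizer and language, and the function names are hypothetical, not part of this repo.

```javascript
// Rule of thumb implied by the note: ~4 characters per token.
// ASSUMPTION: the real ratio depends on the tokenizer and the text.
const CHARS_PER_TOKEN = 4;

// Estimate how many characters fit into a context of `tokens` tokens.
function estimateChars(tokens, charsPerToken = CHARS_PER_TOKEN) {
  return tokens * charsPerToken;
}

// Estimate how many tokens a text of `chars` characters needs (rounded up).
function estimateTokens(chars, charsPerToken = CHARS_PER_TOKEN) {
  return Math.ceil(chars / charsPerToken);
}

console.log(estimateChars(32_000));   // 128000
console.log(estimateTokens(128_000)); // 32000
```

so a point at 128k on the x-axis corresponds to roughly a 32k-token context under this assumption.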


a "deterministic"(1) way to needle test without AI judges, compatible with all OpenAI-compatible (oAI) endpoints, including tabby and vllm.

no dependencies: just copy config.example.json to config.json, edit the options and run node index.js. once it's done, open index.html in your web browser.

while the test is running, it keeps writing its results, so you can refresh the html page to see the progress so far. currently, a new test run overwrites the previous results.

the neat thing about ENDPOINTS being an array in the config is that you can host multiple backends (e.g. on runpod or vast) to get through the test faster.
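
as a rough illustration of why an array of endpoints speeds things up, here's a sketch that spreads test jobs across endpoints round-robin. this is not taken from index.js: the shape of the ENDPOINTS entries (plain URL strings), the example URLs, the job objects and the function name assignRoundRobin are all assumptions made for the example.

```javascript
// ASSUMPTION: ENDPOINTS entries are plain base URLs; the real config may differ.
const ENDPOINTS = [
  "http://localhost:5000/v1", // hypothetical backend 1
  "http://localhost:5001/v1", // hypothetical backend 2
];

// Pair every job with an endpoint, cycling through the endpoints in order.
function assignRoundRobin(endpoints, jobs) {
  if (endpoints.length === 0) throw new Error("need at least one endpoint");
  return jobs.map((job, i) => ({
    job,
    endpoint: endpoints[i % endpoints.length],
  }));
}

// Hypothetical jobs: one needle test per context length (in characters).
const jobs = [16_000, 32_000, 64_000, 128_000].map((chars) => ({ chars }));

for (const { job, endpoint } of assignRoundRobin(ENDPOINTS, jobs)) {
  console.log(`${job.chars} chars -> ${endpoint}`);
}
```

with two backends, each one handles half of the jobs. a shared queue that hands the next job to whichever backend is free would balance uneven job durations better, but the idea is the same.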

what is a needle test?

A needle test measures the recall ability of an LLM. It works by inserting a tiny fact into a long context (the LLM input) and then testing whether the LLM can answer a question about this fact.
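
the idea can be sketched in a few lines. this is a minimal illustration of a judge-free needle test, assuming the answer is checked with a simple case-insensitive substring match; it is not necessarily how index.js does it, and all names, the filler text and the needle are hypothetical.

```javascript
// Insert `needle` into `haystack` at a relative depth (0 = start, 1 = end).
function insertNeedle(haystack, needle, depth) {
  const pos = Math.round(haystack.length * depth);
  return haystack.slice(0, pos) + "\n" + needle + "\n" + haystack.slice(pos);
}

// Cut the filler text to `contextChars` characters, hide the needle in it,
// and append the question. The prompt ends up slightly longer than
// `contextChars` because the needle and the question are added on top.
function buildPrompt(haystack, needle, question, depth, contextChars) {
  const context = insertNeedle(haystack.slice(0, contextChars), needle, depth);
  return `${context}\n\n${question}`;
}

// Judge without an AI: pass if the answer contains the expected string.
// ASSUMPTION: a case-insensitive substring match is good enough here.
function passes(answer, expected) {
  return answer.toLowerCase().includes(expected.toLowerCase());
}

// Hypothetical example data.
const filler = "lorem ipsum ".repeat(1000);
const needle = "The secret code of the detective is 7421.";
const prompt = buildPrompt(filler, needle, "What is the detective's secret code?", 0.5, 8000);

console.log(prompt.length);
console.log(passes("The code is 7421.", "7421")); // true
console.log(passes("I don't know.", "7421"));     // false
```

the prompt would then be sent to the endpoint, and passes() applied to the reply. varying the context length and the depth gives the grid of results that ends up on a chart.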

chart examples

historical charts

as development progressed, the chart type changed, but since running these tests cost me a lot, I want to at least publish the earlier findings here in a historical charts section

(1) "deterministic" is in quotes: there's no AI judge, but the test was meant to let the model take multiple needle tests at temp 1, so no two runs will be exactly the same unless you set temp 0

lucyknada/detective-needle-llm
