Node.js binding for llama.cpp.

llama.cpp: Inference of the LLaMA model in pure C/C++
```sh
npm install @fugood/llama.node
```
```js
import { loadModel } from '@fugood/llama.node'

// Initialize a Llama context with the model (may take a while)
const context = await loadModel({
  model: 'path/to/gguf/model',
  use_mlock: true,
  n_ctx: 2048,
  n_gpu_layers: 1, // > 0: enable GPU
  // embedding: true, // use embedding
  // lib_variant: 'opencl', // change backend
})

// Do completion
const { text } = await context.completion(
  {
    prompt:
      'This is a conversation between user and llama, a friendly chatbot. Respond in simple markdown.\n\nUser: Hello!\nLlama:',
    n_predict: 100,
    stop: ['</s>', 'Llama:', 'User:'],
    // n_threads: 4,
  },
  (data) => {
    // Partial completion callback, fired for each generated token
    const { token } = data
  },
)

console.log('Result:', text)
```
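The second argument to `completion` is invoked once per generated token, which makes it easy to stream output to a UI while the full result is still being produced. Below is a minimal sketch of that accumulation pattern; `fakeCompletion` is a hypothetical stand-in for `context.completion` (same callback shape as the example above), used only so the snippet runs without loading a model:

```js
// Stand-in for context.completion: calls the callback once per token,
// then resolves with the full text, mirroring the shape shown above.
async function fakeCompletion(params, onToken) {
  const tokens = ['Hello', ',', ' world', '!']
  for (const token of tokens) onToken({ token })
  return { text: tokens.join('') }
}

async function main() {
  let streamed = ''
  const { text } = await fakeCompletion(
    { prompt: 'Hi', n_predict: 16 },
    ({ token }) => {
      streamed += token // accumulate partial output for a streaming UI
    },
  )
  console.log(streamed === text) // prints true
}

main()
```

Once the promise settles, the string accumulated in the callback and the resolved `text` hold the same completion.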
Available `lib_variant` values:

- `default`: General usage; no GPU support except on macOS (Metal)
- `vulkan`: GPU support via Vulkan (Windows/Linux), but may be unstable in some scenarios
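When the context is created with `embedding: true` (commented out in the example above), the model produces embedding vectors instead of text. A common follow-up step is comparing two embeddings with cosine similarity; the helper below is plain JavaScript and assumes nothing about the binding's embedding API:

```js
// Cosine similarity between two equal-length embedding vectors:
// dot(a, b) / (|a| * |b|), in the range [-1, 1].
function cosineSimilarity(a, b) {
  let dot = 0
  let normA = 0
  let normB = 0
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i]
    normA += a[i] * a[i]
    normB += b[i] * b[i]
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB))
}

console.log(cosineSimilarity([1, 0], [1, 0])) // identical direction: 1
console.log(cosineSimilarity([1, 0], [0, 1])) // orthogonal: 0
```

Values near 1 indicate semantically similar texts, near 0 unrelated ones.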
License: MIT
Built and maintained by BRICKS.