JAX + Flax implementation of "Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models".
A hybrid model that mixes gated linear recurrences with local attention.
- Griffin-3B outperforms Mamba-3B, and Griffin-7B and Griffin-14B achieve performance competitive with Llama-2, despite being trained on nearly 7 times fewer tokens.
- Griffin can extrapolate on sequences significantly longer than those seen during training.
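The gated linear recurrence at the heart of this architecture can be sketched as follows. This is a minimal, simplified illustration (not the repo's actual RG-LRU layer): it assumes an input-dependent decay `a_t` in (0, 1) and a gate `g_t`, and updates a hidden state with `h_t = a_t * h_{t-1} + (1 - a_t) * g_t * x_t`. The function and variable names here are hypothetical.

```python
import jax
import jax.numpy as jnp


def gated_linear_recurrence(x, a, gate):
    """Simplified gated linear recurrence sketch (hypothetical, not the
    paper's exact RG-LRU): h_t = a_t * h_{t-1} + (1 - a_t) * g_t * x_t.

    x, a, gate: arrays of shape (seq_len, dim); a and gate lie in (0, 1).
    Returns the hidden states for every time step, shape (seq_len, dim).
    """
    def step(h, inputs):
        x_t, a_t, g_t = inputs
        h = a_t * h + (1.0 - a_t) * g_t * x_t
        return h, h  # carry the state forward and emit it

    h0 = jnp.zeros(x.shape[-1])
    _, hs = jax.lax.scan(step, h0, (x, a, gate))
    return hs


# Toy usage: sequence length 4, feature dim 2.
key = jax.random.PRNGKey(0)
x = jax.random.normal(key, (4, 2))
a = jax.nn.sigmoid(jax.random.normal(key, (4, 2)))  # decay in (0, 1)
g = jax.nn.sigmoid(jax.random.normal(key, (4, 2)))  # gate in (0, 1)
h = gated_linear_recurrence(x, a, g)
print(h.shape)  # (4, 2)
```

Because the recurrence is linear in the hidden state, it can be computed with `jax.lax.scan` (or a parallel scan) rather than a Python loop, which is what makes these layers efficient to train on long sequences.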
- [ ] Usage and training code will be added to the repository.