Llama2 quantized model on Inf2 generating nonsense #41
Comments
Thank you for reporting; we are trying to reproduce the issue on our end. Can you share the Neuron package versions?
This is everything installed in the environment:

Package    Version
absl-py    1.4.0
…
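(A quick way to filter such a list down to just the Neuron-related packages; this is a generic Python snippet, not something from the original reply, and nothing in it is specific to this environment:)

```python
# Print only the installed packages whose names mention "neuron",
# e.g. torch-neuronx, transformers-neuronx, neuronx-cc.
import importlib.metadata

for dist in importlib.metadata.distributions():
    name = dist.metadata["Name"] or ""
    if "neuron" in name.lower():
        print(f"{name}=={dist.version}")
```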
Hello @sumaiyah, we tried to get the quantized checkpoint from the link you sent, but we were not successful. For this kind of accuracy debugging we would need the checkpoint. Is it possible to share the checkpoint and the script at this email: [email protected]? That would make the debugging faster for us.
@aws-rhsoln sent
@sumaiyah, how did you compile the model? Any special arguments for AWQ?
Hi @sumaiyah, this model uses the AWQ quantization algorithm, which is currently not supported in transformers-neuronx (TnX). Is it possible to use the standard Llama 2 7B weights for your use case?
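(For context, transformers-neuronx does ship its own weight-storage quantization, where weights are stored as int8 and dequantized to fp16 at compute time. The sketch below shows that supported path applied to a standard, non-AWQ Llama 2 7B split checkpoint; it assumes a recent transformers-neuronx release, and the exact module paths and kwargs may differ between versions.)

```python
# Sketch: int8 weight-storage quantization supported by transformers-neuronx,
# applied to a standard (non-AWQ) Llama 2 7B split checkpoint.
from transformers_neuronx.config import NeuronConfig, QuantizationConfig
from transformers_neuronx.llama.model import LlamaForSampling

# Store weights as int8, dequantize to fp16 for compute.
neuron_config = NeuronConfig(
    quant=QuantizationConfig(quant_dtype="s8", dequant_dtype="f16"),
)

# "./llama-2-7b-split" is a placeholder for a checkpoint saved with
# save_pretrained_split (see the sketch at the end of the thread).
neuron_model = LlamaForSampling.from_pretrained(
    "./llama-2-7b-split",
    batch_size=1,
    tp_degree=2,          # 2 NeuronCores on an inf2.8xlarge
    amp="f16",
    neuron_config=neuron_config,
)
neuron_model.to_neuron()  # compile and load onto the NeuronCores
```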
Original issue description:

I am following the steps in this notebook (https://github.com/aws-neuron/aws-neuron-samples/blob/master/torch-neuronx/transformers-neuronx/inference/meta-llama-2-13b-sampling.ipynb) to run a quantized Llama 2 model (https://huggingface.co/TheBloke/Dolphin-Llama2-7B-AWQ) on an AWS Inf2 instance (inf2.8xlarge). The code runs, but when I try to generate a sequence I get a nonsensical output stream.
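(For reference, the sampling flow in the linked notebook looks roughly like the sketch below, adapted here for a 7B checkpoint and an inf2.8xlarge with tp_degree=2; the model id, paths, and kwargs are illustrative and not taken from the original report.)

```python
# Sketch of the transformers-neuronx sampling flow from the linked notebook,
# adapted for a Llama 2 7B checkpoint on inf2.8xlarge (2 NeuronCores).
import torch
from transformers import AutoTokenizer, LlamaForCausalLM
from transformers_neuronx.module import save_pretrained_split
from transformers_neuronx.llama.model import LlamaForSampling

checkpoint = "meta-llama/Llama-2-7b-hf"  # placeholder model id
split_dir = "./llama-2-7b-split"

# 1. Save the CPU checkpoint in the split format transformers-neuronx expects.
model_cpu = LlamaForCausalLM.from_pretrained(checkpoint)
save_pretrained_split(model_cpu, split_dir)

# 2. Load, compile, and move the model onto the NeuronCores.
neuron_model = LlamaForSampling.from_pretrained(
    split_dir, batch_size=1, tp_degree=2, amp="f16"
)
neuron_model.to_neuron()

# 3. Tokenize a prompt and sample.
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
input_ids = tokenizer("Hello, my name is", return_tensors="pt").input_ids
with torch.inference_mode():
    generated = neuron_model.sample(input_ids, sequence_length=256, top_k=50)
print(tokenizer.batch_decode(generated, skip_special_tokens=True))
```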