Llama2 quantized model on Inf2 generating nonsense #41

Open
sumaiyah opened this issue Sep 19, 2023 · 6 comments
@sumaiyah

I am following the steps in this notebook (https://github.com/aws-neuron/aws-neuron-samples/blob/master/torch-neuronx/transformers-neuronx/inference/meta-llama-2-13b-sampling.ipynb) to run a quantized Llama 2 model (https://huggingface.co/TheBloke/Dolphin-Llama2-7B-AWQ) on an AWS Inf2 instance (inf2.8xlarge).

The code runs, but when I try to generate a sequence, the output stream is nonsense:

>>> neuron_model.sample(tokenizer.encode("who is prime minister of uk", return_tensors="pt"), sequence_length=2048, top_k=50, streamer=TextStreamer(tokenizer))

isherтак discoveryLENGrektta mel damтакudoudoisherkl̂LENGifarola melLENG destouselikhudocherิuskkl Hauptumar discovery Malludoikh moduleджа moduleelifudouskswerLENGkl discovery discoveryaltraungsusrLENG КурcherLENGLENGToolsivelivelusrungs Haupt geldig modulesivel modulesусrola discoverydelegate Haupt discoveryugeniture moduleselifugen Кеede geldig discovery Schl Mallivel HöheLENG audelegatedelegateusr КеedeLENGКур КеdelegateLENGudousr Mallrellppendelegateivel Schldelegate accessibleodgeugenumar destдоваусdelegateToolsklundesede Кур Кеkl Mallugenentityikzdelegate discoveryanzen destusrungsppenentitychioíkíkkldelegate КеLENGrellToolsommenсиingu destLENGaussedeugnougnoppenikzíkLENG Mall auLENGrellikzivelugenkldelegatedelegateftyungsichtsdelegate Кеajuси Höheewусundesaju Курусikzík ensuiteichtsewzna ensuiteAccess discoverydelegate Кеdelegateinguboldmath nucitenusr accessibleedeLENGppenikzdelegateichtsdelegateundiallotikz Ке bon Кур Ке Курrell Schldelegate Schlус Ке MallLENGodgeǧikzкурغ Кеanzenlotppenungsdelegateichtsivel moduledelegatedelegaterellundialLENGinguungsivelichtshtusrdelegatedelegatehirehtichtschiohtdelegateedeغajuingu КеungsenschaftLENGLENGajuкурdelegateсиichtsikzտ MallLENGLENGLENGLENG auichtsси КеaussغewкуркурivelLENG modulesichtsLENGungs主 Кеchkikz主ajuichtsewugenichts nucichtsкур Schldelegate bonкурlotajuusrundialdentкуркурrellغikzugenусrelllotugenLENGinguppenchiochkajuhireкурppenichtsдвиhtanzeníkGRichtsichts Schl bon Schlchkchk nucdelegateichts Schlitenitenдви moduleznaajudelegatelotchkanzenlotἱAccessdelegateLENG nucinguchkitenppenусусdelegateкурдвиусikzundialajuenschaftdelegateznaдвикурichtschio Кеewadalichtsreesichtsտchioкурichtsenschaftichtsrell bonikzlot desc Mallкуркурchioсиadalenschaftinguppenusrhireikzivel Кеikzinguppen descdelegateusrikzichtsznaichtsewchkewrellAccessewichtsichtsдвикурikzznaichtslot Schlew nucíkкур nucAccessкурichtschioдвиivel firing nuc ordchiochkhireус auskeichtsodgeadalкурungsichtsewedeikz bonусewadalchkichtsATA主enschaftewusr
jurкурусppenichtsundialajuichtsLENGenschaftedeewichtsдвиppenichts sl nucchkadalкуркурichtsdelegateikzinguLEFTLEFTдвиchkкурchk bonundialundialadalundial Schlodgechk firing bonedeichts Abbкур desc Ке Schl descundialкурznalot auichts Schlclean Кеclean Mallchkadal reciznaadalundialichts formulachio Mallchioкурclean nucусhireATAichtshire desc desc recidelegatechioichtsichtschklotichtsusrichtsungs主rell Кеchioclean sl nucкуркурichtsadalundiallotGRewсиznaewhire主курewichtsкурсиichtsristichtscleanristichts ordAccessichtschkichtsdelegateungshireundialGRristíkodgeGRungs nucкур descLEFTinguLEFTikz Schlhirerellikzungsundial nucichtsкур AbbусewchioAccessodgeATA ```
@aws-rhsoln
Contributor

Thank you for reporting. We are trying to reproduce the issue on our end. Can you share the Neuron package versions?
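For readers who want to answer a request like this without pasting the whole environment, here is a minimal sketch that prints only the packages likely to matter. It relies on the standard-library `importlib.metadata`; the name fragments in `NEURON_HINTS` and the helper name `neuron_package_versions` are assumptions chosen for illustration, not anything defined by the Neuron SDK.

```python
from importlib import metadata

# Assumption: name fragments that pick out the packages relevant to this issue.
NEURON_HINTS = ("neuron", "torch", "transformers")


def neuron_package_versions(hints=NEURON_HINTS):
    """Return {package name: version} for installed packages matching any hint."""
    found = {}
    for dist in metadata.distributions():
        name = dist.metadata["Name"]
        # Skip distributions with missing metadata.
        if name and any(hint in name.lower() for hint in hints):
            found[name] = dist.version
    return dict(sorted(found.items()))


if __name__ == "__main__":
    for name, version in neuron_package_versions().items():
        print(f"{name} {version}")
```

Run it inside the same virtual environment used for compilation and inference so the reported versions match what the model actually ran against.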

@sumaiyah
Author

This is everything installed in the environment:

Package Version


absl-py 1.4.0
accelerate 0.23.0
aiohttp 3.8.5
aiosignal 1.3.1
amqp 5.1.1
annotated-types 0.5.0
anyio 3.7.1
argon2-cffi 23.1.0
argon2-cffi-bindings 21.2.0
arrow 1.2.3
astroid 2.15.6
asttokens 2.4.0
async-lru 2.0.4
async-timeout 4.0.3
attrs 23.1.0
Automat 22.10.0
aws-neuronx-runtime-discovery 2.9
awscli 1.29.45
Babel 2.12.1
backcall 0.2.0
backports.zoneinfo 0.2.1
beautifulsoup4 4.12.2
billiard 4.1.0
bleach 6.0.0
boto3 1.28.45
botocore 1.31.45
build 1.0.3
cachetools 5.3.1
celery 5.3.4
certifi 2023.7.22
cffi 1.15.1
charset-normalizer 3.2.0
click 8.1.7
click-didyoumean 0.3.0
click-plugins 1.1.1
click-repl 0.3.0
cloud-tpu-client 0.10
cloudpickle 2.2.1
cmake 3.27.4.1
colorama 0.4.4
comm 0.1.4
constantly 15.1.0
contourpy 1.1.0
cryptography 41.0.3
cssselect 1.2.0
cycler 0.11.0
dask 2023.5.0
debugpy 1.7.0
decorator 5.1.1
defusedxml 0.7.1
dill 0.3.7
distlib 0.3.7
docutils 0.16
dparse 0.6.3
ec2-metadata 2.10.0
environment-kernels 1.2.0
exceptiongroup 1.1.3
executing 1.2.0
fastapi 0.103.1
fastjsonschema 2.18.0
filelock 3.12.3
fonttools 4.42.1
fqdn 1.5.1
frozenlist 1.4.0
fsspec 2023.9.0
google-api-core 1.34.0
google-api-python-client 1.8.0
google-auth 2.23.0
google-auth-httplib2 0.1.1
googleapis-common-protos 1.60.0
httpie 3.2.2
httplib2 0.22.0
huggingface-hub 0.17.1
hyperlink 21.0.0
idna 3.4
imageio 2.31.3
importlib-metadata 6.8.0
importlib-resources 6.0.1
incremental 22.10.0
iniconfig 2.0.0
ipykernel 6.25.2
ipython 8.12.2
ipython-genutils 0.2.0
ipywidgets 8.1.0
islpy 2023.1
isoduration 20.11.0
isort 5.12.0
itemadapter 0.8.0
itemloaders 1.1.0
jedi 0.19.0
Jinja2 3.1.2
jmespath 1.0.1
joblib 1.3.2
json5 0.9.14
jsonpointer 2.4
jsonschema 4.19.0
jsonschema-specifications 2023.7.1
jupyter 1.0.0
jupyter_client 8.3.1
jupyter-console 6.6.3
jupyter_core 5.3.1
jupyter-events 0.7.0
jupyter-lsp 2.2.0
jupyter_server 2.7.3
jupyter_server_terminals 0.4.4
jupyterlab 4.0.5
jupyterlab-pygments 0.2.2
jupyterlab_server 2.24.0
jupyterlab-widgets 3.0.8
kiwisolver 1.4.5
kombu 5.3.2
lazy-object-proxy 1.9.0
libneuronxla 0.5.476
llvmlite 0.40.1
locket 1.0.0
lockfile 0.12.2
lxml 4.9.3
markdown-it-py 3.0.0
MarkupSafe 2.1.3
matplotlib 3.7.3
matplotlib-inline 0.1.6
mccabe 0.7.0
mdurl 0.1.2
mistune 3.0.1
multidict 6.0.4
nbclient 0.8.0
nbconvert 7.8.0
nbformat 5.9.2
nest-asyncio 1.5.7
networkx 2.6.3
neuronx-cc 2.10.0.34+6c8792c6f
neuronx-hwm 2.10.0.5+7b1976adf
notebook 7.0.3
notebook_shim 0.2.3
numba 0.57.1
numpy 1.21.6
nvidia-cublas-cu11 11.10.3.66
nvidia-cuda-nvrtc-cu11 11.7.99
nvidia-cuda-runtime-cu11 11.7.99
nvidia-cudnn-cu11 8.5.0.96
oauth2client 4.1.3
opencv-python 4.8.0.76
overrides 7.4.0
packaging 21.3
pandas 2.0.3
pandocfilters 1.5.0
parsel 1.8.1
parso 0.8.3
partd 1.4.0
pexpect 4.8.0
pgzip 0.3.5
pickleshare 0.7.5
Pillow 10.0.0
pip 23.2.1
pip-tools 7.3.0
pipenv 2023.2.4
pkg_resources 0.0.0
pkgutil_resolve_name 1.3.10
platformdirs 3.10.0
plotly 5.16.1
pluggy 1.3.0
prometheus-client 0.17.1
prompt-toolkit 3.0.39
Protego 0.3.0
protobuf 3.20.3
psutil 5.9.5
ptyprocess 0.7.0
pure-eval 0.2.2
pyasn1 0.5.0
pyasn1-modules 0.3.0
pycparser 2.21
pydantic 2.3.0
pydantic_core 2.6.3
PyDispatcher 2.0.7
Pygments 2.16.1
pylint 2.17.5
pyOpenSSL 23.2.0
pyparsing 3.1.1
pyproject_hooks 1.0.0
PySocks 1.7.1
pytest 7.4.2
python-daemon 3.0.1
python-dateutil 2.8.2
python-json-logger 2.0.7
pytz 2023.3.post1
PyYAML 6.0.1
pyzmq 25.1.1
qtconsole 5.4.4
QtPy 2.4.0
queuelib 1.6.2
referencing 0.30.2
regex 2023.8.8
requests 2.31.0
requests-file 1.5.1
requests-toolbelt 1.0.0
requests-unixsocket 0.3.0
rfc3339-validator 0.1.4
rfc3986-validator 0.1.1
rich 13.5.2
rpds-py 0.10.2
rsa 4.7.2
ruamel.yaml 0.17.32
ruamel.yaml.clib 0.2.7
s3transfer 0.6.2
safetensors 0.3.3
scikit-learn 1.3.0
scipy 1.7.3
Scrapy 2.10.1
seaborn 0.12.2
Send2Trash 1.8.2
sentencepiece 0.1.99
service-identity 23.1.0
setuptools 68.2.1
shap 0.42.1
six 1.16.0
slicer 0.0.7
sniffio 1.3.0
soupsieve 2.5
stack-data 0.6.2
starlette 0.27.0
tenacity 8.2.3
terminado 0.17.1
threadpoolctl 3.2.0
tinycss2 1.2.1
tldextract 3.5.0
tokenizers 0.13.3
tomli 2.0.1
tomlkit 0.12.1
toolz 0.12.0
torch 1.13.1
torch-neuronx 1.13.1.1.11.0
torch-xla 1.13.1+torchneuronb
torchvision 0.14.1
tornado 6.3.3
tqdm 4.66.1
traitlets 5.9.0
transformers 4.33.1
transformers-neuronx 0.7.84
Twisted 22.10.0
typing_extensions 4.7.1
tzdata 2023.3
uri-template 1.3.0
uritemplate 3.0.1
urllib3 1.26.16
vine 5.0.0
virtualenv 20.24.5
virtualenv-clone 0.5.7
w3lib 2.1.2
wcwidth 0.2.6
webcolors 1.13
webencodings 0.5.1
websocket-client 1.6.3
wheel 0.41.2
widgetsnbextension 4.0.8
wrapt 1.15.0
yarl 1.9.2
zipp 3.16.2
zope.interface 6.0

@aws-rhsoln
Contributor

Hello @sumaiyah, we tried to get the quantized checkpoint from the link you sent, but we were not successful. To debug an accuracy issue like this, we need the checkpoint. Could you share the checkpoint and the script at this email: [email protected]? This would make debugging faster for us.

@sumaiyah
Author

sumaiyah commented Oct 2, 2023

@aws-rhsoln sent

@enochlev

@sumaiyah, how did you compile the model? Did you use any special arguments for AWQ?

@aws-donkrets

Hi @sumaiyah, this model uses a quantization algorithm called AWQ, which is currently not supported in TnX (transformers-neuronx). Is it possible to use the standard Llama 2 7B weights for your use case?
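A rough way to catch this kind of mismatch before loading a checkpoint is to look at its tensor names. The sketch below assumes that AWQ/GPTQ-style checkpoints store packed weights under names ending in `.qweight`, `.qzeros`, and `.scales`; that naming is an assumption about how such checkpoints are commonly laid out, and `find_packed_quant_tensors` is a hypothetical helper, not part of transformers-neuronx.

```python
from typing import Iterable, List

# Assumption: suffixes commonly seen on packed tensors in AWQ/GPTQ-style checkpoints.
PACKED_SUFFIXES = (".qweight", ".qzeros", ".scales")


def find_packed_quant_tensors(tensor_names: Iterable[str]) -> List[str]:
    """Return the tensor names that look like packed quantized weights."""
    return [name for name in tensor_names if name.endswith(PACKED_SUFFIXES)]


# Illustrative names only; in practice, pass the keys of the loaded state dict.
example_names = [
    "model.layers.0.self_attn.q_proj.qweight",
    "model.layers.0.self_attn.q_proj.qzeros",
    "model.layers.0.self_attn.q_proj.scales",
    "model.layers.0.input_layernorm.weight",
]

packed = find_packed_quant_tensors(example_names)
if packed:
    print(f"Found {len(packed)} packed tensors; this checkpoint may be pre-quantized.")
```

If the list is non-empty, the weights are probably not plain floating-point matrices, and a loader that expects the standard Llama 2 layout could misread them.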
