
Discrepancies Between GPU and Neuron-based Outputs for GPTJ Model on inf2.24xlarge #28

Open
ho4040 opened this issue Aug 13, 2023 · 2 comments

ho4040 commented Aug 13, 2023

I attempted to run this model on an inf2.24xlarge instance. The model is based on the GPT-J architecture, but when I run it on Neuron, the results differ greatly from those on a GPU-based system: the output is completely meaningless words. The same model works fine on GPU.

Below is the compilation code:

import torch
from transformers import AutoModelForCausalLM
from transformers_neuronx.module import save_pretrained_split

# Load the original Hugging Face checkpoint on CPU.
hf_model = AutoModelForCausalLM.from_pretrained('PygmalionAI/pygmalion-6b', low_cpu_mem_usage=True)

# Cast the attention and MLP blocks (and the LM head) to fp16,
# matching the amp='f16' setting used at inference time.
def amp_callback(model, dtype):
    for block in model.transformer.h:
        block.attn.to(dtype)
        block.mlp.to(dtype)
    model.lm_head.to(dtype)

amp_callback(hf_model, torch.float16)

# Save the checkpoint in the split format expected by transformers-neuronx.
save_pretrained_split(hf_model, './pygmalion-6b-split')
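
As a quick sanity check (a sketch, not part of the original report), appending something like this to the compilation script above confirms that the cast landed where intended:

# Hypothetical sanity check: the attention/MLP weights and the LM head
# should now be fp16, while the rest of the model stays fp32.
assert hf_model.transformer.h[0].attn.q_proj.weight.dtype == torch.float16
assert hf_model.transformer.h[0].mlp.fc_in.weight.dtype == torch.float16
assert hf_model.lm_head.weight.dtype == torch.float16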

Below is the inference code:

import time
import torch
from transformers import AutoTokenizer
from transformers_neuronx.gptj.model import GPTJForSampling

# Load the split checkpoint and compile it for the NeuronCores.
neuron_model = GPTJForSampling.from_pretrained('./pygmalion-6b-split', n_positions=1024, batch_size=1, tp_degree=8, amp='f16')
neuron_model.to_neuron()

# Construct a tokenizer and encode the prompt text.
tokenizer = AutoTokenizer.from_pretrained('PygmalionAI/pygmalion-6b')
batch_prompts = [
    "Jihye's Persona: A 22-year-old woman working part-time at a convenience store in Seoul.\n<START>\nYou: ...\nJihye: Welcome, man.\nYou: hello?\nJihye: ",]
input_ids = torch.as_tensor([tokenizer.encode(text) for text in batch_prompts])

with torch.inference_mode():
    # Warmup run (compilation/caching, excluded from timing).
    generated_sequences = neuron_model.sample(input_ids, sequence_length=1024)
    # Timed run.
    start = time.time()
    generated_sequences = neuron_model.sample(input_ids, sequence_length=1024)
    elapsed = time.time() - start

generated_sequences = [tokenizer.decode(seq) for seq in generated_sequences]
print(f'generated sequences {generated_sequences} in {elapsed} seconds')
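
The GPU run that produced the good reference output wasn't posted; for completeness, here is a minimal sketch of such a baseline, assuming a CUDA device and plain transformers generation (the sampling settings are a guess, not taken from the report):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the same checkpoint in fp16 on a GPU for a reference run.
tokenizer = AutoTokenizer.from_pretrained('PygmalionAI/pygmalion-6b')
gpu_model = AutoModelForCausalLM.from_pretrained('PygmalionAI/pygmalion-6b', torch_dtype=torch.float16).to('cuda')

prompt = "Jihye's Persona: A 22-year-old woman working part-time at a convenience store in Seoul.\n<START>\nYou: ...\nJihye: Welcome, man.\nYou: hello?\nJihye: "
input_ids = tokenizer.encode(prompt, return_tensors='pt').to('cuda')

with torch.inference_mode():
    # do_sample=True mirrors the Neuron sampling path; exact settings are hypothetical.
    output = gpu_model.generate(input_ids, max_length=1024, do_sample=True)

print(tokenizer.decode(output[0]))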

Environment:
AMI: Deep Learning AMI Neuron PyTorch 1.13 (Ubuntu 20.04) 20230720
VENV: aws_neuron_venv_pytorch

@aws-mvaria

Thanks @ho4040, we are taking a look and will get back to you shortly.

@sheenobu

I was able to get adequate output on an inf2.8xlarge:

['Jihye\'s Persona: A 22-year-old woman working part-time at a convenience store in Seoul.
<START>
You:...
Jihye: Welcome, man.
You: hello?
Jihye: You can use the bathroom now. I\'ll be right here, waiting. 
Jihye: Please do yourself a favor and be fast about it. I\'m not here for your business. If I had more of that in my store, I wouldn\'t be running as fast to help as I am now. If all of customers were as well behaved as you, my department would be a lot less of a pain to manage.
Jihye: Let\'s not get into any more of an argument. You seem impatient to get back to your business. I\'ll wait for you again when you\'re finished. Good luck.
Jihye: If you\'re finished, I mean. (I\'ve been waiting a while...)
<STOP>
Jihye: *I sigh.*

Shit... I wonder how bad of a week it would have to be for a customer like him...

*It wasn\'t exactly surprising that customers like this were'] 

I had to make a few changes to get it running on the smaller machine.

Smaller parameters here (reduced n_positions, and tp_degree=1):

GPTJForSampling.from_pretrained('./pygmalion-6b-split', n_positions=256, batch_size=1, tp_degree=1, amp='f16')

and correspondingly:

neuron_model.sample(input_ids, sequence_length=256)
start = time.time()
neuron_model.sample(input_ids, sequence_length=256)

Then run with FI_EFA_FORK_SAFE=1 set in the environment (see the sketch below).
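
A minimal sketch of one way to set that variable, assuming it only needs to be in the process environment before the Neuron runtime libraries initialize (setting it in the shell, e.g. FI_EFA_FORK_SAFE=1 python infer.py, works equally well):

import os

# FI_EFA_FORK_SAFE must be in the environment before the runtime
# initializes, so set it before importing transformers_neuronx.
os.environ["FI_EFA_FORK_SAFE"] = "1"

import torch
from transformers_neuronx.gptj.model import GPTJForSampling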

Environment:
RockyLinux 9.2, Podman container running python 3.8 and transformers_neuronx-0.5.58

I'm not sure what revision of pygmalion I have; it could be an old one. Here is the sha256sum of model-00001:

# sha256sum pytorch_model-00001-of-00002.bin
88ba2b44537f444e3fad92dff6962ac8c0b983427523484f98e7acf2d71fd65e  pytorch_model-00001-of-00002.bin
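
For anyone who wants to compare against their own copy, a minimal sketch that computes the same checksum in Python (the path assumes the shard sits in the current directory):

import hashlib

# Compute the SHA-256 of a model shard in chunks to avoid
# loading the multi-GB file into memory at once.
def sha256sum(path, chunk_size=1 << 20):
    digest = hashlib.sha256()
    with open(path, 'rb') as f:
        for chunk in iter(lambda: f.read(chunk_size), b''):
            digest.update(chunk)
    return digest.hexdigest()

print(sha256sum('pytorch_model-00001-of-00002.bin'))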
