Is this a new bug?
I have searched the existing issues, and I could not find an existing issue for this bug.
Current Behavior
Running the code in examples/learn/generation/llm-field-guide/mpt/mpt-30b-chatbot.ipynb:
res = generate_text("Explain to me the difference between nuclear fission and fusion.")
print(res[0]["generated_text"])
it only returns the prompt itself:
Explain to me the difference between nuclear fission and fusion.
without the model's answer.
Expected Behavior
The model's answer should be returned along with the prompt.
Steps To Reproduce
import torch
import transformers
from transformers import StoppingCriteria, StoppingCriteriaList
from torch import cuda, bfloat16
device = f'cuda:0' if cuda.is_available() else 'cpu'
model = transformers.AutoModelForCausalLM.from_pretrained(
    'mosaicml/mpt-30b-chat',
    trust_remote_code=True,
    load_in_8bit=True,  # this requires the bitsandbytes library
    max_seq_len=8192,
    init_device=device,
    device_map="auto"
)
model.eval()
#model.to(device)
print(f"Model loaded on {device}")
tokenizer = transformers.AutoTokenizer.from_pretrained("mosaicml/mpt-30b-chat")

# token sequences that mark the end of the model's turn ("Human:" / "AI:")
stop_token_ids = [
    tokenizer.convert_tokens_to_ids(x) for x in [
        ['Human', ':'], ['AI', ':']
    ]
]

# define custom stopping criteria object
class StopOnTokens(StoppingCriteria):
    def __call__(self, input_ids: torch.LongTensor, scores: torch.FloatTensor, **kwargs) -> bool:
        for stop_ids in stop_token_ids:
            if torch.eq(input_ids[0][-len(stop_ids):], stop_ids).all():
                return True
        return False

stopping_criteria = StoppingCriteriaList([StopOnTokens()])
stop_token_ids = [torch.LongTensor(x).to(device) for x in stop_token_ids]
generate_text = transformers.pipeline(
    model=model,
    tokenizer=tokenizer,
    return_full_text=True,  # langchain expects the full text
    task='text-generation',
    # we pass model parameters here too
    stopping_criteria=stopping_criteria,  # without this the model rambles during chat
    temperature=0.1,  # 'randomness' of outputs, 0.0 is the min and 1.0 the max
    top_p=0.15,  # select from top tokens whose probability adds up to 15%
    top_k=0,  # select from top 0 tokens (because zero, relies on top_p)
    max_new_tokens=128,  # max number of tokens to generate in the output
    repetition_penalty=1.1  # without this the output begins repeating
)
res = generate_text("Explain to me the difference between nuclear fission and fusion.")
print(res[0]["generated_text"])
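As a quick check (not part of the original notebook), calling model.generate directly, bypassing the pipeline and the stopping criteria, shows whether the model emits any new tokens at all; the prompt string and max_new_tokens value below are arbitrary:

# Diagnostic sketch: bypass the pipeline and decode only the newly generated tokens.
inputs = tokenizer(
    "Explain to me the difference between nuclear fission and fusion.",
    return_tensors="pt"
).to(device)
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=64)
# slice off the prompt so only new tokens are decoded
new_tokens = out[0][inputs["input_ids"].shape[1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))

If this prints an empty string, the model is producing no new tokens even without the pipeline's stopping criteria.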
Relevant log output
[2023-08-30 17:16:12,664] [INFO] [real_accelerator.py:133:get_accelerator] Setting ds_accelerator to cuda (auto detect)
2023-08-30 17:16:13.203919: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
WARNING:tensorflow:Deprecation warnings have been disabled. Set TF_ENABLE_DEPRECATION_WARNINGS=1 to re-enable them.
Instantiating an MPTForCausalLM model from /root/.cache/huggingface/modules/transformers_modules/mosaicml/mpt-30b-chat/54f33278a04aa4e612bca482b82f801ab658e890/modeling_mpt.py
You are using config.init_device='cuda:0', but you can also use config.init_device="meta" with Composer + FSDP for fast initialization.
The model weights are not tied. Please use the `tie_weights` method before using the `infer_auto_device` function.
Loading checkpoint shards: 100%|█████████████████████████████████████| 7/7 [01:07<00:00, 9.62s/it]
Model loaded on cuda:0
The model 'MPTForCausalLM' is not supported for text-generation. Supported models are ['BartForCausalLM', 'BertLMHeadModel', 'BertGenerationDecoder', 'BigBirdForCausalLM', 'BigBirdPegasusForCausalLM', 'BioGptForCausalLM', 'BlenderbotForCausalLM', 'BlenderbotSmallForCausalLM', 'BloomForCausalLM', 'CamembertForCausalLM', 'CodeGenForCausalLM', 'CpmAntForCausalLM', 'CTRLLMHeadModel', 'Data2VecTextForCausalLM', 'ElectraForCausalLM', 'ErnieForCausalLM', 'GitForCausalLM', 'GPT2LMHeadModel', 'GPT2LMHeadModel', 'GPTBigCodeForCausalLM', 'GPTNeoForCausalLM', 'GPTNeoXForCausalLM', 'GPTNeoXJapaneseForCausalLM', 'GPTJForCausalLM', 'LlamaForCausalLM', 'MarianForCausalLM', 'MBartForCausalLM', 'MegaForCausalLM', 'MegatronBertForCausalLM', 'MvpForCausalLM', 'OpenLlamaForCausalLM', 'OpenAIGPTLMHeadModel', 'OPTForCausalLM', 'PegasusForCausalLM', 'PLBartForCausalLM', 'ProphetNetForCausalLM', 'QDQBertLMHeadModel', 'ReformerModelWithLMHead', 'RemBertForCausalLM', 'RobertaForCausalLM', 'RobertaPreLayerNormForCausalLM', 'RoCBertForCausalLM', 'RoFormerForCausalLM', 'RwkvForCausalLM', 'Speech2Text2ForCausalLM', 'TransfoXLLMHeadModel', 'TrOCRForCausalLM', 'XGLMForCausalLM', 'XLMWithLMHeadModel', 'XLMProphetNetForCausalLM', 'XLMRobertaForCausalLM', 'XLMRobertaXLForCausalLM', 'XLNetLMHeadModel', 'XmodForCausalLM'].
/opt/conda/lib/python3.8/site-packages/transformers/generation/utils.py:1259: UserWarning: You have modified the pretrained model configuration to control generation. This is a deprecated strategy to control generation and will be removed soon, in a future version. Please use a generation configuration file (see https://huggingface.co/docs/transformers/main_classes/text_generation)
warnings.warn(
Explain to me the difference between nuclear fission and fusion.
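The UserWarning in the log suggests supplying generation settings through a generation config instead of modifying the model configuration. A minimal sketch of that approach (not from the notebook; it reuses the sampling values from the pipeline call above and repeats the prompt tokenization so it is self-contained):

from transformers import GenerationConfig

gen_config = GenerationConfig(
    do_sample=True,          # assumed: sampling must be enabled for temperature/top_p to take effect
    temperature=0.1,
    top_p=0.15,
    top_k=0,
    max_new_tokens=128,
    repetition_penalty=1.1,
)
inputs = tokenizer(
    "Explain to me the difference between nuclear fission and fusion.",
    return_tensors="pt"
).to(device)
out = model.generate(**inputs, generation_config=gen_config, stopping_criteria=stopping_criteria)
print(tokenizer.decode(out[0], skip_special_tokens=True))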
Environment
- **OS**: Ubuntu 20.04
- **Language version**: Python 3.8.16
- **Pinecone client version**: not used
Additional Context
No response