
Classification head not compatible with GPU in zero-shot classification task #33

Open
sandhyat opened this issue Jun 25, 2024 · 0 comments

Hello,

Thank you for providing the code for your work. When I use a pre-trained MOMENT model for zero-shot or fine-tuned classification, my code errors out with a traceback indicating that tensors are on two different devices. I confirmed that the inputs are on CUDA, and I tracked the error down to the exact lines (66 and 68 in https://github.com/moment-timeseries-foundation-model/moment/blob/main/momentfm/models/moment.py#L54 ) where this happens. If I move this linear layer to the 'cuda' device explicitly, the code works fine; a sketch of that workaround is included after the traceback below.
Below is the code snippet I have been using.

import numpy as np
import torch
from tqdm import tqdm

from momentfm import MOMENTPipeline

# Load the model in classification mode
model = MOMENTPipeline.from_pretrained(
    "AutonLab/MOMENT-1-large",
    model_kwargs={
        'task_name': 'classification',
        'n_channels': 69,
        'num_class': 2
    },
).to("cuda").float()
model.init()

def get_logits(model, dataloader):
    logits_list = []
    with torch.no_grad():
        for batch_x, batch_masks, _ in tqdm(dataloader, total=len(dataloader)):
            batch_x = batch_x.to("cuda").float()
            batch_masks = batch_masks.to("cuda")

            output = model(batch_x, input_mask=batch_masks)  # [batch_size x d_model (=1024)]
            logit = output.logits
            logits_list.append(logit.detach().cpu().numpy())
    logits_list = np.concatenate(logits_list)
    return logits_list

# dataloader_flow_test yields (batch_x, batch_masks, label) batches
output_flow_logit_test = get_logits(model, dataloader_flow_test)
Here is the pdb trace from running the call above:

Loading data...
> /pre_wkdir/train_modular_Moments.py(569)get_logits()
-> for batch_x, batch_masks, _ in tqdm(dataloader, total=len(dataloader)):
(Pdb) n
  0%|          | 0/19 [00:00<?, ?it/s]
> /pre_wkdir/train_modular_Moments.py(570)get_logits()
-> batch_x = batch_x.to("cuda").float()
(Pdb) n
> /pre_wkdir/train_modular_Moments.py(571)get_logits()
-> batch_masks = batch_masks.to("cuda")
(Pdb) n
> /pre_wkdir/train_modular_Moments.py(573)get_logits()
-> output = model(batch_x, input_mask=batch_masks)  # [batch_size x d_model (=1024)]
(Pdb) batch_x.is_cuda
True
(Pdb) batch_masks.is_cuda
True
(Pdb) n
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cpu and cuda:0! (when checking argument for argument mat1 in method wrapper_CUDA_addmm)
> /pre_wkdir/train_modular_Moments.py(573)get_logits()
-> output = model(batch_x, input_mask=batch_masks)  # [batch_size x d_model (=1024)]
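
For reference, here is a minimal sketch of the workaround I mentioned above. It assumes that the linear layer created at those lines of moment.py belongs to the classification head that model.init() builds, so the head ends up on the CPU because it is created after .to("cuda"); under that assumption, moving the model to the GPU again after model.init() should place the new head on 'cuda' as well.

# Workaround sketch (my assumption: the classification head that stays on the
# CPU is created inside model.init()). Moving the model to the GPU *after*
# model.init() should also move the freshly created linear layer.
model = MOMENTPipeline.from_pretrained(
    "AutonLab/MOMENT-1-large",
    model_kwargs={
        'task_name': 'classification',
        'n_channels': 69,
        'num_class': 2
    },
)
model.init()                       # classification head is created here (on CPU)
model = model.to("cuda").float()   # now move everything, including the new head

Equivalently, explicitly moving just the offending linear layer to 'cuda' after model.init(), which is what I did, also gets past the error.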

I would appreciate it if you could comment on this based on your experience developing the code.

Thanks,
Sandhya
