Conversion of YOLOv8x torchscript to torch_neuron failing #86

Open
Harish-Sundaravel opened this issue Sep 17, 2024 · 1 comment
@Harish-Sundaravel

I’m encountering issues when trying to convert my YOLOv8x model from torchscript to torch_neuron on Kaggle. Here are the details:

  1. YOLOv8x Model (Single Class):
  • Trained model file: '.pt'
  • Conversion to TorchScript: successful
  • Conversion from TorchScript to torch_neuron: completed in 15 minutes, consuming approximately 19 GB of RAM.
  2. YOLOv8x Model (Two Classes):
  • Trained model file: '.pt'
  • Conversion to TorchScript: successful
  • Conversion from TorchScript to torch_neuron: takes about 1.5 hours, RAM usage spikes to 205 GB, and the conversion ultimately fails.

Code Used for Conversion:

  1. Converting the .pt file to TorchScript:

    from ultralytics import YOLO

    model = YOLO('my_yolov8x.pt')
    # Also tried exporting in half precision (torch.float32 -> torch.float16):
    # model.model.half()
    model.export(format='torchscript', imgsz=1024)  # creates my_yolov8x.torchscript

  2. Converting TorchScript to torch_neuron:

    import torch
    import torch_neuron

    model = torch.jit.load("/kaggle/input/my_yolov8x.torchscript")
    model = model.float().eval()

    example_input = torch.rand(1, 3, 1024, 1024)
    neuron_model = torch_neuron.trace(model, example_input)
    neuron_model.save('my_yolov8x.neuron')
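
For completeness, once compilation succeeds I plan to load the saved artifact like this (a sketch; as I understand it, torch_neuron only needs to be imported so the Neuron operators are registered before torch.jit.load):

    import torch
    import torch_neuron  # registers the Neuron runtime ops used by the compiled graph

    neuron_model = torch.jit.load('my_yolov8x.neuron')
    example_input = torch.rand(1, 3, 1024, 1024)
    output = neuron_model(example_input)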

Problem:
When converting the model trained with two classes, the process is extremely slow and consumes excessive memory, resulting in failure.
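
One diagnostic I have not run yet, but which might show where the two-class model goes wrong, is the operator-support check below. This is only a sketch; I'm assuming torch_neuron.analyze_model is available in the torch-neuron release installed on Kaggle:

    import torch
    import torch_neuron

    model = torch.jit.load("/kaggle/input/my_yolov8x.torchscript").float().eval()
    example_input = torch.rand(1, 3, 1024, 1024)

    # Report which operators in the traced graph are supported by the Neuron
    # compiler and which would fall back to CPU.
    torch_neuron.analyze_model(model, example_inputs=[example_input])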

If anyone has insights or solutions to address this issue, your help would be greatly appreciated.

Thank you in advance!

@jyang-aws
Contributor

@Harish-Sundaravel
Thanks for reporting the problem. What instance type did you use in this case, and is there any chance you can try a bigger instance?
For us to take a deeper look and reproduce the issue, could you share a proxy model and the error message you encountered?
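
For example, a short script along these lines would let us reproduce the compilation path (just an illustration; the stock pretrained yolov8x.pt checkpoint, or any weights you are able to share, can stand in for your trained model):

    from ultralytics import YOLO
    import torch
    import torch_neuron

    # Export a stand-in YOLOv8x checkpoint to TorchScript at the same image size.
    YOLO('yolov8x.pt').export(format='torchscript', imgsz=1024)

    # Compile it the same way as described in the report above.
    model = torch.jit.load('yolov8x.torchscript').float().eval()
    example_input = torch.rand(1, 3, 1024, 1024)
    neuron_model = torch_neuron.trace(model, example_input)
    neuron_model.save('yolov8x.neuron')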
Thanks!
