About training at the 31st epoch #4

Open
BLUE-hub opened this issue Aug 23, 2023 · 3 comments

Comments

@BLUE-hub

Hello, and thank you for your work. Have you encountered the following problem: training on the NPM3D dataset gets stuck at the 31st epoch and does not continue? I have tried reducing batch_size and setting num_workers to 0, but the problem persists. Do you have a solution for this?

@bxiang233
Collaborator

Hi, thanks a lot for your interest in our work! The 31st epoch is the first one that includes the ScoreNet. I also ran into this problem when running on a GPU with limited memory. Using a smaller batch_size or a smaller radius for the sampling cylinders could solve it. If not, could you please provide more information about the problem? For example, which GPU did you use, and which command and dataset did you use for training? Thanks.

Best,
Binbin
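
To illustrate why both suggestions reduce the load per training step, here is a rough sketch. It assumes points are spread uniformly over the ground, so a vertical sampling cylinder holds a number of points proportional to its footprint area. The density, radius, and batch values are hypothetical and are not taken from the repository's configuration.

```python
import math


def points_per_cylinder(density_per_m2: float, radius_m: float) -> float:
    """Expected point count in one vertical cylinder.

    Assumes a uniform ground density (an idealisation; real scans vary).
    """
    return density_per_m2 * math.pi * radius_m ** 2


def points_per_batch(density_per_m2: float, radius_m: float, batch_size: int) -> float:
    """Expected point count across all cylinders in one batch."""
    return batch_size * points_per_cylinder(density_per_m2, radius_m)


if __name__ == "__main__":
    density = 500.0  # hypothetical points per square metre
    for radius, batch in [(16.0, 8), (8.0, 8), (16.0, 4)]:
        total = points_per_batch(density, radius, batch)
        print(f"radius={radius} m, batch_size={batch}: ~{total:,.0f} points")
```

Under this assumption, halving the radius cuts the points per batch by about four, while halving batch_size cuts it by two, so the radius is the stronger lever when memory is tight.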

@BLUE-hub
Author

Thanks for your reply. I tried again and it produces the following error. My training data is NPM3D and the GPU is an RTX 3090. Is there a solution? Thanks.
  File "D:/PanopticSegForLargeScalePointCloud-main/train.py", line 17, in main
    trainer.train()
  File "D:\PanopticSegForLargeScalePointCloud-main\torch_points3d\trainer.py", line 152, in train
    self._train_epoch(epoch)
  File "D:\PanopticSegForLargeScalePointCloud-main\torch_points3d\trainer.py", line 207, in _train_epoch
    self._model.optimize_parameters2(epoch, i, self._dataset.batch_size)
  File "D:\PanopticSegForLargeScalePointCloud-main\torch_points3d\models\base_model.py", line 274, in optimize_parameters2
    self._grad_scale.step(self._optimizer) # update parameters
AttributeError: 'NoneType' object has no attribute 'step'

@bxiang233
Collaborator

Hi, maybe the following comment offers a solution:
torch-points3d/torch-points3d#676 (comment)
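
For readers following the traceback: the error means `self._grad_scale` was `None` when `optimize_parameters2` called `.step` on it. The sketch below only illustrates a defensive guard for that situation, using stand-in classes instead of PyTorch so that it runs anywhere. It is not the fix described in the linked comment, and `safe_optimizer_step`, `DummyOptimizer`, and `DummyScaler` are hypothetical names that do not exist in the repository.

```python
class DummyOptimizer:
    """Stand-in for an optimizer; counts how often step() is called."""

    def __init__(self):
        self.steps = 0

    def step(self):
        self.steps += 1


class DummyScaler:
    """Stand-in exposing the two gradient-scaler calls used here:
    step(optimizer) and update()."""

    def __init__(self):
        self.updates = 0

    def step(self, optimizer):
        optimizer.step()

    def update(self):
        self.updates += 1


def safe_optimizer_step(optimizer, grad_scale=None):
    """Step through the scaler when one exists, otherwise step directly."""
    if grad_scale is None:
        # No scaler was created: fall back to a plain optimizer step
        # instead of raising AttributeError on None.
        optimizer.step()
        return "plain"
    grad_scale.step(optimizer)
    grad_scale.update()
    return "scaled"


if __name__ == "__main__":
    opt = DummyOptimizer()
    print(safe_optimizer_step(opt))                 # scaler missing
    print(safe_optimizer_step(opt, DummyScaler()))  # scaler present
```

In a real mixed-precision setup the loss is also scaled before the backward pass, so a guard like this would only be safe if that step is skipped as well when no scaler exists. Making sure the scaler is actually created before the 31st epoch is likely the cleaner route.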
