
Target prune FLOPs ratio #4

Open
King4819 opened this issue Apr 1, 2024 · 12 comments

@King4819

King4819 commented Apr 1, 2024

Excellent work! I want to ask whether the method can target different prune-FLOPs ratios. For example, I want to perform different levels of pruning, such as pruning 30%, 60%, and 90% of FLOPs, respectively. Thanks!

@Charleshhy
Collaborator

Charleshhy commented Apr 1, 2024

Hi King4819,

Thanks for your interest in our work! The level of pruning depends on the hyper-parameters target_flops and **loss_lambda**, which control the expected remaining FLOPs and how strictly the pruned model should match your target FLOPs, respectively. These hyper-parameters are set in the config files, e.g., L12-13 here.

Regards,
Haoyu
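
For illustration only, a minimal sketch of what such a config entry might look like. The key names follow this thread; the schema and values are assumptions, not the repo's actual file:

```python
# Hypothetical config snippet (key names taken from this thread; values are placeholders).
config = {
    "target_flops": 2.9,  # expected remaining FLOPs of the pruned model (GFLOPs)
    "loss_lambda": 0.25,  # weight of the FLOPs loss; larger = stricter match to target_flops
}
```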

@King4819
Author

King4819 commented Apr 2, 2024

@Charleshhy thanks for your reply. Is there a proper way to set the hyper-parameter theta when the target FLOPs changes? How do I know which value to set?
Thanks!!

@Charleshhy
Collaborator

Charleshhy commented Apr 2, 2024

@Charleshhy thanks for your reply. Is there a proper way to set the hyper-parameter theta when the target FLOPs changes? How do I know which value to set? Thanks!!

In practice, I set these two values so that the FLOPs loss is slightly higher than the cross-entropy loss at the beginning of training, and I find this works well. However, I haven't experimented much with the trade-off between these two losses, and different settings may lead to better performance :)
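
To make that heuristic concrete, here is a hedged sketch of a combined search loss, assuming a squared relative-error FLOPs penalty (the repo's exact formula may differ):

```python
import torch.nn.functional as F

# Sketch of a search loss where loss_lambda weights a FLOPs penalty against
# cross-entropy. The penalty form below is an assumption for illustration.
def search_loss(logits, targets, current_flops, target_flops, loss_lambda):
    ce = F.cross_entropy(logits, targets)
    flops_penalty = ((current_flops - target_flops) / target_flops) ** 2
    return ce + loss_lambda * flops_penalty
```

Printing both terms for the first few iterations and choosing loss_lambda so that the weighted penalty sits just above ce reproduces the heuristic described above.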

@King4819
Author

King4819 commented Apr 2, 2024

@Charleshhy Thanks for your reply. Let me ask more explicitly. For example, I want to prune the DeiT-S model at three different levels: prune-FLOPs ratios of 30%, 60%, and 90%. How do I adjust the hyper-parameter theta for these three target FLOPs? Or is this an unexplored direction?

@Charleshhy
Collaborator

@Charleshhy Thanks for your reply. Let me ask more explicitly. For example, I want to prune the DeiT-S model at three different levels: prune-FLOPs ratios of 30%, 60%, and 90%. How do I adjust the hyper-parameter theta for these three target FLOPs? Or is this an unexplored direction?

Different target FLOPs lead to different FLOPs-loss magnitudes, so in practice we need to adjust loss_lambda so that the weighted FLOPs loss is slightly higher than the cross-entropy loss. Note that pruning 90% of FLOPs is very aggressive, and I have not tried it :)
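
As a back-of-the-envelope example of that adjustment, all numbers below are assumptions: DeiT-S at roughly 4.6 GFLOPs, initial cross-entropy near ln(1000) ≈ 6.9 on ImageNet-1k, and the same squared penalty as in the sketch above:

```python
# Hypothetical rescaling of loss_lambda per target; not the authors' procedure.
dense_gflops, ce0 = 4.6, 6.9
for ratio in (0.3, 0.6, 0.9):                           # prune 30% / 60% / 90% of FLOPs
    target = dense_gflops * (1 - ratio)
    penalty0 = ((dense_gflops - target) / target) ** 2  # penalty before any pruning
    lam = 1.1 * ce0 / penalty0                          # FLOPs term ~10% above CE at init
    print(f"prune {ratio:.0%}: target={target:.2f} GFLOPs, loss_lambda≈{lam:.3f}")
```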

@King4819
Author

King4819 commented Apr 3, 2024

@Charleshhy Thanks for your reply!

@bo102

bo102 commented Apr 7, 2024

Hello, I would like to ask about "theta": 1.5 in the Swin Transformer pruning config. What does it mean? What do theta and 1.5 stand for? In addition, the recognition accuracy is relatively low during the search process: top-1 accuracy is only about 10% after about 40 search rounds. Is this normal? (I set theta to 0.5 in this run, and target_flops is 2.9.)

@Charleshhy
Collaborator

Hi King4819,

Thanks for your interest in our work! The level of pruning depends on the hyper-parameters target_flops and **loss_lambda**, which control the expected remaining FLOPs and how strictly the pruned model should match your target FLOPs, respectively. These hyper-parameters are set in the config files, e.g., L12-13 here.

Regards, Haoyu

@King4819 I made a mistake in explaining the hyper-parameters and have corrected it just now. loss_lambda controls the importance of the FLOPs loss, INSTEAD OF theta. theta here is a different hyper-parameter that initializes the learnable gates that select the best options. A higher theta makes your model less likely to select convolutional operations, and we empirically find 1.5 is a globally acceptable value for all settings.

Sorry for the confusion.
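
As a rough illustration of that role, here is a hedged sketch of gate initialization; the two-option gate parameterization is assumed for illustration, not the repo's actual code:

```python
import torch

# Hypothetical gates choosing between keeping self-attention and switching to
# a convolutional operation. Raising theta raises the initial logit of the
# attention option, so convolutions are less likely to be selected early on.
def init_gates(num_gates, theta=1.5):
    logits = torch.zeros(num_gates, 2)  # column 0: attention, column 1: convolution
    logits[:, 0] = theta                # bias the gates toward attention by theta
    return torch.nn.Parameter(logits)

gates = init_gates(num_gates=12)
probs = torch.softmax(gates, dim=-1)    # initial selection probabilities per gate
```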

@Charleshhy
Collaborator

Hello, I would like to ask about "theta": 1.5 in the Swin Transformer pruning config. What does it mean? What do theta and 1.5 stand for? In addition, the recognition accuracy is relatively low during the search process: top-1 accuracy is only about 10% after about 40 search rounds. Is this normal? (I set theta to 0.5 in this run, and target_flops is 2.9.)

Hi Bo102, theta is introduced here. In my experiments, I expect the accuracy during the search not to drop too much (say, only slightly lower than the dense model), so 10% is not normal. Try reducing the hyper-parameter loss_lambda to make your architecture evolve more slowly.

@bo102

bo102 commented Apr 9, 2024

Hello, I would like to ask about "theta": 1.5 in the Swin Transformer pruning config. What does it mean? What do theta and 1.5 stand for? In addition, the recognition accuracy is relatively low during the search process: top-1 accuracy is only about 10% after about 40 search rounds. Is this normal? (I set theta to 0.5 in this run, and target_flops is 2.9.)

Hi Bo102, theta is introduced here. In my experiments, I expect the accuracy during the search not to drop too much (say, only slightly lower than the dense model), so 10% is not normal. Try reducing the hyper-parameter loss_lambda to make your architecture evolve more slowly.

Hello, excuse me. I used your configuration to search for the Swin Transformer pruning architecture; I only changed the dataset to imagenet-tiny-200, and the accuracy is still very low during the search process. Is this because I need to adjust the parameters in the configuration file for my actual dataset? What could be the reason? Also, when I use the searched files to guide pruning, I can't improve the accuracy of the resulting model during training; it keeps hovering around 50%.

@Charleshhy
Collaborator

Hello, I would like to ask about "theta": 1.5 in the Swin Transformer pruning config. What does it mean? What do theta and 1.5 stand for? In addition, the recognition accuracy is relatively low during the search process: top-1 accuracy is only about 10% after about 40 search rounds. Is this normal? (I set theta to 0.5 in this run, and target_flops is 2.9.)

Hi Bo102, theta is introduced here. In my experiments, I expect the accuracy during the search not to drop too much (say, only slightly lower than the dense model), so 10% is not normal. Try reducing the hyper-parameter loss_lambda to make your architecture evolve more slowly.

Hello, excuse me. I used your configuration to search for the Swin Transformer pruning architecture; I only changed the dataset to imagenet-tiny-200, and the accuracy is still very low during the search process. Is this because I need to adjust the parameters in the configuration file for my actual dataset? What could be the reason? Also, when I use the searched files to guide pruning, I can't improve the accuracy of the resulting model during training; it keeps hovering around 50%.

During the search, low accuracy means the model has converged to a trivial solution, which won't perform well. My suggestion is to set a lower loss_lambda and/or a lower learning rate during the search.
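
For concreteness, this is the kind of adjustment being suggested; the key names and values here are illustrative assumptions, not the repo's actual schema:

```python
# Hypothetical search-phase tweak: gentler FLOPs pressure and slower
# architecture evolution, to avoid collapsing to a trivial solution.
search_config = {
    "loss_lambda": 0.1,  # lower than before, so the FLOPs loss dominates less
    "lr": 1e-4,          # smaller search-phase learning rate
}
```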

@bo102

bo102 commented Apr 10, 2024

Hello, I would like to ask about "theta": 1.5 in the Swin Transformer pruning config. What does it mean? What do theta and 1.5 stand for? In addition, the recognition accuracy is relatively low during the search process: top-1 accuracy is only about 10% after about 40 search rounds. Is this normal? (I set theta to 0.5 in this run, and target_flops is 2.9.)

Hi Bo102, theta is introduced here. In my experiments, I expect the accuracy during the search not to drop too much (say, only slightly lower than the dense model), so 10% is not normal. Try reducing the hyper-parameter loss_lambda to make your architecture evolve more slowly.

Hello, excuse me. I used your configuration to search for the Swin Transformer pruning architecture; I only changed the dataset to imagenet-tiny-200, and the accuracy is still very low during the search process. Is this because I need to adjust the parameters in the configuration file for my actual dataset? What could be the reason? Also, when I use the searched files to guide pruning, I can't improve the accuracy of the resulting model during training; it keeps hovering around 50%.

During the search, low accuracy means the model has converged to a trivial solution, which won't perform well. My suggestion is to set a lower loss_lambda and/or a lower learning rate during the search.

Thank you so much; I wish you all the best. I'll give it a try.
