This version provides better support for multi-GPU calculation. In addition, users can adjust chunk_size_x to tune performance for their specific GPU hardware.
Different methods are also supported.
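
To give a rough sense of why chunk_size_x can matter, here is a minimal sketch in plain Python. It assumes that chunk_size_x is the number of elements along the x axis processed per chunk, and that chunks are handed out to GPUs round-robin; neither assumption is confirmed by this project. The function names split_into_chunks and assign_chunks are hypothetical and are not part of this code base.

```python
def split_into_chunks(n_x, chunk_size_x):
    """Return (start, stop) index pairs covering range(n_x).

    Each chunk holds at most chunk_size_x elements; the last chunk may be
    shorter. The meaning of chunk_size_x here is an assumption.
    """
    if chunk_size_x <= 0:
        raise ValueError("chunk_size_x must be positive")
    return [
        (start, min(start + chunk_size_x, n_x))
        for start in range(0, n_x, chunk_size_x)
    ]


def assign_chunks(chunks, n_gpus):
    """Assign chunks to GPU indices round-robin (hypothetical scheduling)."""
    if n_gpus <= 0:
        raise ValueError("n_gpus must be positive")
    assignment = {gpu: [] for gpu in range(n_gpus)}
    for i, chunk in enumerate(chunks):
        assignment[i % n_gpus].append(chunk)
    return assignment


if __name__ == "__main__":
    chunks = split_into_chunks(n_x=10, chunk_size_x=4)
    print(chunks)
    print(assign_chunks(chunks, n_gpus=2))
```

Under these assumptions, a smaller chunk_size_x yields more, smaller chunks, which tends to balance the load better across devices and lower the memory needed per chunk, at the cost of more launches and transfers. A larger value does the opposite. The best setting depends on the memory and throughput of the GPU in use.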