[Core feature] Improve flytekitplugins-kfpytorch user experience with default pod template and other reasonable defaults #5339
Labels: backlogged, enhancement
Motivation: Why do you think this is important?
Currently, to use the plugin with PyTorch DistributedDataParallel across multiple nodes, you need to manually specify a custom pod template like so:
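A minimal sketch of the kind of boilerplate involved, written as a plain Kubernetes pod spec dictionary so it runs without the plugin installed. The container name `primary`, the volume name `dshm`, and the timeout values are illustrative assumptions, not values from the plugin. In flytekit, a spec like this would typically be wrapped in a `PodTemplate` and passed to the task, with the timeouts going into the `Elastic` rendezvous settings.

```python
import json

# Assumed placeholder for the task's main container name.
PRIMARY_CONTAINER = "primary"


def shared_memory_pod_spec(container_name: str = PRIMARY_CONTAINER) -> dict:
    """Build a pod spec fragment that mounts a memory-backed emptyDir at /dev/shm."""
    return {
        "containers": [
            {
                "name": container_name,
                "volumeMounts": [{"name": "dshm", "mountPath": "/dev/shm"}],
            }
        ],
        # An emptyDir with medium "Memory" is backed by tmpfs (RAM).
        "volumes": [{"name": "dshm", "emptyDir": {"medium": "Memory"}}],
    }


# Rendezvous timeouts in seconds; the values here are illustrative only.
RDZV_CONFIGS = {"timeout": 900, "join_timeout": 900}

if __name__ == "__main__":
    print(json.dumps(shared_memory_pod_spec(), indent=2))
```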
Having to know that a shared memory volume must be added, and that a timeout must be set for nodes to connect with each other at task startup, adds a lot of burden to using this plugin.
Goal: What should the final outcome look like, ideally?
Ideally, the `Elastic` task config would expose some options with reasonable defaults that help the user understand the following:
An example might be:
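One possible shape for this, sketched with a stand-in dataclass rather than the plugin's real class. `ElasticSketch`, the `increase_shared_mem` flag, the `default_pod_spec` helper, and the default timeout values are all hypothetical names and numbers chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


def _default_rdzv_configs() -> Dict[str, Any]:
    # Illustrative timeouts in seconds; not values taken from the plugin.
    return {"timeout": 900, "join_timeout": 900}


@dataclass
class ElasticSketch:
    """Hypothetical stand-in for the plugin's `Elastic` config (not the real class)."""

    nnodes: int = 1
    nproc_per_node: int = 1
    # Hypothetical flag: mount a memory-backed volume at /dev/shm by default.
    increase_shared_mem: bool = True
    rdzv_configs: Dict[str, Any] = field(default_factory=_default_rdzv_configs)

    def default_pod_spec(self, container_name: str = "primary") -> Optional[dict]:
        """Return the default pod spec fragment, or None if the user opted out."""
        if not self.increase_shared_mem:
            return None
        return {
            "containers": [
                {
                    "name": container_name,
                    "volumeMounts": [{"name": "dshm", "mountPath": "/dev/shm"}],
                }
            ],
            "volumes": [{"name": "dshm", "emptyDir": {"medium": "Memory"}}],
        }


# The defaults cover the common multi-node case; every option stays overridable.
config = ElasticSketch(nnodes=2, nproc_per_node=4)
custom = ElasticSketch(nnodes=2, increase_shared_mem=False, rdzv_configs={"timeout": 3600})
```

Exposing the options as named fields keeps the shared memory and timeout requirements visible in the config's signature, while users who need different behaviour can still override them.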
Where the `Elastic` class would be initialized with some default pod template:
Describe alternatives you've considered
Another way to solve this problem is with documentation, but that leaves the user to discover the docs and add boilerplate to their code.
Propose: Link/Inline OR Additional context
No response