Determined helps deep learning teams train models more quickly, easily share GPU resources, and effectively collaborate. Determined allows deep learning engineers to focus on building and training models at scale, without needing to worry about DevOps or writing custom code for common tasks like fault tolerance or experiment tracking.
You can think of Determined as a platform that bridges the gap between tools like TensorFlow and PyTorch, which work great for a single researcher with a single GPU, and the challenges that arise when doing deep learning at scale, as teams, clusters, and data sets all increase in size.
Determined provides:
- high-performance distributed training without any additional changes to your model code
- intelligent hyperparameter optimization based on cutting-edge research
- flexible GPU scheduling, including dynamically resizing training jobs on the fly and automatic management of cloud resources on AWS and GCP
- built-in experiment tracking, metrics storage, and visualization
- automatic fault tolerance for DL training jobs
- integrated support for TensorBoard and GPU-powered Jupyter notebooks
To use Determined, you can continue using popular DL frameworks such as TensorFlow and PyTorch; you just need to modify your model code to implement the Determined API.
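As a rough illustration of what "implementing an API" around existing model code can look like, the sketch below uses plain Python with no Determined imports. The `Trial` base class, its method names (`training_batches`, `train_step`), and the `run_trial` loop are hypothetical stand-ins invented for this example. They are not Determined's actual interface, which is covered in the framework tutorials.

```python
from abc import ABC, abstractmethod
from typing import Dict, Iterable, List, Tuple

Batch = Tuple[float, float]  # (input x, target y)


class Trial(ABC):
    """Hypothetical interface a training platform could ask user code to implement."""

    @abstractmethod
    def training_batches(self) -> Iterable[Batch]:
        """Return the batches for one pass over the training data."""

    @abstractmethod
    def train_step(self, batch: Batch) -> Dict[str, float]:
        """Update the model on one batch and return metrics."""


class LinearTrial(Trial):
    """Fits y = w * x with plain SGD; stands in for real model code."""

    def __init__(self, learning_rate: float) -> None:
        self.w = 0.0
        self.learning_rate = learning_rate

    def training_batches(self) -> Iterable[Batch]:
        # Toy data set generated from y = 2x.
        return [(x, 2.0 * x) for x in (1.0, 2.0, 3.0)]

    def train_step(self, batch: Batch) -> Dict[str, float]:
        x, y = batch
        error = self.w * x - y
        # Gradient of the squared error with respect to w is 2 * error * x.
        self.w -= self.learning_rate * 2.0 * error * x
        return {"loss": error ** 2}


def run_trial(trial: Trial, epochs: int) -> List[float]:
    """Hypothetical platform-side loop: it owns iteration and metric collection."""
    losses: List[float] = []
    for _ in range(epochs):
        for batch in trial.training_batches():
            losses.append(trial.train_step(batch)["loss"])
    return losses


trial = LinearTrial(learning_rate=0.05)
losses = run_trial(trial, epochs=20)
```

The point of the pattern is the division of labor: the model code only describes data and a single training step, while the surrounding loop stays outside user code. That makes the loop a natural place for a platform to add concerns such as metric tracking or checkpointing.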
For a brief introduction to using Determined, start with the Quick Start Guide.
To port an existing deep learning model to Determined, follow the tutorial for your preferred deep learning framework.
The documentation for the latest version of Determined can always be found here.
If you need help, want to file a bug report, or just want to keep up to date with the latest news about Determined, please join the Determined community!
- Slack is the best place to ask questions about Determined and get support. Click here to join our Slack.
- You can also join the community mailing list to ask questions about the project and receive announcements.
- To report a bug, file an issue on GitHub.
- To report a security issue, email [email protected].