Kubeflow Training Operator

Overview

Through the v1.2 release, tensorflow-operator supported only TFJob on Kubernetes. Starting from v1.3, Training Operator provides Kubernetes custom resources that make it easy to run distributed or non-distributed TensorFlow/PyTorch/MXNet/XGBoost jobs on Kubernetes.

For a complete reference of the custom resource definitions, please refer to the API Definition.
For details on API design, please refer to the v1alpha2 design doc.
For details on the all-in-one operator design, please refer to the All-in-one Kubeflow Training Operator.
For details on its observability, please refer to the monitoring design doc.
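
To make the idea of a training custom resource concrete, the following is a minimal sketch of a TFJob manifest, wrapped in a shell function so it can be piped to kubectl. The function name, the default job name, and the image are hypothetical placeholders, and the two-worker layout is only an assumption for illustration; the API Definition remains the authoritative reference for the available fields.

```shell
# Hypothetical helper: prints a minimal TFJob manifest to stdout.
# $1 = job name (placeholder default), $2 = training image (placeholder default).
# The body runs in a subshell so the variables do not leak into the caller.
print_example_tfjob() (
  job_name="${1:-example-tfjob}"
  image="${2:-example.com/my-training-image:latest}"
  cat <<EOF
apiVersion: kubeflow.org/v1
kind: TFJob
metadata:
  name: ${job_name}
spec:
  tfReplicaSpecs:
    Worker:
      replicas: 2
      restartPolicy: OnFailure
      template:
        spec:
          containers:
            - name: tensorflow
              image: ${image}
EOF
)

# Possible usage once the operator is installed (not executed here):
#   print_example_tfjob my-job my-registry/my-image:tag | kubectl apply -f -
```

Printing the manifest instead of applying it directly keeps the sketch safe to run without a cluster and makes the generated YAML easy to review first.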

Prerequisites

Version >= 1.16 of Kubernetes
Version >= 3.x of Kustomize
Version >= 1.21.x of Kubectl
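
The minimum versions above can be checked with a small comparison helper. This is a sketch, not part of the project: the function names are hypothetical, and it assumes a `sort` that supports version sorting (`-V`), as GNU coreutils does.

```shell
# Hypothetical helper: succeeds when version $1 is greater than or equal to $2.
# Assumes `sort -V` (version sort) is available on this machine.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Hypothetical helper: strips a leading "v" and any "-suffix" or "+suffix",
# so "v1.21.3-gke.100" becomes "1.21.3". Runs in a subshell to keep `v` local.
normalize_version() (
  v="${1#v}"
  printf '%s\n' "${v%%[-+]*}"
)

# Example check against the Kubernetes minimum listed above:
#   version_ge "$(normalize_version v1.21.3)" 1.16 && echo "Kubernetes version OK"
```

The version strings themselves would come from the tools, for example from `kubectl version --client` or `kustomize version`. Parsing that output is left out here because its format differs between tool versions.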

Installation

Master Branch

kubectl apply -k "github.com/kubeflow/tf-operator.git/manifests/overlays/standalone?ref=master"

Specific Release

kubectl apply -k "github.com/kubeflow/tf-operator.git/manifests/overlays/standalone?ref=v1.3.0"

TensorFlow Release Only

Users who prefer the original TensorFlow controllers can check out the v1.2-branch; bug fixes will continue to be maintained on this branch.

kubectl apply -k "github.com/kubeflow/tf-operator.git/manifests/overlays/standalone?ref=v1.2.0"
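
The three install commands above differ only in the Git ref. A minimal sketch of a wrapper that makes the ref a parameter is shown below; the function names are hypothetical, and the default of master is an assumption chosen to match the first command.

```shell
# Hypothetical helper: prints the kustomize target for a Git ref
# (branch or tag). Defaults to master when no ref is given.
standalone_target() {
  printf 'github.com/kubeflow/tf-operator.git/manifests/overlays/standalone?ref=%s\n' "${1:-master}"
}

# Hypothetical helper: prints the full install command without running it,
# so it can be reviewed before it touches a cluster.
install_command() {
  printf 'kubectl apply -k "%s"\n' "$(standalone_target "$1")"
}

# Possible usage (not executed here):
#   install_command v1.3.0
#   kubectl apply -k "$(standalone_target v1.3.0)"
```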

Quick Start

Please refer to quick-start-v1.md and the Kubeflow Training User Guide for more information.

API Documentation

Please refer to API Documentation.

Community

You can:

Join our Slack channel.
Check out who is using this operator.

This project is part of Kubeflow; please see the README in kubeflow/kubeflow to get in touch with the community.

Contributing

Please refer to DEVELOPMENT.

Change Log

Please refer to CHANGELOG.

Version Matrix

The following table lists the most recent versions of the operator.

Operator Version	API Version	Kubernetes Version
`v1.0.x`	`v1`	1.16+
`v1.1.x`	`v1`	1.16+
`v1.2.x`	`v1`	1.16+
`v1.3.x`	`v1`	1.18+
`latest` (master HEAD)	`v1`	1.18+
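
The table can also be encoded as a small lookup, which is convenient in install scripts. This is a sketch that only mirrors the rows above; the function name is hypothetical, and versions not listed in the table are reported as unknown instead of being guessed.

```shell
# Hypothetical helper: prints the minimum Kubernetes version that the table
# lists for a given operator version, or fails for versions not in the table.
min_kubernetes_for() {
  case "$1" in
    v1.0.*|v1.1.*|v1.2.*) echo "1.16" ;;
    v1.3.*|latest)        echo "1.18" ;;
    *) echo "unknown operator version: $1" >&2; return 1 ;;
  esac
}

# Possible usage:
#   min_kubernetes_for v1.3.0
```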
