Concepts

Sarah Maddox edited this page Nov 6, 2018 · 5 revisions

Below are explanations of the most important concepts used within the Kubeflow pipelines service.

Pipeline

A description of a machine learning (ML) workflow: its components, how they connect to form a graph, and the list of its parameters. The pipeline is the main shareable artifact in the Kubeflow pipelines service. You create and edit a pipeline outside the pipelines UI, but you can upload and list pipelines in the UI.
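
As a rough illustration of the graph-plus-parameters idea, the sketch below models a pipeline in plain Python. It is not the Kubeflow pipelines SDK: the `Pipeline` class, its `execution_order` method, and the component and parameter names are all hypothetical, chosen only to show how a dependency graph determines the order in which components can run.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter  # standard library, Python 3.9+


@dataclass
class Pipeline:
    """Hypothetical model: a graph of components plus a parameter list."""

    name: str
    # Maps each component name to the set of components it depends on.
    graph: dict = field(default_factory=dict)
    # Parameters the pipeline exposes; None marks a value still to be supplied.
    parameters: dict = field(default_factory=dict)

    def execution_order(self) -> list:
        """Return one valid order in which the components could run."""
        return list(TopologicalSorter(self.graph).static_order())


# Assumed example: three components chained one after another.
pipeline = Pipeline(
    name="example",
    graph={
        "preprocess": set(),
        "train": {"preprocess"},
        "evaluate": {"train"},
    },
    parameters={"learning_rate": 0.01, "epochs": None},
)

print(pipeline.execution_order())  # ['preprocess', 'train', 'evaluate']
```

Because the example graph is a simple chain, only one order is possible. A graph with independent branches would allow several valid orders.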

Pipeline component

A building block in the pipeline template: self-contained user code that performs one step in the pipeline, such as preprocessing, transformation, or training. A component must be packaged as a Docker image.

Job

A copy of the pipeline with all fields (parameters) specified, plus an optional pipeline trigger. The pipelines system generates a job when you deploy the pipeline. A job with a recurring trigger runs periodically. You can enable or disable the trigger from the UI.

Run

A single execution of a pipeline job. A job with a recurring trigger starts multiple runs. Together, runs form an immutable log of all the experiments that you attempt. Runs are designed to be self-contained to allow for reproducibility.
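
The relationship between a job and its runs can be sketched in plain Python. This is only an illustration under stated assumptions, not how the pipelines service stores its data: the `Job` and `Run` classes and the `start_run` helper are hypothetical. The run copies the job's parameters when it starts, so the record stays self-contained even if the job changes later.

```python
from dataclasses import dataclass, FrozenInstanceError
from typing import Optional


@dataclass
class Job:
    """Hypothetical: a pipeline with every parameter filled in."""

    pipeline_name: str
    parameters: dict
    trigger: Optional[str] = None  # None means no trigger is attached
    trigger_enabled: bool = True   # can be switched on and off


@dataclass(frozen=True)
class Run:
    """Hypothetical: one execution of a job; frozen so it cannot be edited."""

    run_id: int
    pipeline_name: str
    parameters: tuple  # snapshot of the job's parameters at start time


def start_run(job: Job, run_id: int) -> Run:
    # Copy the parameters so later edits to the job do not alter the record.
    snapshot = tuple(sorted(job.parameters.items()))
    return Run(run_id, job.pipeline_name, snapshot)


job = Job("example", {"learning_rate": 0.01, "epochs": 5}, trigger="periodic")
runs = [start_run(job, run_id) for run_id in (1, 2)]

# Changing the job afterwards leaves the earlier runs untouched.
job.parameters["epochs"] = 10
print(runs[0].parameters)  # (('epochs', 5), ('learning_rate', 0.01))

try:
    runs[0].run_id = 99
except FrozenInstanceError:
    print("runs are immutable")
```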

Pipeline trigger

Select one of the following trigger types to tell the system when a job should schedule its runs:

  • Run right away: for starting a one-off run.
  • Periodic: for interval-based scheduling of runs (for example, every 2 hours or every 50 minutes).
  • Cron: for specifying cron semantics for scheduling runs. The UI also has an option for you to manually enter a cron expression.
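
To make the periodic trigger concrete, here is a minimal sketch that computes the run times for an interval-based schedule. The function name `periodic_run_times`, the start time, and the choice to place the first run at the start time itself are assumptions made for illustration. The sketch says nothing about how the pipelines service schedules runs internally.

```python
from datetime import datetime, timedelta


def periodic_run_times(start: datetime, interval: timedelta, count: int) -> list:
    """Return the first `count` run times of an interval-based trigger.

    Assumption: the first run happens at `start`, and each later run
    follows exactly one interval after the previous one.
    """
    if interval <= timedelta(0):
        raise ValueError("interval must be positive")
    return [start + i * interval for i in range(count)]


# "Every 2 hours", starting from an arbitrary example time.
times = periodic_run_times(datetime(2018, 11, 6, 9, 0), timedelta(hours=2), 3)
for t in times:
    print(t.isoformat())
# 2018-11-06T09:00:00
# 2018-11-06T11:00:00
# 2018-11-06T13:00:00
```

A cron trigger differs in that run times follow calendar fields (minute, hour, day, and so on) rather than a fixed distance from the previous run.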

Step

An execution of one of the components in the pipeline. The relationship of a step to its component is much like that of a job to its pipeline: an instantiation relationship. In a complex pipeline, a component can execute multiple times in a loop, or conditionally after an if/else-like clause in the pipeline code is resolved.
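
The difference between components and steps shows up once loops and conditions are involved. The sketch below uses ordinary Python control flow to stand in for pipeline code. The function `expand_steps` and the component names are hypothetical, and real pipeline code would use the service's own constructs rather than this helper.

```python
from collections import Counter


def expand_steps(learning_rates, evaluate):
    """Hypothetical pipeline code: return the steps one run would execute."""
    steps = [("preprocess", {})]
    # Loop: the "train" component executes once per learning rate.
    for rate in learning_rates:
        steps.append(("train", {"learning_rate": rate}))
    # Condition: the "evaluate" component executes only when requested.
    if evaluate:
        steps.append(("evaluate", {}))
    return steps


steps = expand_steps([0.1, 0.01, 0.001], evaluate=False)
per_component = Counter(component for component, _ in steps)

# One component can yield several steps, or none at all.
print(per_component["train"], per_component["evaluate"])  # 3 0
```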

Step output artifacts

Artifacts are outputs emitted by the pipeline's steps that the Kubeflow pipelines UI understands and can render as rich visualizations. It is useful for pipeline components to emit artifacts because they support performance evaluation, quick decision making for a run, and comparison across different runs. Artifacts also make it possible to understand how the pipeline's different components work. The rendering can range from a plain textual view of the data to rich interactive visualizations.

Back end

A REST API server supports the front end. For user data stored in external services (for example, Google Cloud Storage), the front end makes requests directly to those services using their client libraries.
