This repository has been archived by the owner on Dec 31, 2020. It is now read-only.
Eron Wright edited this page Apr 5, 2017 · 12 revisions

flink-tensorflow

Welcome! flink-tensorflow is an open-source library for machine intelligence in Apache Flink, using TensorFlow for numerical computation within a Flink program. A TensorFlow model becomes a function that you can use to transform a stream. Such functions can be combined with Flink connectors and other Flink libraries to produce scalable, stateful, intelligent stream processing applications.

(TODO) Non-goal: support for unaltered TF programs

Getting Started

  • See the Building section to build the flink-tensorflow library.
  • See the Developing an Application section to package your Flink application with the TensorFlow dependencies.

The sections below cover the key concepts of the library.

Introducing TensorFlow

TensorFlow has a lot in common with Apache Flink. It is based on the dataflow programming model, where you define a graph of numerical computations over a stream of data records. TensorFlow executes the computations on the CPU and, when available, the GPU. An interesting aspect of its design is that the 'pump' that streams records through the graph is not internal to TensorFlow; it is the responsibility of application code. With flink-tensorflow, the Flink dataflow engine acts as the pump!
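
To make the 'pump' idea concrete, here is a minimal sketch in plain Python that uses neither TensorFlow nor Flink. The names `graph` and `pump` are hypothetical and exist only for illustration: `graph` stands in for a TensorFlow graph (a passive computation), and `pump` stands in for the application code, or the Flink dataflow engine, that pushes records through it.

```python
from typing import Callable, Iterable, Iterator, List


def graph(x: List[float]) -> float:
    """Hypothetical stand-in for a TensorFlow graph.

    It is a pure numerical computation (a weighted sum) and does nothing
    until some caller feeds it a record.
    """
    weights = [0.5, 0.25, 0.25]
    return sum(w * v for w, v in zip(weights, x))


def pump(records: Iterable[List[float]],
         run: Callable[[List[float]], float]) -> Iterator[float]:
    """Feed each record through the graph, one at a time.

    This loop lives in application code; the graph itself never pulls data.
    """
    for record in records:
        yield run(record)


# Two input records streamed through the graph by the pump.
outputs = list(pump([[2.0, 4.0, 4.0], [0.0, 0.0, 8.0]], graph))
```

The point of the separation is that the graph stays unaware of where records come from, so the same computation can be driven by a simple loop, as here, or by a distributed stream processor.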

All inputs and outputs to a TensorFlow graph are stored as multi-dimensional arrays called tensors. This library provides various converters to read and write data records as tensors.
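
As a rough illustration of what a converter does, the sketch below turns a flat data record into a two-dimensional "tensor" (represented here as nested Python lists) and back again. The function names `to_tensor`, `from_tensor`, and `shape_of` are assumptions made for this example; they are not the library's converter API.

```python
from typing import List, Tuple


def to_tensor(flat: List[float], shape: Tuple[int, int]) -> List[List[float]]:
    """Reshape a flat record into a rows x cols nested list (row-major)."""
    rows, cols = shape
    if len(flat) != rows * cols:
        raise ValueError(f"expected {rows * cols} values, got {len(flat)}")
    return [flat[r * cols:(r + 1) * cols] for r in range(rows)]


def from_tensor(tensor: List[List[float]]) -> List[float]:
    """Flatten a 2-D nested list back into a flat record (row-major)."""
    return [value for row in tensor for value in row]


def shape_of(tensor: List[List[float]]) -> Tuple[int, int]:
    """Return (rows, cols) of a rectangular 2-D nested list."""
    return (len(tensor), len(tensor[0]) if tensor else 0)


# A hypothetical record: six pixel values of a 2 x 3 grayscale image.
record = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
tensor = to_tensor(record, (2, 3))
```

Converting in both directions matters because a model's outputs are tensors too, and they usually need to become ordinary records again before the next transformation.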

TensorFlow Models

A model is a pre-built TensorFlow graph with associated data and with well-defined interfaces. A given model might support image classification, text analysis, regression, or other forms of inference. An advanced model might even be stateful, learning from input data or identifying sequences using internal state.

TensorFlow defines a standard format for such models called the saved model format. The flink-tensorflow library fully supports the saved model format. It is also possible to use arbitrary TensorFlow graphs.

Using TensorFlow in Flink

The core functionality of the library is to enable the use of TensorFlow models within Flink data transformations (e.g. map, window, iterate). To interoperate with a diverse range of transformations, the library is designed to work with any transformation function. In the future, a higher-level API may also be introduced.
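
The idea of one model serving several kinds of transformation can be sketched in plain Python, without Flink or TensorFlow. Everything named here is hypothetical: `model` is a toy function over a batch of inputs, `map_with_model` mimics a per-record map, and `window_with_model` mimics a simple count window.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List

Model = Callable[[List[float]], List[float]]


def model(batch: List[float]) -> List[float]:
    """Hypothetical model: scores each input in a batch (here, x squared)."""
    return [x * x for x in batch]


def map_with_model(stream: Iterable[float], m: Model) -> Iterator[float]:
    """Per-record use, analogous to a map transformation."""
    for x in stream:
        # Wrap the single record in a batch of one, then unwrap the result.
        yield m([x])[0]


def window_with_model(stream: Iterable[float], m: Model,
                      size: int) -> Iterator[List[float]]:
    """Batched use, analogous to a count window of `size` records."""
    it = iter(stream)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield m(batch)


mapped = list(map_with_model([1.0, 2.0, 3.0], model))
windowed = list(window_with_model([1.0, 2.0, 3.0], model, 2))
```

Because the model is just a function, the choice between record-at-a-time and batched evaluation stays with the surrounding transformation rather than with the model itself.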

Loading a Model

(TODO)

Dealing with Tensors

(TODO)

Stateful Models

(TODO)
