RFC: On-device training with TensorFlow Lite #390

miaout17 · 2021-06-07T16:52:07Z

We're sharing this RFC to reflect our newest thoughts on implementing on-device training in TensorFlow Lite.
We haven't set a timeline for closing comments. We want to surface the RFC early for transparency and to get feedback.

Status	Draft
Author(s)	Yu-Cheng Ling ([email protected]), Haoliang Zhang ([email protected]), Jaesung Chung ([email protected])
Sponsor	Jared Duke ([email protected])
Updated	2021-06-04

Introduction

TensorFlow Lite is TensorFlow's solution for on-device machine learning.
Initially it focused only on inference use cases, but we have increasingly heard
from users about the need for on-device training. This proposal lays out
the concrete plan and roadmap for supporting training in TensorFlow Lite.

jijoongmoon · 2021-06-09T08:01:15Z

Thanks for sharing the RFC. I wonder whether it is possible to change the model architecture on device at runtime. The first on-device training use case that comes to mind is transfer learning, where we need to add new classes as the user wants. In that case, I think we need to change the model architecture (say, the number of units in a dense layer). Is that possible with the current setup? I also wonder whether there are optimization techniques that make on-device training realistic. I mean we might need significant optimizations from a memory and computation perspective. Could you introduce some of them?

bhack · 2021-06-13T22:12:38Z

I suggest taking a look at Continual Learning on the Edge with TensorFlow Lite

And https://arxiv.org/abs/2105.13127

bhack · 2021-06-13T22:14:23Z

/cc @vlomonaco

bhack · 2021-06-14T10:19:32Z

Another interesting scenario to evaluate is training in the context of Edge federated learning:

https://github.com/tensorflow/federated/issues/749
https://arxiv.org/abs/2104.03042
https://arxiv.org/abs/1909.11875
https://www.sciencedirect.com/science/article/pii/S266729522100009X

vlomonaco · 2021-06-14T12:39:13Z

Thanks @bhack for the tag! @lrzpellegrini, the main author of "Continual Learning at the Edge: Real-Time Training on Smartphone Devices" will take a look and provide some feedback.

bhack · 2021-06-14T13:06:42Z

/cc @gdemos01 @akhilmathurs

miaout17 · 2021-06-14T17:46:27Z

Replying to @jijoongmoon

I wonder whether it is possible to change the model architecture on device at runtime. The first on-device training use case that comes to mind is transfer learning, where we need to add new classes as the user wants. In that case, I think we need to change the model architecture (say, the number of units in a dense layer). Is that possible with the current setup?

Great question.

When doing transfer learning on a classifier, changing the number of classes does not require changing the model "structure" (adding/removing ops). Changing the shape of the weights tensor should be sufficient. This proposal can handle the use case with no problem.

I also wonder whether there are optimization techniques that make on-device training realistic. I mean we might need significant optimizations from a memory and computation perspective. Could you introduce some of them?

For sure. We're focusing on making it generally work first. Once we reach that point, we can do more benchmarking and profiling to figure out what's most significant to be optimized, and work on it.

miaout17 · 2021-06-14T17:47:52Z

Thanks @bhack.
@vlomonaco @lrzpellegrini thanks for taking a look and please feel free to comment.

lc0 · 2021-06-14T20:33:51Z

Replying to @jijoongmoon

I wonder whether it is possible to change the model architecture on device at runtime. The first on-device training use case that comes to mind is transfer learning, where we need to add new classes as the user wants. In that case, I think we need to change the model architecture (say, the number of units in a dense layer). Is that possible with the current setup?

Great question.

When doing transfer learning on a classifier, changing the number of classes does not require changing the model "structure" (adding/removing ops). Changing the shape of the weights tensor should be sufficient. This proposal can handle the use case with no problem.

@miaout17 can you elaborate on how such a shape change process would work? I do not see such a use case in the current proposal. Thanks!

vlomonaco · 2021-06-15T17:35:33Z

Hi @miaout17, I had a more in-depth look. This direction looks promising and we are excited to finally see training on-device on the TFLite radar. I think for many Transfer Learning problems these features would be great. However, for Continual Learning (CL) flexibility is all that matters.

Can the model architecture, optimizer, loss function be changed over time?

It would be difficult to implement a CL approach without those features, apart from basic experience replay. @lrzpellegrini will provide more details.

lrzpellegrini · 2021-06-17T15:12:09Z

Hi there, I had a look at the RFC. It seems to me that it moves in a very good direction.

I'm not aware of the current capabilities of TF-Lite, as I have only used it at a very high level, but I really appreciate that the RFC focuses on the ability to transfer whole tf.functions to the final model. This can really boost the ability to learn on-device without forcing the programmer to delve too deeply into the low-level side of mobile implementations.

As a comparison, while implementing the CORe app described in "Continual Learning at the Edge: Real-Time Training on Smartphone Devices", I had to manually translate the Python version of our Continual Learning algorithm into C++ so that it could be used with the Caffe deep learning library. In this scenario, even simple things like moving data and accessing tensors (weights, inputs, ...) add a lot of complexity, and with that comes an absurd overhead on the programming side, so I really appreciate this tf.function-based approach 👍.

As Vincenzo pointed out, the main issues are on the flexibility side. For simple, limited on-device fine-tuning, a fit-based approach seems the best solution. However, it would really limit the capabilities of the framework: I suspect a fit-based approach would only allow for a very simple instance replay mechanism, which may be insufficient for Continual Learning algorithms.

On the other hand, supporting Continual Learning algorithms may require some flexibility on:

Ability to easily manipulate (read, write, store, load) tensors linked to weights, activations, gradients, etcetera.
Ability to change the model architecture (alas, not limited to changing the number of outputs of a certain layer). Some algorithms also require the ability to dynamically add new layers to the existing model or even to add new detached/undetached models
Ability to change the optimizer, loss, lr schedulers and other training related components
Ability to selectively freeze and unfreeze certain parts of the model

Of course not all CL algorithms need all these capabilities.

Consider that CL is a very diverse field, but most algorithms leverage an instance replay mechanism (implemented by inserting/replacing new instances into the dataset) plus some simple regularization/distillation/bias normalization algorithm (which mostly requires flexibility on the tensor manipulation side). More recent algorithms really push the idea of manipulating the model architecture, but I guess supporting this behavior would be the most problematic part.
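As one illustration of the instance-replay mechanism mentioned above, here is a minimal plain-Python sketch of a fixed-size replay buffer. It uses reservoir sampling to decide which stored instance a new one replaces. That replacement policy is an assumption chosen for illustration, not something this comment or the RFC prescribes, and `ReplayBuffer` and `mixed_batch` are hypothetical names.

```python
import random


class ReplayBuffer:
    """Hypothetical fixed-capacity memory of past (input, label) pairs."""

    def __init__(self, capacity, seed=0):
        self.capacity = capacity
        self.items = []
        self.seen = 0  # total number of instances offered so far
        self.rng = random.Random(seed)

    def add(self, instance):
        self.seen += 1
        if len(self.items) < self.capacity:
            self.items.append(instance)  # insert while there is room
        else:
            # Reservoir sampling (assumed policy): keep the new instance with
            # probability capacity/seen, replacing a uniformly chosen slot.
            j = self.rng.randrange(self.seen)
            if j < self.capacity:
                self.items[j] = instance

    def mixed_batch(self, new_batch, replay_size):
        # Mix fresh data with replayed past data to reduce forgetting.
        k = min(replay_size, len(self.items))
        return list(new_batch) + self.rng.sample(self.items, k)


buffer = ReplayBuffer(capacity=100)
for i in range(1000):
    buffer.add((f"x{i}", i % 10))
batch = buffer.mixed_batch([("new", 3)], replay_size=8)
print(len(buffer.items), len(batch))  # 100 9
```

Reservoir sampling keeps every instance seen so far in the buffer with equal probability, so older tasks are not crowded out by the most recent data.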

Alas, I don't have a clear understanding of the translation capabilities of tf.functions from Python to TFLite models, so I'm not able to fully grasp the complexity required to accomplish this kind of flexibility.

bhack · 2021-06-17T20:48:06Z

I think that federated and continual learning are more relevant in the on-device/edge use case because, in this context, it is still hard to achieve few-shot/zero-shot learning with (recent) very large-scale "general purpose" models. At least until we figure out how knowledge "hard distillation" on these models could be achieved efficiently on constrained devices.

miaout17 · 2021-06-23T04:39:14Z

can you elaborate on how such a shape change process would work?

Replying to @lc0

For example

Imagine you have a classifier where the last layer is a simple fully connected layer (e.g. tf.nn.relu(tf.matmul(x, weight) + bias)).
We can define a def set_classes_num(classes_num) TF function, which re-initializes the weight and bias variables to a different size. For example, if the layer before the last one has 1024 hidden units, the weight can have shape [1024, classes_num] and the bias can have shape [classes_num]. The function can re-initialize the weight and bias to random values close to 0, and the model will then be ready to retrain the last layer.
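Here is a minimal sketch of the `set_classes_num` idea. It is written in plain Python rather than as a TF function so that it runs without TensorFlow. The `DenseHead` class, its `forward` method, and the `scale` parameter are hypothetical. Only the shapes ([hidden_units, classes_num] for the weight, [classes_num] for the bias) and the re-initialization to values near 0 follow the description above. The sketch illustrates that the computation stays the same while only the variable shapes change.

```python
import random


class DenseHead:
    """Hypothetical last layer computing relu(x @ weight + bias) with plain lists."""

    def __init__(self, hidden_units, classes_num, seed=0):
        self.hidden_units = hidden_units
        self.rng = random.Random(seed)
        self.set_classes_num(classes_num)

    def set_classes_num(self, classes_num, scale=0.01):
        # Re-create weight with shape [hidden_units, classes_num], filled with
        # small random values, and reset bias (shape [classes_num]) to 0.
        # forward() is unchanged; only the shapes differ.
        self.weight = [
            [self.rng.uniform(-scale, scale) for _ in range(classes_num)]
            for _ in range(self.hidden_units)
        ]
        self.bias = [0.0] * classes_num

    def forward(self, x):
        # x is a single feature vector of length hidden_units.
        logits = [
            sum(x[i] * self.weight[i][j] for i in range(self.hidden_units)) + b
            for j, b in enumerate(self.bias)
        ]
        return [max(0.0, v) for v in logits]  # relu


head = DenseHead(hidden_units=1024, classes_num=10)
head.set_classes_num(12)  # e.g. the user added two new classes
print(len(head.weight), len(head.weight[0]), len(head.bias))  # 1024 12 12
```

After resizing, only these last-layer variables would be retrained.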

We're building low-level features to make it possible to describe these semantics. We're considering wrapping them in an easier-to-use API to make this friendlier for developers.

Let me know if this makes sense. I'm happy to try to write this as a more concrete pseudo code as well.

miaout17 · 2021-06-23T04:52:33Z

Replying to @vlomonaco and @lrzpellegrini

Thanks for the feedback!

For clarification: it sounds like continual learning can modify the model structure automatically, without human intervention. Is my rough understanding correct?

This seems more advanced than what we're currently targeting. Trying to break down the requirements:

Ability to easily manipulate (read, write, store, load) tensors linked to weights, activations, gradients, etcetera.

I think this should be doable (by wrapping required logic into TF functions).
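To make "reading and storing tensors" concrete, here is a plain-Python sketch of one common continual-learning use. It takes a snapshot of the weights after a task, then penalizes drift from that snapshot while training the next task. This uniform L2 anchoring regularizer was chosen for illustration; algorithms such as EWC weight the penalty per parameter instead. `snapshot`, `anchor_penalty`, and `anchor_grad` are hypothetical helpers, and weights are represented as a dict of flat lists rather than TF variables.

```python
import copy


def snapshot(weights):
    # Store a deep copy so later in-place updates don't alter the saved values.
    return copy.deepcopy(weights)


def anchor_penalty(weights, anchor, strength):
    # strength / 2 * sum over all parameters of (w - w_anchor)^2
    total = 0.0
    for name, values in weights.items():
        total += sum((w - a) ** 2 for w, a in zip(values, anchor[name]))
    return 0.5 * strength * total


def anchor_grad(weights, anchor, strength):
    # Gradient of the penalty, strength * (w - w_anchor); it would be added
    # to the task-loss gradient before the optimizer update.
    return {
        name: [strength * (w - a) for w, a in zip(values, anchor[name])]
        for name, values in weights.items()
    }


weights = {"dense/kernel": [0.5, -0.2], "dense/bias": [0.1]}
old = snapshot(weights)
weights["dense/kernel"][0] = 0.7  # pretend training on a new task moved one weight
penalty = anchor_penalty(weights, old, strength=10.0)
grads = anchor_grad(weights, old, strength=10.0)
```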

Ability to change the model architecture (alas, not limited to changing the number of outputs of a certain layer). Some algorithms also require the ability to dynamically add new layers to the existing model or even to add new detached/undetached models
Ability to change the optimizer, loss, lr schedulers and other training related components

We haven't tried these yet. However, I think that in theory:

A TFLite model is like a TF function. There is no easy way to change it (e.g. adding a layer) after the TFLite model is created.
However, I think it's possible to model some of these behaviors with control flow (e.g. if a value is true, skip a layer or switch to another optimizer algorithm).
In the future, we can also explore on-device generation / modification of TFLite model, but it would be an even more advanced route.
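Here is a rough plain-Python sketch of "switching to another optimizer algorithm" via control flow. Both update rules live in the same function, and a runtime flag selects one. In a TF function, this branch would become `tf.cond` (AutoGraph converts a Python `if` on a tensor). The choice between plain SGD and SGD with momentum, and the names `apply_update` and `use_momentum`, are assumptions made for illustration.

```python
def apply_update(params, grads, velocity, lr, use_momentum, beta=0.9):
    """Hypothetical update: the optimizer is chosen by a runtime flag."""
    for i, g in enumerate(grads):
        if use_momentum:
            # Momentum: accumulate a running direction, then step along it.
            velocity[i] = beta * velocity[i] + g
            params[i] -= lr * velocity[i]
        else:
            # Plain SGD: step along the current gradient only.
            params[i] -= lr * g
    return params


params, velocity = [1.0, 1.0], [0.0, 0.0]
apply_update(params, [0.5, -0.5], velocity, lr=0.1, use_momentum=False)
apply_update(params, [0.5, -0.5], velocity, lr=0.1, use_momentum=True)
print(params)
```

Because the optimizer state (`velocity`) exists regardless of the flag, switching algorithms does not require changing the set of variables, only which branch runs.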

Ability to selectively freeze and unfreeze certain parts of the model

This should be doable with control flow (e.g. skip some gradient computation and variable update when a boolean value is true)
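The freezing idea above can be sketched the same way. A per-parameter boolean decides whether that parameter's gradient is computed and its value updated. This is plain Python with hypothetical names (`train_step`, `frozen`) and a toy model y = w * x + b trained with squared error. In a TF function, the flags would be tensor inputs and the branch would become `tf.cond`.

```python
def train_step(params, frozen, x, y, lr):
    """Hypothetical SGD step on loss = (w*x + b - y)^2, skipping frozen params."""
    err = params["w"] * x + params["b"] - y
    # Analytic gradients of the squared error, computed lazily per parameter.
    grad_fns = {"w": lambda: 2.0 * err * x, "b": lambda: 2.0 * err}
    for name, grad_fn in grad_fns.items():
        if frozen[name]:
            continue  # frozen: skip both the gradient computation and the update
        params[name] -= lr * grad_fn()
    return err ** 2


params = {"w": 0.0, "b": 0.0}
frozen = {"w": True, "b": False}  # e.g. keep "w" fixed, tune only the bias
for _ in range(50):
    train_step(params, frozen, x=2.0, y=3.0, lr=0.1)
print(params)
```

Flipping an entry in `frozen` between calls unfreezes that parameter without changing the function itself.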

danieljanes · 2021-06-25T11:26:33Z

Thanks for sharing this, excited to see progress here. As one of the authors of the Flower federated learning framework, I can say that on-device training support is one of the biggest challenges for cross-device federated learning right now.

After reading the RFC, I was wondering how setting/changing hyperparameters would work on-device. Would we just add additional arguments (like epochs) to, e.g., the train method

  @tf.function
  def train(self, inputs, labels, epochs):
    self.model.fit(inputs, labels, epochs=epochs)

and then call train(train_input, train_labels, epochs=3)?
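One way to read the question above: hyperparameters become ordinary inputs of the training function, so the caller can change them per call without re-converting the model. Below is a plain-Python sketch of that calling convention with a toy 1-D linear model. The `train` signature with `epochs` and `learning_rate` arguments is hypothetical and is not the RFC's API.

```python
def train(params, inputs, labels, epochs, learning_rate):
    """Hypothetical full-batch gradient descent on MSE for y = w * x + b.

    epochs and learning_rate are passed at call time, like any other input.
    """
    n = len(inputs)
    loss = 0.0
    for _ in range(epochs):
        errs = [params["w"] * x + params["b"] - y for x, y in zip(inputs, labels)]
        loss = sum(e * e for e in errs) / n
        grad_w = 2.0 * sum(e * x for e, x in zip(errs, inputs)) / n
        grad_b = 2.0 * sum(errs) / n
        params["w"] -= learning_rate * grad_w
        params["b"] -= learning_rate * grad_b
    return loss  # loss measured before the last update


params = {"w": 0.0, "b": 0.0}
train_input, train_labels = [0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0]  # y = 2x + 1
first = train(params, train_input, train_labels, epochs=3, learning_rate=0.05)
later = train(params, train_input, train_labels, epochs=500, learning_rate=0.05)
print(round(params["w"], 2), round(params["b"], 2))
```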

bhack · 2021-06-27T16:17:56Z

About changing the model in training mode check:

https://discuss.tensorflow.org/t/how-to-implement-layerdrop-in-tensorflow-transformers/2396
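LayerDrop, the technique in the linked thread, is an example of "changing the model" purely with runtime control flow: during training, each residual layer is skipped with some probability. Here is a plain-Python sketch with toy scalar "layers". The layer functions, the drop rate, and `forward` are assumptions made for illustration. In this sketch, inference simply runs every layer.

```python
import random


def forward(x, layers, drop_prob, training, rng):
    """Apply residual layers x = x + f(x); in training, skip each with prob drop_prob."""
    for f in layers:
        if training and rng.random() < drop_prob:
            continue  # layer dropped for this pass: the residual path acts as identity
        x = x + f(x)
    return x


layers = [lambda v: 0.1 * v, lambda v: 0.2 * v, lambda v: -0.05 * v]
rng = random.Random(0)
full = forward(1.0, layers, drop_prob=0.5, training=False, rng=rng)
sampled = [forward(1.0, layers, drop_prob=0.5, training=True, rng=rng) for _ in range(5)]
print(full, sampled)
```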

gdemos01 · 2021-07-20T06:58:19Z

/cc @vassilisvas is the co-author of Continual Learning on the Edge with TensorFlow Lite and the leader of the Learning Agents & Robots MRG. This is an interesting conversation to keep our eyes on and maybe contribute to.

martinkersner · 2021-08-07T01:16:31Z

Thank you for bringing on-device training to TFLite!

Based on this proposal, I am not sure where you plan to manage the training loop. Are you thinking of (1) keeping it inside TFLite or (2) letting developers decide how the training loop will be structured on device?

As @danieljanes pointed out, the API doesn’t show how the actual training step or training phase would be controlled. Moreover, the optimizer and loss do not seem to be accessible from the SavedModel. How would the train method know which ones to use?
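To contrast the two options in the question, here is a sketch of option (2). The model exposes only a single-step function, and host code owns the loop, batching, and stopping rule. Everything here is hypothetical plain Python: `train_step` stands in for an exported training signature, the model is a toy y = w * x, and the early-stopping threshold is arbitrary. It shows the division of responsibility, not TFLite's API.

```python
def train_step(state, x_batch, y_batch, lr=0.1):
    """Hypothetical stand-in for an exported single-step signature; returns loss."""
    n = len(x_batch)
    errs = [state["w"] * x - y for x, y in zip(x_batch, y_batch)]
    state["w"] -= lr * 2.0 * sum(e * x for e, x in zip(errs, x_batch)) / n
    return sum(e * e for e in errs) / n


def host_training_loop(state, xs, ys, batch_size, max_epochs, target_loss):
    # The app, not the model, decides batching, epochs, and when to stop.
    epoch_loss = float("inf")
    for epoch in range(max_epochs):
        losses = [
            train_step(state, xs[s:s + batch_size], ys[s:s + batch_size])
            for s in range(0, len(xs), batch_size)
        ]
        epoch_loss = sum(losses) / len(losses)
        if epoch_loss < target_loss:
            return epoch + 1, epoch_loss  # early stop chosen by the host
    return max_epochs, epoch_loss


state = {"w": 0.0}
xs = [0.5, 1.0, 1.5, 2.0]
ys = [1.5 * x for x in xs]  # true w = 1.5
epochs_run, final_loss = host_training_loop(state, xs, ys, batch_size=2,
                                            max_epochs=100, target_loss=1e-6)
print(epochs_run, round(state["w"], 3))
```

With option (1), the loop in `host_training_loop` would instead live inside the converted model, and the app would only pass in data and hyperparameters.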

yingding · 2021-11-06T11:11:50Z

I have a similar question to @martinkersner's regarding the training loop, in the context of federated ML with TFLite. It would be fantastic to let developers decide how to train and structure the training loop on device. That would open up the possibility of forwarding the gradients from the training loop to a further orchestration layer, enabling both centralised and decentralised federated ML.
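This example shows why exposing on-device updates matters in federated settings. In federated averaging (FedAvg), the server combines client model weights, weighting each client by its number of local examples. Below is a plain-Python sketch of just that aggregation step. The function name and the dict-of-lists weight format are assumptions. The same averaging applies if clients send weight deltas instead of weights. Real frameworks add client sampling, secure aggregation, and more.

```python
def federated_average(client_updates):
    """Hypothetical helper: example-weighted average of client weights.

    client_updates: list of (num_examples, weights), where weights maps a
    variable name to a flat list of floats.
    """
    total = sum(n for n, _ in client_updates)
    if total == 0:
        raise ValueError("no training examples reported by clients")
    averaged = {}
    for name in client_updates[0][1]:
        size = len(client_updates[0][1][name])
        averaged[name] = [
            sum(n * w[name][i] for n, w in client_updates) / total
            for i in range(size)
        ]
    return averaged


clients = [
    (100, {"dense/kernel": [1.0, 2.0]}),  # device A trained on 100 examples
    (300, {"dense/kernel": [3.0, 6.0]}),  # device B trained on 300 examples
]
print(federated_average(clients))  # {'dense/kernel': [2.5, 5.0]}
```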

I can understand the benefits of keeping the training loop and its structure inside TFLite, so that it can be distributed uniformly across all platforms. Opening the training loop up to different platforms might require additional library extensions for Android, IoT, and so on. But with such extensions controlling the training loop, you can reduce TFLite's dependencies on individual platforms and speed up its development cycle, since each extension library can have its own release cycle.

bhack · 2021-11-06T11:33:20Z

There was already some research at ICML 2021 on joint federated and continual learning, with a TF reference implementation:

https://github.com/wyjeong/FedWeIT

It would be nice to open this research subdomain to edge devices with TFLite.

bhack · 2021-11-10T18:40:29Z

Is this finalized/approved?
https://blog.tensorflow.org/2021/11/on-device-training-in-tensorflow-lite.html?m=1

yingding · 2021-11-10T23:46:31Z

https://www.tensorflow.org/lite/examples/on_device_training/overview
This went live yesterday (Nov 9) on the ML Community Day stream.

bhack · 2021-11-13T14:10:50Z

Another interesting use case, even if ImageNet is probably too large a dataset for many edge TFLite platforms, is this recent DeepMind paper, One Pass ImageNet:

https://arxiv.org/abs/2111.01956

ematejska · 2022-01-24T18:49:21Z

Is this ready for community feedback? Are you ready to take this through review?

Initial check in TFLite training RFC.

4896e4f

miaout17 requested review from ematejska, ewilderj and theadactyl as code owners June 7, 2021 16:52

google-cla bot added the cla: yes label Jun 7, 2021

bhack mentioned this pull request Jun 27, 2021

tf.function-decorated function tried to create variables on non-first call tensorflow/tensorflow#27120

Closed
