
Variables saved in converted model #305

Open · tfeher (Contributor) opened this issue Jul 18, 2022 · 0 comments
A converted model contains the frozen variables (as Const nodes), the original variables, and the variables saved into the TRT engine as weights. This can make the converted model up to 3x the size of the original.

Expected behaviour

The original Variables should not be saved in the converted model; only the frozen constants and the TRT engine weights should be.

Explanation

  • By default, TF-TRT freezes the model variables before converting the graph. By definition, freezing the model means that the variables are converted to constants.
  • In this process, new Const nodes are added to the graph, which serve as inputs for nodes that previously took Variables as inputs.
  • After the Const nodes are added to the graph, the original variables are no longer used by the model. Therefore their values should not be saved in the saved model (see the sketch after this list).
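For reference, here is a minimal sketch of what freezing does, using the public convert_variables_to_constants_v2 helper (TF-TRT's internal freezing step may differ in detail):

import tensorflow as tf
from tensorflow.python.framework.convert_to_constants import (
    convert_variables_to_constants_v2)

v = tf.Variable(tf.ones((4, 4)))

@tf.function
def f(x):
    return tf.matmul(x, v)

cfunc = f.get_concrete_function(tf.TensorSpec([None, 4], tf.float32))
frozen_func = convert_variables_to_constants_v2(cfunc)

# The frozen graph reads the weights from a Const node; the Variable
# object is no longer referenced, so checkpointing it is pure overhead.
print([n.op for n in frozen_func.graph.as_graph_def().node])
# Expect to see 'Const' where the variable read used to be.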

Steps to reproduce

import tensorflow as tf
from tensorflow.python.compiler.tensorrt import trt_convert as trt
from tensorflow.python.ops import array_ops

class MyModule(tf.Module):
    def __init__(self):
        self.v = None

    @tf.function
    def __call__(self, x):
        # Create the weight variable lazily on first trace.
        if self.v is None:
            self.v = tf.Variable(tf.random.uniform((1024, 2048), dtype=tf.float32))
        x = tf.matmul(x, self.v)
        x = tf.nn.relu(x)
        return array_ops.identity(x, name="output_0")

# Save the original model with an 8 MiB (1024x2048 float32) variable.
my_module = MyModule()
cfunc2 = my_module.__call__.get_concrete_function(tf.TensorSpec([None, 1024], tf.float32))
tf.saved_model.save(my_module, 'matmul_func', signatures=cfunc2)

# Calibration data for INT8 conversion.
def matmul_calibration_input_fn():
    for _ in range(4):
        yield (tf.random.uniform((512, 1024), dtype=tf.float32),)

# Convert with TF-TRT and save the converted model.
converter = trt.TrtGraphConverterV2(
        input_saved_model_dir="./matmul_func", precision_mode='INT8')

converter.convert(calibration_input_fn=matmul_calibration_input_fn)
converter.save('matmul_func_trt')

After executing the code above, we see the following output:

tfeher@4fda6a0d0bbd:/workspace/tf/bugs/model_size$ tree -sh
[4.0K]  .
|-- [4.0K]  matmul_func
|   |-- [4.0K]  assets
|   |-- [ 11K]  saved_model.pb
|   `-- [4.0K]  variables
|       |-- [8.0M]  variables.data-00000-of-00001
|       `-- [ 205]  variables.index
|
|-- [4.0K]  matmul_func_trt
|   |-- [4.0K]  assets
|   |   `-- [2.1M]  trt-serialized-engine.TRTEngineOp_000_000
|   |-- [4.0M]  saved_model.pb
|   `-- [4.0K]  variables
|       |-- [8.0M]  variables.data-00000-of-00001
|       `-- [ 205]  variables.index

The original model (matmul_func) contains 8 MiB of variable data. In the converted model, the parameters are stored both in the frozen graph (saved_model.pb) and in the serialized engine (their actual sizes depend on conversion parameters, TRT version, and target GPU). The variables.data file is not needed by the converted model, and therefore it should not be saved.
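One way to confirm that the unused weights are still checkpointed is to list the variables stored inside the converted SavedModel. This is a hedged sketch, assuming the directory layout from the reproduction above; tf.train.list_variables is a public API:

import tensorflow as tf

# List the variables checkpointed inside the converted SavedModel.
# The 1024x2048 float32 weight (8 MiB) still shows up here, even though
# the frozen graph and the TRT engine carry their own copies of it.
ckpt_prefix = 'matmul_func_trt/variables/variables'
for name, shape in tf.train.list_variables(ckpt_prefix):
    print(name, shape)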

This output was produced using the nvcr.io/nvidia/tensorflow:22.06-tf2-py3 Docker image on a T4 GPU.
