
TensorFlow SavedModel: How to Deploy Models with SavedModel Format

Last updated: December 18, 2024

TensorFlow's SavedModel format is the recommended way to save, restore, and deploy trained models. It encapsulates both the model architecture and its weights, so a model can be reused across different environments without the code that created it.

Understanding the SavedModel Format

The SavedModel format represents a TensorFlow model independently of the code that created it. At its core, a SavedModel bundles one or more signature definitions (named entry points into the serialized computation graph) together with the variable values the model depends on.

This independence makes the format versatile: the same artifact can pass through graph transformations, be deployed onto different hardware, or take advantage of other backend optimizations without changes to the original training code.
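
To see what "independent of the code" means in practice, even a plain tf.Module with a tf.function can be exported via the low-level tf.saved_model.save API; the input_signature becomes the exported signature. A minimal sketch (the class name and path are illustrative):

import tensorflow as tf

# A trivial model with no Keras dependency
class Doubler(tf.Module):
    @tf.function(input_signature=[tf.TensorSpec(shape=[None], dtype=tf.float32)])
    def __call__(self, x):
        # The traced graph, not this Python code, is what gets serialized
        return x * 2.0

# Export the module; the directory can later be loaded without this class
tf.saved_model.save(Doubler(), '/tmp/doubler')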

File Structure

A typical SavedModel directory contains:

  • saved_model.pb or saved_model.pbtxt: The core file containing the model's computation graph.
  • variables/: Contains the variable data used by the model.
  • assets/: Any additional assets the model depends upon.

Saving a Model

To save a model using the SavedModel format, you can utilize TensorFlow's high-level APIs. Here's a basic example using Keras:

import tensorflow as tf

# Create a simple model
model = tf.keras.Sequential([
    tf.keras.layers.Dense(10, activation='relu', input_shape=(128,)),
    tf.keras.layers.Dense(1)
])

# Compile the model
model.compile(optimizer='adam', loss='mean_squared_error')

# Save the model
model.save('path/to/saved_model/my_model')
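
If you want to confirm the layout described in the File Structure section, you can list the export directory. A quick check, assuming the same path as above (newer TensorFlow versions may also write a fingerprint.pb file):

import os

# Expect entries like saved_model.pb, variables/, and assets/
print(sorted(os.listdir('path/to/saved_model/my_model')))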

The model.save() method exports the model as a SavedModel when given a directory-style path (this applies to Keras 2, i.e. TF 2.15 and earlier; in TF 2.16+, where Keras 3 is the default, model.save() expects a .keras file and SavedModel export moves to model.export()). The optional include_optimizer argument, which defaults to True, controls whether the optimizer's state is saved along with the weights, so training can be resumed later.
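
For instance, to produce a lighter inference-only artifact you can skip the optimizer state; and on TF 2.16+ (Keras 3), SavedModel export goes through model.export() instead. A sketch under those assumptions (the paths are illustrative):

# Keras 2 (TF <= 2.15): drop the optimizer state for a smaller artifact
model.save('path/to/saved_model/my_model_inference', include_optimizer=False)

# Keras 3 (TF >= 2.16): SavedModel export is a separate method
# model.export('path/to/saved_model/my_model')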

Loading a SavedModel

Loading a SavedModel is seamless and allows for evaluation, retraining, or inference:

import tensorflow as tf

# Load a SavedModel back into a full Keras model
loaded_model = tf.keras.models.load_model('path/to/saved_model/my_model')

# Sample input matching the model's expected shape: a batch of 4, 128 features
my_input_data = tf.random.normal((4, 128))

# Use the model to predict
predictions = loaded_model.predict(my_input_data)

The load_model() function restores the entire Keras model: its architecture, its weights, and, if it was saved, its optimizer state.
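
If you only need the serving graph rather than a full Keras object, the lower-level tf.saved_model.load returns the exported signatures directly. A short sketch against the same path:

import tensorflow as tf

# Low-level load: returns a trackable object, not a Keras model
loaded = tf.saved_model.load('path/to/saved_model/my_model')

# Typically there is a 'serving_default' signature
print(list(loaded.signatures.keys()))

# Inspect the expected input names and shapes before calling it
infer = loaded.signatures['serving_default']
print(infer.structured_input_signature)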

Serving a SavedModel

Once your model is in SavedModel format, you can deploy it using a service like TensorFlow Serving. TensorFlow Serving is a high-performance serving system for machine learning models, and it works seamlessly with SavedModel.

Here is how you would serve your model using Docker and TensorFlow Serving:

docker pull tensorflow/serving

docker run -p 8501:8501 --name=my_model_serving \
   --mount type=bind,source=$(pwd)/path/to/saved_model/my_model,target=/models/my_model/1 \
   -e MODEL_NAME=my_model -t tensorflow/serving

In this command, the --mount flag binds the SavedModel into the container. Note the /1 suffix on the target: TensorFlow Serving expects each model directory to contain numbered version subdirectories, so the SavedModel is mounted as version 1. MODEL_NAME names the serving endpoint, and port 8501 is the REST API port.
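
Once the container is up, you can send a predict request to the REST API from Python. A sketch assuming the 128-feature model defined earlier and the third-party requests package:

import json
import requests  # pip install requests

# One batch with a single instance matching the model's input shape (128,)
payload = {"instances": [[0.0] * 128]}

resp = requests.post(
    'http://localhost:8501/v1/models/my_model:predict',
    data=json.dumps(payload),
)
print(resp.json())  # e.g. {'predictions': [[...]]}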

Benefits of Using SavedModel

The SavedModel format provides several benefits including:

  • Portability: SavedModel artifacts can be moved across architectures and run in a variety of environments, including Python, TensorFlow.js, and TensorFlow Lite on mobile and embedded devices (see the conversion sketch after this list).
  • Convenience: Built-in APIs for saving and loading models make rapid prototyping and deployment straightforward.
  • Compatibility: The format is the common input to TensorFlow Serving and related tools, so it slots into serving pipelines without extra conversion steps.
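
As an example of that portability, a SavedModel can be converted to TensorFlow Lite with the TFLiteConverter, using the same illustrative path as earlier:

import tensorflow as tf

# Build a converter directly from the SavedModel directory
converter = tf.lite.TFLiteConverter.from_saved_model('path/to/saved_model/my_model')
tflite_model = converter.convert()

# Write the flatbuffer for use on mobile/embedded devices
with open('my_model.tflite', 'wb') as f:
    f.write(tflite_model)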

By using the SavedModel format, you can carry a model through TensorFlow's workflow from training to production deployment without unnecessary conversions or redundancy in your ML pipeline.
