
TensorFlow SavedModel: Serving Models with TensorFlow Serving

Last updated: December 18, 2024

TensorFlow is a powerful open-source library for machine learning and deep learning tasks. One of the things that makes it especially versatile is that trained models can be deployed for production inference with TensorFlow Serving. In this article, we'll walk through how to save a trained model in TensorFlow's SavedModel format and then serve it with TensorFlow Serving.

Understanding the TensorFlow SavedModel Format

The SavedModel format is TensorFlow's standard format for serializing trained models. It makes a model independent of the code that created it, which is valuable because:

  • It saves everything required to share or deploy a model: the computation graph, the trained weights and other variables, and the signatures used for inference.
  • It is portable across different programming environments and tools, and can be reloaded without the original training code (see the sketch after this list).
  • It lets a single exported model be served to many clients across various platforms, for example with TensorFlow Serving.
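
For example, a SavedModel can be reloaded and inspected without any of the code that built it. Below is a minimal sketch, assuming a model has already been exported to the versioned path used later in this article:

import tensorflow as tf

# Load a SavedModel -- none of the original model-building code is required.
# The path is the versioned export directory created later in this article.
loaded = tf.saved_model.load("/tmp/saved_model/my_model/1")

# The default serving signature describes the inputs and outputs the model expects
infer = loaded.signatures["serving_default"]
print(infer.structured_input_signature)  # input names, shapes, and dtypes
print(infer.structured_outputs)          # output tensor specs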

Saving a Model Using SavedModel

To begin with, let's look at how to save a model in the SavedModel format. Assume you have a simple neural network model built using TensorFlow:

import numpy as np
import tensorflow as tf

# Build a simple Sequential classifier for 28x28 inputs (e.g. MNIST-style images)
def create_model():
    model = tf.keras.Sequential([
        tf.keras.layers.Flatten(input_shape=(28, 28)),   # flatten the 2D input
        tf.keras.layers.Dense(10, activation='relu'),
        tf.keras.layers.Dense(10, activation='softmax')  # 10 output classes
    ])
    model.compile(optimizer='adam',
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])
    return model

# Create an instance of the model
model = create_model()

# Train the model (dummy data shown here -- replace with your real dataset)
x_train = np.random.random((100, 28, 28)).astype('float32')
y_train = np.random.randint(0, 10, size=(100,))
model.fit(x_train, y_train, epochs=5)

# Save the entire model. TensorFlow Serving expects a numeric version
# subdirectory, so export to ".../my_model/1" rather than ".../my_model".
tf.saved_model.save(model, "/tmp/saved_model/my_model/1")

In this example, tf.saved_model.save() exports the model to "/tmp/saved_model/my_model/1". The trailing 1 is the model version: TensorFlow Serving scans the model's base directory for numbered subdirectories and serves the highest version it finds, which is why the export goes into a versioned folder rather than directly into "/tmp/saved_model/my_model". Make sure to replace the dummy x_train and y_train with your actual dataset.
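
If you want to double-check what the export produced before setting up serving, the sketch below (using the same path as above) confirms the directory contains a SavedModel and lists the exported files:

import os
import tensorflow as tf

export_dir = "/tmp/saved_model/my_model/1"

# True if the directory contains a saved_model.pb and its associated files
print(tf.saved_model.contains_saved_model(export_dir))

# A SavedModel export typically contains:
#   saved_model.pb   -- the serialized graph and signatures
#   variables/       -- the trained weights
#   assets/          -- optional extra files such as vocabularies
for root, _, files in os.walk(export_dir):
    for name in files:
        print(os.path.join(root, name))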

Setting Up TensorFlow Serving

TensorFlow Serving makes it easier to deploy models quickly. To set up TensorFlow Serving:

  1. First, make sure Docker is installed on your machine; the simplest way to run TensorFlow Serving is through its official Docker image.
  2. Once Docker is installed, pull the TensorFlow Serving Docker image:
docker pull tensorflow/serving

Serving the SavedModel

With your SavedModel exported and the TensorFlow Serving image pulled, you can now start serving your model:

docker run -p 8501:8501 --name=tf_model_serving \
   --mount type=bind,source=/tmp/saved_model/my_model,target=/models/my_model \
   -e MODEL_NAME=my_model -t tensorflow/serving

Here's the breakdown of the command:

  • -p 8501:8501: Publishes the container's port 8501 (TensorFlow Serving's REST API port) on the host, so the model can be reached over HTTP.
  • --mount type=bind,...: Bind-mounts the model's base directory on the host into /models/my_model inside the container, where TensorFlow Serving looks for numbered version subdirectories.
  • -e MODEL_NAME=my_model: Sets the MODEL_NAME environment variable, which tells TensorFlow Serving which model to load under /models and determines the model name used in request URLs.

Testing Your Model Endpoint

Once the container is running, open http://localhost:8501/v1/models/my_model in a browser or a tool like Postman. This is TensorFlow Serving's model status endpoint.
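
You can also hit the same status endpoint from Python. A minimal check, assuming the requests package is installed:

import requests

# Query TensorFlow Serving's model status endpoint
resp = requests.get("http://localhost:8501/v1/models/my_model")
print(resp.json())
# Expect something like:
# {"model_version_status": [{"version": "1", "state": "AVAILABLE", ...}]}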

If the model loaded correctly, the response reports its version state as AVAILABLE. To run inference, send a POST request to http://localhost:8501/v1/models/my_model:predict with a JSON body like this:

{
  "signature_name": "serving_default",
  "instances": [
    {"input_tensor": [...sample input data as list...]}
  ]
}

Ensure that signature_name and the input name and shape match your model's serving signature; you can check them with the signature inspection shown earlier.
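
For instance, here is a minimal Python sketch of such a request, again assuming the requests package is installed and using dummy input data:

import json
import numpy as np
import requests

# One dummy 28x28 instance; replace with real data. Because the example model
# has a single input, the unnamed "row" format below is accepted; otherwise use
# {"<input_name>": ...} with the name reported by the serving signature.
instance = np.random.random((28, 28)).tolist()
payload = {
    "signature_name": "serving_default",
    "instances": [instance],
}

resp = requests.post(
    "http://localhost:8501/v1/models/my_model:predict",
    data=json.dumps(payload),
)
resp.raise_for_status()
print(resp.json())  # e.g. {"predictions": [[...10 class scores...]]}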

Conclusion

TensorFlow Serving is an efficient framework for serving TensorFlow models in production. Combined with the SavedModel format, it lets you deploy a trained model across different platforms and environments without the original training code, with model versioning and a ready-made HTTP API handled for you.

