Serverless Prediction at Scale, Part 2: Custom Container Deployment on Vertex AI

Jinmiao Zhang
7 min read · Aug 25, 2021

The second part of a 2-part series on the serverless deployment of custom-built models on Google Cloud Platform. This part covers the custom container deployment on Vertex AI.

Image by Author

In the previous post [1], I shared my experience of deploying a custom-built machine learning model to Google Cloud Platform (GCP) using the custom prediction routine on AI Platform. The custom prediction routine deployment is serverless and proved to be highly scalable. In this post, I will continue this line of discussion by sharing another experience with GCP’s new Vertex AI [2], this time using the custom container approach [3].

Vertex AI

Announced recently [4], Vertex AI is a unified machine learning platform on GCP that offers a comprehensive set of tools and products for building and managing the life cycle of ML models in a single environment. It consolidates many of the previous offerings from the legacy AI Platform and AutoML (Tables/Vision/NLP), and supplements them with several new ML products and services such as labeling tasks, pipelines, a feature store, experiments, and a model registry.

From the model deployment point of view, Vertex AI currently supports two types of deployment for custom-built models:

  • Pre-built container
  • Custom container

The pre-built container is intended for models built with the commonly-used ML frameworks, including scikit-learn, XGBoost, and TensorFlow. At prediction time, the pre-built container directly calls the predict() method on the saved model artifacts of the specified framework; it does not support custom serving code, such as the code needed for pre- and post-processing. The custom container, however, supports all ML frameworks as well as custom serving code, and it also supports the deployment of models trained outside of Vertex AI. The downside of this option is that users need to build their own Docker container for the deployment.

For the adverse drug reaction (ADR) model we experimented with in the previous post [1], the pre-built container is not an option because the model relies on custom serving code for pre- and post-processing. We will use the custom container approach instead.

Create a Docker Container Image

The first step of our deployment process is to create a Docker image for the custom container to be deployed. As far as the model artifacts are concerned, Vertex AI allows them to be stored in a Cloud Storage bucket and loaded into the container at startup time. Alternatively, the model artifacts can be embedded directly into the Docker image as part of the image content itself. We will use the embedding option in this experiment.

I created a docker folder in my work directory containing all of the contents needed to build the Docker container image:

  • model_files: a subfolder containing the same 4 saved model artifact files as described in the previous post [1]
  • gcp_adr_predictor.py: the same custom serving predictor class as shown in the previous post [1]
  • adr_serving_utils.py: the same custom utility helper functions as discussed in the previous post [1]
  • Dockerfile: docker build file
  • requirements.txt: all required Python library dependencies
  • server.py: Flask server code with the endpoint route definitions (a sketch of this file and wsgi.py is shown after this list)
  • wsgi.py: a simple Flask server runner
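A minimal sketch of what server.py might look like, assuming the predictor class from the previous post exposes a predict() method; the route paths and the predictor class/constructor names below are illustrative assumptions, not the exact code:

```
# server.py -- minimal Flask app sketch (route paths and predictor API are assumptions)
from flask import Flask, jsonify, request

from gcp_adr_predictor import AdrPredictor  # hypothetical class name

app = Flask(__name__)

# Load the model artifacts embedded in the image at container startup
predictor = AdrPredictor.from_path("model_files")

@app.route("/health", methods=["GET"])
def health():
    # Health check route, referenced later when importing the model into Vertex AI
    return "OK", 200

@app.route("/predict", methods=["POST"])
def predict():
    # Vertex AI wraps the input records in a top-level "instances" array
    instances = request.get_json()["instances"]
    predictions = predictor.predict(instances)
    return jsonify({"predictions": predictions})
```

And wsgi.py is just a thin runner around the Flask app, roughly:

```
# wsgi.py -- simple runner for the Flask server
from server import app

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5050)
```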

For the model predictor class, I re-used the same model serving code described in the previous post [1]. The contents of the Docker build file and the required Python libraries for the custom container are:
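Roughly, the two files look like the following (the base image, the exact pinned libraries, and the model's own dependencies below are assumptions and depend on the framework used):

```
# Dockerfile -- a sketch; base image and layout are assumptions
FROM python:3.8-slim

WORKDIR /app

# Install the Python dependencies first to take advantage of layer caching
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Embed the model artifacts and the serving code into the image
COPY model_files/ ./model_files/
COPY gcp_adr_predictor.py adr_serving_utils.py server.py wsgi.py ./

# Port exposed by the container for incoming HTTP requests
EXPOSE 5050

# Serve the Flask app with Gunicorn
CMD ["gunicorn", "--bind", "0.0.0.0:5050", "wsgi:app"]
```

```
# requirements.txt -- illustrative; the actual libraries depend on the model
flask
gunicorn
numpy
pandas
scikit-learn
```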

Here, I used Gunicorn and Flask as the HTTP web server for the docker container. Port 5050 is exposed from the container to serve the incoming HTTP requests. Then, I built the docker image by running:
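A sketch of the build command, tagging the image with Artifact Registry's standard naming scheme (the placeholders are explained below):

```
# Build the custom container image and tag it for Artifact Registry
docker build -t {REGION}-docker.pkg.dev/{PROJECT_ID}/{REPOSITORY}/{IMAGE} .
```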

This created a custom Docker image on my local machine with the specified image tag. {PROJECT_ID} and {REGION} are placeholders for my GCP project id and region, and {REPOSITORY} and {IMAGE} are the names given to my Artifact Registry repository and the Docker image for the custom container.

Test Custom Container Locally

After the Docker image was built, I tested it locally by starting up the custom container to ensure that it can serve HTTP prediction requests as expected:
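A sketch of the startup command, using the image tag from the build step above:

```
# Run the container locally, mapping container port 5050 to local port 5050
docker run --rm -p 5050:5050 {REGION}-docker.pkg.dev/{PROJECT_ID}/{REPOSITORY}/{IMAGE}
```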

Here, I mapped the container’s internal port 5050 to my local machine’s port 5050 for serving HTTP requests. Then I tested the container by sending a prediction request:
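A sketch of the test request, assuming the /predict route from the server.py sketch shown earlier:

```
# POST the sample input to the locally running container
curl -X POST \
  -H "Content-Type: application/json" \
  -d @sample_input.json \
  http://localhost:5050/predict
```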

“sample_input.json” is a JSON file containing a sample of input data to the model for testing, as used in the previous post [1]:
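The actual input fields belong to the ADR model from the previous post; structurally, the file looks like the following (the field names and values here are placeholders):

```
{
  "instances": [
    {
      "field_1": "value_1",
      "field_2": 123,
      "field_3": "value_3"
    }
  ]
}
```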

Note that all of the model input data fields need to be wrapped in an array under a top-level “instances” element. This data structure is required by Vertex AI for the custom container implementation.

The custom container is expected to return an HTTP response whose JSON content contains the same prediction values as shown in the previous post [1]. This confirms that the Docker image is built correctly and the custom container works as expected in the local environment.

Deploy Custom Container to Vertex AI

After local testing, the custom container can be deployed to Vertex AI. First, I created an Artifact Registry repository on GCP and pushed the Docker image to this repository:
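A sketch of these steps using the gcloud CLI, following the standard Artifact Registry workflow:

```
# Create a Docker repository in Artifact Registry (one-time setup)
gcloud artifacts repositories create {REPOSITORY} \
  --repository-format=docker \
  --location={REGION}

# Let the local Docker client authenticate with Artifact Registry, then push the image
gcloud auth configure-docker {REGION}-docker.pkg.dev
docker push {REGION}-docker.pkg.dev/{PROJECT_ID}/{REPOSITORY}/{IMAGE}
```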

Then, I imported a custom model into Vertex AI using the Docker image pushed to the Artifact Registry repository:
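A sketch of the import command; the display name is illustrative, and the routes match the Flask server sketch shown earlier:

```
# Import the custom container as a Vertex AI model
gcloud ai models upload \
  --region={REGION} \
  --display-name=adr-custom-container-model \
  --container-image-uri={REGION}-docker.pkg.dev/{PROJECT_ID}/{REPOSITORY}/{IMAGE} \
  --container-ports=5050 \
  --container-health-route=/health \
  --container-predict-route=/predict
```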

Note that in the above command, the container port is specified as 5050, and the health check and prediction routes are specified based on their definitions in the Flask web server code. Once the model import is complete, it can be confirmed by navigating to the Vertex AI console or by running the following command:
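```
# List the imported models in the region to confirm the upload
gcloud ai models list --region={REGION}
```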

Finally, I created an endpoint and deployed the custom model to the endpoint for serving:
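A sketch of the two commands; the display names are illustrative, {MODEL_ID} stands for the model id returned by the import step above, and n1-standard-2 is the assumed 2-vCPU machine type:

```
# Create the serving endpoint
gcloud ai endpoints create \
  --region={REGION} \
  --display-name=adr-model-endpoint

# Deploy the imported model to the endpoint, auto-scaled between 1 and 100 nodes
gcloud ai endpoints deploy-model {ENDPOINT_ID} \
  --region={REGION} \
  --model={MODEL_ID} \
  --display-name=adr-custom-container-model \
  --machine-type=n1-standard-2 \
  --min-replica-count=1 \
  --max-replica-count=100 \
  --traffic-split=0=100
```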

where {ENDPOINT_ID} is the endpoint id assigned by Vertex AI after the endpoint was created. Here I specified that each node of the cluster uses the standard 2-vCPU machine type, with a minimum of 1 node and a maximum of 100 nodes (auto-scaled between 1 and 100). All traffic to the model endpoint is routed to the current version of the deployed model. Vertex AI allows a traffic percentage split in the endpoint-model deployment bindings; this traffic split can be used to implement a blue-green deployment strategy if a gradual release of new model versions is desired.

Vertex AI Endpoint and ADR Model Bindings

Test Custom Container on Vertex AI

After the custom container was deployed, I ran the following tests to ensure that the endpoint works correctly:

a) Test using the gcloud SDK and b) test using a direct HTTP request against the endpoint’s REST API.
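Hedged sketches of both; the sample file and placeholders follow the earlier steps, and the REST call uses Vertex AI's standard :predict URL with a gcloud access token:

```
# a) Prediction call through the gcloud CLI
gcloud ai endpoints predict {ENDPOINT_ID} \
  --region={REGION} \
  --json-request=sample_input.json

# b) Direct HTTP request against the endpoint's REST API
curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  -d @sample_input.json \
  https://{REGION}-aiplatform.googleapis.com/v1/projects/{PROJECT_ID}/locations/{REGION}/endpoints/{ENDPOINT_ID}:predict
```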

c) Test using Python client:
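A sketch using the google-cloud-aiplatform library, reusing the same sample input and the placeholders from the earlier steps:

```
# c) Prediction call through the Vertex AI Python client
import json

from google.cloud import aiplatform

# Initialize the client with the project and region used for the deployment
aiplatform.init(project="{PROJECT_ID}", location="{REGION}")

endpoint = aiplatform.Endpoint("{ENDPOINT_ID}")

# Reuse the same sample input; the client sends the raw instance records
with open("sample_input.json") as f:
    instances = json.load(f)["instances"]

response = endpoint.predict(instances=instances)
print(response.predictions)
```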

Load Test of Model API

Just like the custom prediction routine discussed in the previous post [1], the predictive service endpoint provided by the custom container can be exposed as a REST API through an Apigee proxy. I ran a very similar load test of the custom container’s model API by simulating 1 and 100 concurrent online users. The service response times from both load tests are captured below:

Load Test 1: Single User

Load Test 2: 100 Concurrent Users

The average response time is about 1.47 seconds for the single-user test case and 2.23 seconds for the 100-concurrent-users test case. This result is very consistent with what we observed from the custom prediction routine in the previous post [1], indicating the high scalability of the deployed custom model on Vertex AI.

Below are some of the model operational metrics I captured from the endpoint monitoring console on Vertex AI. These metrics correspond to 3 consecutive runs of the second load test case with 100 concurrent users (a total of 30 minutes).

Summary

In this post, I have shared another experience of deploying a custom-built model on Vertex AI using the custom container approach. Similar to the custom prediction routine discussed in the previous post [1], custom container deployment on Vertex AI has proven to be very flexible and highly scalable.

At the time of writing, custom prediction routines are not yet supported on Vertex AI. If you prefer to use a custom prediction routine instead of creating your own Docker container, you will need to continue using the legacy AI Platform for now.

Acknowledgements

I appreciate the support from Google customer engineers, Brendan Doohan and Nathan Hodson, in this experimentation.

References

  1. https://medium.com/mlearning-ai/serverless-prediction-at-scale-custom-model-deployment-on-google-cloud-ai-platform-d2d0807a0b8f
  2. https://cloud.google.com/vertex-ai
  3. https://cloud.google.com/vertex-ai/docs/predictions/use-custom-container
  4. https://cloud.google.com/blog/products/ai-machine-learning/google-cloud-launches-vertex-ai-unified-platform-for-mlops


Jinmiao Zhang

Ph.D., Senior Data Scientist and Machine Learning Architect at Cardinal Health