Deploying, monitoring and scaling a machine learning model on AWS


By Aparna Dhinakaran
Compiled by Flin
Source: Towards Data Science


Deploying robust, scalable machine learning solutions remains a complex process that requires considerable human participation and effort. As a result, new products and services take a long time to reach the market, or are abandoned at the prototype stage, which dampens industry interest. So, how can we streamline the process of putting machine learning models into production?

Cortex is an open-source platform for deploying machine learning models as production web services. It leverages the AWS ecosystem to deploy, monitor, and scale framework-agnostic models as needed. Its main features are summarized as follows:

  • Framework agnostic: Cortex supports any Python code; like any other Python script, models built with TensorFlow, PyTorch, scikit-learn, and XGBoost are all supported.
  • Autoscaling: Cortex automatically scales your API to handle production loads.
  • CPU/GPU support: using AWS IaaS as the underlying infrastructure, Cortex can run in CPU or GPU environments.
  • Spot instances: Cortex supports EC2 spot instances to reduce costs.
  • Rolling updates: Cortex applies updates to the model without any downtime.
  • Log streaming: Cortex saves the logs of the deployed model and streams them to the CLI using Docker-like syntax.
  • Prediction monitoring: Cortex monitors network metrics and tracks predictions.
  • Minimal configuration: a Cortex deployment is defined in a single simple YAML file.

In this article, we use Cortex to deploy an image classification model to AWS as a web service. So, without further ado, let's get acquainted with Cortex.

Deploy the model as a web service

In this example, we use the fastai library and the pets classification model from the first course of the fast.ai MOOC. The following sections describe the installation of Cortex and the deployment of the pets classification model as a web service.


If you have not already done so, you should first create a new user account on AWS with programmatic access. To do this, select the IAM service, choose Users from the side panel, and finally press the Add user button. Give the user a name and select Programmatic access.


Next, on the Permissions screen, select the Attach existing policies directly tab and choose AdministratorAccess.


You can leave the tags page blank, then review and create the user. Finally, make a note of the Access key ID and Secret access key.
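boto3 (and most other AWS tooling) can pick these credentials up from the standard AWS environment variables, so you do not have to hard-code them in scripts. A minimal sketch; the key values here are placeholders for the keys you noted above:

```python
import os

# standard AWS environment variable names read by boto3 and most AWS tooling;
# the values below are placeholders for your real keys
os.environ.setdefault("AWS_ACCESS_KEY_ID", "YOUR_ACCESS_KEY_ID")
os.environ.setdefault("AWS_SECRET_ACCESS_KEY", "YOUR_SECRET_ACCESS_KEY")

credentials = {
    "access_key_id": os.environ["AWS_ACCESS_KEY_ID"],
    "secret_access_key": os.environ["AWS_SECRET_ACCESS_KEY"],
}
print(sorted(credentials))  # → ['access_key_id', 'secret_access_key']
```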

On the AWS console, you can also create an S3 bucket to store the trained model and any other artifacts your code might generate. You can name the bucket anything, as long as the name is unique. Here, we create a bucket named cortex-pets-model.

Next, we must install the Cortex CLI on our system and spin up the Kubernetes cluster. To install the Cortex CLI, run the following command:

bash -c "$(curl -sS <install-script-url>)"

Check the corresponding section of the documentation to make sure you are installing the latest version of the Cortex CLI.

We are now ready to create the cluster. Creating a Kubernetes cluster with Cortex is straightforward. Simply execute the following command:

cortex cluster up

Cortex will ask you for some information, such as your AWS keys, the region you want to use, the type of compute instances you want to launch, and their minimum and maximum counts. Cortex will also let you know how much the services you chose will cost. The whole process can take about 20 minutes.


Training your model

Cortex doesn't care how you create or train your model. In this case, we use the fastai library and the Oxford-IIIT Pet dataset. This dataset contains 37 breeds of dogs and cats, so our model should classify each image into one of these 37 categories.

Create a trainer.py file:

import boto3
import pickle

from fastai.vision import *

# initialize boto session
# (pass your AWS credentials explicitly here, or rely on the environment)
session = boto3.Session()

# get the data
path = untar_data(URLs.PETS, dest='sample_data')
path_img = path/'images'
fnames = get_image_files(path_img)

# process the data
bs = 64
pat = r'/([^/]+)_\d+.jpg$'
data = ImageDataBunch.from_name_re(path_img, fnames, pat,
                                   ds_tfms=get_transforms(), size=224, bs=bs) \
                     .normalize(imagenet_stats)

# create, fit and save the model
learn = cnn_learner(data, models.resnet18, metrics=accuracy)
learn.fit_one_cycle(4)  # the number of epochs here is an assumption

with open('model.pkl', 'wb') as handle:
    pickle.dump(learn.model, handle)

# upload the model to s3
s3 = session.client('s3')
s3.upload_file('model.pkl', 'cortex-pets-model', 'model.pkl')
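The pat regular expression in the script above pulls each image's label out of its filename (e.g. Abyssinian_1.jpg yields the label Abyssinian). A standalone sketch of the same pattern, with a made-up file path:

```python
import re

# the same pattern trainer.py passes to ImageDataBunch.from_name_re
pat = r'/([^/]+)_\d+.jpg$'

# a hypothetical dataset path, for illustration only
fname = 'sample_data/images/Abyssinian_1.jpg'
match = re.search(pat, fname)
print(match.group(1))  # → Abyssinian
```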

As with any other Python script, run it locally: python trainer.py

However, be sure to provide your AWS credentials and S3 bucket name. This script downloads the data, processes it, fits a pre-trained ResNet model, and uploads it to S3. Of course, you can extend this script with several techniques (a more complex architecture, discriminative learning rates, training for more epochs) to make the model more accurate, but that is beside our goal here. If you want to learn more about the ResNet architecture, please refer to the following article.…
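The script serializes the model with pickle before uploading it to S3, and the predictor later loads it the same way. The save-and-load pattern, sketched with a stand-in object instead of a real model:

```python
import pickle

# a stand-in for the trained model; any picklable object works the same way
model = {"arch": "resnet18", "classes": 37}

# save, as trainer.py does before uploading to S3
with open('model.pkl', 'wb') as handle:
    pickle.dump(model, handle)

# load, as the predictor does after downloading from S3
with open('model.pkl', 'rb') as handle:
    restored = pickle.load(handle)

print(restored == model)  # → True
```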

Deploying the model

Now that we have trained the model and stored it in S3, the next step is to deploy it as a web service to the production environment. To do this, we create a predictor.py Python script, shown below:

import torch
import boto3
import pickle
import requests

from PIL import Image
from io import BytesIO
from torchvision import transforms

# initialize boto session
# (pass your AWS credentials explicitly here, or rely on the environment)
session = boto3.Session()

# define the predictor
class PythonPredictor:
    def __init__(self, config):
        s3 = session.client('s3')
        s3.download_file(config['bucket'], config['key'], 'model.pkl')
        self.model = pickle.load(open('model.pkl', 'rb'))
        self.model.eval()  # switch the network to inference mode

        normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
        self.preprocess = transforms.Compose(
            [transforms.Resize(224), transforms.ToTensor(), normalize])

        self.labels = ['Abyssinian', 'Bengal', 'Birman', 'Bombay', 'British_Shorthair',
                       'Egyptian_Mau', 'Maine_Coon', 'Persian', 'Ragdoll', 'Russian_Blue',
                       'Siamese', 'Sphynx', 'american_bulldog', 'american_pit_bull_terrier',
                       'basset_hound', 'beagle', 'boxer', 'chihuahua', 'english_cocker_spaniel',
                       'english_setter', 'german_shorthaired', 'great_pyrenees', 'havanese',
                       'japanese_chin', 'keeshond', 'leonberger', 'miniature_pinscher', 'newfoundland',
                       'pomeranian', 'pug', 'saint_bernard', 'samoyed', 'scottish_terrier', 'shiba_inu',
                       'staffordshire_bull_terrier', 'wheaten_terrier',  'yorkshire_terrier']

        self.device = config['device']

    def predict(self, payload):
        image = requests.get(payload["url"]).content
        img_pil = Image.open(BytesIO(image))
        img_tensor = self.preprocess(img_pil)
        img_tensor = img_tensor.unsqueeze(0).to(self.device)
        with torch.no_grad():
            prediction = self.model(img_tensor)
        _, index = prediction[0].max(0)
        return self.labels[index]

This file defines a predictor class. When instantiated, it retrieves the model from S3, loads it into memory, and defines the necessary transformations and parameters. During inference, it reads the image from the given URL and returns the name of the predicted class. It has an initialization method, __init__, and a predict method that receives a payload and returns the result.
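The predict method expects a JSON payload with a url field pointing at an image. A sketch of how such a payload, e.g. a sample.json for testing with curl, could be built; the image URL here is a placeholder:

```python
import json

# the request body the predictor expects; the image URL is a placeholder
payload = {"url": "https://example.com/pomeranian.jpg"}

# write it to sample.json, ready for `curl -d @sample.json`
with open("sample.json", "w") as f:
    json.dump(payload, f)

print(json.dumps(payload))  # → {"url": "https://example.com/pomeranian.jpg"}
```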

The predictor script has two accompanying files: a requirements.txt file listing the repository's dependencies (such as fastai, boto3, etc.) and a YAML configuration file. The minimal configuration is as follows:

- name: pets-classifier
  predictor:
    type: python
    path: predictor.py
    config:
      bucket: cortex-pets-model
      key: model.pkl
      device: cpu

In this YAML file, we define which script to run for inference, on which device (e.g. CPU), and where to find the trained model. More options are available in the documentation.
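To make the configuration concrete, here is the same deployment expressed as a plain Python dictionary with a minimal sanity check; the key names mirror the YAML above and are an assumption about what the deployment needs, not an official schema:

```python
# the deployment configuration as a plain dictionary, mirroring the YAML above
deployment = {
    "name": "pets-classifier",
    "type": "python",
    "config": {
        "bucket": "cortex-pets-model",
        "key": "model.pkl",
        "device": "cpu",
    },
}

# a minimal sanity check before deploying (hypothetical helper, not part of Cortex)
def validate(dep):
    assert "name" in dep and dep.get("type") == "python"
    cfg = dep["config"]
    assert {"bucket", "key", "device"} <= set(cfg)
    return True

print(validate(deployment))  # → True
```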

Finally, the structure of the project should follow the hierarchy below. Note that trainer.py is optional, as long as you can provide the trained model.

- project name
  |- trainer.py
  |- predictor.py
  |- requirements.txt
  |- cortex.yaml

With all this in place, you just need to run cortex deploy, and within seconds your new endpoint is ready to accept requests. Execute cortex get pets-classifier to monitor the endpoint and view additional details.

status   up-to-date   requested   last update   avg request   2XX   
live     1            1           13m           -             -
curl: curl -X POST -H "Content-Type: application/json" -d @sample.json
name: pets-classifier
endpoint: /pets-classifier
  type: python
    bucket: cortex-pets-model
    device: cpu
    key: model.pkl
  cpu: 200m
  min_replicas: 1
  max_replicas: 100
  init_replicas: 1
  workers_per_replica: 1
  threads_per_worker: 1
  target_replica_concurrency: 1.0
  max_replica_concurrency: 1024
  window: 1m0s
  downscale_stabilization_period: 5m0s
  upscale_stabilization_period: 0s
  max_downscale_factor: 0.5
  max_upscale_factor: 10.0
  downscale_tolerance: 0.1
  upscale_tolerance: 0.1
  max_surge: 25%
  max_unavailable: 25%

All that remains is to test it with curl and an image of a Pomeranian:

curl -X POST -H "Content-Type: application/json" -d '{"url": ""}'
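On the Python side, the same request can be prepared with the standard library. A sketch with a placeholder endpoint and image URL; nothing is sent here, we only build the request object:

```python
import json
import urllib.request

# placeholder endpoint and image URL; substitute your Cortex endpoint and a real image
endpoint = "http://localhost:8888/pets-classifier"
body = json.dumps({"url": "https://example.com/pomeranian.jpg"}).encode()

req = urllib.request.Request(
    endpoint,
    data=body,
    headers={"Content-Type": "application/json"},
    method="POST",
)
# sending would be: urllib.request.urlopen(req)
print(req.get_method())  # → POST
```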

Releasing resources

When we are done with the service and the cluster, we should release the resources to avoid extra costs. This is easy with Cortex:

cortex delete pets-classifier
cortex cluster down


In this article, we saw how to use Cortex, an open-source platform, to deploy machine learning models as production web services. We trained an image classifier, deployed it to AWS, monitored its performance, and tested it.

For more advanced concepts such as prediction monitoring, rolling updates, cluster configuration, and autoscaling, please visit the official documentation site and the project's GitHub page.

Link to the original text:…
