[MLflow Series 1] Building and Using MLflow

Time:2020-11-23

Background

MLflow is an open-source machine learning management platform from Databricks. It decouples algorithm training from model serving, so algorithm engineers can focus on training models without worrying much about serving them.
At our company it has backed more than 10 services running stably for over two years.

Build

This section builds the MLflow tracking server, which stores model metadata and model artifacts.
Here we use MinIO as the artifact store and MySQL as the metadata store, since this combination is suitable for production, not just for testing.

  • Set up MinIO
    Refer to my earlier article Construction and use of Minio, and create a bucket named mlflow for the steps that follow.
  • Set up MLflow

    • Install conda
      See install conda and install the conda build appropriate for your system.
    • Install the MLflow tracking server

      # Create a conda environment with Python 3.6
      conda create -n mlflow-1.11.0 python=3.6
      # Activate the conda environment
      conda activate mlflow-1.11.0
      # Install the Python packages the MLflow tracking server depends on
      pip install mlflow==1.11.0
      pip install mysqlclient
      pip install boto3
    • Start the MLflow tracking server

      Export the MinIO endpoint URL, access key ID, and secret key, since the tracking server needs them to upload model files:
      export AWS_ACCESS_KEY_ID=AKIAIOSFODNN7EXAMPLE
      export AWS_SECRET_ACCESS_KEY=wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
      export MLFLOW_S3_ENDPOINT_URL=http://localhost:9001
      mlflow server \
         --backend-store-uri mysql://root:AO,[email protected]/mlflow_test \
         --host 0.0.0.0 -p 5002 \
         --default-artifact-root s3://mlflow

      Visit localhost:5002 and you should see the following interface:
      (screenshot: the MLflow tracking server home page)
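As a convenience, the MinIO bucket created earlier can also be scripted with boto3 (one of the packages installed above). A minimal sketch, assuming the endpoint and demo credentials used in this article; `ensure_bucket` is an illustrative helper name:

```python
# Connection settings assumed to match the MinIO exports used in this article;
# adjust the endpoint and credentials for your own deployment.
MINIO_SETTINGS = {
    "endpoint_url": "http://localhost:9001",
    "aws_access_key_id": "AKIAIOSFODNN7EXAMPLE",
    "aws_secret_access_key": "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY",
}


def ensure_bucket(name="mlflow"):
    """Create the artifact bucket on MinIO if it does not already exist."""
    import boto3  # imported lazily so the snippet loads even without boto3

    s3 = boto3.client("s3", **MINIO_SETTINGS)
    existing = {b["Name"] for b in s3.list_buckets().get("Buckets", [])}
    if name not in existing:
        s3.create_bucket(Bucket=name)
    return name


# ensure_bucket()  # run this against a live MinIO instance
```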

Use

Save the following as wine.py:

import os
import warnings
import sys

import pandas as pd
import numpy as np
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.linear_model import ElasticNet
import mlflow.sklearn


def eval_metrics(actual, pred):
  rmse = np.sqrt(mean_squared_error(actual, pred))
  mae = mean_absolute_error(actual, pred)
  r2 = r2_score(actual, pred)
  return rmse, mae, r2


if __name__ == "__main__":
  warnings.filterwarnings("ignore")
  np.random.seed(40)

  # Read the wine-quality csv file (make sure you're running this from the root of MLflow!)
  wine_path = os.path.join(os.path.dirname(os.path.abspath(__file__)), "wine-quality.csv")
  data = pd.read_csv(wine_path)

  # Split the data into training and test sets. (0.75, 0.25) split.
  train, test = train_test_split(data)

  # The predicted column is "quality" which is a scalar from [3, 9]
  train_x = train.drop(["quality"], axis=1)
  test_x = test.drop(["quality"], axis=1)
  train_y = train[["quality"]]
  test_y = test[["quality"]]

  alpha = float(sys.argv[1]) if len(sys.argv) > 1 else 0.5
  l1_ratio = float(sys.argv[2]) if len(sys.argv) > 2 else 0.5
  mlflow.set_tracking_uri("http://localhost:5002")
  client = mlflow.tracking.MlflowClient()
  mlflow.set_experiment('http_metrics_test')
  with mlflow.start_run():
      lr = ElasticNet(alpha=alpha, l1_ratio=l1_ratio, random_state=42)
      lr.fit(train_x, train_y)

      predicted_qualities = lr.predict(test_x)

      (rmse, mae, r2) = eval_metrics(test_y, predicted_qualities)

      print("Elasticnet model (alpha=%f, l1_ratio=%f):" % (alpha, l1_ratio))
      print("  RMSE: %s" % rmse)
      print("  MAE: %s" % mae)
      print("  R2: %s" % r2)

      mlflow.log_param("alpha", alpha)
      mlflow.log_param("l1_ratio", l1_ratio)
      mlflow.log_metric("rmse", rmse)
      mlflow.log_metric("r2", r2)
      mlflow.log_metric("mae", mae)

      mlflow.sklearn.log_model(lr, "model")

Note:

  1. `mlflow.set_tracking_uri("http://localhost:5002")` points the client at the tracking server.
  2. `mlflow.set_experiment('http_metrics_test')` sets the experiment name.
  3. Install the Python packages the program depends on.
  4. If the script runs in a different shell or conda environment from the one above, you must first execute:
  
   
    export AWS_ACCESS_KEY_ID=AKIAIOSFODNN7EXAMPLE
    export AWS_SECRET_ACCESS_KEY=wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
    export MLFLOW_S3_ENDPOINT_URL=http://localhost:9001

so that the Python client can upload model files and model metadata.
Run `python wine.py`. If it succeeds, you can see the new run in the MLflow tracking server UI:
(screenshot: the run list for the experiment in the MLflow UI)
Click the run dated 2020-10-30 10:34:38 to see its details:
(screenshots: the run's parameters, metrics, and logged model artifacts)

Start the MLflow model service

Execute the following commands in the same conda environment:

export MLFLOW_TRACKING_URI=http://localhost:5002 
mlflow models serve -m runs:/e69aed0b22fb45debd115dfc09dbc75a/model -p 1234 --no-conda

where e69aed0b22fb45debd115dfc09dbc75a is the run ID of the model logged above.
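If you prefer not to copy the run ID from the UI, it can be fetched from the tracking server programmatically. A sketch assuming the experiment name used in wine.py; `latest_run_id` and `model_uri` are illustrative helper names:

```python
def model_uri(run_id, artifact_path="model"):
    """Build the runs:/ URI that `mlflow models serve -m` expects."""
    return "runs:/{}/{}".format(run_id, artifact_path)


def latest_run_id(experiment_name="http_metrics_test",
                  tracking_uri="http://localhost:5002"):
    """Return the ID of a run of the experiment, via the tracking server."""
    import mlflow  # imported lazily; requires the mlflow package

    mlflow.set_tracking_uri(tracking_uri)
    client = mlflow.tracking.MlflowClient()
    exp = client.get_experiment_by_name(experiment_name)
    runs = client.search_runs([exp.experiment_id], max_results=1)
    return runs[0].info.run_id


# print(model_uri(latest_run_id()))  # e.g. runs:/e69aed0b.../model
```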

If you encounter ModuleNotFoundError: No module named 'sklearn',
run pip install scikit-learn==0.19.1
If you encounter ModuleNotFoundError: No module named 'scipy',
run pip install scipy

Send a request to the model service:

curl -X POST -H "Content-Type:application/json; format=pandas-split" --data '{"columns":["alcohol", "chlorides", "citric acid", "density", "fixed acidity", "free sulfur dioxide", "pH", "residual sugar", "sulphates", "total sulfur dioxide", "volatile acidity"],"data":[[12.8, 0.029, 0.48, 0.98, 6.2, 29, 3.33, 1.2, 0.39, 75, 0.66]]}' http://127.0.0.1:1234/invocations

An output of [5.455573233630147] indicates the model service is deployed successfully.
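The same request can be issued from Python using only the standard library. A sketch mirroring the curl call above; `build_payload` and `predict` are hypothetical helper names:

```python
import json
import urllib.request

# Feature columns in the order the served wine-quality model expects.
COLUMNS = ["alcohol", "chlorides", "citric acid", "density", "fixed acidity",
           "free sulfur dioxide", "pH", "residual sugar", "sulphates",
           "total sulfur dioxide", "volatile acidity"]


def build_payload(columns, rows):
    """Serialize rows into the pandas-split JSON body /invocations expects."""
    return json.dumps({"columns": columns, "data": rows}).encode("utf-8")


def predict(rows, url="http://127.0.0.1:1234/invocations"):
    """POST feature rows to the served model and return its predictions."""
    req = urllib.request.Request(
        url,
        data=build_payload(COLUMNS, rows),
        headers={"Content-Type": "application/json; format=pandas-split"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Against a running model service, this mirrors the curl example:
# predict([[12.8, 0.029, 0.48, 0.98, 6.2, 29, 3.33, 1.2, 0.39, 75, 0.66]])
```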

At this point, the basic setup and usage of MLflow is complete. For algorithms MLflow does not support out of the box, see Custom model.