Background

MLflow is an open source machine learning lifecycle management platform from Databricks. It decouples model training from model serving, so algorithm engineers can focus on training models without worrying much about serving them. At our company it has backed more than 10 services running stably for over two years.
Build

This section builds the MLflow tracking server, which stores model metadata and model artifacts. We use MinIO as the artifact store and MySQL as the metadata store, because this setup is suitable for production use, not just for testing.
- Build MinIO

Refer to my earlier article "Construction and use of Minio" and create a bucket named mlflow for the steps that follow.
Build MLflow

- Install conda

See "install conda" and pick the conda installer that matches your system.

- Install the MLflow tracking server
```shell
# Create a conda environment with Python 3.6
conda create -n mlflow-1.11.0 python==3.6
# Activate the environment
conda activate mlflow-1.11.0
# Install the Python dependencies of the MLflow tracking server
pip install mlflow==1.11.0
pip install mysqlclient
pip install boto3
```
- Start the MLflow tracking server

Export the MinIO endpoint and credentials, because the tracking server needs to upload model files:

```shell
export AWS_ACCESS_KEY_ID=AKIAIOSFODNN7EXAMPLE
export AWS_SECRET_ACCESS_KEY=wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
export MLFLOW_S3_ENDPOINT_URL=http://localhost:9001

mlflow server \
  --backend-store-uri mysql://root:AO,[email protected]/mlflow_test \
  --host 0.0.0.0 -p 5002 \
  --default-artifact-root s3://mlflow
```
Visit localhost:5002 and you should see the following interface:
Usage

Contents of the wine.py file:
```python
import os
import warnings
import sys

import pandas as pd
import numpy as np
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.linear_model import ElasticNet
import mlflow.sklearn


def eval_metrics(actual, pred):
    rmse = np.sqrt(mean_squared_error(actual, pred))
    mae = mean_absolute_error(actual, pred)
    r2 = r2_score(actual, pred)
    return rmse, mae, r2


if __name__ == "__main__":
    warnings.filterwarnings("ignore")
    np.random.seed(40)

    # Read the wine-quality csv file (make sure you're running this from the root of MLflow!)
    wine_path = os.path.join(os.path.dirname(os.path.abspath(__file__)), "wine-quality.csv")
    data = pd.read_csv(wine_path)

    # Split the data into training and test sets. (0.75, 0.25) split.
    train, test = train_test_split(data)

    # The predicted column is "quality" which is a scalar from [3, 9]
    train_x = train.drop(["quality"], axis=1)
    test_x = test.drop(["quality"], axis=1)
    train_y = train[["quality"]]
    test_y = test[["quality"]]

    alpha = float(sys.argv[1]) if len(sys.argv) > 1 else 0.5
    l1_ratio = float(sys.argv[2]) if len(sys.argv) > 2 else 0.5

    mlflow.set_tracking_uri("http://localhost:5002")
    client = mlflow.tracking.MlflowClient()
    mlflow.set_experiment('http_metrics_test')

    with mlflow.start_run():
        lr = ElasticNet(alpha=alpha, l1_ratio=l1_ratio, random_state=42)
        lr.fit(train_x, train_y)

        predicted_qualities = lr.predict(test_x)
        (rmse, mae, r2) = eval_metrics(test_y, predicted_qualities)

        print("Elasticnet model (alpha=%f, l1_ratio=%f):" % (alpha, l1_ratio))
        print("  RMSE: %s" % rmse)
        print("  MAE: %s" % mae)
        print("  R2: %s" % r2)

        mlflow.log_param("alpha", alpha)
        mlflow.log_param("l1_ratio", l1_ratio)
        mlflow.log_metric("rmse", rmse)
        mlflow.log_metric("r2", r2)
        mlflow.log_metric("mae", mae)

        mlflow.sklearn.log_model(lr, "model")
```
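To make the logged metrics concrete, here is a quick sanity check of what `eval_metrics` computes, as a plain-Python sketch (the helper name `eval_metrics_plain` is ours; the formulas match sklearn's `mean_squared_error`, `mean_absolute_error`, and `r2_score` definitions):

```python
import math

def eval_metrics_plain(actual, pred):
    """Plain-Python equivalents of the sklearn metrics used in wine.py."""
    n = len(actual)
    # RMSE: square root of the mean squared error
    rmse = math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, pred)) / n)
    # MAE: mean absolute error
    mae = sum(abs(a - p) for a, p in zip(actual, pred)) / n
    # R2: 1 - (residual sum of squares / total sum of squares)
    mean_actual = sum(actual) / n
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, pred))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    r2 = 1.0 - ss_res / ss_tot
    return rmse, mae, r2

# A perfect prediction yields rmse=0, mae=0, r2=1:
print(eval_metrics_plain([3, 4, 5], [3, 4, 5]))  # → (0.0, 0.0, 1.0)
```

Lower RMSE/MAE and higher R2 mean a better fit, which is why these are the values worth comparing across runs in the tracking UI.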
Note:

1. `mlflow.set_tracking_uri("http://localhost:5002")` points the client at the tracking server.
2. `mlflow.set_experiment('http_metrics_test')` sets the name of the experiment.
3. Install the Python packages the program depends on.
4. If you are not running in the same conda environment as the tracking server, you must first execute:

```shell
export AWS_ACCESS_KEY_ID=AKIAIOSFODNN7EXAMPLE
export AWS_SECRET_ACCESS_KEY=wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
export MLFLOW_S3_ENDPOINT_URL=http://localhost:9001
```

so that the Python client can upload model files and model metadata.
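The same credentials can also be set from inside the script instead of the shell, as long as it happens before MLflow (via boto3) first touches the artifact store. A minimal sketch, using the example values from above (substitute your own MinIO credentials and endpoint):

```python
import os

# Hypothetical example values -- replace with your real MinIO settings.
os.environ["AWS_ACCESS_KEY_ID"] = "AKIAIOSFODNN7EXAMPLE"
os.environ["AWS_SECRET_ACCESS_KEY"] = "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY"
os.environ["MLFLOW_S3_ENDPOINT_URL"] = "http://localhost:9001"

# These must be set before the first artifact upload, because the S3
# client reads them when it is created.
print(os.environ["MLFLOW_S3_ENDPOINT_URL"])  # → http://localhost:9001
```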
Run `python wine.py` directly. If it succeeds, the run appears in the MLflow tracking server UI.
Click the run (here 2020-10-30 10:34:38) to see its details:
Start the MLflow model service

In the same conda environment, execute:

```shell
export MLFLOW_TRACKING_URI=http://localhost:5002
mlflow models serve -m runs:/e69aed0b22fb45debd115dfc09dbc75a/model -p 1234 --no-conda
```

where e69aed0b22fb45debd115dfc09dbc75a is the run ID of the logged model, shown in the tracking UI.
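The `-m` argument is just a `runs:/<run-id>/<artifact-path>` URI; `model` matches the artifact path passed to `mlflow.sklearn.log_model(lr, "model")` in wine.py. A small helper (the function name is ours, for illustration) makes the shape explicit:

```python
def model_uri_for_run(run_id, artifact_path="model"):
    """Build the runs:/ URI that `mlflow models serve -m ...` expects."""
    return f"runs:/{run_id}/{artifact_path}"

uri = model_uri_for_run("e69aed0b22fb45debd115dfc09dbc75a")
print(uri)  # → runs:/e69aed0b22fb45debd115dfc09dbc75a/model
```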
If you encounter ModuleNotFoundError: No module named 'sklearn', execute `pip install scikit-learn==0.19.1`.
If you encounter ModuleNotFoundError: No module named 'scipy', execute `pip install scipy`.
Send a request to the model service:

```shell
curl -X POST -H "Content-Type:application/json; format=pandas-split" --data '{"columns":["alcohol", "chlorides", "citric acid", "density", "fixed acidity", "free sulfur dioxide", "pH", "residual sugar", "sulphates", "total sulfur dioxide", "volatile acidity"],"data":[[12.8, 0.029, 0.48, 0.98, 6.2, 29, 3.33, 1.2, 0.39, 75, 0.66]]}' http://127.0.0.1:1234/invocations
```
The output [5.455573233630147] indicates that the model service is deployed successfully.
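Hand-writing that JSON in a curl command is error-prone; it is easier to build the pandas-split payload programmatically. A sketch (the actual POST is commented out because it needs the `requests` package and a running service):

```python
import json

# Feature columns of the wine-quality dataset, in the order used above.
columns = ["alcohol", "chlorides", "citric acid", "density", "fixed acidity",
           "free sulfur dioxide", "pH", "residual sugar", "sulphates",
           "total sulfur dioxide", "volatile acidity"]
row = [12.8, 0.029, 0.48, 0.98, 6.2, 29, 3.33, 1.2, 0.39, 75, 0.66]

# pandas-split orientation: one "columns" list plus a list of data rows.
payload = json.dumps({"columns": columns, "data": [row]})

# To actually call the service:
# import requests
# resp = requests.post(
#     "http://127.0.0.1:1234/invocations",
#     data=payload,
#     headers={"Content-Type": "application/json; format=pandas-split"},
# )
# print(resp.json())
```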
This completes the basic usage of MLflow. If your algorithm is not supported by MLflow out of the box, refer to the "Custom model" documentation.