How to use Alink LocalPredictor easily?

Time:2021-2-26

Training a machine learning model is relatively complex and often needs a distributed environment, but prediction with a trained model is comparatively simple. In general, a single node loads the whole model; several prediction nodes can run at the same time, each loading the whole model, so that predictions are served in parallel. Many data preprocessing algorithms that do not depend on a model follow the same pattern: a single node can complete the operation, and multiple nodes can process data in parallel.
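The deployment pattern described above can be sketched in plain Java. This is only an illustration and does not use Alink: the class ReplicatedPrediction, the method predictAll, and the modelLoader supplier are hypothetical names, and the "model" is just any Function. Each worker loads its own full copy of the model and then predicts its slice of the inputs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;
import java.util.function.Supplier;

// Hypothetical illustration: not part of Alink.
final class ReplicatedPrediction {

    /**
     * Splits the inputs into contiguous slices, one per worker. Each worker
     * calls modelLoader once to get its own model copy, then predicts its slice.
     * Results are returned in the same order as the inputs.
     */
    public static <I, O> List<O> predictAll(Supplier<Function<I, O>> modelLoader,
                                            List<I> inputs,
                                            int workers) {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            int chunk = (inputs.size() + workers - 1) / workers; // ceiling division
            List<Future<List<O>>> parts = new ArrayList<>();
            for (int start = 0; start < inputs.size(); start += chunk) {
                List<I> slice = inputs.subList(start, Math.min(start + chunk, inputs.size()));
                Callable<List<O>> task = () -> {
                    Function<I, O> model = modelLoader.get(); // this worker's own copy
                    List<O> out = new ArrayList<>();
                    for (I input : slice) {
                        out.add(model.apply(input));
                    }
                    return out;
                };
                parts.add(pool.submit(task));
            }
            List<O> results = new ArrayList<>();
            for (Future<List<O>> part : parts) {
                try {
                    results.addAll(part.get()); // read in submission order
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("Interrupted while predicting", e);
                } catch (ExecutionException e) {
                    throw new IllegalStateException("Prediction failed", e.getCause());
                }
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }
}
```

For example, ReplicatedPrediction.<String, Integer>predictAll(() -> String::length, inputs, 2) would have two workers each build their own "model" and split the inputs between them.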

Alink's algorithm components can be used directly to predict on batch data or streaming data. Users also want an SDK: a way to build a local Java instance directly from parameters or model data, which we call a LocalPredictor, to predict one record at a time. Prediction then no longer has to run as a Flink job; it can be embedded in a prediction service that exposes a REST API, or in the user's own business system.
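To make the embedding idea concrete, here is a minimal sketch of a request handler that holds a single-record predictor. It assumes nothing about Alink: SentimentService and handle are hypothetical names, and the predictor is a plain Function<String, Integer> standing in for whatever single-record predictor the application builds at startup. The point is only that the predictor is created once and then called per request, without a Flink job.

```java
import java.util.function.Function;

// Hypothetical illustration: a service-side wrapper around a single-record predictor.
final class SentimentService {

    // Built once when the service starts, then reused for every request.
    private final Function<String, Integer> predictor;

    public SentimentService(Function<String, Integer> predictor) {
        this.predictor = predictor;
    }

    /** Handles one request: takes the review text and returns a small JSON body. */
    public String handle(String review) {
        if (review == null || review.trim().isEmpty()) {
            return "{\"error\":\"empty review\"}";
        }
        int label = predictor.apply(review);
        return "{\"pred\":" + label + "}";
    }
}
```

An HTTP framework of your choice would call handle with the request body. Whether one predictor instance may be shared by several request threads is not covered in this article, so check that before reusing one instance concurrently.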

This article takes Chinese sentiment analysis as an example to show how to use the Alink LocalPredictor. For how to build a pipeline and train a pipeline model, readers can refer to the following article:

How to use Alink for Chinese sentiment analysis?

In that example, the model is used for prediction immediately after training, within the same job, and is never saved. To load the model in a LocalPredictor inside another application, we first need to save it. Alink's PipelineModel provides a simple save method: pass it a file path and call save. Note that save only connects the model to a sink component; the model is not actually written until BatchOperator.execute() runs.

The Java code is as follows:

model.save("/Users/yangxu/alink/temp/sentiment_hotel_model.csv");

BatchOperator.execute();

The Python code is as follows:

model.save("/Users/yangxu/alink/temp/sentiment_hotel_model.csv")

BatchOperator.execute()

With the model saved, how do we build a local predictor?

Call the load method of PipelineModel to load the model data and obtain loadedModel, then call its getLocalPredictor method to get a LocalPredictor instance.

PipelineModel loadedModel = PipelineModel.load("/Users/yangxu/alink/temp/sentiment_hotel_model.csv");

LocalPredictor localPredictor = loadedModel.getLocalPredictor("review string");

Note: the data to be predicted is of Row type, so the schema of its columns must also be described by passing a parameter in Alink's schema string format. Here the parameter is "review string".
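To see what a schema string such as "review string" describes, the following sketch splits it into column names and types. It is not Alink's parser: SchemaStringSketch is a hypothetical helper, and it assumes that several columns would be written as comma-separated "name type" pairs.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical illustration: not Alink's own schema parser.
final class SchemaStringSketch {

    /** Parses "name type, name type, ..." into an ordered map of name to upper-cased type. */
    public static Map<String, String> parse(String schema) {
        Map<String, String> columns = new LinkedHashMap<>();
        for (String part : schema.split(",")) {
            String[] tokens = part.trim().split("\\s+");
            if (tokens.length != 2) {
                throw new IllegalArgumentException("Expected 'name type' but got: " + part);
            }
            columns.put(tokens[0], tokens[1].toUpperCase(Locale.ROOT));
        }
        return columns;
    }
}
```

Calling SchemaStringSketch.parse("review string") yields a map with a single entry, review mapped to STRING: one input column named review that holds text.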

The local predictor is now built. Before showing how to predict with it, let's look at it more closely. Its input is a Row and its output is also a Row, so the result can contain several columns. The getOutputSchema() method returns the schema of the prediction result. For the local predictor we just built, call this method and print the result:

System.out.print(localPredictor.getOutputSchema());

The output is as follows:

root
 |-- review: STRING
 |-- featureText: STRING
 |-- featureVector: LEGACY(GenericType<com.alibaba.alink.common.linalg.SparseVector>)
 |-- pred: INT

The prediction output has four columns, and the most important one, the classification result column "pred", is the last.

The local predictor predicts with its map() method. The code is as follows:

Row[] rows = new Row[] {
  Row.of("Yes, it should be recommended in hotels of the same grade!"),
  Row.of("the room feels OK, but the quality of the towel used for washing is not good, I don't think it's cleaned, and the room's sound insulation effect is not good"),
  Row.of("The service attitude is mechanical, the expression is rigid, and the management is not humanized. If you leave 10 minutes late, you will be charged half a day's room fee."),
  Row.of("It's not easy to find hotel seats, but the front desk service needs to be improved. Other things are not bad.")
};

for (Row row : rows) {
  System.out.print(localPredictor.map(row).getField(3));
  System.out.print("\t");
  System.out.println(row);
}

The results are as follows; each line shows the predicted label, a tab, and the input row:

1	Yes, it should be recommended in hotels of the same grade!
0	the room feels OK, but the quality of the towel used for washing is not good, I don't think it's cleaned, and the room's sound insulation effect is not good
The service attitude is mechanical, the expression is rigid, and the management is not humanized. If you leave 10 minutes late, you will be charged half a day extra
1	It's not easy to find hotel seats, but the front desk service needs to be improved. Other things are not bad.
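The loop above reads the prediction with the hardcoded index 3, which only holds as long as "pred" stays the fourth column. A small helper can look the position up by name instead. This is a sketch with a hypothetical ColumnLookup class working on a plain array of field names; how you obtain that array from the output schema depends on the schema type in your Alink and Flink versions.

```java
import java.util.Arrays;

// Hypothetical illustration: resolves a column position by name.
final class ColumnLookup {

    /** Returns the zero-based position of the named column, or throws if it is absent. */
    public static int indexOf(String[] fieldNames, String name) {
        int idx = Arrays.asList(fieldNames).indexOf(name);
        if (idx < 0) {
            throw new IllegalArgumentException(
                "No column named " + name + " in " + Arrays.toString(fieldNames));
        }
        return idx;
    }
}
```

With the field names review, featureText, featureVector and pred from the schema printed earlier, ColumnLookup.indexOf(names, "pred") returns 3, the same index the loop uses.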

That is all for this article. Alink is a machine learning algorithm platform based on Flink. Please visit Alink's GitHub for download links and more information. You are welcome to join the Alink open source user group to exchange ideas.

Link to Alink GitHub:
https://github.com/alibaba/Alink

