Using Flink batch broadcast variables and the FlinkML mapWithBcVariable method

Time:2020-6-29

Method 1: using the Flink DataSet API

points.map(new SelectNearestCenter).withBroadcastSet(currentCentroids, "centroids") // registers currentCentroids as the broadcast set "centroids" for this map operation
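import org.apache.flink.api.common.functions.RichMapFunction
import org.apache.flink.configuration.Configuration
import org.apache.flink.ml.math.DenseVector // assuming FlinkML's DenseVector is the one intended
import scala.collection.JavaConverters._
final class SelectNearestCenter extends RichMapFunction[DenseVector, (Int, DenseVector)] with Serializable{
  private var centroids: Traversable[DenseVector] = null
  // open() runs once per parallel task instance, before any call to map()
  override def open(parameters: Configuration): Unit = {
    // look up the broadcast set by the name passed to withBroadcastSet
    centroids = getRuntimeContext.getBroadcastVariable[DenseVector]("centroids").asScala
  }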
  def map(p: DenseVector): (Int, DenseVector) = {
    //use centroids ...
  }
}
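
The body of map is left open above. As a minimal sketch of what "use centroids" could look like, the plain Scala helper below picks the closest centroid by squared Euclidean distance. It is not taken from the original post: the names NearestCenter, squaredDistance and selectNearest are hypothetical, Array[Double] stands in for DenseVector so the snippet runs without Flink on the classpath, and returning (centroid index, point) is only an assumption about what the (Int, DenseVector) result is meant to hold.

```scala
// Hypothetical helper, not part of Flink or of the original post.
object NearestCenter {
  // Squared Euclidean distance between two vectors of equal length.
  def squaredDistance(a: Array[Double], b: Array[Double]): Double = {
    require(a.length == b.length, "vectors must have the same dimension")
    var sum = 0.0
    var i = 0
    while (i < a.length) {
      val d = a(i) - b(i)
      sum += d * d
      i += 1
    }
    sum
  }

  // Returns (index of the closest centroid, the unchanged point).
  // Assumption: the Int in the result tuple is the centroid index.
  def selectNearest(p: Array[Double], centroids: Seq[Array[Double]]): (Int, Array[Double]) = {
    require(centroids.nonEmpty, "at least one centroid is required")
    val (_, idx) = centroids.zipWithIndex
      .map { case (c, i) => (squaredDistance(p, c), i) }
      .minBy(_._1)
    (idx, p)
  }
}
```

Inside SelectNearestCenter.map, the same logic would run over the centroids field filled in open(), reading each DenseVector's values instead of a raw array.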

Method 2: using the FlinkML mapWithBcVariable method

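import org.apache.flink.ml._ // brings the implicit DataSet wrapper that provides mapWithBcVariable into scope
points.mapWithBcVariable(currentCentroids) {
  (point, center) => {
    // Use the broadcast variable center directly.
    // FlinkML passes the first element of the broadcast DataSet, so currentCentroids should hold a single element.
  }
}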