Learning Flink from 0 to 1 – Introduction to data sink

Time:2021-11-27

Preface

The previous article, Learning Flink from 0 to 1 – Introduction to data source, explained Flink data sources. This one covers Flink data sinks.

First, what does "sink" mean?

You can probably guess: roughly speaking, a data sink is where data gets stored.

[Figure: source, compute, sink]

As shown in the figure above, the source is where the data comes from, and compute in the middle is the work Flink itself does: a series of operations on the data. Once the operations finish, the computed results are sunk to some destination (MySQL, Elasticsearch, Kafka, Cassandra, and so on). In my current alerting job, for example, the sink sends the results calculated by compute out directly as alerts (alert messages to DingTalk groups, email, SMS, and so on). So a sink does not necessarily store data somewhere. The term connector, used on the official website, describes the destination more accurately; connectors include MySQL, Elasticsearch, Kafka, Cassandra, RabbitMQ, and so on.
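To make the source, compute, sink flow concrete, here is a minimal plain-Java sketch with no Flink involved. The class name AlertPipelineSketch, the threshold rule, and the message format are all assumptions made for illustration; the point is only that the sink is "whatever receives the result", which here is an alert handler rather than a database.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;

public class AlertPipelineSketch {

    // Runs source -> compute -> sink over an in-memory list.
    // The threshold rule and message format are illustrative assumptions.
    public static void run(List<Integer> source, int threshold, Consumer<String> sink) {
        for (int value : source) {          // source: where the data comes from
            if (value > threshold) {        // compute: decide whether to raise an alert
                // sink: hand the computed result to its destination
                sink.accept("ALERT: value " + value + " exceeds " + threshold);
            }
        }
    }

    public static void main(String[] args) {
        List<String> sent = new ArrayList<>();
        // This sink only records messages; it could just as well send an email or SMS.
        run(Arrays.asList(10, 95, 42, 99), 90, sent::add);
        for (String message : sent) {
            System.out.println(message);
        }
    }
}
```

Because the sink is passed in as a Consumer, swapping the destination does not touch the compute step, which is the same separation a Flink connector gives you.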

Flink Data Sink

The previous article, Learning Flink from 0 to 1 – Introduction to data source, introduced what a Flink data source is. Here is a look at which data sinks Flink supports.

What does the source code provide?

In the source code you can see sinks for Kafka, Elasticsearch, socket, RabbitMQ, JDBC, Cassandra POJO, file, print, and so on.

SinkFunction

[Figure: SinkFunction class hierarchy]

As the figure above shows, the SinkFunction interface declares an invoke method, and the RichSinkFunction abstract class implements the interface.

The built-in sinks listed above all extend the RichSinkFunction abstract class and implement its methods. To define our own sink, we follow the same pattern.
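The pattern can be sketched in plain Java without any Flink dependency. SimpleSink and SimpleRichSink below are hypothetical stand-ins that only mirror the shape described here (an interface with invoke, plus an abstract class adding open and close hooks); they are not Flink's real classes, and the drive method is an assumption about how a runtime would call them.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class CustomSinkSketch {

    // Hypothetical stand-in for a sink interface with a single invoke method.
    interface SimpleSink<IN> {
        void invoke(IN value);
    }

    // Hypothetical stand-in for a "rich" abstract class that adds lifecycle hooks.
    abstract static class SimpleRichSink<IN> implements SimpleSink<IN> {
        public void open() {}
        public void close() {}
    }

    // A custom sink following the pattern: extend the rich class, override invoke.
    static class CollectingSink extends SimpleRichSink<String> {
        final List<String> events = new ArrayList<>();

        @Override public void open() { events.add("open"); }
        @Override public void invoke(String value) { events.add("invoke:" + value); }
        @Override public void close() { events.add("close"); }
    }

    // Assumed call order: open once, invoke once per record, close once.
    static <IN> void drive(SimpleRichSink<IN> sink, List<IN> records) {
        sink.open();
        try {
            for (IN record : records) {
                sink.invoke(record);
            }
        } finally {
            sink.close();
        }
    }

    public static List<String> demo() {
        CollectingSink sink = new CollectingSink();
        drive(sink, Arrays.asList("a", "b"));
        return sink.events;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [open, invoke:a, invoke:b, close]
    }
}
```

The open and close hooks are where a real sink would typically acquire and release resources such as connections, while invoke handles one record at a time.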

Here is the source code of a simple one, PrintSinkFunction:

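import org.apache.flink.annotation.PublicEvolving;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.sink.RichSinkFunction;
import org.apache.flink.streaming.api.operators.StreamingRuntimeContext;

import java.io.PrintStream;

@PublicEvolving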
public class PrintSinkFunction<IN> extends RichSinkFunction<IN> {
    private static final long serialVersionUID = 1L;

    private static final boolean STD_OUT = false;
    private static final boolean STD_ERR = true;

    private boolean target;
    private transient PrintStream stream;
    private transient String prefix;

    /**
     * Instantiates a print sink function that prints to standard out.
     */
    public PrintSinkFunction() {}

    /**
     * Instantiates a print sink function that prints to standard out.
     *
     * @param stdErr True, if the format should print to standard error instead of standard out.
     */
    public PrintSinkFunction(boolean stdErr) {
        target = stdErr;
    }

    public void setTargetToStandardOut() {
        target = STD_OUT;
    }

    public void setTargetToStandardErr() {
        target = STD_ERR;
    }

    @Override
    public void open(Configuration parameters) throws Exception {
        super.open(parameters);
        StreamingRuntimeContext context = (StreamingRuntimeContext) getRuntimeContext();
        // get the target stream
        stream = target == STD_OUT ? System.out : System.err;

        // set the prefix if we have a >1 parallelism
        prefix = (context.getNumberOfParallelSubtasks() > 1) ?
                ((context.getIndexOfThisSubtask() + 1) + "> ") : null;
    }

    @Override
    public void invoke(IN record) {
        if (prefix != null) {
            stream.println(prefix + record.toString());
        }
        else {
            stream.println(record.toString());
        }
    }

    @Override
    public void close() {
        this.stream = null;
        this.prefix = null;
    }

    @Override
    public String toString() {
        return "Print to " + (target == STD_OUT ? "System.out" : "System.err");
    }
}

As you can see, it extends the RichSinkFunction abstract class and overrides the invoke method. Here, invoke simply prints each record, with no other processing.
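One detail in the listing is easy to miss: open computes a prefix only when the parallelism is greater than 1, and the subtask index is shown 1-based. The following plain-Java sketch isolates that logic; the class PrintPrefixSketch and its format method are hypothetical helpers written for this illustration, not part of Flink.

```java
public class PrintPrefixSketch {

    // Mirrors the listing: a prefix is added only when parallelism > 1,
    // and the zero-based subtask index is displayed as index + 1.
    public static String format(int parallelism, int subtaskIndex, Object record) {
        String prefix = (parallelism > 1) ? ((subtaskIndex + 1) + "> ") : null;
        return (prefix != null) ? prefix + record.toString() : record.toString();
    }

    public static void main(String[] args) {
        System.out.println(format(1, 0, "hello")); // prints hello
        System.out.println(format(4, 2, "hello")); // prints 3> hello
    }
}
```

So a line such as "3> hello" in the output means the record was printed by the third parallel subtask of the sink.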

How do you use it?

SingleOutputStreamOperator.addSink(new PrintSinkFunction<>());

That is all. For any other sink function, substitute the corresponding class.

Using this function prints the data coming from the source, which has the same effect as calling print() directly on the source stream.

The next article explains how to write your own sink function, with a demo that walks through the pattern so you can build custom sinks for your own work.

Finally

This article covered Flink's data sinks: it introduced the common data sinks, looked at SinkFunction in the source code, showed how to use a simple sink function, and outlined the pattern for writing a custom one. The next article will walk you through writing one.

Follow me

When reprinting, please cite the original address: http://www.54tianzhisheng.cn/2018/10/29/flink-sink/

In addition, I have compiled some Flink learning materials and put them all on my WeChat official account. Add me on WeChat (zhisheng_tian) and reply with the keyword Flink to get them for free.

Related articles

1、Learning Flink from 0 to 1 – Introduction to Apache Flink

2、Learning Flink from 0 to 1 — an introduction to building Flink 1.6.0 environment and building and running simple programs on MAC

3、Learn Flink from 0 to 1 – detailed explanation of Flink profile

4、Learning Flink from 0 to 1 – Introduction to data source

5、Learn Flink from 0 to 1 – how to customize the data source?

6、Learning Flink from 0 to 1 – Introduction to data sink

7、Learn Flink from 0 to 1 – how to customize data sink?