How to debug Flink source code

Time:2021-11-23

I have recently been working on a real-time computing platform based on Flink, and I plan to write blog posts to record my day-to-day work.
This article explains how to debug the Flink source code, using standalone mode as the example.

Environment

Setting up a Flink standalone cluster is covered by the official documentation and is not repeated here. The local environment is prepared as follows:
IDE: IntelliJ IDEA (the Scala plugin and SDK need to be installed)
Java version: 1.8.0_92
Flink version: 1.5.4

Open the IDE and import the Flink source code as a project.

Source debugging

Debugging through unit tests

The simplest way to debug the source code is to run the unit tests in the debugger. Each module has its own test code. This approach is the easiest, but it only simulates a real cluster.

Remote debugging

First, to enable remote debugging, a JVM startup parameter must be added. IntelliJ IDEA displays this parameter when you create a Remote debug configuration:

-agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=5005

The port number can be chosen freely. To debug from the very start of the main method, set suspend=y so that the JVM waits for the debugger to attach before it runs.
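
As a small illustration of how this parameter is put together, here is a minimal sketch in Java. The class name JdwpOption and its two helper methods are hypothetical names made up for this example and are not part of Flink. It builds the option string for a given port and checks whether the current JVM was started with the JDWP agent.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

// Hypothetical helper, not part of Flink.
public class JdwpOption {
    // Builds the JDWP agent option for the given port.
    // suspend=true makes the JVM wait for a debugger before running main.
    public static String build(int port, boolean suspend) {
        return "-agentlib:jdwp=transport=dt_socket,server=y,suspend="
                + (suspend ? "y" : "n") + ",address=" + port;
    }

    // Returns true if any of the given JVM arguments loads the JDWP agent
    // (either the -agentlib form or the legacy -Xrunjdwp form).
    public static boolean hasJdwpAgent(List<String> jvmArgs) {
        for (String arg : jvmArgs) {
            if (arg.startsWith("-agentlib:jdwp") || arg.startsWith("-Xrunjdwp")) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(build(5005, false));
        // Inspect the arguments this JVM itself was started with.
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println("JDWP agent enabled in this JVM: " + hasJdwpAgent(jvmArgs));
    }
}
```

Running the class with and without the parameter on the java command line shows whether the agent was picked up.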

So where should this startup parameter be added in Flink?

Let's follow the startup scripts: start-cluster.sh leads to config.sh, where we can see that Flink adds the following three configuration items to the JVM startup parameters:

env.java.opts: parameters added when the JVM is started, applied to both the jobmanager and the taskmanager;
env.java.opts.jobmanager: startup parameters for the jobmanager;
env.java.opts.taskmanager: startup parameters for the taskmanager;

All three can be set in flink-conf.yaml. The remote debugging parameters should go in the latter two, with a different port for each.
If they are set in env.java.opts instead, the jobmanager and taskmanager both try to listen on the same debug port when they start, and a port conflict occurs.
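
The conflict is easy to reproduce outside Flink. The following is a minimal sketch (PortConflictDemo is a hypothetical class name, and plain server sockets stand in for the two debug agents): it opens a listener on a free port and then tries to open a second listener on the same port, which is what happens when two JVMs on one machine are given the same JDWP address. The exact error message depends on the operating system.

```java
import java.io.IOException;
import java.net.BindException;
import java.net.ServerSocket;

// Hypothetical demo, not part of Flink.
public class PortConflictDemo {
    // Opens one listener, then attempts a second listener on the same port.
    // Returns true if the second attempt is rejected with a BindException.
    public static boolean secondBindFails() throws IOException {
        try (ServerSocket first = new ServerSocket(0)) { // 0 = pick any free port
            int port = first.getLocalPort();
            try (ServerSocket second = new ServerSocket(port)) {
                return false; // both listeners coexist on the same port
            } catch (BindException e) {
                return true;  // the port is already in use by the first listener
            }
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("Second bind failed: " + secondBindFails());
    }
}
```

Giving each process its own port avoids this situation.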

Add the following to flink-conf.yaml:
env.java.opts.jobmanager: -agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=5005
env.java.opts.taskmanager: -agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=5006
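
Before starting the cluster, it can be worth checking that the two debug ports really differ. Below is a rough sketch of such a check. DebugPortCheck and its methods are hypothetical names, and the parsing assumes simple "key: value" lines with a numeric address=PORT, as in the two lines above. It is not how Flink itself reads the file.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Flink.
public class DebugPortCheck {
    // Assumes the JDWP address is a bare port number, e.g. address=5005.
    private static final Pattern ADDRESS = Pattern.compile("address=(\\d+)");

    // Maps each env.java.opts* key to the JDWP port found in its value.
    public static Map<String, Integer> debugPorts(List<String> confLines) {
        Map<String, Integer> ports = new HashMap<>();
        for (String line : confLines) {
            int sep = line.indexOf(": ");
            if (line.trim().startsWith("#") || sep < 0) {
                continue; // skip comments and lines that are not "key: value"
            }
            String key = line.substring(0, sep).trim();
            Matcher m = ADDRESS.matcher(line.substring(sep + 2));
            if (key.startsWith("env.java.opts") && m.find()) {
                ports.put(key, Integer.parseInt(m.group(1)));
            }
        }
        return ports;
    }

    // A debug port under the shared key env.java.opts reaches both processes,
    // so it is treated as a conflict; otherwise all ports must be distinct.
    public static boolean isSafe(Map<String, Integer> ports) {
        if (ports.containsKey("env.java.opts")) {
            return false;
        }
        return new HashSet<>(ports.values()).size() == ports.size();
    }

    public static void main(String[] args) {
        List<String> conf = Arrays.asList(
            "env.java.opts.jobmanager: -agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=5005",
            "env.java.opts.taskmanager: -agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=5006");
        Map<String, Integer> ports = debugPorts(conf);
        System.out.println(ports + " safe=" + isSafe(ports));
    }
}
```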

Then start the cluster
./start-cluster.sh

Next, in IDEA, create a Remote debug configuration for each process, pointing at the cluster host and the port configured above (5005 for the jobmanager, 5006 for the taskmanager), and start it in debug mode.

Finally, in standalone cluster mode,
the entry class of the jobmanager is org.apache.flink.runtime.entrypoint.StandaloneSessionClusterEntrypoint,
and the entry class of the taskmanager is org.apache.flink.runtime.taskexecutor.TaskManagerRunner.
Find the main method of each, set a breakpoint, and start debugging.