Learn Flink from 0 to 1 – Detailed explanation of the Flink configuration files

Time: 2021-11-29


In the previous article, we already learned what Flink is. Now that Flink is installed, let's take a look at the configuration files under the installation path.


The installation directory mainly contains the flink-conf.yaml configuration, the log configuration files, the ZooKeeper configuration, and the Flink SQL client configuration.

1. flink-conf.yaml

Basic configuration

#IP address of jobmanager
jobmanager.rpc.address: localhost

#Port number of jobmanager
jobmanager.rpc.port: 6123

#Jobmanager JVM heap memory size
jobmanager.heap.size: 1024m

#Taskmanager JVM heap memory size
taskmanager.heap.size: 1024m

#The number of task slots offered by each TaskManager. Each slot runs one parallel pipeline

taskmanager.numberOfTaskSlots: 1

#The default parallelism used when a program does not set one
parallelism.default: 1

#Default file system scheme used for paths that do not specify one
# fs.default-scheme
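Putting the basic settings together, a minimal `flink-conf.yaml` for a small standalone cluster might look like the sketch below (the host name and memory sizes are illustrative placeholders, not recommendations):

```yaml
# Illustrative values only -- tune them to your own hardware.
jobmanager.rpc.address: flink-master    # hypothetical JobManager host
jobmanager.rpc.port: 6123
jobmanager.heap.size: 1024m
taskmanager.heap.size: 2048m
taskmanager.numberOfTaskSlots: 2        # e.g. one slot per CPU core
parallelism.default: 2
```

A job submitted without an explicit parallelism would then run with parallelism 2, occupying 2 of the available task slots.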

High availability configuration

#You can select 'none' or 'zookeeper'
# high-availability: zookeeper

#File system path, allowing Flink to persist metadata in high availability settings
# high-availability.storageDir: hdfs:///flink/ha/

#The host:port pairs of the ZooKeeper quorum peers that coordinate the high-availability setup
# high-availability.zookeeper.quorum: localhost:2181

#The default is 'open'. If ZooKeeper security is enabled, change this value to 'creator'
# high-availability.zookeeper.client.acl: open
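Uncommented together, a ZooKeeper-backed HA setup could look like this sketch (the HDFS path and the ZooKeeper host names are placeholders):

```yaml
high-availability: zookeeper
high-availability.storageDir: hdfs:///flink/ha/
# Placeholder quorum -- list all ZooKeeper servers as host:port pairs
high-availability.zookeeper.quorum: zk1:2181,zk2:2181,zk3:2181
high-availability.zookeeper.client.acl: open
```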

Fault tolerance and checkpoint configuration

#The state backend used to store operator state and checkpoints
# state.backend: filesystem

#The default directory where data files and metadata for checkpoints are stored
# state.checkpoints.dir: hdfs://namenode-host:port/flink-checkpoints

#Default target directory for savepoints (optional)
# state.savepoints.dir: hdfs://namenode-host:port/flink-savepoints

#Flag to enable / disable incremental checkpoints
# state.backend.incremental: false
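As an example, a setup with incremental checkpoints enabled might look like the sketch below (the paths are placeholders; note that incremental checkpoints are only supported by the RocksDB state backend):

```yaml
state.backend: rocksdb   # incremental checkpoints require the RocksDB backend
state.checkpoints.dir: hdfs://namenode-host:port/flink-checkpoints   # placeholder
state.savepoints.dir: hdfs://namenode-host:port/flink-savepoints     # placeholder
state.backend.incremental: true
```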

Web front end configuration

#The address on which the web-based runtime monitor listens
#jobmanager.web.address: 0.0.0.0

#Port of the web-based runtime monitor
rest.port: 8081

#Enable job submission from web-based jobmanager
# jobmanager.web.submit.enable: false

Advanced configuration

# io.tmp.dirs: /tmp

#Whether TaskManager managed memory should be preallocated when the TaskManager starts
# taskmanager.memory.preallocate: false

#Order of class loading and resolution: check the user code jars first ('child-first') or the application classpath first ('parent-first'). The default setting loads classes from the user code jars first
# classloader.resolve-order: child-first


#The fraction of JVM memory used for network buffers. This determines how many streaming data exchange channels the TaskManager can have at the same time and how well the channels are buffered. If jobs are rejected or you receive a warning that the system does not have enough buffers, increase this value or the min/max values below. Also note that 'taskmanager.network.memory.min' and 'taskmanager.network.memory.max' may override this fraction

# taskmanager.network.memory.fraction: 0.1
# taskmanager.network.memory.min: 67108864
# taskmanager.network.memory.max: 1073741824
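The interaction of these three settings can be sketched as a clamp: the configured fraction of JVM memory is bounded below and above by the min/max values. The function below is my own illustration of that rule, not Flink source code:

```python
# Sketch of how the network buffer size is derived from the three settings:
# take the given fraction of JVM memory, then clamp it to [minimum, maximum].
def network_buffer_bytes(jvm_memory_bytes: int,
                         fraction: float = 0.1,
                         minimum: int = 64 * 1024 * 1024,     # default min (64 MiB)
                         maximum: int = 1024 * 1024 * 1024):  # default max (1 GiB)
    return int(min(max(jvm_memory_bytes * fraction, minimum), maximum))

# With a 1 GiB TaskManager JVM, 10% (~102 MiB) falls inside the bounds:
print(network_buffer_bytes(1024 ** 3))  # 107374182
```

For very small or very large JVMs the min/max bounds win, which is why the comment above says they "may override" the fraction.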

Flink cluster security configuration

#Indicates whether to read from the Kerberos ticket cache
# security.kerberos.login.use-ticket-cache: true

#Absolute path to the Kerberos key table file that contains the user credentials
# security.kerberos.login.keytab: /path/to/kerberos/keytab

#The Kerberos principal name associated with the KeyTab
# security.kerberos.login.principal: flink-user

#A comma-separated list of login contexts to provide Kerberos credentials to (for example, 'Client,KafkaClient' uses the credentials for ZooKeeper authentication and for Kafka authentication)
# security.kerberos.login.contexts: Client,KafkaClient

Zookeeper security configuration

#Override the following configuration to provide a custom ZK service name
# zookeeper.sasl.service-name: zookeeper

#Must match one of the entries in 'security.kerberos.login.contexts'
# zookeeper.sasl.login-context-name: Client
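A consistent Kerberos + ZooKeeper SASL setup could therefore look like this sketch (the keytab path and principal are placeholders):

```yaml
security.kerberos.login.keytab: /path/to/kerberos/keytab   # placeholder path
security.kerberos.login.principal: flink-user              # placeholder principal
security.kerberos.login.contexts: Client,KafkaClient
zookeeper.sasl.login-context-name: Client   # must be one of the contexts above
```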

HistoryServer

#The HistoryServer can be started and stopped with the bin/historyserver.sh (start|stop) command

#Directory to which the JobManager uploads completed jobs
# jobmanager.archive.fs.dir: hdfs:///completed-jobs/

#Address of Web-based historyserver
# historyserver.web.address: 0.0.0.0

#The port number of the web-based historyserver
# historyserver.web.port: 8082

#A comma-separated list of directories to monitor for completed jobs
# historyserver.archive.fs.dir: hdfs:///completed-jobs/

#The time interval (in milliseconds) between refreshing the monitored directory
# historyserver.archive.fs.refresh-interval: 10000
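In a typical setup both archive directories point at the same location, so the HistoryServer picks up the archives written by the JobManager; for example (using the directory from the defaults above):

```yaml
jobmanager.archive.fs.dir: hdfs:///completed-jobs/
historyserver.archive.fs.dir: hdfs:///completed-jobs/
historyserver.web.port: 8082
historyserver.archive.fs.refresh-interval: 10000
```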

Next, let's look at the other configuration files, starting with slaves and masters.


2. slaves

This file lists the IP address/hostname of each worker node, one per line. A TaskManager will be started on each of these nodes.

localhost
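For an actual cluster you would list one worker per line; with hypothetical host names:

```
worker1
worker2
worker3
```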

3. masters

Each line has the form host:port:

localhost:8081

4. zoo.cfg

#Milliseconds per tick
tickTime=2000

#The number of ticks that the initial synchronization phase may take
initLimit=10

#The number of ticks that can be passed between sending a request and getting an acknowledgement
syncLimit=5

#Directory where snapshots are stored
# dataDir=/tmp/zookeeper

#The port to which the client will connect
clientPort=2181

# ZooKeeper quorum peers
server.1=localhost:2888:3888
# server.2=host:peer-port:leader-port
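For a real three-node quorum the commented line would be expanded to one `server.N` entry per node (the host names below are placeholders); each node additionally needs a `myid` file in its dataDir containing its own N:

```
server.1=zk1:2888:3888
server.2=zk2:2888:3888
server.3=zk3:2888:3888
```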

5. Log configuration

Flink ships with log configuration files for the different ways it can be run:

log4j-cli.properties
log4j-console.properties
log4j-yarn-session.properties
log4j.properties
logback-console.xml
logback-yarn.xml
logback.xml

6. sql-client-defaults.yaml

execution:
  # 'batch' or 'streaming' execution
  type: streaming
  # allow 'event-time' or only 'processing-time' in sources
  time-characteristic: event-time
  # interval in ms for emitting periodic watermarks
  periodic-watermarks-interval: 200
  # 'changelog' or 'table' presentation of results
  result-mode: changelog
  # parallelism of the program
  parallelism: 1
  # maximum parallelism
  max-parallelism: 128
  # minimum idle state retention in ms
  min-idle-state-retention: 0
  # maximum idle state retention in ms
  max-idle-state-retention: 0
  
deployment:
  # general cluster communication timeout in ms
  response-timeout: 5000
  # (optional) address from cluster to gateway
  gateway-address: ""
  # (optional) port from cluster to gateway
  gateway-port: 0  

Flink SQL client: you can learn more from the official website here: https://ci.apache.org/project…

Summary

This article walked through the configuration files under the Flink installation directory and explained the settings in each of them.

You can also learn more on the official website: https://ci.apache.org/project…

Follow me

The address of this article is: http://www.54tianzhisheng.cn/2018/10/27/flink-config/

In addition, I have compiled some Flink learning materials and put them all on my WeChat official account. You can add me on WeChat (Zhisheng_tian) and reply with the keyword "Flink" to get them for free.


Related articles

1. Learn Flink from 0 to 1 – Introduction to Apache Flink

2. Learn Flink from 0 to 1 – Building the Flink 1.6.0 environment on macOS and building and running a simple program

3. Learn Flink from 0 to 1 – Detailed explanation of the Flink configuration files

4. Learn Flink from 0 to 1 – Introduction to data sources

5. Learn Flink from 0 to 1 – How to customize a data source?

6. Learn Flink from 0 to 1 – Introduction to data sinks

7. Learn Flink from 0 to 1 – How to customize a data sink?
