Tag:Parallelism

  • Flink — running architecture

    Time:2020-11-20

    Task submission process: When Flink submits a task, the client uploads the Flink JAR package and configuration to HDFS, then submits the task to the YARN ResourceManager. The ResourceManager allocates container resources and notifies the corresponding NodeManager to start the ApplicationMaster. After the ApplicationMaster starts, it loads the Flink JAR package and configuration to build the […]

  • Research on uneven distribution of Flink keyBy data across subtasks

    Time:2020-11-4

    Recently, while migrating large volumes of data in real time, we frequently used keyBy hashing to balance the data. However, we found that the amount of data processed by each subtask was not well balanced, which led to frequent checkpoint timeouts. We therefore began to look for solutions. Background of the problem: Use […]
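
The imbalance described in this excerpt can be reproduced with a small, Flink-free sketch. It assumes the usual two-step routing of a keyed stream: the key's hash is mapped to a key group, and the key group is mapped to a subtask. The class name `KeySkewSketch`, the `mix` function (a generic MurmurHash3-style finalizer standing in for Flink's internal hash), and the `key-N` key names are all illustrative assumptions, not code from the article.

```java
import java.util.Arrays;

public class KeySkewSketch {

    // Stand-in integer mixer (MurmurHash3 32-bit finalizer). This is an
    // assumption for illustration, not a copy of Flink's internal hash.
    static int mix(int h) {
        h ^= h >>> 16;
        h *= 0x85ebca6b;
        h ^= h >>> 13;
        h *= 0xc2b2ae35;
        h ^= h >>> 16;
        return h;
    }

    // Step 1: map a key to a key group in [0, maxParallelism).
    static int keyGroupFor(Object key, int maxParallelism) {
        return Math.floorMod(mix(key.hashCode()), maxParallelism);
    }

    // Step 2: map a key group to a subtask index in [0, parallelism).
    static int subtaskFor(int keyGroup, int parallelism, int maxParallelism) {
        return keyGroup * parallelism / maxParallelism;
    }

    // Count how many distinct (hypothetical) keys land on each subtask.
    static int[] countPerSubtask(int distinctKeys, int parallelism, int maxParallelism) {
        int[] counts = new int[parallelism];
        for (int k = 0; k < distinctKeys; k++) {
            int group = keyGroupFor("key-" + k, maxParallelism);
            counts[subtaskFor(group, parallelism, maxParallelism)]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        // Few distinct keys: the per-subtask counts can differ noticeably.
        System.out.println(Arrays.toString(countPerSubtask(8, 4, 128)));
        // Many distinct keys: the counts tend to even out.
        System.out.println(Arrays.toString(countPerSubtask(100_000, 4, 128)));
    }
}
```

Comparing the two printed arrays shows the general effect: with only a handful of distinct keys, hashing gives no guarantee that each subtask receives the same number of keys, so some subtasks may carry far more data than others.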

  • Flink system — task execution — tasks — parallelism

    Time:2020-10-27

    Flink parallelism priority: operator level > environment level > client level > system level. Operator level:

        final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        DataStream<String> text = […]
        DataStream<Tuple2<String, Integer>> wordCounts = text
            .flatMap(new LineSplitter())
            .keyBy(0)
            .timeWindow(Time.seconds(5))
            .sum(1).setParallelism(5);
        wordCounts.print();
        env.execute("Word Count Example");

    Operators, data sources, and data sinks can call the setParallelism() method to […]

  • Apache Flink infrastructure and concepts

    Time:2020-10-15

    cxhfuujust https://www.cnblogs.com/cxhfu… Apache Flink is an open source computing platform for distributed data stream processing and batch data processing. It can support streaming and batch processing applications based on the same Flink runtime. The existing open source computing solutions take streaming and batch processing as two different application types, because they provide totally different SLAs: […]

  • Top 10 reasons to migrate from Oracle to PostgreSQL

    Time:2020-9-29

    Author: Paul Namuag. Paul Namuag has held various positions and has benefited from the opportunity to use various technologies over the past 18 years. He has been a graphic artist and MS .NET developer since 2005, later moving to open source technology as a web developer using the LAMP stack. Later, he was a software […]

  • Flink operator – data flows DataSource

    Time:2020-8-17

    Data flows DataSource: Flink supports many data sources, such as HDFS, sockets, Kafka, and collections. Flink also provides addSource for defining custom data sources. File source: create a data source by reading local or HDFS files. If you are reading a file on HDFS, you need to import the Hadoop dependency: <dependency> <groupId>org.apache.hadoop</groupId> <artifactId>hadoop-client</artifactId> <version>2.6.5</version> </dependency> import […]

  • About Flink

    Time:2020-8-15

    Flink’s world view is data flow. For Flink, the main scenario to be processed is stream data; batch data is only a bounded special case of stream data. Flink is therefore a computing engine that truly unifies stream and batch processing. Unbounded streams? Bounded streams? Unbounded and bounded streams. Stateful computations? Stateful computation: each time data […]

  • Getting started with Flink through wordcount, understanding Flink infrastructure, and getting started with Flink (1)

    Time:2020-8-11

    Hello, I am Later. I share my learning and work experience here, hoping that one of these articles may help you. All articles are published first on my official account, Later X Big Data; you are welcome to follow it. Thank you for your support and recognition. A few days ago, the […]

  • Window and time semantics of Flink, watermark mechanism, detailed explanation of multiple code cases, introduction to Flink learning (3)

    Time:2020-7-31

    Hello, I am Later. I share my learning and work experience here, hoping that one of these articles may help you. All articles are published first on my official account, Later X Big Data; you are welcome to follow it. Thank you for your support and recognition. Through the study of the […]

  • Analyzing Spark data partitioning: Spark Streaming & TiSpark

    Time:2020-3-31

    This article is from the OPPO Internet technology team, the third in a series of articles on “analyzing Spark data partition”. In this article, we analyze data partitioning in Spark Streaming and TiSpark. Series 1: analyzing Hadoop partition of Spark data partition. Series 2: analyzing Spark RDD partition of Spark data partition. Series 3: analyzing […]

  • How to use multithreading in Flink operator to avoid losing data?

    Time:2020-2-12

    Through an analysis of the pain points, optimizing synchronous batch requests into asynchronous requests, the multithreaded client mode, a multithreaded implementation inside a Flink operator, and a summary, this article helps you understand how to use multithreading in a Flink operator without losing data. Analysis of pain points: There […]

  • Spark data skew and its solution

    Time:2020-2-8

    This article was first published on the vivo Internet technology WeChat public account: https://mp.weixin.qq.com/s/lqMu6lfk-Ny1ZHYruEeBdA. About the author: Zheng Zhibin graduated from Computer Science and Technology (bilingual class) at South China University of Technology. He has worked on development and architecture in e-commerce, open platforms, mobile browsers, recommendation advertising, big data, and artificial intelligence. At present, I […]