Author: Zhang Jianfeng (Jian Feng)
Introduction: I started working in 2008, and in the 11 years since then I have dealt with data the whole time. I have done a lot of development on the kernels of underlying big data frameworks (Hadoop, Pig, Tez, Spark, Livy), and I am now a PMC member of several Apache projects.
I believe you have already read plenty of Flink articles. Today I would like to take this opportunity to discuss a simple question: why should we learn Flink? Put more generally, how do we judge rationally whether a new technology is worth learning?
Did you learn a technology just because it was hot? Did you learn it because your boss decided to use it? Have you ever thought about why the technology is so popular, or why your boss chose it? Taking Flink as the example, this article discusses why we should learn Flink and how to assess whether a new technology has potential. I hope it gives you some ideas.
▌ Flink: the core of the big data + AI stack
Most people's first impression of Flink is that it is a big data engine, but Flink is not just about big data. Its ambitions reach much further.
Let's first look at the big data landscape. This graph covers essentially all the big data scenarios you deal with every day: from the data producers on the far left, through data collection and data processing, to data applications (BI + AI). You will find that Flink can be applied at every step. It touches not only big data but also AI, so learning Flink is like learning the full big data + AI stack.
Flink connects IoT at one end and AI at the other, opening up the whole end-to-end chain of extracting value from data.
Upstream of Flink, data volumes will keep growing, driven especially by the maturing of IoT technology and the coming deployment of 5G. For the foreseeable future the scale of data will continue to increase rapidly, and Flink's rich connector ecosystem lets it connect to almost any data source.
Downstream of Flink, there is still a lot of room for growth. BI technology is already very mature, but in recent years the demand for real-time BI has grown stronger and stronger, and Flink is widely used in real-time BI scenarios. In addition, the rapid development of AI will drive the development of big data engines, and Flink itself is developing machine learning capabilities. Last year saw the release of Alink, a machine learning library based on Flink that lets ordinary software engineers, without theoretical knowledge of or engineering experience in machine learning, use machine learning technology easily.
▌ Unified batch and stream processing: the general trend
Flink is best known for its powerful stream processing. However, stream processing is only Flink's signature skill, not the whole of what it can do. Its real strength is the unification of batch and stream processing.
The picture above still serves to explain this. Each stage can be handled in two ways: batch or stream. For data collection, we can use Flink in batch mode, or use stream processing when the data must arrive closer to real time. The same is true for data processing, where we can use either streaming ETL or batch ETL. At the data application layer, we can use batch processing to build daily dashboards and stream processing to build real-time dashboards. In AI, we can train models on historical data in batch mode, and also do online learning in streaming mode to update the model in real time. In short, Flink's unified batch and stream processing is a natural fit for end-to-end big data applications.
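The idea that one piece of logic can serve both modes can be sketched in plain Python. This is only an illustration of the concept, not Flink code: `running_counts`, `batch_counts`, and `click_stream` are hypothetical names invented for this sketch, and a Python generator stands in for an unbounded stream.

```python
from collections import Counter
from itertools import islice
from typing import Dict, Iterable, Iterator


def running_counts(events: Iterable[str]) -> Iterator[Dict[str, int]]:
    """Streaming mode: yield the updated per-key counts after every event."""
    counts: Counter = Counter()
    for key in events:
        counts[key] += 1
        yield dict(counts)


def batch_counts(events: Iterable[str]) -> Dict[str, int]:
    """Batch mode: run the same logic over a bounded input, keep the last result."""
    result: Dict[str, int] = {}
    for result in running_counts(events):
        pass
    return result


def click_stream() -> Iterator[str]:
    """Hypothetical unbounded source that never ends."""
    while True:
        yield "home"
        yield "cart"


# Bounded input: one final answer (a daily dashboard, say).
daily = batch_counts(["home", "cart", "home"])

# Unbounded input: a new answer after each event (a real-time dashboard).
first_three_updates = list(islice(running_counts(click_stream()), 3))
```

Here the batch result is simply the last value the streaming computation would have produced on the same input, which is the sense in which a bounded data set can be treated as a special case of a stream.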
▌ Multi-language support: embracing the AI community
People who have used Flink generally know that it has Java and Scala APIs, and many know the classic WordCount example. Java and Scala are comfortable languages for general software engineers, but they are not very friendly to people in other fields, such as data analysts and data scientists. To attract these users, Flink has introduced SQL and Python APIs, which further lower the barrier to entry. Database technology has produced a new innovation every few years, yet SQL alone has remained the standard entry language for database systems. SQL has great staying power and a strong ecosystem around it: most BI tools and data analysis software can work with SQL. Python, meanwhile, has grown strongly in recent years thanks to the popularity of AI, and its user base increases by the day. It is fair to call Python the first language of the AI field. With the launch of PyFlink, the Flink and Python communities are connected, and data scientists can use Flink's computing power at a lower learning cost.
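To show why SQL lowers the barrier, here is the WordCount logic expressed as a single SQL query. This is a minimal sketch that uses Python's built-in `sqlite3` module purely as a stand-in engine so that it runs anywhere; it is not Flink SQL, and the `words` table and the `word_count` helper are assumptions made for illustration. In Flink the table would instead be backed by a connector.

```python
import sqlite3
from typing import Iterable, List, Tuple


def word_count(lines: Iterable[str]) -> List[Tuple[str, int]]:
    """Count words with one declarative SQL query (SQLite as a stand-in engine)."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE words (word TEXT)")
        # Split each line into lowercase words and load them as rows.
        conn.executemany(
            "INSERT INTO words VALUES (?)",
            [(w,) for line in lines for w in line.lower().split()],
        )
        # The whole computation is this GROUP BY; no loops or state handling.
        return conn.execute(
            "SELECT word, COUNT(*) AS cnt FROM words "
            "GROUP BY word ORDER BY cnt DESC, word"
        ).fetchall()
    finally:
        conn.close()


counts = word_count(["hello flink", "hello sql"])
```

The analyst only states what result is wanted (`GROUP BY word`), not how to compute it, which is the property that makes SQL approachable for people who are not software engineers.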
▌ Not only a library, but also a platform
As an Apache project, Flink can be regarded as a library on which users develop various programs. But seeing it only as a library is too narrow; it is more accurate to see it as a platform. Users can extend its functionality and connect it with external systems to build more complete solutions. For example, the Zeppelin notebook integrates Flink: users can write Flink SQL and UDFs in Zeppelin, run Flink jobs (batch and streaming), and visualize the data. Small and medium-sized enterprises can use Zeppelin to build a big data platform. Ververica Platform is an enterprise-level, multi-tenant platform for managing and controlling Flink jobs, on which you can easily submit and manage jobs. Ververica Platform also connects easily to the various cloud platforms, so it can integrate smoothly with your existing cloud-based application systems.
▌ Not just China
As we all know, Alibaba continues to invest heavily in Flink, which has made Flink develop rapidly in China. Last year, Flink Forward Asia was held in Beijing and attracted 2,000 participants. But many people may not know that Flink is also developing very fast abroad: Flink Forward has European and American editions as well, making it one of the few technology conferences held on three continents. In the big data field, Spark and Kafka are probably the only other projects that can say the same.
▌ Embracing cloud computing
Flink was originally designed for the data center environment. Users had to build their own clusters, whether standalone, YARN, or Mesos, which placed heavy demands on their operations and maintenance skills. Every capacity expansion and every version upgrade was a headache. Now Flink has fully embraced the cloud, and its support for Kubernetes keeps improving. In the near future we can expect more and more cloud-native Flink applications. This also places new demands on developers: only by mastering cloud technology can we make the most of what Flink can do.
Running Flink on Kubernetes has the following advantages:
- First, Kubernetes gives Flink a better experience in multi-tenant scenarios.
- Second, major companies are gradually adopting Kubernetes to manage their IT infrastructure. If Flink runs on Kubernetes, users can share resources on a larger scale and manage them in a unified way, reducing costs and improving efficiency.
- Third, the Kubernetes cloud-native ecosystem is developing very rapidly. If Flink integrates well with that ecosystem, it can benefit from the ecosystem's technical advances, which also helps with operations and maintenance in production environments.
▌ Not just for now
Learning Flink helps your career
Finally, I would like to say that deciding to learn a technology is not only a technical question. More realistically, it is a question that affects your career.
As Flink develops, demand for people with Flink skills is increasing by the day. Judging from attendance at Flink Forward Asia last year, almost all first- and second-tier Internet companies in China have adopted Flink. We can expect other Internet companies, and some non-Internet companies, to adopt Flink one after another in the next few years, and engineers with Flink experience are likely to be in strong demand. The following shows some representative companies using Flink in China and abroad.
Learning Flink is not only for the present; it is also a way to build up technical reserves for the future. As mentioned above, Flink has a solid foundation in stream computing, is making progress in other fields, and embraces future-oriented technologies, especially AI and cloud computing.
Another thing I want to say is that learning Flink is not just about Flink itself. You can also broaden your horizons, learn other technologies, and make many like-minded friends. Thanks to Flink's powerful ecosystem, you will come into contact with other fields such as IoT, cloud native, and AI. In addition, the Flink community is booming, and a large body of learning materials and expertise has accumulated in China. The Flink DingTalk group has more than 15,000 members, and Flink has surpassed Spark on Baidu Index. In the Flink community you can learn a lot from others, and I believe you will find some pleasant surprises there.
That is my analysis of why to learn Flink. New technologies emerge every day. The intent of this article is not to push you to learn any one technology blindly, but to encourage you to think more so that you gain more in the future. That is all for this sharing. Thank you.
About the author:
Zhang Jianfeng (Jian Feng) is a veteran of the open source industry (GitHub ID: @zjffdu) and an Apache member who previously worked at Hortonworks. He is currently a senior technical expert in Alibaba's computing platform business department. He is also a PMC member of the Apache Tez, Livy, and Zeppelin projects and a committer on Apache Pig. He was fortunate to come into contact with big data and open source very early, and he hopes to contribute to big data and data science in the open source field.