Apache Flink’s Li Yu: “To build a Chinese community, the first problem to solve is lowering the barrier to entry.”

Time: 2020-9-29

On August 16, ALC Beijing held its first offline salon, “How Difficult Is Open Source?”, at the Microsoft building as scheduled. The salon focused on sharing open source development experience, exploring how to make open source projects more robust, and sharing the ASF’s successful practices in managing and operating open source projects.

The speakers were Jiang Ning, member of the Apache Software Foundation, Apache Incubator mentor, and initiator of ALC Beijing; Li Yu, PMC member of Apache Flink; Guo Wei, project lead of Apache DolphinScheduler and CTO of Yiguan; Sun Jincheng, ASF member and Alibaba tech lead (alias: Jinzhu); Wen Ming, Apache APISIX PMC member and founder of Shenzhen Zhiliu Technology; and Li Jiansheng, ALC Beijing member and open source evangelist.

At the event, Li Yu, PMC member of Apache Flink, gave a talk titled “Development and Challenges of the Apache Flink Chinese Community”. The following is an edited transcript.

Development and Challenges of the Apache Flink Chinese Community

Hello, everyone. Today, on behalf of the Flink Chinese community, I would like to introduce the challenges the community has faced as it developed, and the measures we took to deal with them.

Big data and AI are two hot directions in the Internet and computing fields. In big data, beyond volume and variety in the 4V model, the main trend is toward real-time processing, so that more value can be extracted from the data.

Flink, the subject of my talk today, is generally described as a third-generation stream computing engine.

There have been three generations of stream computing engines since 2010. The first generation focused mainly on low latency, but had weaknesses in data consistency and correctness. The well-known Lambda architecture gets real-time results through Storm, but still needs batch computation to correct them.
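To make the Lambda pattern concrete, here is a minimal sketch in plain Python. It does not use Storm or any real engine. The event log, the `seen_by_speed_layer` flag, and the function names are all hypothetical, chosen only to show how a fast but lossy real-time path is later corrected by an exact batch recomputation.

```python
from collections import Counter

# Hypothetical event log: (event_time, key, seen_by_speed_layer).
# The flag models an event that the fast path dropped.
EVENTS = [
    (1, "click", True),
    (2, "click", False),  # lost on the real-time path
    (3, "view", True),
    (4, "click", True),
]


def speed_view(events, since):
    """Real-time path: counts only the events it actually saw after `since`."""
    return Counter(key for t, key, seen in events if t > since and seen)


def batch_view(events, until):
    """Batch path: recomputes exact counts from the full log up to `until`."""
    return Counter(key for t, key, _ in events if t <= until)


def query(events, batch_horizon):
    """Serving layer: exact batch result plus the real-time result for the tail."""
    return batch_view(events, batch_horizon) + speed_view(events, batch_horizon)


before = query(EVENTS, batch_horizon=0)  # batch job has not run yet
after = query(EVENTS, batch_horizon=4)   # batch job has caught up with the log

print(before["click"], after["click"])
```

Before the batch job runs, the query undercounts clicks because one event never reached the fast path. Once the batch horizon covers the whole log, the count is exact. Maintaining two code paths like this is the cost that a single engine for both streams and batches aims to remove.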

The second generation is usually represented by Spark Streaming, which fixed the consistency problem but still falls short on latency. As a third-generation engine, Flink is a stream-based computing engine that supports a unified stream and batch architecture.

Flink’s development is similar to Spark’s: both originated in universities and research institutes. Spark was born in the AMPLab at UC Berkeley in 2009. Flink also began in 2009, as a research project initiated by the Technical University of Berlin and other institutions, and became an Apache top-level project in 2014. It has been under development for about 10 years.

As we can see, Flink is an open source project that originated in Europe and the United States, so building the Chinese community is really a process of localizing a European and American project. To generalize a bit, I personally think open source projects can be divided by origin: those that originated in Europe or the United States and are localized in China, and those that originated locally. The two types face different challenges and difficulties.

Let’s look at Flink’s application scenarios. They are mainly real-time scenarios, such as real-time dashboards, risk control, advertising, real-time recommendation, and video analysis.

Flink is now widely used in China. This is a logo wall, and we can see that many leading companies, including BAT and TMD, are using it.

As mentioned earlier, the Flink Chinese community was set up to localize Apache Flink in China. It has been officially authorized by the Apache Flink project and was officially launched in June 2018. Since then it has organized many offline events.

In terms of operations, we believe that localizing a European or American community requires a dedicated team. So there is a dedicated operations team behind the Flink Chinese community. The actual work behind everything I am introducing today is done by the operations staff, and each person on the team has a specific responsibility.

With that background, what are the main challenges in localizing a project?

The first is obvious, because this is a project that originated in Europe and the United States. Before the Flink Chinese community was established, both introductory materials and video tutorials were mainly in English, and Chinese content was lacking. This made it relatively hard for people in China to learn Flink.

In addition, in 2018 the concept of real-time computing was still at an early stage in China, and people were only beginning to consider whether to move away from the Lambda architecture.

Flink has a relatively high learning curve, and there was no centralized platform for learning and discussion, so the content available online was scattered. That raised the barrier for beginners even further, forming a vicious circle. We set up the Chinese community mainly to solve these problems.

So what approaches have we taken? I think there are two main parts.

The first part is building up resources for developers, giving everyone a relatively smooth channel for communication. On the one hand, people can talk to each other; on the other hand, we can archive questions and answers, as well as learning materials and shared talks. I think this is very important.

The second part is organizing online and offline events to widen the whole circle. The first part targets developers: we hope developers face a lower, easier barrier to entry, so that more of them join the Flink project. The second part targets users: they can get more case studies and materials and use Flink more easily.

In fact, there can be positive feedback between the two. The more users we have, the greater our developers’ sense of achievement; and the more developers we have, the more confident users will be in using the software. By working on the developer side (on the left of the slide), we can grow the number of users (on the right), turning the vicious circle into a virtuous one.

Next, I will introduce what we did in practice, the difficulties we met, and how we solved them.

Overall, we have done the following four things to build up developer resources:

First, we used DingTalk to create a community discussion group. The group is mainly used for technical live streams and gives everyone a more convenient place to communicate.

Second, we created a WeChat official account, which publishes recent community news and related technical articles.

Third, we collected the official account articles and live stream recordings into a Chinese learning website, a unified entry point that addresses the problems of scarce Chinese material and scattered content.

Fourth, we created a Chinese mailing list through the official channels of the Apache Flink project, so that Q&A can be archived.

Let’s look at the pitfalls in each of these four efforts and the difficulties we ran into.

New Apache projects keep emerging, so if others also want to promote and adopt new European or American projects in China, they can avoid the difficulties and pitfalls we ran into.

First, the learning website. I think building it was a very typical case of localizing a European and American project. We also made some typical mistakes here, so I would like to walk you through them.

The first mistake: when we first created the website, we had not obtained official authorization, so we were operating out of compliance. We called ourselves Flink China, and we received a warning email from the Apache Software Foundation.

This is a typical problem. We were eager to do a good thing, but it still has to be done through proper and legitimate means. The Apache Software Foundation has a very complete process, and it encourages people to localize and promote its projects. So after receiving the warning, we started communicating with the community and applied for official authorization, covering the operation of the website and other related matters.

In this process, there are many requirements regarding the conditions and standards for the website, such as what conditions must be met to build a site and what process must be followed to use an Apache project’s logo. This was the first stage of the website. Although the site had been created, it lost users because it was operating out of compliance, which was a pitfall we fell into. But the experience was also valuable: it pushed us to interact more frequently with the official community as we continued to build the Chinese community.

The second pitfall was that we hosted the server in Hong Kong, China, rather than in the mainland. Our thinking at the time was that the Chinese website would be read not only by people in mainland China but also by readers overseas. However, a server hosted outside the mainland could end up blocked. This was another lesson. At that point our daily traffic had reached 5,000, but we had to change the domain name.

The third pitfall was the domain name, ververica.cn. The problem was that the site was run with support from a company and used the commercial name Ververica, so the domain mixed open source content with some commercial information. As a result, its positioning was unclear.

We also discussed this point with the official Flink community. Since July this year, we have launched a brand-new domain, flink-learning.org.cn. This Flink learning website uses a domain officially approved by the community and is available in both Chinese and English. It mainly provides Flink learning materials, divided into technical material and enterprise practice, in both text and video form. It also hosts some community events.

So building the learning website followed a very typical path: we started out with a “guerrilla” style, then gradually accumulated experience and switched to a proper, compliant approach.

The second aspect is building the DingTalk groups. A WeChat group would have worked too, but one advantage of DingTalk is that it lets us run live streams in the same place as the discussion. The whole Chinese community is now on DingTalk; the two groups have about 15,000 people in total. We mainly use them for Q&A, live streams, and interaction, as well as for sharing information about online and offline events.

The third aspect is the Chinese mailing list. We found that many people in the DingTalk groups asked technical questions, but instant messaging groups have a problem: with many people chatting and flooding the screen, information gets mixed together. A DingTalk group is not a good place to archive and accumulate Q&A.

Finding an earlier question by searching the chat history is very hard, and chat history is limited. At the same time, as we discussed with the official community, if there were only an English mailing list, many people might be reluctant to ask questions because of the language barrier.

So we established a Chinese mailing list to archive technical discussions and to give developers and users in China a stronger sense of official participation, rather than having the Chinese community go its own way, out of touch with the official community. More than 1,600 topics have now accumulated on Flink’s Chinese mailing list. If you are interested in Flink, you can also ask questions there.

The fourth aspect is organizing community events. The three aspects above mainly lower the barrier to entry and give everyone a channel and a home base for communication.

Community events are another dimension. They aim to expand the influence of the whole community and let more people know that Apache Flink is a very advanced computing engine. They also give us a chance to communicate face to face.

Because of the epidemic, we have only been able to communicate online, and the results have been good. However, we believe humans are social animals and offline communication is necessary. So we try to promote and grow the whole community through a series of events. We organize them in the following ways, which you may find useful as a reference.

The first is official Apache conferences. To promote its influence worldwide, Apache holds conferences in different regions, such as a European conference and an Asian conference. Official conferences are a very important channel, because the word “official” makes people feel the event is more formal and makes them more willing to participate. We also organize small meetups, which do not need many people and allow for simple exchanges.

The second way is organizing challenges. Conferences and meetups mainly cover the community’s future roadmap, new features, or the experience of large companies using Flink, the problems they encountered, and their solutions. These are more user oriented.

Our challenges are aimed more at developers and at students who have not yet graduated. Students are the future of the industry and of the country, so in effect we are also training developers. To make the challenges more effective, we also run training camps. All of these help promote Flink’s technology.


After all this work, we have seen some good results.

This figure shows the growth in GitHub stars for Apache Flink after the Chinese community was established. This is not to say that people starred the project only because of our promotion, but it does indirectly reflect how a Chinese community can increase the influence of the whole official Apache community.

In addition, visits to Flink’s official website from China have surpassed those from the United States. If you look at Google Trends or Baidu Index, you can also see that Flink is very popular in China.

In the latest ASF FY2020 report, we can see that Flink had the most active Apache mailing list for the third consecutive year, and ranked second among all projects in both visits and commits. This is one of our achievements.

Looking back, we find that building a Chinese community amounts to localizing a project from Europe, the United States, or elsewhere overseas. The first problem to solve is how to lower the barrier to entry.

Our approach is to set up social groups, so that developers and users have a place to communicate, and then organize live streams, so that people have a way to learn the relevant material in Chinese.

In addition, we publish technical articles through the WeChat official account, and we set up a Chinese learning website. Without a dedicated team, though, the website may be relatively hard to pull off.

You can also archive content through an official account plus Bilibili and other video sites. I think the most important things are to accumulate material and to have a unified entry point where beginners and enthusiasts can easily find information. Another very important point is that you must go through official channels and stay in contact with the official community.

The other part is organizing a series of online and offline events for users and developers, so as to keep expanding the project’s influence in China and form a virtuous cycle in which the developer and user bases grow together.

Finally, a little advertisement: the Apache Flink Chinese community will organize a series of community events in the second half of this year, including the challenge currently under way and meetups in various cities. I hope you can join us.

Our community is also recruiting volunteers, and we look forward to more people helping with translation from English and with organizing technical articles for the Flink community.

Related reading:
Event review: ALC Beijing’s first meetup, “How Difficult Is Open Source?”
Guo Wei, CTO of Yiguan: Open source is not a dessert for geniuses, but a feast for the diligent

More talks and round table discussions from the event will be published on the SegmentFault community and the official ALC Beijing channels in the near future. Stay tuned.

As a media partner of this event and of ALC Beijing, SegmentFault has always attached great importance to promoting and building open source culture and the open source ecosystem. In May this year, it launched “SFOSSP: Open Source Project Support Plan”, which has helped promote more than a dozen open source projects.

In the future, we will work with ALC Beijing and other open source communities, project teams, and practitioners to build an open source ecosystem and solve practical problems in open source. Stay tuned.
