Code! Flink contributor Quick Guide

Time:2020-11-12

This article is adapted from a live talk by Apache Flink PMC member Wu Jie (Yunxie). It offers experience and a walkthrough of the contribution process for readers who have some big data background and are interested in helping the Flink community grow.

Why participate in the open source community

As an Apache Flink PMC member, Yunxie summarizes three main reasons for participating in open source, based on his own experience.

1. Open source spirit

“Freedom” is arguably the core of the open source spirit: free communication, sharing, and collision of ideas across the world. “Speaking for myself, I was in college just as Hadoop and Spark took off. Back then I looked up to open source, admired the experts who could read source code fluently, and hoped that one day I could write a lot of open source code that would be used by tens of thousands of users. So for me, participating in open source is like a hobby that I am willing to spend time and effort on. I was also lucky enough to get involved with the open source community right after graduation.”

2. Technological growth

Participating in open source is a good way to improve the quality of your code. The open source community holds code and design to a high standard, unlike some internal projects that are relatively casual. For design, the Flink community has a dedicated FLIP (Flink Improvement Proposal) mechanism, under which any significant contribution is discussed openly and carefully. For code, Flink's CTO personally wrote a 26-page code style guide [1]. In addition, after submitting a PR, you receive review suggestions from the committers. Contributing code to the open source community over time therefore greatly improves your systems thinking and coding ability.

3. Career planning

If you are preparing to change jobs or seek a promotion within your company, then beyond your current title, experience in the open source community is definitely a plus. Well-known open source projects carry a celebrity-like halo, and participating in them helps you build influence and get to know leading experts in the industry. Open source contributions not only directly reflect your coding ability but also demonstrate your enthusiasm, perseverance, and soft skills in communication and cooperation (since they require you to keep delivering high-quality contributions, work with other community members, stay open and friendly when you disagree, and so on).

How to become a contributor

1. Contribution path

Whatever your motivation, the Flink community welcomes everyone to help build and improve it. Before getting into the concrete steps, let’s briefly introduce the ways to contribute and clear up some common preconceptions about open source contribution.

When it comes to open source contribution, the intuitive response is to contribute code. In the Flink community, however, there are many ways to contribute, including documentation, translation, Q&A, testing, and code, and the community lists documentation first and code last. This is because the Apache community's attitude is “Community Over Code”: the importance of the community goes without saying.

Why? Because the healthy development of an open source project does not depend simply on churning out code; without a community, the source code of an open source project stays stuck at the stage of “self-admiration”. “I have an idea, here is my code” may be the worst way to contribute, because without any documentation the committers have to work out the contributor's intention from the code. This reverse engineering costs both the committers and the contributor extra time and energy, leading to very high communication costs and a longer merge cycle.

On the other hand, without a strict code review mechanism and a standardized pull request process, the quality of open source code drops significantly. This is why even a typo fix or a simple bugfix goes through a complete process, while refactoring a module or adding a feature requires detailed design documents and a vote. This part of the work takes up a large share of the effort; once it is done, writing the code follows naturally. Moreover, thorough documentation, timely and accurate Q&A, and technical blog posts create a high-quality community ecosystem, attract more users, and in turn feed back into the community.

Take Konstantin and Seth, who recently became committers: their main contributions are documentation. This shows that the Flink PMC recognizes and values documentation contributions. In particular, the bar for contributing Chinese documentation (translation) is relatively low. A reasonable command of English and clear writing are enough, which makes it the best starting point for beginners in open source. The Flink community is currently recruiting translators, and the translation process is described in detail below.

2. Preparation

  • Subscribe to mailing list

Flink community discussions happen mainly over email, so the first step in contributing is to subscribe to the mailing lists and follow the latest discussions. The main ones are the user mailing lists ([email protected] and [email protected]) and the developer mailing list ([email protected]). For more information about the mailing lists, see [2]. To subscribe, send an email to the corresponding list and reply to the confirmation message.

For example, to subscribe to the developer mailing list, send an empty email to [email protected]. The community will reply with an email asking whether you want to join; reply to it to confirm.

The Flink community generates a lot of email every day. Good filtering helps you quickly find relevant topics. Yunxie shares his Gmail filter rules [3] here for reference.

  • Follow the JIRA modules you care about

The Flink community manages all issues through JIRA, so you need a JIRA account before you start contributing. Although JIRA does not support following a particular module directly, you can use JIRA filters to track the modules you are interested in. The steps are as follows:

  1. Go to the JIRA Issues page and switch the search box from Basic to Advanced mode.
  2. Add the filter conditions you are interested in. Taking Chinese translation as an example, component = chinese-translation selects all translation-related issues, while resolution = Unresolved AND assignee in (EMPTY) narrows them to translation tasks that are still open and not yet assigned to anyone. The complete filter is: project = FLINK AND component = chinese-translation AND resolution = Unresolved AND assignee in (EMPTY) ORDER BY updatedDate DESC. Then click Save. You can create several filters like this for the modules you care about, for later use.
  • At present, only committers in the Flink community have permission to assign issues, so a contributor who wants to work on an issue should leave a comment on it asking to be assigned. For a simple typo or bugfix, a committer will usually assign it directly. For more complex changes or new features, explain your implementation plan at the code level when you apply; the issue is assigned only after you reach agreement with a committer. You can also click Watchers on the issue page to start watching the issue. Any updates to it are then sent to the email address you registered with JIRA.
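
As a small illustration of the filter described above, the sketch below builds the same JQL string for any component. The function name is made up for this example, and the component name chinese-translation is an assumption about how the translation component is spelled in the Flink JIRA project; check the component list in JIRA before relying on it.

```shell
#!/bin/sh
# Build a JQL query for open, unassigned issues of one component.
# jira_open_issues_jql is a hypothetical helper, not part of JIRA.
jira_open_issues_jql() {
    component=$1
    # The component is quoted so that names containing spaces stay valid JQL.
    printf 'project = FLINK AND component = "%s" AND resolution = Unresolved AND assignee in (EMPTY) ORDER BY updatedDate DESC\n' "$component"
}

# Assumed component name; paste the printed query into the Advanced search box.
jira_open_issues_jql chinese-translation
```

Swapping in another component name gives a second filter that can be saved alongside the first.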

  • Fork Flink repository & Download Flink source code

First you need a GitHub account. Then open Flink’s GitHub home page, https://github.com/apache/flink, and click the Fork button to create a copy of the repository under your own account.

Then clone the Flink repository locally. This clone is used to stay in sync with the master code.

git clone https://github.com/apache/fli… ${your-local-dir}

Add a remote that points to your own fork, for pushing your development branches.

git remote add ${remote-name} https://github.com/${your-github-id}/flink.git

For example: git remote add my https://github.com/wuchong/fl…

Start your first pull request tour

For newcomers to the community, the translation module usually offers the best return on investment. It is easy to get started with yet covers the standard contribution process, so you can become an Apache Flink contributor in minutes. Next, we walk through the complete pull request (PR) process using a Chinese translation example. Before taking off, however, we need to understand the translation conventions [4]. Here are three key points:

  • Translate with a plain text editor
  • Put a space between Chinese characters and English words or numbers
  • Links in Chinese documents need zh added after the baseurl of the corresponding English link
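
The third point is easy to miss in a long page, so a rough check can help. The sketch below assumes links are written in the Jekyll form {{ site.baseurl }}/path and that the Chinese variant is {{ site.baseurl }}/zh/path; both are assumptions about the docs layout, and the function name is invented for this example. The translation conventions [4] remain the authority.

```shell
#!/bin/sh
# Print the lines (with line numbers) of a translated page that contain a
# baseurl link which is NOT followed by "zh/".
# find_untranslated_links is a hypothetical helper; $1 is the file to check.
find_untranslated_links() {
    # ERE has no negative lookahead, so "not zh/" is spelled out as:
    # a first character other than z, or z + non-h, or zh + non-slash.
    grep -nE '\{\{ *site\.baseurl *\}\}/([^z]|z[^h]|zh[^/])' "$1"
}
```

Run it as find_untranslated_links path/to/page.zh.md; no output means every baseurl link on the page already points at a zh path.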

With the preparation done, we move on to the exciting hands-on part.

Step 1: Apply to be the assignee of a JIRA issue. Since this is a demo task, we open the translation task FLINK-17939, Translate “Python Table API Installation” page into Chinese [5], and assign it to ourselves (Yunxie).

Step 2: Do the work and review the content to be submitted. Note that all documents have the .md suffix, and Chinese document names carry the zh marker. Initially, the Chinese documents still contain the English text. Switch to the local repository, go to the docs directory, find the document to translate, and start working according to the translation conventions.

After finishing the translation, it is best to render the docs locally and check the result before submitting. The method is as follows:

Switch to the docker directory under docs and start the Docker environment.

cd ${your-local-dir}/flink/docs/docker
./run.sh

After that, build the docs locally, which takes 1 to 2 minutes.

./build_docs.sh -p

Then open localhost:4000 and switch to the Chinese version to check the rendered document, such as the layout and whether the hyperlinks on the page open correctly. Once everything looks right, the document can be submitted. Note that hyperlinks to other documents should point to their Chinese versions.

Step 3: Prepare to submit. The best practice is to create a branch and commit your changes to it. For example, here we create a branch called installation-translate and switch to it.

git checkout -b installation-translate

The Flink community has requirements for the commit message format, which is generally [${JIRA issue ID}][${affected component}] ${JIRA issue title}.

For the demo, the issue ID is FLINK-17939 and the title is Translate “Python Table API Installation” page into Chinese.
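
To make the bracket layout concrete, here is a minimal sketch that assembles a message from its three parts. The function name is hypothetical, and the component tag chinese-translation is only an assumption for illustration; use whatever component tag the community expects for your change.

```shell
#!/bin/sh
# Format a commit message as [ISSUE-ID][component] title.
# flink_commit_message is a hypothetical helper; only the layout comes
# from the format described in the text.
flink_commit_message() {
    issue_id=$1
    component=$2
    title=$3
    printf '[%s][%s] %s\n' "$issue_id" "$component" "$title"
}

# Demo issue from the text; the component tag is an assumed value.
flink_commit_message "FLINK-17939" "chinese-translation" \
    'Translate "Python Table API Installation" page into Chinese'
```

The printed line can be passed straight to git commit -m after staging the translated file with git add.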

After committing locally, push the changes to your fork on GitHub with the following command.

git push my installation-translate

Step 4: Prepare the PR. After you push the changes to your fork, GitHub returns a link to the page for creating a new PR [4]. On that page, fill in the following information so that reviewers can quickly understand the PR and merge it more efficiently.

  • What is the purpose of the change
    Generally, the description of the JIRA issue can be reused here.
  • Brief change log (what changes the commits in this PR make)
    Fill this in as needed. For a translation task, for example, you can write: Translate flink/docs/dev/table/python/installation.zh.md. For a more complex change involving multiple commits, briefly summarize each commit in order and attach links to the commit logs.

The last three are checkbox questions; tick them as appropriate.

  • Verifying this change
  • Does this pull request potentially affect one of the following parts
  • Documentation (does the change require a new document)

Step 5: Wait for a committer to review. When you refresh the JIRA page, you can usually see the PR listed under the issue's links. Generally, the committer who assigned the JIRA issue to you will watch for PRs in the modules they follow. If the committers are very busy, you can also @-mention a committer in the PR to ask for a review. Usually the committer leaves comments and the contributor replies, and sometimes changes need to be made and pushed again. Note that the Flink community does not recommend squashing multiple commits during review, because that loses the change history. Instead, append your revisions as new commits on top of the original ones. This step may repeat several times and involve multiple committers, until the committers and the contributor agree. For the contributor, the PR is finished at this point. A committer will then merge it into the master branch and close the PR.

To summarize, the steps for a contributor to complete a PR are as follows.

Step 1: Claim an issue you are interested in on JIRA and ask a committer to assign it to you.
Step 2: Complete the task and check your work before submitting.
Step 3: Write the commit message according to the convention and push to your fork.
Step 4: Fill in the PR information according to the template and wait for a committer to review.
Step 5: Address the committers' comments, sometimes by modifying the code, and repeat until everyone agrees the change is OK.

Congratulations on becoming a Flink contributor! Every Flink release announcement includes a list of all contributors to that release, and the GitHub contributors page lists the top 100 contributors of all time.

How to be an excellent contributor

Submitting your first PR is only the first step of a long march. How do you become an excellent contributor, or even a committer? Here are three tips that may help.

1. Actively participate in users’ Q & A

The Flink community encourages more people to take part in the user mailing lists. Apache's 2019 annual report shows that the Flink community ranks first in mailing list activity. Every month, the community tallies the people who actively answer questions on each mailing list and looks for promising contributors among them.

2. Code quality wins community trust

For code contributors, the best practice is:

  • Follow the code style guide and configure checkstyle.xml in IDEA so that style is checked as you go, avoiding low-level problems such as nonstandard style in your PR.
  • Fill in the PR description template carefully, especially the “Brief change log” section. For reference, see https://github.com/apache/fli… and https://github.com/apache/fli…
  • Any new feature should have test coverage, preferably unit tests rather than integration tests.
  • Any new feature should come with documentation. Both the Chinese and English documents need to be updated, or an issue created to track the update.
  • Pay attention to the Azure test results.

3. Have a sense of community

The title of contributor or committer brings not only a “professional aura” but also a sense of responsibility: a genuine wish to help the community get better. For example, do not pick only the attractive tasks, help newcomers become contributors, help review new PRs (see the review guide [6]), and so on.

Finally, I would like to borrow a classic line: “It is the time you wasted on your rose that makes your rose important.”

Reference links:

[1] https://flink.apache.org/cont…
[2] https://flink.apache.org/comm…
[3] https://gist.github.com/wucho…
[4] https://cwiki.apache.org/conf…
[5] https://issues.apache.org/jir…
[6] https://flink.apache.org/cont…
[7] How to participate in Flink community from 0 to 1: https://www.bilibili.com/vide…
[8] How to be a qualified ASF contributor: https://enjoyment.cool/2020/0…
[9] How to grow from Xiaobai to Apache committer?: http://wuchong.me/blog/2019/0…