Twenty happy years in the IT world: hard-won experience in operation and maintenance development

Time:2021-6-13

I started working in 1996, doing system administration and software development in the banking system. Since then, I have had an unbreakable bond with operation and maintenance development. The first three years were the foundation stage: in the banking system I worked on operation and development related to databases and UNIX systems.

In 2000, I set out on my own and worked in R&D at Blue Dot, Tencent and other companies in turn. Blue Dot was a stage of technical upgrading, where I mainly learned Linux and networking and developed some network security products. Tencent broadened my horizons: the work involved the architecture design and the operation and maintenance development of IT systems for massive numbers of users. During this stage I came into contact with many projects dealing with massive users and data. In addition, I liked Tencent’s humanistic care and have tried to bring that atmosphere to other places.

After ten years of Shenzhen-era work, I came to Shanda in Shanghai with my family in 2010. In addition to the research and development of basic services and operation and maintenance, I was also responsible for the operation of POI and data. The biggest lesson of this stage is that good opportunities need persistence. The tourism, LBS, O2O and other products we built at that time were very good, but the direction changed too quickly. If we had persisted for another year or two, the returns would have been considerable.

In August 2012, I moved from Shanda to No. 1 Store in Zhangjiang and started to build an operation and maintenance development team. In less than four years, the private cloud of No. 1 Store has grown from nothing and from small to large, and the operation and maintenance development team has grown from two or three people to nearly 20. A seed has grown into a small tree. My team is now the platform support department. In addition to platform R&D, which provides the company with operation and maintenance automation, SOA infrastructure and office automation services, the department is also responsible for the company’s monitoring, quality assurance, testing, emergency response and so on. The platform support department is a large department, and I am responsible for the parts in the green box:

In addition to these specific areas, I am also responsible for the emergency command center (a virtual team made up of on-duty staff, duty managers, system consultants, architects, and representatives of the various businesses) and for drills. Recently I have run drills such as switch failover, large-scale capacity reduction, and traffic switching. The purpose is to find potential problems through the drills and make our business more robust.

Building an operation and maintenance development team from scratch

When I first arrived at No. 1 Store in August 2012, there was nothing to speak of except a release system that everyone complained about. When something went wrong, it was hard for us to troubleshoot: too many people were involved and the situation was chaotic. Everyone had a different opinion. Some asked for a rollback, some wanted to restart the application based on experience, and some suggested checking the DB.

At that time, the colleagues who had built the release system had just left, and the two colleagues in charge of maintaining it were at a loss. One of them had graduated only a year or two earlier and felt he could not cope any longer. A release had a high probability of failing. Sometimes the rollback itself went wrong and took several hours. As a result, people would rather release manually than use the release system.

The outlook was bleak, but I had to take it on anyway. I spent several nights summarizing and classifying all the problems of the release system into categories such as the release system itself, the business side, and operations, and then worked out improvement plans with the corresponding teams while promoting the DevOps concept internally. Why did we need every team to take part? If releases are not efficient and stable, the business operations, R&D, testing, product, and operation and maintenance teams are all affected, which eventually does serious harm to the normal running of the business and the rapid iteration of features. Through the concerted effort of all teams, we stabilized the release system in a little over two weeks, which laid the foundation for the current operation and maintenance development team.

At that time, there was no such thing as a private cloud. There were also differing opinions on which monitoring solution to choose. Some wanted to build everything in-house, some did not want monitoring to be visible to the business R&D teams, and some thought only open source should be used. Considering the people we had and the company’s actual situation at the time, I chose open source plus a small amount of in-house development and advocated the idea that monitoring is for everyone, so that all teams could have convenient monitoring within a short time. Once we had release and monitoring capabilities, the demand for basic data grew, and so did the leadership’s support for us. We had the opportunity to start building the first version of the private cloud, covering CMDB, installation, release, monitoring, configuration management and so on. As the products and users multiplied, we began to optimize and polish the products and to connect internal and external platform systems through APIs, gradually developing the current Ledao cloud platform. We chose the name Ledao for its three meanings: taking joy in one’s way, staying content with one’s way even in poverty, and talking about something with relish.

At present, the technical team of No. 1 Store has about 1,000 people, providing strong technical architecture, R&D, and operation and maintenance services for tens of millions of users nationwide. I am responsible for the operation and maintenance development team of 20 people, which works closely with other teams (operation and maintenance, R&D, architecture, etc.) to keep tens of thousands of devices running smoothly. The other teams are mainly the requesters of operation and maintenance automation. We schedule the discussion, design, rapid prototyping and iteration of their requirements according to the urgency and importance of the business.

Over the past four years, we have stuck with operation and maintenance development, which happened to coincide with the rapid growth of e-commerce and cloud computing. With the support of our leaders and teams, we have made some achievements. I look forward to building more useful cloud and intelligent products in the future.

My greatest success is the teams I have led

Let me share something that moved me deeply 20 years ago: when I was working in the banking system, my leader told me that “leadership is service”. At the time I did not quite understand it, but as I gained work experience I came to agree with it more and more. Everyone on the team is someone I serve. When there is a problem, the leader should charge ahead; when there is a reward, the colleague who contributed most should receive it; when there is blame to take, I will even ask whether there is any more of it, and take all of it on myself. I fight for whatever resources the team needs, and I keep working on the ones that cannot be obtained in the short term. I try every means to let the team build more valuable products and to strengthen the members’ sense of achievement. “Leadership is service” is now in my bones, thanks to my leader at that time, President Huang Changyun.

Now, in leading the team, I pay great attention to happiness, efficiency and growth. Everyone has their own advantages and strengths. I strive to create a good environment in which each member can play their role and do their best.

I like the phrase “a good life, take your time”, which has gradually become a team saying. My view is that work should be fast: get things done ahead of time where possible, with efficient, high-quality R&D. Life, however, has to slow down. On weekends we should tell stories to our children at home, enjoy tea on the balcony, read books and listen to music, and have a good time with the whole family at the dining table. If I see a team member working overtime, I usually ask him or her to go home early. What they will need in the future is a healthy body and creative thinking, not to be a code laborer.

If you ask me what makes me feel most successful in my work over these years, it is this: the teams I have led all share the same characteristics. They are happy, efficient, stable, and highly effective.

Advice on developing a technical career

Here I would like to share an experience from my early years. At Tencent, I had just taken over the fast tips project, which is the QQ pop-up window you see. The colleague who handed the project over to me had just been penalized for an incident. He kindly advised me to take on as few requirements as possible so as not to make the same mistake. On the one hand, I was impressed by how thorough Tencent’s system was: responsibility for an incident is traced level by level up to the VP. On the other hand, I did not think doing less was the solution. So I quickly developed a test back end that anyone who wanted to test a pop-up window could use directly, with a controllable scope of impact. In the end, this solution let teams use new products conveniently without causing incidents. Following this example, my team at No. 1 Store has also established an internal reward and penalty system, with the penalties going toward a team fund. At the same time, we built the private cloud platform so that each team can easily release, monitor and view logs, creating a good environment for efficient R&D by the business teams.

I use this experience to illustrate how technical staff should view and handle the company’s project requirements. Doing less, first, fails to respond to business needs and, second, forfeits the chance to develop yourself. Through thought and effort, we should complete as much of the work as possible while ensuring quality. If I were to give one general piece of advice, it is to take the initiative in solving difficult problems, which greatly benefits anyone’s development in the workplace. As a career develops, the problems to be solved should become bigger and harder. If the problems you solve become more and more monotonous, or even simpler, your career path is likely becoming narrower and narrower.

To be more specific, the focus of development differs slightly for technical staff at different stages of their careers. For newcomers who have just entered the industry, this is the stage of laying the foundation. Whatever the size of the company, I suggest the following:

Think more: why is it designed and implemented this way? What benefits does it bring? What problems will it cause?

Do more: don’t let yourself sit idle. Doing more does no harm even if there is no visible payoff in the short term. As the saying goes, it pays to take the long view.

Communicate more: reply promptly, report progress promptly, and ask for help when you run into difficulties.

Also, it’s important to have a team and a boss who can grow with you.

After working for three to five years, we should keep learning and developing. Beyond that, to assess whether you have reached a bottleneck in your career, I suggest looking at it from the following four perspectives:

Space: your position in the team, the company’s position in the industry, and the development of the industry together give the room for growth you can reasonably foresee.

Resume: update your resume every six months. The point is not to encourage frequent job changes, but to check whether you have made any achievements or progress in the past six months.

Sustainability: you may soon face matters such as starting a family. Are your current ways of working, living and learning sustainable, and if not, how should they be adjusted?

Goals: are we closer to long-term goals? Is there a better way to ensure the realization of the goal?

About overtime

On overtime, my personal view is this: if you choose to work overtime for your own growth or for the business, that is very good and a mark of professionalism. But if it is merely passive overtime, imposed by unwritten rules and culture or by an unreasonable schedule, I do not agree with it.

As a knowledge worker, if you cannot set aside time for thinking, energy for hobbies, exercise on the sports field, and exchanges of experience with friends, your work will be unsustainable in the long run.

Indeed, in this era, technical professionals are in high demand. Especially at companies where humanistic care is done well, the benefits beyond salary are also very good. However, our dreams matter more. If there is a conflict between dream and stability, I suggest still choosing the dream: only by doing what we truly love can we keep doing it for a lifetime. When your career is your dream, you will love your working hours from the bottom of your heart.

My view of IT development and its future

Looking back over these two decades of work, I can see the industry environment developing in four respects:

Hardware: memory has gone from a few MB to several GB, and hard disks from hundreds of MB to hundreds of GB. Never mind tape; even the disks that used to be common can no longer be found.

Programming languages: perhaps few people know COBOL now, though it used to be very popular in banks. C is still going strong, while Python and JS are becoming more and more popular.

Network conditions: from dial-up modems of a few tens of kbit/s to today’s 100 Mbit/s home fiber broadband and 4G on mobile phones, the two cannot be mentioned in the same breath.

Industry: in the past 20 years, more and more languages and concepts have become popular, but much of the basic software and hardware has not changed. The server is x86, the OS is Linux, Java and PHP run on Tomcat and Apache, many databases are MySQL and Oracle, and the network protocol is TCP/IP. What has changed is that we depend on IT more and more and make ever stricter demands of it, which has produced many new concepts and products, such as e-commerce, e-payment, and constantly upgraded smartphones.

In the future, with the evolution of software, hardware and network, people hope to get efficient and secure services anytime and anywhere. I think it will be reflected in the following three aspects:

Cloud: most people only need one access terminal to handle business, which may be a mobile phone, tablet or the surface of a building. They can handle company and family affairs at any time.

Intelligence: many jobs are handed over to intelligent robots. The intelligent housekeeper at home is responsible for cleaning, cooking and security; the intelligent machinery in the factory runs the assembly line; investment is entrusted to intelligent advisers.

Security: from buying clothes, books and mobile phones to transferring money, managing finances and communicating on a mobile phone, our requirements for security keep rising, so this area is also promising. Therefore, cloud-based, intelligent and secure IT services will be the direction of technology development.

For those of us who work in operation and maintenance, these trends mean that we should keep up with, and lead, the development of technology.

First, move operation and maintenance to the cloud, so that every team can check the health of the business anytime and anywhere, and build, test and release conveniently;

Second, use machine learning to make operation and maintenance more intelligent. We need to explore how to alert only on the root cause so that we are not swamped by symptoms, how to repair typical faults automatically, and how to expand capacity automatically in advance to avoid incidents;

Finally, on security: beyond securing the operation and maintenance platform itself, how can we help the business achieve better security? Taking e-commerce as an example, security projects such as one-click peak shaving, CC attack protection and anti-scalper measures let the business run safely and smoothly.
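To make the idea of alerting only on the root cause more concrete, here is a minimal Python sketch. It is not a description of the Ledao platform and it uses a simple dependency rule rather than machine learning: it assumes we already have a dependency map between components (for example from a CMDB) and a set of components that are currently alerting, and that the dependency graph has no cycles. The function name and all component names are hypothetical.

```python
from typing import Dict, List, Set


def root_cause_alerts(depends_on: Dict[str, List[str]], alerting: Set[str]) -> Set[str]:
    """Keep only alerting components that have no alerting dependency.

    depends_on maps a component to the components it depends on
    (hypothetical data, assumed to form an acyclic graph).
    """

    def has_alerting_dependency(node: str) -> bool:
        # Depth-first walk over direct and transitive dependencies.
        seen: Set[str] = {node}
        stack = list(depends_on.get(node, []))
        while stack:
            dep = stack.pop()
            if dep in seen:
                continue
            seen.add(dep)
            if dep in alerting:
                return True
            stack.extend(depends_on.get(dep, []))
        return False

    return {node for node in alerting if not has_alerting_dependency(node)}


# Hypothetical example: the database fault explains the two alerts above it.
depends_on = {
    "checkout": ["order-service"],
    "order-service": ["mysql-primary"],
    "search": ["search-index"],
}
alerting = {"checkout", "order-service", "mysql-primary", "search"}
roots = root_cause_alerts(depends_on, alerting)
```

In this example only `mysql-primary` and `search` would be reported; the alerts on `checkout` and `order-service` are treated as symptoms of the database fault. A real system would also have to deal with incomplete dependency data and alerts that arrive at different times.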

At present, the main focus of our research and development is the private cloud, which has a solid foundation and integrates conveniently with the public cloud. No. 1 Store has many successful implementation experiences here; integrating with the public cloud requires connecting the public and private clouds quickly through automated deployment, release, configuration and so on.