The N hurdles of building a data center from scratch


Speaker: Mo Jiangcheng

Compiled by: Xibei

On the evening of December 15, Youpai Cloud held another of its internal knowledge-sharing sessions. Mo Jiangcheng, a network operations engineer at Youpai Cloud, gave his colleagues a talk on data centers. He explained in detail why data centers exist, what they are made of, how they are powered and where they are sited, and he compared data centers in China with those overseas.
Xiaopai has compiled Mo Jiangcheng’s talk for you: a comprehensive, practical guide to data centers. Read on!

Hello, I’m Mo Jiangcheng from the dcin group of the operations department. I’m mainly responsible for maintaining the network, data centers and infrastructure, and for EP (engineering productivity). Today I’m going to talk about data centers.

The significance of the data center

  1. Hosting servers

  2. Everywhere

  3. Supporting the world

  4. Economies of scale

  5. Full redundancy

The data center is very low-level infrastructure, and it has not been around for very long.
Wikipedia has an interesting name for a data center: a “server farm”, that is, a large site that houses a large number of servers. At its most basic, a data center gives servers a place to live.

For example, a data center is where thousands of terabytes of data, countless terabytes of customer origin data, and the access requests from CDN edge nodes with terabit-level bandwidth are all managed in one place.

Suppose I serve only a few dozen people a day: a single computer is enough. As the number of users grows, bandwidth runs short and a dedicated line has to be brought in. As it grows again, the reliability of power and network becomes a problem, which means substantial upgrades to the existing facilities plus dedicated cooling equipment, because servers packed together produce an enormous amount of heat. The waste heat from about 5,000 cabinets in daily operation is enough to heat a university campus. This cooling equipment also raises the operating cost of the data center, which is part of what is generally called TCO, the total cost of ownership.

At present, hundreds of thousands of data centers support the whole Internet. Without them, the Internet could not have grown to its current scale. The larger a data center is, the better its ratio of benefit to expenditure and the lower its unit cost.

High reliability is the lifeline of a data center, because it has to provide timely and dependable support for every server it hosts. Power, network and cooling systems are all redundant, and a building management system (BMS) is required for environmental control. The security system is also particularly important. As the number of employees grows, you can no longer know everyone, and who may enter the data center and who may manage a server becomes a real problem. A data center helps the enterprise control personnel rights: it verifies who people are and grants them the corresponding access or operating permissions.

Composition of the data center

  1. Building: main structure

  2. Environmental control: temperature and humidity, lighting

  3. Safety: fire protection and security

  4. Network: generic cabling

  5. Power: utility, UPS, generator

The three things a server depends on most are power, temperature and network. A data center exists to serve servers, so it has to guarantee all three. Maintaining a data center is a very complex piece of systems engineering.

The main components of a data center are relatively simple, although the building structure is also involved.
The most important job of the data center is to power the whole facility, which involves the utility feed, the UPS and the generators.

Servers have requirements for operating temperature and humidity, and different servers may have different optimal conditions. The data center regulates the whole environment to keep servers in their best operating range.

The data center also has safety requirements covering fire protection and security. Security is mainly about controlling who is authorized to do what. For fire protection there are early-warning and smoke detection systems. A high-grade computer room can generally evacuate everyone within a few minutes and then extinguish the fire by releasing inert gas.

The people involved in running a data center include high-voltage electrical engineers, low-voltage engineers, HVAC and plumbing staff, fire and security personnel, BMS automation and power/environment monitoring staff, network engineers, network monitoring staff, and IT software and hardware maintenance staff. Support personnel from equipment suppliers, for example for the UPS and the generators, are also part of the picture.

The job of a data center is to provide safe, stable and favorable operating conditions for servers under any circumstances. Some facilities even draw up emergency plans for riots or terrorist intrusion; Google’s data centers, for example, install vehicle barriers at the front gate to stop a car from ramming through.

Building and site selection of the data center

  1. Close to: users, service targets, power, cooling resources, transportation hubs, backbone networks

  2. Away from: natural disaster risks, unstable regions, hot spots

  3. Cheap electricity: local electricity costs, wind, water, solar energy

  4. Cool: a climate that is warm in winter and cool in summer

  5. Large site: room for a large number of supporting facilities

Building and siting the computer room is a very important part of the preparatory work for a data center. Buildings generally fall into two types: warehouse type and office-building type.

1. Warehouse type: a tile structure that is relatively low, generally three floors at most, and easier to build;
2. Building type: in China, computer rooms of this type are usually converted from office buildings.

The site should avoid sources of instability such as natural disasters, unstable regions and hot spots. Of course, the location is often not entirely within your control. In Japan, for example, it is very hard to find a site where an earthquake will never happen.

Sometimes a data center is sited in a particular place because of the function it serves. A typical example is Youpai Cloud’s CDN nodes. The closer a CDN node is to the user, the better, so Youpai Cloud places its own servers in the main first- and second-tier cities of each province. The main reason for choosing data centers in these locations is physical proximity to the users they serve.

The data center’s demand for resources

  1. Electricity: choose a place where electricity is cheap;

  2. Cooling: choose an area with a low annual average temperature, or a site near the sea or a river, so the water can be used in the water-cooling loop;

  3. Transport: choose a site close to the main arteries of the transport network;

  4. Space: many computer rooms take up a great deal of room, so choose a site with enough area.

△ Microsoft chose to build a data center under water

This is the site Microsoft chose for one of its data centers, an extreme case of chasing cooling water. They put the racks in a sealed tank, sink it directly to the bottom of the water, and cool the servers inside through a water-cooling loop.

Energy security of data center

  1. At least two utility power feeds

  2. At least two sets of UPS

  3. Generators sized for full load, in an N + 1 configuration

Energy is the most important thing in a data center; power is the computer room’s lifeline. A data center with 5,000 cabinets draws as much electricity as a small city or a university, and that much power is not available just anywhere.

Early in the planning of a computer room, the operator generally negotiates with the local power grid to confirm that it can provide power access and that there are at least two substations nearby to supply it. Some high-grade data centers are even connected to a third utility feed. For example, the computer room in Building 1 of the Unicom provincial hub in Hangzhou sits inside China Unicom’s hub building, so its power guarantees are especially strong: three substations in different locations feed the computer room directly. Its power security level is therefore very high, and it is very unlikely that all utility power and all backup power will fail at once.

△ 35 kV high-voltage power access equipment

This is a 35 kV high-voltage power access facility. High-voltage power comes mainly from the substation, typically at 10 kV, 35 kV or 110 kV, depending on the scale and load of the computer room.

△ Data center generator

Next, let’s talk about generators. In some data centers a single generator costs 6.7 million yuan, and more than ten of them are installed. The highest safety standard is for the generators and the UPS to be fully duplicated (100% redundancy), so that if either system fails completely, the other can still carry the full load. There is a Tier IV data center in Luxembourg, in Europe, called data hub. It uses two identical sets of generators, either of which can carry the data center’s full power load.
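
The difference between an N + 1 configuration and two full sets (often written 2N) can be checked with a few lines of arithmetic. The sketch below is illustrative only: the 8,000 kW load and the 2,000 kW generator rating are made-up figures, and the function names are hypothetical.

```python
def surviving_capacity_kw(unit_kw, units, failed):
    """Capacity left after `failed` identical generators drop out."""
    return unit_kw * max(units - failed, 0)


def tolerates_failures(load_kw, unit_kw, units, failed):
    """True if the remaining generators can still carry the full load."""
    return surviving_capacity_kw(unit_kw, units, failed) >= load_kw


# Hypothetical facility: 8,000 kW full load, 2,000 kW per generator.
LOAD_KW = 8000
UNIT_KW = 2000

n = -(-LOAD_KW // UNIT_KW)  # generators needed with no spare (ceiling division)
n_plus_1 = n + 1            # N + 1: one spare generator
two_n = 2 * n               # 2N: two complete, independent sets

print(tolerates_failures(LOAD_KW, UNIT_KW, n_plus_1, failed=1))  # True: one unit lost
print(tolerates_failures(LOAD_KW, UNIT_KW, n_plus_1, failed=n))  # False: a whole set lost
print(tolerates_failures(LOAD_KW, UNIT_KW, two_n, failed=n))     # True: a whole set lost
```

Under these assumed numbers, N + 1 survives the loss of a single generator, while only the fully duplicated layout survives the loss of an entire set.
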
Speaking of generators, here is an interesting story. When a hurricane hit the United States, one computer room lost utility power and had to switch to its generators. The hurricane had not yet passed, and the fuel company could not deliver replacement fuel in time. Most data center generators can only power the computer room for 8 to 10 hours at full load; beyond that, the computer room shuts down completely. A VPS provider emailed all its users, telling them to move to other routes. Instead, the lines in that computer room filled up, because everyone was curious about what it was like to run on diesel generators.

△ Battery system in the UPS of a data center

A data center contains a lot of batteries. The data center in the picture above has more than 30,000 of them, which can power the computer room for about two hours.
Compared with a generator, a UPS can only last a very short time, so a data center never treats the UPS as a long-term power source. In practice, the UPS usually carries the full load of the computer room for no more than a minute. In higher-grade computer rooms, as soon as the UPS takes over, supply is automatically switched to the diesel generators. A diesel generator needs 15 to 20 seconds to start up properly and come online, and this short window is where the UPS mainly does its work. Even so, the UPS should hold no less than half an hour of energy reserve; otherwise there is risk.
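
The two conditions above (the UPS must outlast the generator’s 15 to 20 second start-up, and it should still hold at least half an hour of reserve) can be written as a quick sanity check. This is a rough sketch: the function names are hypothetical, the battery and load figures are invented, and the runtime formula ignores inverter losses and battery aging.

```python
def ups_runtime_minutes(battery_kwh, load_kw):
    """Idealized battery runtime in minutes (no losses, no aging)."""
    return battery_kwh * 60 / load_kw


def ups_plan_ok(battery_kwh, load_kw, generator_start_s=20, min_reserve_min=30):
    """Check that the UPS bridges generator start-up and keeps the minimum reserve."""
    runtime_min = ups_runtime_minutes(battery_kwh, load_kw)
    covers_start = runtime_min * 60 > generator_start_s
    has_reserve = runtime_min >= min_reserve_min
    return covers_start and has_reserve


# Hypothetical 1,000 kW computer room.
print(ups_runtime_minutes(2000, 1000))  # 120.0 minutes, about two hours
print(ups_plan_ok(2000, 1000))          # True
print(ups_plan_ok(200, 1000))           # False: bridges start-up, but only 12 minutes of reserve
```
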

△ A simple power supply structure of a computer room

Let’s look at the power structure of the computer room.
The ATS is an automatic transfer switch. When utility power is interrupted, it automatically switches the input to the generator. Once the high-voltage supply reaches the low-voltage cabinet, it is stepped down to low-voltage power, which feeds the UPS and then the servers.

At the head of each row of cabinets there is a cabinet dedicated to power. It houses the power distribution equipment for the row.

The UPS feeds the head-of-row cabinet, and the servers, that is, the load, are connected beneath it. The UPS is always online and always supplying the servers; it is in use the whole time. This is why the data center does not blink when the power source is switched. The ATS has a switching time, usually between 15 and 30 ms, and a particularly sensitive machine will notice that gap, which can cause a service failure. With a UPS in between, the problem goes away, because the UPS keeps supplying power during the switch.
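
The effect of the online UPS during an ATS transfer can be sketched with a very small model. It is a simplification, not a description of any real switchgear: `load_drops` and the `holdup_ms` parameter (how long a machine’s own power supply can ride through a gap) are hypothetical names, and the 16 ms value is an assumed figure chosen only for illustration.

```python
def load_drops(transfer_ms, holdup_ms, online_ups):
    """True if the equipment loses power while the ATS switches sources."""
    if online_ups:
        # The UPS keeps feeding the load, so the transfer gap never reaches it.
        return False
    # Without a UPS, the machine only survives gaps shorter than its hold-up time.
    return transfer_ms > holdup_ms


HOLDUP_MS = 16  # assumed hold-up time of a sensitive machine

print(load_drops(30, HOLDUP_MS, online_ups=False))  # True: a 30 ms gap is felt
print(load_drops(15, HOLDUP_MS, online_ups=False))  # False: a 15 ms gap is ridden through
print(load_drops(30, HOLDUP_MS, online_ups=True))   # False: the UPS covers the gap
```
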

A better computer room usually has four sets of UPS. The power to each cabinet is generally split into two circuits, A and B, each fed by a completely independent UPS. As long as the server has redundant power supplies (commonly called “dual power”), losing one circuit, or a brief dropout on it, does not affect the server. The data center that Youpai Cloud will use in the future has four sets of UPS, two for circuit A and two for circuit B.
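
The A/B arrangement comes down to one rule: a server stays up as long as at least one of its power supplies sits on a live circuit. A minimal sketch, with hypothetical names:

```python
def server_is_up(psu_feeds, live_feeds):
    """A server runs while at least one of its power supplies is on a live circuit."""
    return any(feed in live_feeds for feed in psu_feeds)


dual_power = ["A", "B"]  # server with redundant power supplies
single_power = ["A"]     # server with a single power supply

# Circuit A has failed; only circuit B is live.
print(server_is_up(dual_power, {"B"}))    # True
print(server_is_up(single_power, {"B"}))  # False
```
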

Cooling system of the computer room

Computer room cooling is usually either air cooling or water cooling. Air cooling is air conditioning in the traditional sense and works on the same principle as a home air conditioner. The units used here are called precision air conditioners: they control ambient temperature and humidity very accurately to create the best operating conditions for servers.

△ Water cooling system

Water cooling works on a very different principle from air cooling. It is quite expensive and very bulky, because it needs chillers (the compressor and heat exchanger of a traditional air conditioner) and external cooling fans. The servers’ heat is carried outside through water pipes, then passed by the chiller or a plate heat exchanger to the outdoor radiator, where it is dissipated.

Advantages and disadvantages of the two systems

Water cooling: high cost, large volume, complex maintenance, high reliability, special design, high energy efficiency
Air cooling: low cost, small size, offset, simple maintenance, low energy efficiency

When the climate is relatively cool, a water-cooling system does not need to run the compressor at all: it can cool directly through the plate heat exchanger, which saves a great deal of energy.
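
The choice between the compressor path and the plate heat exchanger can be pictured as a simple threshold on outdoor temperature. The sketch below is only an illustration: the 10 °C switch-over point and the sample temperatures are invented, and real plants decide on more than a single reading.

```python
def cooling_mode(outdoor_temp_c, free_cooling_below_c=10.0):
    """Pick the cooling path from the outdoor temperature (threshold is an assumption)."""
    if outdoor_temp_c < free_cooling_below_c:
        return "plate heat exchanger"  # compressor stays off
    return "chiller"                   # compressor runs


def free_cooling_share(hourly_temps_c, free_cooling_below_c=10.0):
    """Fraction of the sampled hours in which the compressor can stay off."""
    free_hours = [
        t for t in hourly_temps_c
        if cooling_mode(t, free_cooling_below_c) == "plate heat exchanger"
    ]
    return len(free_hours) / len(hourly_temps_c)


print(cooling_mode(4))                     # plate heat exchanger
print(cooling_mode(25))                    # chiller
print(free_cooling_share([2, 8, 12, 20]))  # 0.5
```
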

△ Heat dissipation inside the cabinet

This is how a conventional computer room handles heat dissipation inside the cabinet. In the early days, computer rooms did not separate hot and cold air, just as few people worried about airflow paths inside early PCs. Over time, airflow came to be designed in advance, and various baffles are now used inside the cabinet to direct it. Cold air is forced across the equipment and its heat sinks, becomes hot air, and is exhausted from the rear of the chassis.

Heat dissipation in the data center as a whole works in a similar way. At first, with no separation of hot and cold air, air conditioners consumed a great deal of energy cooling by brute force, and there was no way to control whether an air conditioner was drawing in cold air or hot air. Later designs planned cold aisles to keep the cold and hot airflows in order.

△ Schematic diagram of cold air conversion

The box is two rows of cabinets facing each other. Red is waste heat and blue is cold air. Cold air from the nearby air-conditioning unit is pushed under the floor and rises through the prefabricated air duct beneath the cold aisle. The front of the server, its intake side, draws in cold air, which leaves the rear as hot air. The air conditioner then draws in the hot air and cools it, completing the cycle. This is currently the most commonly used cold-aisle arrangement.

A simpler cooling method is also common in relatively new computer rooms. Instead of planning a cold aisle between two rows of cabinets, an opening under each cabinet delivers cold air from the bottom; the cold air rises up the front of the servers and is drawn in. In effect, the cold aisle is built into each individual cabinet. This ensures that cold air is drawn up from below and hot air is exhausted at the rear. The front panel of this kind of cabinet is sealed.

The more elaborate the airflow planning, the higher the price. For example, with this structure you can lay ducts above the cabinets that carry the hot air point-to-point straight into the air-conditioning unit, so that the unit receives all the hot air that needs cooling. This reduces energy consumption even more effectively.

But what an enterprise cares about is total cost of ownership, not how much can be achieved in a single subsystem. So the choice of cooling method depends on the situation; the most advanced option is not automatically the one to use.

Google ran into this problem when building its data centers, because isolating cold aisles with large numbers of panels or metal parts can be very expensive. Google simply bought transparent plastic curtains from a convenience store and hung them from the ceiling of the computer room, using the plastic sheets to separate the hot and cold aisles. The cost was very low and the effect very good. According to Google, this cut PUE by more than half (PUE is the total energy consumed by the whole facility divided by the energy consumed by the IT equipment itself). This is a good way of thinking: when the problem is not very complex, a simple method can solve it completely.

As for PUE, computer rooms in China generally have a PUE between 2 and 2.5. Overseas operators such as Google and Facebook generally keep PUE below 1.3, so the gap is quite large.
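
To make the gap concrete, PUE is total facility energy divided by IT equipment energy, so everything above 1.0 is overhead such as cooling and power conversion. The sketch below applies the PUE values quoted in the talk to an invented IT load of 1,000 kWh; the function names are hypothetical.

```python
def pue(total_facility_kwh, it_equipment_kwh):
    """Power usage effectiveness: total facility energy / IT equipment energy."""
    return total_facility_kwh / it_equipment_kwh


def overhead_kwh(it_equipment_kwh, pue_value):
    """Energy spent on everything other than the IT equipment itself."""
    return it_equipment_kwh * (pue_value - 1.0)


IT_KWH = 1000.0  # assumed IT load, for illustration only

for value in (2.5, 2.0, 1.3):
    print(f"PUE {value}: {overhead_kwh(IT_KWH, value):.0f} kWh of overhead")
```

With these numbers, a facility at PUE 2.5 spends 1,500 kWh on overhead for every 1,000 kWh of IT load, while one at PUE 1.3 spends about 300 kWh.
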

Data centers need to withstand natural disasters

Excellent data centers consider all kinds of unexpected events from the early stages of construction. As I just mentioned, Google plans for violent terrorist attacks, while Japan plans more for natural disasters.

The probability of an earthquake, flood, tsunami, hurricane, storm, lightning strike or volcanic eruption is very low, and for any one computer room such an event may never happen. But when you have many computer rooms, or operate one for long enough, anything is possible. In my own work I have dealt with a computer room that was flooded and one that lost power after a lightning strike.
Let’s talk about earthquakes. When I visited Japan, I toured one of NTT’s computer rooms, and it left a deep impression on me.

There are two ways to resist earthquakes:

  1. Choose a site away from seismic zones;

  2. Raise the seismic rating of the building and the cabinets.

For earthquake resistance, data centers usually use a frame structure. In fact, the safest approach is to build the computer room underground, because earthquake damage comes mainly from surface waves, that is, the forces travelling along the ground that shake buildings from side to side. Building underground is therefore the safest way to resist earthquakes. Of course, it is also extremely expensive.

Let me focus on the frame structure. In a frame structure the load is carried by the building’s frame rather than by its walls. A frame structure can generally withstand a magnitude 8 earthquake. During the Wenchuan earthquake there was a telecom equipment room with a frame structure: it did not collapse, although its walls cracked.

Japan has a lot of advanced technology for earthquake resistance, because it is hard to find a place in Japan that is not in a seismic zone. Land there is also expensive, so they cannot rely on large, low frame structures. Computer rooms in Japan’s big cities are therefore generally multi-storey buildings, and a computer room more than ten floors up is especially at risk in an earthquake. Japan’s approach is to drive the foundations very deep, down to the granite layer, so the building can withstand extremely strong earthquakes without breaking.

The computer room in Japan really did feel like a building full of advanced technology.

△ Lateral damping

This is the suspension that absorbs lateral movement in the Japanese computer room. It is a hydraulic structure that lets the building avoid following the ground’s movement during an earthquake, which keeps the building stable. Shock-absorbing rubber absorbs longitudinal vibration during an earthquake.

△ Longitudinal damping

How Youpai Cloud selects computer rooms

For its core data centers, Youpai Cloud weighs all the factors described above and, through technical evaluation, also considers security, stability, reliability and network connectivity. Only after a strict evaluation do we decide whether a computer room meets Youpai Cloud’s needs and whether Youpai Cloud can use it.

△ Youpai Cloud’s data center in New York

Overseas data centers pack equipment very densely, which reflects the differences in layout habits and energy conditions between data centers in China and abroad. Each cabinet in an overseas data center can be supplied with 4.5 kW or more. In China the figure is about 3 kW.
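
The density difference translates directly into floor space. As a rough illustration, the sketch below counts the cabinets needed for the same IT load at the two per-cabinet power figures above; the 900 kW load is an invented example and `cabinets_needed` is a hypothetical helper.

```python
import math


def cabinets_needed(it_load_kw, kw_per_cabinet):
    """Smallest number of cabinets whose combined power budget covers the IT load."""
    return math.ceil(it_load_kw / kw_per_cabinet)


IT_LOAD_KW = 900  # assumed total IT load

print(cabinets_needed(IT_LOAD_KW, 4.5))  # 200 cabinets at 4.5 kW each
print(cabinets_needed(IT_LOAD_KW, 3.0))  # 300 cabinets at 3 kW each
```
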

△ Youpai Cloud’s data center in New York

In the pictures of the New York data center you can see nets that stop objects from falling, which is one big difference between China and overseas.

When foreign data centers are designed and planned, the human factor, or user experience, is specifically evaluated, so they feel very comfortable to use. Foreign data centers also care a great deal about appearance and take visual factors into account. When building a computer room they consider many things beyond the hard metrics.
China still falls short here. Domestic data centers do relatively well on hard metrics such as power and security, but they are still lacking in the details.

△ Youpai Cloud’s data center in Hong Kong

The equipment used by foreign data centers really is very reliable, but when you run into a problem that needs staff support or some flexibility, they may charge you $200 an hour for the service. In China, things like racking equipment, installing it and installing the operating system are free. Abroad, you may be charged $200 an hour. This reflects the greatness of us Chinese, because the computer rooms that Chinese people run abroad do all of this free of charge.
