With the rapid growth of JD Cloud's business, the number of physical machines, virtual machines, and containers of all kinds that must be managed has reached the hundreds of thousands. Faced with managing such a large fleet, JD Cloud recognized that it needed to build its own efficient, secure, and stable resource machine management system: one that would lay a solid foundation for JD Cloud's businesses and give reliable backing to the entire JD Group. The "Door God" system was born out of this need, and it has become more mature and stable after being tested by JD's 618, 11.11, and other major shopping events.
As the name implies, Door God is the guardian that protects the security of JD's entire cloud resource fleet. It is an online machine operations and maintenance (O&M) platform, developed in-house by JD Cloud, whose authorization is based on roles in the service tree. The platform supports authenticated login, system O&M, and security auditing, and provides unified access control and operation history for all hosts on the JD Cloud platform. Following the 4A model (account, authentication, authorization, audit) of professional O&M audit systems, it builds a unified, efficient, and secure O&M channel, ensures that cloud O&M work complies with laws and regulations, reduces human-caused security risks, and improves O&M efficiency.
To keep pace with JD Cloud's rapid business growth and the exponential increase in the number of physical machines, virtual machines, and containers to manage, and to meet the company's requirements for security authentication, efficient O&M, operation auditing, and access control, Door God set the following goals at the start of its design:
– Security authentication
Two-factor authentication is supported: QR codes, dynamic tokens, and similar techniques contain the risk of leaked accounts and passwords and prevent the identities of O&M personnel from being misused or reused.
– Efficient operation and maintenance
An in-house SSH interactive interface that is simple and easy to use makes it convenient to manage large numbers of hosts, simplifies O&M and security operations, and improves O&M efficiency. After a successful login, Door God also supports password-free hopping between resource machines.
– Operational audit
Every action taken by O&M personnel is recorded end to end. Recorded operations can be queried along various dimensions and played back. This full-process audit of O&M operations guarantees that events can be traced and serves as the basis for incident analysis.
– Access control
The JD Cloud service tree serves as the source of account and resource machine authorization information. This unifies the O&M entry point, the permission mapping between natural persons and host accounts, and the control point for auditing O&M operations.
– High availability
Every Door God module is designed and deployed in a distributed manner, so a problem on a single node does not affect the service provided by the system as a whole. The system supports tens of thousands of concurrent O&M sessions.
1. Key technical points
Door God draws on many technologies; the core ones are:
– Multi-factor authentication
Users log in to the relay with a password plus a verification code. For the verification code, users can choose either a 6-digit numeric code from Cloud Wing (Yunyi) or QR code scan verification through jingme, JD's internal work client.
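The article does not say how Cloud Wing generates its 6-digit codes. Dynamic-token codes of this kind are commonly produced with TOTP (RFC 6238), so the Python sketch below assumes that scheme; the function names, the 30-second step, and the one-step clock-skew window are illustrative choices, not Door God's documented behavior.

```python
import base64
import hashlib
import hmac
import struct
import time
from typing import Optional


def totp(secret_b32: str, at: Optional[float] = None, step: int = 30, digits: int = 6) -> str:
    """Compute an RFC 6238 TOTP code (HMAC-SHA1) for Unix time `at` (default: now)."""
    key = base64.b32decode(secret_b32, casefold=True)
    counter = int((time.time() if at is None else at) // step)
    mac = hmac.new(key, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F  # dynamic truncation as defined in RFC 4226
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


def verify_code(secret_b32: str, submitted: str, at: Optional[float] = None,
                window: int = 1, step: int = 30) -> bool:
    """Accept the code for the current step or up to `window` steps either side (clock skew)."""
    now = time.time() if at is None else at
    return any(
        hmac.compare_digest(totp(secret_b32, now + i * step, step), submitted)
        for i in range(-window, window + 1)
    )
```

With the RFC 6238 test secret `12345678901234567890` (Base32 `GEZDGNBVGY3TQOJQGEZDGNBVGY3TQOJQ`), `totp(secret, at=59, digits=8)` returns `94287082`, one of the RFC's published test vectors.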
– Kerberos authentication
Door God authenticates user identity with the Kerberos protocol, which provides secure login over an insecure network. After logging in to the relay once, users can hop between the resource machines they are authorized for without entering a password again.
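The article does not describe how the password-free hop is wired up. One plausible setup with stock OpenSSH, assumed here, is GSSAPI authentication plus credential delegation, so the Kerberos ticket obtained at the relay travels with the user. The hypothetical helper below only builds the command line; the target host's sshd would also need `GSSAPIAuthentication yes` and a host keytab.

```python
from typing import List


def hop_command(user: str, host: str) -> List[str]:
    """Build an OpenSSH command line that logs in with the user's Kerberos ticket only."""
    return [
        "ssh",
        # Authenticate with the Kerberos (GSSAPI) ticket obtained at relay login.
        "-o", "GSSAPIAuthentication=yes",
        # Delegate the ticket so the target host can in turn authenticate the next hop.
        "-o", "GSSAPIDelegateCredentials=yes",
        # Disable password and keyboard-interactive fallbacks.
        "-o", "PasswordAuthentication=no",
        "-o", "KbdInteractiveAuthentication=no",
        f"{user}@{host}",
    ]
```

Running it is then `subprocess.run(hop_command("alice", "vm-api-01.example.internal"))`, where the host name is made up. Delegating tickets to every hop is a trust decision: any host that receives a ticket can act as the user until the ticket expires.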
– Nsswitch extension
Nsswitch is used in place of /etc/passwd to obtain user information. The extension is deployed on every resource machine and obtains the UID and GID of the login account from the Door God API module.
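A real NSS extension is a C shared library exporting functions such as `_nss_<service>_getpwnam_r`, which glibc calls when a program looks up a user. The Python sketch below shows only the data mapping such a module performs: turning a user record from the Door God API into the seven-field `/etc/passwd` format. The JSON field names and the default home directory and shell are assumptions.

```python
import json
from typing import Optional


def passwd_line(api_body: str) -> Optional[str]:
    """Turn a (hypothetical) Door God API user record into an /etc/passwd-format entry.

    Returns None when the API reports no such user; an NSS module would
    report that back to glibc as "not found".
    """
    rec = json.loads(api_body)
    if not rec:
        return None
    name = str(rec["name"])
    uid, gid = int(rec["uid"]), int(rec["gid"])
    gecos = str(rec.get("gecos", ""))
    home = str(rec.get("home", f"/home/{name}"))
    shell = str(rec.get("shell", "/bin/bash"))
    # Reject values that would corrupt the colon-separated format.
    if uid < 0 or gid < 0 or any(":" in f or "\n" in f for f in (name, gecos, home, shell)):
        raise ValueError(f"malformed user record: {rec!r}")
    # "x": no password hash is served here; authentication goes through Kerberos.
    return ":".join([name, "x", str(uid), str(gid), gecos, home, shell])
```

On a configured machine, `getent passwd alice` would print a line in this same format.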
– Sudoers permission control
Role-based user permission control is implemented with sudoers: O&M roles can operate under the root account, while development roles can operate only under their own accounts and cannot use sudo.
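The role-to-sudoers mapping can be sketched as a small renderer. The role names `ops` and `dev`, the `NOPASSWD` choice (assuming accounts log in through Kerberos and have no local password), and the user-name pattern are all assumptions for illustration.

```python
import re
from typing import Dict

VALID_USER = re.compile(r"^[a-z_][a-z0-9_-]*$")


def sudoers_fragment(grants: Dict[str, str]) -> str:
    """Render a sudoers fragment for one host from {login name: role}.

    Hypothetical roles: "ops" may run any command as root; "dev" gets no
    rule at all, so sudo is refused for that account by default.
    """
    lines = []
    for user, role in sorted(grants.items()):
        if not VALID_USER.match(user):
            raise ValueError(f"refusing unsafe user name: {user!r}")
        if role == "ops":
            # NOPASSWD: these accounts authenticate via Kerberos and have no local password.
            lines.append(f"{user} ALL=(root) NOPASSWD: ALL")
        elif role != "dev":
            raise ValueError(f"unknown role: {role!r}")
    return "".join(line + "\n" for line in lines)
```

The output would go into a file under `/etc/sudoers.d/` and be checked with `visudo -c -f <file>` before installation, since a syntax error in sudoers can lock everyone out of sudo.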
– Security mechanisms
Interaction between internal modules is restricted by a whitelist; passwords are only signed, never transmitted; stored passwords are protected with asymmetric encryption; an automatic blacklist blocks brute-force attempts; and a timed expiration mechanism is in place.
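Several of these mechanisms can be combined in one request gate. The sketch below assumes HMAC-SHA256 signing over method, path, and timestamp (so the shared secret itself is never sent), a five-minute expiry, a per-IP sliding-window blacklist, and a source-IP whitelist. Every constant and name is hypothetical, and password storage with asymmetric encryption is not covered.

```python
import hashlib
import hmac
import time
from collections import defaultdict, deque
from typing import Deque, Dict, Iterable, Optional

MAX_SKEW = 300        # hypothetical: a signed request expires after 5 minutes
MAX_FAILURES = 5      # hypothetical: failures tolerated per window before blacklisting
FAILURE_WINDOW = 600  # hypothetical: sliding window length in seconds


def sign(secret: bytes, method: str, path: str, ts: int) -> str:
    """HMAC-SHA256 over the request; only the signature goes on the wire."""
    msg = f"{method}\n{path}\n{ts}".encode()
    return hmac.new(secret, msg, hashlib.sha256).hexdigest()


class RequestGate:
    def __init__(self, secret: bytes, whitelist: Iterable[str]):
        self.secret = secret
        self.whitelist = set(whitelist)
        self.failures: Dict[str, Deque[float]] = defaultdict(deque)

    def _blacklisted(self, ip: str, now: float) -> bool:
        recent = self.failures[ip]
        while recent and now - recent[0] > FAILURE_WINDOW:
            recent.popleft()  # forget failures that fell out of the window
        return len(recent) >= MAX_FAILURES

    def check(self, ip: str, method: str, path: str, ts: int, signature: str,
              now: Optional[float] = None) -> bool:
        now = time.time() if now is None else now
        if ip not in self.whitelist or self._blacklisted(ip, now):
            return False
        fresh = abs(now - ts) <= MAX_SKEW
        if fresh and hmac.compare_digest(sign(self.secret, method, path, ts), signature):
            return True
        self.failures[ip].append(now)  # expired or forged: counts toward blacklisting
        return False
```

Counting expired requests as failures is a deliberate choice here: replaying an old captured signature then also pushes the sender toward the blacklist.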
2. Design details
As the overall architecture diagram of Door God shows, its data source is the service tree module, the core data module developed by JD Cloud. All user and resource machine information is obtained from the service tree module and saved to the database and to Kerberos, and changes in the service tree are synchronized by script in real time so that Door God always holds current data. The main modules of Door God are the relay, Kerberos, the relay server, the Door God API, and the DG client, which must be installed on each resource machine. The design and implementation of the core modules are described below.
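Authorization "based on the service tree" can be modeled as follows: a role granted on a tree node covers every host placed under that node. The snapshot, node paths, and role names below are invented for illustration; the article does not describe the real service tree schema.

```python
from typing import Dict, Iterator, List, Optional, Tuple

# Hypothetical service-tree snapshot: node -> parent (None for the root).
PARENT: Dict[str, Optional[str]] = {
    "jdcloud": None,
    "jdcloud/compute": "jdcloud",
    "jdcloud/compute/vm-api": "jdcloud/compute",
    "jdcloud/storage": "jdcloud",
}
# Hypothetical host placement and role grants (user, node, role).
HOST_NODE: Dict[str, str] = {"10.1.2.3": "jdcloud/compute/vm-api", "10.9.9.9": "jdcloud/storage"}
GRANTS: List[Tuple[str, str, str]] = [
    ("alice", "jdcloud/compute", "ops"),
    ("bob", "jdcloud/compute/vm-api", "dev"),
]


def ancestors(node: Optional[str]) -> Iterator[str]:
    """Yield the node itself and every ancestor up to the root."""
    while node is not None:
        yield node
        node = PARENT[node]


def role_on_host(user: str, ip: str) -> Optional[str]:
    """A grant on a node covers every host below it; "ops" outranks "dev"."""
    node = HOST_NODE.get(ip)
    if node is None:
        return None
    path = set(ancestors(node))
    roles = {role for u, n, role in GRANTS if u == user and n in path}
    for role in ("ops", "dev"):
        if role in roles:
            return role
    return None
```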
– Relay module
This module is the front end through which users log in to Door God. It runs as an independent container, and its sshd has Kerberos authentication enabled. Users log in over SSH with their user name and password and enter the container after Kerberos authentication. The login shell then performs a second verification: the user must enter the verification code obtained from Cloud Wing or scan a QR code with jingme, and only after this second check succeeds does the user reach the user interface.
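The second verification in the login shell can be reduced to a small gate that reads a code and checks it a limited number of times. The injected callables and the three-attempt limit are assumptions; the article does not say how many retries are allowed.

```python
from typing import Callable


def second_factor_gate(read_code: Callable[[], str], verify: Callable[[str], bool],
                       max_attempts: int = 3) -> bool:
    """Return True once a submitted code verifies; False after `max_attempts` tries.

    `read_code` and `verify` are injected so the verification back end
    (hypothetical here) can be swapped without touching the retry logic.
    """
    for _ in range(max_attempts):
        code = read_code().strip()
        if code and verify(code):
            return True
    return False
```

In a login shell, `read_code` could be `lambda: input("Verification code: ")`. Returning False would lead the shell to exit, which closes the SSH session before the user reaches the interface.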
The Door God user interface has been refined repeatedly by the development team to be simple and easy to use, with natural, smooth interaction. The main area lists the resource machines the user is authorized to access, the right-hand panel shows shortcut keys and login history, and the input area is at the bottom. Users can log in to a resource machine in any of the following ways:
a. Enter the index number of a resource machine shown in the main interface;
b. Enter an IP address to log in directly;
c. Enter an application name or IP for fuzzy matching, then select the specific resource machine;
d. Enter the index number of an entry in the login history on the right, prefixed with “!”.
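The four input forms can be told apart with simple rules. In this sketch the `Host` type, the 1-based history numbering, the restriction of direct IP logins to authorized hosts, and the use of plain substring matching as the "fuzzy" match are all assumptions.

```python
import ipaddress
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Host:
    index: int  # number shown on the main screen
    ip: str
    app: str    # application name from the service tree


def resolve(raw: str, hosts: List[Host], history: List[str]) -> Union[str, List[Host], None]:
    """Map one line of input to an IP to log in to, a list of candidates, or None."""
    text = raw.strip()
    if not text:
        return None
    if text.startswith("!"):                      # d. "!n" = n-th login history entry
        n = text[1:]
        if n.isdigit() and 1 <= int(n) <= len(history):
            return history[int(n) - 1]
        return None
    try:                                          # b. a literal IP address
        ipaddress.ip_address(text)
        # Only hosts the user is authorized for are reachable.
        return text if any(h.ip == text for h in hosts) else None
    except ValueError:
        pass
    if text.isdigit():                            # a. index number from the main screen
        for h in hosts:
            if h.index == int(text):
                return h.ip
        return None
    needle = text.lower()                         # c. substring match on app name or IP
    return [h for h in hosts if needle in h.app.lower() or needle in h.ip]
```

Checking for a full IP address before the bare-number case keeps an input like `2` from being misread, since `ipaddress.ip_address` rejects it and it falls through to the index lookup.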
A recorder hijacks the TTY to capture the screen. All user operations are recorded and sent through syslog to a Kafka cluster, from which they are consumed and stored in an Elasticsearch (ES) cluster that serves as the data source for operation audit queries.
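The recorder's job is to tap the terminal stream and ship timestamped chunks. This Python sketch uses a made-up JSON-lines record format; `make_master_read` fits the `master_read` hook of the standard `pty.spawn()`, which is one way to sit between a user and a shell. Forwarding the lines to syslog (for example with `logging.handlers.SysLogHandler`) is left out.

```python
import codecs
import json
import os
import time
from typing import Callable, List, Optional, Tuple


class SessionRecorder:
    """Collect timestamped terminal output for one session (hypothetical record format)."""

    def __init__(self, session_id: str, user: str, host: str, start: Optional[float] = None):
        self.meta = {"sid": session_id, "user": user, "host": host}
        self.start = time.time() if start is None else start
        # Incremental decoder: a UTF-8 character split across two reads is not garbled.
        self._decoder = codecs.getincrementaldecoder("utf-8")(errors="replace")
        self.events: List[Tuple[float, str]] = []

    def record(self, data: bytes, now: Optional[float] = None) -> None:
        text = self._decoder.decode(data)
        if text:
            elapsed = (time.time() if now is None else now) - self.start
            self.events.append((round(elapsed, 3), text))

    def syslog_lines(self) -> List[str]:
        """One JSON message per chunk, ready to hand to a syslog forwarder."""
        return [json.dumps({**self.meta, "t": t, "data": d}) for t, d in self.events]


def make_master_read(rec: SessionRecorder) -> Callable[[int], bytes]:
    """Build a `master_read` hook for pty.spawn() that records everything it relays."""
    def _read(fd: int) -> bytes:
        data = os.read(fd, 1024)
        rec.record(data)
        return data
    return _read


def replay(lines: List[str]) -> str:
    """Rebuild the screen output in time order (real playback would also honor the delays)."""
    events = sorted((json.loads(line) for line in lines), key=lambda e: e["t"])
    return "".join(e["data"] for e in events)
```

Usage would look like `pty.spawn(["/bin/bash"], make_master_read(rec))`; the same timestamps that drive `replay` could also drive the playback feature used during audits.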
– Kerberos module
As the security authentication module of Door God, it is key to keeping O&M on JD Cloud's hundreds of thousands of online machines secure.
A script obtains resource machine information from the service tree and user information from the Door God database (MySQL), registers them in the module's own Kerberos database, and resynchronizes every minute to keep the data current.
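Per-minute synchronization can be done as a set difference between what the service tree says should exist and what the KDC holds. The sketch below covers host principals only and emits MIT Kerberos `kadmin` queries (`addprinc -randkey`, `delprinc -force`); host names, the realm, and the function name are hypothetical.

```python
from typing import Iterable, List


def principal_sync_commands(service_tree_hosts: Iterable[str],
                            kdc_principals: Iterable[str], realm: str) -> List[str]:
    """Diff service-tree hosts against existing host principals; return kadmin queries."""
    suffix = "@" + realm
    wanted = {f"host/{h}{suffix}" for h in service_tree_hosts}
    # Only manage host/ principals in this realm; leave krbtgt, admin and user principals alone.
    present = {p for p in kdc_principals if p.startswith("host/") and p.endswith(suffix)}
    commands = [f"addprinc -randkey {p}" for p in sorted(wanted - present)]
    commands += [f"delprinc -force {p}" for p in sorted(present - wanted)]
    return commands
```

Each query could be run with `kadmin.local -q "<query>"`, with the current principal list taken from `kadmin.local -q listprincs`. New hosts would also need their keys exported to a keytab (`ktadd`) before sshd on them can accept Kerberos logins.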
– DG client
The DG client is a shared library (.so file) written in C. Every resource machine managed by Door God must install this library and reference it in nsswitch.conf, so that user information can be retrieved through it. In addition, the Kerberos configuration file (krb5.conf) must be downloaded to the /etc directory of the resource machine.
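Hooking the DG client into name lookups means adding its service name to the relevant lines of `/etc/nsswitch.conf`; glibc then loads `libnss_<service>.so.2`. The sketch assumes the service is named `dg` and that only `passwd` and `group` need it; both are assumptions, since the article gives neither the library name nor the database list.

```python
import re
from typing import Iterable

LINE = re.compile(r"^(\s*)([A-Za-z_]+)(\s*:\s*)(.*)$")


def enable_nss_service(conf: str, service: str = "dg",
                       databases: Iterable[str] = ("passwd", "group")) -> str:
    """Append `service` to the source list of each database in nsswitch.conf text.

    Idempotent: running it twice yields the same text. Databases missing from the
    file get a new "<db>: files <service>" line so local files are still consulted first.
    """
    wanted = list(databases)
    seen = set()
    out = []
    for line in conf.splitlines():
        m = LINE.match(line)
        if m and m.group(2) in wanted:
            seen.add(m.group(2))
            body, hash_, comment = m.group(4).partition("#")  # keep trailing comments
            sources = body.split()
            if service not in sources:
                sources.append(service)
            line = m.group(1) + m.group(2) + m.group(3) + " ".join(sources)
            if hash_:
                line += " " + hash_ + comment
        out.append(line)
    out += [f"{db}: files {service}" for db in wanted if db not in seen]
    return "\n".join(out) + "\n"
```

Appending `dg` after the existing sources keeps local files first, so `root` and other local accounts still resolve if the Door God API is unreachable.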
– Log storage
Door God logs are collected by the syslog service and sent to a Kafka cluster. A log parsing service consumes and parses the data from Kafka and sends the parsed results to the ES cluster, while the raw, unparsed data is archived to JD Cloud OSS.
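The parsing service's core step can be sketched as turning one syslog line into a document for ES. The message layout (an RFC 3164 header followed by a JSON payload) and the field names are assumptions; lines that do not parse return `None`, which in the pipeline described above would leave them only in the raw OSS archive.

```python
import json
import re
from typing import Any, Dict, Optional

# RFC 3164-style line: "<PRI>Mmm dd hh:mm:ss HOST TAG[PID]: MESSAGE"
SYSLOG = re.compile(
    r"^<(?P<pri>\d{1,3})>(?P<ts>[A-Z][a-z]{2} [ \d]\d \d\d:\d\d:\d\d) "
    r"(?P<relay>\S+) (?P<tag>[^:\[\s]+)(?:\[\d+\])?: (?P<msg>.*)$"
)


def parse_line(line: str) -> Optional[Dict[str, Any]]:
    """Turn one raw syslog line into an ES document; None means 'archive raw only'."""
    m = SYSLOG.match(line)
    if m is None:
        return None
    try:
        payload = json.loads(m["msg"])
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return None
    if not isinstance(payload, dict):
        return None
    pri = int(m["pri"])
    doc = {"relay": m["relay"], "tag": m["tag"], "syslog_ts": m["ts"],
           "facility": pri >> 3, "severity": pri & 0x07}  # PRI = facility * 8 + severity
    doc.update(payload)
    return doc
```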
In Cloud Wing, operation logs can be queried by target IP, operation type, keyword (exact or fuzzy match), and time, enabling a full-process audit of user operations. Door God also supports playback of user sessions, which makes auditing clearer still.
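A query over those dimensions maps naturally onto an Elasticsearch `bool` query, which would be sent to the cluster's `_search` endpoint. The field names (`host`, `op_type`, `cmd`, `@timestamp`) are hypothetical and assume `host` and `op_type` are mapped as keyword fields; `match_phrase` stands in for exact matching and `match` with `fuzziness` for fuzzy matching.

```python
from typing import Any, Dict, List, Optional


def build_audit_query(target_ip: Optional[str] = None, op_type: Optional[str] = None,
                      keyword: Optional[str] = None, fuzzy: bool = False,
                      start: Optional[str] = None, end: Optional[str] = None) -> Dict[str, Any]:
    """Build an Elasticsearch bool query over audit documents (field names are hypothetical)."""
    filters: List[Dict[str, Any]] = []
    if target_ip is not None:
        filters.append({"term": {"host": target_ip}})
    if op_type is not None:
        filters.append({"term": {"op_type": op_type}})
    if start is not None or end is not None:
        bounds = {}
        if start is not None:
            bounds["gte"] = start
        if end is not None:
            bounds["lte"] = end
        filters.append({"range": {"@timestamp": bounds}})
    must: List[Dict[str, Any]] = []
    if keyword is not None:
        if fuzzy:
            # Tolerates small typos, e.g. "sytemctl" still matches "systemctl".
            must.append({"match": {"cmd": {"query": keyword, "fuzziness": "AUTO"}}})
        else:
            # Exact phrase: the words must appear together and in order.
            must.append({"match_phrase": {"cmd": keyword}})
    return {"query": {"bool": {"filter": filters, "must": must}},
            "sort": [{"@timestamp": {"order": "asc"}}]}
```

Putting IP, type, and time in `filter` rather than `must` means they do not affect scoring and can be cached by Elasticsearch; only the keyword clause contributes to relevance.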
Door God provides a unified O&M entry point for JD Cloud's software developers, testers, and O&M staff. Users only need to remember one password to log in to every host they are authorized for and move freely between those hosts, which greatly reduces engineers' workload and improves efficiency. At the same time, it provides a technical safeguard for O&M security: auditing covers the entire O&M process, effectively protecting the online machines.
Door God is now the main platform for O&M of JD Cloud's online machines. It has provided O&M support for JD's 618, 11.11, and other major events, and has become an important force behind JD Cloud's rapid, high-quality growth. The Door God team is developing a bastion host (fortress machine) console product based on the Door God system, to be released in both open-source and commercial editions. We welcome your feedback.