TiDB Best Practices Series (6): Using HAProxy


Author: Li Zhongshu

HAProxy is free, open source software written in C that provides high availability, load balancing, and proxying for TCP- and HTTP-based applications. Well-known websites such as GitHub, Bitbucket, Stack Overflow, Reddit, Tumblr, Twitter, and Tuenti, as well as Amazon Web Services, use HAProxy.

TiDB servers are stateless compute nodes that can be scaled out horizontally, so they need a stable, high-performance load balancer in front of them that exposes a single service address to applications. HAProxy is widely used in the load balancing ecosystem. TiDB users can put this mature, stable open source tool in front of their production workloads to provide load balancing and high availability.

Introduction to HAProxy

HAProxy was written in 2000 by Willy Tarreau, a core contributor to the Linux kernel, who still maintains the project and publishes new releases for free to the open source community. Stable version 2.0.0 was released on June 16, 2019, and brought many new features.

Core features of HAProxy

  • High availability: HAProxy supports graceful shutdown of services and seamless switchover;
  • Load balancing: L4 (TCP) and L7 (HTTP) load balancing modes, with at least 9 balancing algorithms, such as roundrobin, leastconn, and random;
  • Health checks: HAProxy checks the status of the configured servers in HTTP or TCP mode;
  • Session persistence: if the application does not provide session persistence itself, HAProxy can provide it;
  • SSL: supports HTTPS communication and SSL termination;
  • Monitoring and statistics: a web page shows service status and detailed traffic information in real time.

HAProxy deployment

1. Hardware requirements

The official HAProxy documentation suggests the following minimum hardware configuration for an HAProxy server. Size the server according to the actual load in your environment and scale the configuration up from this baseline.

Hardware resource    Minimum configuration
CPU                  2 cores, 3.5 GHz
Memory               16 GB
Storage              50 GB (SATA disk)
Network card         10 Gigabit network card

2. Software requirements

The official documentation gives the following suggestions for the operating system and dependency packages. If you install HAProxy from a yum repository, the dependency packages do not need to be installed separately.

Operating system

  • Linux 2.4, supporting the x86, x86_64, Alpha, SPARC, MIPS, and PA-RISC architectures.
  • Linux 2.6 or 3.x, supporting the x86, x86_64, ARM, SPARC, and PPC64 architectures.
  • Solaris 8 or 9, supporting the UltraSPARC II and UltraSPARC III architectures.
  • Solaris 10, supporting the Opteron and UltraSPARC architectures.
  • FreeBSD 4.10 to 10, supporting the x86 architecture.
  • OpenBSD 3.1 and above, supporting the i386, AMD64, macppc, Alpha, and SPARC64 architectures.
  • AIX 5.1 to 5.3, supporting the Power™ architecture.

Dependency packages

  • epel-release
  • gcc
  • systemd-devel

3. Recommended version

The officially recommended stable release is HAProxy 2.0. Refer to this article for an introduction to its features.

4. Operation steps

Configuring HAProxy to load balance a database is straightforward. The steps below are generic; adapt the configuration file to your actual scenario.

  1. Install HAProxy: installation with yum is recommended

    # Install HAProxy with yum
    yum -y install haproxy
    # Verify that HAProxy was installed successfully
    which haproxy
  2. Configure HAProxy

    # The yum installation generates a configuration template
    vim /etc/haproxy/haproxy.cfg
  3. Start HAProxy

    Method 1: start directly

    haproxy -f /etc/haproxy/haproxy.cfg

    Method 2: start HAProxy with systemd, which reads /etc/haproxy/haproxy.cfg by default (recommended)

    systemctl start haproxy.service
  4. Stop HAProxy

    Method 1: kill -9

    # Find the HAProxy process ID
    ps -ef | grep haproxy
    # Replace <haproxy_pid> with the process ID found above
    kill -9 <haproxy_pid>

    Method 2: stop HAProxy with systemd (if it was started with systemd)

    systemctl stop haproxy.service
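
Before starting HAProxy after an edit to the configuration file, it can help to validate the file first. The following is a minimal sketch, not part of the original procedure: the function name start_haproxy_checked and the CFG variable are hypothetical, and it assumes HAProxy was installed with yum so that the haproxy.service unit exists. It relies on the -c option, which only checks the configuration file and exits.

```shell
# Path of the configuration file; an assumption, override it if yours differs.
CFG="${CFG:-/etc/haproxy/haproxy.cfg}"

# Hypothetical helper: start HAProxy only if the configuration file is valid.
start_haproxy_checked() {
    # -c checks the configuration file and exits without binding any port.
    if ! haproxy -c -f "$CFG"; then
        echo "configuration check failed: $CFG" >&2
        return 1
    fi
    systemctl start haproxy.service
}
```

Calling start_haproxy_checked as root leaves the service untouched when the check fails, so a typo in haproxy.cfg is reported before any start attempt.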

HAProxy commands

List HAProxy's command-line options with the following command:

$ haproxy --help
Usage : haproxy [-f <cfgfile|cfgdir>]* [ -vdVD ] [ -n <maxconn> ] [ -N <maxpconn> ]
        [ -p <pidfile> ] [ -m <max megs> ] [ -C <dir> ] [-- <cfgfile>*]

Parameter                      Description
-v                             Displays brief version information.
-vv                            Displays detailed version information.
-d                             Enables debug mode.
-db                            Disables background mode only.
-dM [<byte>]                   Poisons memory on allocation by filling it with <byte>.
-V                             Verbose startup: displays configuration and polling information.
-D                             Runs HAProxy as a daemon.
-C <dir>                       Changes to the directory <dir> before loading the configuration file.
-W                             Master-worker mode.
-q                             Quiet mode, no output.
-c                             Only checks the configuration file and exits before attempting to bind.
-n                             Sets the maximum total number of connections. The default value is 2000.
-m                             Limits the maximum usable memory, in MB.
-N                             Sets the default maximum number of connections per proxy. The default value is 2000.
-L                             Sets the local instance's peer name.
-p                             Writes the PIDs of all HAProxy child processes to this file.
-de                            Disables the use of epoll, which is only available on Linux 2.6 and some customized Linux 2.4 systems.
-dp                            Disables the use of poll.
-dS                            Disables the use of splice, which is broken on some old Linux kernels.
-dR                            Disables the use of SO_REUSEPORT.
-dr                            Ignores server address resolution failures.
-dV                            Disables SSL certificate verification on the server side.
-sf/-st <pidlist>              After startup, sends the finish (-sf) or terminate (-st) signal to the PIDs in <pidlist>. A process that receives the finish signal waits for all sessions to complete before exiting, that is, it stops gracefully. This option must be specified last, followed by any number of PIDs. Technically, -sf sends SIGTTOU and SIGUSR1.
-x <unix_socket>               Retrieves the listening sockets from the given UNIX socket.
-S <bind>[,<bind options>…]    Binds a master CLI socket.
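
The -p and -sf options are commonly combined to replace a running HAProxy without dropping established sessions. The sketch below shows one way to do that, under the assumption that the running instance was started with the same PID file; graceful_restart is a hypothetical helper name, and both arguments are paths you supply.

```shell
# Hypothetical helper: start a new HAProxy and let the old processes finish.
# Usage: graceful_restart <cfgfile> <pidfile>
graceful_restart() {
    cfg="$1"
    pidfile="$2"
    old_pids=""
    # Read the PIDs of the running processes, if a PID file exists.
    if [ -f "$pidfile" ]; then
        old_pids=$(cat "$pidfile")
    fi
    # -sf must come last. $old_pids is left unquoted on purpose so that
    # each PID becomes a separate argument.
    haproxy -D -f "$cfg" -p "$pidfile" -sf $old_pids
}
```

For example, graceful_restart /etc/haproxy/haproxy.cfg /var/run/haproxy.pid starts a new daemon, and the old processes exit once their sessions have completed. If HAProxy is managed by systemd, prefer the systemctl commands shown earlier.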

HAProxy best practices

global                                     # Global configuration
   log         <syslog-address> local0     # Defines a global syslog server; at most two can be defined
   chroot      /var/lib/haproxy            # Changes the root directory to this path; the process must be started as superuser; improves security
   pidfile     /var/run/haproxy.pid        # Writes the PIDs of the HAProxy processes to this file
   maxconn     4000                        # Maximum number of concurrent connections accepted by each HAProxy process
   user        haproxy                     # Same as the uid parameter, but takes a user name
   group       haproxy                     # Same as the gid parameter; a dedicated user group is recommended
   nbproc      40                          # Starts multiple processes to forward requests; set it high enough that HAProxy itself does not become a bottleneck
   daemon                                  # Runs HAProxy in the background as a daemon, equivalent to the "-D" option; it can be disabled on the command line with the "-db" option
   stats socket /var/lib/haproxy/stats     # UNIX socket on which statistics are served

defaults                                   # Default configuration
   log global                              # Inherits the log settings of the global section
   retries 2                               # Maximum number of attempts to connect to an upstream server; beyond this, the back-end server is considered unavailable
   timeout connect 2s                      # Timeout for HAProxy to connect to a back-end server; can be shorter if both are on the same LAN
   timeout client 30000s                   # Timeout for an inactive connection between the client and HAProxy after data transmission has completed
   timeout server 30000s                   # Timeout for an inactive connection between HAProxy and the upstream server

listen admin_stats                         # Combined frontend and backend; the name of the monitoring group can be chosen freely
   bind <ip>:<port>                        # Listening address and port
   mode http                               # Runs the monitoring proxy in HTTP mode
   option httplog                          # Enables logging of HTTP requests
   maxconn 10                              # Maximum number of concurrent connections
   stats refresh 30s                       # Refreshes the monitoring page automatically every 30 seconds
   stats uri /haproxy                      # URL of the monitoring page
   stats realm HAProxy                     # Prompt text shown by the monitoring page
   stats auth admin:pingcap123             # User name and password for the monitoring page; multiple users can be set
   stats hide-version                      # Hides the HAProxy version information on the statistics page
   stats admin if TRUE                     # Allows back-end servers to be enabled or disabled manually (HAProxy 1.4.9 and later)

listen tidb-cluster                        # Database load balancing
   bind <floating-ip>:<port>               # Floating IP and listening port
   mode tcp                                # HAProxy works at layer 4 (TCP)
   balance leastconn                       # The server with the fewest connections receives the next connection. `leastconn` is recommended for long-session services such as LDAP, SQL, and TSE, rather than short-session protocols such as HTTP. The algorithm is dynamic: server weights are adjusted on the fly for slow-starting servers.
   server tidb-1 <tidb-1-ip>:4000 check inter 2000 rise 2 fall 3   # Checks port 4000 every 2000 ms. After 2 successful checks the server is considered available again; after 3 failed checks it is considered unavailable.
   server tidb-2 <tidb-2-ip>:4000 check inter 2000 rise 2 fall 3
   server tidb-3 <tidb-3-ip>:4000 check inter 2000 rise 2 fall 3
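
Once HAProxy is running with a configuration like the one above, the stats socket can be queried from the shell to see whether each TiDB server passed its health checks. This is a sketch under two assumptions that are not in the original article: the socat utility is installed, and the socket path is the /var/lib/haproxy/stats defined in the global section. The helper name summarize_stat is hypothetical. It relies on the CSV layout of the "show stat" command, where the proxy name, server name, and status are fields 1, 2, and 18.

```shell
# Hypothetical helper: read "show stat" CSV on stdin and print one
# "proxy/server status" line per entry, skipping the header and blank lines.
summarize_stat() {
    awk -F, '!/^#/ && NF >= 18 { print $1 "/" $2, $18 }'
}
```

A typical invocation would be: echo "show stat" | socat stdio /var/lib/haproxy/stats | summarize_stat. Each TiDB server in the tidb-cluster section then appears on its own line with a status such as UP or DOWN.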


This article introduces best practices for using HAProxy with TiDB and covers the basic usage of HAProxy in detail. One gap is that a highly available architecture for HAProxy itself is not described here. You can use Keepalived on Linux to set up an active/standby pair and make HAProxy highly available. When you deploy HAProxy according to this document, be sure to adjust the parameters to your specific business needs and scenarios, so that load balancing and availability are as well protected as possible.

Finally, we hope that members of the TiDB community will share their own best practices, and that we can exchange tips in the TiDB User Group Q&A forum (https://asktug.com/).

Original article: https://pingcap.com/blog-cn/best-practice-haproxy/
