DBA notes – database, table, master-slave, read-write separation

Time:2020-2-13

Preface

In the previous blog “spring boot practice and thinking”, I mentioned the separation of master-slave and read-write when comparing the application scenarios of ORM framework with different specifications. This essay will make a more in-depth discussion on this and the sub database and sub table.
DBA notes - database, table, master-slave, read-write separation

1. informal discussion

Before I get to the point, I’d like to feel free to talk about the idea of the extension cycle of the architecture (only from my own point of view). First of all, I don’t think the initial planning should be too complex or huge. No matter how the project may develop in the medium and long term, flexibility should be given priority in the early stage, and operations such as sub database and sub table should not be taken into account at the beginning. Secondly, I think that the change of demand is very normal. This is the most common way in the circles I make complaints about, and there are naturally “leaders”. In terms of business, there is a lack of overall consideration, but we should not be limited to one point of view. In the market, change is flexible, and change is dead, especially in the early stage. Therefore, in the previous versions of the architecture, we must consider higher scalability. Finally, after several rounds of market baptism and iterative development, the core business tends to be stable. At this time, we combine the medium and long-term planning to reconstruct the system, divide the domain boundary carefully, decouple the decoupling, and split the splitting.

2. Sub warehouse and sub table

2.1 overview

When the database reaches a certain scale (for example, tens of millions or more), segmentation must be considered. Generally speaking, we need to do vertical segmentation first, that is, by business segmentation, for example, user related, order related, statistics related, etc. can be stored separately. Picture source →

DBA notes - database, table, master-slave, read-write separation

But this is not enough. Although vertical segmentation has stripped some data, each business is still of that order of magnitude, so we have to take horizontal segmentation to further disperse the data, which is also the focus of this section.
DBA notes - database, table, master-slave, read-write separation

I believe that the advantages of sub database and sub table are clear at a glance. One is that the database is dedicated and the business is more centralized. The other is to improve the load capacity of database services. But there are always two sides to a coin。 From then on, you have to accept that your system complexity will be upgraded to a higher level, and iteration, migration, operation and maintenance are no longer easy.

2.2 segmentation strategy

Vertical segmentation is a problem of multiple data sources. There is nothing to talk about. The following demo is a horizontal segmentation. Based on sharding JDBC middleware, I only make a logical statement. For more detailed information and configuration, please move to “official document”.

First, we have to define the fragmentation policy in the configuration file, application.yml:

server:
  port: 8001
  
mybatis:
  config-location: classpath:mybatis/mybatis-config.xml
  mapper-locations: classpath:mybatis/mappers/*.xml

sharding:
  jdbc:
    datasource:
      names: youclk_0,youclk_1
      youclk_0:
        type: org.apache.commons.dbcp.BasicDataSource
        driver-class-name: com.mysql.jdbc.Driver
        url: jdbc:mysql://mysql:3306/youclk_0?useSSL=false
        username: root
        password: youclk
      youclk_1:
        type: org.apache.commons.dbcp.BasicDataSource
        driver-class-name: com.mysql.jdbc.Driver
        url: jdbc:mysql://mysql:3306/youclk_1?useSSL=false
        username: root
        password: youclk
    config:
      sharding:
        default-database-strategy:
          inline:
            sharding-column: number
            algorithm-expression: youclk_${number % 2}
        tables:
          user:
            actual-data-nodes: youclk_${0..1}.user

The meaning of each parameter is explained in detail in the official documents. Actually, the name can also be understood roughly. I define to store the data with even number in youclk_0 and odd number in youclk_1.

User:

@Data
public class User {
    private String id;
    private Integer number;
    private Date createTime;
}

UserRepository:

@Mapper
public interface UserRepository {
    void insert(User user);
}

UserMapper.xml:

<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE mapper PUBLIC "-//mybatis.org//DTD Mapper 3.0//EN" "http://mybatis.org/dtd/mybatis-3-mapper.dtd">
<mapper namespace="com.youclk.data.repository.UserRepository">
    <resultMap id="BaseResultMap" type="com.youclk.data.entity.User">
        <id column="id" property="id" jdbcType="CHAR"/>
        <result column="number" property="number" jdbcType="INTEGER"/>
        <result column="createTime" property="create_time" jdbcType="DATE"/>
    </resultMap>

    <sql id="Base_Column_List">
        id, number, createTime
    </sql>
    
    <insert id="insert">
        INSERT INTO user (
          id, number
        )
        VALUES (
            uuid(),
            #{number,jdbcType=INTEGER}
        )
    </insert>
</mapper>

UserService:

@Service
public class UserService {

    @Resource
    private UserRepository userRepository;

    public void insert() {
        for (int i = 0; i < 10; i++) {
            User user = new User();
            user.setNumber(i);
            userRepository.insert(user);
        }
    }
}

Result:
DBA notes - database, table, master-slave, read-write separation
DBA notes - database, table, master-slave, read-write separation

After a simple circular insert, you can see that the data has been stored in a separate database according to the policy, and the result is in line with our expectation.

After database segmentation, you should be more cautious in query. Since you cut according to the strategy, you’d better check according to the strategy. Otherwise, for example, I cut 100 databases horizontally. If you don’t query according to the strategyLIMIT 100000, 10For such a set of data, the number level of the last scan is100 * (100000 + 10), which is quite terrifying. Although sharding JDBC has made some optimizations, for example, it does not query the memory at one time, but adopts the method of streaming + merging and sorting, but still consumes resources, which can be avoided or avoided as much as possible.

2.3 distributed transaction

Transaction is the most important thing in any system, especially in the face of the system with sub database, it is never easy to ensure the security of the transaction. There are two scenarios for distributed transactions. One is in distributed services, which will be discussed later. This section focuses on Quark transactions.

Sharding JDBC automatically includes weak XA transaction support, that is, it can ensure the logical transaction security, but exceptions caused by network or hardware cannot be rolled back, which is the same as general transactions in implementation:

@Test
@Transactional
public void insertTest() {
    userService.insert();
    int error = Integer.parseInt("I want error");
    userService.insert();
}

DBA notes - database, table, master-slave, read-write separation

You can see that the quark transaction has been rolled back. In addition, sharding JDBC also provides the flexible transaction of maximum effort delivery type (record the execution process in the log, retry in case of failure, delete after success, and keep the transaction log for manual intervention if it still fails). Although the security is higher, there are no guarantee of timeliness and many restrictions. Let’s leave a space for later Further in-depth discussion (mainly later, I want to finish writing and rest earlier).

3. Separation of master and slave from reading and writing

3.1 overview

Why should we be the master and follower? Let’s explore the following scenarios:

  • We know that each database server has its maximum number of connections and IOPs. If one day it can no longer meet our business needs, it is more reasonable to expand several slaves horizontally to share the pressure of master than to do performance stacking on a single server.
  • If the demand of service for database is Io intensive, it may often encounter problems such as row lock waiting. If you want to have both, is it a better choice to separate read and write.
  • If our system needs to do a lot of reports, or statistics and data analysis, these businesses are often quite resource consuming but not very important. For this, should we open several slaves to let them go to the black room to execute slowly, so as not to affect my efficiency in dealing with core businesses.

I can roughly think of these points. Welcome to leave a message and add.

3.2 master slave deployment

I take MySQL as an example. The general deployment architecture is one master and N slave. The master’s main responsibility is to write and synchronize data to slave. Slave mainly provides query functions.

To facilitate testing, I directly use docker to deploy. First, I create a master-slave configuration file, master.cnf:

[mysqld]
server_id = 1

character-set-server=utf8mb4
collation-server=utf8mb4_unicode_ci
default-storage-engine=INNODB

#Optimize omit

sql_mode=NO_ENGINE_SUBSTITUTION,STRICT_TRANS_TABLES

log-bin = /var/lib/mysql/binlog
log_bin_trust_function_creators=1
binlog_format = ROW
expire_logs_days = 99
sync_binlog = 0

slow-query-log=1
slow-query-log-file=/var/log/mysql/slow-queries.log
long_query_time = 3
log-queries-not-using-indexes

slave.cnf:

[mysqld]
server_id = 2

character-set-server=utf8mb4
collation-server=utf8mb4_unicode_ci
default-storage-engine=INNODB

#Optimize omit

sql_mode=NO_ENGINE_SUBSTITUTION,STRICT_TRANS_TABLES

log-bin = /var/lib/mysql/binlog
log_bin_trust_function_creators=1
binlog_format = ROW
expire_logs_days = 99
sync_binlog = 0

relay_log=slave-relay-bin
log-slave-updates=1
slave-skip-errors=all

slow-query-log=1
slow-query-log-file=/var/log/mysql/slow-queries.log
long_query_time = 3

Then perform the compose choreography, join the warm cluster, docker-compose.yml:

version: '3.5'

services:

  mysql-master:
    image: mysql
    ports:
      - 3301:3306
    networks:
      - proxy
      - youclk
    volumes:
      - /Users/Jermey/Documents/data/db/cluster/master/mysql:/var/lib/mysql
      - /Users/Jermey/Documents/data/db/cluster/master/conf.d:/etc/mysql/conf.d
    environment:
      MYSQL_ROOT_PASSWORD: youclk

  mysql-slave:
    image: mysql
    ports:
      - 3302:3306
    networks:
      - proxy
      - youclk
    volumes:
      - /Users/Jermey/Documents/data/db/cluster/slave/mysql:/var/lib/mysql
      - /Users/Jermey/Documents/data/db/cluster/slave/conf.d:/etc/mysql/conf.d
    environment:
      MYSQL_ROOT_PASSWORD: youclk

networks:
  proxy:
    external: true
  youclk:
    external: true

Thanks to docker again, the whole process from arranging configuration file to finally starting service is less than one minute:
DBA notes - database, table, master-slave, read-write separation

The next step is to configure the master-slave relationship:

docker exec -it cluster_mysql-master mysql -p

CREATE USER 'reader'@'%' IDENTIFIED BY 'youclk';
GRANT REPLICATION SLAVE ON *.* TO 'reader'@'%';

show master status\G
docker exec -it cluster_mysql-slave mysql -p

CHANGE MASTER TO \
MASTER_HOST='mysql-master',\
MASTER_PORT=3306,\
MASTER_USER='reader',\
MASTER_PASSWORD='youclk',\
MASTER_LOG_FILE='binlog.000004',\
MASTER_LOG_POS=154;

start slave;

show slave status\G

Test:
DBA notes - database, table, master-slave, read-write separation
In the figure above, master is connected on the left and slave is connected on the right. I execute in mastercreate database youclk_0;As you can see, youclk? 0 is also generated in slave, and the master-slave configuration test is completed.

3.3 separation of reading and writing

The implementation of read-write separation based on sharding JDBC is very simple. Change the configuration file, and the rest is almost imperceptible. Application.yml:

server:
  port: 8001

mybatis:
  config-location: classpath:mybatis/mybatis-config.xml
  mapper-locations: classpath:mybatis/mappers/*.xml

sharding:
  jdbc:
    datasource:
      names: ds_master,ds_slave
      ds_master:
        type: org.apache.commons.dbcp.BasicDataSource
        driver-class-name: com.mysql.jdbc.Driver
        url: jdbc:mysql://mysql:3301/youclk_0?useSSL=false
        username: root
        password: youclk
      ds_slave:
        type: org.apache.commons.dbcp.BasicDataSource
        driver-class-name: com.mysql.jdbc.Driver
        url: jdbc:mysql://mysql:3302/youclk_0?useSSL=false
        username: root
        password: youclk
    config:
      masterslave:
        load-balance-algorithm-type: round_robin
        name: ds_ms
        master-data-source-name: ds_master
        slave-data-source-names: ds_slave
      sharding:
        props:
          sql.show: true

Test:

@Test
public void selectAndInsertTest() {
    userService.selectAll();
    userService.insert();
}

Result:
DBA notes - database, table, master-slave, read-write separation

Tracking the logs of MySQL, we can find that the master and slave databases perform insertion and query respectively, which realizes the separation of reading and writing.

epilogue

Outside the curtain, the rain is murmuring. I’m worried about the Qingming Festival. I’m sorry for my writing. I hope you’ll be happy.


My official account, “Jie Yi”.
DBA notes - database, table, master-slave, read-write separation