Recording a Redis cluster exception: (error) CLUSTERDOWN The cluster is down

Time:2021-6-6

Incident description

Previously, we used docker compose to build a Redis test cluster on the test server. It ran for a long time without any problems.
Then there was an incident in the server room, and the server rebooted unexpectedly. The Redis cluster came back up without reporting any errors, but get, set and other commands failed
with the error in the title.
Here is the error message:

127.0.0.1:6378> set ceshi 123
(error) CLUSTERDOWN Hash slot not served
127.0.0.1:6378> get ceshi
(error) CLUSTERDOWN The cluster is down
127.0.0.1:6378> cluster info
cluster_state:fail
cluster_slots_assigned:16289
cluster_slots_ok:16289
cluster_slots_pfail:0
cluster_slots_fail:0
cluster_known_nodes:6
cluster_size:3
cluster_current_epoch:170
cluster_my_epoch:106
cluster_stats_messages_ping_sent:1587
cluster_stats_messages_pong_sent:1589
cluster_stats_messages_sent:3176
cluster_stats_messages_ping_received:1589
cluster_stats_messages_pong_received:1587
cluster_stats_messages_received:3176
127.0.0.1:6378> cluster slots
 1) 1) (integer) 5461
    2) (integer) 5488
    3) 1) "127.0.0.1"
       2) (integer) 6373
       3) "33961d33e3f9aca7e38670602878b89c1cee00a4"
    4) 1) "127.0.0.1"
       2) (integer) 6377
       3) "fd2f54ae6e078a35b228fd1524d24640b63df464"
 2) 1) (integer) 5490
    2) (integer) 5491
    3) 1) "127.0.0.1"
       2) (integer) 6373
       3) "33961d33e3f9aca7e38670602878b89c1cee00a4"
    4) 1) "127.0.0.1"
       2) (integer) 6377
       3) "fd2f54ae6e078a35b228fd1524d24640b63df464"
 3) 1) (integer) 5493
    2) (integer) 5590
    3) 1) "127.0.0.1"
       2) (integer) 6373
       3) "33961d33e3f9aca7e38670602878b89c1cee00a4"
    4) 1) "127.0.0.1"
       2) (integer) 6377
       3) "fd2f54ae6e078a35b228fd1524d24640b63df464"
 4) 1) (integer) 5592
    2) (integer) 5648
    3) 1) "127.0.0.1"
       2) (integer) 6373
       3) "33961d33e3f9aca7e38670602878b89c1cee00a4"
    4) 1) "127.0.0.1"
       2) (integer) 6377
       3) "fd2f54ae6e078a35b228fd1524d24640b63df464"
 5) 1) (integer) 5650
    2) (integer) 5657
    3) 1) "127.0.0.1"
       2) (integer) 6373
       3) "33961d33e3f9aca7e38670602878b89c1cee00a4"
    4) 1) "127.0.0.1"
       2) (integer) 6377
       3) "fd2f54ae6e078a35b228fd1524d24640b63df464"
 6) 1) (integer) 5659
    2) (integer) 5755
    3) 1) "127.0.0.1"
       2) (integer) 6373
       3) "33961d33e3f9aca7e38670602878b89c1cee00a4"
    4) 1) "127.0.0.1"
       2) (integer) 6377
       3) "fd2f54ae6e078a35b228fd1524d24640b63df464"
 7) 1) (integer) 5757
    2) (integer) 5769
    3) 1) "127.0.0.1"
       2) (integer) 6373
       3) "33961d33e3f9aca7e38670602878b89c1cee00a4"
    4) 1) "127.0.0.1"
       2) (integer) 6377
       3) "fd2f54ae6e078a35b228fd1524d24640b63df464"
  ... (there are many more here)
 97)...

I could guess the problem as soon as I saw this: the cluster only reports ok when all 16384 slots are assigned, and here only 16289 are (cluster_slots_assigned:16289).
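
To make the gaps concrete, here is a minimal Go sketch that lists the slots not covered by a set of ranges copied from `cluster slots`. The type `slotRange` and the function `missingSlots` are names I made up for illustration, and the input is only the seven ranges shown above (the rest of the output is elided), so the search is limited to 5461-5769.

```go
package main

import (
	"fmt"
	"sort"
)

// slotRange is an inclusive range of hash slots, as printed by `cluster slots`.
// (Hypothetical helper type, not part of any Redis client.)
type slotRange struct{ start, end int }

// missingSlots returns every slot in [lo, hi] that no range in covered contains.
func missingSlots(covered []slotRange, lo, hi int) []int {
	sorted := append([]slotRange(nil), covered...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i].start < sorted[j].start })

	var missing []int
	next := lo // first slot not yet known to be covered
	for _, r := range sorted {
		for s := next; s < r.start && s <= hi; s++ {
			missing = append(missing, s)
		}
		if r.end+1 > next {
			next = r.end + 1
		}
	}
	for s := next; s <= hi; s++ {
		missing = append(missing, s)
	}
	return missing
}

func main() {
	// The seven ranges visible in the failing `cluster slots` output.
	covered := []slotRange{
		{5461, 5488}, {5490, 5491}, {5493, 5590}, {5592, 5648},
		{5650, 5657}, {5659, 5755}, {5757, 5769},
	}
	fmt.Println(missingSlots(covered, 5461, 5769))
	// prints [5489 5492 5591 5649 5658 5756]
}
```

Each gap is a single slot sitting between two otherwise contiguous ranges, which is why the output is split into so many small entries.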

Solving the problem

From the error information above, we know that some slots are missing. Because there is so much output, finding the missing slots by manual review
is a lot of work. I'll just run a script that adds every slot (whether it is already assigned or not).

To add a slot:
redis-cli -h 127.0.0.1 -p 6376 cluster addslots 0
redis-cli -h 127.0.0.1 -p 6376 cluster addslots 1
redis-cli -h 127.0.0.1 -p 6376 cluster addslots 2
redis-cli -h 127.0.0.1 -p 6376 cluster addslots 3
......
Remarks (0 is the slot number. Slots run from 0 to 16383, 16384 in total)
Writing them all out by hand like this is too tedious
Script
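//Go code that generates the script file (save it in a _test.go file and run with go test)
package main

import (
    "fmt"
    "os"
    "strings"
    "testing"
)

func TestShell(t *testing.T) {
    var sb strings.Builder
    // One addslots command per slot, 0 through 16383
    for i := 0; i < 16384; i++ {
        fmt.Fprintf(&sb, "redis-cli -h 127.0.0.1 -p 6378 cluster addslots %d\n", i)
    }
    if err := os.WriteFile("s6378.sh", []byte(sb.String()), 0644); err != nil {
        t.Fatal(err)
    }
}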
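
If you would rather not fire all 16384 commands at one node, a variation is to generate commands only for the slots you know are unassigned. The sketch below is an assumption-laden example rather than what I actually ran: `addSlotsScript` is a hypothetical helper, the slot list is typed in by hand from the gaps visible in the `cluster slots` output (for example 5489 and 5492), and port 6373 is chosen only because the neighbouring ranges belong to that node.

```go
package main

import (
	"fmt"
	"strings"
)

// addSlotsScript builds one `cluster addslots` command per slot for a single node.
// One command per line keeps a failure on one slot from affecting the others.
func addSlotsScript(host string, port int, slots []int) string {
	var sb strings.Builder
	for _, s := range slots {
		fmt.Fprintf(&sb, "redis-cli -h %s -p %d cluster addslots %d\n", host, port, s)
	}
	return sb.String()
}

func main() {
	// Hypothetical input: two of the gaps visible in the failing output.
	fmt.Print(addSlotsScript("127.0.0.1", 6373, []int{5489, 5492}))
}
```

The same function can be called once per master port to produce a separate script for each node.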

After executing the script, restart redis

docker-compose restart
//Check again
127.0.0.1:6378> cluster info
cluster_state:ok
cluster_slots_assigned:16384
cluster_slots_ok:16384
cluster_slots_pfail:0
cluster_slots_fail:0
cluster_known_nodes:6
cluster_size:3
cluster_current_epoch:170
cluster_my_epoch:106
cluster_stats_messages_ping_sent:369
cluster_stats_messages_pong_sent:382
cluster_stats_messages_sent:751
cluster_stats_messages_ping_received:382
cluster_stats_messages_pong_received:369
cluster_stats_messages_received:751

cluster_state:fail => cluster_state:ok

127.0.0.1:6378> cluster slots
1) 1) (integer) 5461
   2) (integer) 10922
   3) 1) "127.0.0.1"
      2) (integer) 6373
      3) "33961d33e3f9aca7e38670602878b89c1cee00a4"
   4) 1) "127.0.0.1"
      2) (integer) 6377
      3) "fd2f54ae6e078a35b228fd1524d24640b63df464"
2) 1) (integer) 0
   2) (integer) 5460
   3) 1) "127.0.0.1"
      2) (integer) 6378
      3) "2d65cfd4af71fd7c99d60ee2b75be371b705097f"
   4) 1) "127.0.0.1"
      2) (integer) 6374
      3) "3de92e59c2e1a45fe1013caa73696c9b1d1d62b8"
3) 1) (integer) 10923
   2) (integer) 16383
   3) 1) "127.0.0.1"
      2) (integer) 6376
      3) "939f8e1da6b8a687e0b7876875b38f29da063c48"
   4) 1) "127.0.0.1"
      2) (integer) 6375
      3) "275d32d018acf42c06ab0fb5cd7036b5c6f41acf"

According to the output above, the Redis cluster is back to normal: all 16384 slots are assigned across the three masters.
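
The same check can be done by a program instead of by eye. This is a minimal sketch that parses the `key:value` lines of `cluster info` and tests the two fields that changed; `parseClusterInfo` and `clusterHealthy` are names I invented, and the input strings are trimmed copies of the outputs in this post.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// parseClusterInfo turns the "key:value" lines printed by `cluster info` into a map.
func parseClusterInfo(out string) map[string]string {
	info := make(map[string]string)
	sc := bufio.NewScanner(strings.NewReader(out))
	for sc.Scan() {
		// TrimSpace also drops a trailing \r if the output uses CRLF line endings.
		key, value, ok := strings.Cut(strings.TrimSpace(sc.Text()), ":")
		if ok {
			info[key] = value
		}
	}
	return info
}

// clusterHealthy reports whether the state is ok and all 16384 slots are assigned.
func clusterHealthy(info map[string]string) bool {
	assigned, err := strconv.Atoi(info["cluster_slots_assigned"])
	return err == nil && info["cluster_state"] == "ok" && assigned == 16384
}

func main() {
	before := "cluster_state:fail\ncluster_slots_assigned:16289\n"
	after := "cluster_state:ok\ncluster_slots_assigned:16384\n"
	fmt.Println(clusterHealthy(parseClusterInfo(before))) // false
	fmt.Println(clusterHealthy(parseClusterInfo(after)))  // true
}
```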

Verification

127.0.0.1:6378> set ceshi 123
-> Redirected to slot [11469] located at 127.0.0.1:6376
OK

127.0.0.1:6376> get ceshi
"123"

At this point, the Redis cluster problem is solved
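
The redirect to slot [11469] in the verification output happens because every key maps to exactly one of the 16384 slots. As I understand the Redis Cluster specification, the slot is CRC16(key) mod 16384 using the XMODEM variant of CRC16, and when the key contains a non-empty {...} hash tag only the part inside the braces is hashed. Here is a minimal Go sketch of that rule; `crc16` and `keySlot` are my own names, not a library API.

```go
package main

import (
	"fmt"
	"strings"
)

// crc16 computes CRC16/XMODEM (polynomial 0x1021, initial value 0) bit by bit.
func crc16(data []byte) uint16 {
	var crc uint16
	for _, b := range data {
		crc ^= uint16(b) << 8
		for i := 0; i < 8; i++ {
			if crc&0x8000 != 0 {
				crc = (crc << 1) ^ 0x1021
			} else {
				crc <<= 1
			}
		}
	}
	return crc
}

// keySlot maps a key to a slot in 0..16383, honouring a non-empty {hash tag}.
func keySlot(key string) int {
	if open := strings.IndexByte(key, '{'); open >= 0 {
		// n > 0 means a closing brace exists with at least one byte before it.
		if n := strings.IndexByte(key[open+1:], '}'); n > 0 {
			key = key[open+1 : open+1+n]
		}
	}
	return int(crc16([]byte(key)) % 16384)
}

func main() {
	// Compare the result with the slot number in the redirect message.
	fmt.Println(keySlot("ceshi"))
}
```

A missing slot therefore only breaks the keys that hash into it, but with the cluster in the fail state every command is rejected, which matches what I saw.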

Note

The script I wrote only runs addslots against master node 6378.
Other master nodes may continue to report errors,
so run addslots for slots 0-16383 against the other master nodes as well.

Attached docker-compose.yml file

version: '3'

services:
    master-1:
        container_name: master-1
        image: redis
        command: redis-server /etc/usr/local/redis.conf
        network_mode: "host"
        volumes:
            - ./redis/master1/redis.conf:/etc/usr/local/redis.conf
            - ./redis/master1/redis.log:/usr/local/redis/logs/redis-server.log
    master-2:
        container_name: master-2
        image: redis
        command: redis-server /etc/usr/local/redis.conf
        network_mode: "host"
        volumes:
            - ./redis/master2/redis.conf:/etc/usr/local/redis.conf
            - ./redis/master2/redis.log:/usr/local/redis/logs/redis-server.log        
    master-3:
        container_name: master-3
        image: redis
        command: redis-server /etc/usr/local/redis.conf
        network_mode: "host"
        volumes:
            - ./redis/master3/redis.conf:/etc/usr/local/redis.conf
            - ./redis/master3/redis.log:/usr/local/redis/logs/redis-server.log
    slave-1:
        container_name: slave-1
        image: redis
        command: redis-server /etc/usr/local/redis.conf
        network_mode: "host"
        volumes:
            - ./redis/slave1/redis.conf:/etc/usr/local/redis.conf
            - ./redis/slave1/redis.log:/usr/local/redis/logs/redis-server.log
    slave-2:
        container_name: slave-2
        image: redis
        command: redis-server /etc/usr/local/redis.conf
        network_mode: "host"
        volumes:
            - ./redis/slave2/redis.conf:/etc/usr/local/redis.conf
            - ./redis/slave2/redis.log:/usr/local/redis/logs/redis-server.log            
    slave-3:
        container_name: slave-3
        image: redis
        command: redis-server /etc/usr/local/redis.conf
        network_mode: "host"
        volumes:
            - ./redis/slave3/redis.conf:/etc/usr/local/redis.conf
            - ./redis/slave3/redis.log:/usr/local/redis/logs/redis-server.log