Huawei Cloud Computing IE interview notes: What are the two networking schemes in the Huawei FusionSphere disaster recovery solution? Briefly describe the principles of the two technologies. Describe the planning and design points of the host replication networking scheme.

Time:2022-4-26

Host replication disaster recovery: uses the host's IO replication and mirroring function (the IO mirror module) together with the VRG (virtual replication gateway) to remotely copy virtual machine data from the production site to the disaster recovery site.

Storage array replication disaster recovery: uses the HyperReplication feature of the storage devices to copy data from the production site to the disaster recovery site.

Key points of host replication disaster recovery planning and design:

1. Planning premise: collect the types of applications to be protected, their IOPS (I/O operations per second), the data block size, the number of virtual machines to be protected, and the customer's RTO and RPO requirements.

RTO: the maximum time for which a business interruption can be tolerated.

RPO: the maximum amount of data loss that can be tolerated, expressed as a period of time (how far back the recovered data may be).

2. Network bandwidth planning covers the management link and the IO link. The IO link is further divided into initial replication bandwidth and incremental replication bandwidth. The management link should be no less than 10 Mbps and the IO link no less than 50 Mbps.

IO replication bandwidth:

Calculated from the average write IOPS in the replication cycle:
Host replication disaster recovery = number of protected virtual machines * average write IOPS per virtual machine during the peak business period of the cycle * data block size * 8 ÷ 0.7 (8 converts bytes to bits; 0.7 is the bandwidth utilization factor)

Storage replication disaster recovery = number of protected virtual machines * average data change per virtual machine in the replication cycle (MB) * 8 / (replication cycle (minutes) * 60)
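
The two bandwidth formulas can be sketched in Python as follows. This is only an illustration: the function names and the sample workload figures are assumptions made for the example, and the block size is assumed to be given in MB so that the result comes out in Mbit/s.

```python
def host_replication_bandwidth_mbps(vm_count, peak_write_iops_per_vm,
                                    block_size_mb, utilization=0.7):
    """IO link bandwidth (Mbit/s) for host replication.

    block_size_mb is the data block size in MB. Multiplying by 8 converts
    MB/s to Mbit/s; dividing by the utilization factor adds headroom.
    """
    return vm_count * peak_write_iops_per_vm * block_size_mb * 8 / utilization


def storage_replication_bandwidth_mbps(vm_count, change_mb_per_vm, cycle_minutes):
    """IO link bandwidth (Mbit/s) for storage array replication."""
    return vm_count * change_mb_per_vm * 8 / (cycle_minutes * 60)


def planned_io_link_mbps(required_mbps, floor_mbps=50):
    """Apply the 50 Mbps minimum that the notes state for the IO link."""
    return max(required_mbps, floor_mbps)


if __name__ == "__main__":
    # Hypothetical workload: 50 VMs, 20 peak write IOPS each, 8 KB blocks.
    host = host_replication_bandwidth_mbps(50, 20, 8 / 1024)
    # Hypothetical workload: 50 VMs, 600 MB changed per VM in a 30-minute cycle.
    storage = storage_replication_bandwidth_mbps(50, 600, 30)
    print(f"host replication:    {planned_io_link_mbps(host):.1f} Mbit/s")
    print(f"storage replication: {planned_io_link_mbps(storage):.1f} Mbit/s")
```

Whatever the formula yields, the planned IO link should still respect the 50 Mbps minimum, which is why the result is passed through `planned_io_link_mbps`.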

3. Planning of production / disaster recovery system configuration:

① Plan the MAC address range in two parts, one for the production site and one for the disaster recovery site; the two parts must not overlap.

② Determine the type and number of disaster recovery virtual machines.

③ For data storage planning, the disaster recovery site generally reserves 20% of capacity.

④ Determine the number of VRGs: a pair of VRGs can protect no more than 150 virtual machines and no more than 200 virtual machine disks. Plan according to the number of virtual machines to be protected, from which a calculation formula can be derived.

⑤ Set the snapshot execution cycle.

Number of VRGs:

SAN storage scenario:

Number of VRGs = max (total write IOPS of all virtual machines to be protected / 500, total number of disks of all virtual machines to be protected / 200, number of virtual machines to be protected / 150). Round up any fractional result.

FusionStorage scenario:

Number of VRGs = max (total write IOPS of all virtual machines to be protected / 1500, total number of disks of all virtual machines to be protected / 200, number of virtual machines to be protected / 150). Round up any fractional result.

The VRGs at the production and disaster recovery sites must be paired one-to-one and bound to hosts. At most one VRG can be deployed on each host.
If the storage types at the two ends differ, use the larger of the two VRG counts.
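
As a rough sketch of the VRG sizing rule, assuming the per-VRG limits quoted above (500 or 1500 write IOPS, 200 disks, 150 virtual machines). The function names and the sample numbers are made up for illustration.

```python
import math

# Per-VRG limits taken from the planning rules in these notes.
WRITE_IOPS_PER_VRG = {"san": 500, "fusionstorage": 1500}
MAX_DISKS_PER_VRG = 200
MAX_VMS_PER_VRG = 150


def vrg_count(total_write_iops, total_disks, total_vms, storage="san"):
    """Number of VRGs needed at one site; each ratio is rounded up."""
    iops_limit = WRITE_IOPS_PER_VRG[storage]
    return max(
        math.ceil(total_write_iops / iops_limit),
        math.ceil(total_disks / MAX_DISKS_PER_VRG),
        math.ceil(total_vms / MAX_VMS_PER_VRG),
    )


def vrg_count_for_sites(total_write_iops, total_disks, total_vms,
                        production_storage, dr_storage):
    """When the two ends use different storage types, the larger count wins."""
    return max(
        vrg_count(total_write_iops, total_disks, total_vms, production_storage),
        vrg_count(total_write_iops, total_disks, total_vms, dr_storage),
    )


if __name__ == "__main__":
    # Hypothetical estate: 2400 write IOPS, 320 disks, 180 VMs.
    print(vrg_count(2400, 320, 180, "san"))
    print(vrg_count(2400, 320, 180, "fusionstorage"))
    print(vrg_count_for_sites(2400, 320, 180, "san", "fusionstorage"))
```

Because the VRGs are paired one-to-one, the same count applies to both sites, and each of those VRGs needs its own host.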

4. Production site planning:

① First, configure the mapping relationships on the disaster recovery platform, including those for the cloud platform, VRG, storage, cluster, host and port group.

② Then plan the VRG deployment according to its specifications: at least 2 CPUs, 6 GB of memory, a 15 GB system disk and a 100 GB data disk. These official VRG specifications cannot be adjusted.

③ Then plan the data storage. A virtualized data storage must be selected, and it is recommended that the VRG virtual machine use its own data storage rather than share one with the disaster recovery virtual machines.

5. The planning of the disaster recovery site is basically the same as that of the production site, with the addition that 20% of the data storage capacity must be reserved.

 

The remote replication technology used in storage replication disaster recovery is divided into synchronous and asynchronous remote replication. In synchronous mode, the write goes to the RM module first, the RM module dual-writes it to both ends, and success is returned only after both writes complete; the IO is then finished. In asynchronous mode, the write goes to the primary cache first and success is returned to the host immediately; the data is then written to the primary storage, a snapshot is taken, and the data is written to the remote storage.
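
The difference between the two modes is when the host receives its acknowledgement. The toy model below illustrates that ordering only; the class and function names are invented for the example, the arrays are plain dictionaries, and the asynchronous cycle copies a full snapshot rather than only the changed blocks, which is a simplification and not how a real array transfers data.

```python
class Array:
    """A storage array reduced to a dict of block address -> data."""

    def __init__(self):
        self.blocks = {}

    def write(self, lba, data):
        self.blocks[lba] = data


def sync_write(primary, remote, lba, data):
    """Synchronous mode: dual-write, acknowledge only after both ends succeed."""
    primary.write(lba, data)
    remote.write(lba, data)
    return "ack"


class AsyncReplication:
    """Asynchronous mode: acknowledge from cache, replicate later in a cycle."""

    def __init__(self, primary, remote):
        self.primary = primary
        self.remote = remote
        self.cache = {}

    def write(self, lba, data):
        # The host is acknowledged as soon as the primary cache holds the data.
        self.cache[lba] = data
        return "ack"

    def run_cycle(self):
        # 1. Flush the cache to the primary storage.
        for lba, data in self.cache.items():
            self.primary.write(lba, data)
        self.cache.clear()
        # 2. Take a point-in-time snapshot of the primary.
        snapshot = dict(self.primary.blocks)
        # 3. Write the snapshot contents to the remote storage.
        for lba, data in snapshot.items():
            self.remote.write(lba, data)
```

Until `run_cycle` runs, the remote array lags behind the primary, which is why asynchronous replication has an RPO greater than zero.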

FusionSphere primary/standby disaster recovery: storage replication networking

The two BCManagers must be able to communicate. This can be done by connecting the two management planes or through a separate network.

The storage devices must be connected to the management network and managed by BCManager.

The storage devices at the two sites replicate data over the storage-layer network link between the sites.

Local high availability, active-active, and two-site three-data-center deployments built on storage replication use networking similar to this general disaster recovery networking.

1. Definition: the production data center is on the left and the disaster recovery data center is on the right. The LUN data in the production data center needs to be replicated to the disaster recovery data center. BCM connects to the production-side and disaster-recovery-side storage; protection sets, protection policies and protection plans are created so that the data is transferred to the disaster recovery side over the back-end storage link when the synchronization cycle arrives.

2. RPO and RTO close to 0

3. Implementation principle

4. Restrictions:

1) It must be a Huawei storage device.

2) There must be a back-end storage link.

3) The device must support the advanced features (mid-range and high-end storage).

 

FusionSphere primary/standby disaster recovery: host replication networking

The storage planes do not need to be interconnected.

BCManager does not need to manage the storage devices.

It is recommended to configure a separate disaster recovery service management interface on each CNA node for communication with the VRG. If none is configured, the traffic uses the CNA management interface.

A host IO replication plane must be configured for VRG-to-VRG communication.

The VRGs must be managed by BCManager.

 

 

 

 

1. The two BCManagers must be able to communicate, either by connecting the two management planes or through a separate network.

2. The VRG (virtual replication gateway) needs three planes and therefore has three network cards: one for VRG-to-VRG communication, one for communication with the CNA hosts, and one for communication with BCManager (BCM), which sends policies to the VRG for execution.

3. Protected objects and protection policies are configured on BCManager to protect the host services.

I/O process

① The VM issues an IO stream. On receiving it, the production site CNA performs IO dual-write: one copy is written to the production storage, and the other is captured by the IO mirror module and sent to the local VRG.

② The local VRG compresses and encrypts the I/O data and sends it to the disaster recovery VRG over the host replication plane.

③ The disaster recovery VRG decrypts and decompresses the I/O data and routes it to the corresponding CNA host.

④ The CNA host at the disaster recovery site writes the data to the backup storage.

When a disaster occurs, BCManager detects that the production site has failed and mounts the data in the backup storage at the disaster recovery site to the disaster recovery VM (an empty placeholder VM that is activated only when a disaster occurs) so that the business can resume.
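
The four I/O steps can be pictured with the small Python model below. It is a sketch under stated assumptions, not the VRG implementation: all function and variable names are hypothetical, storage is modelled as dictionaries, `zlib` stands in for whatever compression the VRG really uses, and the cipher is left as a pass-through placeholder because the notes do not name an algorithm.

```python
import zlib


def identity(payload):
    """Placeholder for the cipher; the notes do not say which one is used."""
    return payload


def production_cna_write(vm, lba, data, production_storage, local_vrg_queue):
    """Step 1: dual-write on the production CNA."""
    production_storage[(vm, lba)] = data       # copy 1: production storage
    local_vrg_queue.append((vm, lba, data))    # copy 2: mirrored to the local VRG


def local_vrg_send(local_vrg_queue, encrypt=identity):
    """Step 2: compress, then encrypt, then hand over to the replication plane."""
    packets = [(vm, lba, encrypt(zlib.compress(data)))
               for vm, lba, data in local_vrg_queue]
    local_vrg_queue.clear()
    return packets


def dr_vrg_receive(packets, vm_to_dr_host, dr_hosts, decrypt=identity):
    """Steps 3 and 4: decrypt, decompress, route to the DR host, write to storage."""
    for vm, lba, payload in packets:
        data = zlib.decompress(decrypt(payload))
        host = vm_to_dr_host[vm]               # routing decision in the DR VRG
        dr_hosts[host][(vm, lba)] = data       # DR CNA writes to backup storage


if __name__ == "__main__":
    production_storage, queue = {}, []
    dr_hosts = {"cna-dr-1": {}}                # hypothetical DR host name
    production_cna_write("vm1", 7, b"hello" * 100, production_storage, queue)
    dr_vrg_receive(local_vrg_send(queue), {"vm1": "cna-dr-1"}, dr_hosts)
    print(dr_hosts["cna-dr-1"][("vm1", 7)] == production_storage[("vm1", 7)])
```

Note that the receiving side undoes the two transformations in reverse order: it decrypts first and decompresses second.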

 

1. Definition: the production data center is on the left and the disaster recovery data center is on the right. Data in the production data center is transmitted to the disaster recovery data center through the VRGs. BCM connects to the VRG at the production end and the VRG at the disaster recovery end; protection sets, protection policies and protection plans are created, and when the replication cycle arrives the production data is transmitted to the disaster recovery end over the host IO replication network plane.

2. RPO: close to 0; RTO: minute level (asynchronous)

3. Restrictions:

1) Applicable only to FusionSphere environments.

2) Only asynchronous mode can be used.

4. Implementation principle

 

Protection set

Protection strategy: Computing (host, cluster), storage (data storage), network (DVS)

Protection plan

*What is the role of the VRG?

1. Aggregate the IO data of the virtual machine and send it to the remote site after compression and encryption.

2. Receive the remote site data and route the data to the designated host.

3. Provide management interfaces such as replication policy distribution and status query.

*Can a pair of VRGs perform disaster recovery for virtual machines on multiple CNAs?

The number of VRGs deployed depends on network, storage and other factors. It is recommended that each pair of VRGs protect no more than 150 virtual machines, and that the total number of disks of all those virtual machines not exceed 200.

*What scenarios are host replication and storage replication suitable for?

Suitable scenarios for host replication:

Host replication disaster recovery is positioned for the non-critical services of small and medium-sized enterprises. It is recommended for services such as ERP (Enterprise Resource Planning), mail servers and desktop cloud.

Storage replication is suitable for the following scenarios:

1. Huawei SAN devices are used and the sites are connected through an IP network.

2. The distance between the production center and the disaster recovery center should not be restricted.

3. Planned cross-site migration of virtual machines is required.

4. The services in the site need continuity protection.

5. There are complex recovery scenarios, such as virtual machine startup priorities and dependencies.

6. Disaster recovery drills have high priority.

The RPO of host replication is at the second level and the RTO at the minute level. If higher disaster recovery requirements apply, such as an active-active data center, storage replication is needed.

For stability reasons, key services are not suited to host replication.

Host replication is only suitable for FusionSphere scenarios.

*What if more than 150 virtual machines need to be protected by a pair of VRGs?

Deploy another pair of VRGs, then associate them on BCManager.

*Host based replication

The CNA host must have the IO mirror module (iomirror), which captures the IO of the virtual machines on that CNA. The IO data mirroring is done by the CNA's iomirror module.

*What function on FusionCompute is used between VRGs?

IO mirror

*Why can't host replication use FusionStorage for data storage?

Virtualized data storage refers to virtualization features implemented by the CNA itself, such as snapshots and thin-provisioned disks; that is, the storage virtualization is host storage virtualization plus a file system. FusionStorage does support virtualization features, but they are implemented by the FusionStorage system's own mechanisms without involving the CNA. Therefore, FusionStorage is not a virtualized data storage.

*When FusionStorage is the only storage, can host replication not be implemented?

When the basic block setting of the virtual machine is set to support, the source side supports FusionStorage virtualized storage.

*What is the difference between host replication and storage replication disaster recovery?

1. The data disaster recovery implementation differs. Host replication transfers data from the production site to the disaster recovery site through the VMs on which the VRGs are deployed. Storage replication is implemented by array replication between the two sites.

2. The RPO and RTO values differ. Host replication: RPO at the second level, RTO at the minute level. For storage replication, both values are flexible.

3. Application scenarios differ. Host replication is used for the non-critical services of small and medium-sized enterprises; storage replication applies to all services, with a focus on key business applications.

4. Virtualization platform restrictions differ. Host replication applies only to Huawei FusionCompute.

5. The supported distance between the two sites differs. It is shorter for host replication.

6. Storage replication places no restrictions on upper-layer applications, while host replication is limited to FusionSphere.

7. In host replication, the VRGs consume computing resources on the compute nodes.

8. Storage replication can transmit data synchronously or asynchronously, while host replication can only be asynchronous.

FusionSphere host replication scenario planning?

1. BCManager communication between the two ends: connect the management planes of the two data centers, or use a separate network.

2. BCManager deployment mode: distributed deployment and disaster recovery site deployment.

3. The communication link bandwidth of BCManager is generally 10 Mbps.

4. The host IO replication link bandwidth is calculated from the average service IOPS in the replication cycle.

5. Calculation formula: number of protected virtual machines * average write IOPS per virtual machine during the busy period of the cycle * data block size * 8 / 0.7 (bandwidth utilization factor)

6. Number of VRGs deployed: a pair of VRGs can protect no more than 150 virtual machines and no more than 200 virtual machine disks. Plan according to the number of virtual machines that need to be protected.

7. Plan the protection groups, protection policies, recovery plans and so on in BCManager.

8. Domain0 specification adjustment: the memory of Domain0 needs to be increased by 4 GB over its original size.
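
One way to use this checklist is to turn the numeric items into a quick sanity check of a draft plan. The sketch below does that under stated assumptions: the `HostReplicationPlan` fields and the `check_plan` function are hypothetical names, and only the thresholds (10 Mbps, 150 virtual machines, 200 disks, 4 GB) come from the notes.

```python
from dataclasses import dataclass


@dataclass
class HostReplicationPlan:
    """Hypothetical container for the numbers a planner would write down."""
    bcmanager_link_mbps: float      # item 3
    io_link_mbps: float             # planned host IO replication link
    required_io_mbps: float         # result of the formula in item 5
    vms_per_vrg_pair: int           # item 6
    disks_per_vrg_pair: int         # item 6
    domain0_memory_added_gb: float  # item 8


def check_plan(plan):
    """Return a list of human-readable problems; an empty list means no findings."""
    issues = []
    if plan.bcmanager_link_mbps < 10:
        issues.append("BCManager link is below 10 Mbps")
    if plan.io_link_mbps < plan.required_io_mbps:
        issues.append("IO replication link is below the calculated bandwidth")
    if plan.vms_per_vrg_pair > 150:
        issues.append("more than 150 virtual machines on one VRG pair")
    if plan.disks_per_vrg_pair > 200:
        issues.append("more than 200 virtual machine disks on one VRG pair")
    if plan.domain0_memory_added_gb < 4:
        issues.append("Domain0 memory has not been increased by 4 GB")
    return issues


if __name__ == "__main__":
    draft = HostReplicationPlan(10, 100, 89.3, 120, 180, 4)
    print(check_plan(draft) or "no findings")
```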

Two ways of FusionSphere disaster recovery

Both active-active and active-standby can be implemented:

1. The HyperMetro feature of the storage layer enables active-active.

2. VRG can only implement active-standby.

Storage replication disaster recovery and host replication disaster recovery

Through storage replication disaster recovery, all disaster recovery schemes in the disaster recovery panorama are suitable without any special features.

Host replication disaster recovery is only suitable for primary and standby disaster recovery scenarios, because it completes data disaster recovery through VRG, and the RPO and RTO values are greater than 0.

Does the production side take snapshots in host replication disaster recovery?

No, because the IO flow is handled by iomirror, which does not need snapshot technology.

For the complete migration in the hot migration of virtual machines, there is no snapshot of storage.

Networking, functions and limitations of VRG

VRG specifications: 2 CPUs, 6 GB memory, 15 GB system disk, 100 GB data disk (logcache disk: used to store IO for asynchronous transmission)

How does the host-layer iomirror transfer data to the VRG?

As its name suggests, iomirror mirrors the IO data and then forwards the mirrored data to the VRG over IP. In the host replication scenario, a "virtual machine disaster recovery data traffic" service management interface must be added to the CNA host.

Is host replication disaster recovery with VRG synchronous, asynchronous, or both?

Only asynchronous. RPO > 0 sec.; RTO > 0 min

If 10 LUNs are mapped to 10 hosts, what disaster recovery should be selected?

In Huawei's disaster recovery solution for its virtualization platform, I would choose host replication disaster recovery, because these 10 LUNs may contain VMs that do not need protection, and host replication can select what to protect at a finer granularity.

In a FusionCompute scenario, Huawei's host replication disaster recovery solution would certainly be used. In other virtualization scenarios, such as KVM and VMware, I would look for a solution similar to FusionCompute host replication. If no host-level solution is available, I would try to migrate the 10 VMs onto the same LUN and then use storage array replication. If that is not possible either and disaster recovery is still required, storage array replication is the only option.

What aspects of disaster recovery network bandwidth should be considered:

The number of virtual machines, the amount of data, and the amount of data change (idle time and busy time) should also be considered for synchronous replication

How do host replication and storage replication work?

Host replication:

 

 

 

The steps are as follows:

1、

2、

3、

4. Automatically create a placeholder virtual machine to back up the data of the disaster recovery virtual machine, for data recovery after the disaster recovery virtual machine fails.

5、

6、

7、

8. Create virtual machine snapshots of the placeholder virtual machine regularly according to the protection policy (to guard against data that is corrupted during synchronization and becomes unusable)

Storage replication:

 

 

 

The steps are as follows:

1、

2、

3、

4、

5、

6、

7、

8、

9、

10. Create a recovery plan for the protection policy and configure the VM startup sequence

Why does host replication require virtualized data storage?

Because during host replication, virtual machine snapshots of the placeholder virtual machine must be created regularly according to the protection policy.

Host replication, storage replication configuration process?

 

 

 

FusionSphere OpenStack disaster recovery networking design

Disaster recovery module design

Desktop cloud disaster recovery networking and implementation principle

How many network cards does the VRG have? What is the role of each?

Three network cards:

1. One for communication with BCManager. It must be attached to the distributed switch and port group to which the BCManager virtual machine is connected.

2. One for communication with the service management interface of the CNA hosts. It must be attached to the distributed switch and port group to which the hosts' service management interfaces are connected.

3. One for communication with the VRG at the peer site. It must be attached to the distributed switch and port group that connects to the peer site.

What is the difference between master-slave switchover and a disaster recovery drill? How is master-slave switchover implemented?

Master-slave switchover: the slave LUN is switched to become the master LUN.

Difference from a disaster recovery drill: a drill involves no LUN switchover.

Implementation: master-slave switchover is included in one-click planned migration and one-click fault recovery.

One-click disaster recovery test and cleanup based on storage replication

One-click fault recovery based on storage replication

One-click planned migration based on storage replication

One-click disaster recovery test and cleanup based on host replication

One-click fault recovery based on host replication

One-click planned migration based on host replication

Differences among disaster recovery test, fault recovery and planned migration

1) Impact on the production site:

Before and after a disaster recovery test, the production site keeps running normally; the test has no impact on the production site.

Before fault recovery, the production site has already failed, and operations can only be carried out at the disaster recovery site to bring up the disaster recovery services.

Before a planned migration, the production site is normal. After the planned migration, the services are switched to the disaster recovery site and the services at the production site are stopped.

2) Value:

A disaster recovery test verifies the availability of the data copied to the disaster recovery site, or the availability of the snapshots.

Fault recovery brings up the disaster recovery services with one click when a disaster strikes the production site.

Planned migration switches services over in advance, before a non-catastrophic shutdown, to reduce the impact of downtime on the business.

3) Implementation mode

See each previous step for details.

Examination questions:

What is the bandwidth unit between VRGs?

Mbit/s

Why does the VRG bandwidth need to be multiplied by 8?

Unit conversion from bytes to bits: 1 byte (B) = 8 bits (b).

Block size unit?

MB

Why use the IOPS from when the virtual machine is busy?

Because if the bandwidth is not sized for the busy period, the link may not keep up at peak times and data transmission efficiency may be low.

How to calculate IOPs data when business is busy?

 

Specifically, how is the VRG deployed and operated?

The VRGs at the production site and the disaster recovery site correspond one-to-one, and each is bound to a host.