SPONGE: a new-generation MindSpore molecular simulation library

Time: 2021-6-14


Abstract: SPONGE builds on MindSpore features such as automatic parallelism and graph-kernel fusion, so it can run conventional molecular simulation workflows efficiently. MindSpore's automatic differentiation also lets it combine AI methods such as neural networks with traditional molecular simulation.

This article is shared from the Huawei Cloud community post "MindSpore's new-generation molecular simulation library: SPONGE". Original author: Yu Yu, MindSpore algorithm scientist.

SPONGE, MindSpore's new-generation molecular simulation library, was jointly developed by Gao Yiqin's research group (Peking University and Shenzhen Bay Laboratory) and the Huawei MindSpore team. It is a fully in-house molecular simulation library with high performance and a modular design. SPONGE builds on MindSpore features such as automatic parallelism and graph-kernel fusion, so it can run conventional molecular simulation workflows efficiently. MindSpore's automatic differentiation also lets it combine AI methods such as neural networks with traditional molecular simulation.

Background

Molecular simulation uses computers to model the structure and behavior of molecules at the atomic level and, from that, to predict the physical and chemical properties of molecular systems. Starting from experimental data and basic physical principles, it builds models and algorithms that compute plausible molecular structures and behavior. Molecular simulation has developed rapidly in recent years and is now widely used. In drug design, it helps reveal how viruses and drugs act. In the life sciences, it characterizes the multi-level structure and properties of proteins. In materials science, it is used to study structure and mechanical properties and to optimize material design. In chemistry, it is used to study surface catalysis and its mechanisms. In the petrochemical industry, it supports the structural characterization, synthesis design, adsorption and diffusion of molecular-sieve catalysts, the construction and characterization of polymer chains and of crystalline or amorphous bulk polymers, and the prediction of key properties such as blending behavior, mechanical properties, diffusion and cohesion.

Limits on the accessible simulation time and system size greatly constrain what traditional molecular dynamics software can be used for. To widen the range of problems molecular dynamics can address, researchers keep developing new force fields and sampling methods and combining them with new techniques such as AI algorithms. SPONGE was created to meet this need, and its intellectual property is entirely independently owned. Its modular design lets scientists build the computational modules a molecular dynamics simulation needs efficiently and conveniently, while keeping the efficiency that conventional simulations demand. SPONGE also integrates naturally with AI algorithms and can exploit the high-performance computing capabilities of the MindSpore framework.

Traditional molecular simulation software is combined with the SITS (selective integrated tempering sampling) method to enhance the sampling of biomolecules. Compared with that approach, SPONGE supports SITS directly and optimizes the calculation, which makes simulating biological systems with SITS more efficient. For polarizable systems, traditional molecular simulation is coupled with quantum chemical calculations to handle charge fluctuation. Even when machine learning is used to reduce the computational cost, much time is lost transferring data between programs. Thanks to its modular design, SPONGE can exchange data with a machine learning program directly in memory, which greatly reduces the overall computing time.
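For readers new to SITS, the core of integrated tempering sampling (ITS) is an effective potential that mixes Boltzmann factors at several inverse temperatures beta_k with weights n_k: U_eff = -(1/beta0) ln sum_k n_k exp(-beta_k U). The resulting force is the original force multiplied by a scalar factor, and SITS applies this tempering only to a selected subset of energy terms (for example, the solute-related ones). The sketch below is a framework-free illustration of those formulas under the assumption of fixed beta_k and n_k; it is not SPONGE's implementation, and it omits the iterative weight update that SITS performs during a run. The function names are hypothetical.

```python
import math


def _log_terms(u, betas, weights):
    # log(n_k) - beta_k * U for each tempering level (weights must be > 0)
    return [math.log(n) - b * u for n, b in zip(weights, betas)]


def its_effective_energy(u, betas, weights, beta0):
    """ITS effective potential: U_eff = -(1/beta0) * ln(sum_k n_k exp(-beta_k U))."""
    terms = _log_terms(u, betas, weights)
    m = max(terms)  # log-sum-exp trick for numerical stability
    lse = m + math.log(sum(math.exp(t - m) for t in terms))
    return -lse / beta0


def its_force_scale(u, betas, weights, beta0):
    """Factor s such that F_eff = s * F, i.e. s = dU_eff/dU."""
    terms = _log_terms(u, betas, weights)
    m = max(terms)
    w = [math.exp(t - m) for t in terms]  # unnormalized mixture weights
    return sum(wi * b for wi, b in zip(w, betas)) / (beta0 * sum(w))


def sits_energy(u_selected, u_rest, betas, weights, beta0):
    """SITS-style total: temper only the selected terms, keep the rest as-is."""
    return its_effective_energy(u_selected, betas, weights, beta0) + u_rest
```

With a single level equal to the simulation temperature (betas = [beta0], weights = [1]) the effective potential reduces to the original one and the force scale is 1, which is a convenient sanity check.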

Figure 1: Enhanced sampling of alanine dipeptide in explicit solvent with SITS and other methods

Open-sourced with MindSpore 1.2, SPONGE offers the following advantages:

  1. Fully modular molecular simulation. Building simulation algorithms from modules lets domain researchers implement theories and algorithms quickly, and gives external developers a friendly open-source community in which to contribute submodules.
  2. An end-to-end workflow that combines traditional molecular simulation with AI algorithms on MindSpore. Within MindSpore, researchers can easily apply AI methods to molecular simulation. SPONGE, which runs entirely on MindSpore, will be integrated further with MindSpore to become a new-generation, end-to-end differentiable molecular simulation package in which artificial intelligence and molecular simulation are naturally unified.
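To make "modular" concrete, here is a hypothetical sketch of the idea: each energy term is an independent module exposing a single method, and a force field simply sums whatever modules it is given. The names (EnergyTerm, HarmonicBond, PositionRestraint, ForceField, energy_and_forces) are illustrative assumptions and are not SPONGE's API; SPONGE's real modules are built on MindSpore.

```python
import math
from typing import List, Protocol, Sequence, Tuple

Vec3 = Tuple[float, float, float]


class EnergyTerm(Protocol):
    """Interface every (hypothetical) energy module implements."""

    def energy_and_forces(self, coords: Sequence[Vec3]) -> Tuple[float, List[List[float]]]:
        ...


class HarmonicBond:
    """E = k * (r - r0)**2 between atoms i and j (AMBER-style bond term)."""

    def __init__(self, i: int, j: int, k: float, r0: float):
        self.i, self.j, self.k, self.r0 = i, j, k, r0

    def energy_and_forces(self, coords):
        forces = [[0.0, 0.0, 0.0] for _ in coords]
        d = [a - b for a, b in zip(coords[self.i], coords[self.j])]
        r = math.sqrt(sum(c * c for c in d))
        energy = self.k * (r - self.r0) ** 2
        # F_i = -dE/dx_i = -2k(r - r0) * d / r, and F_j = -F_i
        scale = -2.0 * self.k * (r - self.r0) / r
        for c in range(3):
            forces[self.i][c] += scale * d[c]
            forces[self.j][c] -= scale * d[c]
        return energy, forces


class PositionRestraint:
    """E = k * |x_i - x_ref|**2, holding atom i near a reference point."""

    def __init__(self, i: int, ref: Vec3, k: float):
        self.i, self.ref, self.k = i, ref, k

    def energy_and_forces(self, coords):
        forces = [[0.0, 0.0, 0.0] for _ in coords]
        d = [a - b for a, b in zip(coords[self.i], self.ref)]
        energy = self.k * sum(c * c for c in d)
        for c in range(3):
            forces[self.i][c] = -2.0 * self.k * d[c]
        return energy, forces


class ForceField:
    """Sums the energies and per-atom forces of independent modules."""

    def __init__(self, terms: Sequence[EnergyTerm]):
        self.terms = list(terms)

    def energy_and_forces(self, coords):
        total = 0.0
        forces = [[0.0, 0.0, 0.0] for _ in coords]
        for term in self.terms:
            e, f = term.energy_and_forces(coords)
            total += e
            for atom, fa in enumerate(f):
                for c in range(3):
                    forces[atom][c] += fa[c]
        return total, forces
```

Adding a new physical term means writing one more class with the same method; ForceField itself does not change.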

Case introduction

Next, we briefly walk through a simple SPONGE case on MindSpore: simulating alanine tripeptide in aqueous solution.

Before you start, make sure MindSpore is installed correctly. If it is not, follow the installation guide on the MindSpore official website.

1. Preparing the input files

The simulation in this tutorial loads three input files:

·The attribute file (suffix .in) declares the basic simulation conditions and controls the parameters of the whole simulation process.

·The topology file (suffix .parm7) describes the topology of the molecules in the system and their parameters.

·The coordinate file (suffix .rst7) gives the coordinates of every atom in the system at the initial time.
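If you want to inspect a coordinate file before running, the sketch below reads atom coordinates from an ASCII AMBER restart (.rst7) file. It assumes the standard layout: a title line, a line starting with the atom count (optionally followed by the time), then coordinates in fixed-width 12-character fields, six per line, optionally followed by velocities and a box line. The function name is hypothetical and this is not part of SPONGE.

```python
from typing import List, Tuple


def read_rst7_coordinates(text: str) -> Tuple[str, List[Tuple[float, float, float]]]:
    """Return (title, [(x, y, z), ...]) from ASCII AMBER restart text."""
    lines = text.splitlines()
    title = lines[0].rstrip()
    natom = int(lines[1].split()[0])  # the time value, if present, is ignored
    values: List[float] = []
    for line in lines[2:]:
        line = line.rstrip()
        # Fixed-width fields: large negative numbers can fill all 12 columns,
        # so splitting on whitespace is not reliable.
        for start in range(0, len(line), 12):
            field = line[start:start + 12]
            if field.strip():
                values.append(float(field))
        if len(values) >= 3 * natom:
            break  # stop before velocities or box information
    if len(values) < 3 * natom:
        raise ValueError(f"expected {3 * natom} coordinates, found {len(values)}")
    values = values[:3 * natom]
    coords = [(values[3 * a], values[3 * a + 1], values[3 * a + 2]) for a in range(natom)]
    return title, coords
```

Reading the fields by column position rather than by whitespace is the main design choice here, since adjacent fields in this format are not guaranteed to be separated by a space.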

The topology and coordinate files can be built with the tleap tool included in AmberTools, which can be downloaded from the Amber MD website.

After building the topology and coordinate files with tleap, declare the basic simulation conditions in the attribute file, which controls the parameters of the whole simulation. The attribute file used in this tutorial reads as follows:

NVT 290k
   mode = 1,                          # simulation mode; 1 means the NVT ensemble
   dt = 0.001,                        # time step
   step_limit = 1,                    # total number of simulation steps
   thermostat = 1,                    # thermostat; 1 means Liu Jian's Langevin method
   langevin_gamma = 1.0,              # gamma_ln parameter of the Langevin thermostat
   target_temperature = 290,          # target temperature (K)
   write_information_interval = 1000, # output interval, in steps
   amber_irest = 0,                   # input mode; 0 reads an AMBER-format coordinate file that contains no velocities
   cut = 10.0,                        # cutoff distance for non-bonded interactions
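The attribute file is a title line followed by "key = value," entries with optional "#" comments. As an illustration of that structure only, here is a hypothetical parser (parse_attribute_file is not a SPONGE function) that returns the title and a dictionary of typed values, assuming every entry fits on one line and keys are case-insensitive.

```python
from typing import Dict, Tuple, Union

Value = Union[int, float, str]


def _convert(value: str) -> Value:
    # Try int first (e.g. "1"), then float (e.g. "0.001"), else keep the string
    for cast in (int, float):
        try:
            return cast(value)
        except ValueError:
            pass
    return value


def parse_attribute_file(text: str) -> Tuple[str, Dict[str, Value]]:
    """Return (title, {key: value}) for a SPONGE-style attribute file."""
    lines = text.splitlines()
    title = lines[0].strip() if lines else ""
    params: Dict[str, Value] = {}
    for line in lines[1:]:
        line = line.split("#", 1)[0].strip()  # drop comments first
        if not line:
            continue
        for entry in line.split(","):  # tolerate trailing commas
            if "=" not in entry:
                continue
            key, value = (s.strip() for s in entry.split("=", 1))
            params[key.lower()] = _convert(value)
    return title, params
```

Comments are stripped before splitting on commas, so a comma inside a comment cannot create a spurious entry.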

Once the input files are ready, name them NVT_290_10ns.in, WATER_ALA.parm7 and WATER_ALA_350_cool_290.rst7. The three files can be stored in any directory of your local workspace.

2. Loading data

The parameters the simulation needs are read from the three input files. The loading code is as follows:

import argparse
from mindspore import context

# Command-line arguments: input files and output file names
parser = argparse.ArgumentParser(description='Sponge Controller')
parser.add_argument('--i', type=str, default=None, help='input (attribute) file')
parser.add_argument('--amber_parm', type=str, default=None, help='parameter file in AMBER format')
parser.add_argument('--c', type=str, default=None, help='initial coordinates file')
parser.add_argument('--r', type=str, default="restrt", help='restart file')
parser.add_argument('--x', type=str, default="mdcrd", help='coordinate trajectory file')
parser.add_argument('--o', type=str, default="mdout", help='output file')
parser.add_argument('--box', type=str, default="mdbox", help='box file')
parser.add_argument('--device_id', type=int, default=0, help='GPU device ID')
args_opt = parser.parse_args()

# Run in graph mode on the selected GPU
context.set_context(mode=context.GRAPH_MODE, device_target="GPU", device_id=args_opt.device_id, save_graphs=False)

3. Building the simulation process

Using the force and energy modules defined in SPONGE, the script evolves the molecular dynamics of the system over many iterations until it reaches the required equilibrium state, recording the energies and other data obtained at each simulation step. The simulation loop is built as follows:

from mindspore import Tensor
from src.simulation_initial import Simulation

if __name__ == "__main__":
    simulation = Simulation(args_opt)
    save_path = args_opt.o
    for steps in range(simulation.md_info.step_limit):
        # print_step is 0 every ntwx steps and on the last step
        print_step = steps % simulation.ntwx
        if steps == simulation.md_info.step_limit - 1:
            print_step = 0
        # compute energy and temperature
        temperature, total_potential_energy, sigma_of_bond_ene, sigma_of_angle_ene, sigma_of_dihedral_ene, \
            nb14_lj_energy_sum, nb14_cf_energy_sum, LJ_energy_sum, ee_ene, _ = simulation(Tensor(steps), Tensor(print_step))
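The print_step logic in the loop above decides which steps are reported. The sketch below replays that logic without MindSpore to list those steps. It assumes, as the loop suggests, that a print_step of 0 is what triggers reporting, and that ntwx corresponds to write_information_interval in the attribute file. output_steps is a hypothetical helper, not part of SPONGE.

```python
from typing import List


def output_steps(step_limit: int, ntwx: int) -> List[int]:
    """Steps at which the loop passes print_step == 0 to the simulation."""
    reported = []
    for steps in range(step_limit):
        print_step = steps % ntwx
        if steps == step_limit - 1:
            print_step = 0  # the final step is always reported
        if print_step == 0:
            reported.append(steps)
    return reported
```

For example, output_steps(2500, 1000) gives [0, 1000, 2000, 2499]: every ntwx-th step plus the last one.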

4. Running the script

python main.py --i /path/NVT_290_10ns.in \
    --amber_parm /path/WATER_ALA.parm7 \
    --c /path/WATER_ALA_350_cool_290.rst7 \
    --o /path/ala_NVT_290_10ns.out

Where:

·--i is the attribute file of the MD simulation, which controls the simulation process;

·--amber_parm is the topology file of the simulated system;

·--c is the initial coordinate file;

·--o is the output record file, which stores the energies and other information at each output step;

·/path is the directory containing the input files, which in this tutorial is the sponge_in folder.

With these input files, the script evolves the molecular dynamics of the system at the specified temperature by repeatedly computing forces and energies.
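The attribute file selects a Langevin thermostat (thermostat = 1) to hold the system at the target temperature. As a generic illustration of how such a thermostat works, the sketch below integrates a single 1D harmonic oscillator in reduced units with the widely used BAOAB Langevin splitting. BAOAB is a swapped-in scheme chosen for clarity; it is not necessarily the Liu Jian Langevin integrator that SPONGE implements, and all parameters are hypothetical. By equipartition, the time averages of m*v^2 and k*x^2 should both approach kT.

```python
import math
import random
from typing import Tuple


def baoab_harmonic(n_steps: int, dt: float, gamma: float, kT: float,
                   k: float = 1.0, m: float = 1.0, seed: int = 0) -> Tuple[float, float]:
    """Langevin dynamics (BAOAB) of a 1D harmonic oscillator.

    Returns the time averages of m*v^2 and k*x^2, both of which should be close to kT.
    """
    rng = random.Random(seed)
    c1 = math.exp(-gamma * dt)                # velocity damping over one step
    c2 = math.sqrt((1.0 - c1 * c1) * kT / m)  # noise amplitude matching the damping
    x, v = 0.0, 0.0
    f = -k * x
    sum_mv2 = sum_kx2 = 0.0
    for _ in range(n_steps):
        v += 0.5 * dt * f / m                    # B: half kick
        x += 0.5 * dt * v                        # A: half drift
        v = c1 * v + c2 * rng.gauss(0.0, 1.0)    # O: friction plus random force
        x += 0.5 * dt * v                        # A: half drift
        f = -k * x                               # force at the new position
        v += 0.5 * dt * f / m                    # B: half kick
        sum_mv2 += m * v * v
        sum_kx2 += k * x * x
    return sum_mv2 / n_steps, sum_kx2 / n_steps
```

The friction and random force are applied together in the middle (O) step, and their coefficients c1 and c2 are tied to each other so the thermostat neither heats nor cools the system on average; langevin_gamma in the attribute file plays the role of gamma here.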

5. Results

The results are written to the .out file, which records how the energies of the system change and lets you inspect the thermodynamic information of the simulated system. The .out file contains the following columns:

_steps_ _TEMP_ _TOT_POT_ENE_ _BOND_ENE_ _ANGLE_ENE_ _DIHEDRAL_ENE_ _14LJ_ENE_ _14CF_ENE_ _LJ_ENE_ _CF_PME_ENE_

These columns record the quantities output during the simulation: the step number (_steps_), temperature (_TEMP_), total potential energy (_TOT_POT_ENE_), bond energy (_BOND_ENE_), angle energy (_ANGLE_ENE_) and dihedral energy (_DIHEDRAL_ENE_), followed by the non-bonded terms, which cover the electrostatic and Lennard-Jones interactions: 1-4 Lennard-Jones (_14LJ_ENE_), 1-4 electrostatic (_14CF_ENE_), Lennard-Jones (_LJ_ENE_) and PME electrostatic (_CF_PME_ENE_) energies.
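To plot or analyze these columns, the .out file can be loaded into per-column lists. The sketch below assumes a simple layout: one whitespace-separated header row like the one listed above, followed by whitespace-separated numeric rows; repeated header rows are skipped. read_sponge_out is a hypothetical helper, and if the real file contains other text, it would need adapting.

```python
from typing import Dict, List, Union

Number = Union[int, float]


def read_sponge_out(text: str) -> Dict[str, List[Number]]:
    """Map each column name (e.g. '_TEMP_') to its list of values."""
    rows = [line.split() for line in text.splitlines() if line.strip()]
    header = rows[0]
    columns: Dict[str, List[Number]] = {name: [] for name in header}
    for row in rows[1:]:
        if row == header:  # skip repeated header rows, if any
            continue
        if len(row) != len(header):
            raise ValueError(f"expected {len(header)} columns, got {len(row)}")
        for name, value in zip(header, row):
            columns[name].append(int(value) if name == "_steps_" else float(value))
    return columns
```

For example, read_sponge_out(open(path).read())["_TEMP_"] would give the temperature series, where path points to your .out file.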

Tutorial documentation: https://gitee.com/mindspore/d…

Outlook

Future versions will add more practical molecular dynamics modules to support more applications. Each SPONGE module will then gradually support automatic differentiation and automatic parallelism, giving better support to approaches that bridge machine learning and molecular simulation. Molecular dynamics enthusiasts and researchers are welcome to join us in developing and maintaining SPONGE.
