A Brief Introduction to the Technology Stack for Scientific Computing Programming

Time: 2021-10-22

Introduction

In the first half of the year, I worked on a Python data analysis topic for my graduation project and ultimately achieved good results. I think the choice of technical solution is worth sharing. It can also serve as an introduction for readers who still know little about scientific computing (now more popularly called data science). That is why I wrote this article.


2.1.1 Comparison of scientific computing programming languages

Introduction to the MATLAB language:

MATLAB is scientific computing software developed by MathWorks. It consists of a core product extended by add-on toolboxes, and it offers powerful matrix computation and data visualization capabilities. On the one hand, it supports mathematical computation in many fields, such as numerical analysis, optimization, statistics, numerical solution of partial differential equations, automatic control, and signal processing. On the other hand, it supports two- and three-dimensional plotting, 3D scene generation and rendering, visualization of scientific computations, image processing, virtual reality, and map making [13]. However, MATLAB is expensive commercial software, and its community ecosystem is steadily losing ground in data science. Figure 2.1 compares the keywords "matlab machine learning" and "Python machine learning" in Google Trends [14]. The comparison shows that MATLAB's data science community is far less active than Python's. An active data science community makes it much easier to find solutions when problems arise, so Python is clearly the better choice for the follow-up research and development of this project.

Figure 2.1 Comparison of Google Trends results for “matlab machine learning” and “Python machine learning”

Introduction to the R language:

Like MATLAB, R is used for data processing and statistical analysis, and it is popular with most statisticians. R is derived from the S statistical language developed at AT&T Bell Laboratories and is largely compatible with S [15]. R is free and open source, and it is maintained by a large and active global research community.

Introduction to the Julia language:

Julia is a high-level dynamic programming language designed specifically for high-performance numerical computing. It offers distinctive support for distributed parallel computing and precise numerical computation, and it includes a large, extensible library of mathematical functions. This is especially true in areas such as linear algebra, random number generation, signal processing, and string processing. There, Julia integrates many mature, high-quality open-source libraries written in C and Fortran, which gives it high performance and efficiency [16]. However, Julia first appeared in 2012 and is still developing rapidly. Its prospects are promising, but it is far less widely used in scientific computing than Python.

Introduction to the Python language:

Python was created by Guido van Rossum, who began developing it in 1989; it was first released in 1991. It is a dynamically and strongly typed high-level language, and at the time of writing the latest version was 3.9. Python is free and open source and has a very active scientific computing community, far more active than R's. According to the programming language ranking published by TIOBE, Python has risen to third place while R is in ninth place, as shown in Figure 2.2. MATLAB, R, and Julia focus on scientific computing, but Python is also popular for web application development. Web frameworks written in Python, such as Django and Flask, are widely used in software development. This makes it easy to turn the project's research results into concrete data analysis solutions, which gives Python clear advantages over the other languages as the programming language for this project. Using one consistent language significantly reduces the learning cost when the results are put into practice later. Therefore, Python 3.8.2 was chosen as the programming language for building and analyzing the statistical models.

Figure 2.2 Historical data from the TIOBE programming language ranking
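
The following minimal sketch illustrates what "dynamic" means in practice for Python. Names are not declared with a type, and type checks happen only when an operation actually runs. The sketch also shows that Python does not silently convert between strings and integers. The helper `try_add` is a hypothetical name used only for this illustration.

```python
def try_add(a, b):
    """Return a + b, or None if Python refuses to combine the two types."""
    try:
        return a + b
    except TypeError:
        # No implicit conversion: str + int raises TypeError at run time.
        return None


# Dynamic typing: a name can be rebound to values of different types,
# and types are checked only when an operation actually runs.
value = 42
print(type(value).__name__)   # int
value = "42"
print(type(value).__name__)   # str

print(try_add("1", 1))        # None (str + int is rejected)
print(try_add(int("1"), 1))   # 2 (explicit conversion first)
print(try_add("1", "1"))      # 11 (str + str concatenates)
```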

2.1.2 Comparison of development platforms

Introduction to PyCharm [17]:

PyCharm is a Python integrated development environment (IDE) developed by JetBrains. Its features include debugging, syntax highlighting, project management, code navigation, intelligent hints, code completion, unit testing, and version control. PyCharm comes in an open-source Community Edition and a Professional Edition. The Professional Edition adds web development support for frameworks such as Django and Flask, along with database connectivity.

Introduction to Visual Studio Code [18]:

Visual Studio Code is a lightweight editor released by Microsoft, built on a core-plus-extensions architecture. With the Python extension installed, it works as a capable lightweight Python IDE, although its feature set is still much less complete than PyCharm's. Because it is free and open source, however, it has many supporters and a very active community.

Introduction to Spyder [19]:

Spyder is an open-source scientific computing IDE (integrated development environment) for Python. Its design is similar to MATLAB's: it imitates MATLAB's "workspace" (in Spyder, the Variable Explorer) so that array values can be inspected and modified easily. It integrates scientific computing packages such as NumPy, SciPy, and Matplotlib, and it has a built-in interactive console for processing data. It is very friendly to data scientists and is included in Anaconda, a Python distribution for scientific computing. The project aims to promote the use of Python for software development in science and engineering.

Introduction to Jupyter Notebook:

Jupyter Notebook is a web-based interactive computing environment for creating Jupyter notebook documents. Spyder typically presents its interactive results after an entire Python file has been run. Jupyter Notebook, by contrast, is interactive at the level of individual code cells: users can choose which cells to run and see each cell's results, which makes it more flexible than Spyder. Like Spyder, Jupyter Notebook is included in Anaconda, a Python distribution for scientific computing.
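
The cell-based workflow can be imitated in a few lines of plain Python, as a minimal sketch. A script is split into cells at `# %%` marker lines, which are a common convention for cells in plain .py files and are assumed here. Only the chosen cells are executed, and they all share one namespace, much as notebook cells share the state of one kernel. `split_cells` and `run_cells` are hypothetical helpers. Real Jupyter sends each cell's code to a kernel process rather than calling `exec`.

```python
CELL_MARKER = "# %%"  # percent-style cell marker (assumed convention for this sketch)


def split_cells(source):
    """Split a script into cells at lines that start with the cell marker."""
    cells, current = [], []
    for line in source.splitlines():
        if line.strip().startswith(CELL_MARKER):
            if current:
                cells.append("\n".join(current))
            current = []
        else:
            current.append(line)
    if current:
        cells.append("\n".join(current))
    return cells


def run_cells(cells, indices, namespace=None):
    """Run only the selected cells, in the given order, in one shared namespace."""
    if namespace is None:
        namespace = {}
    for i in indices:
        exec(cells[i], namespace)
    return namespace


script = """# %%
data = [3, 1, 2]
# %%
data.sort()
# %%
total = sum(data)
"""

cells = split_cells(script)
ns = run_cells(cells, [0, 2])    # skip the sorting cell for now
print(ns["data"], ns["total"])   # [3, 1, 2] 6
run_cells(cells, [1], ns)        # run it later; earlier state is kept
print(ns["data"])                # [1, 2, 3]
```

Because the namespace persists between calls, a cell can be rerun or skipped without re-executing the whole file. This persistent state is the property that makes notebooks convenient for exploratory analysis.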

2.1.3 Comparison of database technologies

Introduction to the MySQL database [20]:

MySQL is an open-source database known for high performance, low cost, and good reliability, and it is very popular. With a history of more than 20 years, it is one of the best relational database management systems available today and is widely used by small and medium-sized websites. As MySQL has matured, it has increasingly been used for larger websites and applications as well. MySQL is currently owned by Oracle, and its query language is SQL.
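
The example below is a minimal sketch of working with a relational database and SQL from Python. To run without any installation, it uses Python's built-in `sqlite3` module with an in-memory database instead of a MySQL server. The table `sample` and its data are hypothetical. With a real MySQL server, one would typically use a third-party driver such as PyMySQL or mysql-connector-python. These follow the same DB-API pattern (connect, cursor, execute) but use `%s` placeholders instead of `?`.

```python
import sqlite3

# In-memory SQLite database, standing in for a MySQL server in this sketch.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Relational model: a fixed schema, every row has the same columns.
cur.execute("CREATE TABLE sample (id INTEGER PRIMARY KEY, name TEXT, score REAL)")
cur.executemany(
    "INSERT INTO sample (name, score) VALUES (?, ?)",
    [("a", 0.8), ("b", 0.5), ("c", 0.9)],
)
conn.commit()

# Declarative SQL: filter and sort rows, then aggregate a column.
rows = cur.execute(
    "SELECT name, score FROM sample WHERE score > ? ORDER BY score DESC",
    (0.6,),
).fetchall()
mean_score = cur.execute("SELECT AVG(score) FROM sample").fetchone()[0]
conn.close()

print(rows)        # [('c', 0.9), ('a', 0.8)]
print(mean_score)
```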

Introduction to the MongoDB database [21]:

MongoDB is a document-oriented database management system, classified as a non-relational database. It aims to provide scalable, high-performance data storage for web applications. Unlike the tables in MySQL, the documents it stores have a very loose structure. They use BSON, a JSON-like format, so they can hold more complex data types. A notable feature of MongoDB is its powerful query language, whose syntax somewhat resembles an object-oriented query language. It can perform most of the operations of a single-table query in a relational database, and it also supports indexing.
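
To show how the document model differs from tables, here is a minimal pure-Python sketch; it is not MongoDB itself. Documents are plain dicts whose fields may differ from record to record and may nest. The hypothetical helper `find` supports only a tiny subset of MongoDB's filter syntax: exact match and `$gt`, with dotted paths into embedded documents. With a real server, the equivalent PyMongo call would look like `collection.find({"scores.math": {"$gt": 80}})`, and the database would perform the matching.

```python
# Hypothetical documents: fields differ between records and may nest.
# MongoDB would store these as BSON; plain dicts are used here instead.
students = [
    {"name": "Li", "age": 22, "scores": {"math": 90, "physics": 85}},
    {"name": "Wang", "age": 19, "scores": {"math": 70}, "tags": ["new"]},
    {"name": "Zhang", "age": 25},
]


def get_path(doc, path):
    """Follow a dotted path such as 'scores.math'; return None if it is missing."""
    for key in path.split("."):
        if not isinstance(doc, dict) or key not in doc:
            return None
        doc = doc[key]
    return doc


def matches(doc, query):
    """Check a tiny subset of MongoDB-style filters: equality and {'$gt': x}."""
    for path, cond in query.items():
        value = get_path(doc, path)
        if isinstance(cond, dict) and "$gt" in cond:
            if value is None or not value > cond["$gt"]:
                return False
        elif value != cond:
            return False
    return True


def find(collection, query):
    """Return every document in the collection that matches the query."""
    return [doc for doc in collection if matches(doc, query)]


print([d["name"] for d in find(students, {"scores.math": {"$gt": 80}})])  # ['Li']
print([d["name"] for d in find(students, {"age": {"$gt": 20}})])          # ['Li', 'Zhang']
```

In a relational schema, the optional `tags` list and the nested `scores` would usually need extra tables or nullable columns. In the document model, they simply appear in the documents that have them.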

Epilogue

The technology stack described above is suitable only for small and medium-sized datasets. Large or very large datasets require additional big data technologies, such as Apache Hadoop, Apache Hive, and Apache Spark. The right choice depends on the specific situation.

References

[5] MATLAB[EB/OL]. https://www.mathworks.com/products/matlab.html, 2020.05.11.

[6] R[EB/OL]. https://www.r-project.org/about.html, 2020.05.11.

[7] Julia[EB/OL]. https://julialang.org/, 2020.05.11.

[10] Anaconda[EB/OL]. https://www.anaconda.com/, 2020.05.11.

[11] Jupyter Notebook[EB/OL]. https://jupyter.org/, 2020.05.11.

[12] Christian Hill. Learning Scientific Programming with Python[M]. Cambridge University Press, 2015:160-317.

[13] (US) Holly Moore. Practical Course of MATLAB (Second Edition)[M]. Translated by Gao Huisheng, Tong Na, Li Congcong, et al. Beijing: Electronic Industry Press, 2010: 1-2.

[14] Wang Shuyi. Can learning Python improve your competitiveness?[EB/OL]. https://www.jianshu.com/p/4445fe0a7e16, 2020.05.11.

[15] (US) Norman Matloff. The Art of R Programming[M]. Translated by Chen Yanping, Qiu Yixuan, Pan Lanfeng, et al. Beijing: Machinery Industry Press, 2013.6: 1-2.

[16] Wei Kun (ed.). Julia Language Programming[M]. Beijing: Machinery Industry Press, 2018.10: 1-2.

[17] PyCharm[EB/OL]. https://www.jetbrains.com/pycharm/, 2020.05.11.

[18] Visual Studio Code[EB/OL]. https://code.visualstudio.com/, 2020.05.11.

[19] Spyder[EB/OL]. https://www.spyder-ide.org/, 2020.05.11.

[20] MySQL[EB/OL]. https://www.mysql.com, 2020.05.11.

[21] MongoDB[EB/OL]. https://www.mongodb.com/, 2020.05.11.

(This is only a shallow introduction, offered to get the discussion started. If there are any errors, please point them out~)