On December 19, 2020 China. Net developers conference was held in Suzhou. With the theme of “open source, sharing and innovation”, combined with offline and online real-time synchronous live broadcasting, the conference collected more than 40 technology masters from well-known enterprises such as Microsoft and godson, and brought nearly 50 technical lectures and. Net application practices to more than 500000 developers.
Mr. Wang Hong, the person in charge of table technology of grape City, was honored to be the guest speaker of this conference and shared the experience accumulated in the optimization of grape city’s high-performance table technology with the present. Net developers.
Wang Hong, as the person in charge of table technology in grape City, has been focusing on the research in the field of enterprise high-performance table technology since 2014, designed a new table component architecture for grape City, led the R & D team to launch an industry-leading spreadsheet component gcexcel, and accumulated a lot of architecture design experience in high concurrency and high availability table components.
In this sharing, starting from the background and original intention of developing spreadsheet components in grape City, Wang Hong compares the reading performance differences between excel and native c# code in detail, and summarizes several means of performance tuning for prototype, such as reducing the impact of garbage collection, sharing objects to improve performance, compressing data to reduce memory, making full use of cache, etc.
The following is the main content shared by Mr. Wang Hong:
1 \. Background and original intention of grape city’s research and development of spreadsheet components
As early as more than 30 years ago, spreadsheets, as a basic function suite in office software, first appeared in personal computers. In recent years, with the further strengthening of network informatization, the application of spreadsheet is more and more extensive and in-depth.
Today, “table” has also become an important form of data, which is widely used in the storage structure and system composition of various desktop software, application systems and SaaS platforms. People are used to using tabular tools to deal with large-scale data of industries such as finance and taxation, finance, securities, insurance, industrial manufacturing, logistics and warehousing. Typical representatives include Microsoft Excel, Google Spreadsheet, WPS and so on.
Grape City, as the world’s leading development technology provider, has long invested in R & D efforts to explore and explore how to embed spreadsheets into various systems in the form of components. After nearly 30 years of research, grape city’s table technology has not only retained the user’s habit of using Excel, but also provided efficient data processing and visualization capabilities in the business system based on the user’s experience and accumulation.
2 \. Test the limit performance of Excel file reading through c# code
The application scenarios of spreadsheets are generally complex. Developing and implementing them will encounter many technical difficulties, the most typical of which is the performance problem.
In order to realize high-performance table components, grape city has overcome many performance challenges, such as how to quickly open and save a spreadsheet file, how to calculate massive formula functions, how to let users quickly complete the value and style settings of a large number of cells, etc.
In order to test the reading performance of c# code to Excel files, Mr. Wang Hong chose a scenario that is easy to encounter in daily life: when a spreadsheet file is large (including 30 columns, 1000000 rows and 30000000 cell data), it takes 34 seconds to open it in Excel.
If users want to open such a large file in a shorter time, is there a way to do so? The answer is yes. After testing, it takes only 12 seconds to open such a file with gcexcel, a table component of grape city.
3 \. Optimization of grape City Table Technology: reducing the impact of waste recycling
Since Office 2007, excel file has been a standard zip file. After decompressing it, find a folder named “worksheets”, in which the “Sheet1. XML” file stores the corresponding position and value of each excel file cell.
How to extract these positions and values through more efficient algorithms is the optimization direction of grape city table technology. For an unoptimized c# code, it takes 27 seconds to read out the value of each cell and store it in a list. Here, we just change the object in the code to double, which can be completed in 20 seconds.
This seven second gap is due to the impact of garbage collection. There are too many object objects in the list, which consumes a lot of garbage collection time. Although they are not recycled, because they are objects, it is necessary to constantly detect whether they can be recycled in the garbage collection process. Change the object to double, and the garbage collection time can be ignored.
How does grape city form technology overcome the impact of waste recycling?
- Eliminate the concept of cells. Because there are too many cells, it is difficult to keep this concept and reduce the number of objects very low. Grape City processes the original data in the cell type separately, and removes the style for additional processing. Only the cell value is considered here. There may be four types of cell values in spreadsheet: number, text, Boolean and error. Numbers are internally represented by double type, and Boolean and error can also be stored by double. For text, you can’t use conventional methods to store it, but we can think of a way to make double store text, which will be described in detail later. In short, double can store all excel data, so we can design a simple cell data object. It is a structure, not an object.
- Change row storage to column storage. The maximum number of rows in the spreadsheet is the 20th power of 2 (about 1048576), while the maximum number of columns is the 14th power of 2 (about 16384). Therefore, changing row storage to column storage can reduce the number of objects. Using C #’s generics, let the dictionary store value type data. After such a transformation, the number of objects will drop from more than 90 million to more than 10000, and the impact of garbage collection is basically ignored.
4 \. Grape City Table Technology Optimization: shared storage and style compression
The so-called shared storage means that only one copy of common objects in the whole software is generated and placed in a global place. Each object uses a number as its ID, and only this ID is stored in other places. Take the following text as an example. All texts are stored in a table. Each text has a unique ID, and only this ID is stored in the cell.
Shared storage has many advantages, which can significantly reduce memory and improve performance:
- Reduce memory consumption. For a software with a large amount of data, a lot of data will be repeated. For example, in a spreadsheet, if the number of texts is large, the repetition rate is very high. Store these texts in a centralized manner and only one copy of the same. You will find that the number of different texts is actually very small, which will greatly reduce memory consumption.
- Improve performance. Take the search and replacement of strings as an example. If there is no shared centralized storage, you must scan all cells and compare the text of each cell. But with shared storage, you just need to look up the “dictionary”; In string comparison, you only need to compare whether the numbers are equal; In sorting and other operations, you only need to establish an index for the strings in the table, arrange the order, and then use this index when necessary.
Through shared storage, grape city table technology has greatly reduced the memory consumption, but we can further optimize it. Look at this figure. We store a number in each cell, pointing to a grid style object in the global style sheet. However, we find that the repetition rate of these numbers is very high, because the style of an area in the spreadsheet is often the same. According to this feature, we can easily think of an optimization method.
Instead of storing a number in each cell, we create a table on the graph. One column of the table records an area (a rectangle) and the other column records the ID of the style corresponding to the area. In this way, we can reduce the storage space of the style a lot. At the extreme, if the entire worksheet is the same style, we only need to store a rectangle and a number here.
5 \. Grape City Table Technology Optimization: make full use of cache
Today’s CPUs have cache, which can improve the performance of CPU processing data. This figure shows the relationship between different storage media and CPU: the faster the storage media is closer to the CPU, the faster the speed, the smaller the space and the more expensive the price.
When the CPU reads data, it first reads from the nearest cache. Only when it misses, it reads from the next level cache. Therefore, improving cache hit rate is of great significance to improve performance.
At the same time, we should also be clear that when data is copied from a slower storage medium to a faster storage medium, it is not copied one by one, that is, when we access a data in memory, the adjacent data around it is also read into the cache nearest to the CPU. Therefore, if the adjacent data is read at the next time, it can be found directly in the cache, and the reading speed will be very fast.
Knowing the principle of cache, how to write cache friendly code?
·When designing components, you should pay attention to what data structures you choose, such as arrays and dictionaries, which may be affected differently by the cache.
·When accessing data, pay attention to the storage mode of data.
6 \. Grape City Table Technology Optimization: Other Practices
In addition to the above optimization methods, grape city’s table technology has more optimization practices. Such as creating cache, using set based operation, using SIMD to calculate a large amount of data, etc.
The above is the main content shared by Mr. Wang Hong. Through the vivid example code + time comparison, we have also formed a very deep understanding of the high-performance table technology of grape city.
At the same time, the high-performance table technology shared by Mr. Wang Hong at the conference has been implemented. At the front end, the pure front-end table control spreadjs can carry out online editing, calculation and display for Excel and grid data; On the back end, the server-side table component GrapeCity documents for Excel (gcexcel for short) can batch process EXCEL documents and perform more efficient export and printing.