Explaining distributed databases to your parents, starting with a credit card application


What is a database? Most people never come into direct contact with one in daily life, and even database practitioners scratch their heads when asked to explain it. It is like water and electricity: we turn on the tap and water flows, we press the switch and the light comes on, and we rarely ask where the water or the electricity comes from. Today, let's talk about the Greenplum that is all around you.

When we use mobile banking to check income and spending records, or open a social app to see new messages, databases work like pipes and wires hidden from view. The moment we open an application, the data and information are ready.


Greenplum is used in many industries around us, and one use case is closely tied to everyday spending: credit card applications. Did you ever apply for a credit card in the early days? It took at least one to two months from submitting the application to receiving the card. Today many banks offer "approval in seconds": applicants can check progress right after submitting, and the card arrives within a week. This improvement would not be possible without the database working in the background.

A credit card is a credit certificate that a bank or credit card company issues to qualified consumers. To make sure a consumer is trustworthy and able to repay on time, the bank has to judge the applicant along several dimensions: personal information, transaction records, occupation, existing assets, and so on. This data comes from the applicant's own submission, from records retrieved from other banks, and from other credit channels, and it has to be cross-checked to verify its authenticity and prevent fraud. In the era of manual review, applicants submitted paper forms, and the bank mailed those forms to the credit card audit center. Auditors checked every document one by one and cross-referenced the data to verify it. For a 30-year-old white-collar worker in a second-tier city, for example, they had to check whether his income matched his annual transaction records, whether his employment information was true, and whether his employer was operating stably. Most people find even their own bank statement overwhelming, so checking this much information by hand, on top of the time the paperwork spent in transit, took up most of the processing time.

On the one hand, the key to speeding things up is going digital and paperless. Online applications replace paper ones, so the information is captured and an application record is generated as soon as the form is submitted. Next, automated review combined with manual review replaces purely manual review. The system can quickly search the bank's own business database and data sources from other channels, match the applicant against the basic requirements, and produce a score. If the applicant meets every requirement, the application can be approved immediately, which is the "approval in seconds". In most cases the bank still runs a second round of manual review to make sure the information is fully verified and to reduce the risk of overdue payments.


On the other hand, banks have improved their own credit and business databases, and industry-wide databases have been built, such as the national personal credit reporting system and the UnionPay credit risk sharing system, to exchange user credit data. Around 2000, China's central bank took the lead in building a national personal credit information system that stores personal information in a unified database, so that every domestic commercial bank can access and query personal credit information.

Data connectivity is of course a good thing. It means you can handle your banking business in any province or city in China instead of going back to a specific bank branch in your hometown. But it also brings a range of technical problems. Imagine the credit data system as a huge warehouse in which each person's information is kept as a file. When banks want to pull personal information from it, they run into the following challenges:

  • When business is busy, hundreds of people come to fetch data at the same time. How should the warehouse respond?
  • If several people store or fetch the same record at the same time, how should the warehouse handle it?
  • The warehouse receives new data every day. What if it no longer fits?


A distributed database such as Greenplum can solve these problems, and it is the core technology that lets banks query information and get analysis results quickly. Distributed databases grew out of traditional centralized databases. The evolution can be understood like this: in the past, all the data sat in one huge warehouse. As the data grew, free space shrank, operating costs rose, and finding a file took longer and longer. Once the volume of data passed a critical point, the existing hardware platform could not hold any more. A distributed database does not need one large warehouse. Instead it builds a set of small warehouses, which can be located in different places but are managed by a single system. Through that system, the administrator can find the right data in each small warehouse.
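To make the picture of small warehouses concrete, here is a minimal Python sketch of how a system might decide which warehouse holds a given record. It assumes records are routed by hashing a key, which is the general idea behind distributing rows across segments. The names `segment_for` and `WarehouseCluster`, the sample record, and the choice of SHA-256 are all illustrative assumptions, not Greenplum's internal implementation.

```python
import hashlib


def segment_for(key: str, num_segments: int) -> int:
    """Map a record key to a warehouse index using a stable hash (assumed scheme)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_segments


class WarehouseCluster:
    """Hypothetical set of small warehouses managed as one system."""

    def __init__(self, num_segments: int) -> None:
        # Each small warehouse is modeled as its own dictionary.
        self.segments = [{} for _ in range(num_segments)]

    def put(self, key: str, record: dict) -> int:
        """Store the record in the warehouse chosen by the hash; return its index."""
        index = segment_for(key, len(self.segments))
        self.segments[index][key] = record
        return index

    def get(self, key: str):
        """Look only in the one warehouse that can hold this key."""
        index = segment_for(key, len(self.segments))
        return self.segments[index].get(key)


cluster = WarehouseCluster(num_segments=4)
cluster.put("applicant-001", {"age": 30, "occupation": "office worker"})
print(cluster.get("applicant-001"))
```

Because the same key always hashes to the same warehouse, the system never has to search every warehouse to find one file.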

Every day the warehouse receives queries from all over the country. By 2018, China's credit reference system had established unified credit files for 970 million natural persons, and personal credit reports were queried more than 4.77 million times a day on average. Imagine millions of people going in and out every day, some fetching and some storing. Without care, there would be a huge traffic jam. Greenplum is based on a massively parallel processing (MPP) architecture: the headquarters hands these queries to the different small warehouses so they are processed at the same time, because each small warehouse has its own storage, computing, and scheduling capability. Users can size the warehouse cluster according to their needs. Compared with a single-machine system, an MPP cluster can command far more storage and computing power, and in theory it can be expanded without limit. So even as the data and the number of queries grow, the system can keep up.
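The idea of the headquarters splitting work across small warehouses can be sketched in a few lines of Python. This is a simplified scatter-gather illustration, assuming each warehouse holds its own rows and can compute a partial result independently; the function names, the sample rows, and the use of a thread pool are assumptions for illustration and do not reflect how Greenplum plans or executes queries.

```python
from concurrent.futures import ThreadPoolExecutor


def scan_segment(rows, min_score):
    """Work done by one small warehouse: filter its own rows and pre-aggregate."""
    matching = [row["score"] for row in rows if row["score"] >= min_score]
    return len(matching), sum(matching)


def parallel_average(segments, min_score):
    """Headquarters: send the same task to every warehouse, then merge the partials."""
    with ThreadPoolExecutor(max_workers=max(1, len(segments))) as pool:
        partials = list(pool.map(lambda rows: scan_segment(rows, min_score), segments))
    count = sum(c for c, _ in partials)
    total = sum(t for _, t in partials)
    return count, (total / count if count else None)


# Hypothetical credit scores spread over three small warehouses.
segments = [
    [{"id": 1, "score": 720}, {"id": 2, "score": 580}],
    [{"id": 3, "score": 690}],
    [{"id": 4, "score": 810}, {"id": 5, "score": 640}],
]

print(parallel_average(segments, min_score=650))  # (3, 740.0)
```

Each warehouse touches only its own rows, and only the small partial results (a count and a sum) travel back to be merged. That is why adding warehouses can shorten the wait for large queries.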

In addition, when several people access the same record at the same time, Greenplum provides full transaction support through multi-version concurrency control (MVCC) and fine-grained lock management, so that multiple people can read and update at once. When someone reads a record, the system shows them the current committed version. If someone else modifies the record at the same moment, the writer works on a new version instead of overwriting the one being read. Readers and writers therefore do not block each other, which reduces waiting time. At the same time, readers always see correct data, and the consistency of the written data is preserved.
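The multi-version idea can be illustrated with a toy Python store. This is a deliberately simplified sketch under stated assumptions: every write appends a new version stamped with a logical commit time, and a reader sees the newest version committed at or before the moment its snapshot was taken. The class `MVCCStore` and its methods are hypothetical; real MVCC in Greenplum also involves transaction IDs, visibility rules, and locking that are omitted here.

```python
class MVCCStore:
    """Toy multi-version store: writes add versions, reads use a snapshot."""

    def __init__(self) -> None:
        self._clock = 0    # logical commit counter
        self._rows = {}    # key -> list of (commit_ts, value), oldest first

    def write(self, key, value) -> int:
        """Commit a new version without touching older ones; return its timestamp."""
        self._clock += 1
        self._rows.setdefault(key, []).append((self._clock, value))
        return self._clock

    def snapshot(self) -> int:
        """A reader's snapshot is the commit counter at the time it starts."""
        return self._clock

    def read(self, key, snapshot_ts):
        """Return the newest version committed at or before the snapshot."""
        visible = [value for ts, value in self._rows.get(key, []) if ts <= snapshot_ts]
        return visible[-1] if visible else None


store = MVCCStore()
store.write("applicant-001", {"credit_limit": 5000})

reader_snapshot = store.snapshot()                      # a reader starts here
store.write("applicant-001", {"credit_limit": 8000})    # a writer updates meanwhile

print(store.read("applicant-001", reader_snapshot))     # {'credit_limit': 5000}
print(store.read("applicant-001", store.snapshot()))    # {'credit_limit': 8000}
```

The earlier reader keeps seeing the version that existed when it started, while a later reader sees the update, so neither has to wait for the other.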

As the data grows, the warehouse runs short of space. Building one super-large warehouse, as in the past, consumes resources and money and lacks flexibility. Greenplum's solution is a flexible "franchise" approach. Each warehouse has its own management system and its own storage space, and the warehouses are connected by a network. When space runs short, it is enough to build new small warehouses. Small warehouses are cheaper and more flexible, can serve more demanding application scenarios, and bring advantages in cost, benefit, and risk control. It works like franchising, where each store operates independently but all are linked by a shared brand and network. For banks, this structure lowers the cost of maintaining the database. As business volume grows, they can add nodes easily and raise the storage and computing capacity of the whole system.
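Adding small warehouses can be sketched the same way. The Python example below assumes records are placed by hashing a key and that, after new warehouses are added, existing records are redistributed across the larger set. The function names and the SHA-256 placement are hypothetical choices for illustration; Greenplum's own expansion tooling and redistribution strategy are not shown here.

```python
import hashlib


def segment_for(key: str, num_segments: int) -> int:
    """Stable hash placement of a key onto one of the warehouses (assumed scheme)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_segments


def distribute(keys, num_segments):
    """Place every key into the warehouse chosen by its hash."""
    segments = [[] for _ in range(num_segments)]
    for key in keys:
        segments[segment_for(key, num_segments)].append(key)
    return segments


def expand(segments, extra):
    """Add `extra` new warehouses and spread the existing keys over all of them."""
    all_keys = [key for segment in segments for key in segment]
    return distribute(all_keys, len(segments) + extra)


keys = [f"applicant-{i}" for i in range(1000)]
before = distribute(keys, 4)
after = expand(before, 4)

print("files per warehouse before:", [len(s) for s in before])
print("files per warehouse after: ", [len(s) for s in after])
```

No record is lost in the move. The same files are simply spread over more warehouses, so each one stores less and has less to search.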


Image: Greenplum database structure

Greenplum, billed as the world's first open source MPP database, helps many Chinese commercial banks speed up business processing and give customers a better experience. As an enterprise-oriented product, Greenplum's technical effort and progress ultimately serve real life and work, and every user.

Reference articles:………_129977374.htm

2015 China Financial Development Report: theory, exploration and practice of social credit system construction

For more technical articles about Greenplum, please visit the Greenplum Chinese community website.

