Data type for financial scenario: numeric

Time:2021-10-25

The author has done the design related to database data type and the source code implementation from 0 to 1, and has a deep understanding of the design, source code implementation and calculation principle in memory of data types such as numeric (equivalent to decimal, which are all part of standard SQL), datetime, timestamp, varchar, etc.

Based on the source code of PostgreSQL, this article analyzes the memory computing structure and disk storage structure of numeric type in PostgreSQL.

C source code:https://github.com/postgres/p…

Header file:https://github.com/postgres/p…

Accuracy requirements

In the process of programming, you may be familiar with the built-in 4-byte float and 8-byte doulbe types for addition, subtraction, multiplication and division. Although floating-point numbers are stored by scientific counting method, the precision of some binary numbers is missing in the mutual conversion mechanism between binary and decimal.

For similar financial scenarios, it is not acceptable to store huge values and high requirements for data accuracy, even a small loss of accuracy. All kinds of databases on the market basically contain numeric type, and each digit is accurately stored through string, so as to achieve accurate calculation that floating-point numbers can’t do.

Introduction to numeric syntax

NUMERIC(precision, scale)

  • Precision: the sum of all numbers in numeric
  • Scale: the number of digits after the decimal point

For example: 12.345, then precision is 5 and scale is 3.

matters needing attention:

  1. All integers can be regarded as numeric with scale of 0;
  2. Precision must be a positive number, and scale can be 0 or a positive number;
  3. Numeric (precision) syntax. The default scale is 0;
  4. Without any parameters in the syntax, the values of any precision and scale can be stored as long as they do not exceed the maximum value of precision;
  5. As long as scale is declared in numeric, the entered values must be forced to match the scale (i.e. round, which is rounded);
  6. If the entered scale value overflows, an error will be reported.

When the precision is not specified, the value range of each value type is [common]:

Data type for financial scenario: numeric

Numeric special value

In addition to normal values, numeric supports a special value: Nan (meaning “not-a-number”). When you want to use it as a constant in SQL, you need to use quotation marks, for example:

    UPDATE table SET x = 'NaN'

Numeric data flow in SQL

We know that the execution process of an SQL in the database is roughly as follows:

Data type for financial scenario: numeric

    CREATE TABLE test (
            name VARCHAR(100) NOT NULL,
            price NUMERIC(5,2)
    );

    INSERT INTO test (name, price)
    VALUES ('Phone',500.215), 
                 ('Tablet',500.214);
                 

Take the two SQL statements in the above example as an example, first create a test table and insert data. Here, we focus on how the written numeric number is represented in memory and how the data structure corresponding to the defined numeric (5,2) is represented in memory. What is the storage structure of the written data after it falls into the disk.

Here, the storage structure of data in memory is different from that when the disk is dropped. Finally, the invalid bytes occupied in memory need to be removed when the disk is dropped. For example, varchar (100), if 100 bytes are allocated in memory and only three “ABC” bytes are actually written, the allocated memory is 100 bytes, and the 97 bytes not used in disk dropping should be removed. When the last three bytes are written to disk, data compression should be done. You can imagine that if the storage structure in the memory is written directly to the disk without any processing, if the amount of data is very large, it will waste more disk space!

Analysis of numeric disk storage structure

The structure numericdata is the structure that finally falls on the disk. As shown below, numericdata contains the union fields of numericlong and numericshort:

    struct NumericLong
    {
                    uint16          n_sign_dscale;  /* Sign + display scale */
                    int16           n_weight;               /* Weight of 1st digit  */
                    NumericDigit n_data[FLEXIBLE_ARRAY_MEMBER]; /* Digits */
    };

    struct NumericShort
    {
                    uint16          n_header;               /* Sign + display scale + weight */
                    NumericDigit n_data[FLEXIBLE_ARRAY_MEMBER]; /* Digits */
    };

    union NumericChoice
    {
                    uint16          n_header;               /* Header word */
                    struct NumericLong n_long;      /* Long form (4-byte header) */
                    struct NumericShort n_short;    /* Short form (2-byte header) */
    };

    struct NumericData
    {
                    int32           vl_len_;                /* varlena header (do not touch directly!) */
                    union NumericChoice choice; /* choice of format */
    };

Structure numericlong

    struct NumericLong
    {
                    uint16          n_sign_dscale;  /* Sign + display scale */
                    int16           n_weight;               /* Weight of 1st digit  */
                    NumericDigit n_data[FLEXIBLE_ARRAY_MEMBER]; /* Digits */
    };

uint16 n_sign_dscale : the upper two bits in the first byte are used to save the sign.

If 0x0000: the sign bit is positive

If 0x4000: the sign bit is negative

If 0xc000: Nan

The remaining 14 bits are used to save the display scale

int16 n_weight: save weights. Here I want to explain the meaning of weight here. Here, numeric is represented by a set of int16 arrays. Each element is represented by int16 as a 4-digit number, that is, the maximum storage is 9999. Then the base value is 10000. The base value of the weight here is 10000 (the base value of the weight in hexadecimal is 10 and the binary is 2).

Data type for financial scenario: numeric

NumericDigit n_data[FLEXIBLE_ARRAY_MEMBER] : dynamic array (also called flexible array, which is uniformly called dynamic array here) is a feature added after C99. This feature is that in this structure, the dynamic array does not occupy any space, and its length is determined by VL in numericdata_ len_ decision.

Here we can see that there are two structures: long and short. For the early PostgreSQL version, the long storage method is used. Later, it is optimized and improved to the short storage method. In order to maintain forward compatibility, the improved version can still read the data stored in the previous version and retain the long storage method.

Structure numericshort

    struct NumericShort
    {
                    uint16          n_header;               /* Sign + display scale + weight */
                    NumericDigit n_data[FLEXIBLE_ARRAY_MEMBER]; /* Digits */
    };
    

uint16 n_header: saves information about symbols, dynamic scales, and weights.

If 0xc000, it means that the numeric is Nan

Among the remaining 14 bits, 1 is used to save symbols, 6 to save dynamic scale and 7 to save weight.

NumericDigit n_data[FLEXIBLE_ARRAY_MEMBER] : refer to the flexible array description above.

Consortium numericchoice

    union NumericChoice
    {
                    uint16          n_header;               /* Header word */
                    struct NumericLong n_long;      /* Long form (4-byte header) */
                    struct NumericShort n_short;    /* Short form (2-byte header) */
    };
    

uint16 n_header: this two byte variable contains a lot of information. If n_ The highest two bit values of the first byte of the header are:

0x8000: the numericshort storage format is adopted

0xc000: Nan

In addition, the numericlong storage format is used.

Structure numericdata

    struct NumericData
    {
                    int32           vl_len_;                /* varlena header (do not touch directly!) */
                    union NumericChoice choice; /* choice of format */
    };

int32 vl_len_: used to save the length of a dynamic array. This array is a dynamic array defined in the numericlong or numericshort structure.

Analysis of numeric memory computing structure

    typedef struct NumericVar
    {
                    int                     ndigits;                /* # of digits in digits[] - can be 0! */
                    int                     weight;                 /* weight of first digit */
                    int                     sign;                   /* NUMERIC_POS, NUMERIC_NEG, or NUMERIC_NAN */
                    int                     dscale;                 /* display scale */
                    NumericDigit *buf;                      /* start of palloc'd space for digits[] */
                    NumericDigit *digits;           /* base-NBASE digits */
    } NumericVar;
    

NumericVarIs the format used to do arithmetic operations, indigit-arrayPart is the same as the storage format, but the header part is more complex. The following analysis is made respectively:

Data type for financial scenario: numeric

  • BUF: points to the memory start location actually requested for numericvar
  • Digits: points to the position of the first digit in actual use (the element here is int16, not 0)

    • One or two elements (int16) are generally reserved between buf and digts as possiblecarry(carry) Yes, of course, considering the actual situationnumericinleadingPart may have many zeros, which meansbufFollowdigitsThere can be many elements between them
  • Dscale: abbreviation for display scale, indicatingnumericHow many decimal numbers are there after the decimal point

    • In the current version, always > = 0,dscaleThe value of may be larger than the actual stored decimal places, which means that the extra part is 0 (trailing zeros). At the same time, it also means that meaningless zeros will not be written in when writing to the disk (saving disk space)
  • Rscale: here is a variable used in function calculation, the abbreviation of result scale, to save the accuracy of the target calculation result, which is always > = 0

    • rscaleNot saved inNumericVarIn, the actual value is based on the inputdscalesaffirmatory
  • Sign: sign orNAN
  • Weight: weight, which is a power of hexadecimal (digits – 1)

    • For example, 9999 9999 9999.9999 occupies three int16, and the weight is 2 (the principle is the same as that of hexadecimal weight, but the base value of int16 is 10000)
  • Ndigits: indigits[ ]Number of int16 in the array

Pay attention to the official account of Qingyun technology community, and then reply to the keyword “cloud original real combat”, then join the course exchange group.

Data type for financial scenario: numeric

author

Gao riyao Senior Database Kernel R & D, MySQL series product kernel development

This article is composed of blog one article multi posting platformOpenWriterelease!

Recommended Today

Swift advanced (XV) extension

The extension in swift is somewhat similar to the category in OC Extension can beenumeration、structural morphology、class、agreementAdd new features□ you can add methods, calculation attributes, subscripts, (convenient) initializers, nested types, protocols, etc What extensions can’t do:□ original functions cannot be overwritten□ you cannot add storage attributes or add attribute observers to existing attributes□ cannot add parent […]