Recommended reading on IEEE 754 floating-point arithmetic:
 IEEE 754 (IEEE 754-2019)
 Floating-point arithmetic
 Significand
 JavaScript floating-point number traps and solutions
 Basics: on floating-point numbers
 In-depth analysis of floating-point numbers
 What is the difference between quiet NaN and signaling NaN?
 Why is the largest safe integer in JavaScript 2 to the 53rd power minus one?
 How numbers are encoded in JavaScript
 How to understand the rounding scheme of IEEE 754? Haifeng's answer – Zhihu
 ECMA-262
 Discussion on the precision of IEEE 754 floating-point numbers
 Lecture Notes on the Status of IEEE Standard 754 for Binary Floating-Point Arithmetic – Kahan
 What Every Computer Scientist Should Know About Floating-Point Arithmetic
 What you should know about floating point numbers
Notation (unless otherwise specified below, every symbol in an expression is the initial letter of the corresponding English term; "floating-point number" means a binary floating-point number; all content is based on IEEE 754-2019).
| Name | Radix | Significand bits (incl. 1 implicit integer bit) | Decimal digits (precision = lg 2^significand bits) | Exponent bits | Bias | E min | E max |
| --- | --- | --- | --- | --- | --- | --- | --- |
| binary16 (half precision) | 2 | 1 + 10 = 11 | lg 2^11 ≈ 3.31 | 5 | 2^(5−1) − 1 = 15 | −14 = 1 − 15 | +15 |
| binary32 (single precision) | 2 | 24 | 7.22 | 8 | 127 | −126 | +127 |
| binary64 (double precision) | 2 | 53 | 15.95 | 11 | 1023 | −1022 | +1023 |
| binary128 (quadruple precision) | 2 | 113 | 34.02 | 15 | 16383 | −16382 | +16383 |
| binary256 (octuple precision) | 2 | 237 | 71.34 | 19 | 262143 | −262142 | +262143 |
32-bit single-precision floating-point layout: bit 31 is the sign, bits 30–23 are the exponent, bits 22–0 are the fraction; the implicit integer bit precedes the fraction and is not stored.

| Type | Sign | Exponent | Fraction | Value |
| --- | --- | --- | --- | --- |
| Special value ±0 | * | 00000000 | 00000000000000000000000 | ±0.0 |
| Min subnormal number | * | 00000000 | 00000000000000000000001 | ±2^−23 × 2^−126 = ±2^−149 ≈ ±1.4 × 10^−45 |
| Max subnormal number | * | 00000000 | 11111111111111111111111 | ±(1 − 2^−23) × 2^−126 ≈ ±1.18 × 10^−38 |
| Min normal number | * | 00000001 | 00000000000000000000000 | ±2^−126 ≈ ±1.18 × 10^−38 |
| ±1.0 | * | 01111111 | 00000000000000000000000 | ±1.0 |
| Max normal number | * | 11111110 | 11111111111111111111111 | ±(2 − 2^−23) × 2^127 ≈ ±3.4 × 10^38 |
| Special value ±∞ | * | 11111111 | 00000000000000000000000 | ±∞ |
| Special value qNaN | 0 | 11111111 | 10000000000000000000000 | qNaN |
| Special value sNaN | 0 | 11111111 | 01000000000000000000000 | sNaN |
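The NaN rows above can be observed from JavaScript, where every NaN surfaces as the single value NaN (in practice a qNaN):

```javascript
// All NaNs compare unequal to everything, including themselves.
NaN === NaN;         // false
Number.isNaN(0 / 0); // true — 0/0 produces a NaN
Object.is(NaN, NaN); // true — Object.is, unlike ===, treats NaN as itself
```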
The storage of a floating-point number consists of three parts:
 S — the sign bit
  0 means positive
  1 means negative
 E — the exponent bits
  The exponent field (the biased exponent, also called the characteristic) is stored as an unsigned integer with the range [0, 2^e − 1] (biased exponent = actual exponent + bias, where bias = 2^(e−1) − 1)
  Biased exponent 0 represents a subnormal floating-point number or the special value ±0
   If the fraction part of the significand is non-zero, it is a subnormal floating-point number
   If the fraction part of the significand is zero, it is the special value ±0 (depending on the sign bit)
  A biased exponent in (0, 2^(e−1) − 1) represents a negative actual exponent
  Biased exponent 2^(e−1) − 1 represents an actual exponent of 0
  A biased exponent in (2^(e−1) − 1, 2^e − 1) represents a positive actual exponent
  Biased exponent 2^e − 1 represents the special value ±∞ or the special value NaN
   If the fraction part of the significand is zero, it is ±∞ (depending on the sign bit)
   If the fraction part of the significand is non-zero, it is NaN
    qNaN (quiet NaN): the highest fraction bit is 1
     Changing that highest bit to 0 may yield the special value ±∞ (depending on the sign bit)
    sNaN (signaling NaN): the highest fraction bit is 0 (with a non-zero fraction)
     Changing that highest bit to 1 yields a qNaN
    Generally, a qNaN lets the computation continue quietly, while an sNaN raises an exception (whether the exception is actually raised depends on the state of the floating-point unit, the FPU); see the articles on the difference between qNaN and sNaN

  Advantage of the biased exponent: the actual exponent can be stored as an e-bit unsigned integer, which makes it easy to compare the exponents of two floating-point numbers
 M — the significand (also called the mantissa; it equals the implicit integer bit + the fraction)
  Normal and subnormal floating-point numbers
   A biased exponent in (0, 2^e − 1), i.e. [1, 2^e − 2], represents a floating-point number in normal form; a normal floating-point number's implicit integer bit is 1
   A biased exponent of 0 with a non-zero fraction represents a floating-point number in subnormal form; a subnormal floating-point number's implicit integer bit is 0
   The biased exponent of a subnormal floating-point number is 1 less than that of the minimum normal number, yet both map to the same actual exponent
    For example, the minimum normal single-precision number (32 bits = 1 s + 8 e + 23 f) has biased exponent 1 (−126 + 127) and actual exponent −126, while a subnormal single-precision number has biased exponent 0 (−126 + 127 − 1) and its actual exponent is still −126, not −127
  Advantage of the implicit integer bit: it adds 1 bit to the effective precision of a floating-point number
  Advantage of subnormal numbers (gradual underflow): it avoids abrupt underflow, keeping the gap between adjacent tiny floating-point numbers uniform at 2^(−f + (1 − (2^(e−1) − 1)))
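The three fields described above can be inspected directly. A minimal sketch for 32-bit floats, assuming a DataView round-trip; `decodeFloat32` is an illustrative name, not a standard API:

```javascript
// Decode the sign, biased exponent, and fraction of a 32-bit float
// (1 sign bit, 8 exponent bits with bias 127, 23 fraction bits).
function decodeFloat32(x) {
  const view = new DataView(new ArrayBuffer(4));
  view.setFloat32(0, x); // big-endian by default
  const bits = view.getUint32(0);
  return {
    sign: bits >>> 31,                    // bit 31
    biasedExponent: (bits >>> 23) & 0xff, // bits 30–23; actual exponent = this − 127
    fraction: bits & 0x7fffff,            // bits 22–0
  };
}

decodeFloat32(1.0);  // { sign: 0, biasedExponent: 127, fraction: 0 }
decodeFloat32(-2.0); // { sign: 1, biasedExponent: 128, fraction: 0 }
```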
Characteristics of floating-point numbers

A floating-point number can only express exactly what fits in binary scientific notation:
(−1)^s × m × 2^e
If m exceeds the available precision, it is rounded automatically. This is why values such as 0.1 and 1.1 cannot be stored exactly.
```javascript
// JavaScript numbers are double-precision floats: precision 15.95,
// i.e. about 16 significant digits. The significant digits of a number
// are counted from its first non-zero digit.
(0.1).toPrecision(16); // "0.1000000000000000" — 0.1 holds to 16 significant digits
(0.1).toPrecision(17); // "0.10000000000000001" — the error appears at digit 17
(0.1).toPrecision(18); // "0.100000000000000006"
(0.1).toPrecision(22); // "0.1000000000000000055511"
(1.1).toPrecision(16); // "1.100000000000000" — 1.1 holds to 16 significant digits
(1.1).toPrecision(17); // "1.1000000000000001" — the error appears at digit 17
(1.1).toPrecision(18); // "1.10000000000000009"
(1.1).toPrecision(22); // "1.100000000000000088818"
1.000000000000001;  // 1.000000000000001 — 16 significant digits survive
1.0000000000000001; // 1 — the 1 in the 17th digit is lost
```

Maximum value of a normal-form floating-point number:
±(1 + (2^−1 + 2^−2 + ... + 2^−f)) × 2^(2^(e−1) − 1)
⇔ ±(2 − 2^−f) × 2^(2^(e−1) − 1)
For double-precision floating-point numbers, the maximum normal value is:
±(2 − 2^−52) × 2^1023 === ±1.7976931348623157e+308
1.7976931348623157e+308 is also the value of the static property MAX_VALUE of JavaScript's Number object (note that it is not a safe integer); results that round above it become ∞ (Number.MAX_VALUE * 1.000000000000001 === Infinity; Number.MAX_VALUE + 1e+292 === Infinity).
Minimum value of a subnormal-form floating-point number:
±2^(−f + (1 − (2^(e−1) − 1)))
For double-precision floating-point numbers, the minimum subnormal value is:
±2^(−52 − 1022) === ±5e−324
5e−324 is also the value of the static property MIN_VALUE of JavaScript's Number object — the smallest positive value greater than 0, not the most negative number.
Safe integer range of floating-point numbers (within the safe range, floating-point values and integers correspond one-to-one):
[−(2^m − 1), 2^m − 1]
For double-precision floating-point numbers, the safe integer bounds are ±(2^53 − 1) === ±9007199254740991; outside this range one floating-point value can correspond to multiple real integers. These bounds are also the values of the static properties MAX_SAFE_INTEGER and MIN_SAFE_INTEGER of JavaScript's Number object.
2^53 + 1 expressed in binary is 1000...0001 (54 bits in total; the two ones are 2^53 and 2^0), i.e. 1.000...0001 × 2^53. Since the fraction of a double-precision floating-point number holds at most 52 bits, the trailing 1 is bound to be discarded, so 2^53 + 1 is stored identically to 2^53, i.e. 2^53 === 2^53 + 1, and therefore 2^53 is not a safe integer.
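These boundaries are easy to check in JavaScript:

```javascript
// 2^53 - 1 is the largest safe integer; at 2^53 the one-to-one mapping breaks.
Number.MAX_SAFE_INTEGER === 2 ** 53 - 1; // true
2 ** 53 === 2 ** 53 + 1;                 // true — the trailing 1 is rounded away
Number.isSafeInteger(2 ** 53 - 1);       // true
Number.isSafeInteger(2 ** 53);           // false
```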
Integers beyond the safe range that can still be represented exactly. Take double-precision floating-point numbers as an example: since the fraction of the significand stores at most 52 bits, there are two kinds of integers above the safe range that remain exact:

One kind keeps the significand at
1.0
and only grows the exponent within its range: 2^54, 2^55, 2^56, ..., 2^1023 are all exact.

The other kind changes both the exponent and the significand. For numbers in [2^53, 2^54), the significand would need 53 fraction bits, so the 53rd bit is bound to be dropped; the number stays exact as long as that 53rd bit is 0. The even numbers in [2^53, 2^54) have a 53rd bit of 0, so they can be represented exactly.
Likewise, for numbers in [2^54, 2^55) the 53rd and 54th bits are bound to be dropped, so only numbers whose last two bits are both 0 — multiples of 4 — are exact; the spacing between representable integers doubles in each successive range, and so on.
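The doubling spacing can be verified in JavaScript:

```javascript
// In [2^53, 2^54) only even integers are exact; in [2^54, 2^55) only multiples of 4.
2 ** 53 + 1 === 2 ** 53;           // true — the odd value cannot be stored
2 ** 53 + 2 === 9007199254740994;  // true — even values are exact
2 ** 54 + 2 === 2 ** 54;           // true — the spacing is now 4
2 ** 54 + 4 === 18014398509481988; // true — multiples of 4 are exact
```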

Comparison of floating-point numbers
 Floating-point numbers are compared in the order sign bit, exponent field, significand field. Obviously every positive number is greater than every negative number; when the signs are the same, the value whose exponent has the larger binary representation is larger; when both the sign bit and exponent field are the same, the value with the larger significand is larger.
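For non-negative values this ordering coincides with comparing the raw bit patterns as unsigned integers. A sketch for double precision, where `bitsOf` is an illustrative helper name:

```javascript
// Reinterpret a float's 64 bits as an unsigned integer (big-endian by default).
function bitsOf(x) {
  const view = new DataView(new ArrayBuffer(8));
  view.setFloat64(0, x);
  return view.getBigUint64(0);
}

// For positive values, bit-pattern order matches numeric order.
bitsOf(1.5) < bitsOf(2.5); // true, just as 1.5 < 2.5
bitsOf(0.1) < bitsOf(0.2); // true
```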
Five rounding methods for floating-point numbers (binary floating-point numbers require only four of them)

Round to nearest

Round to nearest, ties to even: rounds the result to the nearest representable value. If two values are equally near, select the one whose least significant significand digit is even (for binary, a least significant bit of 0); if the least significant digits are the same (possible for decimal floating-point numbers — for example, 9.5 lies between 9 and 1 × 10^1, whose least significant digits are both odd), select the one with the larger magnitude (for positive numbers the larger value; for negative numbers the smaller value). This is usually the default rounding method for binary floating-point numbers and the recommended default for decimal floating-point numbers.
```javascript
// Round-to-nearest, ties-to-even examples

// 9.5 in binary scientific notation, rounded to three fraction bits:
// 9.5 => 1001.1 => 1.0011 * 2^3
// The two nearest representable values are 10 and 9:
// 10  => 1010   => 1.010 * 2^3
// 9   => 1001   => 1.001 * 2^3
// Their distances from 9.5:
// 1.010 * 2^3 - 1.0011 * 2^3 = 0.0001 * 2^3   // 0.1
// 1.0011 * 2^3 - 1.001 * 2^3 = 0.0001 * 2^3   // 0.1
// The distances are equal, so compare the least significant bits:
// 1.010 * 2^3 ends in 0 (even); 1.001 * 2^3 ends in 1 (odd).
// Therefore 9.5 rounded to three fraction bits is 10, not 9.

// 0.95 rounded to one decimal digit; the candidates are 1 and 0.9:
// 0.95 => 0.11 1100 1100 1100 1100 1100 1
// 1    => 1.00 0000 0000 0000 0000 0000 0
// 0.9  => 0.11 1001 1001 1001 1001 1001 1
// Their distances from 0.95:
// 1 - 0.95   = 0.00 0011 0011 0011 0011 0011 1
// 0.95 - 0.9 = 0.00 0011 0011 0011 0011 0011 0
// 0.00 0011 0011 0011 0011 0011 1 > 0.00 0011 0011 0011 0011 0011 0
// 0.9 is slightly closer to 0.95, so 0.95 rounded to one decimal digit is 0.9, not 1.
```

Round to nearest, ties to away: rounds the result to the nearest representable value. If two values are equally near, select the one with the larger magnitude (for positive numbers the larger value; for negative numbers the smaller value). Binary floating-point numbers are not required to provide this rounding method, while decimal floating-point numbers must provide it for users to choose.


Directed rounding
 Round toward positive, also known as rounding up (ceil): rounds the result toward +∞
 Round toward negative, also known as rounding down (floor): rounds the result toward −∞
 Round toward zero, also known as truncation: rounds the result toward 0

In JavaScript, the static method
Math.round(x)
returns the integer closest to x; if two integers are equally close, the one closer to +∞ is returned; if x is already an integer, it is returned as-is.
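A few cases showing the tie rule:

```javascript
Math.round(2.5);  // 3  — on a tie, rounds toward +Infinity
Math.round(-2.5); // -2 — still toward +Infinity, not away from zero
Math.round(2.4);  // 2
Math.round(3);    // 3  — integers are returned as-is
```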
Exception handling for binary floating-point numbers
 Invalid operation: operations that are not defined mathematically, such as 0 / 0 or sqrt(−1.0), return qNaN by default
 Division by zero: the divisor is zero and the dividend is a finite non-zero number; ±∞ is returned by default
 Overflow: the result of the operation exceeds the range representable by the maximum exponent E max; ±∞ is returned by default (following the rounding rules)
 Underflow: the result of the operation falls below the normal-number range of the given floating-point format; a subnormal number or 0 is returned by default (following the rounding rules)
 Inexact: the result of the operation cannot be expressed exactly; the rounded value of the exact result is returned by default (following the rounding rules)
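JavaScript exposes the default results of these exceptions directly (script code cannot trap them):

```javascript
0 / 0;                // NaN       — invalid operation
Math.sqrt(-1);        // NaN       — invalid operation
1 / 0;                // Infinity  — division by zero
-1 / 0;               // -Infinity
Number.MAX_VALUE * 2; // Infinity  — overflow
Number.MIN_VALUE / 2; // 0         — underflow past the smallest subnormal
0.1 + 0.2;            // 0.30000000000000004 — inexact
```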
Online floating-point conversion tools (binary ↔ decimal):
 IEEE 754 Floating Point Converter – single precision 32-bit
 IEEE 754 single precision 32-bit
 IEEE 754 double precision 64-bit
Standard download
 IEEE 754-2019