**This article covers the fixed-point part of the math library in lib_hl, my GBA library.**

# What is a fixed-point number? Why use fixed point?

In the previous article I introduced the GBA hardware, and its CPU has no floating-point unit!

What I want to write is a ray tracer, which is essentially all precise mathematical computation, yet the CPU doesn't even support floating-point numbers. Doesn't that make it a non-starter?

Of course, there are ways around it:

1. Use software floating point, that is, emulate floating-point numbers in software. This is much slower than hardware floating point, and ray tracing is computation-intensive, so it certainly won't do;

2. Use fixed-point numbers. Back when computers generally had no floating-point unit, fixed-point numbers were what was used for fractional arithmetic. Fixed-point operations are only a few times slower than integer operations, which is acceptable.

A fixed-point number represents a fractional value with an integer by fixing the position of the binary point. **The range of a fixed-point number is smaller than that of a floating-point number**, and you cannot have both a large range and high precision at once.
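As a rough illustration of the trade-off between range and precision, the sketch below encodes values with a configurable number of fractional bits. The `q_*` helpers are hypothetical names of my own and not part of lib_hl. They use `double`, so they are meant for experimenting on a PC rather than for the GBA itself.

```c
#include <stdint.h>

/* Hypothetical helpers for illustration only (not part of lib_hl).
   They rely on double, so they are meant for a PC, not for the GBA. */

/* Real value -> fixed-point integer with `fbits` fractional bits (truncating). */
static int32_t q_encode(double x, int fbits) {
    return (int32_t)(x * (double)(1 << fbits));
}

/* Fixed-point integer -> real value. */
static double q_decode(int32_t q, int fbits) {
    return (double)q / (double)(1 << fbits);
}

/* Smallest representable step for a given number of fractional bits. */
static double q_step(int fbits) {
    return 1.0 / (double)(1 << fbits);
}

/* Largest integer part a signed 32-bit fixed-point number can hold. */
static int32_t q_max_int(int fbits) {
    return (int32_t)(INT32_MAX >> fbits);
}
```

With 20 fractional bits the step is about 0.000001 and the integer part tops out at 2047. With 10 fractional bits the step grows to about 0.001, but the integer part can reach 2097151.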

I'm not going to write a separate article on the detailed principles of fixed-point numbers; you can look them up (for example on Baidu).

## hl_types.h

To start with, I wrote a `.h` header file containing the data types to be used and macro definitions for some common operations.

In any program it is well worth defining your own data types, because the sizes of `int`, `long`, and `long long` vary between compilers and between 32-bit and 64-bit targets. For portability, use types with explicitly defined widths.
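One way to guard against size differences is to check the widths at compile time. The sketch below is an addition for illustration, not part of `hl_types.h`, and it assumes a C11 compiler for `_Static_assert`.

```c
#include <limits.h>

typedef signed char        s8;
typedef signed short       s16;
typedef signed int         s32;
typedef signed long long   s64;

/* Compile-time checks (C11): the build fails if a type has the wrong width. */
_Static_assert(CHAR_BIT == 8,    "expected 8-bit bytes");
_Static_assert(sizeof(s8)  == 1, "s8 must be 8 bits");
_Static_assert(sizeof(s16) == 2, "s16 must be 16 bits");
_Static_assert(sizeof(s32) == 4, "s32 must be 32 bits");
_Static_assert(sizeof(s64) == 8, "s64 must be 64 bits");
```

The standard header `<stdint.h>` (C99) offers `int8_t`, `int32_t`, and so on as another way to get fixed widths.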

The basic data type code is as follows:

```
typedef signed char s8; // 8-bit signed integer
typedef signed short s16; // 16-bit signed integer
typedef signed int s32; // 32-bit signed integer
typedef signed long long s64; // 64-bit signed integer
typedef unsigned char u8; // 8-bit unsigned integer
typedef unsigned short u16; // 16-bit unsigned integer
typedef unsigned int u32; // 32-bit unsigned integer
typedef unsigned long long u64; // 64-bit unsigned integer
typedef volatile signed char vs8; // volatile 8-bit signed integer
typedef volatile signed short vs16; // volatile 16-bit signed integer
typedef volatile signed int vs32; // volatile 32-bit signed integer
typedef volatile unsigned char vu8; // volatile 8-bit unsigned integer
typedef volatile unsigned short vu16; // volatile 16-bit unsigned integer
typedef volatile unsigned int vu32; // volatile 32-bit unsigned integer
```

Then there are some types that change with 32 / 64 bit systems:

```
#ifdef _X64
typedef long long _stype;
typedef unsigned long long _utype;
#define _XLEN 8
#else
typedef int _stype;
typedef unsigned int _utype;
#define _XLEN 4
#endif
//Common pointer type
typedef void *t_pointer, *t_ptr;
//Integer address type
typedef _utype t_addr;
```

By predefining the `_X64` macro, `_utype` becomes 8 bytes on 64-bit targets, while it stays 4 bytes on 32-bit ones. The GBA is of course 32-bit, but if we want to port the program to a 64-bit computer, we need to watch out for changes in pointer size and address length.

Then there are some common definitions:
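A compile-time check can catch a missing `_X64` definition when porting. The sketch below repeats the relevant typedefs so that it compiles on its own. Deriving `_X64` from `UINTPTR_MAX` is my own assumption about how the macro could be set automatically, not something lib_hl does, and `addr_roundtrip_ok` is a hypothetical helper.

```c
#include <stdint.h>

/* Assumption: derive _X64 from the platform's pointer width when it is
   not supplied on the compiler command line. */
#if !defined(_X64) && UINTPTR_MAX > 0xFFFFFFFFu
#define _X64
#endif

#ifdef _X64
typedef unsigned long long _utype;
#else
typedef unsigned int _utype;
#endif

typedef _utype t_addr;

/* The integer address type must be exactly as wide as a pointer (C11). */
_Static_assert(sizeof(t_addr) == sizeof(void *),
               "t_addr does not match the pointer size");

/* Hypothetical helper: round-trip a pointer through t_addr. */
static int addr_roundtrip_ok(void *p) {
    t_addr a = (t_addr)p;
    return (void *)a == p;
}
```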

```
//Boolean type
typedef int Bool;
#ifndef NULL
#define NULL 0
#endif
#ifndef TRUE
#define TRUE 1
#endif
#ifndef FALSE
#define FALSE 0
#endif
/*Inline function declaration*/
#define _INLINE_ static inline
/*Gets the offset of the element from the start address of the structure*/
#define _OFFSET(_type,_element) ((t_addr)&(((_type *)0)->_element))
...
#define BIT(n) (1<...
```

Other definitions will be supplemented when necessary. Now we can start to write the math library.

## hl_math.h

Add at the beginning of the file:

```
#pragma once
#include
```

`#pragma once` is a directive for the compiler; it means the header is processed only once per source file, even if it is included several times.

It is added to header files because C handles `#include` by copying the contents of the header directly into the place of the `#include`. If the same header ends up being included more than once in a file, for example through other headers, its structures and other definitions are defined several times during compilation, which causes compile errors.

A portable alternative that works with all compilers is:

```
#ifndef _XXX_H
#define _XXX_H
code...
#endif
```
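To see the guard at work in a single file, the sketch below pastes the text of a hypothetical header (`DEMO_POINT_H` and `demo_point` are made-up names) twice, which is roughly what two `#include` lines of the same header expand to.

```c
/* First "inclusion" of a hypothetical demo header. */
#ifndef DEMO_POINT_H
#define DEMO_POINT_H
typedef struct { int x, y; } demo_point;
#endif

/* Second "inclusion" of the same text: the guard macro is already defined,
   so the preprocessor skips the body and the struct is not defined again. */
#ifndef DEMO_POINT_H
#define DEMO_POINT_H
typedef struct { int x, y; } demo_point;
#endif

/* The type is usable as if the header had been included once. */
static int demo_point_sum(demo_point p) { return p.x + p.y; }
```

Without the `#ifndef` lines, the second typedef would conflict with the first and the file would not compile.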

### Defining the fixed-point types

After that, we start writing real code. First, we define the fixed-point number types:

```
//32-bit fixed point number
typedef s32 fp32;
//Number of fractional bits in the 32-bit fixed-point number: 20 bits
//Integer range -2048 to 2047, fractional precision about 0.000001
#define FP32_FBIT 20
#define FP32_1 (1<>(29-FP32_FBIT))
#define FP32_SQRT2 (1518500249>>(30-FP32_FBIT))
#define FP32_SQRT3 (1859775393>>(30-FP32_FBIT))
#define FP32_F2(n) (1<
#define FP16_1 (1<
```

As you can see, there are two fixed-point types, 32-bit and 16-bit. The 32-bit type is called fp32 and is used for most operations, which need high accuracy; the 16-bit type is called fp16 and is mainly used for color operations, where low accuracy is enough.

**fp32** assigns 20 bits to the fractional part for the sake of accuracy (you could say accuracy is taken very seriously), so the resolution is $1/2^{20}$, accurate to about 6 digits after the decimal point. That leaves only 12 bits for the integer part including the sign, and since $2^{11} = 2048$ the range is -2048 to 2047.

**fp16** is limited in length, so it assigns 10 bits to the fractional part, giving a resolution of only $1/2^{10} = 1/1024$, about 0.001. The integer part has only 5 bits plus the sign, so the range is -32 to 31.

In addition to the fractional length FBIT, I also define the fixed-point forms of some common values such as 1, 0.5, and π. As you can see, 1 as a fixed-point number is $1 \cdot 2^{20}$ and 0.5 is $0.5 \cdot 2^{20}$. That is how fixed-point numbers work.
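To make the scaling concrete, the sketch below rebuilds a few constants under hypothetical `DEMO_*` names, assuming the same 20 fractional bits. The names and helpers are my own and may not match lib_hl. The literal 1518500249 is $\sqrt{2} \cdot 2^{30}$ truncated to an integer, so shifting it right by `30 - FBIT` bits rescales it to the chosen precision.

```c
typedef signed int s32;
typedef s32 fp32;

#define DEMO_FBIT  20
#define DEMO_ONE   (1 << DEMO_FBIT)        /* 1.0 -> 1 * 2^20 = 1048576  */
#define DEMO_HALF  (1 << (DEMO_FBIT - 1))  /* 0.5 -> 0.5 * 2^20 = 524288 */
/* sqrt(2) stored with 30 fractional bits, then shifted down to 20. */
#define DEMO_SQRT2 (1518500249 >> (30 - DEMO_FBIT))

/* Integer part of a non-negative fixed-point value. */
static int demo_int_part(fp32 f) { return f >> DEMO_FBIT; }

/* Fractional bits of a non-negative fixed-point value. */
static fp32 demo_frac_part(fp32 f) { return f & (DEMO_ONE - 1); }
```

Keeping the literal at a higher precision means the constant still works if the number of fractional bits is changed later.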

In the same way, we can write several conversion functions

```
//int -> fp32
static inline fp32 fp32_int(int n) { return n << FP32_FBIT; }
//float -> fp32
//static inline fp32 fp32_float(float f) { return (fp32)(f * (1 << FP32_FBIT)); }
//int/100 -> fp32
static inline fp32 fp32_100f(int n) { return (((s64)n << FP32_FBIT) + 50) / 100; }
//fp32 -> int
static inline int int_fp32(fp32 f) { return f >> FP32_FBIT; }
```
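A short usage sketch of these conversions follows. The typedefs are repeated so that the snippet compiles on its own, and `int_fp32_round` is a hypothetical helper that is not part of lib_hl. It is shown because `int_fp32` simply discards the fractional bits.

```c
typedef signed int s32;
typedef signed long long s64;
typedef s32 fp32;
#define FP32_FBIT 20

/* Conversions as defined above. */
static inline fp32 fp32_int(int n) { return n << FP32_FBIT; }
static inline fp32 fp32_100f(int n) { return (((s64)n << FP32_FBIT) + 50) / 100; }
static inline int int_fp32(fp32 f) { return f >> FP32_FBIT; }

/* Hypothetical helper (not in lib_hl): round to nearest instead of
   discarding the fractional bits, by adding 0.5 before the shift. */
static inline int int_fp32_round(fp32 f)
{ return (f + (1 << (FP32_FBIT - 1))) >> FP32_FBIT; }
```

For example, `fp32_100f(275)` encodes 2.75 as 2883584; `int_fp32` turns that into 2, while the rounding helper gives 3. Note that right-shifting a negative signed value is implementation-defined in C. With an arithmetic shift, as GCC uses, `int_fp32` rounds toward negative infinity.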

After reading the code, you should have a better feel for how fixed-point numbers work.

All functions are preceded by `static inline`. `inline` declares the function as an inline function, meaning it is expanded at the call site at compile time to avoid the cost of a function call. For short, frequently used arithmetic functions like ours, it should certainly be added. But `inline` is only a suggestion to the compiler, which may ignore it: if it decides the function is too large for inlining to pay off, it will not inline it. The function then becomes an ordinary function defined in a header file, which causes a problem: if the header is included in several source files, the function is defined more than once. So we add `static`, which makes the function visible only within the file that includes it and avoids the name clash. Strictly speaking, we should use the `_INLINE_` macro defined earlier, in case we switch to a compiler that does not support `static inline`.
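As a sketch of how such a macro can protect against older compilers, the version below falls back to plain `static` when C99 `inline` is not available. The version check is my own assumption; the original header simply defines `_INLINE_` as `static inline`.

```c
/* Assumption: fall back to plain `static` on compilers without C99 inline. */
#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 199901L
#define _INLINE_ static inline
#else
#define _INLINE_ static
#endif

typedef signed int s32;
typedef s32 fp32;
#define FP32_FBIT 20

/* The same conversion as above, written with the macro. */
_INLINE_ fp32 fp32_int(int n) { return n << FP32_FBIT; }
```

If the keyword ever needs to change, only the macro definition has to be edited.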

### Fixed point number operation

Next come the arithmetic functions. First are addition and subtraction, which are identical to the integer operations. The principle is as follows.

Assume:

Integer $A$ is the fixed-point form of the real number $a$, that is, $A = a \cdot fs$, where $fs = 2^{FBIT}$ (`1 << FBIT`).

Integer $B$ is the fixed-point form of the real number $b$, that is, $B = b \cdot fs$.

Then the sum of fixed-point number $A$ and fixed-point number $B$ is:

$$A \oplus B = a \cdot fs \oplus b \cdot fs = (a + b) \cdot fs = (A/fs + B/fs) \cdot fs = A + B$$

```
//fp32 + fp32 (not actually necessary)
static inline fp32 fp32_add(fp32 a, fp32 b) { return a + b; }
//fp32 - fp32 (not actually necessary)
static inline fp32 fp32_sub(fp32 a, fp32 b) { return a - b; }
```
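As a quick check, 1.5 + 2.25 becomes 1572864 + 2359296 = 3932160, which is 3.75 in fixed-point form. Plain addition can still leave the fp32 range, so the sketch below adds a saturating variant. `fp32_add_sat` is a hypothetical helper, not part of lib_hl, and it relies on `<stdint.h>` for the limits.

```c
#include <stdint.h>

typedef signed int s32;
typedef signed long long s64;
typedef s32 fp32;
#define FP32_FBIT 20

static inline fp32 fp32_add(fp32 a, fp32 b) { return a + b; }

/* Hypothetical helper (not in lib_hl): add in 64 bits and clamp the
   result to the fp32 range instead of overflowing. */
static inline fp32 fp32_add_sat(fp32 a, fp32 b)
{
    s64 sum = (s64)a + b;
    if (sum > INT32_MAX) return INT32_MAX;
    if (sum < INT32_MIN) return INT32_MIN;
    return (fp32)sum;
}
```

Whether clamping is preferable to wrapping depends on the caller; for color or distance values it is often the less surprising choice.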

Then there are multiplication and division.

Look at the code first. The difference is that the product must be scaled down by $2^{FBIT}$ after multiplying, and the dividend must be scaled up by $2^{FBIT}$ before dividing.

```
//fp32 * fp32 (overflow-safe, 64-bit intermediate)
static inline fp32 fp32_mul64(fp32 a, fp32 b)
{ return (((s64)a) * b) >> FP32_FBIT; }
//fp32 / fp32 (64-bit intermediate); may still overflow when |b| < 1
static inline fp32 fp32_div64(fp32 a, fp32 b)
{ return (((s64)a) << FP32_FBIT) / b; }
```
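The 64-bit intermediate matters because the raw product of two fp32 values carries 40 fractional bits and quickly exceeds 32 bits. The sketch below repeats the multiplication and adds `fp32_div64_checked`, a hypothetical helper that is not part of lib_hl. It reports a zero divisor or an out-of-range quotient instead of returning a wrapped value.

```c
#include <stdint.h>

typedef signed int s32;
typedef signed long long s64;
typedef s32 fp32;
#define FP32_FBIT 20

static inline fp32 fp32_mul64(fp32 a, fp32 b)
{ return (((s64)a) * b) >> FP32_FBIT; }

/* Hypothetical helper (not in lib_hl): divide with a 64-bit intermediate.
   Returns 1 and stores the quotient in *out on success; returns 0 on
   division by zero or when the quotient does not fit in fp32. */
static inline int fp32_div64_checked(fp32 a, fp32 b, fp32 *out)
{
    s64 q;
    if (b == 0) return 0;
    /* Multiply instead of shifting so that negative a is well defined. */
    q = ((s64)a * ((s64)1 << FP32_FBIT)) / b;
    if (q > INT32_MAX || q < INT32_MIN) return 0;
    *out = (fp32)q;
    return 1;
}
```

For instance, 30 × 40 yields 1200 and 1200 / 40 yields 30 again, while 1000 / 0.25 = 4000 lies outside the -2048 to 2047 range and is rejected.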

The derivation for multiplying fixed-point number $A$ by fixed-point number $B$ is:

$$A \otimes B = (a \cdot b) \cdot fs = (A/fs) \cdot (B/fs) \cdot fs = (A \cdot B)/fs$$

The derivation for dividing fixed-point number $A$ by fixed-point number $B$ is:

$$A \oslash B = (a/b) \cdot fs = \frac{A/fs}{B/fs} \cdot fs = (A/B) \cdot fs$$

The intuition is simple: a fixed-point number is a real number multiplied by $2^{FBIT}$. When two fixed-point numbers are multiplied, two factors of $2^{FBIT}$ accumulate, so one of them has to be divided out.
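As a small worked check, assume only 4 fractional bits for readability, so $fs = 2^{4} = 16$ (lib_hl itself uses 20), with $a = 1.5$ and $b = 2.5$:

$$
\begin{aligned}
A &= 1.5 \cdot 16 = 24, \qquad B = 2.5 \cdot 16 = 40 \\
A \otimes B &= \frac{A \cdot B}{fs} = \frac{960}{16} = 60 = 3.75 \cdot 16 \\
A \oslash B &= \left\lfloor \frac{A \cdot fs}{B} \right\rfloor = \left\lfloor \frac{384}{40} \right\rfloor = 9 \approx 0.6 \cdot 16
\end{aligned}
$$

The product is exact. The quotient decodes to $9/16 = 0.5625$ rather than 0.6, which shows the truncation error that comes with so few fractional bits.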

Then there are some common functions:

```
//fp32^2
static inline fp32 fp32_pow2(fp32 a)
{ return (((s64)a) * a) >> FP32_FBIT; }
//fp32^2, with the result returned as u64
static inline u64 fp32_pow2_64(fp32 a)
{ return (((s64)a) * a) >> FP32_FBIT; }
//Linear interpolation from a to b by t
static inline fp32 fp32_lerp(fp32 a, fp32 b, fp32 t)
{ return a + fp32_mul64(b - a, t); }
```
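A brief usage sketch follows, with the needed definitions repeated so that it compiles on its own. `vec2_len2_64` is a hypothetical helper, not part of lib_hl, meant to show one case where the 64-bit square is useful.

```c
typedef signed int s32;
typedef signed long long s64;
typedef unsigned long long u64;
typedef s32 fp32;
#define FP32_FBIT 20

static inline fp32 fp32_mul64(fp32 a, fp32 b)
{ return (((s64)a) * b) >> FP32_FBIT; }
static inline u64 fp32_pow2_64(fp32 a)
{ return (((s64)a) * a) >> FP32_FBIT; }
static inline fp32 fp32_lerp(fp32 a, fp32 b, fp32 t)
{ return a + fp32_mul64(b - a, t); }

/* Hypothetical helper (not in lib_hl): squared length of a 2D vector,
   kept in 64 bits so that large components do not overflow fp32. */
static inline u64 vec2_len2_64(fp32 x, fp32 y)
{ return fp32_pow2_64(x) + fp32_pow2_64(y); }
```

Interpolating from 0 to 10 with t = 0.25 gives 2.5. Squaring 100 gives 10000, which no longer fits in fp32 but does fit in the `u64` result, so squared lengths can still be compared without taking a square root.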

The next part of the math library will cover some more common fixed-point functions, such as square root and the trigonometric functions.

Only a small part is listed here; see the source code if you need the rest.