Learning C programming efficiently: master efficient coding first, then code optimization


In this article I have collected a number of techniques, drawn from experience, that help optimize C code for both execution speed and memory use.

Introduction

In a recent project, we needed to develop a lightweight JPEG library that runs on mobile devices, where high image quality was not a requirement. Along the way, I collected a number of ways to make a program run faster.

Although there are many guidelines for C code optimization, none of them is a substitute for thorough knowledge of the compiler and the machine you are programming for.

Making a program run faster often means writing more code, and the extra code can hurt the program's complexity and readability.

That trade-off is not acceptable on small devices such as mobile phones and PDAs, which have tight memory constraints. Our motto for code optimization should therefore be to optimize memory usage and execution speed together.


In my project I used many ARM-specific optimization tips (the project targets the ARM platform) as well as many tips found on the Internet, but not all of them turned out to work well.

I have therefore collected the ones that proved useful and efficient, and adapted some of them so that they apply to any programming environment, not only to ARM.

Where do you need to use these methods?

Without answering this question, the rest of the discussion is pointless. The most important step in optimization is finding out where to optimize, that is, which parts or modules of the program run slowly or consume a lot of memory. Once each of those parts has been optimized, the program as a whole runs faster.

The most frequently executed parts of the program, especially functions called repeatedly from inner loops, are the ones to optimize.

An experienced programmer can often spot the part of a program that most needs optimizing. In addition, many tools can help find it. I have used the profiler built into Visual C++ to find where a program spends most of its time.

Another tool I have used is Intel's VTune, which is also good at detecting the slowest parts of a program. In my experience, inner or nested loops and calls to third-party library functions are usually the main causes of slowness.

Integers

If we know that a value will never be negative, we should use unsigned int instead of int. Some processors handle unsigned integer arithmetic considerably faster than signed arithmetic (this is also good practice, because it makes the code self-documenting).

Therefore, the best way to declare an int variable in a tight loop is:

register unsigned int variable_name;

There is no guarantee that the compiler will take any notice of register, nor that the processor handles unsigned integers more efficiently, and the advice may not apply to every compiler.

Remember that integer arithmetic is much faster than floating-point arithmetic, because it can usually be done directly by the processor without the help of an FPU (floating-point unit) or a floating-point math library.

For example, if a calculation package needs results accurate to two decimal places, we can scale everything up by 100, work in integers, and convert to floating point as late as possible.
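A minimal sketch of the scaling idea, assuming amounts are kept as a whole number of cents. The names cents_t, total_cents, and cents_to_units are illustrative and do not come from the original project:

```c
/* Amounts are stored as whole cents, so all arithmetic stays in integers. */
typedef long cents_t;

/* Total for `quantity` items, computed without any floating-point operation. */
cents_t total_cents(cents_t price_cents, unsigned int quantity)
{
    return price_cents * (cents_t)quantity;
}

/* Convert to a floating-point amount only at the very end, e.g. for display. */
double cents_to_units(cents_t amount)
{
    return (double)amount / 100.0;
}
```

Three items at 19.99 would be computed as total_cents(1999, 3), and only the final 5997 is converted for output.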

Division and remainder

On standard processors, a 32-bit division takes 20 to 140 cycles, depending on the numerator and denominator. The division function takes a constant time plus a time for each bit to divide:

Time(numerator / denominator) = C0 + C1 * log2(numerator / denominator)
                              = C0 + C1 * (log2(numerator) - log2(denominator))

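On ARM processors, the current version takes about 20 + 4.3N cycles. Division is therefore an expensive operation and should be avoided wherever possible. Sometimes a division can be replaced by a multiplication.

For example, if we know that b is positive and that b * c fits in an integer, then (a / b) > c can be rewritten as a > (c * b). Because integer division truncates, the two are not always identical (for a = 7, b = 2, c = 3 the division form is false but the multiplication form is true); for non-negative operands the exact equivalent of (a / b) >= c is a >= (c * b). If the operands are known to be unsigned, it is better to use unsigned division, because it is more efficient than signed division.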
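Because integer division truncates, the exact division-free forms differ slightly from a plain a > c * b. The following is a sketch for unsigned operands, assuming b > 0 and that the products do not overflow; the function names are illustrative:

```c
/* Reference versions that use a division. Assume b > 0. */
int quotient_at_least_div(unsigned int a, unsigned int b, unsigned int c)
{
    return (a / b) >= c;
}

int quotient_greater_div(unsigned int a, unsigned int b, unsigned int c)
{
    return (a / b) > c;
}

/* Division-free versions. Assume b > 0 and that the products
 * fit in an unsigned int. */
int quotient_at_least_mul(unsigned int a, unsigned int b, unsigned int c)
{
    return a >= c * b;          /* a / b >= c  is the same as  a >= c * b */
}

int quotient_greater_mul(unsigned int a, unsigned int b, unsigned int c)
{
    return a >= (c + 1u) * b;   /* a / b > c  is the same as  a >= (c + 1) * b */
}
```

Whether the multiplication is actually cheaper depends on the target; on processors with a hardware divider the gain may be small.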

Combining division and remainder

Some code needs both the quotient (x / y) and the remainder (x % y). In that case the compiler can combine the two by calling the division function once, because it returns both the quotient and the remainder. If both are needed, write them close together, as here:

int func_div_and_mod(int a, int b)
{
    return (a / b) + (a % b);
}

Division and remainder by powers of two

Division can be optimized further when the divisor is a power of two, because the compiler then uses a shift instead of a division. We should therefore choose power-of-two scaling factors whenever we can (for example, 64 rather than 66). Remember, too, that unsigned division is more efficient than signed division.

typedef unsigned int uint;

uint div32u(uint a)
{
    return a / 32;
}

int div32s(int a)
{
    return a / 32;
}

Both functions avoid calling the division routine, and the unsigned version needs fewer instructions. The signed version takes longer because signed division rounds towards zero, whereas a plain shift rounds towards minus infinity, so negative values need an extra correction.
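For unsigned values, the shift and mask that a compiler typically substitutes can be written out by hand. This is only an illustration of the equivalence (the function names are made up here); in practice it is usually best to write a / 32 and a % 32 and let the compiler choose:

```c
typedef unsigned int uint;

/* Equivalent to a / 32 for unsigned values, since 32 == 1 << 5. */
uint div32u_shift(uint a)
{
    return a >> 5;
}

/* Equivalent to a % 32 for unsigned values: keep only the low five bits. */
uint mod32u_mask(uint a)
{
    return a & 31u;     /* 31 == 32 - 1 */
}
```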

An alternative to the modulo operator

We use the remainder operator to do modular arithmetic, but it is sometimes possible to rewrite the code with an if statement instead. Consider these two examples:

uint modulo_func1(uint count)
{
    return (++count % 60);
}

uint modulo_func2(uint count)
{
    if (++count >= 60)
        count = 0;
    return (count);
}

The if statement is preferable to the remainder operator because it produces faster code. Note that the second function works correctly only when the input count is known to be in the range 0 to 59.

Using array indices

To set a variable to a character that depends on another value, you might write:

switch (queue) {
    case 0:
        letter = 'W';
        break;
    case 1:
        letter = 'S';
        break;
    case 2:
        letter = 'U';
        break;
}

Or:

if (queue == 0)
    letter = 'W';
else if (queue == 1)
    letter = 'S';
else
    letter = 'U';

A neater and faster way is to use the value as an index into a character array:

static char *classes = "WSU";

letter = classes[queue];

Global variables

Global variables are never allocated to registers. They can be changed indirectly through a pointer or by a function call, so the compiler cannot cache the value of a global variable in a register, and every use costs extra (often unnecessary) loads and stores. We should therefore avoid global variables inside critical loops.

If a function uses global variables heavily, it pays to copy them into local variables so that they can be held in registers. This is only possible if those global variables are not used by any of the functions being called. For example:

int f(void);
int g(void);
int errs;

void test1(void)
{
    errs += f();
    errs += g();
}

void test2(void)
{
    int localerrs = errs;

    localerrs += f();
    localerrs += g();
    errs = localerrs;
}

Note that test1 must load and store the global errs each time it is incremented, whereas test2 keeps localerrs in a register and needs only a single instruction for each increment.

Using aliases

Consider the following example:

void func1(int *data)
{
    int i;

    for (i = 0; i < 10; i++)
    {
        anyfunc(*data, i);
    }
}

Even though *data may never change, the compiler does not know that anyfunc() does not alter it, so the program must read it from memory each time it is used. If we know that the value will not change, we can write this instead:

void func1(int *data)
{
    int i;
    int localdata;

    localdata = *data;
    for (i = 0; i < 10; i++)
    {
        anyfunc(localdata, i);
    }
}

This gives the compiler a better opportunity to optimize the code.

Live range splitting

Since any processor has a fixed number of registers, there is a limit to how many variables can be held in registers at any one point in the program.

Some compilers support live range splitting, in which a variable can be allocated to different registers, or to memory, in different parts of the function. The live range of a variable starts with an assignment to it and ends with the last use before the next assignment.

Within a live range the value of the variable is valid, that is, the variable is live. Between live ranges the value is not needed: the variable is dead, and its register can be used for other variables, which lets the compiler keep more variables in registers.

The number of registers needed for register-allocatable variables is at least the number of overlapping live ranges at each point in the function. If that exceeds the number of registers available, some variables must be stored to memory temporarily. This process is called spilling.

The compiler spills the least frequently used variables first, to minimize the cost of spilling. Spilling can be avoided in the following ways:

  • Limit the number of live variables: keep expressions simple and small, and do not use too many variables in a function. Splitting large functions into smaller, simpler ones can also help.

  • Use register for frequently used variables: this tells the compiler that the variable will be used often, so it should be given high priority for a register. Even so, such a variable may still be spilled in some circumstances.

Variable types

C compilers support the basic types char, short, int, and long (signed and unsigned), float, and double. Choosing the right type for a variable is important, because it can reduce code and data size and improve performance considerably.

Local variables

Where possible, avoid char and short for local variables. For these types the compiler has to reduce the local variable to 8 or 16 bits after each assignment.

This is called sign extension for signed variables and zero extension for unsigned variables. It is implemented by shifting the register left by 24 or 16 bits, followed by a signed or unsigned shift right by the same amount, which takes two instructions (zero extension of an unsigned char takes only one instruction).

These shifts can be avoided by using int and unsigned int for local variables. This matters most for calculations that first load data into local variables and then process it there. Even when the input and output data are 8 or 16 bits wide, it is worth processing them as 32-bit quantities.

Consider the following three functions:

int wordinc(int a)
{
    return a + 1;
}

short shortinc(short a)
{
    return a + 1;
}

char charinc(char a)
{
    return a + 1;
}

The results are identical, but the first function runs faster than the other two.


Structures should be passed by reference wherever possible, that is, by passing a pointer to the structure. Otherwise the whole structure is copied onto the stack, which slows the program down. I have seen programs pass very large structures by value when a simple pointer would have done the same job.

A function that receives a pointer to a structure as an argument should declare the pointed-to data as const if it is not going to change it. For example:

void print_data_of_a_structure(const Thestruct *data_pointer)
{
    /* ...printf contents of the structure... */
}

This tells the compiler that the function does not alter the contents of the external structure (through the const qualifier), so they need not be re-read each time they are accessed. It also makes the compiler reject any code that tries to modify the read-only structure, which gives the data some extra protection.

Pointer chains

Pointer chains are frequently used to access data in structures. For example, a common piece of code is:

typedef struct { int x, y, z; } Point3;
typedef struct { Point3 *pos, *direction; } Object;

void InitPos1(Object *p)
{
    p->pos->x = 0;
    p->pos->y = 0;
    p->pos->z = 0;
}

However, this code must reload p->pos for each assignment, because the compiler does not know that p->pos->x is not an alias for p->pos. A better version caches p->pos in a local variable:

void InitPos2(Object *p)
{
    Point3 *pos = p->pos;

    pos->x = 0;
    pos->y = 0;
    pos->z = 0;
}

Another possibility is to include the Point3 structure directly in the Object structure, which avoids the pointers completely.

Conditional execution

Conditional execution is applied mostly in the body of if statements, but it is also used when evaluating complex expressions with relational operators (<, ==, > and so on) or Boolean operators (&&, ! and so on). Conditional execution is disabled for code sequences that contain function calls, because the flags are destroyed when the function returns.

It is therefore beneficial to keep the bodies of if and else statements as simple as possible, so that they can be conditionalized. Relational expressions should be grouped into blocks of similar conditions.

The following example shows how the compiler uses conditional execution:

int g(int a, int b, int c, int d)
{
    if (a > 0 && b > 0 && c < 0 && d < 0)
        /* grouped conditions tied up together */
        return a + b + c + d;
    return -1;
}

Because the conditions are grouped, the compiler is able to conditionalize them.

Boolean expressions and range checking

A common Boolean expression checks whether a variable lies within a certain range, for example whether a graphics coordinate lies within a window:

bool PointInRectangelArea(Point p, Rectangle *r)
{
    return (p.x >= r->xmin && p.x < r->xmax &&
            p.y >= r->ymin && p.y < r->ymax);
}

There is a faster way to write this: (x >= min && x < max) can be transformed into (unsigned)(x - min) < (max - min), which is especially beneficial when min is zero. The same example after this transformation:

bool PointInRectangelArea(Point p, Rectangle *r)
{
    return ((unsigned) (p.x - r->xmin) < (unsigned) (r->xmax - r->xmin) &&
            (unsigned) (p.y - r->ymin) < (unsigned) (r->ymax - r->ymin));
}
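The single-comparison range check can be tried in isolation. The sketch below assumes min <= max and uses illustrative function names; the operands are converted to unsigned before subtracting so that the subtraction wraps around instead of overflowing:

```c
/* Straightforward version: two comparisons. */
int in_range_two_tests(int x, int min, int max)
{
    return x >= min && x < max;
}

/* Single-comparison version. Assumes min <= max. If x is below min,
 * the unsigned difference wraps to a very large value and the test fails. */
int in_range_one_test(int x, int min, int max)
{
    return (unsigned int)x - (unsigned int)min
         < (unsigned int)max - (unsigned int)min;
}
```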

Boolean expressions and comparisons with zero

The processor flags are set after a compare instruction. The flags can also be set by basic arithmetic and logical instructions such as MOV, ADD, AND, and MUL. If a data processing instruction sets the flags, the N and Z flags are set the same way as if the result had been compared with zero. The N flag indicates whether the result is negative, and the Z flag indicates whether the result is zero.

In C, the N and Z flags correspond to the signed relational operations x < 0, x >= 0, x == 0, and x != 0, and to the unsigned operations x == 0 and x != 0 (or x > 0).

The compiler emits a compare instruction each time a relational operator is used in C code. If the operator is one of the above, the compiler can remove the compare when a data processing instruction precedes it. For example:

int aFunction(int x, int y)
{
    if (x + y < 0)
        return 1;
    else
        return 0;
}

Use these forms wherever possible in critical loops: they save compare instructions, which reduces code size and improves performance. C has no notion of a carry flag or an overflow flag, so the C and V flags cannot be used directly without assembly. However, compilers do support the carry flag (unsigned overflow), for example:

int sum(int x, int y)
{
    int res;

    res = x + y;
    if ((unsigned) res < (unsigned) x)   /* carry set? */
        res++;
    return res;
}

Exploiting lazy evaluation

In a condition such as if (a > 10 && b == 4), make sure that the first part of the AND expression is the one most likely to be false (or the one that is cheapest to evaluate), so that the second part often does not need to be evaluated at all.
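The effect of operand order can be observed with a counter. In this sketch, expensive_check() is a hypothetical stand-in for a costly test and expensive_calls exists only to make the skipped evaluations visible:

```c
/* Counts how often the expensive test runs (for illustration only). */
int expensive_calls = 0;

/* Hypothetical stand-in for a costly check. */
int expensive_check(int b)
{
    expensive_calls++;
    return b == 4;
}

/* The cheap comparison comes first, so expensive_check() is skipped
 * whenever a <= 10. */
int both_conditions(int a, int b)
{
    return a > 10 && expensive_check(b);
}
```

Calling both_conditions(5, 4) returns 0 without ever entering expensive_check().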

Using switch() instead of if...else

For large decisions involving if...else...else chains, such as this:

if (val == 1)
    dostuff1();
else if (val == 2)
    dostuff2();
else if (val == 3)
    dostuff3();

it may be faster to use a switch:

switch (val)
{
    case 1: dostuff1(); break;
    case 2: dostuff2(); break;
    case 3: dostuff3(); break;
}

With the if chain, every earlier condition has to be tested before the last one can match, whereas the switch lets us skip those extra tests. If you do have to use a long if...else chain, test the most likely cases first.

Binary breakdown

Break decisions down in a binary fashion rather than as one long list. Do not write this:

if (a == 1) {
} else if (a == 2) {
} else if (a == 3) {
} else if (a == 4) {
} else if (a == 5) {
} else if (a == 6) {
} else if (a == 7) {
} else if (a == 8) {
}

Replace it with a two-way split:

if (a <= 4) {
    if (a == 1) {
    } else if (a == 2) {
    } else if (a == 3) {
    } else if (a == 4) {
    }
} else {
    if (a == 5) {
    } else if (a == 6) {
    } else if (a == 7) {
    } else if (a == 8) {
    }
}

Or go further:

if (a <= 4) {
    if (a <= 2) {
        if (a == 1) {
            /* a is 1 */
        } else {
            /* a must be 2 */
        }
    } else {
        if (a == 3) {
            /* a is 3 */
        } else {
            /* a must be 4 */
        }
    }
} else {
    if (a <= 6) {
        if (a == 5) {
            /* a is 5 */
        } else {
            /* a must be 6 */
        }
    } else {
        if (a == 7) {
            /* a is 7 */
        } else {
            /* a must be 8 */
        }
    }
}

[Listing missing in the source: a comparison of two case statements.]

Switch statement vs. lookup tables

A switch statement is typically used to:

  • Call one of several functions

  • Set a variable or return a value

  • Execute one of several fragments of code

If there are many case labels, the first two uses can be handled more efficiently with a lookup table. For example, here are two implementations of a routine that converts a condition code to its name:

char *Condition_String1(int condition)
{
    switch (condition) {
        case 0: return "EQ";
        case 1: return "NE";
        case 2: return "CS";
        case 3: return "CC";
        case 4: return "MI";
        case 5: return "PL";
        case 6: return "VS";
        case 7: return "VC";
        case 8: return "HI";
        case 9: return "LS";
        case 10: return "GE";
        case 11: return "LT";
        case 12: return "GT";
        case 13: return "LE";
        case 14: return "";
        default: return 0;
    }
}

char *Condition_String2(int condition)
{
    if ((unsigned) condition >= 15)
        return 0;
    return
        "EQ\0NE\0CS\0CC\0MI\0PL\0VS\0VC\0HI\0LS\0GE\0LT\0GT\0LE\0\0" +
        3 * condition;
}

The first routine needs 240 bytes, while the second needs only 72 bytes.
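The first use of switch (calling one of several functions) can be handled with a table of function pointers in the same spirit. This is a sketch with made-up handler names; handler_calls exists only to show which handler ran:

```c
int handler_calls[3];  /* records which handler ran, for illustration */

void do_stuff0(void) { handler_calls[0]++; }
void do_stuff1(void) { handler_calls[1]++; }
void do_stuff2(void) { handler_calls[2]++; }

/* Table indexed by the value that would otherwise select a case label. */
void (*const handlers[])(void) = { do_stuff0, do_stuff1, do_stuff2 };

/* Returns 1 if a handler was called, 0 if val is out of range.
 * The unsigned cast rejects negative values with the same comparison. */
int dispatch(int val)
{
    if ((unsigned int)val >= sizeof handlers / sizeof handlers[0])
        return 0;
    handlers[val]();
    return 1;
}
```

The table replaces the chain of comparisons with one bounds check and one indirect call, at the cost of requiring the case values to be small and dense.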


Loops are a common construct in most programs, and a significant amount of execution time is often spent in them, so it is well worth paying attention to time-critical loops.

Loop termination

Written carelessly, a loop termination condition can add significant overhead. We should use loops that count down to zero and simple termination conditions, which take less time to evaluate. Consider the following two routines for calculating n!. The first uses an incrementing loop, the second a decrementing loop.

int fact1_func(int n)
{
    int i, fact = 1;

    for (i = 1; i <= n; i++)
        fact *= i;
    return (fact);
}

int fact2_func(int n)
{
    int i, fact = 1;

    for (i = n; i != 0; i--)
        fact *= i;
    return (fact);
}

The second routine, fact2_func, is more efficient than the first.


Faster for loops

This is a simple but effective concept. Ordinarily, we write a for loop like this:

for (i = 0; i < 10; i++) { ... }

Here i runs from 0 to 9. If we do not care about the order of the loop counter, we can write this instead:

for (i = 10; i--; ) { ... }

This works because the test condition is cheaper to evaluate: it asks "is i non-zero? If so, decrement it and continue". For the original loop, the processor has to compute "subtract 10 from i; is the result non-zero? If so, increment i and continue". In tight loops this makes a considerable difference. In this form i runs from 9 down to 0, and the loop executes faster.

The syntax is a little strange, but it is perfectly legal. The third expression in a for statement is optional (an infinite loop can be written as for (;;)). The following loops run the same number of iterations, although inside the body i takes the values 10 down to 1 rather than 9 down to 0:

for (i = 10; i; i--) { }

Or, more explicitly:

for (i = 10; i != 0; i--) { }

The thing to remember is that the loop must stop at 0 (so this does not work for a loop from 50 to 80) and that the loop counter must count down. Code that relies on an ascending loop counter cannot use this optimization.
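As a small sketch of a count-down loop doing real work, the following function (an illustrative name, not from the original article) sums an array while testing only whether the counter is non-zero:

```c
/* Sums n elements with a count-down loop; the loop test is simply
 * "is i non-zero?". */
long sum_countdown(const int *values, unsigned int n)
{
    long total = 0;
    unsigned int i;

    for (i = n; i != 0; i--)
        total += values[i - 1];   /* i runs from n down to 1 */
    return total;
}
```

Modern compilers often perform this transformation themselves, so it is worth checking the generated code before rewriting loops by hand.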


Loop jamming

Never use two loops where one will do. However, if the loop body does a lot of work, it may no longer fit in the processor's instruction cache, and in that case two separate loops may actually run faster than a single one.

[The example listing is missing in the source.]
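As a rough sketch of merging two loops into one (this is not the author's original listing; the function names and ELEMENT_COUNT are assumptions made for illustration):

```c
#define ELEMENT_COUNT 8

/* Two passes over the same index range. */
void init_two_loops(int *squares, int *doubles)
{
    int i;

    for (i = 0; i < ELEMENT_COUNT; i++)
        squares[i] = i * i;
    for (i = 0; i < ELEMENT_COUNT; i++)
        doubles[i] = 2 * i;
}

/* The same work merged into one loop: the counter is tested and
 * incremented half as many times. */
void init_one_loop(int *squares, int *doubles)
{
    int i;

    for (i = 0; i < ELEMENT_COUNT; i++) {
        squares[i] = i * i;
        doubles[i] = 2 * i;
    }
}
```

With bodies this small the merged loop is the natural choice; the two-loop form only becomes interesting when a single body grows too large for the instruction cache.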


Function looping

There is always a performance cost when calling a function. Not only does the program counter have to change, but in-use variables have to be pushed onto the stack and new variables allocated. Much can be done to the structure of a program's functions to improve performance, as long as the code stays readable and its size stays manageable.

If a function is often called from within a loop, it may be possible to move the loop inside the function and so avoid the repeated calls. This code:

for (i = 0; i < 100; i++)
{
    func(t, i);
}
/* --- */
void func(int w, int d)
{
    /* lots of stuff */
}

should be rewritten as:

func(t);
/* --- */
void func(int w)
{
    int i;

    for (i = 0; i < 100; i++)
    {
        /* lots of stuff */
    }
}


Loop unrolling

Small loops can be unrolled for higher performance, at the cost of increased code size. When a loop is unrolled, the loop counter needs updating less often and fewer branches are executed. If the loop iterates only a few times, it can be fully unrolled so that the loop overhead disappears completely.

This can make a big difference. Loop unrolling can bring considerable savings because the code no longer has to test and increment i on every pass.

[The example listing is missing in the source.]
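As a rough illustration of full unrolling (not the author's original listing), a three-iteration loop and its unrolled form might look like this. The function something() and the call_log array are hypothetical and only record the calls:

```c
int call_log[16];   /* records the arguments seen, for illustration */
int call_count = 0;

void something(int i)
{
    if (call_count < 16)
        call_log[call_count++] = i;
}

/* Loop form: i is tested and incremented on every iteration. */
void run_with_loop(void)
{
    int i;

    for (i = 0; i < 3; i++)
        something(i);
}

/* Fully unrolled form: the same calls, with no counter and no branch. */
void run_unrolled(void)
{
    something(0);
    something(1);
    something(2);
}
```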

Compilers will often unroll simple loops with a fixed number of iterations. A loop whose bound is only known at run time, such as for (i = 0; i < limit; i++), is unlikely to be unrolled, because the number of iterations is unknown. It is, however, possible to unroll this kind of loop by hand and still gain speed.

The following code (Example 1) is considerably longer than a simple loop, but it is more efficient. The block size of 8 was chosen only for demonstration; any size works, as long as the loop contents are repeated the same number of times. In this example the loop condition is tested once every 8 iterations instead of on every iteration.

/* Example 1 */
#include <stdio.h>

#define BLOCKSIZE (8)

int main(void)
{
    int i = 0;
    int limit = 33;  /* could be anything */
    int blocklimit;

    /* The limit may not be divisible by BLOCKSIZE,
     * go as near as we can first, then tidy up.
     */
    blocklimit = (limit / BLOCKSIZE) * BLOCKSIZE;

    /* unroll the loop in blocks of 8 */
    while (i < blocklimit)
    {
        printf("process(%d)\n", i);
        printf("process(%d)\n", i+1);
        printf("process(%d)\n", i+2);
        printf("process(%d)\n", i+3);
        printf("process(%d)\n", i+4);
        printf("process(%d)\n", i+5);
        printf("process(%d)\n", i+6);
        printf("process(%d)\n", i+7);

        /* update the counter */
        i += 8;
    }

    /*
     * There may be some left to do.
     * This could be done as a simple for() loop,
     * but a switch is faster (and more interesting)
     */
    if (i < limit)
    {
        /* Jump into the case at the place that will allow
         * us to finish off the appropriate number of items.
         */
        switch (limit - i)
        {
            case 7 : printf("process(%d)\n", i); i++;
            case 6 : printf("process(%d)\n", i); i++;
            case 5 : printf("process(%d)\n", i); i++;
            case 4 : printf("process(%d)\n", i); i++;
            case 3 : printf("process(%d)\n", i); i++;
            case 2 : printf("process(%d)\n", i); i++;
            case 1 : printf("process(%d)\n", i);
        }
    }

    return 0;
}


Counting the number of non-zero bits

Example 1 tests a single bit at a time: it extracts the lowest bit, counts it, and then shifts it out to the right. Example 2 was first unrolled four times, after which the four shifts of n could be combined into one. Unrolling frequently provides new opportunities for optimization like this.

/* Example 1 */
int countbit1(uint n)
{
    int bits = 0;

    while (n != 0)
    {
        if (n & 1) bits++;
        n >>= 1;
    }
    return bits;
}

/* Example 2 */
int countbit2(uint n)
{
    int bits = 0;

    while (n != 0)
    {
        if (n & 1) bits++;
        if (n & 2) bits++;
        if (n & 4) bits++;
        if (n & 8) bits++;
        n >>= 4;
    }
    return bits;
}


Early loop breaking

It is often not necessary to run a loop to completion. For example, when searching an array for a particular value, we should leave the loop as soon as the value has been found. The following loop searches a list of 10000 integers for the value -99.

found = FALSE;
for (i = 0; i < 10000; i++)
{
    if (list[i] == -99)
    {
        found = TRUE;
    }
}

if (found)
    printf("Yes, there is a -99. Hooray!\n");

This works, but the loop always runs to the end, whether or not the value has already been found. A better way is to stop searching as soon as we find the number we are looking for.

found = FALSE;
for (i = 0; i < 10000; i++)
{
    if (list[i] == -99)
    {
        found = TRUE;
        break;
    }
}

if (found)
    printf("Yes, there is a -99. Hooray!\n");

If the value is in the 23rd position, the loop executes 23 times and saves 9977 iterations.

Function design

It is a good habit to keep functions small and simple. This lets the compiler apply optimizations such as register allocation more effectively.

Function call overhead

Function call overhead on the processor is small, and is often small in proportion to the work done by the called function. There is a limit to how many words of arguments can be passed to a function in registers. Such arguments can be integer compatible (char, short, int, and float each take one word) or values of up to four words (including double and long long, which take two words each).

If the limit is four words, the fifth and subsequent words are passed on the stack. This adds the cost of storing those words in the calling function and reloading them in the called function.

Consider the following code:

int f1(int a, int b, int c, int d)
{
    return a + b + c + d;
}

int g1(void)
{
    return f1(1, 2, 3, 4);
}

int f2(int a, int b, int c, int d, int e, int f)
{
    return a + b + c + d + e + f;
}

int g2(void)
{
    return f2(1, 2, 3, 4, 5, 6);
}

The fifth and sixth arguments are stored on the stack in g2 and reloaded in f2, which costs two extra memory accesses per argument.


Minimizing parameter passing overhead

The overhead of passing parameters to functions can be reduced in the following ways:

  • Try to ensure that small functions take four or fewer arguments, so that no argument has to be passed on the stack.

  • If a function needs more than four arguments, make sure it does enough work to outweigh the cost of passing the extra arguments on the stack.

  • Pass a pointer to a structure rather than the structure itself.

  • Put related arguments into a structure and pass a pointer to it. This reduces the number of parameters and improves readability.

  • Minimize the use of long long parameters, which take two words each. For the same reason, use double as little as possible in programs that need floating point.

  • Avoid functions whose parameters are passed partly in registers and partly on the stack (parameter splitting). Current compilers do not handle this case efficiently: all the register arguments are pushed onto the stack as well.

  • Avoid functions with a variable number of parameters. Such functions effectively pass all their arguments on the stack.
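Grouping arguments in a structure, as suggested in the list above, can be sketched as follows. SixValues and both function names are hypothetical; whether the pointer version is actually faster depends on the calling convention and on how the structure is filled in:

```c
/* Six separate int arguments: with a four-register calling convention
 * the last two would be passed on the stack. */
int sum6_args(int a, int b, int c, int d, int e, int f)
{
    return a + b + c + d + e + f;
}

/* The same values grouped in a structure (hypothetical name). */
typedef struct {
    int a, b, c, d, e, f;
} SixValues;

/* Only one pointer is passed; const documents that the data is not
 * modified by the function. */
int sum6_struct(const SixValues *v)
{
    return v->a + v->b + v->c + v->d + v->e + v->f;
}
```

This pays off mainly when the structure already exists and is passed through several layers of calls, because it then never has to be unpacked into individual arguments.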



Leaf functions

A function that does not call any other function is known as a leaf function. In many applications, about half of all function calls are made to leaf functions. Leaf functions are compiled very efficiently on every platform, because they often do not need to save and restore registers.

The cost of saving some registers on entry and restoring them on exit is very small compared with the useful work done by a leaf function that is complicated enough to need more than four or five registers. So, where possible, arrange for frequently called functions to be leaf functions. The number of times a function is called can be determined with a profiling tool.

Here are some ways to make sure a function is compiled as a leaf function:

  • Avoid calling other functions. This includes operations that are converted into calls to the C library (such as division or floating-point helper functions).

  • Use the __inline qualifier for short functions.

Inline functions

Function inlining is disabled for all debugging options. Marking a function with the __inline keyword causes each call to be replaced by the body of the function. This makes calls faster, but it increases code size, particularly if the function is large and called often.

#include <math.h>

__inline int square(int x)
{
    return x * x;
}

double length(int x, int y)
{
    return sqrt(square(x) + square(y));
}

Inline functions have the following advantages:

  • No function call overhead. The call is replaced by the function body, so there is no cost for saving and restoring registers.

  • Lower parameter passing overhead. Since variables do not need to be copied, passing arguments costs less. If some arguments are constants, the compiler can optimize the resulting code even further.

The drawback of inline functions is that code size grows if the function is called from many places. How much it grows depends on the size of the function itself and on the number of places it is called from.

It is wise to inline only a few critical functions. Used properly, inlining can even reduce code size: a function call takes a few instructions, and the optimized inlined version may translate into even fewer.

Using lookup tables

Functions can usually be designed as lookup tables, which can significantly improve performance. The accuracy of the lookup table is lower than that of the usual calculation, but there is no difference for the general program.

Many signal processing programs (e.g., modem demodulation software) make heavy use of the sin and cos functions, which are computationally expensive. For real-time systems, where accuracy is not particularly important, sin and cos lookup tables may be more appropriate. When using lookup tables, combine as many related operations as possible into a single table; this is faster and uses less storage than several separate tables.
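The idea can be sketched as follows. This is only an illustration under stated assumptions: the table has a deliberately coarse 16 steps per full turn, angles are given in table steps rather than radians, and the names `sin_table`, `table_sin` and `table_cos` are hypothetical. Sine and cosine share one table, in line with the advice to combine related operations.

```c
#include <assert.h>

#define SIN_STEPS 16   /* assumed resolution: 16 steps per full turn (22.5 degrees each) */

/* Precomputed values of sin(k * 22.5 degrees) for k = 0..15. */
static const float sin_table[SIN_STEPS] = {
     0.000000f,  0.382683f,  0.707107f,  0.923880f,
     1.000000f,  0.923880f,  0.707107f,  0.382683f,
     0.000000f, -0.382683f, -0.707107f, -0.923880f,
    -1.000000f, -0.923880f, -0.707107f, -0.382683f
};

/* Sine of an angle given in table steps; the mask wraps around the circle. */
float table_sin(unsigned step)
{
    /* The mask trick only works when the table size is a power of two. */
    assert((SIN_STEPS & (SIN_STEPS - 1)) == 0);
    return sin_table[step & (SIN_STEPS - 1)];
}

/* Cosine reuses the same table, shifted by a quarter turn. */
float table_cos(unsigned step)
{
    return table_sin(step + SIN_STEPS / 4);
}
```

A real program would pick the table size from the accuracy it needs; a larger table costs memory but reduces the quantization error.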

Floating point operation

Although floating-point operations are expensive on every processor, we still need them when implementing signal processing software. When writing floating-point code, keep the following points in mind:

  • Floating-point division is slow. It typically takes about twice as long as addition or multiplication. When dividing by a constant, convert the division to a multiplication (for example, x = x / 3.0 can be replaced by x = x * (1.0 / 3.0)). The constant division 1.0 / 3.0 is evaluated at compile time.

  • Use float instead of double. Float variables take less memory and fewer registers, and are more efficient because of their lower precision. If the accuracy is sufficient, use float wherever possible.

  • Avoid transcendental functions. Transcendental functions, such as sin, exp and log, are implemented as a series of multiplications and additions (using extended precision). They are at least ten times slower than an ordinary multiplication.

  • Simplify floating-point expressions by hand. The compiler cannot apply many of the optimizations it uses for integer operations to floating-point operations. For example, 3 * (x / 3) cannot be optimized to x, because floating-point operations lose precision. Therefore, if you know the result will still be correct, it pays to perform such floating-point optimizations manually.
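The division advice above can be sketched as a manual optimization when the divisor is only known at run time. The function name `scale_down` is hypothetical; the example assumes a small rounding difference from a true division is acceptable.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: divide every element of buf by the same divisor.
   The one division happens outside the loop; the loop body only multiplies. */
void scale_down(float *buf, size_t n, float divisor)
{
    assert(divisor != 0.0f);
    const float inv = 1.0f / divisor;   /* computed once */
    for (size_t i = 0; i < n; i++)
        buf[i] *= inv;   /* may differ from buf[i] / divisor in the last bit */
}
```

The compiler will generally not make this change on its own, precisely because the rounded result can differ, so it has to be done by hand where the difference is known to be harmless.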

Even so, floating-point performance may not meet the requirements of a particular piece of software. In that case, the best approach may be fixed-point arithmetic. When the range of values is small enough, fixed-point arithmetic is more accurate and faster than floating-point arithmetic.
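A minimal fixed-point sketch is shown below. It assumes a Q16.16 format (16 integer bits and 16 fraction bits in a 32-bit integer); the type `fix16_t` and the function names are hypothetical and not taken from any particular library. No overflow checking is done.

```c
#include <assert.h>
#include <stdint.h>

typedef int32_t fix16_t;            /* Q16.16 value: real number times 65536 */
#define FIX_ONE ((fix16_t)65536)    /* 1.0 in Q16.16 */

fix16_t fix_from_double(double x) { return (fix16_t)(x * FIX_ONE); }
double  fix_to_double(fix16_t x)  { return (double)x / FIX_ONE; }

/* Addition and subtraction need no helpers: a + b and a - b work as-is. */

/* Multiply in 64 bits, then scale back down by 2^16. */
fix16_t fix_mul(fix16_t a, fix16_t b)
{
    return (fix16_t)((int64_t)a * b / FIX_ONE);
}

/* Scale the dividend up by 2^16 first so the quotient keeps its fraction bits. */
fix16_t fix_div(fix16_t a, fix16_t b)
{
    assert(b != 0);
    return (fix16_t)((int64_t)a * FIX_ONE / b);
}
```

With this layout the representable range is roughly -32768 to 32767 with a resolution of 1/65536, which is why the technique only suits values known to stay within a small range.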

Other tips

You can usually trade space for time. If you cache frequently used data instead of recalculating it, it can be accessed faster. Examples include sine and cosine lookup tables, or a table of pseudo-random numbers.

  • Try not to use ++ and -- in loop expressions. For example: while (n--) {}, which is sometimes difficult to optimize.

  • Reduce the use of global variables.

  • Declare any file-level variable as static so that it is only accessible within the file, unless it is meant to be global.

  • Use word-size variables (int, long, etc.) wherever possible. The machine may work faster with them than with other types (char, short, double, bit fields, etc.).

  • Avoid recursion. Recursion may be elegant and simple, but it requires too many function calls.

  • Avoid calling the sqrt() square root function inside loops; computing a square root is very expensive.

  • One dimensional arrays are faster than multidimensional arrays.

  • The compiler optimizes best within a single file, so avoid splitting related functions across different files. If they are kept together, the compiler can handle them better (for example, it can inline them).

  • Single precision functions are faster than double precision functions.

  • Floating-point multiplication is faster than floating-point division: use val * 0.5 instead of val / 2.0.

  • Addition is faster than multiplication: use val + val + val instead of val * 3.

  • The puts() function is faster than printf(), but less flexible.

  • Use #define macros instead of commonly used small functions.

  • Binary (unformatted) file access is faster than formatted file access because the program does not need to convert between human-readable ASCII and machine-readable binary. If nobody needs to read the contents of the file directly, save it as binary.

  • If your library supports the mallopt() function (used to tune malloc), try using it. The MAXFAST setting can greatly improve the performance of code that calls malloc many times. If a structure is created and destroyed many times per second, try tuning the mallopt options for that size.
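As one example from the list above, the advice about keeping sqrt() out of loops can often be followed by comparing squared values instead. The sketch below is only an illustration; the `struct point` type and the `count_within` function are hypothetical, and coordinates are assumed small enough that their squares fit in a long long.

```c
#include <assert.h>
#include <stddef.h>

struct point { int x, y; };

/* Count the points whose distance from the origin is at most radius,
   without calling sqrt() once. */
size_t count_within(const struct point *pts, size_t n, int radius)
{
    assert(radius >= 0);
    const long long r2 = (long long)radius * radius;   /* computed once, outside the loop */
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        const long long d2 = (long long)pts[i].x * pts[i].x
                           + (long long)pts[i].y * pts[i].y;
        if (d2 <= r2)   /* equivalent to sqrt(d2) <= radius for non-negative values */
            count++;
    }
    return count;
}
```

Because both sides of the comparison are non-negative, comparing the squares gives the same answer as comparing the distances, and the loop body stays in integer arithmetic.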

Last but not least: turn the compiler's optimization options on! It seems obvious, but it is often forgotten when the product is released. The compiler can optimize the code at a lower level and apply optimizations specific to the target processor.



