Programming is a craft. Be an excellent general

Time:2021-7-25

Programming is a craft. As noted in the experience-sharing article from the nginx community, professional programmers are good at both overall design and the handling of details. This article discusses overall design, especially modularity.

An all-round genius: Fabrice Bellard

FFmpeg, the most powerful streaming media library
QEMU, a virtual machine with hardware virtualization
TCC, a tiny C compiler
QuickJS, a JS engine written in C that supports 100% of JS syntax
And more. All of the above were written by one person, a French genius.
Last year, QuickJS took the technology community by storm. Friends in the nginx community recommended it to me right away and called its author a genius.
The software broadened my horizons. This article uses it to introduce what I consider a very important skill: how to organize code.

NJS: implementing a language engine is hard

I privately asked Fabrice Bellard (while submitting a patch to QJS) how he developed QJS. The answer was astonishing: it took him only two years of his spare time. Only after working on NJS for several years did I realize how complex it is to implement a language engine.
NJS started in 2017 and is now roughly 40% complete. The foundation is solid, though, so later features will come much faster, and the commonly used features are already supported. Among them, I added support for modules, arrow functions, and other common features, and introduced LL(k) into the parser.
That looks like some good work. Compared with QJS, however, take billiards as an analogy. A player with accurate long shots who pots 90% of his balls seems very strong. But a truly strong player may need only 80% accuracy plus good positional play to clear the table with ease.
Mentioning QJS is not self-deprecation, and the comparison is quite unfair. The author of QJS is a JS expert who can implement a virtual machine in JS. The people involved in NJS, including Igor, are not really experts in JS syntax, and JS syntax is truly huge. In our day-to-day development, a JS expert outside the community has been a great help to us; he is simply a walking JS dictionary. So in the early stage we could only rely on the language manual and implement from it, and whenever an implementation turned out to differ from the real semantics, we had to start over. For example, the early implementation of apply and call caused a lot of pain. That was also my first contribution: I fixed its bugs and refactored it. The community readily accepted the refactoring, and it felt like meeting kindred spirits.

QuickJS: 50,000 lines of code in a single file

I will argue that this approach is reasonable. It has to be mentioned here, and it is explained in detail later.

Modularity, the best way to organize code

When I joined NJS, the first thing I did was make it support modular programming. I had been following NJS since it came out. For a long time, NJS code could only live in a single file, which is very unfriendly to code organization. Let's first look at how modules are used in JS:
main.js

/*Custom module*/
import foo from 'foo.js';
foo.inc();

/*Built in module*/
import crypto from 'crypto';
var h = crypto.createHash('md5');
var hash = h.update('AB').digest('hex');

foo.js

var state = {count:0}

function inc() {
    state.count++;
}

function get() {
    return state.count;
}

export default {inc, get}

With module support, NJS becomes much easier to use. This large feature was reviewed and adjusted by Igor, the author of nginx, and I learned a lot in the process. Objectively speaking, JS syntax is much easier to use than Lua. At present NJS is very stable, though it does not have that many features yet. It is worth considering for lightweight applications, and the community is very active. I believe it has a promising future.
Now take a peek at the QuickJS source code.

JSContext *JS_NewContext(JSRuntime *rt)
{
    JSContext *ctx;

    ctx = JS_NewContextRaw(rt);
    if (!ctx)
        return NULL;

    JS_AddIntrinsicBaseObjects(ctx);
    JS_AddIntrinsicDate(ctx);
    JS_AddIntrinsicEval(ctx);
    JS_AddIntrinsicStringNormalize(ctx);
    JS_AddIntrinsicRegExp(ctx);
    JS_AddIntrinsicJSON(ctx);
    JS_AddIntrinsicProxy(ctx);
    JS_AddIntrinsicMapSet(ctx);
    JS_AddIntrinsicTypedArrays(ctx);
    JS_AddIntrinsicPromise(ctx);
    return ctx;
}

void *JS_GetContextOpaque(JSContext *ctx)
{
    return ctx->user_opaque;
}

void JS_SetContextOpaque(JSContext *ctx, void *opaque)
{
    ctx->user_opaque = opaque;
}

All the source code is thrown into one file. I have read the source of quite a few programs fairly thoroughly: nginx, unit, NJS, Lua, and others. From my personal perspective, QuickJS is the best. At first glance it looks a bit messy, but on a closer look (you may need to be familiar with JS syntax) it is the work of an absolute master.
If you want to remove a language feature from QuickJS, you can delete one contiguous run of lines. That is impossible in other software, where you must either delete several files or delete from several different places in one file. I think this is the essence of modularity: high cohesion.
Anyone who has studied design principles knows that software should have high cohesion and low coupling. My understanding is that once high cohesion is achieved, low coupling follows naturally.
For example, consider implementing an nginx Lua module. It has two main groups of functions: those related to the nginx module and those that wrap Lua.
The over-designed way:

ngx_http_lua_module.c
/* nginx module related functions */

ngx_http_lua_request.c
/* Lua wrapper related functions */

The reasonable way:

ngx_http_lua_module.c
/* nginx module related functions */
/* Lua wrapper related functions */

https://github.com/hongzhidao…
Over-design is an easy trap to fall into.
Discussion 1:
What if more functionality is added, such as HTTP subrequests?
I recommend putting it in the same file and not being swayed by the line count.
Discussion 2:
What if still more functionality arrives, such as HTTP shared memory?
This one can be considered for a separate file. The principle is to find a convincing reason why the new functionality can stand alone as a highly cohesive module. One telltale sign is that it has its own dedicated API, such as get and set for shared memory operations.
On the other hand, introducing a file is itself a cost, and a higher one than introducing a function. Each refactoring should bring substantial value. This is why I insist on keeping code in the same file as much as possible. I previously made several suggestions of this kind for NJS. Some of them turned out to be over-designed, while others were right, such as splitting njs_vm.c into njs_vm.c and njs_vmcode.c: one is responsible for the virtual machine and the other for bytecode processing.
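The "dedicated API" test from Discussion 2 can be sketched in C. This is only an illustration under stated assumptions: the file name shm_kv.c and every shm_kv_* name are hypothetical, not nginx or NJS APIs, and a plain in-process table stands in for real shared memory. The point is that the state is private and callers see nothing but get and set, which is what would justify a separate file.

```c
/* shm_kv.c: hypothetical stand-alone module, not part of nginx or NJS. */
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SHM_KV_SLOTS   8     /* fixed capacity, enough for a sketch */
#define SHM_KV_KEY_MAX 32    /* maximum key length including '\0' */

typedef struct {
    char key[SHM_KV_KEY_MAX];
    int  value;
    int  used;
} shm_kv_slot_t;

/* All state is private to this file; callers only see get and set.
   A real module would place this table in shared memory (assumption). */
static shm_kv_slot_t shm_kv_slots[SHM_KV_SLOTS];

/* Returns the slot holding key, or NULL if the key is absent. */
static shm_kv_slot_t *
shm_kv_find(const char *key)
{
    assert(key != NULL);

    for (size_t i = 0; i < SHM_KV_SLOTS; i++) {
        if (shm_kv_slots[i].used && strcmp(shm_kv_slots[i].key, key) == 0) {
            return &shm_kv_slots[i];
        }
    }
    return NULL;
}

/* Stores value under key. Returns 0 on success, -1 if the table is full
   or the key is too long. */
int
shm_kv_set(const char *key, int value)
{
    shm_kv_slot_t *slot = shm_kv_find(key);

    if (slot == NULL) {
        if (strlen(key) >= SHM_KV_KEY_MAX) {
            return -1;
        }
        /* Take the first free slot, if any. */
        for (size_t i = 0; i < SHM_KV_SLOTS && slot == NULL; i++) {
            if (!shm_kv_slots[i].used) {
                slot = &shm_kv_slots[i];
            }
        }
        if (slot == NULL) {
            return -1;
        }
        strcpy(slot->key, key);
        slot->used = 1;
    }

    slot->value = value;
    return 0;
}

/* Copies the value for key into *value. Returns 0 if found, -1 otherwise. */
int
shm_kv_get(const char *key, int *value)
{
    shm_kv_slot_t *slot = shm_kv_find(key);

    if (slot == NULL) {
        return -1;
    }
    *value = slot->value;
    return 0;
}
```

A caller would write shm_kv_set("hits", 1) and later read the value back with shm_kv_get("hits", &v). Nothing else in the file is visible from outside, so the file could be removed or replaced without touching the module that uses it.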

To sum up:
High cohesion is the highest criterion.
Introducing a new file costs more than introducing a function, so it must bring real value.
Don't be swayed by the number of lines of code.
Collaboration is only a division of labor and is no excuse for breaking high cohesion.

More on design

As mentioned earlier, the code quality of QuickJS is very high because its design is admirable. The whole of QJS is fewer than 50,000 lines, yet it implements 100% of the syntax, including big numbers and a very hard-core regular expression engine, all written by the author himself. Across the whole engine, the implementation is highly abstract and the algorithms are simple and effective. For example, property access on objects, such as a['name'], is probably the most common operation in JS. During parsing, both a and name are strings; the technical term is token. QJS handles them with a very efficient hash that covers every string used by JS, and it takes very little code.
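The idea of hashing every string once and then working with a small integer can be sketched as a string-interning table. This is a simplified illustration, not QuickJS's actual code: atom_t, atom_intern, the fixed bucket count, and the FNV-1a hash are all assumptions chosen to keep the sketch short.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define ATOM_HASH_SIZE 64          /* power of two, fixed for the sketch */
#define ATOM_NULL      0u          /* 0 is reserved to mean "no atom" */

typedef uint32_t atom_t;           /* hypothetical stand-in for JSAtom */

typedef struct atom_entry {
    struct atom_entry *next;       /* next entry in the same bucket */
    atom_t             atom;
    char              *str;        /* owned copy of the string */
} atom_entry_t;

static atom_entry_t *atom_buckets[ATOM_HASH_SIZE];
static atom_t        atom_last = ATOM_NULL;

/* FNV-1a, chosen here only because it is short. */
static uint32_t
atom_hash(const char *s)
{
    uint32_t h = 2166136261u;

    while (*s != '\0') {
        h ^= (unsigned char) *s++;
        h *= 16777619u;
    }
    return h;
}

/* Returns the atom for s, creating it on first use.
   Returns ATOM_NULL if memory allocation fails. */
atom_t
atom_intern(const char *s)
{
    uint32_t      b;
    atom_entry_t *e;

    assert(s != NULL);
    b = atom_hash(s) & (ATOM_HASH_SIZE - 1);

    /* Already interned: the same string always yields the same atom. */
    for (e = atom_buckets[b]; e != NULL; e = e->next) {
        if (strcmp(e->str, s) == 0) {
            return e->atom;
        }
    }

    e = malloc(sizeof(*e));
    if (e == NULL) {
        return ATOM_NULL;
    }
    e->str = malloc(strlen(s) + 1);
    if (e->str == NULL) {
        free(e);
        return ATOM_NULL;
    }
    strcpy(e->str, s);

    e->atom = ++atom_last;         /* atoms are handed out as 1, 2, 3, ... */
    e->next = atom_buckets[b];
    atom_buckets[b] = e;
    return e->atom;
}
```

Once names are interned, comparing two property names is a single integer comparison, and a structure such as JSShapeProperty only needs to store the atom rather than the string.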

typedef struct JSShapeProperty {
    uint32_t hash_next : 26; /* 0 if last in list */
    uint32_t flags : 6;   /* JS_PROP_XXX */
    JSAtom atom; /* JS_ATOM_NULL = free property entry */
} JSShapeProperty;

struct JSShape {
    uint32_t prop_hash_end[0]; /* hash table of size hash_mask + 1
                                  before the start of the structure. */
    JSGCObjectHeader header;
    /* true if the shape is inserted in the shape hash table. If not,
       JSShape.hash is not valid */
    uint8_t is_hashed;
    /* If true, the shape may have small array index properties 'n' with 0
       <= n <= 2^31-1. If false, the shape is guaranteed not to have
       small array index properties */
    uint8_t has_small_array_index;
    uint32_t hash; /* current hash value */
    uint32_t prop_hash_mask;
    int prop_size; /* allocated properties */
    int prop_count;
    JSShape *shape_hash_next; /* in JSRuntime.shape_hash[h] list */
    JSObject *proto;
    JSShapeProperty prop[0]; /* prop_size elements */
};

The pointer is even indexed with negative offsets, since the hash table sits before the start of the structure. He really is an expert in mathematics.
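The layout described in the prop_hash_end comment, a hash table placed before the start of the structure, can be sketched as follows. This is a minimal illustration under assumptions, not the QuickJS implementation: shape_t and the shape_* functions are hypothetical, and the structure deliberately holds only 32-bit fields so that it stays aligned right after an array of uint32_t.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, much smaller stand-in for JSShape. */
typedef struct {
    uint32_t hash_mask;   /* table size - 1; the size is a power of two */
    uint32_t prop_count;
} shape_t;

/* The table ends exactly where the structure begins. */
static uint32_t *
shape_hash_end(shape_t *sh)
{
    return (uint32_t *) sh;
}

/* Returns a pointer to bucket h, which lives at negative index -h - 1. */
uint32_t *
shape_bucket(shape_t *sh, uint32_t h)
{
    ptrdiff_t i = (ptrdiff_t) (h & sh->hash_mask);

    return &shape_hash_end(sh)[-i - 1];
}

/* One allocation laid out as: [bucket N-1] ... [bucket 0] [shape_t]. */
shape_t *
shape_new(uint32_t hash_size)
{
    size_t   table_bytes;
    char    *block;
    shape_t *sh;

    /* hash_size must be a nonzero power of two. */
    assert(hash_size != 0 && (hash_size & (hash_size - 1)) == 0);

    table_bytes = (size_t) hash_size * sizeof(uint32_t);
    block = malloc(table_bytes + sizeof(shape_t));
    if (block == NULL) {
        return NULL;
    }
    memset(block, 0, table_bytes);          /* 0 marks an empty bucket */

    sh = (shape_t *) (block + table_bytes);
    sh->hash_mask = hash_size - 1;
    sh->prop_count = 0;
    return sh;
}

/* The allocation starts before the structure, so step back to free it. */
void
shape_free(shape_t *sh)
{
    if (sh != NULL) {
        free((char *) sh - ((size_t) sh->hash_mask + 1) * sizeof(uint32_t));
    }
}
```

With this layout a single pointer reaches both the table (negative indices) and the structure's own fields, and one allocation covers both, so there is no second pointer to store or free.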
Why can't NJS do this? Dependencies: the details refer to one another, which is a taboo in software development.
Take billiards as an example again. Seasoned players with excellent positioning and cue power often play simply and effectively. Don't wonder why they pass up a ball that is easy to pot and choose a harder shot instead. Everything is under control.

Design over implementation

This is my biggest lesson from the past two years. I used to think that in the time spent on design I could already have implemented the thing, and that refactoring could fix any design deficiency. That is true, but the price is more time spent on detours.
write some code, think, write more, meditate, write a meaningful commit log, take a sleep, think again, and re-read, split/fold/re-write, think, become happy with the final result.
The advice above comes from the person in charge of unit. Personally, I find it a feasible and effective approach. The HTTP/2 implementation in nginx was written by him. By the way, nginx's HTTP/3 is almost complete.

A practical way forward

This series of articles will include practical methods. Practice is a very effective route for anyone who wants to improve their code. Personally, I think both studying projects and writing them are good ways to do it.
Utopia is an API gateway framework I wrote, with only about a thousand lines of code. Some of its design borrows from unit, especially the routing part; I understand their design process, and it is excellent. It is a very suitable project to learn from.
There is far too much to say about design for a single article, so it will keep coming up in later installments.

[nginx-lua-module] https://github.com/hongzhidao…
[the-craft-of-programming] https://github.com/hongzhidao…
Technical questions are welcome in the issues.
[utopia] https://github.com/hongzhidao…
Not open source yet; follow the official account for timely updates.