Why Are C Programs as Solid as a Rock?

November 11, 2008

C is a peculiar programming language these days. Only a small number of people can really program in C, yet a great many of us hold strong opinions about it. Buffer overflows, stack overflows, integer overflows: C has many well-known flaws, and they are bandied about freely, even by people who have never programmed in C. I myself avoided C for some ten years, for one reason or another. At first, compilers were expensive (this was before the free UNIXes were released) and slow, and the environments were terrible. Later, all the horror stories about C made me wonder how a humble, ordinary programmer like me could ever write a reliable C program.

Apart from a few small C modules that I copied and pasted more or less directly from elsewhere, the first C program I wrote was the Converge VM. Two things surprised me. First, writing C programs turned out not to be so difficult. Looking back, I realized that the time I had "wasted" writing assembly code when I was young gave me a great deal of confidence: C is, after all, a higher-level assembly language. Once one understands a concept like pointers, which is arguably the most subtle concept in low-level languages because it has no obvious counterpart in the real world, the rest of C follows fairly readily. Second, the Converge VM was not as buggy as I had expected.

In fact, leaving aside the logic errors that can occur in any programming language, only two C-specific errors have so far caused practical problems in the Converge VM (mind you, I'm sure there are many latent bugs; I just haven't run into many of them). The first was a list that was not terminated with \0, which took a long time to debug. The second was far more mysterious and took me months to find. The Converge garbage collector can conservatively scan arbitrary malloc'd chunks of memory, looking for pointers. In all modern architectures, pointers have to live on word-aligned boundaries. However, malloc'd chunks of memory are often not word-aligned in length. So sometimes the garbage collector would try to read 4 bytes starting at offset 4 of a chunk, even though that chunk was only 5 bytes long. In other words, it would read 1 byte of real data and 3 bytes of arbitrary memory that it had, in theory, no right to read. The resulting errors were extremely rare and, when they did occur, almost impossible to explain. In fairness, though, in how many programming languages can one retrofit a garbage collector at all?
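The over-read described above can be avoided by only visiting offsets at which a whole word still fits inside the chunk. The following is a minimal sketch of that idea, not the Converge collector's actual code: the function name count_candidate_pointers and the heap_lo/heap_hi bounds are hypothetical, introduced purely for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not Converge's real collector): counts the word-sized
 * slots in a chunk whose value falls inside [heap_lo, heap_hi). */
size_t count_candidate_pointers(const unsigned char *chunk, size_t len,
                                uintptr_t heap_lo, uintptr_t heap_hi)
{
    size_t hits = 0;

    assert(chunk != NULL);
    /* Invariant: off <= len. A slot is only read when a whole word fits in
     * the remaining bytes, so the trailing len % sizeof(uintptr_t) bytes of
     * the chunk are never touched. */
    for (size_t off = 0; len - off >= sizeof(uintptr_t); off += sizeof(uintptr_t)) {
        uintptr_t word;
        memcpy(&word, chunk + off, sizeof word); /* no alignment assumption */
        if (word >= heap_lo && word < heap_hi)
            hits++;
    }
    return hits;
}
```

With a 5-byte chunk and 4-byte words, the loop reads offset 0 and then stops, because only 1 byte remains at offset 4.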

My experience with the Converge VM did not fit my earlier prejudices. I had assumed that C programs would segfault at random, lose data, and generally rampage like Vikings at Lindisfarne. Programs written in high-level languages, by contrast, would fail in orderly, predictable ways. Gradually I realized that nearly all the programs I use every day, and trust, are written in C, and I can't remember the last time one of them had a serious problem. They don't crash, and they handle minor errors gracefully. Granted, I am extremely picky about software quality (I've been using OpenBSD for 9 years, so I am used to high standards), and there are obvious reasons why this software is so reliable: it is used by many people, who help find the bugs; it has been developed over a long time, so the bugs of earlier versions have mostly been ironed out; and, to be honest, only reasonably capable programmers tend to use C. Still, a fundamental question remains: why are programs written in C as solid as a rock?

After the dark days of writing papers, I recently did a little C programming. The motivation was a long-standing annoyance of mine: sending email reliably. For years I have sent mail by using ssh to run sendmail on a remote machine. This solves many problems (such as blacklisting), but it has problems of its own on many networks, especially wireless ones, where a flaky connection simply gets dropped. Checking whether each email was actually sent is a very tedious business. So, after thinking through the design, I decided to write a simple set of tools for sending mail robustly over ssh. The final program, extsmail, does more than I originally expected, but the basic idea is to use an external command, such as ssh, and simply retry sending a message until it has been sent successfully. I also wanted the tools to be as frugal with resources and as portable as possible, which more or less dictated that extsmail be written in C. I then decided, purely as an experiment, to write the program as far as possible in the traditional UNIX way: depending only on the functions that any reliable UNIX distribution provides, and being highly fault tolerant. In the process, I made two observations from the standpoint of a relative newcomer to C.
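The "run an external command and retry until it succeeds" idea can be sketched with the standard fork, execvp, and waitpid calls. This is only an illustration under stated assumptions, not extsmail's code: the names run_once and retry_command, the max_attempts cap, and the fixed delay are all hypothetical (the real program retries until the message is sent).

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run argv[0] once; return 0 only on exit status 0. */
static int run_once(char *const argv[])
{
    int status;
    pid_t pid = fork();

    if (pid == -1)
        return -1;                 /* could not create a child process */
    if (pid == 0) {
        execvp(argv[0], argv);     /* only returns on failure */
        _exit(127);                /* conventional "command not found" status */
    }
    while (waitpid(pid, &status, 0) == -1) {
        if (errno != EINTR)        /* retry the wait if a signal interrupted it */
            return -1;
    }
    return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : -1;
}

/* Hypothetical helper: returns the attempt number that succeeded, or -1 if
 * every one of max_attempts attempts failed. */
int retry_command(char *const argv[], int max_attempts, unsigned delay_secs)
{
    assert(argv != NULL && argv[0] != NULL && max_attempts > 0);
    for (int attempt = 1; attempt <= max_attempts; attempt++) {
        if (run_once(argv) == 0)
            return attempt;
        if (attempt < max_attempts)
            sleep(delay_secs);     /* wait before trying again */
    }
    return -1;
}
```

A caller would build an argv array for the external command (for instance ssh followed by its arguments) and pass it to retry_command. Treating any non-zero exit status as "try again" is the simplest possible policy.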

The first observation is not an obvious one. Because there are so many ways for a C program to go wrong, I was far more careful than usual. In particular, any operation on a block of memory can lead to a dangerous buffer overflow. In a high-level language, I might be lazy and think, "Hmm, do I need to subtract one from this value when I index the array? Let's run it and see." In C, I think, "OK, sit down and work it out." Ironically, the time spent running the program and discovering the problem is much the same as the time spent sitting down to think, except that sitting down to think is more mentally draining.
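As a small illustration of the "sit down and work it out" habit, consider stripping a trailing newline from a string. The function chomp below is a hypothetical example, not taken from extsmail. The reasoning is that the last character lives at index len - 1, which only exists when len is greater than zero.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical example: remove one trailing '\n' and return the new length. */
size_t chomp(char *s)
{
    size_t len;

    assert(s != NULL);
    len = strlen(s);
    /* Check len > 0 first: for an empty string, len - 1 would wrap around to
     * SIZE_MAX (size_t is unsigned) and index far outside the buffer. */
    if (len > 0 && s[len - 1] == '\n') {
        s[len - 1] = '\0';
        len--;
    }
    return len;
}
```

In a language with bounds checking, forgetting the len > 0 test produces an immediate, readable error. In C it silently reads out of bounds, which is why the check has to be reasoned out in advance.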

The second observation, which I had never really considered before, is that C has no exception handling. If a program such as extsmail is to be fault tolerant, it has to deal with every possible error itself. On the one hand, this is very painful: a large proportion of extsmail (about 40%) is devoted to detecting and recovering from errors. On the other hand, UNIX functions document their errors carefully: when one calls a function such as stat, the documentation lists every way in which it can fail. The programmer can then easily decide which error conditions the program should recover from and which should be treated as fatal (in extsmail, running out of memory is a fatal error). This is a huge difference in mindset from exception-based languages. The classic philosophy there is to write code as normal and add try ... catch blocks in only a few places, to handle specific (rarely encountered) errors. Java, with its checked exceptions, takes a different approach and tells the user, "when you call this method, you need to try ... catch these specific exceptions".
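The style described above, where the caller reads the documented failure list and decides what each failure means, might look like the sketch below. The enum path_status and the function check_path are hypothetical names for illustration. The policy shown (missing path is recoverable, everything undocumented is fatal) is an assumption, not extsmail's actual policy.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical classification chosen by the caller, not by stat itself. */
enum path_status { PATH_EXISTS, PATH_MISSING, PATH_DENIED, PATH_FATAL };

enum path_status check_path(const char *path)
{
    struct stat sb;

    assert(path != NULL);
    if (stat(path, &sb) == 0)
        return PATH_EXISTS;

    /* stat reports failure by returning -1 and setting errno. */
    switch (errno) {
    case ENOENT:   /* a component of the path does not exist */
    case ENOTDIR:  /* a component of the path prefix is not a directory */
        return PATH_MISSING;   /* recoverable: create it, or skip it */
    case EACCES:   /* search permission denied on a path component */
        return PATH_DENIED;    /* recoverable: report and carry on */
    default:
        return PATH_FATAL;     /* assumption: anything else ends the program */
    }
}
```

Every branch is visible at the call site, which is where the 40% overhead comes from, but also where the robustness comes from.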

I've come to think that exception-based design is not appropriate when you want software to be really robust. What is needed is to know exactly which errors or exceptions a function can return or throw, and then to handle each according to the situation. Today's IDEs can automatically show which exceptions the methods you call may throw, but that is about as far as they can go. In theory, subclassing and polymorphism in OO languages mean that a precompiled library cannot know for certain which exceptions a given call will throw (because subclasses may override methods and throw different exceptions). In practice, I suspect that so many methods throw so many different exceptions that users are simply overwhelmed. By comparison, UNIX functions are notable for minimizing the number of errors they return to the caller, either by dealing with some errors internally or by grouping similar errors together. I suspect that many libraries that rely on exception handling would need substantial rewriting to bring the number of exceptions they throw down to a reasonable level. Furthermore, it is the caller of a function who should decide which errors are minor and recoverable, and which lead to serious problems or even to the termination of the program. Checked exceptions, which force a particular response on the caller, lose sight of this.
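Grouping many low-level failures into a few classes, and leaving the final decision to the caller, can be sketched as follows. The enum err_class and the function classify_errno are hypothetical. Which errno values count as transient is an assumption made for this example, not a rule taken from any UNIX library.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical grouping: many errno values, three classes for the caller. */
enum err_class { ERR_NONE, ERR_TRANSIENT, ERR_PERMANENT };

enum err_class classify_errno(int err)
{
    assert(err >= 0);          /* errno values are never negative */
    switch (err) {
    case 0:
        return ERR_NONE;
    case EINTR:                /* interrupted by a signal */
    case EAGAIN:               /* resource temporarily unavailable */
    case ETIMEDOUT:            /* connection timed out */
    case ECONNRESET:           /* connection reset by peer */
        return ERR_TRANSIENT;  /* assumption: worth retrying later */
    default:
        return ERR_PERMANENT;  /* assumption: retrying will not help */
    }
}
```

A caller would then write one branch per class rather than one per error code, for example retrying on ERR_TRANSIENT and giving up on ERR_PERMANENT, while remaining free to draw the line differently.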

"Those who do not understand UNIX are condemned to reinvent it, poorly," says Henry Spencer. That is why so many programs written in C are more robust than our prejudices would suggest. UNIX culture, the oldest and wisest culture in mainstream computing, has found many ways to turn the limitations and flaws of C into advantages. I have been learning this slowly, through experience. To sum up, I don't recommend using C without a great deal of thought. If C is used, the resulting software may well be rock solid, but developing it will cost a great deal of human effort.