Recently someone asked me how to study image processing: how to get started, and how to apply it. I was at a loss for a moment. Thinking it over, I have spent two years doing image research, completed two innovation projects, and published two papers. That counts as a little experience, so I have summarized it here to share with you, hoping it will be helpful.
Before writing this tutorial I wanted to gather more illustrations to make the article look fancy. Later I decided it was unnecessary; it does everyone good to sit down and read plain text for a while, and academia and technology are not that fancy in themselves anyway.
1、 Application of image processing
There is actually not much to say here. The application value of a technology is not decided by words but by how many people are working on it. It is a very simple truth: I think the simplest and most effective way to judge whether, and how much, a technology is valuable is to see how many people are studying it. If everyone is studying it, it must be hot, at least for now and for the next few years. So if you are not sure whether image processing is valuable, just look at how many image processing engineers there are in the country.
Of course, this is only a brief mention. If you really just want to ask "what is image processing good for?", I believe Baidu will give a more professional answer than I can. Still, as someone who works on image processing, I would like to discuss it from a few basic angles.
1) Identity authentication
The 21st century is the era of paying with your face, and that is indisputable. The first example is banking: banks in Chongqing are said to have deployed face recognition as an auxiliary authentication system.
The second is access control, which used to rely on fingerprints and irises and now increasingly on faces. Fingerprint and iris recognition are accurate, but they are intrusive, both during enrollment and during verification. Anyone who has to press a finger on a scanner every day (to collect fingerprint data) or stare into a camera at close range (to collect iris data) will find it uncomfortable, and peeling skin on the fingers can even break fingerprint recognition.
By contrast, face recognition is much more convenient; nobody minds having a picture taken (to collect face data). Finally, there is surveillance. A single camera can record hundreds of people from different angles (think of the monitoring of crowded places such as railway stations), and identifying everyone by hand is a huge job for the police. If a system can automatically identify people, it will bring enormous convenience to case work.
2) Security surveillance
Security surveillance may be the application area with the most potential in image processing. Every city is now installing surveillance cameras at a frantic pace, and countless cameras across the country are recording around the clock, but the back-end processing of all that footage has not kept up.
What is back-end processing? In short, it is the analysis of the surveillance video itself. Note that this includes not only face recognition but also pedestrian detection, anomaly detection, saliency detection, collaborative tracking, and so on. Setting face recognition aside for now, let us briefly talk about pedestrian anomaly detection.
To a layman, pedestrian anomaly detection sounds almost magical; after all, it seems impossible for a camera to judge from surveillance video who in the current frame is good and who is bad (and of course, dividing people into good and bad is far too arbitrary anyway). But do not overlook one fact: at present, most analysis of surveillance video is done manually. When solving a case, the police often pull up the footage from recent days and watch it from beginning to end; the workload is easy to imagine. It is precisely this practical demand that has spawned research on intelligent surveillance. Naturally, a video analysis program will not output an arbitrary, one-sided verdict like "good person" or "bad person".
At the current technical level, it is already enough to count the people in the frame (pedestrian detection), locate their faces (face detection), identify who they are (face recognition), read their expressions (expression recognition), and flag suspicious actions (anomaly detection). Then, instead of staring blankly at dozens or even hundreds of hours of footage, investigators can work directly from the data the computer produces: how many people are in the frame, who they are, whose behavior looks suspicious, and so on. In short, intelligent surveillance will develop rapidly, because the demand is urgent.
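To make the surveillance-analysis idea less abstract, here is a minimal sketch of frame differencing, one of the simplest building blocks behind motion and anomaly detection. Everything here is an illustrative assumption: frames are modeled as small 2D lists of gray values, while a real system would work on camera frames with far more robust methods.

```python
def frame_diff(prev, curr, threshold=30):
    """Return a binary mask marking pixels that changed between two frames."""
    return [
        [1 if abs(c - p) > threshold else 0 for p, c in zip(prow, crow)]
        for prow, crow in zip(prev, curr)
    ]

def motion_ratio(mask):
    """Fraction of pixels flagged as changed: a crude 'something moved' score."""
    total = sum(len(row) for row in mask)
    changed = sum(sum(row) for row in mask)
    return changed / total

# Two synthetic 4x4 frames: a bright 2x2 "object" appears in the second frame.
prev = [[10] * 4 for _ in range(4)]
curr = [row[:] for row in prev]
for r in (1, 2):
    for c in (1, 2):
        curr[r][c] = 200

mask = frame_diff(prev, curr)
print(motion_ratio(mask))  # 4 of 16 pixels changed -> 0.25
```

A real pipeline would feed regions like this mask into the detectors listed above, rather than stopping at a raw change score.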
3) Deep learning
It is often said that "image processing is a paradise for deep learning applications." If you are not familiar with the concept of deep learning, you can search for it yourself; a fairly popular explanation goes, "if you don't know what deep learning is, think of the T-800 in Terminator." That line is not mine but comes from a leading figure in the industry. It may be a little one-sided, but deep learning really is widely recognized as the foundation of the new generation of artificial intelligence.
Let me give two examples. The first is Google's artificial brain project. Google can be called the leading company in deep learning. The Google Brain project announced in 2012 used 16,000 computing nodes and trained for several weeks; the resulting model learned to recognize cat faces on its own, opening the way for a new generation of artificial intelligence. After that, Microsoft's and Baidu's deep learning institutes began investing heavily, and universities followed suit, for a simple reason: everyone knew it was going to take off.
The second is competitive image recognition, of which the most authoritative event is the ImageNet challenge: algorithms are trained and tested on an image database with tens of millions of images in thousands of categories, and compete on recognition accuracy. In recent years the winners have all been deep learning models, specifically convolutional neural networks. You can look up the ImageNet results over the years yourself.
Speaking of deep learning in image processing, I have to mention Professor Tang Xiaoou in China; it is no exaggeration to call him a leader of deep learning here. His DeepID face recognition algorithm (three generations in total) has reached 99.75% accuracy on some large face databases (such as the LFW database), which, purely by the numbers, exceeds the human recognition rate. Professor Tang has also founded a company to develop a face SDK (not yet released at the time of writing). Of course, comparing computers directly with the human brain is not quite fair, as each has its own strengths, but the power of deep learning in image recognition is plain to see. As for the relationship between deep learning and image processing, it almost goes without saying: Google Brain recognizes images, the deep learning competitions use images, and DeepID recognizes faces in images. Deep learning is applied elsewhere too, such as speech recognition, but image processing remains its main field of application.
2、 Image processing research tools
Research in image processing divides into two parts: algorithm research and application. The main programming languages used are MATLAB, C/C++, Python, and so on, for a simple reason: they all have rich third-party libraries, so we do not have to program everything from scratch.
MATLAB, from MathWorks, is a sharp tool for algorithm research. Its strength lies in convenient, fast matrix computation and graphics and simulation capabilities. In simplicity and encapsulation it leaves other languages far behind, although heavy encapsulation inevitably costs some flexibility, and MATLAB feels more like a tool than a programming language. Incidentally, it ranked 20th in the 2015 programming language rankings, just behind Objective-C, the language of iOS development.
For algorithm researchers (especially master's and doctoral students at universities), MATLAB is naturally the tool of first choice: it is simple, fast, and well encapsulated. More importantly, leading researchers and professors around the world usually publish MATLAB source code for their algorithms first, and only later rewrite it in other languages for practical use. So if you want to do image processing research, you must master MATLAB.
When you have an idea to verify, it is wisest to write the test in MATLAB first. If you jump straight into seemingly impressive C++ experiments, you will not only hit plenty of bugs, but the final result may still be poor; and even if the result is good, you will have lost a lot of time, and in algorithm development speed matters if you want to publish before others. In short, anyone who touches image algorithms cannot escape MATLAB in the end. Even if you are a software developer who does not design algorithms, you must at least be able to read other people's MATLAB code.
For those who have never touched MATLAB or image processing, I recommend the book Detailed Explanation of MATLAB Image Processing Examples (with CD-ROM). It is very helpful for getting started with MATLAB image processing. I remember that as a graduate student I started with two books: Gonzalez's Digital Image Processing, and this one.
A friendly reminder, though: when reading tutorials like this (not just for MATLAB), do not try to memorize every toolbox function. That approach is foolish. The right way is to skim such reference books quickly according to your own situation, pick out the meaningful sample code, and type it in yourself to build a feel for it. For specific toolbox functions, you only need to know that MATLAB provides them; you can look them up later, or search Google or Baidu. In the beginner stage, what matters most is not how many books you have read or how many lectures you have attended, but typing out a piece of code and seeing it run as soon as possible, to build confidence and a sense of achievement; that is the real driving force that keeps us going. Not long after I started, I hacked together a shabby MATLAB program for license plate detection. Looking back, it was full of holes, but at the time I was genuinely excited and proud: I thought, I can actually do this. For a beginner, that feeling is precious.
OpenCV is a C++ image processing toolkit originally developed by Intel; you can think of it as a C++ counterpart to MATLAB. Intel's original intention in developing it was to make sharing easy, in the hope that everyone could build skyscrapers on a common foundation rather than each building bungalows on their own.
Unlike MATLAB, OpenCV is development-oriented, with good stability and a comprehensive exception handling mechanism. A note on licensing: OpenCV is open source under a permissive BSD-style license, so you can call its API directly in a commercial project without being forced to open-source your own code. Even so, in real product development we often dig the needed code out of the OpenCV library and compile it ourselves rather than linking the whole library, and fortunately the source is there for us to read and build.
Speaking of C++ and OpenCV, one thing must be mentioned: the famous Caffe framework in deep learning. It is a typical deep learning framework built on C++ and OpenCV, written and open-sourced by Jia Yangqing, who has worked on Google's deep learning and Google Brain teams. Today, deep learning groups everywhere make extensive use of this framework for research.
Here are two OpenCV tutorials as well. The first is Introduction to OpenCV3 Programming, written by Mao Yun, a well-known CSDN blogger, and compiled from his years of blog posts. It is a very detailed, very typical tutorial that also covers the relatively cutting-edge parts of OpenCV. I have read it: it is systematic, and the code inside is easy to follow, especially for beginners. Again, be careful not to read it the wrong way: just look at what the functions do, type the code to build a feel for it, and do not try to memorize API functions. The tools you use often you will remember naturally, and memorizing the unimportant ones is useless.
The second book I recommend is Image Recognition and Project Practice: VC++ and MATLAB Implementation. It is geared toward engineering applications, and I recommend it because it gives a lot of original, runnable code. One project impressed me deeply: a license plate detection example. Briefly: because the number of characters on a plate is fixed, it locates the plate region by measuring the black/white stroke transitions and stroke widths along horizontal bands. The idea is refreshing, the book gives detailed code, and when I tried it myself the results were good.
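The stroke-transition idea can be sketched in a few lines: plate characters produce many black/white jumps along a horizontal scan line, so rows with a high transition count are plate candidates. This is a hedged illustration of the principle only; the 0/1 toy image and the threshold are my assumptions, not the book's actual VC++ code.

```python
def row_transitions(row):
    """Count 0<->1 jumps along one row of a binarized image."""
    return sum(1 for a, b in zip(row, row[1:]) if a != b)

def candidate_rows(binary_image, min_transitions=4):
    """Indices of rows busy enough to plausibly contain plate characters."""
    return [i for i, row in enumerate(binary_image)
            if row_transitions(row) >= min_transitions]

# Toy binary image: row 1 mimics character strokes, the others are background.
img = [
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 0, 1, 1, 0, 1, 0],   # many jumps: stroke-like
    [0, 0, 0, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 1, 1, 1],   # solid bar: few jumps
]
print(candidate_rows(img))  # [1]
```

The book's full method also checks stroke widths; this sketch keeps only the transition count to show why the fixed character count makes the region stand out.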
Let me stress again: start writing programs by hand as early as possible to build confidence and a sense of achievement. Right after learning OpenCV, I used it to build a face gender recognition system for an undergraduate innovation project, and it worked reasonably well.
Python ranked fifth in the programming language rankings this December and is growing fast; it has gradually become the new standard among scripting languages. For image processing algorithms, besides Python's simplicity, its appeal comes from two important libraries: NumPy and Theano.
NumPy is Python's linear algebra library. It provides good support for matrix computation, and many machine learning algorithms can be developed and simulated on top of it. A widely recognized book here is Machine Learning in Action, which I have been reading recently; it covers many classical algorithms in machine learning, from kNN to SVM, with detailed explanations and code implementations (in Python). Theano is a Python machine learning library that makes it convenient to implement deep learning algorithms (e.g. the convolutional neural network, CNN); the DeepID reproductions you find online use this library.
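To give a taste of the classical algorithms that book covers, here is a minimal k-nearest-neighbors classifier in plain Python. The toy 2D points stand in for flattened image features; the data and function names are my illustrative assumptions, not the book's code (which uses NumPy).

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """train: list of (feature_vector, label); return majority label of the k nearest."""
    dists = sorted((math.dist(vec, query), label) for vec, label in train)
    top = [label for _, label in dists[:k]]
    return Counter(top).most_common(1)[0][0]

# Two tight clusters of toy feature vectors with labels "A" and "B".
train = [
    ((1.0, 1.0), "A"), ((1.2, 0.8), "A"), ((0.9, 1.1), "A"),
    ((5.0, 5.0), "B"), ((5.2, 4.9), "B"), ((4.8, 5.1), "B"),
]
print(knn_predict(train, (1.1, 1.0)))  # "A": all three nearest neighbors are A
```

The same function works unchanged if each feature vector is a flattened image, which is exactly how kNN is first applied to image classification.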
Personally, for image processing alone, Python is not used as widely as the previous two tools (MATLAB and OpenCV). But as a general-purpose scripting language, I think every programmer should know it; after all, as the saying goes, there are no bad programming languages, only bad programmers. When I was learning Python, the first program I wrote was a clone of the WeChat plane-shooting game; there is a detailed walkthrough in my blog. Although I wrote it by following Little Turtle's zero-basics Python video tutorial, I still felt a sense of achievement.
3、 Research methods of image processing
In my view, image processing research can be divided into three parts: basic concepts, basic ideas, and algorithm research.
1) Basic concepts
By basic concepts I mean the most fundamental knowledge in image processing: What is an image? What is a pixel? What is a color image? There is no clear boundary between basic concepts and advanced knowledge; it varies from person to person.

To understand the basics of image processing, one book is a must-read: Digital Image Processing, written by Gonzalez (translated into Chinese by Ruan Qiuqi). It has served as a classic textbook in the field for more than thirty years. I have read it several times, each time with new insight, and I think everyone working with images should be familiar with it. Apart from a few relatively abstract chapters on wavelet transforms, pattern recognition, and the like, the contents are fairly basic and can be understood at the undergraduate level.

I also suggest reading it as early as possible. If you are a graduate student, try to finish it before joining a project: once the project starts you may never find the time, and at best you will only consult it. I finished the book during the winter vacation of my senior year, and getting into image processing afterwards felt much easier. Even if you only read the first few chapters, you will understand what an image is (a two-dimensional or three-dimensional matrix), what a pixel is, color versus gray-scale images, color spaces, image filtering, image noise, frequency-domain transforms, and so on, which makes further study far more convenient.
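A tiny program makes these concepts concrete: a color image as a 2D grid of (R, G, B) pixels, converted to a gray-scale matrix with the common luminance weights 0.299R + 0.587G + 0.114B. The 2x2 toy image is an illustrative assumption; real code would load an actual image file.

```python
def to_grayscale(rgb_image):
    """Convert a 2D list of (R, G, B) pixels to a 2D list of gray values (0-255)."""
    return [
        [round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
        for row in rgb_image
    ]

# A 2x2 "image": pure red, pure green, pure blue, and white pixels.
img = [
    [(255, 0, 0), (0, 255, 0)],
    [(0, 0, 255), (255, 255, 255)],
]
gray = to_grayscale(img)
print(gray)  # [[76, 150], [29, 255]]
```

Note how the weights reflect the eye's higher sensitivity to green than to blue; this is the same conversion the textbook describes when it introduces gray-scale images.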
2) Basic ideas
At first I wanted to call this part "basic algorithms" and introduce some fundamental algorithms of image processing. On reflection I decided against it, because image processing is a very broad concept: image processing is not the same as face recognition or pattern recognition, and a direct survey of basic algorithms easily becomes empty words with no practical significance. Interested readers can simply search for "the top ten classic algorithms of image processing," which covers what I would have said.
Algorithms are dead; what matters is the idea behind them. Take pattern recognition, my own focus. There is a very simple test of whether a student has gotten started in this direction: if you can naturally picture an image as a point in a high-dimensional space, you have entered the door of pattern recognition and can start classifying images. The standard is not unique, of course; other fields, such as object detection, have their own criteria. The point is that once we process images, an image is no longer just an image: it can take many conceptual forms, a point, a surface, or a coordinate space. In the classic particle filter algorithm for object tracking, small image patches are treated as particles; in subspace theory, a series of images are stacked together to construct a principal subspace (as in principal component analysis, PCA). I will not describe these algorithms in detail, since that quickly becomes abstract and dry, but the message is: understand the image itself well. It is an image, a matrix, a container of information, and a form of data. An image need not even be visually meaningful (consider images in the frequency domain).
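The "image as a point in high-dimensional space" idea can be made concrete in a few lines: a 3x3 gray-scale image flattens into a 9-dimensional vector, and similarity between images becomes plain Euclidean distance between those points. The toy values below are illustrative assumptions.

```python
import math

def flatten(image):
    """Turn a 2D gray-scale image into one point in R^(rows*cols)."""
    return [px for row in image for px in row]

a = [[10, 10, 10], [10, 200, 10], [10, 10, 10]]         # bright center
b = [[10, 10, 10], [10, 190, 10], [10, 10, 10]]         # almost the same image
c = [[200, 200, 200], [200, 10, 200], [200, 200, 200]]  # inverted pattern

pa, pb, pc = flatten(a), flatten(b), flatten(c)
print(math.dist(pa, pb))                      # 10.0: similar images, nearby points
print(math.dist(pa, pc) > math.dist(pa, pb))  # True: c lies much farther from a
```

Once images live in this space, classification is just geometry: nearest neighbors, separating hyperplanes (SVM), or projections onto a principal subspace (PCA) all operate on these points.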
In short, the basic ideas of image processing start from the image itself: go deep into its internal structure and think flexibly. When I finished my undergraduate studies, I did not yet know the correspondence between images and points in high-dimensional space; then one day it suddenly clicked, the proverbial quantitative change producing qualitative change. Think more, summarize more, and study actively; only then do you really understand things. The most basic things often contain the deepest truths, and however capable you become, you cannot let go of them. Keep asking what an image is and what its essential properties are. You may never get a precise answer, but you will certainly gain useful insight (it is almost a philosophical question).
3) Algorithm research
Algorithm research is the core work of image processing, especially for master's and doctoral students at major universities. I do not want to discuss the big-name algorithms here; instead I want to talk about the foundations of algorithm research, namely some basic coursework, such as matrix computation.
Image processing algorithms are inseparable from mathematics. I suggest that master's students in image processing take two courses: functional analysis and optimization. Some schools list both as required graduate courses, and they can fairly be called the foundations of image processing (or at least of pattern recognition). I skipped optimization at first and had to make it up on my own later; without it, it is genuinely hard to get anything done. As for functional analysis, I did not understand it well in class, but in later research I kept finding that the basic theory of image processing rests on the very theorems that seemed so dry in that course. There is no way around it: some material is dry but essential, and you cannot manage without it.
Second, matrix computation. An image is a matrix, and image processing is matrix computation. The reason everyone loves MATLAB is that its matrix capabilities are so strong; in MATLAB's world, every variable is a matrix. Likewise, OpenCV is popular not only for its good encapsulation but also for its matrix support: it defines the basic class Mat and lets you perform all kinds of operations on matrices. Python is no exception, with NumPy as its dedicated linear algebra library.
In image programming, API functions are ultimately just tools you can look up in the manual; the real core is the algorithm. An algorithm is written from formulas, the unit of a formula is the variable, and in images the variable is a matrix. So operating on matrices fluently, taking ranks, inverses, least squares, and covariances, is everyday work. If you are lucky enough to take a course in matrix analysis, make sure you understand it; it is all essential material.
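As one worked instance of the least squares just mentioned, here is the simplest case done by hand: fitting y = mx + c to points by solving the 2x2 normal equations. MATLAB's backslash operator or NumPy's lstsq do this in one call; this sketch only spells out the arithmetic behind them, with toy data of my own choosing.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for y = m*x + c via the normal equations."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    det = n * sxx - sx * sx          # determinant of the normal-equation matrix
    m = (n * sxy - sx * sy) / det
    c = (sxx * sy - sx * sxy) / det
    return m, c

# Noise-free points on y = 2x + 1, so least squares recovers it exactly.
m, c = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
print(m, c)  # 2.0 1.0
```

With noisy data the same formulas return the best-fitting line in the least-squares sense, which is exactly the operation hiding behind many calibration and regression steps in image algorithms.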
In short, image processing is a classic field with a low threshold but a deep hall: you do not need much background to enter (linear algebra and a little programming are enough), yet the algorithms are unfathomably deep and demand real labor. I have written this tutorial very frankly, as if chatting with you, saying whatever came to mind. To close with a digression that applies not only to image processing but to any new technology: take the first step as soon as possible, and build confidence and a sense of achievement as soon as possible, so that you have the courage to keep going; whatever you lack, make it up along the way. What really defeats people is usually not the technology itself but our lack of confidence in ourselves. Only by starting decisively can we beat our inner demons.