The past and present of CAPTCHA: from text recognition to imperceptible verification


On September 24, 2017, the first day of pre-registration for the national postgraduate entrance examination, a female senior at Chengdu University who was registering online was shown a verification code reading “biekao” (“don’t take the exam”), together with a line of red text: “The user name or password you entered is incorrect.” The person in charge of the China Graduate Enrollment Information Network, which handles national postgraduate enrollment, responded that the appearance of “biekao” in the verification code was pure coincidence.


It is understood that the verification codes of the enrollment website’s registration system fall into three categories: Chinese characters, letters plus numbers, and arithmetic calculations. Candidates may encounter any of the three when entering a verification code. Although the “biekao” CAPTCHA appeared only by chance, it reminds people of the notoriously difficult CAPTCHAs on the railway ticketing site 12306 during the Spring Festival travel rush. A CAPTCHA can seem as pointless as being asked to “prove that your mother is your mother”. Does the CAPTCHA exist just to embarrass human beings?

CAPTCHA has become one of the essential security mechanisms for most websites and applications. Although the process is cumbersome, it plays an important role. When a verification code is entered, the back-end system can use signals such as how long the input takes to judge whether the login comes from a person or a computer program, which helps prevent the password leaks, vote manipulation and cheating that malicious logins cause.

CAPTCHA was born more than 20 years ago

CAPTCHA stands for “Completely Automated Public Turing test to tell Computers and Humans Apart”. The term was coined around 2000 by Luis von Ahn and colleagues at Carnegie Mellon University, although similar techniques date back to about 1997. Its purpose is to tell real people from malicious programs. A verification code works as follows: the computer automatically generates a question for the user to answer. The question can be generated and graded by a computer, but only a human can solve it, so whoever answers it correctly can be considered human.
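The generate-and-judge loop described above can be sketched in a few lines. This is only an illustration: the function names `make_challenge` and `check_answer` are hypothetical, and a plain-text arithmetic question is trivial for a program to solve, so a real system would render the question as a distorted image or use a harder task.

```python
import random

def make_challenge(rng=None):
    """Return (question_text, expected_answer) for a simple arithmetic CAPTCHA."""
    # SystemRandom is used by default so challenges are hard to predict.
    rng = rng or random.SystemRandom()
    a, b = rng.randint(1, 9), rng.randint(1, 9)
    op = rng.choice(["+", "-", "*"])
    answer = {"+": a + b, "-": a - b, "*": a * b}[op]
    return f"{a} {op} {b} = ?", answer

def check_answer(expected, submitted):
    """Judge the user's reply; non-numeric input simply fails."""
    try:
        return int(submitted.strip()) == expected
    except ValueError:
        return False

# The server keeps `expected` (for example in the session) and shows only `question`.
question, expected = make_challenge()
```

The expected answer stays on the server side; only the question text is ever sent to the user.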

Father of CAPTCHA: Luis von Ahn

A verification code relies on the principle that a human can easily read the text in a picture with the naked eye while a machine cannot, and uses it to resist malicious logins. By asking the user to recognize and enter the content, it distinguishes robots from real people and prevents malicious attacks and automated abuse. It is a public, fully automated program for telling whether the user is a computer or a person. It plays a huge role in login, online shopping, trading and other scenarios, and has become an indispensable technology as the web has evolved. In addition, British medical experts have found that CAPTCHAs may help detect the risk of dementia early.

Evolution of CAPTCHA: from text recognition to imperceptible verification

Early CAPTCHAs simply had the website pose a question. As the contest between security protection and cracking escalated, CAPTCHAs became harder and their forms more varied. From simple alphanumeric strings and arithmetic problems to distorted characters and blurred pictures, these are all classed as knowledge-based CAPTCHAs.

Various captcha

Although CAPTCHAs are very helpful to website platforms, not everyone likes them. Luis von Ahn reported in 2009 that every American spends 1.9 seconds a day solving CAPTCHAs. With a US population of 309 million, that comes to about 587 million seconds, or roughly 6,795 days of human time, spent every single day.

In China, verification codes have always been a target of complaints. It is not only the verification code of the national postgraduate entrance examination registration that has been mocked; the “perverse” verification codes of 12306 have also been widely ridiculed by netizens.
Google’s reCAPTCHA

To save users’ time and improve the experience, a new generation of CAPTCHAs, such as Google’s reCAPTCHA and Top Image’s imperceptible verification, has begun to move toward knowledge-free verification. In practice, the user only needs to click or drag a slider, or may not need to do anything at all, to complete login verification. This approach aims to resolve the tension between website security and user experience.

Top Image imperceptible verification based on artificial intelligence

As a new-generation CAPTCHA, Top Image imperceptible verification is built on artificial intelligence. It combines user behavior, environment information and other data with models and risk-control analysis to distinguish humans from machines and to prevent and control new threats effectively.
Top Image “imperceptible verification”

Built on a machine learning model platform, imperceptible verification creates and tunes models that defend the verification code against machine-simulated trajectories. These cover trajectory duration checks, abnormal trajectory detection (including trajectories produced by conventional signal generators, such as straight lines, uniform velocity and aggregate curves), and outlier behavior flagged by anomaly detection algorithms.

On the user access side, imperceptible verification protects by monitoring human-computer interaction. Behavioral trajectory models examine signals such as the mouse’s movement track on the page, keyboard typing rate, the track and speed of the slider in a sliding verification code, and button clicks.
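As a rough illustration of the trajectory checks described above (duration, straight-line and uniform-velocity detection), the sketch below extracts three features from a pointer track and applies simple rules. The function names and every threshold are assumptions chosen for the example; they are not the vendor’s actual model, which the article says is learned from data.

```python
import math
from statistics import mean, pstdev

def trajectory_features(points):
    """points: list of (t_seconds, x, y) samples in time order."""
    duration = points[-1][0] - points[0][0]
    speeds, path_len = [], 0.0
    for (t0, x0, y0), (t1, x1, y1) in zip(points, points[1:]):
        step = math.hypot(x1 - x0, y1 - y0)
        path_len += step
        if t1 > t0:
            speeds.append(step / (t1 - t0))
    chord = math.hypot(points[-1][1] - points[0][1], points[-1][2] - points[0][2])
    # 1.0 means a perfectly straight path; human tracks wobble a little.
    straightness = chord / path_len if path_len else 1.0
    # Coefficient of variation of speed; near 0 means constant velocity.
    avg_speed = mean(speeds) if speeds else 0.0
    speed_cv = pstdev(speeds) / avg_speed if avg_speed > 0 else 0.0
    return {"duration": duration, "straightness": straightness, "speed_cv": speed_cv}

def looks_scripted(points, min_duration=0.2, max_straightness=0.999, min_speed_cv=0.05):
    """Return the list of triggered rules (empty if nothing looks suspicious)."""
    f = trajectory_features(points)
    reasons = []
    if f["duration"] < min_duration:
        reasons.append("too fast")
    if f["straightness"] > max_straightness:
        reasons.append("straight line")
    if f["speed_cv"] < min_speed_cv:
        reasons.append("uniform velocity")
    return reasons

# A scripted drag: straight, constant speed, finished in 0.1 s.
bot = [(i * 0.01, i * 10.0, 0.0) for i in range(11)]
# A hand-made "human" drag: accelerates, slows down, wobbles vertically.
human = [(0.00, 0, 0), (0.08, 4, 1), (0.17, 15, 3), (0.25, 40, 2), (0.36, 80, -1),
         (0.50, 120, -2), (0.66, 150, 0), (0.85, 170, 1), (1.05, 180, 1), (1.30, 184, 0)]
```

Fixed rules like these are easy for an attacker to work around by adding noise, which is one reason to pair them with learned models and anomaly detection.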

For anomaly detection, one of the algorithms used by imperceptible verification is Isolation Forest. Isolation Forest introduces the concept of isolation: outliers are separated from the rest of the data distribution, which achieves anomaly detection. Compared with algorithms that build a profile of normal data points, such as Replicator Neural Network and One-class SVM, it is credited with stronger and more accurate anomaly recognition.
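To make the idea of isolation concrete, here is a minimal pure-Python sketch of an isolation forest. It is a teaching version, not the product’s implementation: the two features (drag duration and speed variation), the synthetic data, and all parameter values are assumptions made for the example. Points that random splits separate after only a few cuts get a short average path and therefore a score closer to 1.

```python
import math
import random

EULER_GAMMA = 0.5772156649

def c_factor(n):
    """Average path length of an unsuccessful binary-search-tree lookup over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n

def build_tree(points, depth, max_depth, rng):
    """Recursively split on a random feature at a random value."""
    if depth >= max_depth or len(points) <= 1:
        return {"size": len(points)}
    dim = rng.randrange(len(points[0]))
    lo = min(p[dim] for p in points)
    hi = max(p[dim] for p in points)
    if lo == hi:
        return {"size": len(points)}
    split = rng.uniform(lo, hi)
    left = [p for p in points if p[dim] < split]
    right = [p for p in points if p[dim] >= split]
    return {"dim": dim, "split": split,
            "left": build_tree(left, depth + 1, max_depth, rng),
            "right": build_tree(right, depth + 1, max_depth, rng)}

def path_length(tree, point, depth=0):
    """Depth at which the point lands, adjusted for unsplit points in the leaf."""
    if "dim" not in tree:
        return depth + c_factor(tree["size"])
    child = tree["left"] if point[tree["dim"]] < tree["split"] else tree["right"]
    return path_length(child, point, depth + 1)

def fit_forest(points, n_trees=100, sample_size=64, seed=0):
    """Build n_trees trees, each on a random subsample (needs at least 2 points)."""
    rng = random.Random(seed)
    sample_size = min(sample_size, len(points))
    max_depth = math.ceil(math.log2(max(sample_size, 2)))
    trees = [build_tree(rng.sample(points, sample_size), 0, max_depth, rng)
             for _ in range(n_trees)]
    return trees, sample_size

def anomaly_score(forest, point):
    """Score in (0, 1); values closer to 1 mean the point is easier to isolate."""
    trees, sample_size = forest
    mean_path = sum(path_length(t, point) for t in trees) / len(trees)
    return 2.0 ** (-mean_path / c_factor(sample_size))

# Synthetic "human" behavior: (drag duration in seconds, speed variation).
data_rng = random.Random(42)
humans = [(data_rng.gauss(1.2, 0.2), data_rng.gauss(0.35, 0.08)) for _ in range(200)]
forest = fit_forest(humans)
human_score = anomaly_score(forest, (1.2, 0.35))  # typical behavior
bot_score = anomaly_score(forest, (0.05, 0.0))    # instant, perfectly uniform drag
```

In this setup the instant, uniform drag lies far outside the training data, so it should receive a higher score than the typical point. In practice a maintained library implementation would normally be preferred over a hand-rolled one.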

In addition, a binary classifier can be trained on existing verification code data and then used to classify newly collected human-computer interaction data, further improving the accuracy of identifying malicious behavior.
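The article does not say which classifier is used, so the sketch below swaps in plain logistic regression trained by gradient descent as a stand-in. The two features, the eight labeled samples and the hyperparameters are all made up for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(samples, labels, lr=0.5, epochs=2000):
    """Batch gradient descent on the logistic loss; returns (weights, bias)."""
    n_features = len(samples[0])
    weights, bias = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for x, y in zip(samples, labels):
            err = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x))) - y
            grad_b += err
            for i, xi in enumerate(x):
                grad_w[i] += err * xi
        bias -= lr * grad_b / len(samples)
        weights = [w - lr * g / len(samples) for w, g in zip(weights, grad_w)]
    return weights, bias

def predict_bot_probability(model, x):
    weights, bias = model
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))

# Hypothetical labeled history: (drag duration in seconds, speed variation).
samples = [(1.1, 0.6), (0.9, 0.8), (1.4, 0.5), (1.2, 0.7),        # humans -> 0
           (0.1, 0.02), (0.15, 0.0), (0.2, 0.05), (0.12, 0.01)]   # bots   -> 1
labels = [0, 0, 0, 0, 1, 1, 1, 1]
model = train_logistic(samples, labels)
p_human_like = predict_bot_probability(model, (1.15, 0.65))
p_bot_like = predict_bot_probability(model, (0.1425, 0.02))
```

A supervised model like this complements anomaly detection: it learns from verification attempts that were already labeled, whereas anomaly detection needs no labels.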

To prevent web crawlers from brute-forcing the CAPTCHA, imperceptible verification uses techniques such as shuffled image slicing, periodic image updates and image variation, combined with correlation detection. Built-in rules and policies check for correlations, such as attempts from the same device, attempts from the same IP, repeated sliding failures and the number of verification attempts, in order to identify abnormal correlations within a short period of time.
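One simple way to picture correlation detection is a sliding-window counter keyed by device or IP. The class name `CorrelationMonitor`, the window length and the limits below are assumptions for illustration; the article does not describe the actual rules or thresholds.

```python
from collections import defaultdict, deque

class CorrelationMonitor:
    """Track recent verification attempts per key (e.g. an IP or device ID)."""

    def __init__(self, window_seconds=60.0, max_attempts=5, max_failures=3):
        self.window_seconds = window_seconds
        self.max_attempts = max_attempts
        self.max_failures = max_failures
        self._events = defaultdict(deque)  # key -> deque of (timestamp, success)

    def record(self, key, timestamp, success):
        """Record one attempt and return the rules it triggers, if any."""
        events = self._events[key]
        events.append((timestamp, success))
        # Drop attempts that have fallen out of the time window.
        while events and events[0][0] <= timestamp - self.window_seconds:
            events.popleft()
        reasons = []
        if len(events) > self.max_attempts:
            reasons.append("too many attempts")
        if sum(1 for _, ok in events if not ok) > self.max_failures:
            reasons.append("too many failures")
        return reasons

monitor = CorrelationMonitor()
ip_key = ("ip", "203.0.113.7")  # documentation-range address used as an example
results = [monitor.record(ip_key, t, success=False) for t in range(4)]
later = monitor.record(ip_key, 100, success=True)
```

Here the fourth failure within the window trips the failure rule, and an attempt long after the window has passed starts from a clean slate.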

In addition, the data transmission link has a built-in “shuffled-slice image transmission” feature, which transmits the background image only after it has been cut into slices and shuffled, greatly increasing the difficulty of cracking.
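The shuffled-slice idea can be sketched on a toy image stored as a list of pixel rows. The function names are hypothetical, and how the slice order reaches the legitimate client (for example, obfuscated inside the page script) is an assumption the article does not cover.

```python
import random

def shuffle_strips(image, n_strips, rng):
    """Cut a row-major image into vertical strips and return (scrambled, order)."""
    width = len(image[0])
    if width % n_strips:
        raise ValueError("image width must be divisible by n_strips")
    strip_w = width // n_strips
    order = list(range(n_strips))
    rng.shuffle(order)
    # Position `pos` in the scrambled image holds original strip order[pos].
    scrambled = [
        [px for s in order for px in row[s * strip_w:(s + 1) * strip_w]]
        for row in image
    ]
    return scrambled, order

def restore_strips(scrambled, order):
    """Client-side inverse: put each strip back at its original position."""
    strip_w = len(scrambled[0]) // len(order)
    restored = []
    for row in scrambled:
        out = [None] * len(row)
        for pos, s in enumerate(order):
            out[s * strip_w:(s + 1) * strip_w] = row[pos * strip_w:(pos + 1) * strip_w]
        restored.append(out)
    return restored

# Toy 2x8 "image" whose pixel values are just their index.
image = [[r * 8 + c for c in range(8)] for r in range(2)]
scrambled, order = shuffle_strips(image, 4, random.Random(7))
restored = restore_strips(scrambled, order)
```

A crawler that downloads only the scrambled file does not get the picture the user sees, so it has to recover the slice order before it can attempt image recognition.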

Imperceptible verification portal: