Secure deserialization of Web Security

Time:2021-7-14

Insecure deserialization

In this section, we explain what insecure deserialization is and describe how it can expose a website to high-severity attacks. We focus on typical scenarios and demonstrate concrete examples using PHP, Ruby, and Java deserialization. Finally, we look at some ways to avoid insecure deserialization vulnerabilities.

Exploiting insecure deserialization has a reputation for being difficult. However, it can sometimes be much simpler than you might think. If you are new to deserialization, this section contains important background information that you should read first. If you already know the basics, you can skip ahead to learning how to exploit it.

What is serialization

Serialization is the process of converting complex data structures (such as objects and their fields) into a “flatter” format that can be sent and received as a sequential stream of bytes. Serializing data makes it much simpler to:

  • Write complex data to inter-process memory, a file, or a database
  • Send complex data, for example over a network or in an API call, between different components of an application

Crucially, when an object is serialized, its state is persisted as well. In other words, the object's attributes are preserved along with their assigned values.

Serialization vs. deserialization

Deserialization is the process of restoring this byte stream to a fully functional replica of the original object, in the exact state it was in when it was serialized. The website's logic can then interact with the deserialized object just as it would with any other object.
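The round trip described above can be sketched in a few lines. This is only an illustration using Python's built-in pickle module; the user object and its attribute names are assumptions made for the example, and pickle should only ever be used on data you trust.

```python
import pickle
from types import SimpleNamespace

# A hypothetical user object with two attributes.
user = SimpleNamespace(name="carlos", is_logged_in=True)

# Serialize: the object and its attribute values become a flat byte string.
data = pickle.dumps(user)

# Deserialize: the byte string is restored to an equal but separate object.
restored = pickle.loads(data)

print(type(data).__name__)   # bytes
print(restored == user)      # True: same attributes and values
print(restored is user)      # False: it is a copy, not the original
```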

Many programming languages offer native support for serialization. Exactly how objects are serialized depends on the language. Some languages serialize objects into binary formats, whereas others use string formats with varying degrees of human readability. Note that all of the original object's attributes are stored in the serialized data stream, including any private fields. To prevent a field from being serialized, it must be explicitly excluded in the class declaration, for example by marking it as “transient” in Java.

Note that, depending on the programming language, serialization may be referred to as marshalling (Ruby) or pickling (Python). These terms are synonymous with “serialization”.

What is unsafe deserialization

Insecure deserialization occurs when user-controllable data is deserialized by a website. This potentially allows an attacker to manipulate serialized objects in order to pass harmful data into the application code.

It is even possible to replace a serialized object with an object of an entirely different class. Worryingly, objects of any class that is available to the website will be deserialized and instantiated, regardless of which class was expected. For this reason, insecure deserialization is sometimes known as an “object injection” vulnerability.

An object of an unexpected class may cause an exception. By that time, however, the damage may already have been done. Many deserialization-based attacks are completed before deserialization has finished. This means that the deserialization process itself can initiate an attack, even if the website's own functionality never directly interacts with the malicious object. For this reason, websites whose logic is written in strongly typed languages can also be vulnerable to these techniques.
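To make the point concrete, here is a minimal sketch using Python's pickle module, chosen purely for illustration. The Payload class is hypothetical, and a harmless built-in (int) stands in for whatever an attacker would really invoke. The callable runs while the data is being deserialized, before the calling code has had any chance to inspect the result.

```python
import pickle


class Payload:
    """Stand-in for an attacker-crafted object (hypothetical)."""

    def __reduce__(self):
        # pickle stores this (callable, args) pair, and loads() calls it
        # while rebuilding the object. A harmless built-in is used here.
        return (int, ("42",))


blob = pickle.dumps(Payload())

# The caller may expect some application object back, but int("42") has
# already been executed by the time loads() returns.
result = pickle.loads(blob)
print(result)
```

Nothing in the calling code asked for int() to run; the serialized data alone decided that.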

How does an insecure deserialization vulnerability occur

Insecure deserialization typically arises because there is a general lack of understanding of how dangerous it is to deserialize user-controllable data. Ideally, user input should never be deserialized at all.

Some site owners think they are safe because they do some form of additional checking on the deserialized data. However, this approach is often ineffective because it is almost impossible to verify or anticipate all possible scenarios. These checks are also fundamentally flawed because they rely on checking the data after it has been deserialized, which in many cases is too late to prevent attacks.

Vulnerabilities may also arise because deserialized objects are often assumed to be trustworthy. Especially when using languages with a binary serialization format, developers might think that users cannot read or manipulate the data effectively. However, while it may require more effort, it is just as possible for an attacker to exploit binary serialized objects as it is to exploit string-based formats.

Deserialization-based attacks are also made possible by the number of dependencies in modern websites. A typical site might use many different libraries, each of which has its own dependencies. This creates a massive pool of classes and methods that is difficult to manage securely. Because an attacker can create instances of any of these classes, it is hard to predict which methods can be invoked on the malicious data. This is especially true if an attacker is able to chain together a long series of unexpected method invocations, passing data into a sink that is completely unrelated to the initial source. It is therefore almost impossible to anticipate the flow of malicious data and plug every potential hole.

In short, it can be argued that it is not possible to securely deserialize untrusted input.

What is the impact of unsafe deserialization

The impact of insecure deserialization can be very severe because it provides an entry point to a massively increased attack surface. It allows an attacker to reuse existing application code in harmful ways, resulting in numerous other vulnerabilities, often remote code execution.

Even when remote code cannot be executed, unsafe deserialization can lead to privilege escalation, access to arbitrary files, and denial of service attacks.

How to exploit an insecure deserialization vulnerability

Details are provided below.

How to prevent insecure deserialization vulnerabilities

Generally speaking, deserialization of user input should be avoided unless absolutely necessary. In many cases, the potential severity of the exploits it enables, and the difficulty of defending against them, outweigh the benefits.

If you do need to deserialize data from untrusted sources, take robust measures to make sure that the data has not been tampered with. For example, you could implement a digital signature to check the integrity of the data. Remember, however, that any checks must take place before deserialization begins. Otherwise, they are of little use.
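As a rough sketch of the "verify first, deserialize second" order, the following Python example attaches an HMAC-SHA256 tag to a payload and refuses to parse anything whose tag does not match. The key, the token layout (hex tag, a dot, then the payload), and the function names are all assumptions made for illustration, and JSON is used as the data format to keep the example harmless.

```python
import hashlib
import hmac
import json

# Assumed for the example; a real key would come from secure configuration.
SECRET_KEY = b"server-side-secret"


def sign(payload: bytes) -> bytes:
    """Return the payload prefixed with its hex HMAC-SHA256 tag and a dot."""
    tag = hmac.new(SECRET_KEY, payload, hashlib.sha256).hexdigest().encode()
    return tag + b"." + payload


def verify_then_load(token: bytes) -> dict:
    """Check integrity first; parse the payload only if the tag matches."""
    tag, sep, payload = token.partition(b".")
    expected = hmac.new(SECRET_KEY, payload, hashlib.sha256).hexdigest().encode()
    if not sep or not hmac.compare_digest(tag, expected):
        raise ValueError("signature mismatch: refusing to deserialize")
    return json.loads(payload)


token = sign(json.dumps({"username": "carlos", "isAdmin": False}).encode())
print(verify_then_load(token))

# Any modification of the payload invalidates the tag.
tampered = token.replace(b"false", b"true")
try:
    verify_then_load(tampered)
except ValueError as exc:
    print(exc)
```

The important design choice is that json.loads() is never reached for a token that fails the integrity check.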

If possible, you should avoid using generic deserialization features altogether. Serialized data produced by these features contains all of the original object's attributes, including private fields that may hold sensitive information. Instead, you can create your own class-specific serialization methods so that you control exactly which fields are exposed.

Finally, remember that the vulnerability is the deserialization of user input, not the presence of gadget chains that subsequently handle the data. Do not rely on trying to eliminate the gadget chains that you identify during testing. This is impractical because of the web of cross-library dependencies that almost certainly exists on your website. At any given time, publicly documented memory corruption exploits also mean that your application may be vulnerable regardless.


Exploiting insecure deserialization vulnerabilities

In this section, we show how to exploit some common scenarios using examples from PHP, Ruby, and Java deserialization. We aim to demonstrate that exploiting insecure deserialization is actually much easier than many people believe. This is even the case during black-box testing, provided that you can use pre-built gadget chains.

We will also guide you through creating your own high-severity deserialization-based attacks. Although these often require source code access, they are easier to learn than you might think once you understand the basic concepts. We will cover the following topics:

  • How to identify insecure deserialization
  • Modifying the serialized objects that the website expects
  • Passing malicious data into dangerous website functionality
  • Injecting arbitrary object types
  • Chaining method invocations to control the flow of data into dangerous sinks
  • Manually creating your own advanced exploits
  • PHAR deserialization

Note: Although many of the examples are based on PHP, most of the exploitation techniques work equally well in other languages.

How to identify unsafe deserialization

Identifying unsafe deserialization is relatively easy, whether you use white box or black box testing.

During auditing, you should look at all data being passed into the website and try to identify anything that looks like serialized data. If you know the formats that different languages use, serialized data is relatively easy to recognize. In this section, we show examples from both PHP and Java serialization. Once you have identified serialized data, you can test whether you are able to control it.

PHP serialization format

PHP uses a mostly human-readable string format, with letters representing the data type and numbers representing the length of each entry. For example, consider a User object with the following attributes:

$user->name = "carlos";
$user->isLoggedIn = true;

After serialization, this object might look like this:

O:4:"User":2:{s:4:"name";s:6:"carlos";s:10:"isLoggedIn";b:1;}

This can be interpreted as follows:

  • O:4:"User" - An object with the 4-character class name "User"
  • 2 - The object has 2 attributes
  • s:4:"name" - The key of the first attribute is the 4-character string "name"
  • s:6:"carlos" - The value of the first attribute is the 6-character string "carlos"
  • s:10:"isLoggedIn" - The key of the second attribute is the 10-character string "isLoggedIn"
  • b:1 - The value of the second attribute is the boolean value true
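To see how the type letters and length prefixes fit together, the short Python sketch below builds this format by hand. It is a simplified illustration, not a full implementation of PHP's serializer: the helper names are made up, and only public string and boolean attributes are handled.

```python
def php_str(value: str) -> str:
    """Encode a string as s:<byte length>:"<value>";"""
    return f's:{len(value.encode("utf-8"))}:"{value}";'


def php_bool(value: bool) -> str:
    """Encode a boolean as b:1; or b:0;"""
    return f"b:{int(value)};"


def php_object(class_name: str, props: dict) -> str:
    """Encode an object with public string or boolean attributes only."""
    body = ""
    for key, value in props.items():
        body += php_str(key)
        body += php_bool(value) if isinstance(value, bool) else php_str(value)
    return f'O:{len(class_name)}:"{class_name}":{len(props)}:{{{body}}}'


print(php_object("User", {"name": "carlos", "isLoggedIn": True}))
# O:4:"User":2:{s:4:"name";s:6:"carlos";s:10:"isLoggedIn";b:1;}
```

Every length is derived from the value it describes, which is why editing a value by hand without updating its length prefix produces data that no longer deserializes.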

The native PHP functions for serialization are serialize() and unserialize(). If you have source code access, you should start by looking for every call to unserialize() in the code and investigating further.

Java serialization format

Some languages, such as Java, use binary serialization formats. These are harder to read, but you can still identify serialized data if you know how to recognize a few tell-tale signs. For example, serialized Java objects always begin with the same bytes, which are encoded as ac ed in hexadecimal and rO0 in Base64.

Any class that implements the java.io.Serializable interface can be serialized and deserialized. If you have source code access, take note of any code that uses the readObject() method, which is used to read and deserialize data from an InputStream.
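A quick heuristic for spotting these signatures while reviewing traffic might look like the following Python sketch. The function name is hypothetical, and the check is deliberately simple: it only looks at the start of a hex or Base64 string.

```python
import base64

# The first two bytes of every Java serialization stream.
JAVA_MAGIC = bytes.fromhex("aced")


def looks_like_java_serialized(value: str) -> bool:
    """Heuristic: hex or Base64 text that starts with the Java stream magic."""
    text = value.strip()
    if text.lower().startswith("aced"):
        return True
    return text.startswith("rO0")


# Magic bytes followed by the stream version bytes 00 05.
sample = base64.b64encode(JAVA_MAGIC + bytes.fromhex("0005")).decode()
print(sample)                                # rO0ABQ==
print(looks_like_java_serialized(sample))    # True
print(looks_like_java_serialized("hello"))   # False
```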

Manipulating serialized objects

Exploiting some deserialization vulnerabilities can be as easy as changing an attribute in a serialized object. Because the object's state is persisted, you can study the serialized data to identify and edit interesting attribute values. You can then pass the malicious object into the website via its deserialization process. This is the initial step of a basic deserialization exploit.

Broadly speaking, there are two ways to manipulate serialized objects. You can edit objects directly as a byte stream, or write a short script in the corresponding language to create and serialize new objects yourself. When using the binary serialization format, the latter method is usually easier.

Modify object properties

When tampering with data, as long as the attacker keeps a valid serialized object, the deserialization process will use the modified attribute value to create a server-side object.

As a simple example, consider a website that uses a serialized User object to store data about a user's session in a cookie. If an attacker spotted this serialized object in an HTTP request, they might decode it to find the following byte stream:

O:4:"User":2:{s:8:"username";s:6:"carlos";s:7:"isAdmin";b:0;}

The isAdmin attribute is an obvious point of interest. An attacker could simply change the boolean value of this attribute to 1 (true), re-encode the object, and overwrite their current cookie with the modified value. In isolation, this has no effect. However, suppose the website uses this cookie to check whether the current user has access to certain administrative functionality:

// "session" is an assumed cookie name; unserialize() expects a string, not the whole $_COOKIE array
$user = unserialize($_COOKIE['session']);
if ($user->isAdmin === true) {
    // allow access to admin interface
}

This code instantiates a User object from the cookie data, including the attacker-modified isAdmin attribute, without ever checking the authenticity of the serialized object. The modified value is then passed straight into the conditional statement, which results in privilege escalation.

A scenario this simple is not common in practice. However, editing an attribute value in this way demonstrates the first step of such an attack.
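The decode, edit, and re-encode cycle can be sketched as follows. This Python example assumes, purely for illustration, that the cookie is Base64-encoded and then URL-encoded; real sites may use a different encoding, and the function name is made up.

```python
import base64
from urllib.parse import quote, unquote


def tamper(cookie_value: str) -> str:
    """Decode a URL- and Base64-encoded cookie, flip isAdmin, re-encode."""
    serialized = base64.b64decode(unquote(cookie_value)).decode()
    # b:0 and b:1 have the same length, so no length prefix needs updating.
    modified = serialized.replace('s:7:"isAdmin";b:0;', 's:7:"isAdmin";b:1;')
    return quote(base64.b64encode(modified.encode()).decode())


original = 'O:4:"User":2:{s:8:"username";s:6:"carlos";s:7:"isAdmin";b:0;}'
cookie = quote(base64.b64encode(original.encode()).decode())

forged = tamper(cookie)
print(base64.b64decode(unquote(forged)).decode())
# O:4:"User":2:{s:8:"username";s:6:"carlos";s:7:"isAdmin";b:1;}
```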

Modify data type

In addition to modifying the property values in the serialized object, we can also provide unexpected data types.

Weakly typed languages such as PHP are particularly vulnerable to this kind of manipulation when the loose comparison operator (==) is used to compare different data types. For example, if you perform a loose comparison between an integer and a string, PHP attempts to convert the string to an integer, which means that 5 == "5" evaluates to true.

On PHP 7.x and earlier, this also applies to any alphanumeric string that starts with a number. PHP converts the entire string to the integer value of the leading number and ignores the rest of the string completely. Therefore, 5 == "5 of something" is treated in practice as 5 == 5.

This becomes even stranger when comparing a string with the integer 0:

0 == "Example string" // true on PHP 7.x and earlier

Because the string contains no leading number, these PHP versions treat the entire string as the integer 0. From PHP 8.0 onward, a number is compared with a non-numeric string as strings, so 5 == "5 of something" and 0 == "Example string" both evaluate to false.

Consider a case where this loose comparison operator is used together with user-controllable data from a deserialized object. This can lead to dangerous logic flaws.

// "login" is an assumed cookie name; unserialize() expects a string
$login = unserialize($_COOKIE['login']);
if ($login['password'] == $password) {
    // log in successfully
}

Suppose the attacker modifies the password attribute so that it contains the integer 0 instead of the expected string. On PHP 7.x and earlier, as long as the stored password does not start with a number, the condition always evaluates to true, which bypasses authentication. Note that this is only possible because deserialization preserves the data type. If the code fetched the password directly from the request, the 0 would be converted to a string and the condition would evaluate to false.

Note that when modifying data types in any serialized object format, it is important to remember to also update any type labels and length indicators in the serialized data. Otherwise, the serialized object will be corrupted and will not be deserialized.
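As an illustration of keeping the type label consistent, the Python sketch below swaps a string value for the integer 0 in PHP-style serialized data. The function name and the sample data are hypothetical, and the regular expression assumes the existing value contains no double quotes.

```python
import re


def set_password_to_int_zero(serialized: str) -> str:
    """Replace the string value stored under "password" with the integer 0.

    The whole value token changes: s:<len>:"<text>"; becomes i:0; so the
    type label is updated and the string length prefix disappears with it.
    """
    pattern = r'(s:8:"password";)s:\d+:"[^"]*";'
    return re.sub(pattern, r"\1i:0;", serialized)


login = 'a:2:{s:8:"username";s:6:"carlos";s:8:"password";s:6:"secret";}'
print(set_password_to_int_zero(login))
# a:2:{s:8:"username";s:6:"carlos";s:8:"password";i:0;}
```

Simply replacing "secret" with 0 inside the quotes would leave an s:6 label describing a 1-character string, and the data would fail to deserialize.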

When working directly with binary formats, we recommend the Hackvertor extension, which is available from the BApp Store. With Hackvertor, you can modify the serialized data as a string, and it automatically updates the binary data and adjusts the offsets accordingly. This can save a lot of manual effort.

Using application features

As well as simply checking attribute values, a website's functionality may also perform dangerous operations on data from a deserialized object. In this case, you can use insecure deserialization to pass in unexpected data and leverage the related functionality to do damage.

For example, as part of a website's “delete user” functionality, the user's profile picture is deleted by accessing the file path in the $user->image_location attribute. If this $user was created from a serialized object, an attacker could pass in a modified object with image_location set to an arbitrary file path. Deleting their own user account would then delete this arbitrary file as well.

This example relies on the attacker manually invoking the dangerous method via user-accessible functionality. However, insecure deserialization becomes much more interesting when you construct exploits that pass data into dangerous methods automatically. This is made possible by “magic methods”.

Magic Methods

Magic methods are a special subset of methods that you do not have to invoke explicitly. Instead, they are invoked automatically whenever a particular event or scenario occurs. Magic methods are a common feature of object-oriented programming in various languages. They are sometimes indicated by prefixing or surrounding the method name with double underscores.

Developers can add magic methods to a class to predetermine what code should be executed when the corresponding event or scenario occurs. Exactly when and why a magic method is invoked differs from method to method. One of the most common examples in PHP is __construct(), which is invoked whenever an object of the class is instantiated, similar to Python's __init__. Typically, constructor magic methods like this contain code to initialize the attributes of the instance. However, developers can customize magic methods to execute any code they want.

Magic methods are widely used and do not represent a vulnerability on their own. But they can become dangerous when the code they execute handles attacker-controllable data, such as data from a deserialized object. An attacker can exploit this to automatically invoke methods on the deserialized data when the corresponding conditions are met.

Most importantly in this context, some languages have magic methods that are invoked automatically during the deserialization process. For example, PHP's unserialize() looks for and invokes an object's __wakeup() magic method.

In Java deserialization, the same applies to the readObject() method, which essentially acts like a constructor for “re-initializing” a serialized object. The ObjectInputStream.readObject() method is used to read data from the initial byte stream. However, serializable classes can also declare their own readObject() method as follows:

private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {...};

This allows the class to control the deserialization of its own fields more closely. Crucially, a readObject() method declared in exactly this way acts as a magic method that is invoked during deserialization.

You should pay close attention to any class that contains such magic methods. They allow you to pass data from the serialized object to the site code before the object is fully deserialized. This is the starting point for exploiting more advanced vulnerabilities.

Inject any object

As we’ve seen, occasionally unsafe deserialization can be exploited by editing the objects provided by the site. However, injecting arbitrary objects can bring more possibilities.

In object-oriented programming, the available methods of an object are determined by its class. Therefore, if an attacker can manipulate the object class passed in as serialized data, it can affect the code executed after deserialization or even during deserialization.

Deserialization methods do not typically check what they are deserializing. This means that you can pass in an object of any serializable class that is available to the website, and that object will be deserialized. This effectively allows an attacker to create instances of arbitrary classes. The fact that the object is not of the expected class does not matter. The unexpected object type might cause an exception in the application logic, but the malicious object will already have been instantiated by then.

If attackers have access to the source code, they can examine all available classes in detail. To construct a simple attack, they look for classes that contain deserialization magic methods, and then check whether any of them perform dangerous operations on controllable data. Then, the attacker will pass in the serialized object of this class to use its magic method to attack.
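The following toy model sketches this in Python. It is not how PHP or Java deserialize data: the JSON wire format, the class registry, and the wakeup() hook are all invented for the example, with wakeup() playing the role of a deserialization magic method such as PHP's __wakeup(). A list stands in for real file deletion so that the example is harmless.

```python
import json


class User:
    def __init__(self, name="guest"):
        self.name = name


class TempFile:
    """A class that exists elsewhere in the application (hypothetical)."""

    deleted = []  # stands in for real file deletion in this sketch

    def __init__(self, path="/tmp/scratch"):
        self.path = path

    def wakeup(self):
        # Hook that the toy deserializer runs automatically.
        TempFile.deleted.append(self.path)


REGISTRY = {cls.__name__: cls for cls in (User, TempFile)}


def toy_unserialize(text):
    """Toy deserializer: instantiates whatever class the data names."""
    doc = json.loads(text)
    cls = REGISTRY[doc["class"]]        # no check against an expected class
    obj = cls.__new__(cls)              # the constructor is bypassed
    obj.__dict__.update(doc["props"])
    if hasattr(obj, "wakeup"):
        obj.wakeup()                    # runs before the caller sees obj
    return obj


# The site expects a User, but the attacker names TempFile instead.
injected = '{"class": "TempFile", "props": {"path": "/etc/important.conf"}}'
obj = toy_unserialize(injected)

try:
    print(obj.name)                     # application logic written for User
except AttributeError:
    print("exception raised, but wakeup() already ran on:", TempFile.deleted)
```

The application code does fail on the unexpected object, but only after the hook has already acted on the attacker-supplied path.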

Classes that contain these deserialization magic methods can also be used to initiate more complex attacks involving a long series of method invocations, known as a “gadget chain”.

Gadget chains

A “gadget” is a snippet of code that exists in the application and can help an attacker achieve a particular goal. An individual gadget may not do anything directly harmful with user input. However, the attacker's goal might simply be to invoke a method that passes their input into another gadget. By chaining multiple gadgets together in this way, an attacker can potentially pass their input into a dangerous “sink gadget”, where it can cause the most damage.

It is important to understand that, unlike some other types of exploit, a gadget chain is not a payload of chained methods constructed by the attacker. All of the code already exists on the website. The only thing the attacker controls is the data that is passed into the gadget chain. This is typically done using a magic method that is invoked during deserialization, sometimes known as a “kick-off gadget”.

Many insecure deserialization vulnerabilities can only be exploited through the use of gadget chains. This can sometimes be a simple one-step or two-step chain, but constructing high-severity attacks is likely to require a more elaborate sequence of object instantiations and method invocations. Being able to construct gadget chains is therefore one of the key factors in successfully exploiting insecure deserialization.
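A gadget chain can be sketched with a toy model in Python. Everything here is hypothetical: the three classes, the JSON wire format, and the wakeup() hook are invented for the example, and a list stands in for the dangerous operation. Note that the payload contains no code at all, only class names and data.

```python
import json

SINK_LOG = []  # stands in for a dangerous operation such as running a command


class CommandRunner:
    """Sink gadget: does something dangerous with its argument."""

    def run(self, command):
        SINK_LOG.append(command)


class Formatter:
    """Intermediate gadget: forwards its data to whatever 'target' is."""

    def render(self):
        self.target.run(self.template)


class Session:
    """Kick-off gadget: its wakeup() hook fires during deserialization."""

    def wakeup(self):
        self.view.render()


REGISTRY = {c.__name__: c for c in (CommandRunner, Formatter, Session)}


def toy_unserialize(node):
    """Recursively rebuild objects from {"class": ..., "props": ...} dicts."""
    if not (isinstance(node, dict) and "class" in node):
        return node
    cls = REGISTRY[node["class"]]
    obj = cls.__new__(cls)
    for key, value in node.get("props", {}).items():
        setattr(obj, key, toy_unserialize(value))
    if hasattr(obj, "wakeup"):
        obj.wakeup()
    return obj


payload = json.loads("""
{"class": "Session", "props": {
    "view": {"class": "Formatter", "props": {
        "template": "attacker-chosen value",
        "target": {"class": "CommandRunner", "props": {}}}}}}
""")

toy_unserialize(payload)
print(SINK_LOG)  # ['attacker-chosen value']
```

Each class looks reasonable in isolation. It is the attacker's choice of which objects to nest inside which attributes that routes the data from the kick-off gadget to the sink.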

Using pre-built gadget chains

Manually identifying gadget chains can be a fairly arduous process, and it is almost impossible without source code access. Fortunately, there are a few options for working with pre-built gadget chains that you can try first.

Several tools are available that help you build gadget chains with minimal effort. These tools provide a range of pre-discovered gadget chains that have been successfully exploited on other websites. If you find an insecure deserialization vulnerability on a target site, you can use these tools to attempt to exploit it even if you do not have access to the source code. This approach is possible because of the widespread use of libraries that contain exploitable gadget chains. For example, if a gadget chain in Java's Apache Commons Collections library can be exploited on one website, any other website that uses this library may also be exploitable using the same chain.

One such tool for Java deserialization is “ysoserial”. You simply specify a library that you think the target application is using and provide a command that you want to try to execute, and the tool creates an appropriate serialized object based on a known gadget chain for the given library. This still involves a certain amount of trial and error, but it is considerably less labor-intensive than constructing your own gadget chains by hand.

Most languages that frequently suffer from insecure deserialization attacks have equivalent proof-of-concept tools. For example, for PHP-based sites you can use “PHP Generic Gadget Chains” (PHPGGC).

It is important to note that the vulnerability is not the presence of a gadget chain in the website's code or in any of its libraries. The vulnerability is the deserialization of user-controllable data, and the gadget chain is just a means of manipulating the flow of that data once it has been injected. This also applies to various memory corruption vulnerabilities that rely on deserialization of untrusted data. So even if a website somehow managed to plug every possible gadget chain, it might still be vulnerable.

Using documented gadget chains

It is always worth checking whether there are any documented exploits that you can adapt to attack your target website. Even if there is no dedicated tool for automatically generating the serialized object, you can still find documented gadget chains for popular frameworks and adapt them manually.

Even if you cannot find a gadget chain that is ready to use, you may still gain valuable knowledge that you can apply when creating your own custom exploit.

Create your own exploit

When off-the-shelf gadget chains and documented exploits are unsuccessful, you need to create your own exploit.

To build your own gadget chain successfully, you will almost certainly need source code access. The first step is to study this source code to identify a class that contains a magic method that is invoked during deserialization. Assess the code that this magic method executes to see whether it directly does anything dangerous with user-controllable attributes.

If the magic method is not exploitable on its own, it can serve as the kick-off gadget for your gadget chain. Study any methods that the kick-off gadget invokes. Do any of these do something dangerous with data that you control? If not, take a closer look at each of the methods that they subsequently invoke, and so on.

Repeat this process, keeping track of which values you have access to, until you either reach a dead end or identify a dangerous sink gadget into which your controllable data is passed.

Once you have worked out how to construct a gadget chain within the application code, the next step is to create a serialized object that contains your payload. This is simply a case of studying the class declaration in the source code and creating a valid serialized object with the appropriate values required for your exploit. As the earlier examples show, this is relatively simple when working with string-based serialization formats.

Working with binary formats, such as when constructing a Java deserialization exploit, can be particularly cumbersome. When making minor changes to an existing object, you might be comfortable working directly with the bytes. However, when you make more significant changes, such as passing in a completely new object, this quickly becomes impractical. It is usually much simpler to write your own code in the target language to generate and serialize the data yourself.

When creating your own gadget chain, look out for opportunities to use this extra attack surface to trigger secondary vulnerabilities.

By carefully studying the source code, you can discover longer gadget chains that potentially allow you to construct high-severity attacks, often including remote code execution.

Phar deserialization

So far, we have mainly looked at exploiting deserialization vulnerabilities where the website explicitly deserializes user input. However, in PHP it is sometimes possible to exploit deserialization even if there is no obvious use of the unserialize() method.

PHP provides several stream wrappers for handling different kinds of file path. One of them is phar://, which provides a stream interface for accessing PHP Archive (.phar) files.

The PHP documentation reveals that PHAR manifest files contain serialized metadata. Crucially, if you perform any filesystem operation on a phar:// stream, this metadata is implicitly deserialized. (From PHP 8.0 onward, this metadata is no longer deserialized automatically.) This means that a phar:// stream can potentially be a vector for exploiting insecure deserialization, provided that you can pass the stream into a filesystem method.

In the case of obviously dangerous filesystem methods, such as include() or fopen(), websites are likely to have implemented countermeasures to reduce the potential for malicious use. However, methods such as file_exists(), which are not so overtly dangerous, may not be as well protected.

This technique also requires you to upload the PHAR to the server somehow. One approach is to use an image upload feature, for example. If you are able to disguise the PHAR as a simple JPG file, you can sometimes bypass the website's validation checks. If you can then force the website to load this JPG from a phar:// stream, any harmful data that you inject via the PHAR metadata will be deserialized. Because PHP does not check the file extension when reading a stream, it does not matter that the file uses an image extension.

As long as the class of the object is supported by the website, both the __wakeup() and __destruct() magic methods can be invoked in this way, allowing you to potentially kick off a gadget chain using this technique.

Exploiting deserialization using memory corruption

Even without the use of gadget chains, it is still possible to exploit insecure deserialization. If all else fails, there are often publicly documented memory corruption vulnerabilities that can be exploited via insecure deserialization. These typically lead to remote code execution.

Deserialization methods, such as PHP's unserialize(), are rarely hardened against these kinds of attacks and expose a huge amount of attack surface. This is not always considered a vulnerability in its own right, because these methods were never intended to handle user-controllable input in the first place.