In-depth article: PHP copy-on-write and garbage collection mechanisms (repost)



Copy-on-write (COW) is an optimization strategy in computer programming. The core idea is that when multiple callers request the same resource (such as data in memory or on disk), they all receive the same pointer to that resource. Only when one caller tries to modify the resource does the system make a private copy for that caller, while the resource seen by the other callers remains unchanged. This process is transparent to the other callers. The main advantage is that if no caller modifies the resource, no private copy is ever created, so callers that only read can all share the same resource.


Note: the code below is based on PHP 5.6; the reference counting mechanism changed in PHP 7.

As we all know, PHP is implemented in C, and C is a strongly typed language. So how can PHP be weakly typed? Let's look at how PHP variables are represented in the underlying C code:

typedef struct _zval_struct zval;
typedef unsigned int zend_uint;
typedef unsigned char zend_uchar;

/* declared first so that the struct below can use it */
typedef union _zvalue_value {
    long lval;                 /* integer value */
    double dval;               /* floating-point value */
    struct {
        char *val;
        int len;
    } str;                     /* string */
    HashTable *ht;             /* array */
    zend_object_value obj;     /* object */
} zvalue_value;

struct _zval_struct {
    zvalue_value value;        /* the value of the variable is stored here */
    zend_uint refcount__gc;    /* reference count */
    zend_uchar type;           /* current data type of the variable */
    zend_uchar is_ref__gc;     /* whether the variable is a reference */
};

At the lowest level, a PHP variable is a structure named zval. The zvalue_value inside it is actually a union, and it is what stores the variable's value. To tell whether the same zval is shared by multiple variables, the Zend engine introduced two fields, refcount and is_ref (named refcount__gc and is_ref__gc in the structure).
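As a small illustration of the type tag stored next to the value, the sketch below reuses one variable for several types and asks PHP which type it currently holds. The variable name `$v` and the `$types` array are only for illustration; the mapping to union members in the comments follows the structure shown above.

```php
<?php
$types = array();

$v = 42;                 // stored in lval
$types[] = gettype($v);

$v = 4.2;                // stored in dval
$types[] = gettype($v);

$v = 'forty-two';        // stored in str
$types[] = gettype($v);

$v = array(1, 2);        // stored in ht
$types[] = gettype($v);

$v = new stdClass();     // stored in obj
$types[] = gettype($v);

echo implode(', ', $types), "\n";
// prints: integer, double, string, array, object
```

The same PHP variable can change type freely because only the type field and the active union member change; the C declaration itself stays fixed.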

Run the following code to observe the change of refcount:
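A listing consistent with the output and explanation that follow might look like this. It is a sketch that assumes PHP 5.x with the Xdebug extension loaded; the `$hasXdebug` guard is an addition for illustration so the script still runs without Xdebug. On PHP 7 and later the reported counts differ, because integers are no longer reference counted.

```php
<?php
// Assumption: PHP 5.x with Xdebug. Without Xdebug nothing is printed.
$hasXdebug = function_exists('xdebug_debug_zval');

$foo = 1;                                      // only $foo points at the value
if ($hasXdebug) { xdebug_debug_zval('foo'); }

$bar = $foo;                                   // $bar shares the same zval
if ($hasXdebug) { xdebug_debug_zval('foo'); }

$bar = 2;                                      // writing to $bar separates it
if ($hasXdebug) { xdebug_debug_zval('foo'); }
```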

//-----Execution results----- 
foo: (refcount=1, is_ref=0)=1 
foo: (refcount=2, is_ref=0)=1 
foo: (refcount=1, is_ref=0)=1

When $foo is first assigned, its value is pointed to only by $foo. When $foo is assigned to $bar, PHP does not copy the memory for $bar; instead, $foo and $bar point to the same address, and the reference count is increased by 1, to 2. Then we change the value of $bar. If PHP wrote directly into the memory that $bar points to, the value of $foo would change as well, which is not the result we want. So the PHP kernel makes a copy of the memory and sets the copy to the newly assigned value, 2 (this operation is called variable separation). The original memory is now pointed to only by $foo, so its reference count goes back to refcount=1.

Let's look at a memory usage example, which makes the role of COW in memory optimization easier to see:

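<?php
$j = 1;
var_dump(memory_get_usage());    // baseline
$tipi = array_fill(0, 100000, 'php-internal');
var_dump(memory_get_usage());    // after building the array
$tipi_copy = $tipi;
var_dump(memory_get_usage());    // after the assignment
foreach ($tipi_copy as $i) {
    $j += strlen($i);            // read-only use of each element
}
var_dump(memory_get_usage());    // after iterating

$ php t.php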

The code above highlights the role of COW. When the array variable $tipi is assigned to $tipi_copy, memory usage does not jump immediately, nor does it change significantly while iterating over $tipi_copy. Here $tipi_copy and $tipi point to the same block of memory, and nothing is copied.

In other words, even without references, after a variable is assigned PHP does not allocate new memory for the data as long as the value is not changed. From this it is easy to think of scenarios where COW keeps memory usage down very effectively: cases where variables are only used for computation and rarely modified, such as passing function arguments or copying large arrays, where the values do not need to change.
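The function-argument case can be sketched as follows. The function names `sumLengths` and `appendOne` are hypothetical and exist only for this illustration: the first only reads its argument, so no copy is needed; the second writes to it, which separates the local array from the caller's.

```php
<?php
// Reads only: the array passed in is shared with the caller, not copied.
function sumLengths(array $items) {
    $total = 0;
    foreach ($items as $item) {
        $total += strlen($item);
    }
    return $total;
}

// Writes: the first modification separates the local copy from the caller's array.
function appendOne(array $items) {
    $items[] = 'extra';
    return count($items);
}

$big = array_fill(0, 1000, 'php-internal');

$total    = sumLengths($big);   // 1000 * 12 characters
$newCount = appendOne($big);    // size of the function's private copy
$original = count($big);        // the caller's array is untouched

echo "$total $newCount $original\n";   // prints: 12000 1001 1000
```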

Principle of reference counting

After understanding the internal storage structure of PHP variables, we can understand the principles related to PHP variable assignment and the early garbage collection mechanism.

The memory recycling algorithm used in PHP 5.2 is the well-known reference counting algorithm. Its idea is intuitive and concise: assign a counter to each memory object. When a memory object is created, the counter is initialized to 1 (so at that point there is always one variable referencing the object). Each time a new variable references the object, the counter is incremented by 1, and each time a variable referencing it is removed, the counter is decremented by 1. When the garbage collection mechanism runs, all memory objects whose counter is 0 are destroyed and the memory they occupy is reclaimed.
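The counting rule can be observed from PHP code through a destructor, which runs when the last reference to an object goes away. This is a minimal sketch; the `Tracker` class and its static `$log` are hypothetical helpers, not part of PHP.

```php
<?php
// Hypothetical helper class: records when an instance is destroyed.
class Tracker {
    public static $log = array();
    private $name;

    public function __construct($name) {
        $this->name = $name;
    }

    public function __destruct() {
        self::$log[] = "destroyed {$this->name}";
    }
}

$a = new Tracker('x');   // one reference to the object
$b = $a;                 // a second reference to the same object

unset($a);               // one reference remains, so nothing is destroyed yet
$stillAlive = (count(Tracker::$log) === 0);

unset($b);               // last reference removed: the destructor runs now

echo implode("\n", Tracker::$log), "\n";   // prints: destroyed x
```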

Memory leak

However, the garbage collection mechanism before PHP 5.3 had a flaw: when a child element of an array or object references its parent element and the parent is then deleted, the variable container is not freed, because the child element still points to it. Since no symbol in any scope points to the variable container any more, it can never be cleaned up, so memory leaks until the script finishes executing.

If you have Xdebug installed, you can call xdebug_debug_zval() to display the values of "refcount" and "is_ref".
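The situation described above can be approximated on a current PHP version by switching the cycle collector off. This is a sketch under that assumption; the `Node` class is hypothetical, and `gc_disable()` stands in for the pre-5.3 behaviour.

```php
<?php
// Hypothetical class used only for this illustration.
class Node {
    public static $destroyed = 0;
    public $parent = null;
    public $children = array();

    public function __destruct() {
        self::$destroyed++;
    }
}

gc_disable();                   // assumption: mimics an engine without a cycle collector

$parent = new Node();
$child  = new Node();
$child->parent = $parent;       // the child points to its parent
$parent->children[] = $child;   // the parent points to its child: a cycle

unset($parent, $child);         // no symbol refers to either object any more

// Each object is still referenced by the other, so neither count reached 0.
$destroyedAfterUnset = Node::$destroyed;
echo "destroyed after unset: $destroyedAfterUnset\n";   // prints: destroyed after unset: 0
```

Both objects are unreachable from the script at this point, yet plain reference counting alone cannot free them.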

[Two examples followed here, with their output shown as figures; the listings and figures are not available.]


Root buffer mechanism

PHP 5.3 introduced the root buffer mechanism. When PHP starts, it sets up a root buffer that holds a fixed number of zvals (10000 by default; this is the compile-time constant GC_ROOT_BUFFER_MAX_ENTRIES, not a php.ini setting). When PHP finds a zval that may be part of a circular reference, it puts it into the root buffer. When the root buffer is full, garbage collection runs, which solves the memory leak caused by circular references.

Why is not all of the memory returned?
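A collection run can also be forced by hand with `gc_collect_cycles()`, without waiting for the root buffer to fill. The sketch below builds a few unreachable cycles and then collects them; the `Ring` class and `makeCycle()` function are hypothetical, and the exact number returned depends on the PHP version.

```php
<?php
// Hypothetical class: each instance points to another instance.
class Ring {
    public $next = null;
}

function makeCycle() {
    $a = new Ring();
    $b = new Ring();
    $a->next = $b;
    $b->next = $a;
}   // $a and $b go out of scope here, but the two objects keep each other alive

gc_enable();
gc_collect_cycles();                 // start from an empty root buffer

for ($i = 0; $i < 100; $i++) {
    makeCycle();                     // far fewer roots than the buffer holds
}

$collected = gc_collect_cycles();    // force a collection run now
echo "collected: $collected\n";
```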

PHP's core structure, the hashtable, cannot allocate enough memory up front when it is defined, so only a small block is allocated at initialization and it is expanded when it runs out. A hashtable only grows; it never shrinks. So when 100 variables are stored, the symbol table is expanded as needed, and when they are unset(), only the memory allocated for the variable values is released, while the memory allocated for the variable names stays in the symbol table. The symbol table does not shrink, so the memory that was not returned is the part held by the symbol table.

PHP does not ask the OS for memory every time it needs some. Instead, it first requests a large block and hands out parts of it to whoever asks. When code needs memory, PHP therefore does not have to go to the OS again, which avoids repeated requests; it only asks the OS when the large block is exhausted. When memory is released, PHP does not return it to the OS but keeps it on a free list for reuse.
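The same grow-only behaviour can be observed on an ordinary array, which is also backed by a hashtable. This is a sketch rather than a measurement of the symbol table itself, and the exact numbers depend on the PHP version and platform.

```php
<?php
$baseline = memory_get_usage();

$data = array();
for ($i = 0; $i < 100000; $i++) {
    $data["key$i"] = $i;        // the hashtable grows as entries are added
}
$full = memory_get_usage();

foreach (array_keys($data) as $key) {
    unset($data[$key]);         // remove every entry, keep the now-empty array
}
$emptied = memory_get_usage();

printf("baseline: %d\nfull: %d\nemptied: %d\n", $baseline, $full, $emptied);
```

On the versions this was reasoned about, `emptied` stays well above `baseline`, because the table that held the entries is kept at its expanded size.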
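The two layers can be compared with `memory_get_usage()`: without an argument it reports the memory handed out to the script, and with `true` it reports what PHP's memory manager has reserved from the system. A minimal sketch, with variable names chosen for illustration:

```php
<?php
$used = memory_get_usage();        // memory currently handed out to the script
$real = memory_get_usage(true);    // memory the allocator has reserved from the OS

$block = str_repeat('x', 1024 * 1024);   // allocate about 1 MB
unset($block);                           // hand it back to PHP's allocator

$usedAfter = memory_get_usage();
$realAfter = memory_get_usage(true);

printf("used: %d, reserved: %d\n", $used, $real);
printf("used: %d, reserved: %d (after alloc and unset)\n", $usedAfter, $realAfter);
```

The reserved figure is never smaller than the used figure, since the script's allocations are carved out of the reserved block.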

Garbage collection related configuration

  • zend.enable_gc: the default value is On. To turn off the garbage collection mechanism, set it to Off.

Additional notes
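The setting can also be inspected and toggled at run time with `gc_enabled()`, `gc_disable()` and `gc_enable()`. A short sketch:

```php
<?php
// Current value of the php.ini setting, as PHP sees it.
echo 'zend.enable_gc = ', var_export(ini_get('zend.enable_gc'), true), "\n";

gc_disable();                     // turn the cycle collector off for this script
$whileDisabled = gc_enabled();    // false

gc_enable();                      // turn it back on
$whileEnabled = gc_enabled();     // true

var_dump($whileDisabled, $whileEnabled);
```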

  • unset(): unset() only disconnects a variable name from its block of memory and decrements that block's reference count by 1. Whether the memory is reclaimed depends on whether the refcount reaches 0.
  • null: assigning null to a variable replaces the value that the variable holds. If the variable is a reference (is_ref=1), the shared data structure itself is emptied, so every variable bound to it sees null; otherwise the old value's reference count is simply decremented by 1.
  • End of script execution: all memory used by the script is released, whether or not there are circular references.
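The difference between unset() and assigning null shows up clearly with references. A minimal sketch, with variable names chosen for illustration:

```php
<?php
$a = 'data';
$b = &$a;            // $a and $b are bound to the same value
unset($a);           // removes only the name $a; the value stays alive through $b
$afterUnset = $b;    // 'data'

$c = 'data';
$d = &$c;            // $c and $d are bound to the same value
$c = null;           // overwrites the shared value, so $d sees null as well

var_dump(isset($a), $afterUnset, $d);
// prints: bool(false), string(4) "data", NULL
```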
