On the reference count in PHP string type

Time:2020-3-29

Author: Wang Shu

Background introduction

  • String is one of the most commonly used types. Because of how strings behave, identical string variables usually share one block of memory to save space, and reference counting tracks how many variables are using that memory.
  • However, tracing with GDB shows that not every string's reference count behaves this way: some counts accumulate normally, some are always 0, and some are always 1. To find out why, this article briefly analyzes several assignment scenarios.

Environmental situation

  • System version: Ubuntu 16.04.3 LTS
  • PHP version: PHP 7.1.0
  • GDB version: GNU gdb (Ubuntu 7.11.1-0ubuntu1~16.5) 7.11.1

1、 Basic variable

zval is the basis of all variables in PHP (zend_types.h, line 121). Its zend_value member stores the actual data (zend_types.h, line 101).

  • zend_value is a union that takes 8 bytes in total.
  • u1 is a union that stores the type information and takes 4 bytes.
  • u2 is a union that stores some extra data, such as next for hash collisions, and takes 4 bytes.

The entire zval structure takes 16 bytes and supports all PHP types.
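The 8 + 4 + 4 layout described above can be checked with a small stand-alone sketch. This is a simplified model, not a copy of zend_types.h: the toy_ names and the fields inside u1 and u2 are assumptions chosen only to reproduce the sizes.

```c
#include <stddef.h>
#include <stdint.h>

/* Opaque stand-in for a heap-allocated string; only the pointer matters here. */
typedef struct toy_string toy_string;

/* 8-byte payload: an inline scalar or a pointer to data stored elsewhere. */
typedef union {
    int64_t     lval;  /* integers are stored inline */
    double      dval;  /* floats are stored inline */
    toy_string *str;   /* strings are stored elsewhere; only a pointer is kept */
    void       *ptr;
} toy_value;

typedef struct {
    toy_value value;           /* 8 bytes */
    union {
        struct {
            uint8_t type;      /* type tag, e.g. 6 for a string */
            uint8_t flags[3];  /* assumed layout for the remaining bytes */
        } v;
        uint32_t type_info;
    } u1;                      /* 4 bytes */
    union {
        uint32_t next;         /* e.g. next slot in a hash collision chain */
        uint32_t extra;
    } u2;                      /* 4 bytes */
} toy_zval;

/* Helpers that expose the sizes so they can be printed or asserted. */
size_t toy_value_size(void) { return sizeof(toy_value); }
size_t toy_zval_size(void)  { return sizeof(toy_zval); }
```

On common platforms toy_value_size() returns 8 and toy_zval_size() returns 16, matching the numbers in the list above.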


PHP 7 uses this simple and ingenious zval to store every type of data. So how can a string of arbitrary length be stored in a 16-byte zval?

2、 String variable

<?php
$a = "hello world";
echo $a;

Debugging with GDB shows that the type of $a is 6. Comparing this with the type definitions (zend_types.h, line 303) shows that it is IS_STRING.
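For reference when reading the GDB output, the type tags of PHP 7 zvals in this range can be written out as a small table. The helper type_tag_name is hypothetical and exists only for this illustration.

```c
#include <stddef.h>

/* Type tags used by PHP 7 zvals (values 0 to 10). */
enum {
    IS_UNDEF = 0, IS_NULL = 1, IS_FALSE = 2, IS_TRUE = 3,
    IS_LONG = 4, IS_DOUBLE = 5, IS_STRING = 6, IS_ARRAY = 7,
    IS_OBJECT = 8, IS_RESOURCE = 9, IS_REFERENCE = 10
};

/* Hypothetical helper: map a numeric tag seen in GDB to its name. */
const char *type_tag_name(unsigned type) {
    static const char *names[] = {
        "IS_UNDEF", "IS_NULL", "IS_FALSE", "IS_TRUE", "IS_LONG", "IS_DOUBLE",
        "IS_STRING", "IS_ARRAY", "IS_OBJECT", "IS_RESOURCE", "IS_REFERENCE"
    };
    if (type >= sizeof(names) / sizeof(names[0])) {
        return "unknown";
    }
    return names[type];
}
```

The two tags that appear in this article are 6 (IS_STRING) and 10 (IS_REFERENCE).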

Because a string's length is not fixed, the 16 bytes of a zval cannot hold it directly. Instead, the str pointer in value points to the memory that actually stores the string. Printing it shows that the pointed-to type is zend_string.

1. zend_string structure

First, take a look at its data structure (zend_types.h, line 169).

gc in the zend_string structure
The structure starts with gc. The other complex types also start with a gc member. What is it for?
Look at the data structure of gc, which has two members:

  • The first is refcount, which records the number of references.
  • The second, u, is a union much like u1 in zval. Its main job is to record the type.

So we can make a reasonable guess: when the program runs garbage collection or similar operations on any complex type, the pointed-to memory always starts with gc, which provides not only the reference count but also, through u.v.type, the actual type of the value.
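The guess above can be illustrated with a toy model: if every complex type starts with the same header, generic code can read the reference count and the type from any pointer without knowing the concrete structure. The toy_ names are assumptions, and the field order inside v is fixed here, whereas the real header orders it according to endianness.

```c
#include <stddef.h>
#include <stdint.h>

/* Common header placed at the start of every reference-counted structure. */
typedef struct {
    uint32_t refcount;
    union {
        struct {
            uint8_t  type;     /* real type of the enclosing structure */
            uint8_t  flags;
            uint16_t gc_info;
        } v;
        uint32_t type_info;
    } u;
} toy_gc_header;

/* Two different "complex types" that share the same first member. */
typedef struct { toy_gc_header gc; uint64_t h; size_t len; char val[1]; } toy_string;
typedef struct { toy_gc_header gc; uint32_t n_used; } toy_array;

/* Generic accessors: valid for any structure that starts with toy_gc_header. */
uint8_t toy_type_of(const void *p) {
    return ((const toy_gc_header *)p)->u.v.type;
}

uint32_t toy_refcount_of(const void *p) {
    return ((const toy_gc_header *)p)->refcount;
}
```

Casting to the header type is valid C because a pointer to a structure may be converted to a pointer to its first member.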

h in the zend_string structure
As the name suggests, this is the hash of the string. It trades space for time: the computed hash is cached to improve performance.

len in the zend_string structure
This stores the length of the string.

val[1] in the zend_string structure
This is the C flexible-array idiom: the whole string is stored here. The string's bytes therefore sit directly after the structure's other fields, which avoids fetching the value from a separate block of memory.
(PS: here is a small question that you can trace with GDB. Does the flexible array occupy memory? What does zend_string look like after alignment, and how much memory does it occupy in total?)
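One way to explore that question outside GDB is to model the four fields and let the compiler report the offsets. The sketch below assumes a 64-bit build; the toy_ names and the 8-byte rounding in toy_alloc_size are assumptions made for illustration, not PHP's allocator.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t refcount;
    uint32_t type_info;
} toy_gc_header;          /* 8 bytes */

typedef struct {
    toy_gc_header gc;     /* reference count and type */
    uint64_t      h;      /* cached hash */
    size_t        len;    /* string length */
    char          val[1]; /* first byte of the string; the rest follows */
} toy_string;

/* Number of bytes in front of the first character. */
size_t toy_header_size(void) { return offsetof(toy_string, val); }

/* Size of the structure as the compiler pads it. */
size_t toy_struct_size(void) { return sizeof(toy_string); }

/* Bytes needed for a string of the given length: header, characters and
 * the terminating NUL, rounded up to a multiple of 8 (assumed alignment). */
size_t toy_alloc_size(size_t len) {
    size_t raw = offsetof(toy_string, val) + len + 1;
    return (raw + 7) & ~(size_t)7;
}
```

On a typical 64-bit build the header is 24 bytes and the padded structure is 32 bytes, so the one-element array does take space. With this rounding, the 11-character "hello world" needs 24 + 11 + 1 = 36 bytes, rounded up to 40.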

2. zend_string actual content

Now that the structure itself is clear, print its contents in GDB to take a look.

What is stored at this address really is hello world. But why is refcount in gc 0?

The reasons are as follows:

  • A constant string is a fixed string in the PHP code. It is stored in a global table, the literal table, during the compilation phase and is destroyed only after the request finishes, so its refcount is always 0.
  • A temporary string is produced while the virtual machine executes opcodes. It is stored in the temporary variable area and has a normal refcount.

Change the code to look at a temporary string:

<?php
$a = "hello world".time();
echo $a;

Printing the zval of this variable shows that refcount is 1.

3、 Reference count for string

1. Direct assignment of temporary string

For temporary strings, each variable assignment should add 1 to the reference count in zend_string, and the memory should be released when the reference count reaches 0.

<?php
$a = "hello world".time();
echo $a;
$b = $a;
echo $b;

When the assignment of $a is completed, $a is at the first position on the stack. Its type is 6, IS_STRING. The str address in value is 0x7ff4402c30, and the zend_string reference count is 1.

When the assignment of $b is completed, $b is at the second position on the stack. Its type is 6, IS_STRING. The str address in value is the same, 0x7ff4402c30, and the zend_string reference count is 2.

The overall reference relationship can be drawn as follows:

  $a (type 6, IS_STRING) --+
                           +--> zend_string (refcount = 2)
  $b (type 6, IS_STRING) --+
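The behaviour observed for $b = $a can be mimicked with a toy reference-counted string. This is a minimal sketch of the general technique, not PHP's implementation; every toy_ name is an assumption.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    uint32_t refcount;
    size_t   len;
    char     val[1];  /* string bytes follow the header */
} toy_string;

/* Create a new string owned by one variable (refcount = 1). */
toy_string *toy_string_new(const char *s) {
    size_t len = strlen(s);
    toy_string *str = malloc(offsetof(toy_string, val) + len + 1);
    if (str == NULL) {
        return NULL;
    }
    str->refcount = 1;
    str->len = len;
    memcpy(str->val, s, len + 1);  /* copy the bytes including the NUL */
    return str;
}

/* Models "$b = $a": no bytes are copied, the count just goes up by one. */
toy_string *toy_string_copy(toy_string *s) {
    s->refcount++;
    return s;
}

/* Models a variable going away. Returns 1 if the memory was freed. */
int toy_string_release(toy_string *s) {
    if (--s->refcount == 0) {
        free(s);
        return 1;
    }
    return 0;
}
```

After toy_string_new the count is 1, after toy_string_copy both variables hold the same address with a count of 2, and the memory is freed only when the second release brings the count back to 0.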

2. Reference assignment

The reference relationship for direct variable assignment is drawn above. What about assignment by reference?

<?php
$a = "hello world".time();
echo $a;
$b = &$a;
echo $b;

When the assignment of $a is completed, $a is at the first position on the stack. Its type is 6, IS_STRING. The str address in value is 0x7ff4402c30, and the zend_string reference count is 1.

When $b is assigned as a reference, $b is at the second position on the stack. Its type is 10, IS_REFERENCE, and its contents can be seen through ref in value.

Does the type of $a change at this point? Is it still a string? Print $a and take a look: the type of $a has become 10, IS_REFERENCE. Print ref in value: the address is the same as ref in $b!

When $b references $a, both $a and $b become reference types. The zend_reference they point to contains a zval of type 6 (IS_STRING) whose value.str points to a zend_string, and that zend_string's reference count is 1.

The overall reference relationship is:

  $a (type 10, IS_REFERENCE) --+
                               +--> zend_reference --> zval (type 6, IS_STRING) --> zend_string (refcount = 1)
  $b (type 10, IS_REFERENCE) --+
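The same observation can be sketched as a toy model of $b = &$a: the string is moved into a shared wrapper, both variables point to the wrapper, and the variables are counted on the wrapper rather than on the string. All toy_ names are assumptions, and the wrapper count of 2 is what this model produces, not a value read from the GDB session above.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

enum { TOY_IS_STRING = 6, TOY_IS_REFERENCE = 10 };

typedef struct { uint32_t refcount; size_t len; char val[1]; } toy_string;

struct toy_reference;

typedef struct {
    uint8_t type;
    union {
        toy_string           *str;
        struct toy_reference *ref;
    } value;
} toy_zval;

/* Shared wrapper: counts the variables and holds the real value. */
typedef struct toy_reference {
    uint32_t refcount;
    toy_zval val;
} toy_reference;

/* Models "$b = &$a". Returns 0 on success, -1 if allocation fails. */
int toy_assign_ref(toy_zval *b, toy_zval *a) {
    if (a->type != TOY_IS_REFERENCE) {
        toy_reference *ref = malloc(sizeof *ref);
        if (ref == NULL) {
            return -1;
        }
        ref->refcount = 1;   /* held by $a */
        ref->val = *a;       /* the string moves into the wrapper; its own count is unchanged */
        a->type = TOY_IS_REFERENCE;
        a->value.ref = ref;
    }
    a->value.ref->refcount++; /* now also held by $b */
    b->type = TOY_IS_REFERENCE;
    b->value.ref = a->value.ref;
    return 0;
}
```

In this model the string's count stays at 1 no matter how many variables join the reference, because only the wrapper points to it.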

4、 Special string values

<?php
$a = "string";
$b = "double";

echo $a;
echo $b;

According to the conclusions above, both $a and $b are constant strings.

Printing the zend_string of $a and of $b in GDB shows that $b meets the expectation, but $a, whose reference count is 1 rather than 0, contradicts the theory above.
What is the problem?

GDB tracing shows that $a and $b are both on the stack and both of string type.
However, the str addresses in their values are quite different.
$a is at the first position on the stack, and its str value is 0x11522c0.
$b is at the second position on the stack, and its str value is 0x7ff4401880.

If you know PHP's memory allocation, you can see that the string of $b is allocated in the chunk at 0x7ff4400000, on its first page, 0x7ff4401000.

The string of $a clearly does not follow this rule. It is not allocated in a chunk but sits at a very different address.
So this string was not allocated by emalloc.
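The chunk and page of $b's string can be derived from the address alone. The sketch below assumes the 2 MB chunk size and 4 KB page size used by the PHP 7 memory manager and uses the str address as printed above; the toy_ names are assumptions.

```c
#include <stdint.h>

#define TOY_CHUNK_SIZE ((uint64_t)2 * 1024 * 1024)  /* 2 MB, chunks are aligned to this */
#define TOY_PAGE_SIZE  ((uint64_t)4 * 1024)          /* 4 KB */

/* Start address of the chunk that contains addr. */
uint64_t toy_chunk_base(uint64_t addr) {
    return addr & ~(TOY_CHUNK_SIZE - 1);
}

/* Index of the page inside its chunk (0 is the first 4 KB of the chunk). */
uint64_t toy_page_index(uint64_t addr) {
    return (addr & (TOY_CHUNK_SIZE - 1)) / TOY_PAGE_SIZE;
}

/* Start address of the page that contains addr. */
uint64_t toy_page_base(uint64_t addr) {
    return addr & ~(TOY_PAGE_SIZE - 1);
}
```

For 0x7ff4401880 this gives chunk base 0x7ff4400000, page index 1 and page base 0x7ff4401000. Page 0 of a chunk holds the chunk's own bookkeeping, so index 1 is the first page available for allocations.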

So, taking the brute-force approach, I tracked (zend_string *)0x11522c0 in GDB to see when the value at that address is written.

PHP version 7.1.0
The first node: php_cli.c, line 1345

sapi_module->startup(sapi_module)

The second node: php_cli.c, line 424

php_module_startup(sapi_module, NULL, 0)

The third node: main.c, line 2123

zend_startup(&zuf, NULL);

The fourth node: zend.c, line 768

zend_interned_strings_init();

We are getting close.
The fifth node: zend_string.c, line 103

zend_intern_known_strings(known_strings, (sizeof(known_strings)

Printing known_strings here shows that file, line, function, class, object, and so on, as well as string, are all initialized here!

The corresponding declarations are in zend_string.h, line 383.

At this point the script's literals have not been initialized yet, so these strings are not the same as the literals.
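The effect can be pictured with a toy table of strings registered at startup: a literal that matches an entry resolves to the pre-existing entry, while any other literal does not. This is only a sketch of the idea. The toy_ names are assumptions, the table contains just the names mentioned above, and the count of 1 mirrors what was observed for $a.

```c
#include <stddef.h>
#include <string.h>

typedef struct {
    unsigned    refcount;
    const char *val;
} toy_known;

/* Registered once at startup, before any script is compiled. */
static toy_known toy_known_strings[] = {
    {1, "file"}, {1, "line"}, {1, "function"},
    {1, "class"}, {1, "object"}, {1, "string"}
};

/* Return the startup entry for s, or NULL if s is not a known string. */
const toy_known *toy_lookup_known(const char *s) {
    size_t n = sizeof(toy_known_strings) / sizeof(toy_known_strings[0]);
    for (size_t i = 0; i < n; i++) {
        if (strcmp(toy_known_strings[i].val, s) == 0) {
            return &toy_known_strings[i];
        }
    }
    return NULL;
}
```

In this model "string" always resolves to the same startup entry, while "double" is not in the table and would be handled like any other literal.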

Summary

Strings in PHP fall into several different cases.

  • 1. A string hard-coded directly in the code lives in the literal table. Its reference count is always 0, and it is not destroyed until the whole script has finished executing.
  • 2. A string computed during the execution phase (a temporary string) is reference counted normally: the count increases by 1 for each reference, and the memory is reclaimed when the count reaches 0.
  • 3. For a string held by reference, the variables are counted on the zend_reference. The string itself is referenced only by the zend_reference, so its reference count is 1.
  • 4. Special strings created during PHP initialization are not destroyed until the whole script has finished executing. Their reference count is always 1.