[Reprint] An introduction to PHP 8's new JIT

Time: 2021-08-11

Reprinted from Laruence's ("Brother Bird") blog; original post: https://www.laruence.com/2020/06/27/5963.html

PHP 8 alpha1 was released yesterday, and the JIT is probably what people care about most. So how do you use it, what should you watch out for, and how much does performance improve?

First, let’s look at a picture:

The figure on the left shows the opcache flow before PHP 8 (the Zend engine interprets and executes the opcodes every time), while the figure on the right shows opcache in PHP 8 (the Zend engine executes the generated machine code directly). Several key points can be seen:

  • Opcache optimizes at the opcode level; for example, the two opcodes in the figure are merged into one
  • In PHP 8, the JIT is currently provided as part of opcache
  • On top of opcache's optimizations, the JIT optimizes again using runtime information and generates machine code directly
  • The JIT does not replace the existing opcache optimizations; it is an enhancement on top of them
  • At present, PHP 8's JIT only supports x86-family CPUs (I discovered this while compiling)

In fact, the JIT shares many of the basic data structures used by opcache's optimizer, such as the data-flow graph, the call graph, SSA form, and so on. If time allows, I may write a separate article on that part; today we focus only on usage.

After downloading and installing, in addition to the existing opcache configuration, we need to add the following settings to php.ini to enable the JIT:

opcache.jit=1205
opcache.jit_buffer_size=64M

The opcache.jit value looks a little cryptic, so let me explain: it consists of four independent digits, read from left to right (note that this is based on the current alpha1 release; some of it may be fine-tuned in later versions):

  • The first digit: whether to use AVX instructions when generating machine code (requires CPU support)
0: don't use AVX
1: use AVX

  • The second digit: register allocation strategy
0: don't use register allocation
1: local (block-level) allocation
2: global (function-level) allocation

  • The third digit: JIT trigger strategy
0: JIT everything when the PHP script is loaded
1: JIT a function when it is first executed
2: after one run, JIT the hottest functions, i.e. the top (opcache.jit_prof_threshold * 100) percent by call count
3: JIT a function/method after it has been executed more than N times (N is controlled by opcache.jit_hot_func)
4: JIT a function/method when its doc comment contains @jit
5: JIT a trace after it has been executed more than N times (related settings: opcache.jit_hot_loop, opcache.jit_hot_return, etc.)
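For trigger strategies 2, 3, and 5, the thresholds themselves can be tuned through additional ini settings. A minimal sketch (the values shown are illustrative defaults as I understand them for PHP 8.0; check the php.ini-production shipped with your build):

```ini
; Tuning knobs for the JIT trigger strategies (values are illustrative)
opcache.jit_prof_threshold=0.005  ; strategy 2: JIT the top 0.5% most-called functions
opcache.jit_hot_func=127          ; strategy 3: JIT a function after 127 calls
opcache.jit_hot_loop=64           ; strategy 5: JIT a trace after 64 loop iterations
opcache.jit_hot_return=8          ; strategy 5: hot-return threshold
```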

  • The fourth digit: JIT optimization level; the larger the number, the heavier the optimization
0: no JIT
1: JIT only the jumps between oplines
2: inline opcode handler calls
3: function-level JIT based on type inference
4: function-level JIT based on type inference and the call graph
5: script-level JIT based on type inference and the call graph

Based on this, we can draw the following conclusions:

  • For opcache.jit, prefer a value of the form 12X5; that generally gives the best results
  • For the X (trigger strategy): 0 is recommended for command-line scripts; for web services, choose 3 or 5 based on your own testing
  • Once attributes are available, the @jit form may become the <<jit>> attribute
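As an illustration of trigger strategy 4, here is a hypothetical function (my own example, not from the original article) that would be selected for compilation under opcache.jit=1245, because its doc comment carries @jit:

```php
<?php
// Hypothetical example: with opcache.jit=1245 (third digit 4), only
// functions whose doc comment contains @jit are compiled.
/** @jit */
function hot_path(array $values): float {
    $sum = 0.0;
    foreach ($values as $v) {
        $sum += $v * $v;   // sum of squares; purely illustrative work
    }
    return $sum;
}
```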

Now, let's measure the difference on Zend/bench.php with the JIT disabled versus enabled. First, disabled (php -d opcache.jit_buffer_size=0 Zend/bench.php):

Results with JIT disabled:

simple             0.008
simplecall         0.004
simpleucall        0.004
simpleudcall       0.004
mandel             0.035
mandel2            0.055
ackermann(7)       0.020
ary(50000)         0.004
ary2(50000)        0.003
ary3(2000)         0.048
fibo(30)           0.084
hash1(50000)       0.013
hash2(500)         0.010
heapsort(20000)    0.027
matrix(20)         0.026
nestedloop(12)     0.023
sieve(30)          0.013
strcat(200000)     0.006
------------------------
Total              0.387

Following the description above, we choose opcache.jit=1205, since bench.php is a command-line script (php -d opcache.jit_buffer_size=64M -d opcache.jit=1205 Zend/bench.php):

Results with JIT enabled:

simple             0.002
simplecall         0.001
simpleucall        0.001
simpleudcall       0.001
mandel             0.010
mandel2            0.011
ackermann(7)       0.010
ary(50000)         0.003
ary2(50000)        0.002
ary3(2000)         0.018
fibo(30)           0.031
hash1(50000)       0.011
hash2(500)         0.008
heapsort(20000)    0.014
matrix(20)         0.015
nestedloop(12)     0.011
sieve(30)          0.005
strcat(200000)     0.004
------------------------
Total              0.157

As you can see, for Zend/bench.php the elapsed time drops by nearly 60% with the JIT enabled, which is roughly a 2.5x performance improvement.
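When benchmarking yourself, it is worth confirming that the JIT is actually active; on the command line this additionally requires opcache.enable_cli=1. A small check script (a sketch; field names follow the opcache_get_status() structure as of PHP 8):

```php
<?php
// Verify the JIT is active before trusting benchmark numbers.
// Run with: php -d opcache.enable_cli=1 -d opcache.jit=1205 \
//              -d opcache.jit_buffer_size=64M check_jit.php
$status = opcache_get_status();
if (!empty($status['jit']['enabled'])) {
    printf("JIT on, buffer free: %d of %d bytes\n",
        $status['jit']['buffer_free'], $status['jit']['buffer_size']);
} else {
    echo "JIT is not enabled\n";
}
```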

For research and learning purposes, you can use opcache.jit_debug to inspect the assembly that the JIT generates. For example:

function simple() {
  $a = 0;
  for ($i = 0; $i < 1000000; $i++)
    $a++;
}

Running it with php -d opcache.jit=1205 -d opcache.jit_debug=0x01 /tmp/1.php, you can see:

JIT$simple: ; (/tmp/1.php)
     sub $0x10, %rsp
     xor %rdx, %rdx
     jmp .L2
.L1:
     add $0x1, %rdx
.L2:
     cmp $0x0, EG(vm_interrupt)
     jnz .L4
     cmp $0xf4240, %rdx
     jl .L1
     mov 0x10(%r14), %rcx
     test %rcx, %rcx
     jz .L3
     mov $0x1, 0x8(%rcx)
.L3:
     mov 0x30(%r14), %rax
     mov %rax, EG(current_execute_data)
     mov 0x28(%r14), %edi
     test $0x9e0000, %edi
     jnz JIT$$leave_function
     mov %r14, EG(vm_stack_top)
     mov 0x30(%r14), %r14
     cmp $0x0, EG(exception)
     mov (%r14), %r15
     jnz JIT$$leave_throw
     add $0x20, %r15
     add $0x10, %rsp
     jmp (%r15)
.L4:
     mov $0x45543818, %r15
     jmp JIT$$interrupt_handler

Try reading this assembly. Take the increment of $i, for example: you can see the optimization is quite aggressive. Because $i is a local variable, it is allocated directly in a register; and because range inference proves $i will never exceed 1000000, there is no need to check for integer overflow, and so on.
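To see why the range inference matters, consider a variant (my own counter-example, not from the original article) where the loop bound is a runtime parameter. Since the JIT can no longer prove that $i stays below a known constant, one would expect the generated code to keep PHP's integer-overflow handling on the increment:

```php
<?php
// Counter-example sketch: the bound $n is unknown at compile time, so
// range inference cannot bound $i, and the overflow check likely remains.
function simple_unknown(int $n) {
    $a = 0;
    for ($i = 0; $i < $n; $i++) {
        $a++;
    }
}
```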

If we instead use opcache.jit=1005, that is, without register allocation as described above, we get the following:

JIT$simple: ; (/tmp/1.php)
     sub $0x10, %rsp
     mov $0x0, 0x50(%r14)
     mov $0x4, 0x58(%r14)
     jmp .L2
.L1:
     add $0x1, 0x50(%r14)
.L2:
     cmp $0x0, EG(vm_interrupt)
     jnz .L4
     cmp $0xf4240, 0x50(%r14)
     jl .L1
     mov 0x10(%r14), %rcx
     test %rcx, %rcx
     jz .L3
     mov $0x1, 0x8(%rcx)
.L3:
     mov 0x30(%r14), %rax
     mov %rax, EG(current_execute_data)
     mov 0x28(%r14), %edi
     test $0x9e0000, %edi
     jnz JIT$$leave_function
     mov %r14, EG(vm_stack_top)
     mov 0x30(%r14), %r14
     cmp $0x0, EG(exception)
     mov (%r14), %r15
     jnz JIT$$leave_throw
     add $0x20, %r15
     add $0x10, %rsp
     jmp (%r15)
.L4:
     mov $0x44cdb818, %r15
     jmp JIT$$interrupt_handler

You can see that the operations on $i now go through memory and no longer use a register.

If we use opcache.jit=1201, we get the following:

JIT$simple: ; (/tmp/1.php)
     sub $0x10, %rsp
     call ZEND_QM_ASSIGN_NOREF_SPEC_CONST_HANDLER
     add $0x40, %r15
     jmp .L2
.L1:
     call ZEND_PRE_INC_LONG_NO_OVERFLOW_SPEC_CV_RETVAL_UNUSED_HANDLER
     cmp $0x0, EG(exception)
     jnz JIT$$exception_handler
.L2:
     cmp $0x0, EG(vm_interrupt)
     jnz JIT$$interrupt_handler
     call ZEND_IS_SMALLER_LONG_SPEC_TMPVARCV_CONST_JMPNZ_HANDLER
     cmp $0x0, EG(exception)
     jnz JIT$$exception_handler
     cmp $0x452a0858, %r15d
     jnz .L1
     add $0x10, %rsp
     jmp ZEND_RETURN_SPEC_CONST_LABEL

This is simply the result of inlining the opcode handler calls.

You can also experiment with the various opcache.jit strategies to observe the different results, or try other opcache.jit_debug values; 0xff, for example, outputs much more auxiliary information.
