“Heat dissipates with a calm mind; coolness comes from an empty room”: all about Linux temperature control

Time:2021-8-4

1、 Background

With the rapid development of technology and ever more powerful devices, the components inside each device generate more and more heat. Heat in mobile devices is an important factor in the user experience, and overheating can also make the SoC and other chips unstable and even shorten their lifespan. "How to cool the device" has therefore become an important topic.

Mobile devices are compactly built and every bit of internal space is precious, so conventional desktop cooling such as fans and water cooling is of no use in a phone, and software temperature control has become the key weapon against device heating. If you can't rely on a little fan to stay cool all summer, you have to make full use of the operating system's own initiative: "heat dissipates with a calm mind, coolness comes from an empty room", cutting unnecessary activity to keep its own heat in check. Next, let's look at what Linux does to cool down.

2、 Linux temperature control framework

[Figure 1: Linux thermal framework]

The Linux thermal framework is the temperature control architecture of the Linux system. It manages the heat generated by the various devices while the system runs and keeps device temperatures within a safe and comfortable range.

Viewed by system layer, it can be divided into the following three parts:

  • Userspace: exposed as sysfs file nodes under /sys/class/thermal/. Each thermal_zone device appears as a thermal_zone[n] directory and each cooling_device as a cooling_device[n] directory. User-space software can read these thermal class files to get the current temperature and the trip points of each thermal zone; with sufficient permissions, it can even change the temperature control strategy by writing the state value of a cooling device. The role of each file in these directories is detailed in the next section.

  • Kernel space: the core is thermal_core. A device that reports temperature is abstracted as a thermal_zone_device, a device that controls temperature as a thermal_cooling_device, and a temperature control strategy as a thermal_governor.

  • Hardware: thermal_zone -> tsens0, tsens1, ...: temperature sensors and thermistors on the hardware are abstracted by software as thermal zones; cooling_device -> CPU, GPU, battery, ...: hardware blocks (IP) that can lower the temperature by adjusting their own state are abstracted by software as cooling devices.
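To make the three abstractions concrete, here is a toy user-level model of how they relate. It is a sketch only: the names toy_zone, toy_cdev, toy_governor, all_or_nothing and toy_update are hypothetical and far simpler than the kernel's thermal_zone_device, thermal_cooling_device and thermal_governor. A zone supplies a temperature, a governor picks a cooling state, and the core applies that state to the cooling device.

```c
#include <stdio.h>

/* Hypothetical, simplified stand-in for thermal_zone_device.
 * Temperatures are in millidegrees Celsius. */
struct toy_zone {
    const char *type;   /* e.g. "cpu-0-0-usr" */
    int temp_mc;        /* last sensor reading */
    int trip_mc;        /* a single trip point */
};

/* Hypothetical stand-in for thermal_cooling_device. */
struct toy_cdev {
    const char *type;   /* e.g. "cpu0_isolate" */
    unsigned long cur_state;
    unsigned long max_state;
};

/* Hypothetical stand-in for thermal_governor: chooses a target state. */
struct toy_governor {
    const char *name;
    unsigned long (*target)(const struct toy_zone *tz,
                            const struct toy_cdev *cdev);
};

/* Simplest possible policy: full cooling at/above the trip, none below. */
static unsigned long all_or_nothing(const struct toy_zone *tz,
                                    const struct toy_cdev *cdev)
{
    return tz->temp_mc >= tz->trip_mc ? cdev->max_state : 0;
}

/* One pass of the toy "core": ask the governor, apply the result. */
static void toy_update(const struct toy_zone *tz,
                       const struct toy_governor *gov,
                       struct toy_cdev *cdev)
{
    unsigned long next = gov->target(tz, cdev);

    if (next != cdev->cur_state) {
        printf("%s: %s -> state %lu (%s)\n",
               tz->type, cdev->type, next, gov->name);
        cdev->cur_state = next;
    }
}
```

Keeping the policy behind a function pointer mirrors the idea that the same zone and cooling device can be driven by different governors.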

3、 Thermal zone and cooling device

[Figure 2: temperature sensor layout on the SoC of a mobile device]

Figure 2 shows the temperature sensor layout on the SoC of a mobile device. The chip has 28 sensors monitoring the current temperature of each subsystem. Likewise, the PCB carries multiple thermistors (NTCs), from which the temperature of each area of the phone's motherboard can be calculated.

In software, these temperature-reporting devices, such as tsensors and NTCs, are described as thermal zones, which are defined in code in DTS (device tree source) form.

  • Example code:

[Image: DTS definition of a thermal zone]

  • polling-delay-passive: the polling period while temperature control is active.

It is set to 0 above, meaning temperature control is triggered by the tsensor interrupt instead of by polling.

  • polling-delay: the polling period while temperature control is not active.

  • thermal-governor: the algorithm used when temperature control occurs in this thermal zone.

The example above selects the user_space algorithm, which is described in detail in the next section.

  • thermal-sensors: the corresponding tsensor.

The setting “tsens0 1” above means that channel 1 of tsens0 is used as the temperature sensor.

  • trips: the temperature control trigger points (trip points).

Here “active-config0” is the trip point of this thermal zone.

temperature is the trigger temperature in millidegrees Celsius: the value 125000 above means temperature control starts at 125000 / 1000 = 125 °C.

hysteresis is the hysteresis temperature. It is set to 1000 above, meaning temperature control is released once the zone cools to 125 - 1000 / 1000 = 124 °C.

type is set to “passive”: once temperature control starts, the polling period switches to polling-delay-passive.
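The unit arithmetic above can be captured in a few lines of C. The structs below are a hypothetical mirror of the DTS properties, filled with the values quoted in the text (polling-delay-passive 0, trip temperature 125000, hysteresis 1000, type passive); they are not kernel structures. Temperatures are in millidegrees Celsius and delays in milliseconds.

```c
#include <stdbool.h>

/* Hypothetical mirror of one DTS trip node such as active-config0. */
struct dts_trip {
    int temperature_mc; /* "temperature", millidegrees Celsius */
    int hysteresis_mc;  /* "hysteresis", millidegrees Celsius */
    bool passive;       /* type = "passive" */
};

/* Hypothetical mirror of the thermal-zone properties discussed above. */
struct dts_zone {
    int polling_delay_passive_ms; /* 0: rely on the tsensor interrupt */
    struct dts_trip trip;
};

/* Whole degrees C at which temperature control starts. */
static int trigger_celsius(const struct dts_trip *t)
{
    return t->temperature_mc / 1000;
}

/* Whole degrees C at which temperature control is released:
 * trigger temperature minus hysteresis. */
static int release_celsius(const struct dts_trip *t)
{
    return (t->temperature_mc - t->hysteresis_mc) / 1000;
}

/* The values quoted in the text. */
static const struct dts_zone example_zone = {
    .polling_delay_passive_ms = 0,
    .trip = { .temperature_mc = 125000, .hysteresis_mc = 1000, .passive = true },
};
```

With these values, trigger_celsius() gives 125 and release_celsius() gives 124, matching the calculation in the text.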

That is how a thermal_zone is described at the coding stage. At runtime, each thermal_zone is presented to user space as sysfs files.

[Image: sysfs files of a thermal_zone]

  • available_policies: the temperature control algorithms (governors) that can be selected.

  • type: the name of the thermal zone.

These are the DTS node names, such as “aoss-0-usr” and “cpu-0-0-usr” above.

  • temp: the current temperature of the thermal zone, in millidegrees Celsius.

  • trip_point_0_type / trip_point_0_temp / trip_point_0_hyst: the type / trigger temperature / hysteresis temperature of trip point 0.

These correspond to the type, temperature and hysteresis properties of “active-config0” above.
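The sysfs files listed above can be read by any user-space program. Below is a minimal sketch, assuming the standard /sys/class/thermal layout; the helper names parse_millicelsius, read_line and list_thermal_zones are hypothetical. It prints the type and temperature of every thermal_zone[n] directory and simply skips zones whose files cannot be read.

```c
#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Parse the text of a sysfs "temp" file (millidegrees C, e.g. "45000\n").
 * Returns 0 on success, -1 if the text does not start with a number. */
static int parse_millicelsius(const char *text, long *out)
{
    char *end;
    long v = strtol(text, &end, 10);

    if (end == text)
        return -1;
    *out = v;
    return 0;
}

/* Read the first line of a small sysfs file, without the newline. */
static int read_line(const char *path, char *buf, size_t len)
{
    FILE *f = fopen(path, "r");

    if (!f)
        return -1;
    if (!fgets(buf, (int)len, f)) {
        fclose(f);
        return -1;
    }
    fclose(f);
    buf[strcspn(buf, "\n")] = '\0';
    return 0;
}

/* Print type and temperature of every thermal zone. Returns the number of
 * zones printed (0 if /sys/class/thermal does not exist). */
static int list_thermal_zones(void)
{
    DIR *dir = opendir("/sys/class/thermal");
    struct dirent *de;
    int count = 0;

    if (!dir)
        return 0;
    while ((de = readdir(dir)) != NULL) {
        char path[512], type[128], temp[64];
        long mc;

        if (strncmp(de->d_name, "thermal_zone", 12) != 0)
            continue;
        snprintf(path, sizeof path, "/sys/class/thermal/%s/type", de->d_name);
        if (read_line(path, type, sizeof type) != 0)
            continue;
        snprintf(path, sizeof path, "/sys/class/thermal/%s/temp", de->d_name);
        if (read_line(path, temp, sizeof temp) != 0 ||
            parse_millicelsius(temp, &mc) != 0)
            continue; /* some sensors refuse reads while powered down */
        printf("%-16s %-20s %.3f C\n", de->d_name, type, mc / 1000.0);
        count++;
    }
    closedir(dir);
    return count;
}
```

Dividing by 1000 converts the millidegree value into degrees, the same conversion the DTS trip values need.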

Now that tsensors and NTCs are registered in the system as thermal_zones, the temperature of every subsystem of the device can be monitored in real time. When a temperature exceeds its threshold, an interrupt is raised, and it is then the cooling devices' turn to act. Devices that can reduce heat by controlling their own state are abstracted by the operating system as cooling_devices. Drivers such as cpufreq and devfreq call the interfaces provided by of_thermal.c during initialization to register their cooling devices with thermal_core.

  • Example code:

[Image: DTS cooling-maps definition]


  • cooling-maps:

The list of cooling devices for this thermal zone.

  • cpu00_cdev:

The name of the cooling device entry.

  • trip: the trip point of the thermal zone that this cooling device responds to.

Above, the trip point for “cpu00_cdev” is “cpu00_config”: when the “cpu-0-0-step” thermal zone reaches 110 °C, the cooling operation is triggered.

  • cooling-device: the device that actually performs the cooling, together with its minimum and maximum state, in the format <phandle of device, min_state, max_state>.

It is set to “<&cpu0_isolate 1 1>” above, meaning that cpu0 is isolated when the trip point is reached.
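A cooling-maps entry ties a trip point to a cooling device and limits the states that may be requested to the range <min_state, max_state>. The sketch below models that binding with the values from the example (trip cpu00_config at 110 °C, cooling device cpu0_isolate limited to state 1). The struct and the functions clamp_state and map_target are hypothetical; in the real framework the governor chooses the requested state, and the limits are applied on top of it.

```c
/* Hypothetical model of one cooling-maps entry:
 * cooling-device = <&cdev min_state max_state>, bound to one trip. */
struct cooling_map {
    const char *trip;         /* e.g. "cpu00_config" */
    int trip_temp_mc;         /* trip temperature, millidegrees Celsius */
    const char *cdev;         /* e.g. "cpu0_isolate" */
    unsigned long min_state;
    unsigned long max_state;
};

/* Clamp a governor's requested state into [min_state, max_state]. */
static unsigned long clamp_state(const struct cooling_map *m, unsigned long req)
{
    if (req < m->min_state)
        return m->min_state;
    if (req > m->max_state)
        return m->max_state;
    return req;
}

/* State for this map: 0 (no cooling) below the trip, otherwise the
 * governor's request limited to the configured range. */
static unsigned long map_target(const struct cooling_map *m, int temp_mc,
                                unsigned long governor_request)
{
    if (temp_mc < m->trip_temp_mc)
        return 0;
    return clamp_state(m, governor_request);
}
```

With min_state and max_state both set to 1, any request above the trip collapses to state 1, which for an isolation device simply means "isolate the CPU".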

At runtime, each cooling_device is likewise presented to user space as sysfs files:

[Image: sysfs files of a cooling_device]

  • cur_state: the current cooling state of the cooling_device.

  • max_state: the maximum cooling state of the cooling_device.

  • type: the name of the cooling device.
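As noted in section 2, software with sufficient permissions can change a cooling device's state through these files. A minimal sketch, assuming the standard /sys/class/thermal/cooling_device[n] layout and root privileges; set_cooling_state and state_valid are hypothetical helpers. It reads max_state first so that out-of-range values are rejected before anything is written to cur_state.

```c
#include <stdio.h>

/* Valid cooling states run from 0 to max_state inclusive. */
static int state_valid(unsigned long state, unsigned long max_state)
{
    return state <= max_state;
}

/* Write `state` to cooling_device<idx>/cur_state after checking it against
 * max_state. Returns 0 on success, -1 on any failure (missing device,
 * no permission, out-of-range state, write error). */
static int set_cooling_state(int idx, unsigned long state)
{
    char path[96];
    unsigned long max_state;
    FILE *f;

    snprintf(path, sizeof path,
             "/sys/class/thermal/cooling_device%d/max_state", idx);
    f = fopen(path, "r");
    if (!f)
        return -1;
    if (fscanf(f, "%lu", &max_state) != 1) {
        fclose(f);
        return -1;
    }
    fclose(f);
    if (!state_valid(state, max_state))
        return -1;

    snprintf(path, sizeof path,
             "/sys/class/thermal/cooling_device%d/cur_state", idx);
    f = fopen(path, "w");
    if (!f)
        return -1;
    if (fprintf(f, "%lu\n", state) < 0) {
        fclose(f);
        return -1;
    }
    /* A rejected sysfs write surfaces when the buffer is flushed. */
    return fclose(f) == 0 ? 0 : -1;
}
```

Note that a state written this way may be overwritten on the next pass of the governor bound to the device.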

4、 Thermal governor (temperature control algorithm)

The thermal governor is the temperature control algorithm: it decides which cooling state to select when temperature control occurs.

Currently available governors include:

  • bang_bang

  • step_wise

  • low_limits

  • user_space

  • power_allocator

  • bang_bang governor:

The bang_bang governor is intended for devices that are cooled by a fan.

First, the governor must determine whether to throttle, i.e. whether temperature control is in effect. There are two cases: either the current temperature is above the trip temperature, or it is below the trip temperature but still above the hysteresis temperature (the release temperature, i.e. trip temperature minus hysteresis) and the zone is in the process of cooling down.

[Image: code that determines whether to throttle]

The cooling strategy of the bang_bang governor is as simple as its name suggests and fits in one sentence:

When throttling starts, turn the fan on; when throttling is released, turn the fan off.
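The two throttle cases and the on/off rule combine into a small hysteresis loop. The sketch below is a simplified illustration, not the kernel's bang_bang source; struct fan, throttle and bang_bang_step are hypothetical names, and temperatures are in millidegrees Celsius.

```c
#include <stdbool.h>

/* Simplified fan: either off (no cooling) or on (full cooling). */
struct fan {
    bool on;
};

/* Throttling is in effect if the temperature is at/above the trip, or if we
 * are already cooling and have not yet dropped to trip - hysteresis. */
static bool throttle(int temp_mc, int trip_mc, int hyst_mc, bool cooling)
{
    if (temp_mc >= trip_mc)
        return true;
    return cooling && temp_mc > trip_mc - hyst_mc;
}

/* One polling step: fan on while throttling, off once released. */
static void bang_bang_step(struct fan *f, int temp_mc, int trip_mc, int hyst_mc)
{
    f->on = throttle(temp_mc, trip_mc, hyst_mc, f->on);
}
```

For a 50 °C trip with 5 °C hysteresis, the fan switches on at 50 °C, stays on at 47 °C, and switches off only once the zone reaches 45 °C; the hysteresis band is what stops the fan from flickering on and off around the trip temperature.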

  • step_wise governor:

When computing the target cooling state, the step_wise algorithm considers not only whether to throttle but also a second input: the trend. As the name implies, the trend is the direction in which the temperature is moving. The Linux thermal framework defines three trend types: rising, dropping and stable.

[Image: temperature trend definitions]
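One simple way to derive the trend, assuming only the current and the previous temperature readings are available, is to compare the two. A minimal sketch; the enum names below are hypothetical and are not the kernel's identifiers.

```c
/* Hypothetical trend names mirroring rising / dropping / stable. */
enum trend { TREND_STABLE, TREND_RISING, TREND_DROPPING };

/* Trend from two consecutive readings, in millidegrees Celsius. */
static enum trend get_trend(int last_mc, int cur_mc)
{
    if (cur_mc > last_mc)
        return TREND_RISING;
    if (cur_mc < last_mc)
        return TREND_DROPPING;
    return TREND_STABLE;
}
```

A real sensor driver may compute the trend differently (for example over several samples) to filter out noise.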

The step_wise governor selects the cooling_state as follows:

When throttling is in effect and the trend is rising, move to a higher cooling state;

When throttling is in effect and the trend is dropping, keep the cooling state unchanged;

When throttling is released and the trend is rising, keep the cooling state unchanged;

When throttling is released and the trend is dropping, move to a lower cooling state.

The step_wise governor is a relatively gentle temperature control strategy: it raises the cooling state one step at a time, once per polling cycle.
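The four rules above map directly onto a small decision function. This is a simplified sketch rather than the kernel's step_wise code: step_wise_next and the trend enum are hypothetical names, the state is bounded only by 0 and max_state, and a stable trend is assumed to leave the state unchanged.

```c
#include <stdbool.h>

enum trend { TREND_STABLE, TREND_RISING, TREND_DROPPING };

/* Next cooling state according to the four step_wise rules. */
static unsigned long step_wise_next(unsigned long cur, unsigned long max_state,
                                    bool throttle, enum trend trend)
{
    if (throttle && trend == TREND_RISING)
        return cur < max_state ? cur + 1 : max_state; /* one step up */
    if (!throttle && trend == TREND_DROPPING)
        return cur > 0 ? cur - 1 : 0;                 /* one step down */
    /* Throttling but dropping, released but rising, or stable: hold. */
    return cur;
}
```

Calling this once per polling cycle is what makes the strategy gradual: a zone that keeps heating climbs one state per cycle instead of jumping straight to max_state.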

  • low_limits governor:

Low temperatures also cause problems for mobile devices, such as being unable to charge, and the low_limits governor was created to address them. This special temperature control algorithm is used to warm the device in a low-temperature environment.

Its strategy is essentially the reverse of step_wise, so it is not described further here; interested readers can consult the kernel source.

  • user_space governor:

The user_space governor reports the thermal zone's current temperature, trip points and other information to user space through uevents, and user-space software decides the temperature control strategy.
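On the user-space side, a kobject uevent arrives as a series of NUL-separated "KEY=VALUE" strings (for example when read from a socket opened with AF_NETLINK and NETLINK_KOBJECT_UEVENT). The parser below is a sketch under an explicit assumption: that the governor sends the keys NAME (zone type), TEMP (millidegrees Celsius) and TRIP (trip index). These key names should be checked against drivers/thermal/user_space.c in the target kernel, and the socket setup is omitted. struct thermal_event and parse_thermal_uevent are hypothetical names.

```c
#include <stdlib.h>
#include <string.h>

/* Fields assumed to be carried by the user_space governor's uevent. */
struct thermal_event {
    char name[64]; /* NAME=<thermal zone type> (assumed key) */
    long temp_mc;  /* TEMP=<millidegrees C>    (assumed key) */
    int trip;      /* TRIP=<trip index>, -1 if absent (assumed key) */
};

/* Parse a uevent payload of NUL-separated "KEY=VALUE" strings.
 * Returns 0 if both NAME and TEMP were found, -1 otherwise. */
static int parse_thermal_uevent(const char *buf, size_t len,
                                struct thermal_event *ev)
{
    size_t off = 0;
    int have_name = 0, have_temp = 0;

    ev->name[0] = '\0';
    ev->temp_mc = 0;
    ev->trip = -1;

    while (off < len) {
        const char *s = buf + off;
        const char *nul = memchr(s, '\0', len - off);
        size_t n;

        if (nul == NULL)
            break; /* ignore unterminated trailing bytes */
        n = (size_t)(nul - s);

        if (strncmp(s, "NAME=", 5) == 0) {
            size_t vlen = n - 5;
            if (vlen >= sizeof ev->name)
                vlen = sizeof ev->name - 1; /* truncate long names */
            memcpy(ev->name, s + 5, vlen);
            ev->name[vlen] = '\0';
            have_name = 1;
        } else if (strncmp(s, "TEMP=", 5) == 0) {
            ev->temp_mc = strtol(s + 5, NULL, 10);
            have_temp = 1;
        } else if (strncmp(s, "TRIP=", 5) == 0) {
            ev->trip = (int)strtol(s + 5, NULL, 10);
        }
        off += n + 1; /* step over the NUL separator */
    }
    return (have_name && have_temp) ? 0 : -1;
}
```

Unknown keys, including the "action@devpath" header string that starts a kernel uevent, are skipped, so the parser keeps working if the kernel adds more fields.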

  • power_allocator governor:

The power_allocator governor, i.e. the IPA (Intelligent Power Allocation) algorithm, was contributed by Arm in 2015 and merged into the mainline Linux kernel.

At the core of IPA is a PID controller that takes the thermal zone's temperature as input and produces the power budget available for allocation as output; that budget is then used to adjust the frequency and voltage of the cooling devices. For reasons of space, the detailed strategy is not covered here.
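The PID idea can be illustrated with a small floating-point sketch. It is a simplification under stated assumptions, not the kernel's power_allocator code: the kernel uses fixed-point arithmetic, separate proportional gains above and below the target temperature, and extra rules for redistributing unused power. Here struct pid_ctrl, pid_budget, split_budget, the "control temperature" target and the proportional split are all illustrative. Temperatures are in °C and power in mW.

```c
#include <stddef.h>

/* Hypothetical PID controller state for a power budget. */
struct pid_ctrl {
    double kp, ki, kd;        /* proportional, integral, derivative gains */
    double sustainable_power; /* budget when sitting at the control temp */
    double integral;          /* accumulated error * time */
    double prev_err;          /* error from the previous period */
};

/* Power budget for one control period. err > 0 means the zone is below the
 * control temperature, so more power is allowed. Never returns < 0. */
static double pid_budget(struct pid_ctrl *c, double control_temp,
                         double cur_temp, double period_s)
{
    double err = control_temp - cur_temp;
    double deriv, budget;

    c->integral += err * period_s;
    deriv = (err - c->prev_err) / period_s;
    c->prev_err = err;

    budget = c->sustainable_power + c->kp * err
           + c->ki * c->integral + c->kd * deriv;
    return budget > 0.0 ? budget : 0.0;
}

/* Share the budget among actors in proportion to the power each requests. */
static void split_budget(double budget, const double *request,
                         double *granted, size_t n)
{
    double total = 0.0;
    size_t i;

    for (i = 0; i < n; i++)
        total += request[i];
    for (i = 0; i < n; i++)
        granted[i] = total > 0.0 ? budget * request[i] / total : 0.0;
}
```

For example, with a sustainable power of 2000 mW, kp = 10 mW/°C, no integral or derivative term, and the zone 5 °C below the control temperature, the budget is 2050 mW; two actors requesting 1500 mW and 500 mW are granted 1537.5 mW and 512.5 mW, which each actor would then turn into a frequency/voltage operating point.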

5、 Future development direction of Linux thermal

Controlling heat in mobile devices while striking a good balance between performance and power consumption has long been a focus of the major mobile chip and device makers, and in the open-source community, temperature control algorithms such as IPA keep evolving. Future mobile products can be expected to handle heat better and better.