Lessons from the past: the pitfalls DingTalk hit landing Flutter on the desktop | Dutter

Time: 2022-6-10


Author: liutaiju (Yuliang)

The Dutter article series describes the technical practice and experience behind Dutter, the application framework DingTalk built on Flutter to span four platforms. The series has two parts; the first part is “Scheme design and technical practice of Dutter | DingTalk Flutter across four platforms”. Thank you for reading.

This article introduces several FlutterEngine-level bugs that DingTalk encountered and handled on the desktop during the grayscale rollout of its Flutter business. They include:

  • Mac side:
  1. Memory leak after FlutterEngine exits;
  2. Deadlock in the FlutterEngine shutdown phase;
  3. Crash in the OpenGL destructor phase on older macOS versions;
  • Windows side:
  1. “Crash + ghosting (residual image)” problem in the rendering module on Win7 devices;
  2. Wild pointer crash in the FlutterPlugin registration phase;
  3. Blank (white) page after the visibility of the Flutter window changes.

Let’s go through them one by one.

FlutterEngine problems on the Mac

1.1 Memory leak after FlutterEngine exits

Problem background

After the FlutterViewController on the Mac is destroyed, the memory it created is not actually released, which causes a memory leak. The problem has been discussed in a Flutter issue, but no clear conclusion was reached. DingTalk ran into it as well during the grayscale rollout of its Flutter business on the Mac. Left unhandled, it would directly threaten the feasibility of landing Dutter on the Mac:

[Figure: memory not released after FlutterViewController exits]

Root cause analysis

In one sentence:

The leak comes from an unsound use of a weak property in the macOS FlutterEngine implementation. FlutterViewController strongly holds FlutterEngine, while FlutterEngine holds a weak property pointing back to FlutterViewController. FlutterViewController tries to release FlutterEngine in its dealloc, but by then the weak property inside FlutterEngine already reads as nil, so the release path never runs and the engine leaks.

Below is a brief walk-through of the implementation.

Because both Objective-C and C++ object lifetime management are involved, the ownership relationships around FlutterEngine are somewhat unusual, as shown in the following figure:

[Figure: object ownership relationships around FlutterEngine]

  • As the main class exposed to the outside world, FlutterViewController creates and holds FlutterEngine and FlutterView;
  • FlutterEngine retains itself during initialization and releases itself during shutdown;
  • FlutterEngine creates and holds FlutterRenderer, and FlutterRenderer strongly holds FlutterView;
  • FlutterEngine therefore holds FlutterView strongly, but indirectly;
  • FlutterEngine has a weak reference to FlutterViewController.

Normally, after FlutterViewController exits, FlutterEngine's shutdown is triggered by calling FlutterEngine's setViewController with nil. The reference implementation is as follows:

[Figure: reference implementation, part 1]

[Figure: reference implementation, part 2]

In other words, line 369 should normally be reached after FlutterViewController's dealloc, releasing the FlutterEngine resources. In practice, this does not happen. When execution reaches line 359, the check if (_viewController != controller) evaluates to false. The controller passed in from outside is nil at this point, and _viewController, being a weak property, has already become nil once FlutterViewController entered dealloc. As a result, the shutDownEngine method we expect to run is never called.

Solution

Once the problem is located, the fix is simple: manually trigger FlutterEngine's shutDownEngine method during FlutterViewController's dealloc. This can be done in the upper layer by hooking with Objective-C's dynamic runtime features, or by modifying FlutterEngine directly and recompiling it.

Be careful when making this change, however: the complete FlutterEngine shutdown process must be reproduced faithfully. Otherwise you may run into the second problem: a deadlock.
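The mechanism can be sketched in C++ as an analogy, assuming std::weak_ptr stands in for the Objective-C weak property: just as a weak property reads nil during dealloc, a weak_ptr has already expired by the time its target's destructor runs. Engine, ViewController and their methods are hypothetical models, not Flutter's API, and the apply_workaround flag models the fix of calling shutdown explicitly from the controller's destructor.

```cpp
#include <cassert>
#include <memory>
#include <utility>

class ViewController;

// Hypothetical model of the macOS FlutterEngine (not the real class).
class Engine {
 public:
  // Stores only a weak reference, like the weak property in the article.
  void AttachViewController(const std::shared_ptr<ViewController>& controller) {
    view_controller_ = controller;
  }

  // Models setViewController:nil. The guard only acts if the stored
  // reference differs from the incoming nil.
  void SetViewControllerToNil() {
    if (view_controller_.lock() != nullptr) {
      view_controller_.reset();
      ShutDownEngine();
    }
  }

  void ShutDownEngine() { shut_down_ = true; }
  bool shut_down() const { return shut_down_; }

 private:
  std::weak_ptr<ViewController> view_controller_;
  bool shut_down_ = false;
};

// Hypothetical model of FlutterViewController.
class ViewController {
 public:
  ViewController(std::shared_ptr<Engine> engine, bool apply_workaround)
      : engine_(std::move(engine)), apply_workaround_(apply_workaround) {}

  ~ViewController() {
    // The weak reference has already expired here, so the guard sees
    // "nil == nil" and skips shutdown: this is the leak.
    engine_->SetViewControllerToNil();
    // Workaround: trigger the shutdown explicitly from the destructor.
    if (apply_workaround_ && !engine_->shut_down()) engine_->ShutDownEngine();
  }

 private:
  std::shared_ptr<Engine> engine_;
  bool apply_workaround_;
};

// Returns whether the engine was shut down after its controller went away.
bool EngineShutDownAfterControllerExit(bool apply_workaround) {
  // This extra reference plays the role of the engine's self-retain.
  auto engine = std::make_shared<Engine>();
  {
    auto controller = std::make_shared<ViewController>(engine, apply_workaround);
    engine->AttachViewController(controller);
  }  // the last strong reference to the controller is dropped here
  return engine->shut_down();
}
```

Without the workaround the function returns false (the engine is never shut down); with it, it returns true.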

1.2 Deadlock in the FlutterEngine shutdown phase

Problem background

DingTalk initially used a fairly simple approach to the “FlutterEngine leak” above: in FlutterViewController's dealloc method, manually call the shutDownEngine method provided by FlutterEngine to trigger the release of the related resources.

With this approach, memory does drop after FlutterViewController exits, but during the grayscale rollout we found that the entire page occasionally froze. By analyzing the code paths involved and running stress tests, we reproduced the problem in a debug environment and confirmed that the UI thread and the raster thread had deadlocked. The thread states after the deadlock were roughly as follows.

UI thread state:

[Figure: UI thread stack after the deadlock]

Raster thread state:

[Figure: raster thread stack after the deadlock]

Root cause analysis

In one sentence:

It is caused by DingTalk calling FlutterEngine's shutDownEngine method at the wrong point. Before shutDownEngine, FlutterView's shutdown method must be called first to stop the rendering process; only after rendering has stopped cleanly can the FlutterEngine resource release begin. Otherwise, the deadlock above can occur.

Because this problem was caused by an incorrect call on the DingTalk side, we won't analyze the underlying cause in depth here. Interested readers can follow the clues above.

Solution

Complete the FlutterEngine release process in the upper layer: call FlutterView's shutdown to stop the raster thread before calling FlutterEngine's shutDownEngine.
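The required ordering can be sketched as follows, assuming a model in which the view owns a std::thread standing in for the raster thread. View, Engine and TearDown are hypothetical names; the sketch does not reproduce the deadlock itself, it only shows that the engine is released after the raster thread has been stopped and joined.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <thread>

// Hypothetical model of FlutterView: frames are produced on a raster thread.
class View {
 public:
  View() : raster_([this] { RasterLoop(); }) {}
  ~View() { Shutdown(); }

  // Stop rendering: ask the raster thread to exit, then wait for it.
  void Shutdown() {
    running_ = false;
    if (raster_.joinable()) raster_.join();
  }

  bool raster_running() const { return raster_.joinable(); }

 private:
  void RasterLoop() {
    while (running_) {
      // Pretend to rasterize a frame.
      std::this_thread::sleep_for(std::chrono::milliseconds(1));
    }
  }

  std::atomic<bool> running_{true};
  std::thread raster_;  // declared last so running_ is initialized first
};

// Hypothetical model of FlutterEngine's resource release.
class Engine {
 public:
  void ShutDownEngine(const View& view) {
    released_while_rendering_ = view.raster_running();
    shut_down_ = true;
  }
  bool shut_down() const { return shut_down_; }
  bool released_while_rendering() const { return released_while_rendering_; }

 private:
  bool shut_down_ = false;
  bool released_while_rendering_ = false;
};

// The order described above: stop rendering first, then release the engine.
void TearDown(View& view, Engine& engine) {
  view.Shutdown();
  engine.ShutDownEngine(view);
}
```

Calling TearDown leaves the engine shut down with no raster thread still running at the moment of release.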

1.3 Crash in the OpenGL destructor phase on older macOS versions

Problem background

This problem follows on from the previous two. After solving problems 1 and 2, and with reference to FlutterEngine's own shutdown process, DingTalk does three things once FlutterViewController is destroyed:

  1. Set the FlutterView bound to FlutterRenderer to nil;
  2. Call FlutterView's shutdown method;
  3. Call FlutterEngine's shutDownEngine method.

After this series of changes, testing showed that the memory leak and the deadlock were essentially fixed. However, during the internal grayscale rollout, crashes appeared on older macOS versions, with a stack roughly as follows:

[Figure: crash stack on older macOS versions]

Root cause analysis

In one sentence:

Similar to problem 2, this problem was introduced by DingTalk's own teardown handling. It results from two factors combined. On the one hand, because the FlutterView bound to FlutterOpenGLRenderer is reset, the OpenGL objects created in the embedder layer are released early. On the other hand, the OpenGL implementation on older macOS versions is not robust and does not guard critical paths during destruction, which leads to the exception.

Below is a brief analysis of the related code, to help others avoid similar problems.

1. In FlutterEngine's setViewController method, during the release process, it calls FlutterOpenGLRenderer's setFlutterView method with nil:

[Figure: FlutterEngine setViewController source]

2. When FlutterOpenGLRenderer's setFlutterView method receives nil, it releases its internally maintained NSOpenGLContext object:

[Figure: FlutterOpenGLRenderer setFlutterView source]

3. The underlying FlutterEngine implementation performs a flush when the GrDirectContext object is destroyed. If the OpenGL objects have already been released at that point, older macOS versions (10.11, 10.12) crash:

[Figure: flush performed when GrDirectContext is destroyed]

Solution

Because the problem is triggered by DingTalk's upper-layer code, the fix is relatively simple: we removed the step that resets FlutterView to nil on all Mac devices that render with OpenGL (before macOS 10.14). In the end, the FlutterViewController release phase performs only the following two actions:

  1. Call FlutterView's shutdown method;
  2. Call FlutterEngine's shutDownEngine method.
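The crash chain above can be modeled as a destruction-order problem. In this hypothetical sketch, GLContext stands in for the NSOpenGLContext and GpuContext for GrDirectContext, whose destructor flushes; the alive flag only simulates the OS-level object being torn down, and none of these types belong to Flutter. Release(true) follows the original three-step sequence, Release(false) the final two-step one.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical model of the NSOpenGLContext owned by the renderer.
struct GLContext {
  bool alive = true;
};

// Hypothetical model of GrDirectContext: its destructor flushes pending work.
class GpuContext {
 public:
  GpuContext(std::shared_ptr<GLContext> gl, bool* flushed_on_dead)
      : gl_(std::move(gl)), flushed_on_dead_(flushed_on_dead) {}

  // The flush: records whether it ran against an already released context.
  ~GpuContext() { *flushed_on_dead_ = !gl_->alive; }

 private:
  std::shared_ptr<GLContext> gl_;
  bool* flushed_on_dead_;
};

// Returns true if the simulated flush hit a released GL context,
// i.e. the situation that crashed on macOS 10.11/10.12.
bool Release(bool detach_view_first) {
  bool flushed_on_dead = false;
  auto gl = std::make_shared<GLContext>();
  auto gpu = std::make_unique<GpuContext>(gl, &flushed_on_dead);

  if (detach_view_first) {
    // Step 1: setFlutterView(nil) releases the GL context early.
    gl->alive = false;
  }
  // Step 2 (FlutterView shutdown) does not touch the contexts in this model.
  gpu.reset();        // Step 3: engine shutdown destroys the GPU context and flushes.
  gl->alive = false;  // The GL context goes away last.
  return flushed_on_dead;
}
```

Dropping the early detach is what lets the flush run while the GL context is still valid.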

FlutterEngine problems on Windows

2.1 “Crash + ghosting” problem in the rendering module on Win7 devices

Problem background

The background of this problem is somewhat complicated; broken down, it consists of two sub-problems.

The first is a D3D11-related crash on some Win7 devices (both x86 and x64). The stack is roughly as follows:

[Figure: D3D11 crash stack on Win7]

Pinning down the exact cause of this crash dragged on, and Flutter officially states that its coverage of Win7 devices is incomplete [reference]. We therefore decided to customize FlutterEngine slightly and force Flutter pages to render in “software rendering mode” on old devices such as Win7.
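Choosing the rendering mode can be sketched as a simple version check. This assumes the cutoff is “older than Windows 8” (Windows 7 reports version 6.1, Windows 8 reports 6.2), whereas the article only says “old devices such as Win7”. RenderMode and ChooseRenderMode are hypothetical names, not FlutterEngine options, and how the host obtains the OS version is left out.

```cpp
#include <cassert>

// Hypothetical rendering modes of the customized engine.
enum class RenderMode { kGpu, kSoftware };

// Assumed policy: force software rendering on anything older than Windows 8.
RenderMode ChooseRenderMode(unsigned major, unsigned minor) {
  const bool older_than_win8 = major < 6 || (major == 6 && minor < 2);
  return older_than_win8 ? RenderMode::kSoftware : RenderMode::kGpu;
}
```

For example, version 6.1 (Windows 7) maps to software rendering and 10.0 to GPU rendering.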

We expected this to sidestep the crash, but unfortunately it exposed another FlutterEngine bug: when pages are rendered in software rendering mode, closing the FlutterViewController leaves, with a certain probability, a residual image (ghosting) on the Windows desktop.

Root cause analysis

In one sentence:

The main cause is that during FlutterEngine's internal shutdown, FlutterWindowsEngine's pointer to the FlutterWindowsView object is not updated in time, leaving a wild (dangling) pointer in a multi-threaded scenario. Through that wild pointer, the raster thread keeps outputting frames to a FlutterWindowsView that has already been destroyed, which leads to the exception.

To speed up locating the problem, we added auxiliary logs at key points, which quickly revealed the suspicious spot:

[Figure: auxiliary log output]

The figure above shows the logs output at the key points after the problem occurred. They give us the following key information:

  1. OnBitmapSurfaceUpdated is a member function of FlutterWindowsView. Yet when the last two lines from OnBitmapSurfaceUpdated were output, the FlutterWindowsView destructor had already run (wild pointer);
  2. On the last execution of OnBitmapSurfaceUpdated, the window handle used for rendering was nullptr, meaning the window being rendered to (bound to FlutterWindowsView) had already been released.

Rendering the final frame with a nullptr window handle is what leaves the residual image.

Supplementary note: when a C++ member function is called through a wild pointer, it often does not crash as long as the function does not access the object's members. Formally this is undefined behavior and cannot be relied on; it merely explains why the log lines above could still be printed.

Solution

We modified FlutterEngine's internal implementation: when FlutterWindowsView is destroyed in software rendering mode, FlutterWindowsEngine's pointer to it is set to null. (GPU mode has not been changed yet, because the change produced abnormal output there.)

[Figure: modified FlutterWindowsView destruction code]

This ensures that tasks on the raster thread no longer call back into the rendering interface after FlutterWindowsView has been destroyed:

[Figure: raster-thread rendering callback]
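The essence of the fix can be sketched as follows, assuming the engine reaches the view only through one pointer guarded by a mutex. WindowsEngine, WindowsView, PresentBitmap and DrawFrame are hypothetical stand-ins (DrawFrame plays the role of OnBitmapSurfaceUpdated); the real FlutterWindowsEngine and FlutterWindowsView differ. Holding the lock while drawing also prevents the view's destructor from completing in the middle of a frame.

```cpp
#include <cassert>
#include <mutex>

class WindowsView;

// Hypothetical stand-in for FlutterWindowsEngine: raster-thread tasks reach
// the view only through view_, which is guarded by a mutex.
class WindowsEngine {
 public:
  void SetView(WindowsView* view) {
    std::lock_guard<std::mutex> lock(mutex_);
    view_ = view;
  }

  // Called from a raster-thread task with a finished software frame.
  // Returns false when the view is gone and the frame is dropped.
  bool PresentBitmap();

 private:
  std::mutex mutex_;
  WindowsView* view_ = nullptr;
};

// Hypothetical stand-in for FlutterWindowsView.
class WindowsView {
 public:
  explicit WindowsView(WindowsEngine* engine) : engine_(engine) {
    engine_->SetView(this);
  }

  ~WindowsView() {
    // The fix: clear the engine's pointer before this object is gone, so
    // later raster tasks see nullptr instead of a dangling pointer.
    engine_->SetView(nullptr);
  }

  // Plays the role of OnBitmapSurfaceUpdated in this model.
  void DrawFrame() { ++frames_drawn_; }
  int frames_drawn() const { return frames_drawn_; }

 private:
  WindowsEngine* engine_;
  int frames_drawn_ = 0;
};

bool WindowsEngine::PresentBitmap() {
  std::lock_guard<std::mutex> lock(mutex_);
  if (view_ == nullptr) return false;
  view_->DrawFrame();  // lock held: the view's destructor cannot finish meanwhile
  return true;
}
```

Once the view has been destroyed, PresentBitmap simply drops the frame instead of touching freed memory.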

2.2 Wild pointer crash in the FlutterPlugin registration phase

Problem background

During the first and second grayscale phases of the “+ panel” business on the Windows side of the DingTalk fleet version, many crashes occurred, and the client's overall crash rate reached x%:

[Figure: crash rate during the grayscale phases]

After a quick analysis, the restored crash stack looks roughly as follows:

[Figure: restored crash stack]

Two important pieces of information can be drawn from the stack:

  1. The crash occurs in the FlutterEngine initialization phase, specifically during plugin registration;
  2. The crash is caused by a wild pointer.

Root cause analysis

In one sentence:

The wrapper-layer code that Flutter provides for the Windows platform contains an object, PluginRegistrarManager, designed as a singleton. PluginRegistrarManager serves FlutterPlugin registration. Internally, it uses a map to maintain the mapping between FlutterEngine pointers and registrars, so that each registrar's lifecycle matches its FlutterEngine's. However, the wrapper-layer code is compiled into each plugin DLL, so every plugin DLL contains its own copy of the PluginRegistrarManager implementation; in other words, the singleton mechanism breaks down. As a result, the bindings in PluginRegistrarManager are not all cleared correctly when FlutterEngine is destroyed, leaving invalid pointer addresses behind that crash when accessed again.

Below is a brief outline of the analysis. With a stress test, we could reproduce the problem:

[Figure: crash reproduced by the stress test]

The figure above confirms that the crash is caused by a wild pointer to the FlutterEngine object. Tracing where the engine pointer comes from during plugin registration, we finally located it in flutter::PluginRegistrarManager::GetInstance()->GetRegistrar():

[Figure: source of the engine pointer during plugin registration]

Looking further into PluginRegistrarManager, GetRegistrar maintains the relationship between FlutterEngine addresses and registrars internally with a map plus an emplace call:

[Figure: PluginRegistrarManager::GetRegistrar implementation]

Inside that method, a callback is registered with the underlying engine object through FlutterDesktopPluginRegistrarSetDestructionHandler. The callback is invoked when FlutterEngine is destroyed and removes the binding:

[Figure: destruction handler registration]

The problem is that if PluginRegistrarManager is not a real singleton, while FlutterEngine can keep only one valid OnRegistrarDestroyed callback, then the FlutterEngine addresses saved in some PluginRegistrarManager copies are not cleared when FlutterEngine is destroyed, and they cause a crash when used again.
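The failure mode can be reproduced in a single program by creating two manager instances that stand in for the copies compiled into two plugin DLLs. Engine, Registrar and RegistrarManager are hypothetical simplifications of the wrapper code described above; the one property kept from the article is that the engine remembers only a single destruction callback, so a later registration silently replaces an earlier one.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <memory>
#include <utility>

// Hypothetical engine that keeps only ONE destruction callback.
class Engine {
 public:
  ~Engine() {
    if (on_destroyed_) on_destroyed_(this);
  }
  void SetDestructionHandler(std::function<void(Engine*)> handler) {
    on_destroyed_ = std::move(handler);  // replaces any earlier handler
  }

 private:
  std::function<void(Engine*)> on_destroyed_;
};

struct Registrar {
  Engine* engine;
};

// Hypothetical stand-in for PluginRegistrarManager.
class RegistrarManager {
 public:
  Registrar* GetRegistrar(Engine* engine) {
    auto it = registrars_.find(engine);
    if (it == registrars_.end()) {
      it = registrars_.emplace(engine, std::make_unique<Registrar>(Registrar{engine})).first;
      // Unbind the engine when it is destroyed.
      engine->SetDestructionHandler([this](Engine* e) { registrars_.erase(e); });
    }
    return it->second.get();
  }

  // Number of engine bindings this copy still holds.
  std::size_t tracked() const { return registrars_.size(); }

 private:
  std::map<Engine*, std::unique_ptr<Registrar>> registrars_;
};
```

With two copies, the first copy keeps a stale entry for the destroyed engine; when both registrations go through one shared instance, which is what a true singleton guarantees, the map ends up empty.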

Solution

We modified the PluginRegistrarManager implementation in the FlutterEngine wrapper layer to fix its “singleton” scheme: management of the singleton's lifecycle is moved down into the lower layer, and the wrapper layer is only responsible for providing the related services.

For details, please refer to:

[Figure: modified PluginRegistrarManager implementation, part 1]

[Figure: modified PluginRegistrarManager implementation, part 2]

2.3 Blank (white) page after the visibility of the Flutter window changes

Problem background

On a Flutter page on Windows, if the Flutter window is:

  • first hidden via ShowWindow(flutter_wnd, SW_HIDE);
  • then shown again via ShowWindow(flutter_wnd, SW_SHOWNORMAL),

the content of the Flutter page is not displayed and the canvas stays blank. If a refresh of the Flutter page is then triggered, via setState or by dragging the window, the content renders normally again.

Root cause analysis

This one is fairly clear-cut: Flutter's Windows implementation has a bug. After the window's visibility changes, it should flush again to draw the latest view into the window, but this step is currently not implemented, which causes the problem above.

Solution

The problem has been reported as an issue. In the meantime, DingTalk works around it with compensation in the upper layer: after the visibility of the native window changes, we manually notify the Flutter side to refresh the currently visible page, which triggers a redraw and avoids the problem.
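A minimal sketch of the compensation, assuming the host routes every visibility change through one wrapper. HostWindow and the request_refresh callback are hypothetical; in a real host the callback might, for example, send a platform-channel message that makes the Dart side call setState, and SetVisible would sit next to the ShowWindow calls. Only a hidden-to-visible transition triggers a refresh.

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Hypothetical host-side wrapper around the native window hosting Flutter.
class HostWindow {
 public:
  explicit HostWindow(std::function<void()> request_refresh)
      : request_refresh_(std::move(request_refresh)) {}

  // Call wherever the host changes visibility (e.g. next to ShowWindow).
  void SetVisible(bool visible) {
    const bool became_visible = visible && !visible_;
    visible_ = visible;
    if (became_visible) {
      // Compensation: ask the Flutter side to redraw the visible page.
      request_refresh_();
    }
  }

 private:
  bool visible_ = true;
  std::function<void()> request_refresh_;
};
```

Hiding the window does nothing; showing it again requests exactly one refresh, and repeated "show" calls do not request more.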

Summary

These are the main problems DingTalk had to solve on the desktop while landing Flutter. In our practical experience, although Windows support was officially released as stable in Flutter 2.10, Flutter on the Mac is clearly more stable than on Windows. If other teams want to try Flutter on a single desktop platform, we recommend the Mac: compared with Windows, it is easier to get started with and more stable in practice.

Follow [Alibaba Mobile Technology] for Alibaba's cutting-edge mobile technology and practice.
