Case analysis of a class conflict with multiple Java class loaders

Time: 2021-4-26

As we all know, JVM class loading follows the parent delegation model. However, to provide some form of “isolation and sandboxing”, some frameworks define a custom class loader, usually called ChildFirst, which in short breaks parent delegation: the custom child class loader tries to load a class itself first instead of delegating to its parent class loader. Because the same class can be loaded separately by different class loaders, a ChildFirst class loader can form a “sandbox” in which two different versions of the same class run in one program at the same time.
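The ChildFirst idea can be sketched as a URLClassLoader subclass that overrides loadClass. This is a minimal illustration, not Flink's ChildFirstClassLoader: the class name ChildFirstLoader is hypothetical, and the only classes it always sends to the parent are java.* classes (which a custom loader is not allowed to define anyway).

```java
import java.net.URL;
import java.net.URLClassLoader;

/**
 * Hypothetical child-first class loader (a sketch, not Flink's ChildFirstClassLoader).
 * It searches its own URLs before asking the parent, inverting the usual
 * parent delegation for everything except java.* classes.
 */
public class ChildFirstLoader extends URLClassLoader {

    public ChildFirstLoader(URL[] urls, ClassLoader parent) {
        super(urls, parent);
    }

    @Override
    protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
        synchronized (getClassLoadingLock(name)) {
            // Reuse a class this loader has already defined.
            Class<?> c = findLoadedClass(name);
            if (c == null) {
                if (name.startsWith("java.")) {
                    // Core JDK classes must always come from the parent chain.
                    c = super.loadClass(name, false);
                } else {
                    try {
                        // Child first: look in our own URLs (e.g. the job jar) before the parent.
                        c = findClass(name);
                    } catch (ClassNotFoundException notInChild) {
                        // Only if the child lacks the class, fall back to normal parent delegation.
                        c = super.loadClass(name, false);
                    }
                }
            }
            if (resolve) {
                resolveClass(c);
            }
            return c;
        }
    }

    public static void main(String[] args) throws Exception {
        // Give the child the same location this class was loaded from.
        URL self = ChildFirstLoader.class.getProtectionDomain().getCodeSource().getLocation();
        try (ChildFirstLoader loader =
                new ChildFirstLoader(new URL[] {self}, ChildFirstLoader.class.getClassLoader())) {
            Class<?> copy = loader.loadClass(ChildFirstLoader.class.getName());
            System.out.println(copy == ChildFirstLoader.class);  // false: the child defined its own copy
            System.out.println(copy.getClassLoader() == loader); // true
            System.out.println(loader.loadClass("java.lang.String") == String.class); // true
        }
    }
}
```

Because the child finds ChildFirstLoader in its own URLs, it defines a second copy instead of reusing the parent's, which is exactly the "sandbox" property described above. A real framework loader usually keeps a longer list of parent-first prefixes than this sketch does.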

However, the author ran into a rare class loading conflict whose root cause is related to the ChildFirst mechanism.

Background

The program runs on Flink and writes data to Elasticsearch (ES). After a certain platform turned on its security mechanism, every component of the platform, ES included, has to be accessed with Kerberos authentication. Based on earlier findings, we needed to replace elasticsearch-rest-client with a heavily modified version that logs in to Kerberos via GSSAPI, sends the token over HTTP based on the SPNEGO protocol, and refreshes the token in a separate thread. Because it is a private jar, it is awkward to use in the project, so we decided to use the Maven Shade plugin to exclude the previously depended-on elasticsearch-rest-client and elasticsearch-rest-high-level-client when packaging the job, and to put the customized jar in Flink's lib directory.

Error report

After the job is submitted, the TaskManager reports the following error and exits:

java.lang.LinkageError: loader constraint violation: when resolving method "org.elasticsearch.client.RestClient.builder([Lorg/apache/http/HttpHost;)Lorg/elasticsearch/client/RestClientBuilder;" the class loader (instance of org/apache/flink/util/ChildFirstClassLoader) of the current class, org/apache/flink/streaming/connectors/elasticsearch6/Elasticsearch6ApiCallBridge, and the class loader (instance of sun/misc/Launcher$AppClassLoader) for the method's defining class, org/elasticsearch/client/RestClient, have different Class objects for the type [Lorg/apache/http/HttpHost; used in the signature

The error message means the following:

org/apache/http/HttpHost has been loaded by both ChildFirstClassLoader and AppClassLoader. When HttpHost instances are passed into org.elasticsearch.client.RestClient.builder, the JVM finds that the HttpHost type in the method signature (the formal parameter, an HttpHost[] array) belongs to AppClassLoader, while the HttpHost type used by the caller for the argument belongs to ChildFirstClassLoader. The two are different classes, which causes the conflict.
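The underlying fact is that a class's runtime identity is its name plus its defining loader. The sketch below shows this with a plain URLClassLoader whose parent is null (bootstrap only), so it cannot delegate to the application loader and must define its own copy. TwoLoadersDemo is a hypothetical stand-in for HttpHost; the sketch assumes it is run from a directory or jar on the classpath, and it does not reproduce the LinkageError itself, only the two distinct Class objects behind it.

```java
import java.net.URL;
import java.net.URLClassLoader;

public class TwoLoadersDemo {

    /** Loads an independent copy of TwoLoadersDemo and compares it with the original. */
    static String compare() throws Exception {
        URL location = TwoLoadersDemo.class.getProtectionDomain().getCodeSource().getLocation();
        // Parent = null (bootstrap only): this loader cannot reach the application loader,
        // so it defines its own copy from `location`, much as a child-first loader would.
        try (URLClassLoader isolated = new URLClassLoader(new URL[] {location}, null)) {
            Class<?> copy = isolated.loadClass(TwoLoadersDemo.class.getName());
            Object instance = copy.getDeclaredConstructor().newInstance();
            return "same name: " + copy.getName().equals(TwoLoadersDemo.class.getName())
                    + ", same Class: " + (copy == TwoLoadersDemo.class)
                    + ", instanceof: " + (instance instanceof TwoLoadersDemo);
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(compare());
        // prints: same name: true, same Class: false, instanceof: false
    }
}
```

An instance of the copy is not an instance of the original, so casting it to TwoLoadersDemo would throw ClassCastException. The JVM's loader constraint check turns the same mismatch into a LinkageError at link time instead.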

The following two articles are useful references:

https://www.cnblogs.com/deepnighttwo/archive/2011/08/31/2160990.html

https://bigzuo.github.io/2017/03/19/java-LinkageError-loader-constraint-violation-error/

Cause analysis

The analysis rests on the following facts:

  1. The HttpHost class is packaged into the job jar.
  2. When the Flink TaskManager (TM) process starts, httpcore.x.x.jar from Hadoop is added to the classpath.
  3. AppClassLoader is responsible for loading the classes on the classpath.
  4. ChildFirstClassLoader is responsible for loading the classes in the job jar.
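Facts 1 and 2 can be checked from inside a running job. The hypothetical helper below reports which loader defined a class and where it was loaded from, and uses ClassLoader.getResources to list every location visible to a loader that contains a given class file; a duplicated org/apache/http/HttpHost.class would show up as more than one URL. Class and method names are illustrative only.

```java
import java.io.IOException;
import java.net.URL;
import java.security.CodeSource;
import java.util.Collections;
import java.util.List;

public class WhereLoaded {

    /** Describes which loader defined a class and which jar or directory it came from. */
    static String describe(Class<?> c) {
        ClassLoader loader = c.getClassLoader(); // null means the bootstrap loader
        CodeSource source = c.getProtectionDomain().getCodeSource(); // null for JDK classes
        String who = (loader == null) ? "bootstrap" : loader.getClass().getName();
        String where = (source == null || source.getLocation() == null)
                ? "<jdk>" : source.getLocation().toString();
        return c.getName() + " loaded by " + who + " from " + where;
    }

    /** Lists every location visible to `loader` (parents included) that holds the class file. */
    static List<URL> allCopies(ClassLoader loader, String className) throws IOException {
        String resource = className.replace('.', '/') + ".class";
        return Collections.list(loader.getResources(resource));
    }

    public static void main(String[] args) throws IOException {
        System.out.println(describe(String.class));
        System.out.println(describe(WhereLoaded.class));
        // Run from inside the job, this would be expected to print both the job jar
        // and the Hadoop httpcore jar if facts 1 and 2 hold (assumption, not verified here).
        for (URL url : allCopies(WhereLoaded.class.getClassLoader(), "org.apache.http.HttpHost")) {
            System.out.println(url);
        }
    }
}
```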

Incident code

[Screenshot of the incident code not preserved]

Analysis:

The argument httpHosts is a List<HttpHost> that is serialized to the TM and is a private field of Elasticsearch6ApiCallBridge (which lives in the job jar). This means the classes of httpHosts are loaded by ChildFirstClassLoader first, and by fact 1, ChildFirstClassLoader can find HttpHost in the job jar.

The RestClient class lives in elasticsearch-rest-client, that is, in Flink's lib directory. Since elasticsearch-rest-client is not packaged into the job jar, ChildFirstClassLoader cannot find this class, so it can only be loaded by AppClassLoader. Moreover, by fact 2, AppClassLoader can also find its own HttpHost in the Hadoop httpcore jar.

As a result, two different HttpHost classes meet at the RestClient.builder call, and the error above appears.
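The loader constraint behind this error (Java Virtual Machine Specification §5.3.4) requires that, for each class named in a method's signature, the caller's loader and the method's defining loader resolve that name to the same class. A rough diagnostic version of that check is sketched below; Probe and builder are hypothetical stand-ins for HttpHost and RestClient.builder(HttpHost...), and this is an illustration rather than how the JVM implements the check. In the incident, passing RestClient.builder and the ChildFirstClassLoader would be expected to report org.apache.http.HttpHost as having different Class objects.

```java
import java.lang.reflect.Method;
import java.net.URL;
import java.net.URLClassLoader;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class SignatureCheck {

    /** Hypothetical stand-in for org.apache.http.HttpHost. */
    public static class Probe {}

    /** Hypothetical stand-in for RestClient.builder(HttpHost...). */
    public static Probe builder(Probe... hosts) {
        return hosts.length > 0 ? hosts[0] : null;
    }

    /** Reports signature types that the caller's loader does not resolve to the same Class. */
    static List<String> mismatches(Method method, ClassLoader callerLoader) {
        Set<Class<?>> types = new LinkedHashSet<>();
        for (Class<?> t : method.getParameterTypes()) {
            types.add(elementType(t));
        }
        types.add(elementType(method.getReturnType()));

        List<String> problems = new ArrayList<>();
        for (Class<?> expected : types) {
            if (expected.isPrimitive()) {
                continue; // int, void, ... are shared by all loaders
            }
            try {
                Class<?> seenByCaller = Class.forName(expected.getName(), false, callerLoader);
                if (seenByCaller != expected) {
                    problems.add(expected.getName() + ": different Class objects");
                }
            } catch (ClassNotFoundException e) {
                problems.add(expected.getName() + ": not visible to caller");
            }
        }
        return problems;
    }

    /** Strips array dimensions, e.g. HttpHost[] becomes HttpHost. */
    private static Class<?> elementType(Class<?> t) {
        while (t.isArray()) {
            t = t.getComponentType();
        }
        return t;
    }

    public static void main(String[] args) throws Exception {
        Method builder = SignatureCheck.class.getMethod("builder", Probe[].class);
        // Same loader on both sides: no problems.
        System.out.println(mismatches(builder, SignatureCheck.class.getClassLoader())); // []
        // A loader that only sees JDK classes cannot resolve Probe at all.
        try (URLClassLoader jdkOnly = new URLClassLoader(new URL[0], null)) {
            System.out.println(mismatches(builder, jdkOnly)); // [SignatureCheck$Probe: not visible to caller]
        }
    }
}
```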

Solution

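At first, we switched Flink to parent-first class loading (classloader.resolve-order: parent-first), which solved the problem. The analysis shows why: ParentFirstClassLoader does not load HttpHost from the job jar first; it delegates to AppClassLoader, which loads HttpHost from the classpath, so only one HttpHost class is involved and there is no conflict.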
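The effect of parent-first resolution can be seen with the JDK's own URLClassLoader, which follows standard parent delegation. In the hypothetical sketch below, both the loader's own classpath (standing in for the job jar) and its parent can see ParentFirstDemo. With the application loader as parent, the parent's copy wins and only one Class object exists; with a bootstrap-only parent, the loader has to define a second copy. It assumes the class is run from a directory or jar on the classpath.

```java
import java.net.URL;
import java.net.URLClassLoader;

public class ParentFirstDemo {

    /** Loads ParentFirstDemo by name through a loader whose own classpath also contains it. */
    static boolean sameAsAppCopy(ClassLoader parent) throws Exception {
        // Stand-in for the job jar: the location ParentFirstDemo itself was loaded from.
        URL jobJar = ParentFirstDemo.class.getProtectionDomain().getCodeSource().getLocation();
        try (URLClassLoader loader = new URLClassLoader(new URL[] {jobJar}, parent)) {
            return loader.loadClass(ParentFirstDemo.class.getName()) == ParentFirstDemo.class;
        }
    }

    public static void main(String[] args) throws Exception {
        // Parent = application loader: delegation returns the parent's copy, one Class object.
        System.out.println(sameAsAppCopy(ParentFirstDemo.class.getClassLoader())); // true
        // Parent = bootstrap only: the loader falls back to its own URLs and defines a copy.
        System.out.println(sameAsAppCopy(null)); // false
    }
}
```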

Following the root cause analysis above, another solution is to package elasticsearch-rest-client and the other related jars into the job jar, so that they are all loaded by ChildFirstClassLoader. However, bundling a private jar this way makes version management messy.

Flink also supports classloader.parent-first-patterns.additional, which keeps child-first as the default while making the listed classes load parent-first. However, because class loading relationships are complex, the list can hardly be made exhaustive.
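According to Flink's configuration documentation, this option takes a semicolon-separated list of patterns, each a simple prefix checked against the fully qualified class name. The hypothetical ParentFirstPatterns class below models only that prefix matching (it is not Flink's code) and illustrates why the list is hard to make exhaustive: any class not covered by some prefix still resolves child-first, even if it is used by a parent-first class.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of prefix-based parent-first patterns; not Flink's implementation. */
public class ParentFirstPatterns {

    private final List<String> prefixes = new ArrayList<>();

    ParentFirstPatterns(String semicolonSeparated) {
        for (String p : semicolonSeparated.split(";")) {
            String trimmed = p.trim();
            if (!trimmed.isEmpty()) {
                prefixes.add(trimmed);
            }
        }
    }

    /** True if the class name starts with any configured prefix. */
    boolean isParentFirst(String className) {
        for (String prefix : prefixes) {
            if (className.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Hypothetical value one might try for this incident.
        ParentFirstPatterns patterns = new ParentFirstPatterns("org.apache.http.;org.elasticsearch.client.");
        System.out.println(patterns.isParentFirst("org.apache.http.HttpHost"));             // true
        System.out.println(patterns.isParentFirst("org.elasticsearch.client.RestClient"));  // true
        // A class from a package nobody listed (illustrative) still resolves child-first.
        System.out.println(patterns.isParentFirst("org.apache.commons.codec.binary.Base64")); // false
    }
}
```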