☕ [Java technology guide] (1) Introduction to the structure of class files (Part I)

Time:2021-9-2

Background introduction

After Java source code is compiled into a class file, the class file structure is the basis on which the JVM loads classes, instantiates objects, and makes method calls.


Each class file is a stream of 8-bit bytes. All 16-bit, 32-bit and 64-bit quantities are built from 2, 4 and 8 consecutive 8-bit bytes respectively. Multibyte data items are always stored in big-endian order, with the high-order bytes first. In the Java SDK, data in this format can be accessed through interfaces such as java.io.DataInput and java.io.DataOutput, and classes such as java.io.DataInputStream and java.io.DataOutputStream.


The contents of a class file are described with a dedicated set of data types, u1, u2 and u4, which represent unsigned quantities of 1, 2 and 4 bytes respectively. In the Java SDK, these values can be read with the readUnsignedByte, readUnsignedShort and readInt methods of the java.io.DataInput interface.
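As a minimal sketch of the reading methods mentioned above, the following Java snippet reads one u1, one u2 and one u4 value from a hand-made byte array. The class name UnsignedReadDemo and the sample bytes are assumptions made for illustration only. Note that readInt returns a signed int, so the sketch masks it into a long to obtain the unsigned u4 value.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

final class UnsignedReadDemo {
    // Reads one u1, one u2 and one u4 value, in that order, from big-endian bytes.
    static long[] readU1U2U4(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            int u1 = in.readUnsignedByte();       // 1 byte, range 0..255
            int u2 = in.readUnsignedShort();      // 2 bytes, range 0..65535
            long u4 = in.readInt() & 0xFFFFFFFFL; // readInt() is signed, so mask into a long
            return new long[] {u1, u2, u4};
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical sample: u1 = ff, u2 = 0034, u4 = cafebabe
        byte[] sample = {(byte) 0xFF, 0x00, 0x34, (byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE};
        long[] v = readU1U2U4(sample);
        System.out.println(v[0] + " " + v[1] + " 0x" + Long.toHexString(v[2]));
        // prints: 255 52 0xcafebabe
    }
}
```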


ClassFile structure: each class file corresponds to one ClassFile structure, as shown below, and its items are listed in the following table:

Several questions

  1. What is the difference between a runtime constant pool and a static constant pool?
  2. What’s in the class file?
  3. What does a class file look like after disassembly? Try to interpret the bytecode instructions inside its methods
  4. How do local variable tables and operand stacks work?

View class file

View class files in hexadecimal

Tip:

vim + xxd = hex editor

  • vim -b xxx.class opens the class file in binary mode;
  • Inside vim, :%!xxd displays the current file in hexadecimal;
  • After editing, run :%!xxd -r to convert the hexadecimal back to binary before saving.

javap

  • -v, -verbose: output additional information (including line numbers, local variable tables, disassembly, etc.)
  • -c: disassemble the code

    For example:
  • javap -c xxx.class
  • javap -verbose Test.class

More about javap: docs.oracle.com/javase/7/docs/tech…

About disassembly:

Disassembly: the process of converting object code into assembly code, that is, translating machine language into the more readable assembly language. The disassembled code exposes how the software actually operates.

Class file interpretation

Interpretation of class file in JVM specification


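ClassFile {
    u4             magic;
    u2             minor_version;
    u2             major_version;
    u2             constant_pool_count;
    cp_info        constant_pool[constant_pool_count-1];
    u2             access_flags;
    u2             this_class;
    u2             super_class;
    u2             interfaces_count;
    u2             interfaces[interfaces_count];
    u2             fields_count;
    field_info     fields[fields_count];
    u2             methods_count;
    method_info    methods[methods_count];
    u2             attributes_count;
    attribute_info attributes[attributes_count];
}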

A class file is a binary stream of 8-bit bytes. The ClassFile structure uses two kinds of data types:

  • Unsigned number: the basic data type. u1, u2, u4 and u8 represent unsigned numbers of 1, 2, 4 and 8 bytes respectively. They can describe numbers, index references, quantity values, or UTF-8 encoded string values;
  • Table: a composite data type made up of multiple unsigned numbers or other tables as data items. By convention, table names end with _info.

    The full content of the Log.class file analyzed in this article, in hexadecimal:
    cafe babe 0000 0034 001d 0a00 0600 0f09
    0010 0011 0800 120a 0013 0014 0700 1507
    0016 0100 063c 696e 6974 3e01 0003 2829
    5601 0004 436f 6465 0100 0f4c 696e 654e
    756d 6265 7254 6162 6c65 0100 046d 6169
    6e01 0016 285b 4c6a 6176 612f 6c61 6e67
    2f53 7472 696e 673b 2956 0100 0a53 6f75
    7263 6546 696c 6501 0008 4c6f 672e 6a61
    7661 0c00 0700 0807 0017 0c00 1800 1901
    000c 6865 6c6c 6f20 776f 726c 6421 0700
    1a0c 001b 001c 0100 1263 6f6d 2f68 656c
    6c6f 2f74 6573 742f 4c6f 6701 0010 6a61
    7661 2f6c 616e 672f 4f62 6a65 6374 0100
    106a 6176 612f 6c61 6e67 2f53 7973 7465
    6d01 0003 6f75 7401 0015 4c6a 6176 612f
    696f 2f50 7269 6e74 5374 7265 616d 3b01
    0013 6a61 7661 2f69 6f2f 5072 696e 7453
    7472 6561 6d01 0007 7072 696e 746c 6e01
    0015 284c 6a61 7661 2f6c 616e 672f 5374
    7269 6e67 3b29 5600 2100 0500 0600 0000
    0000 0200 0100 0700 0800 0100 0900 0000
    1d00 0100 0100 0000 052a b700 01b1 0000
    0001 000a 0000 0006 0001 0000 0003 0009
    000b 000c 0001 0009 0000 0025 0002 0001
    0000 0009 b200 0212 03b6 0004 b100 0000
    0100 0a00 0000 0a00 0200 0000 0500 0800
    0600 0100 0d00 0000 0200 0e

    Based on the ClassFile structure above, the class file can be laid out as in the following class file structure diagram:

Magic number

The magic number identifies the format of the file. The magic number of the class file format is 0xCAFEBABE.

Its only function is to determine whether the file is a class file that the virtual machine can accept. The magic value is fixed at 0xCAFEBABE and never changes.
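The check described above can be sketched in a few lines of Java. The class name MagicCheck and the method hasClassMagic are hypothetical names chosen for this example; ByteBuffer is used because it reads big-endian values by default.

```java
import java.nio.ByteBuffer;

final class MagicCheck {
    static final int CLASS_MAGIC = 0xCAFEBABE;

    // Returns true when the first four bytes equal the class file magic number.
    static boolean hasClassMagic(byte[] header) {
        return header.length >= 4 && ByteBuffer.wrap(header).getInt() == CLASS_MAGIC;
    }

    public static void main(String[] args) {
        // First eight bytes of the Log.class dump: cafe babe 0000 0034
        byte[] start = {(byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE, 0x00, 0x00, 0x00, 0x34};
        System.out.println(hasClassMagic(start));                               // prints: true
        System.out.println(hasClassMagic(new byte[] {0x50, 0x4B, 0x03, 0x04})); // prints: false
    }
}
```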

Minor version number minor_version, major version number major_version

  • The 5th – 6th bytes of the class file represent the minor version number of the class file.
  • The 7th – 8th bytes of the class file represent the major version number of the class file.

The major version number and minor version number jointly determine the version of the class file format. If the major version number of a class file is M and the minor version number is m, the version of its class file format is written as M.m.

Class file format versions can therefore be ordered lexicographically, for example 1.5 < 2.0 < 2.1. The values of the minor_version and major_version items are the minor and major version numbers of this class file.


  • A Java virtual machine instance supports only a specific range of major version numbers (Mi to Mj) and a specific range of minor version numbers (0 to m).
  • If the format version of a class file is v, the class file is supported by this Java virtual machine only when Mi.0 ≤ v ≤ Mj.m holds. The supported range differs between Java virtual machine implementations, and an implementation of a higher version can also load class files of lower versions.
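The range condition Mi.0 ≤ v ≤ Mj.m can be sketched as a lexicographic comparison of (major, minor) pairs. The class VersionRange and its methods are hypothetical, and the range 45.0 to 52.0 used in main is only an illustrative choice of bounds, not something stated in this article.

```java
final class VersionRange {
    // Lexicographic comparison of two (major, minor) pairs.
    static int compare(int major1, int minor1, int major2, int minor2) {
        return major1 != major2
                ? Integer.compare(major1, major2)
                : Integer.compare(minor1, minor2);
    }

    // True when lowMajor.0 <= major.minor <= highMajor.highMinor.
    static boolean isSupported(int major, int minor, int lowMajor, int highMajor, int highMinor) {
        return compare(major, minor, lowMajor, 0) >= 0
                && compare(major, minor, highMajor, highMinor) <= 0;
    }

    public static void main(String[] args) {
        // Illustrative bounds: Mi = 45, Mj = 52, m = 0
        System.out.println(isSupported(52, 0, 45, 52, 0)); // prints: true
        System.out.println(isSupported(53, 0, 45, 52, 0)); // prints: false
    }
}
```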

The following table lists the hexadecimal version number information of each version of JDK:



In the class file above, the bytes 0000 0034 give minor version 0 and major version 0x34 = 52, which corresponds to JDK 1.8 in the table.
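A small sketch of decoding these four bytes follows. The class VersionDecode is hypothetical, and the mapping "release = major - 44" is a rule of thumb assumed here for major versions of 49 and above (it gives 8 for major version 52); it is not taken from the article's table.

```java
import java.nio.ByteBuffer;

final class VersionDecode {
    // Returns {minor_version, major_version} from the first 8 bytes of a class file.
    static int[] readVersion(byte[] header) {
        ByteBuffer buf = ByteBuffer.wrap(header); // big-endian by default
        int minor = buf.getShort(4) & 0xFFFF;     // bytes 5-6
        int major = buf.getShort(6) & 0xFFFF;     // bytes 7-8
        return new int[] {minor, major};
    }

    // Assumed rule of thumb, applied only for major >= 49: release = major - 44.
    static int javaRelease(int major) {
        if (major < 49) {
            throw new IllegalArgumentException("rule is only applied for major >= 49");
        }
        return major - 44;
    }

    public static void main(String[] args) {
        byte[] header = {(byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE, 0x00, 0x00, 0x00, 0x34};
        int[] v = readVersion(header);
        System.out.println("minor=" + v[0] + " major=" + v[1] + " Java " + javaRelease(v[1]));
        // prints: minor=0 major=52 Java 8
    }
}
```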

Constant pool counter constant_pool_count

Immediately following the version information is the constant pool. Its first two bytes are the constant pool counter, and the variable-length data after it holds the constant pool entries. The constant_pool table is indexed from 1 to constant_pool_count - 1.

  • The constant pool holds the literals and symbolic references used throughout the class file. The value of constant_pool_count equals the number of entries in the constant_pool table plus one.
  • A constant_pool index is considered valid if it is greater than zero and less than constant_pool_count. The long and double types are exceptions.
  • In the constant pool of the class file, every 8-byte constant occupies two table entries. If a CONSTANT_Long_info or CONSTANT_Double_info structure is the item at index n, the next usable item is at index n + 2. Index n + 1 is still a valid index but must be treated as unusable.

In the class file above, the counter bytes are 001d, whose value is 29, so there are 29 – 1 = 28 constants in total.
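The indexing rules above can be sketched as two small helpers. PoolIndexing, isValidIndex and nextIndex are hypothetical names; the tag values 5 (long) and 6 (double) come from the JVM specification.

```java
final class PoolIndexing {
    static final int TAG_LONG = 5;
    static final int TAG_DOUBLE = 6;

    // An index is valid when 0 < index < constant_pool_count.
    static boolean isValidIndex(int index, int constantPoolCount) {
        return index > 0 && index < constantPoolCount;
    }

    // Long and double entries take two slots, so the next usable index is n + 2.
    static int nextIndex(int index, int tag) {
        return (tag == TAG_LONG || tag == TAG_DOUBLE) ? index + 2 : index + 1;
    }

    public static void main(String[] args) {
        int count = 0x001d;                           // counter bytes from Log.class
        System.out.println(count - 1);                // prints: 28
        System.out.println(isValidIndex(28, count));  // prints: true
        System.out.println(isValidIndex(29, count));  // prints: false
        System.out.println(nextIndex(3, TAG_LONG));   // prints: 5
    }
}
```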

Constant pool table constant_pool[]

Immediately following the constant pool counter are the 28 constants. Each constant has its own type and layout, so they have to be analyzed one by one.


constant_pool[] is a table of structures representing the string constants, class and interface names, field names, and other constants referenced in the ClassFile structure and its substructures. The format of each constant_pool table entry is indicated by its first "tag" byte. All constant pool table entries share the following general format:

cp_info {
    u1 tag;
    u1 info[];
}

In the constant pool, every cp_info item has the same general format: it starts with a single-byte tag item that indicates the kind of cp_info entry. The content of the info[] item that follows is determined by the value of tag. The valid tag types and their values are shown in the figure below. Each tag byte must be followed by 2 or more bytes that give the information for this constant, and the format of these additional bytes depends on the tag value.

14 constant structures in constant pool


Each kind of cp_info entry has its own data structure, and the corresponding structures are shown in the figure below.
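To make the per-tag layouts concrete, here is a hedged sketch that walks a constant pool by reading each tag and skipping its payload. The tag values and payload sizes are the 14 kinds defined for Java 8 in the JVM specification; the class PoolWalker, its methods and the sample pool are hypothetical and hand-built for this example, not taken from Log.class.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

final class PoolWalker {
    // Bytes that follow the tag byte for fixed-size entries (Java 8 tag set).
    static int fixedPayload(int tag) {
        switch (tag) {
            case 7: case 8: case 16:            // Class, String, MethodType
                return 2;
            case 15:                            // MethodHandle
                return 3;
            case 3: case 4:                     // Integer, Float
            case 9: case 10: case 11:           // Fieldref, Methodref, InterfaceMethodref
            case 12: case 18:                   // NameAndType, InvokeDynamic
                return 4;
            case 5: case 6:                     // Long, Double
                return 8;
            default:
                throw new IllegalArgumentException("unknown tag " + tag);
        }
    }

    // Walks constant_pool[1..count-1] and returns one "index:tag" string per entry.
    static List<String> walk(byte[] pool, int constantPoolCount) {
        ByteBuffer buf = ByteBuffer.wrap(pool);
        List<String> out = new ArrayList<>();
        int index = 1;
        while (index < constantPoolCount) {
            int tag = buf.get() & 0xFF;
            // Utf8 (tag 1) carries a u2 length followed by that many bytes.
            int size = (tag == 1) ? (buf.getShort() & 0xFFFF) : fixedPayload(tag);
            buf.position(buf.position() + size);
            out.add(index + ":" + tag);
            index += (tag == 5 || tag == 6) ? 2 : 1; // 8-byte constants use two slots
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] pool = {
            0x0a, 0x00, 0x06, 0x00, 0x0f,        // #1 Methodref
            0x01, 0x00, 0x03, 0x28, 0x29, 0x56,  // #2 Utf8 "()V"
            0x05, 0, 0, 0, 0, 0, 0, 0, 1,        // #3 Long (also occupies #4)
            0x07, 0x00, 0x02                     // #5 Class
        };
        System.out.println(walk(pool, 6)); // prints: [1:10, 2:1, 3:5, 5:7]
    }
}
```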


Next, we analyze the meaning of each byte of the Log.class file above. As stated earlier, the constant pool entries immediately follow the constant pool counter. Let’s start the analysis:

1st constant

The byte after 001d is 0a, which is 10 in decimal. Looking it up in the table, it is a method reference (CONSTANT_Methodref_info). Its structure in cp_info is as follows:

CONSTANT_Methodref_info {
    u1 tag;
    u2 class_index;
    u2 name_and_type_index;
}

The lookup always starts with the tag value, which determines the kind of constant. Here the tag is 10.

Then look at its structure: the tag is followed by two u2 indexes, so the next four bytes also belong to the first constant. The 2nd – 3rd bytes are the class index, and the 4th – 5th bytes are the name-and-type index.

Taking out this part of the data gives: 0a 0006 000f.


For this constant item:

The 2nd – 3rd bytes, 00 06, point to the 6th constant in the constant pool. From the analysis that follows, the 6th constant is java/lang/Object.


The 4th – 5th bytes, 00 0f, point to the 15th constant in the constant pool. According to the javap output, the 15th constant is "<init>":()V.


Combining the two gives java/lang/Object."<init>":()V, that is, the instance initialization method <init> of Object.
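The manual decoding of the first constant can be reproduced with a short sketch. The five bytes 0a 00 06 00 0f are taken from the hex dump of Log.class; the class MethodrefDecode and its method are hypothetical names for this example.

```java
import java.nio.ByteBuffer;

final class MethodrefDecode {
    // Returns {tag, class_index, name_and_type_index} of a CONSTANT_Methodref_info entry.
    static int[] decode(byte[] entry) {
        ByteBuffer buf = ByteBuffer.wrap(entry);       // big-endian by default
        int tag = buf.get() & 0xFF;
        if (tag != 10) {
            throw new IllegalArgumentException("not a Methodref entry, tag=" + tag);
        }
        int classIndex = buf.getShort() & 0xFFFF;       // bytes 2-3
        int nameAndTypeIndex = buf.getShort() & 0xFFFF; // bytes 4-5
        return new int[] {tag, classIndex, nameAndTypeIndex};
    }

    public static void main(String[] args) {
        int[] r = decode(new byte[] {0x0a, 0x00, 0x06, 0x00, 0x0f});
        System.out.println("#" + r[1] + ".#" + r[2]); // prints: #6.#15
    }
}
```

The printed pair matches the first line of the constant pool in the javap output, #1 = Methodref #6.#15.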

javap -v Log.class
Classfile /Users/xxx/Desktop/Log.class
  Last modified 2020-1-8; size 427 bytes
  MD5 checksum 745be5a6df4d9554e783dbbcecaf9b6d
  Compiled from "Log.java"
public class com.hello.test.Log
  minor version: 0
  major version: 52
  flags: ACC_PUBLIC, ACC_SUPER
Constant pool:
   #1 = Methodref          #6.#15         // java/lang/Object."<init>":()V
   #2 = Fieldref           #16.#17        // java/lang/System.out:Ljava/io/PrintStream;
   #3 = String             #18            // hello world!
   #4 = Methodref          #19.#20        // java/io/PrintStream.println:(Ljava/lang/String;)V
   #5 = Class              #21            // com/hello/test/Log
   #6 = Class              #22            // java/lang/Object
   #7 = Utf8               <init>
   #8 = Utf8               ()V
   #9 = Utf8               Code
  #10 = Utf8               LineNumberTable
  #11 = Utf8               main
  #12 = Utf8               ([Ljava/lang/String;)V
  #13 = Utf8               SourceFile
  #14 = Utf8               Log.java
  #15 = NameAndType        #7:#8          // "<init>":()V
  #16 = Class              #23            // java/lang/System
  #17 = NameAndType        #24:#25        // out:Ljava/io/PrintStream;
  #18 = Utf8               hello world!
  #19 = Class              #26            // java/io/PrintStream
  #20 = NameAndType        #27:#28        // println:(Ljava/lang/String;)V
  #21 = Utf8               com/hello/test/Log
  #22 = Utf8               java/lang/Object
  #23 = Utf8               java/lang/System
  #24 = Utf8               out
  #25 = Utf8               Ljava/io/PrintStream;
  #26 = Utf8               java/io/PrintStream
  #27 = Utf8               println
  #28 = Utf8               (Ljava/lang/String;)V
{
  public com.hello.test.Log();
    descriptor: ()V
    flags: ACC_PUBLIC
    Code:
      stack=1, locals=1, args_size=1
         0: aload_0
         1: invokespecial #1                  // Method java/lang/Object."<init>":()V
         4: return
      LineNumberTable:
        line 3: 0

  public static void main(java.lang.String[]);
    descriptor: ([Ljava/lang/String;)V
    flags: ACC_PUBLIC, ACC_STATIC
    Code:
      stack=2, locals=1, args_size=1
         0: getstatic     #2                  // Field java/lang/System.out:Ljava/io/PrintStream;
         3: ldc           #3                  // String hello world!
         5: invokevirtual #4                  // Method java/io/PrintStream.println:(Ljava/lang/String;)V
         8: return
      LineNumberTable:
        line 5: 0
        line 6: 8
}
SourceFile: "Log.java"

The javap output confirms that the first constant refers to the 6th and 15th constants. The meaning of the combination is also given in the comment after // on the same line.

Many other constants are decoded in a similar way. Let’s see how strings are decoded.

21st constant

The data of the 21st constant is:

0100 1263 6f6d 2f68 656c 6c6f 2f74 6573 742f 4c6f 67

Here the tag value is 01 (CONSTANT_Utf8_info), and the corresponding structure is as follows:

CONSTANT_Utf8_info {
    u1 tag;
    u2 length;
    u1 bytes[length];
}

length is a u2, here 0012, which means 18 bytes follow: 63 6f6d 2f68 656c 6c6f 2f74 6573 742f 4c6f 67. Looking them up in the ASCII table gives 63-c, 6f-o, 6d-m, 2f-/ ··· 4c-L, 6f-o, 67-g.

Put together: com/hello/test/Log.
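The same decoding can be sketched with DataInputStream.readUTF, which reads a two-byte unsigned length followed by that many bytes of modified UTF-8, the same layout as the length and bytes items of this entry. The class Utf8EntryDecode is a hypothetical name; the 21 bytes are the ones listed above.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

final class Utf8EntryDecode {
    // Decodes a CONSTANT_Utf8_info entry: u1 tag (1), u2 length, then the string bytes.
    static String decode(byte[] entry) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(entry))) {
            int tag = in.readUnsignedByte();
            if (tag != 1) {
                throw new IllegalArgumentException("not a Utf8 entry, tag=" + tag);
            }
            return in.readUTF(); // consumes the u2 length and the following bytes
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] entry = {
            0x01, 0x00, 0x12,                   // tag = 1, length = 18
            0x63, 0x6f, 0x6d, 0x2f,             // com/
            0x68, 0x65, 0x6c, 0x6c, 0x6f, 0x2f, // hello/
            0x74, 0x65, 0x73, 0x74, 0x2f,       // test/
            0x4c, 0x6f, 0x67                    // Log
        };
        System.out.println(decode(entry)); // prints: com/hello/test/Log
    }
}
```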

After these two examples, you should know how to analyze the entries in the constant pool. In practice, the javap command provided by the JDK shows the constant pool of a class file directly, but manual analysis helps you understand why the result looks the way it does. The javap output is simply this analysis done for you.

access_flags access flags


After the constant pool ends, the next two bytes are the access flags (access_flags) of the class or interface. The data here is 00 21.


access_flags is a mask of flags that indicate the access permissions and basic properties of a class or interface. The possible values of access_flags and their meanings are shown in the table below:



  • The first column is the flag name;
  • The second column is the corresponding value;
  • The third column is the corresponding description.

A class with the ACC_SYNTHETIC flag is generated by the compiler itself rather than from source code written by the programmer.

A class with the ACC_ENUM flag is declared, or has a parent class that is declared, as an enumeration type.

A class file with the ACC_INTERFACE flag defines an interface rather than a class. Without the flag, it defines a class rather than an interface.


If the ACC_INTERFACE flag is set for a class file, the ACC_ABSTRACT flag must also be set. In that case the ACC_FINAL, ACC_SUPER and ACC_ENUM flags must not be set.


An annotation type must have the ACC_ANNOTATION flag set, and if ACC_ANNOTATION is set, ACC_INTERFACE must be set as well. If ACC_INTERFACE is not set, the class file may have any of the other flags in Table 4.1 except ACC_ANNOTATION, with the restriction that mutually exclusive flags such as ACC_FINAL and ACC_ABSTRACT cannot be set together.

The ACC_SUPER flag determines which execution semantics the invokespecial instruction uses in this class file. Current compilers targeting the Java virtual machine should set this flag. ACC_SUPER exists for backward compatibility with class files produced by old compilers: in class files generated by compilers before JDK 1.0.2, the ACC_SUPER flag is not present in access_flags, and Java virtual machines before JDK 1.0.2 ignore the flag if they encounter it.


The bits of access_flags that are not assigned in the table are reserved for future use. Compilers should set these reserved bits to 0, and Java virtual machine implementations should ignore them.
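As a sketch of how the mask is interpreted, the following snippet decodes the 00 21 value from Log.class into flag names. The flag values are the class-level access flags defined in the JVM specification; the class AccessFlags and its decode method are hypothetical names for this example.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class AccessFlags {
    // Class-level access flags, kept in ascending bit order.
    private static final Map<Integer, String> CLASS_FLAGS = new LinkedHashMap<>();
    static {
        CLASS_FLAGS.put(0x0001, "ACC_PUBLIC");
        CLASS_FLAGS.put(0x0010, "ACC_FINAL");
        CLASS_FLAGS.put(0x0020, "ACC_SUPER");
        CLASS_FLAGS.put(0x0200, "ACC_INTERFACE");
        CLASS_FLAGS.put(0x0400, "ACC_ABSTRACT");
        CLASS_FLAGS.put(0x1000, "ACC_SYNTHETIC");
        CLASS_FLAGS.put(0x2000, "ACC_ANNOTATION");
        CLASS_FLAGS.put(0x4000, "ACC_ENUM");
    }

    // Returns the names of all flags whose bit is set in accessFlags.
    static List<String> decode(int accessFlags) {
        List<String> names = new ArrayList<>();
        for (Map.Entry<Integer, String> e : CLASS_FLAGS.entrySet()) {
            if ((accessFlags & e.getKey()) != 0) {
                names.add(e.getValue());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(decode(0x0021)); // prints: [ACC_PUBLIC, ACC_SUPER]
    }
}
```

The result agrees with the flags line of the javap output for Log.class: ACC_PUBLIC, ACC_SUPER.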

Class index, parent class index, interface index

After the access flags come the class index, the parent class index and the interface count. Here the data are 00 05, 00 06 and 00 00.

The class index and the parent class index are each a single u2 value, while the interface index set is an array of u2 values preceded by its u2 count, as can be seen from the ClassFile layout. Together, these items in the class file determine the inheritance relationship of the class.

this_class class index

The value of the class index this_class must be a valid index into the constant_pool table.

The constant_pool entry at that index must be a CONSTANT_Class_info constant representing the class or interface defined by this class file. The class index here is 00 05, which points to the fifth constant in the constant pool. From the earlier analysis, the fifth constant resolves to the Log class.

super_class parent class index

The value of super_class must be either 0 or a valid index into the constant_pool table.


If its value is not 0, the constant_pool entry at that index must be a CONSTANT_Class_info constant representing the direct parent class of the class defined in this class file.


Neither the direct parent class nor any of its indirect parent classes may have the ACC_FINAL flag set in access_flags. For an interface, the value of super_class in its class file must always be a valid index into the constant_pool table.


For an interface, the constant_pool entry at that index must be a CONSTANT_Class_info constant representing java.lang.Object.


If super_class is 0, the class file can only define the java.lang.Object class, which is the only class without a parent class. The parent class index here is 00 06, which points to the sixth constant in the constant pool. From the earlier analysis, the sixth constant resolves to the Object class.


Because the Log class does not explicitly extend any class, its parent class defaults to Object.

interfaces_count interface counter

The value of interfaces_count is the number of direct parent interfaces of the current class or interface.

interfaces[] interface table

Each member of the interfaces[] array must be a valid index into the constant_pool table, and the length of the array is interfaces_count. The constant_pool entry referenced by each member interfaces[i] must be a CONSTANT_Class_info constant, where 0 ≤ i < interfaces_count.

In the interfaces[] array, the interfaces appear in the same order as in the source code (from left to right), that is, interfaces[0] corresponds to the leftmost interface in the source code.

In the bytecode of the Log class, no interface is implemented, so the two bytes immediately following the parent class index are 0x0000, and the interface index table that follows is empty.
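The items that follow the constant pool can be read in sequence, as in this minimal sketch. The eight bytes 00 21 00 05 00 06 00 00 are taken from the Log.class dump; the class ClassIndexes and its read method are hypothetical names for this example.

```java
import java.nio.ByteBuffer;

final class ClassIndexes {
    // Returns {access_flags, this_class, super_class, interfaces_count, interfaces[0], ...}.
    static int[] read(byte[] bytes) {
        ByteBuffer buf = ByteBuffer.wrap(bytes);  // big-endian by default
        int accessFlags = buf.getShort() & 0xFFFF;
        int thisClass = buf.getShort() & 0xFFFF;
        int superClass = buf.getShort() & 0xFFFF;
        int count = buf.getShort() & 0xFFFF;
        int[] out = new int[4 + count];
        out[0] = accessFlags;
        out[1] = thisClass;
        out[2] = superClass;
        out[3] = count;
        for (int i = 0; i < count; i++) {
            out[4 + i] = buf.getShort() & 0xFFFF; // interfaces[i], a constant pool index
        }
        return out;
    }

    public static void main(String[] args) {
        int[] r = read(new byte[] {0x00, 0x21, 0x00, 0x05, 0x00, 0x06, 0x00, 0x00});
        System.out.println("this_class=#" + r[1] + " super_class=#" + r[2]
                + " interfaces_count=" + r[3]);
        // prints: this_class=#5 super_class=#6 interfaces_count=0
    }
}
```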


To be continued

This work is licensed under a CC license. Reprints must credit the author and link to this article.
