What is the maximum length of a String literal in Java?

Time:2021-4-6

Internally, a String object in Java is stored in a character array, so in theory its maximum length is that of a char[], which is the maximum value of int.

Thinking:

First, String literals are maintained by the String class and are determined at compile time (see the string constant pool for details). So if a String literal has a maximum length (let us assume so for the moment) and a literal in our code exceeds it, the compiler will report an error during compilation. We can therefore use an I/O stream to generate a Java source file that declares a String and assigns it a literal, compile that file dynamically, adjust the length of the literal according to the compilation result, and in the end obtain the maximum length of a literal.

The conclusion can be drawn from the following code (taken from the book “Java in-depth analysis: 36 topics on the essence of Java”):

import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.IOException;
import java.io.OutputStream;

import javax.tools.JavaCompiler;
import javax.tools.ToolProvider;

public class LiteralLength {

    public static void main(String[] args) throws Exception {
        String fileName = "D:/Literal.java";
        StringBuilder prefix = new StringBuilder();
        prefix.append("public class Literal{ String s = \"");
        int low = 0;
        int high = 100_0000;
        int mid = (low + high) / 2;
        StringBuilder literal = new StringBuilder(high);

        int result;

        String ch = "A";
        // Requires a JDK: getSystemJavaCompiler() returns null on a plain JRE
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        // Replace System.err with a custom error stream that discards compiler output
        OutputStream err = new OutputStream() {
            @Override
            public void write(int b) throws IOException {
                // intentionally empty
            }
        };

        int max = 0;
        for (int i = 0; i < mid; i++) {
            literal.append(ch);
        }
        // Binary search for the longest literal that still compiles
        while (low <= high) {
            StringBuilder fileContent
                    = new StringBuilder(literal.length() + prefix.length() * 2);
            fileContent.append(prefix);
            fileContent.append(literal);
            fileContent.append("\";}");
            // Generate the Java file
            FileWriter w = new FileWriter(fileName);
            BufferedWriter bw = new BufferedWriter(w);
            bw.write(fileContent.toString());
            bw.close();
            w.close();
            result = compiler.run(null, null, err, fileName);

            // Number of code points in the current literal
            int codePointCount = literal.codePointCount(0, literal.length());
            if (result == 0) { // 0 means there are no compilation errors
                low = mid + 1;
                mid = (low + high) / 2;
                max = codePointCount;
                for (int i = codePointCount; i < mid; i++) {
                    literal.append(ch);
                }
                System.out.println("Length " + max
                        + " compiled successfully, increasing the length to " + mid);
            } else {
                // Compilation error: the literal is too long
                high = mid - 1;
                mid = (low + high) / 2;
                System.err.println("Length " + codePointCount
                        + " failed to compile, reducing the length to " + mid);
                // A supplementary character occupies two chars in the StringBuilder
                int start = ch.length() == 1 ? mid : mid * 2;
                literal.delete(start, literal.length());
            }
        }
        err.close();
        System.out.println("Maximum literal length: " + max);
    }
}

Output:

Length 500000 failed to compile, reducing the length to 249999
Length 249999 failed to compile, reducing the length to 124999
Length 124999 failed to compile, reducing the length to 62499
Length 62499 compiled successfully, increasing the length to 93749
Length 93749 failed to compile, reducing the length to 78124
Length 78124 failed to compile, reducing the length to 70311
Length 70311 failed to compile, reducing the length to 66405
Length 66405 failed to compile, reducing the length to 64452
Length 64452 compiled successfully, increasing the length to 65428
Length 65428 compiled successfully, increasing the length to 65916
Length 65916 failed to compile, reducing the length to 65672
Length 65672 failed to compile, reducing the length to 65550
Length 65550 failed to compile, reducing the length to 65489
Length 65489 compiled successfully, increasing the length to 65519
Length 65519 compiled successfully, increasing the length to 65534
Length 65534 compiled successfully, increasing the length to 65542
Length 65542 failed to compile, reducing the length to 65538
Length 65538 failed to compile, reducing the length to 65536
Length 65536 failed to compile, reducing the length to 65535
Length 65535 failed to compile, reducing the length to 65534
Maximum literal length: 65534

But if you change the code to

String ch = "α";

the result becomes:

Maximum literal length: 32767

And with

String ch = "字";

the result is:

Maximum literal length: 21845

In the class file, the CONSTANT_Utf8_info table is used to store all kinds of constant strings, including String literals, fully qualified names of classes and interfaces, and the names and descriptors of methods and variables. The structure of the CONSTANT_Utf8_info table (a 1-byte tag, a 2-byte length, and a bytes array of that length) is shown in Table 3-1.

According to Table 3-1, the CONSTANT_Utf8_info table uses 2 bytes to represent the length of the string, so the maximum length of the bytes array is 2^16 − 1, that is, 65535 bytes. Why, then, do different characters (“A”, “á”, “字”, or a supplementary character) give different results? The reason is that the CONSTANT_Utf8_info table uses 1 byte for characters from “\u0001” to “\u007f”; 2 bytes for the null character (“\u0000”) and for characters from “\u0080” to “\u07ff” (a range that covers both “á” and “α”); 3 bytes for characters from “\u0800” to “\uffff”; and 6 bytes for supplementary characters, that is, characters with code points from U+10000 to U+10FFFF. Put another way, a supplementary character is represented by a surrogate pair whose values lie in the range “\ud800” to “\udfff”. These values fall between “\u0800” and “\uffff”, so each surrogate takes 3 bytes, 6 bytes in total.

This is how strings are stored in the class file, and it should not be confused with how characters are represented in a running Java program. In a Java program, “A”, “á”, and “字” are each represented by one char variable, that is, 2 bytes, while a supplementary character is represented by two char variables, that is, 4 bytes.
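The byte counts described above can be checked with a small sketch. The class name `ModifiedUtf8Length` and the method `byteLength` are hypothetical names chosen for this illustration, not part of the book's code. The method only applies the 1/2/3-byte rule per char; it does not read or write a class file.

```java
// Sketch: count the bytes a string would need in the bytes array of a
// CONSTANT_Utf8_info entry, following the 1/2/3-byte rule described above.
class ModifiedUtf8Length {

    // Upper limit implied by the 2-byte length field
    static final int MAX_BYTES = 65535;

    static int byteLength(String s) {
        int bytes = 0;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c >= 0x0001 && c <= 0x007F) {
                bytes += 1; // U+0001..U+007F
            } else if (c <= 0x07FF) {
                bytes += 2; // U+0000 and U+0080..U+07FF
            } else {
                bytes += 3; // U+0800..U+FFFF, including each surrogate half
            }
        }
        return bytes;
    }

    public static void main(String[] args) {
        String[] samples = {
            "A",                                    // 1 byte
            "\u03b1",                               // Greek alpha, 2 bytes
            "\u5b57",                               // the CJK character used above, 3 bytes
            new String(Character.toChars(0x1F600))  // a supplementary character, 6 bytes
        };
        for (String s : samples) {
            int perChar = byteLength(s);
            System.out.println(perChar + " byte(s) per character, at most "
                    + (MAX_BYTES / perChar) + " repetitions fit in " + MAX_BYTES + " bytes");
        }
    }
}
```

Dividing 65535 by 2 and by 3 gives 32767 and 21845, which matches the measured results for “α” and “字”. For “A” the same arithmetic gives 65535, one more than the 65534 the compiler accepted in the experiment.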

The maximum length of a String literal is different from that of a String in memory. The latter is bounded by the maximum value of the int type, that is, 2147483647. The former is at most 65534 and depends on the characters used (on their Unicode values). By manually modifying the class file, you can make the result 65535.
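To illustrate the difference, here is a minimal sketch that builds a String at run time that is longer than any literal the compiler accepted in the experiment above. The class name `RuntimeStringLength` and the helper `repeat` are hypothetical names used only for this example.

```java
// Sketch: a String built at run time is not subject to the class-file limit.
class RuntimeStringLength {

    // Builds a string consisting of `count` copies of `c`.
    static String repeat(char c, int count) {
        StringBuilder sb = new StringBuilder(count);
        for (int i = 0; i < count; i++) {
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // 100000 characters: too long for a single literal, fine in memory
        String s = repeat('A', 100_000);
        System.out.println("Length in memory: " + s.length());
    }
}
```

The string never appears as a literal in the source, so nothing of that size has to be stored in a CONSTANT_Utf8_info entry.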

The maximum length of a String literal is determined by the CONSTANT_Utf8_info table, and the length is fixed at compile time. If the literal exceeds the upper limit of the bytes array in the CONSTANT_Utf8_info table, a compilation error results.
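The same 65535-byte ceiling can be observed without a compiler, by way of analogy. `DataOutputStream.writeUTF` in the standard library also writes a 2-byte length followed by modified UTF-8 bytes, and throws `UTFDataFormatException` when the encoded form is longer than 65535 bytes. The sketch below is not how javac checks literals; the class name `WriteUtfLimit` and the helpers `fits` and `repeat` are hypothetical names for this illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UTFDataFormatException;
import java.io.UncheckedIOException;

// Sketch: probe the 2-byte length limit with DataOutputStream.writeUTF,
// which uses the same kind of length-prefixed modified UTF-8 encoding.
class WriteUtfLimit {

    // Returns true if `s` can be encoded within the 65535-byte limit.
    static boolean fits(String s) {
        try (DataOutputStream out = new DataOutputStream(new ByteArrayOutputStream())) {
            out.writeUTF(s);
            return true;
        } catch (UTFDataFormatException e) {
            return false; // encoded form is longer than 65535 bytes
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Builds a string consisting of `count` copies of `unit`.
    static String repeat(String unit, int count) {
        StringBuilder sb = new StringBuilder(unit.length() * count);
        for (int i = 0; i < count; i++) {
            sb.append(unit);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println("65535 x 'A' fits: " + fits(repeat("A", 65535)));
        System.out.println("65536 x 'A' fits: " + fits(repeat("A", 65536)));
        System.out.println("21845 x U+5B57 fits: " + fits(repeat("\u5b57", 21845)));
        System.out.println("21846 x U+5B57 fits: " + fits(repeat("\u5b57", 21846)));
    }
}
```

Note that `writeUTF` accepts exactly 65535 bytes, whereas the compiler in the experiment above stopped at 65534 for “A”.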
