Encoding and decoding in Java IO streams


Encoding and decoding

Understanding code tables:

In a computer, all data is binary, whether it is being transmitted, stored, or persisted. So when a character is saved to the hard disk or held in memory, what is actually stored is binary. When that data is later read back from the hard disk or memory and displayed, how does the binary turn back into characters?

This is the purpose of a code table:

1. A code table is a table that maps each character to its corresponding binary value, and back.

2. The table fixes this mapping, so the same character always corresponds to the same binary value.

3. When the computer stores a character, it looks the character up in the code table and stores the corresponding binary value.

4. When the computer reads a character back, it looks the binary value up in the code table and converts it into the corresponding character for display.
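The lookup in both directions can be sketched in a few lines of Java. This is only an illustration: the class and method names (`CodeTableLookup`, `toCode`, `toChar`) are made up for this example, and it relies on the fact that a Java `char` already holds the number its code table assigns to the character.

```java
class CodeTableLookup {
    // Character -> number: the value the code table assigns to the character.
    static int toCode(char c) {
        return c;
    }

    // Number -> character: the reverse lookup in the same table.
    static char toChar(int code) {
        return (char) code;
    }

    public static void main(String[] args) {
        int code = toCode('A');
        System.out.println(code);                         // 65
        System.out.println(Integer.toBinaryString(code)); // 1000001
        System.out.println(toChar(65));                   // A
    }
}
```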

It can be roughly understood as follows:

1. Different code tables hold different character mappings.

2. In some code tables, a character takes up one byte. As a signed value, a byte ranges from -128 to 127, 256 values in total, so such a table can hold at most 256 character mappings.

3. In other code tables, one character takes up two or even three bytes, so the table can hold many more character mappings.
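As a rough illustration of the points above, the following sketch counts how many bytes the same character needs under different code tables. The class and method names (`ByteCountDemo`, `byteCount`) are hypothetical; the charsets come from `java.nio.charset.StandardCharsets`. The Chinese character 中 is written as the escape `\u4e2d` so the example does not depend on the encoding of the source file.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class ByteCountDemo {
    // Number of bytes the text occupies when stored with the given code table.
    static int byteCount(String text, Charset charset) {
        return text.getBytes(charset).length;
    }

    public static void main(String[] args) {
        String zhong = "\u4e2d"; // the Chinese character 中

        // A Java byte is signed, so its range is -128 to 127.
        System.out.println(Byte.MIN_VALUE + " to " + Byte.MAX_VALUE); // -128 to 127

        System.out.println(byteCount("A", StandardCharsets.ISO_8859_1)); // 1
        System.out.println(byteCount(zhong, StandardCharsets.UTF_16BE)); // 2
        System.out.println(byteCount(zhong, StandardCharsets.UTF_8));    // 3
    }
}
```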

Common code table:

ASCII code table:

The American code table. It contains only English uppercase and lowercase letters, digits, American punctuation, and so on. Each character takes up one byte, and every mapped byte value is non-negative (0 to 127), so there are 128 character mappings.


GB2312 code table:

Compatible with the ASCII code table, with Chinese characters added. English uppercase and lowercase letters, digits, and American punctuation take up one byte; each Chinese character takes up two bytes. Both bytes of a Chinese character are negative when read as signed bytes, so there is room for at most 128 × 128 = 16384 two-byte mappings (the standard actually assigns far fewer).


GBK code table:

Compatible with the GB2312 code table. English uppercase and lowercase letters, digits, and American punctuation take up one byte; each Chinese character takes up two bytes. The first byte is negative, and the second byte may be positive or negative, so there is room for at most 128 × 256 = 32768 two-byte mappings.
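The sign pattern described above can be checked with a small sketch. It assumes the running JDK ships the GBK charset, which is common but not guaranteed, so the code checks `Charset.isSupported("GBK")` first. The class and method names (`GbkDemo`, `gbkAvailable`, `encodeGbk`) are made up for this example.

```java
import java.nio.charset.Charset;
import java.util.Arrays;

class GbkDemo {
    // GBK is an optional charset, so check that this runtime provides it.
    static boolean gbkAvailable() {
        return Charset.isSupported("GBK");
    }

    // Encode the text with the GBK code table.
    static byte[] encodeGbk(String text) {
        return text.getBytes(Charset.forName("GBK"));
    }

    public static void main(String[] args) {
        if (!gbkAvailable()) {
            System.out.println("GBK is not available in this runtime");
            return;
        }
        // An ASCII character: one non-negative byte.
        System.out.println(Arrays.toString(encodeGbk("A")));      // [65]
        // The Chinese character 中: two bytes, the first one negative.
        System.out.println(Arrays.toString(encodeGbk("\u4e2d"))); // [-42, -48]
    }
}
```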

Unicode code table:

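The international code table, which contains the commonly used characters of most countries. Unicode assigns every character a number, called a code point. It was originally designed so that every character fit in 2 bytes, giving 65536 mappings, but it has since grown beyond that range. The Java language uses the Unicode code table: a Java char is a 2-byte UTF-16 code unit, so most common characters fit in one char, and characters outside the first 65536 are stored as two chars (a surrogate pair).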
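How Java strings relate to Unicode can be seen in a short sketch. The class and method names (`UnicodeDemo`, `codePoints`) are hypothetical; the calls used are standard `String` and `Character` methods. The code point U+1F600 is chosen here only as an example of a character outside the first 65536.

```java
class UnicodeDemo {
    // Number of Unicode characters (code points) in the string,
    // which can be smaller than the number of chars.
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        String zhong = "\u4e2d";                                // the Chinese character 中
        String beyond = new String(Character.toChars(0x1F600)); // a character above U+FFFF

        System.out.println(Character.BYTES);                           // 2
        System.out.println(zhong.length());                            // 1
        System.out.println(Integer.toHexString(zhong.codePointAt(0))); // 4e2d
        System.out.println(beyond.length());                           // 2
        System.out.println(codePoints(beyond));                        // 1
    }
}
```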

UTF-8 code table: (an encoding of the Unicode code table)

It is also an international code table, but a variable-length one: English characters take up one byte, and most Chinese characters take up three bytes.
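The one-byte and three-byte cases can be made visible by printing the UTF-8 bytes in hexadecimal. This is a minimal sketch; `Utf8BytesDemo` and `utf8Hex` are names invented for the example.

```java
import java.nio.charset.StandardCharsets;

class Utf8BytesDemo {
    // UTF-8 bytes of the text as space-separated hexadecimal pairs.
    static String utf8Hex(String text) {
        StringBuilder sb = new StringBuilder();
        for (byte b : text.getBytes(StandardCharsets.UTF_8)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            // Mask to 0..255 so negative bytes print as two hex digits.
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(utf8Hex("A"));      // 41
        System.out.println(utf8Hex("\u4e2d")); // E4 B8 AD
    }
}
```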


Encoding and decoding:


Encoding:

Encoding is the process of transforming information from one form or format into another.

To put it simply: encoding uses the code table to transform information that you can understand (characters) into information that you can't understand (bytes).


Decoding:

Decoding is the reverse of encoding.

In short, decoding uses the code table to transform information that you can't understand (bytes) back into information that you can understand (characters).
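In Java, the two directions correspond to `String.getBytes(Charset)` and the `String(byte[], Charset)` constructor. The following is a small sketch of a round trip; UTF-8 is chosen here only as an example code table, and the names `EncodeDecodeDemo`, `encode`, and `decode` are made up.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class EncodeDecodeDemo {
    // Encoding: characters -> bytes, using the UTF-8 code table.
    static byte[] encode(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    // Decoding: bytes -> characters, using the same code table.
    static String decode(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] encoded = encode("Hi");
        System.out.println(Arrays.toString(encoded)); // [72, 105]
        System.out.println(decode(encoded));          // Hi
    }
}
```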

Note: during development, the server and the client must use the same encoding, so both parties should agree on a code table in advance.
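With IO streams, one way to keep both sides consistent is to pass the agreed code table explicitly to `OutputStreamWriter` and `InputStreamReader` instead of relying on the platform default. The sketch below uses in-memory streams in place of a real connection or file; the class and method names (`StreamCharsetDemo`, `write`, `read`) are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class StreamCharsetDemo {
    // Sender side: the writer encodes characters into bytes with the given charset.
    static byte[] write(String text, Charset charset) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (Writer writer = new OutputStreamWriter(out, charset)) {
            writer.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    // Receiver side: the reader decodes bytes into characters with the given charset.
    static String read(byte[] bytes, Charset charset) {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = new InputStreamReader(new ByteArrayInputStream(bytes), charset)) {
            char[] buffer = new char[1024];
            int n;
            while ((n = reader.read(buffer)) != -1) {
                sb.append(buffer, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Charset agreed = StandardCharsets.UTF_8; // the code table both sides agreed on
        byte[] sent = write("\u4e2d\u6587", agreed);
        System.out.println(sent.length);                              // 6
        System.out.println(read(sent, agreed).equals("\u4e2d\u6587")); // true
    }
}
```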

Reasons for garbled text:

1. The text was converted by hand with the wrong code table

2. The server and the client use different code tables

3. The server system's encoding is inconsistent with the encoding specified by the developer

4. The URL encoding is inconsistent with the encoding specified by the developer
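All of these cases come down to decoding bytes with a different code table from the one used to encode them. The sketch below reproduces this on purpose, encoding with UTF-8 and decoding with ISO-8859-1. The two charsets are just an example pair, and the names `GarbledDemo`, `garble`, and `repair` are made up. The repair step works here only because ISO-8859-1 maps every byte value to a character, so no bytes are lost. With other pairs of code tables the damage can be permanent.

```java
import java.nio.charset.StandardCharsets;

class GarbledDemo {
    // Encode with UTF-8 but decode with ISO-8859-1: the code tables disagree.
    static String garble(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    // Undo the wrong decoding, then decode again with the right code table.
    static String repair(String garbled) {
        byte[] bytes = garbled.getBytes(StandardCharsets.ISO_8859_1);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "\u4e2d"; // the Chinese character 中
        String garbled = garble(original);

        // Each of the three UTF-8 bytes was decoded as a separate character.
        System.out.println(garbled.length());                 // 3
        System.out.println(garbled.equals(original));         // false
        System.out.println(repair(garbled).equals(original)); // true
    }
}
```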

This work is released under a CC agreement. Reprints must credit the author and link to this article.
