Strings and characters in Swift 3.0

Time:2021-4-5

Preface

This article looks at how strings and characters differ between Swift and Objective-C and shows some simple usage. If anything here is wrong, corrections are welcome.

Empty strings

In Swift, there are two ways to initialize an empty string:

//Method 1:
let testEmptyString0 = ""

//Method 2:
let testEmptyString1 = String()

During development, how should we check whether a string is empty?

//Method 1: equivalent to checking whether characters.count is 0
if testEmptyString0.isEmpty {
    // empty
}

//Method 2: compare the character count with 0
if testEmptyString0.characters.count == 0 {
    // empty
}

//Method 3: bridge to NSString (requires import Foundation) and compare length with 0
if (testEmptyString0 as NSString).length == 0 {
    // empty
}

String length calculation

Objective-C

First, let's recall how the length of a string is calculated in Objective-C. Here is what Apple says:

A string object is implemented as an array of Unicode characters (in other words, a text string). An immutable string is a text string that is defined when it is created and subsequently cannot be changed. To create and manage an immutable string, use the NSString class. To construct and manage a string that can be changed after it has been created, use NSMutableString.

A string object presents itself as an array of Unicode characters. You can determine how many characters it contains with the length method and can retrieve a specific character with the characterAtIndex: method.

This passage tells us how NSString is implemented and how to get its length. But how does the length method work? According to Apple's documentation, length returns the number of UTF-16 code units in the receiver. What is UTF-16? (Interested readers can take a look at an article I wrote earlier, Character coding (1)); it is not covered in detail here.
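To make the UTF-16 point concrete, here is a minimal sketch that compares NSString's length with the utf16 view of the same Swift string. The sample string is my own choice for illustration, and NSString(string:) is used only so the snippet does not depend on `as` bridging being available.

```swift
import Foundation

// Sample text chosen for illustration: "a" followed by U+1F600,
// a code point outside the Basic Multilingual Plane.
let sample = "a\u{1F600}"

// NSString.length counts UTF-16 code units, not user-visible characters.
let nsLength = NSString(string: sample).length

// The utf16 view of a Swift String exposes the same code units.
let utf16Count = sample.utf16.count

print(nsLength, utf16Count) // 3 3
```

The code point U+1F600 needs two UTF-16 code units (a surrogate pair), which is why two visible symbols give a length of 3.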

Swift 3.0

Unicode scalar representation

In Swift, characters and strings are built on Unicode scalars. A Unicode scalar is a 21-bit value, and the code space spans 17 planes. The UTF-16 surrogate code points in the Basic Multilingual Plane (U+D800 to U+DFFF) are excluded, so the valid ranges are U+0000 to U+D7FF and U+E000 to U+10FFFF.

A Unicode scalar is any Unicode code point in the range U+0000 to U+D7FF inclusive or U+E000 to U+10FFFF inclusive. Unicode scalars do not include the Unicode surrogate pair code points, which are the code points in the range U+D800 to U+DFFF inclusive.

Therefore, in Swift we can write characters or strings directly as Unicode scalars, for example:

let tingC = "\u{542C}" // 听 (listen)

let xinC = "\u{5FC3}" // 心 (heart)
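Going the other way, the scalars that make up a string can be read back through its unicodeScalars view. The following is a small sketch; the names word and codePoints are illustrative and not from the original article.

```swift
// The two scalars from the examples above, joined into one string.
let word = "\u{542C}\u{5FC3}"

// Turn each scalar's numeric value into an uppercase hexadecimal string.
let codePoints = word.unicodeScalars.map { scalar in
    String(scalar.value, radix: 16, uppercase: true)
}

print(codePoints) // ["542C", "5FC3"]
```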

Extended grapheme clusters

In Swift, each instance of the Character type represents a single extended grapheme cluster: a human-readable character made up of a sequence of one or more Unicode scalars.

The Pinyin of the Chinese character "ting" is tīng. Take the letter ī as an example; it can be written in two ways. First, it can be a single Unicode scalar ī (LATIN SMALL LETTER I WITH MACRON), that is, U+012B, in which case the grapheme cluster contains one Unicode scalar. Second, it can be a pair of Unicode scalars, LATIN SMALL LETTER I followed by COMBINING MACRON, that is, U+0069 U+0304. When a Unicode-aware text rendering system draws this pair, the macron is applied to the letter i and the result is displayed as ī; here the grapheme cluster contains two Unicode scalars.


let tingO = "t" + "\u{0069}" + "ng" // "ting"

let tingPS = "t" + "\u{0069}" + "\u{0304}" + "ng" // "tīng" (i + combining macron)

let tingPD = "t" + "\u{012B}" + "ng" // "tīng" (precomposed ī)

In both cases, the letter ī is a single Character instance in Swift and a single extended grapheme cluster. To learn more about extended grapheme clusters, refer to this link
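A short sketch of what this means in practice: the two spellings compare as equal strings, because Swift compares strings by canonical equivalence, even though they are stored with a different number of scalars. The constant names below are illustrative only.

```swift
let decomposed = "t\u{0069}\u{0304}ng" // i followed by a combining macron
let precomposed = "t\u{012B}ng"        // single precomposed scalar for ī

// String equality is based on canonical equivalence, not on identical scalars.
let sameText = (decomposed == precomposed)

// The underlying storage still differs in the number of Unicode scalars.
let decomposedScalars = decomposed.unicodeScalars.count
let precomposedScalars = precomposed.unicodeScalars.count

print(sameText)                              // true
print(decomposedScalars, precomposedScalars) // 5 4
```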

String length

Now that we have had a brief look at extended grapheme clusters, let's look at some interesting properties of Swift strings.

The String type in Swift is a collection of Character instances. In practice, there are two common ways to get the length of a string. The first is to convert the string to Objective-C's NSString and call its length method. The second is to use the string's characters.count property. This section mainly discusses the second; the two are compared at the end of the article.

Careful readers may have noticed that the tingPD and tingPS strings contain the same number of characters:


print("tingPD-Count:\(tingPD.characters.count), tingPS-Count:\(tingPS.characters.count)") 
// Prints "tingPD-Count:4, tingPS-Count:4"

Let's now explain this. As mentioned earlier, String and Character in Swift are built on Unicode scalars, and a String is a collection of Characters. The characters.count property therefore counts Characters, not scalars. So how is a character defined, or what is a character? This is where another concept comes in: grapheme cluster boundaries, which answer exactly that question. Readers who want to know more can see: Portal. From the user's point of view, whether the text is stored as ī (U+012B) or as i (U+0069) followed by a combining macron (U+0304), the result is the same readable character, so tingPD and tingPS contain the same number of characters.

From the explanation above, we can draw two conclusions:

  1. Appending to a string does not necessarily change its number of characters, that is, the value of characters.count.

  2. The number of characters in a string cannot be computed without knowing the grapheme cluster boundaries. All Unicode scalars in the string must therefore be traversed to find those boundaries before the count can be determined.

Here is an example. Given the above, the reason for the output should be clear:


var iWord = "i"

print("iword-Count: \(iWord.characters.count)")
// Prints "iword-Count: 1"

iWord += "\u{0304}" // ī
print("iword-Count: \(iWord.characters.count)")
// Prints "iword-Count: 1"
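The same append can be observed from the scalar side. This sketch records the character count and the scalar count before and after appending the combining macron. It is written for Swift 4 or later, where String is itself a collection and `count` replaces `characters.count`; that substitution is my assumption about the toolchain, and in Swift 3.0 you would write `characters.count` instead.

```swift
var letter = "i"

// (character count, Unicode scalar count) before the append.
let before = (letter.count, letter.unicodeScalars.count)

// Append COMBINING MACRON; it joins the existing grapheme cluster.
letter += "\u{0304}"

// The scalar count grows, the character count does not.
let after = (letter.count, letter.unicodeScalars.count)

print(before) // (1, 1)
print(after)  // (1, 2)
```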

The difference between .length and characters.count

First, .length is how string length is computed in Objective-C, while characters.count can be regarded as how it is computed in Swift. Because the String type in Swift can be bridged to NSString in Objective-C, both of the following forms may appear in Swift code:


print(tingPS.characters.count)
// Prints "4"
print((tingPS as NSString).length)
// Prints "5"

From these results, the length returned by .length is 5, while characters.count is 4. This may be confusing: how can the length of the same string come out differently? In fact, how .length and characters.count are calculated has already been explained above. This section is a brief summary.

.length and characters.count do not always return the same value. The .length method counts UTF-16 code units, so the letter i (U+0069) and the combining macron (U+0304) are counted as two units and contribute 2 to the length. characters.count counts characters as determined by grapheme cluster boundaries, so the same pair counts as 1; see the sections above for details. PS: this is also why Swift uses indices to access strings.
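As a minimal sketch of index-based access, the snippet below walks from startIndex to the second character instead of using an integer subscript. The names ting, secondIndex and second are illustrative only.

```swift
// "tīng" written with i followed by a combining macron.
let ting = "t\u{0069}\u{0304}ng"

// String.Index moves by whole characters (grapheme clusters), not by code units.
let secondIndex = ting.index(ting.startIndex, offsetBy: 1)
let second = ting[secondIndex]

// The single Character at that index is made of two Unicode scalars.
let scalarsInSecond = String(second).unicodeScalars.count

print(second)          // ī
print(scalarsInSecond) // 2
```

Because finding the n-th character means scanning for cluster boundaries, the offset is computed by walking the string rather than by a constant-time integer lookup.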
