Translation: Practical Python Programming 01_04 Strings


Contents | previous section (1.3 numbers) | next section (1.5 list)

1.4 Strings

This section describes how to handle text.

Representing literal text

In a program, string literals are written in quotation marks.

# Single quote
a = 'Yeah but no but yeah but...'

# Double quote
b = "computer says no"

# Triple quotes
c = '''
Look into my eyes, look into my eyes, the eyes, the eyes, the eyes,
not around the eyes,
don't look around the eyes,
look into my eyes, you're under.
'''

Normally, a string may only span a single line. Triple quotes capture all text up to the closing triple quotes, across multiple lines and including all formatting.

There is no difference between using single quotes (') and double quotes ("). However, the same type of quote that starts a string must also be used to end it.

String escape codes

Escape codes are used to represent control characters and characters that cannot be easily entered on the keyboard. Here are some common escape codes:

'\n'      Line feed
'\r'      Carriage return
'\t'      Tab
'\''      Literal single quote
'\"'      Literal double quote
'\\'      Literal backslash
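
As a small sketch of how escape codes behave (the variable names here are illustrative and not part of the original text), each escape code is stored as a single character even though it takes two characters to type:

```python
# Each escape code becomes exactly one character in the string.
line = 'a\tb\n'       # 'a', tab, 'b', line feed
quote = 'It\'s'       # single quote embedded in a single-quoted string
backslash = '\\'      # one literal backslash

print(len(line))       # 4
print(quote)           # It's
print(len(backslash))  # 1
```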

String representation

Each character in a string is stored internally as a so-called Unicode "code point", which is an integer. You can specify an exact code point with the following escape sequences.

a = '\xf1'          # a = 'ñ'
b = '\u2200'        # b = '∀'
c = '\U0001D122'    # c = '𝄢'
d = '\N{FOR ALL}'   # d = '∀'

Please refer to the Unicode character database for all available character codes.
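
To see the link between characters and integer code points, the built-in functions ord() and chr() can be used. This is only an illustrative sketch; the variable names are made up for the example:

```python
# ord() returns the integer code point of a one-character string;
# chr() converts a code point back into a character.
code = ord('ñ')                    # 241, which is 0xf1
char = chr(0x2200)                 # '∀'
same = '\N{FOR ALL}' == '\u2200'   # two spellings of the same character

print(code, char, same)
```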

String indexing

You can access a single character of a string as you would an array. You can use an integer index starting at 0, and a negative index specifies the position relative to the end of the string.

a = 'Hello world'
b = a[0]          # 'H'
c = a[4]          # 'o'
d = a[-1]         # 'd' (end of string)

You can also slice or select substrings by specifying an index range with a colon (:).

d = a[:5]     # 'Hello'
e = a[6:]     # 'world'
f = a[3:8]    # 'lo wo'
g = a[-5:]    # 'world'

The character at the ending index is not included. A missing index is assumed to be the beginning or the end of the string.
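
One detail worth trying yourself is what happens with out-of-range positions. The sketch below (with illustrative variable names, and using try/except, which is covered later) shows that slices are clipped to the string, while indexing a single position past the end raises IndexError:

```python
a = 'Hello world'

# Slice bounds beyond the string are silently clipped.
clipped = a[6:100]     # 'world'
empty = a[50:]         # ''

# Indexing one position past the end is an error.
try:
    a[50]
    raised = False
except IndexError:
    raised = True

print(clipped, repr(empty), raised)
```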

String operations

String operations include concatenation, length, membership testing, and replication.

# Concatenation (+)
a = 'Hello' + 'World'   # 'HelloWorld'
b = 'Say ' + a          # 'Say HelloWorld'

# Length (len)
s = 'Hello'
len(s)                  # 5

# Membership test (`in`, `not in`)
t = 'e' in s            # True
f = 'x' in s            # False
g = 'hi' not in s       # True

# Replication (s * n)
rep = s * 5             # 'HelloHelloHelloHelloHello'

String methods

Strings have methods to perform various operations on data.

Example: remove any white space at the beginning or end.

s = '  Hello '
t = s.strip()     # 'Hello'

Example: case conversion.

s = 'Hello'
l = s.lower()     # 'hello'
u = s.upper()     # 'HELLO'

Example: text replacement.

s = 'Hello world'
t = s.replace('Hello' , 'Hallo')   # 'Hallo world'

More string methods:

Strings have a wide variety of other methods for testing and manipulating text data.

Here is a small sample of string methods:

s.endswith(suffix)     # Check if string ends with suffix
s.find(t)              # First occurrence of t in s
s.index(t)             # First occurrence of t in s
s.isalpha()            # Check if characters are alphabetic
s.isdigit()            # Check if characters are numeric
s.islower()            # Check if characters are lower-case
s.isupper()            # Check if characters are upper-case
s.join(slist)          # Join a list of strings using s as delimiter
s.lower()              # Convert to lower case
s.replace(old,new)     # Replace text
s.rfind(t)             # Search for t from end of string
s.rindex(t)            # Search for t from end of string
s.split([delim])       # Split string into list of substrings
s.startswith(prefix)   # Check if string starts with prefix
s.strip()              # Strip leading/trailing space
s.upper()              # Convert to upper case
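
A few of these methods in action, as a rough sketch (the sample string and variable names are chosen only for illustration). Note that find() and index() differ in how they report a missing substring:

```python
symbols = 'AAPL,IBM,MSFT'

parts = symbols.split(',')      # ['AAPL', 'IBM', 'MSFT']
joined = '-'.join(parts)        # 'AAPL-IBM-MSFT'

found = symbols.find('IBM')     # 5
missing = symbols.find('XYZ')   # -1 when the substring is absent

# index() works like find(), but raises ValueError when nothing is found.
try:
    symbols.index('XYZ')
    index_raised = False
except ValueError:
    index_raised = True

print(parts, joined, found, missing, index_raised)
```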

String mutability

Strings are immutable or read-only. Once created, the value of the string cannot be modified.

>>> s = 'Hello World'
>>> s[1] = 'a'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'str' object does not support item assignment

All operations and methods that handle string data always create a new string.
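
Since item assignment is not allowed, one way to get a "modified" string is to build a new one from slices of the old one. This is only a sketch of the idea, with illustrative variable names:

```python
s = 'Hello World'

# Build a new string from pieces of the old one instead of assigning to s[1].
t = s[:1] + 'a' + s[2:]

print(t)   # Hallo World
print(s)   # Hello World (the original is unchanged)
```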

String conversion

Use str() to convert any value to a string. The result is a string holding the same text that print() would have produced.

>>> x = 42
>>> str(x)
'42'
>>>
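
A common reason to call str() is to combine a number with other text, because + does not convert a number automatically. A minimal sketch, with names made up for the example:

```python
x = 42

# 'x = ' + x would raise TypeError, so convert the number first.
label = 'x = ' + str(x)

float_text = str(3.5)     # '3.5'
bool_text = str(True)     # 'True'

print(label, float_text, bool_text)
```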

Byte strings

A string of 8-bit bytes, commonly encountered with low-level I/O, is written as follows:

data = b'Hello World\r\n'

Placing a lowercase b before the first quotation mark specifies a byte string rather than a text string. (A bytes literal written this way may contain only ASCII characters directly; other byte values must be written with escapes such as \xff.) Most of the usual text string operations also work on byte strings.

len(data)                         # 13
data[0:5]                         # b'Hello'
data.replace(b'Hello', b'Cruel')  # b'Cruel World\r\n'

Indexing a byte string is a bit different because it returns the byte value as an integer:

data[0]   # 72 (ASCII code for 'H')
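
The difference between indexing and slicing a byte string can be sketched as follows (variable names are illustrative). A slice of length one is still a bytes object, while an index gives an integer:

```python
data = b'Hello World\r\n'

first = data[0]        # 72, an int
piece = data[0:1]      # b'H', a bytes object of length 1
letter = chr(first)    # 'H', the character with code point 72

print(first, piece, letter)
```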

Conversion between byte string and text string:

text = data.decode('utf-8') # bytes -> text
data = text.encode('utf-8') # text -> bytes

The 'utf-8' argument specifies a character encoding. Other common values include 'ascii' and 'latin1'.
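
The choice of encoding matters as soon as the text contains non-ASCII characters. The following sketch (illustrative names, and try/except is covered later) encodes the same one-character string three ways:

```python
text = 'ñ'

utf8 = text.encode('utf-8')       # b'\xc3\xb1' (two bytes)
latin1 = text.encode('latin1')    # b'\xf1' (one byte)

# 'ascii' cannot represent ñ, so encoding fails.
try:
    text.encode('ascii')
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

roundtrip = utf8.decode('utf-8') == text
print(utf8, latin1, ascii_ok, roundtrip)
```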

Raw strings

Raw strings are string literals in which backslashes are not interpreted. They are specified by prefixing the opening quotation mark with a lowercase r.

>>> rs = r'c:\newdata\test' # Raw (uninterpreted backslash)
>>> rs
'c:\\newdata\\test'

The string is the literal text enclosed in the quotation marks, exactly as typed. This is useful in situations where the backslash has special significance, such as file names and regular expressions.
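
To make the difference concrete, the sketch below compares the same text written as an ordinary literal, as a raw literal, and with doubled backslashes (the variable names are illustrative):

```python
plain = 'c:\newdata\test'       # \n and \t are interpreted as line feed and tab
raw = r'c:\newdata\test'        # backslashes are kept exactly as typed
escaped = 'c:\\newdata\\test'   # same text as raw, written with escape codes

print(len(plain))       # 13
print(len(raw))         # 15
print(raw == escaped)   # True
```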


f-strings

An f-string is a string with formatted expression substitution.

>>> name = 'IBM'
>>> shares = 100
>>> price = 91.1
>>> a = f'{name:>10s} {shares:10d} {price:10.2f}'
>>> a
'       IBM        100      91.10'
>>> b = f'Cost = ${shares*price:0.2f}'
>>> b
'Cost = $9110.00'

Note: This requires Python 3.6 or later. The meaning of the format codes is explained later.
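
The same format codes can be reused to line up several rows. The sketch below uses a list and a for loop, which are covered in later sections; the second row is made-up sample data, not from the original text:

```python
# (name, shares, price) rows; the AAPL row is invented for illustration.
holdings = [('IBM', 100, 91.1), ('AAPL', 50, 123.45)]

lines = []
for name, shares, price in holdings:
    # Same field widths as above: every column is 10 characters wide.
    lines.append(f'{name:>10s} {shares:10d} {price:10.2f}')

for line in lines:
    print(line)
```

Because every field has a fixed width, each formatted line has the same length and the columns align.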


Exercises

In these exercises, you will experiment with operations on Python's string type. You should do this at the Python interactive prompt, where you can easily see the results. Important note:

In exercises where you are supposed to interact with the interpreter, >>> is the interpreter prompt that you get when Python wants you to type a new statement. Some statements in the exercises span multiple lines; you may need to press Enter a few times to make them run. As a reminder, you do not type the >>> prompt itself.

Start by defining a string containing a series of stock symbols. The string is as follows:

>>> symbols = 'AAPL,IBM,MSFT,YHOO,SCO'

Exercise 1.13: Extracting individual characters and substrings

Strings are arrays of characters. Try extracting a few characters:

>>> symbols[0]
>>> symbols[1]
>>> symbols[2]
>>> symbols[-1]        # Last character
>>> symbols[-2]        # Negative indices are from end of string

In Python, strings are read-only.

Verify this by trying to change the first character of symbols to a lowercase 'a'.

>>> symbols[0] = 'a'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'str' object does not support item assignment

Exercise 1.14: String concatenation

Although string data is read-only, you can always reassign a variable to a newly created string.

Try the following statement, which concatenates a new symbol "GOOG" to the end of symbols.

>>> symbols = symbols + 'GOOG'
>>> symbols
'AAPL,IBM,MSFT,YHOO,SCOGOOG'

Oops! That is not what we wanted. Fix it so that the symbols variable holds the value 'AAPL,IBM,MSFT,YHOO,SCO,GOOG'.

>>> symbols = ?
>>> symbols

Add 'HPQ' to the front of the symbols string:

>>> symbols = ?
>>> symbols

In these examples, it might look as if the original string is being modified, in apparent violation of strings being read-only. That is not the case. Each of these operations creates an entirely new string. When the variable name symbols is reassigned, it points to the newly created string. Afterwards, the old string is destroyed because it is no longer being used.
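
You can observe this behavior by keeping a second name that refers to the old string. A small sketch with illustrative names (deliberately not the exercise's symbols variable):

```python
a = 'Hello'
b = a                # b refers to the same string object as a

a = a + ' World'     # creates a new string and rebinds the name a to it

print(a)             # Hello World
print(b)             # Hello (the old string still exists, unchanged)
```

Here the old string survives only because b still refers to it.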

Exercise 1.15: Membership testing (substring testing)

Experiment with the in operator to check for substrings. Try these operations at the interactive prompt.

>>> 'IBM' in symbols
>>> 'AA' in symbols
>>> 'CAT' in symbols

Why did the check for 'AA' return True?

Exercise 1.16: String methods

At the Python interactive prompt, try experimenting with some of the string methods.

>>> symbols.lower()
>>> symbols

Remember that strings are always read-only. If you want to save the result of the operation, you need to put it in a variable.

>>> lowersyms = symbols.lower()

Try more:

>>> symbols.find('MSFT')
>>> symbols[13:17]
>>> symbols = symbols.replace('SCO','DOA')
>>> symbols
>>> name = '   IBM   \n'
>>> name = name.strip()    # Remove surrounding whitespace
>>> name

Exercise 1.17: f-strings

Sometimes you want to create a string and embed the values of other variables in it.

To do this, use f-strings. Example:

>>> name = 'IBM'
>>> shares = 100
>>> price = 91.1
>>> f'{shares} shares of {name} at ${price:0.2f}'
'100 shares of IBM at $91.10'

Modify the mortgage.py program from Exercise 1.10 to create its output using f-strings.

Try to implement it so that the output is well aligned.

Exercise 1.18: Regular expressions

One limitation of the basic string operations is that they do not support any kind of advanced pattern matching. For that, you need to turn to Python's re module and regular expressions. Regular expression handling is a big topic, but here is a short example:

>>> text = 'Today is 3/27/2018. Tomorrow is 3/28/2018.'
>>> # Find all occurrences of a date
>>> import re
>>> re.findall(r'\d+/\d+/\d+', text)
['3/27/2018', '3/28/2018']
>>> # Replace all occurrences of a date with replacement text
>>> re.sub(r'(\d+)/(\d+)/(\d+)', r'\3-\1-\2', text)
'Today is 2018-3-27. Tomorrow is 2018-3-28.'

For more information about the re module, see the official Python documentation.
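
When a pattern contains groups, re.findall() returns one tuple per match with the text of each group. The sketch below uses that to pull the dates apart; the list comprehension and variable names are illustrative additions:

```python
import re

text = 'Today is 3/27/2018. Tomorrow is 3/28/2018.'

# Each parenthesized group captures one piece: month, day, year.
parts = re.findall(r'(\d+)/(\d+)/(\d+)', text)
# [('3', '27', '2018'), ('3', '28', '2018')]

# Rearrange each match into year-month-day order.
reordered = [f'{year}-{month}-{day}' for month, day, year in parts]

print(parts)
print(reordered)
```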


As you start to experiment with the interpreter, you will often want to know more about the operations supported by different objects. For example, how do you find out which operations are available on a string?

Depending on your Python environment, you might be able to see a list of available methods through tab completion. For example, try typing this:

>>> s = 'hello world'
>>> s.<tab key>

If pressing Tab does not do anything, you can fall back on the built-in function dir(). Example:

>>> s = 'hello'
>>> dir(s)
['__add__', '__class__', '__contains__', ..., 'find', 'format',
'index', 'isalnum', 'isalpha', 'isdigit', 'islower', 'isspace',
'istitle', 'isupper', 'join', 'ljust', 'lower', 'lstrip', 'partition',
'replace', 'rfind', 'rindex', 'rjust', 'rpartition', 'rsplit',
'rstrip', 'split', 'splitlines', 'startswith', 'strip', 'swapcase',
'title', 'translate', 'upper', 'zfill']

dir() produces a list of all the operations that can appear after the dot (.).

Use help() to get more information about a specific operation.

>>> help(s.upper)
Help on built-in function upper:

    S.upper() -> string

    Return a copy of the string S converted to uppercase.
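
The list returned by dir() includes many names that begin with underscores. If you only want the ordinary methods, you could filter them out. This is a rough sketch using a list comprehension, which is covered later; the variable names are illustrative:

```python
s = 'hello'

# Keep only the names that do not start with an underscore.
public = [name for name in dir(s) if not name.startswith('_')]

print(public)
print('upper' in public)      # True
print('__add__' in public)    # False
```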

