CSAPP English Learning Series: Chapter 2: data representation

Time: 2021-10-10

Generally speaking, a binary operation is not guaranteed to be associative or distributive, and often only the commutative law can be relied on. The following laws do hold:
XOR is associative: a ^ (b ^ c) = (a ^ b) ^ c
AND and OR distribute over each other: a & (b | c) = (a & b) | (a & c), and a | (b & c) = (a | b) & (a | c)
Multiplication distributes over addition: a * (b + c) = a * b + a * c
We consider the three most important representations of numbers. Unsigned
encodings are based on traditional binary notation, representing numbers greater
than or equal to 0. Two’s-complement encodings are the most common way to
represent signed integers, that is, numbers that may be either positive or negative.
Floating-point encodings are a base-2 version of scientific notation for representing real numbers. Computers implement arithmetic operations, such as addition
and multiplication, with these different representations, similar to the corresponding operations on integers and real numbers.

representation
US [ˌrɛprɪzɛnˈteʃən]
n. Representation; way of expressing information

notation
US [noʊˈteɪʃn]
n. Notation; system of symbols

positive
US [ˈpɑːzətɪv]
adj. Positive; greater than zero

negative
US [ˈneɡətɪv]
adj. Negative; less than zero

scientific
US [ˌsaɪənˈtɪfɪk]
adj. Scientific

arithmetic
US [əˈrɪθmətɪk]
adj. Arithmetic; relating to computation

multiplication
US [ˌmʌltɪplɪˈkeɪʃn]
n. Multiplication

correspond
US [ˌkɔːrəˈspɑːnd]
v. Correspond; be equivalent to; exchange letters
2.1 Information Storage

In subsequent chapters, we will cover how the compiler and run-time system
partitions this memory space into more manageable units to store the different
program objects, that is, program data, instructions, and control information.
Various mechanisms are used to allocate and manage the storage for different
parts of the program. This management is all performed within the virtual address
space. For example, the value of a pointer in C (whether it points to an integer,
a structure, or some other program object) is the virtual address of the first byte
of some block of storage. The C compiler also associates type information with
each pointer, so that it can generate different machine-level code to access the
value stored at the location designated by the pointer depending on the type of
that value. Although the C compiler maintains this type information, the actual
machine-level program it generates has no information about data types. It simply
treats each program object as a block of bytes and the program itself as a sequence
of bytes.

subsequent
US [ˈsʌbsɪkwənt]
adj. Later; following

partition
US [pɑːrˈtɪʃn]
v. Divide into parts; n. Partition, divider

various
US [ˈveriəs]
adj. Various; of different kinds

mechanism
US [ˈmekənɪzəm]
n. Mechanism; method

associate
US [əˈsoʊsieɪt]
v. Associate; connect; relate to

maintain
US [meɪnˈteɪn]
v. Maintain; keep
2.1.1 Hexadecimal Notation

A common task in working with machine-level programs is to manually convert
between decimal, binary, and hexadecimal representations of bit patterns.
Converting between binary and hexadecimal is straightforward, since it can be
performed one hexadecimal digit at a time. Digits can be converted by referring
to a chart such as that shown in Figure 2.2. One simple trick for doing the conversion
in your head is to memorize the decimal equivalents of hex digits A, C, and F.
The hex values B, D, and E can be translated to decimal by computing their values
relative to the first three.

decimal
US [ˈdesɪml]
adj. Decimal; base-10

hexadecimal
US [ˌhɛksəˈdɛsəməl]
adj. Hexadecimal; base-16

pattern
US [ˈpætərn]
n. Pattern; arrangement

straightforward
US [ˌstreɪtˈfɔːrwərd]
adj. Simple and clear

memorize
US [ˈmeməraɪz]
vt. Memorize; commit to memory

equivalent
US [ɪˈkwɪvələnt]
n. Equivalent; counterpart
2.1.2 Data Sizes

Programmers should strive to make their programs portable across different
machines and compilers. One aspect of portability is to make the program insensitive
to the exact sizes of the different data types. The C standards set lower bounds
on the numeric ranges of the different data types, as will be covered later, but there
are no upper bounds (except with the fixed-size types). With 32-bit machines and
32-bit programs being the dominant combination from around 1980 until around
2010, many programs have been written assuming the allocations listed for 32-bit
programs in Figure 2.3. With the transition to 64-bit machines, many hidden
word size dependencies have arisen as bugs in migrating these programs to new
machines. For example, many programmers historically assumed that an object
declared as type int could be used to store a pointer. This works fine for most
32-bit programs, but it leads to problems for 64-bit programs.

strive
US [straɪv]
vi. Strive; make a great effort

aspect
US [ˈæspekt]
n. Aspect; facet

insensitive
US [ɪnˈsɛnsɪtɪv]
adj. Insensitive; unaffected by

bound
US [baʊnd]
n. Bound; limit

dominant
US [ˈdɑːmɪnənt]
adj. Dominant; most common

combination
US [ˌkɑːmbɪˈneɪʃn]
n. Combination

assume
US [əˈsuːm]
v. Assume; suppose

allocation
US [ˌæləˈkeɪʃn]
n. Allocation; assignment

transition
US [trænˈzɪʃn]
n. Transition; change
2.1.3 Addressing and Byte Ordering

At times, however, byte ordering becomes an issue. The first case is when
binary data are communicated over a network between different machines. A
common problem is for data produced by a little-endian machine to be sent to
a big-endian machine, or vice versa, leading to the bytes within the words being
in reverse order for the receiving program.
A second case where byte ordering becomes important is when looking at
the byte sequences representing integer data.
A third case where byte ordering becomes visible is when programs are
written that circumvent the normal type system.

vice
US [vaɪs]
n. Vice; evil (in the passage it appears only in the phrase "vice versa")

versa
US ['vɝsə]
adj. Reversed (used only in the phrase "vice versa": the other way around)

sequence
US [ˈsiːkwəns]
n. Sequence; ordered series

circumvent
US [ˌsɜrkəmˈvent]
vt. Circumvent; get around; bypass
2.1.4 Representing Strings

A string in C is encoded by an array of characters terminated by the null (having
value 0) character. Each character is represented by some standard encoding, with
the most common being the ASCII character code. Thus, if we run our routine
show_bytes with arguments "12345" and 6 (to include the terminating character),
we get the result 31 32 33 34 35 00. Observe that the ASCII code for decimal digit
x happens to be 0x3x, and that the terminating byte has the hex representation
0x00. This same result would be obtained on any system using ASCII as its
character code, independent of the byte ordering and word size conventions. As
a consequence, text data are more platform independent than binary data.

obtained
US [əb'teɪnd]
v. Obtained; got (past participle of obtain)

independent
US [ˌɪndɪˈpendənt]
adj. Independent

convention
US [kənˈvenʃn]
n. Convention; customary practice

consequence
US [ˈkɑːnsɪkwens]
n. Consequence; result
2.1.5 Representing Code

A fundamental concept of computer systems is that a program, from the
perspective of the machine, is simply a sequence of bytes. The machine has no
information about the original source program, except perhaps some auxiliary
tables maintained to aid in debugging. We will see this more clearly when we study
machine-level programming in Chapter 3.

fundamental
US [ˌfʌndəˈmentl]
adj. Fundamental; basic

perspective
US [pərˈspektɪv]
n. Perspective; point of view

auxiliary
US [ɔːɡˈzɪliəri]
adj. Auxiliary; supplementary

maintain
US [meɪnˈteɪn]
v. Maintain; keep
2.1.6 Introduction to Boolean Algebra

Claude Shannon (1916–2001), who later founded the field of information
theory, first made the connection between Boolean algebra and digital logic. In
his 1937 master’s thesis, he showed that Boolean algebra could be applied to the
design and analysis of networks of electromechanical relays. Although computer
technology has advanced considerably since, Boolean algebra still plays a central
role in the design and analysis of digital systems.

algebra
US [ˈældʒɪbrə]
n. Algebra

thesis
US [ˈθiːsɪs]
n. Thesis; dissertation

electromechanical
US [ɪˌlektroʊmə'kænɪkəl]
adj. Electromechanical

considerably
US [kənˈsɪdərəblɪ]
adv. Considerably; greatly
2.1.7 Bit-Level Operations in C

One common use of bit-level operations is to implement masking operations,
where a mask is a bit pattern that indicates a selected set of bits within a word. As
an example, the mask 0xFF (having ones for the least significant 8 bits) indicates
the low-order byte of a word. The bit-level operation x & 0xFF yields a value
consisting of the least significant byte of x, but with all other bytes set to 0. For
example, with x = 0x89ABCDEF, the expression would yield 0x000000EF. The
expression ~0 will yield a mask of all ones, regardless of the size of the data
representation. The same mask can be written 0xFFFFFFFF when data type int is
32 bits, but it would not be as portable.

indicate
US [ˈɪndɪkeɪt]
v. Indicate; point out

significant
US [sɪɡˈnɪfɪkənt]
adj. Significant; important

regardless
US [rɪˈɡɑːrdləs]
adv. Regardless; in any case

representation
US [ˌrɛprɪzɛnˈteʃən]
n. Representation; way of expressing information
2.1.8 Logical Operations in C

A second important distinction between the logical operators ‘&&’ and ‘||’
versus their bit-level counterparts ‘&’ and ‘|’ is that the logical operators do not
evaluate their second argument if the result of the expression can be determined
by evaluating the first argument. Thus, for example, the expression a && 5/a will
never cause a division by zero, and the expression p && *p++ will never cause the
dereferencing of a null pointer.

distinction
US [dɪˈstɪŋkʃn]
n. Distinction; difference

versus
US [ˈvɜːrsəs]
prep. Versus; as opposed to

counterparts
US ['kaʊntəpɑts]
n. Counterparts; things that correspond to one another

evaluate
US [ɪˈvæljueɪt]
v. Evaluate; compute the value of

determine
US [dɪˈtɜːrmɪn]
v. Determine; decide

division
US [dɪˈvɪʒn]
n. Division
2.1.9 Shift Operations in C

The C standards do not precisely define which type of right shift should be
used with signed numbers—either arithmetic or logical shifts may be used. This
unfortunately means that any code assuming one form or the other will potentially
encounter portability problems. In practice, however, almost all compiler/machine
combinations use arithmetic right shifts for signed data, and many programmers
assume this to be the case. For unsigned data, on the other hand, right shifts must
be logical.
(Right shift: an arithmetic right shift fills the vacated bits on the left with copies of the most significant bit; a logical right shift fills them with 0.)

precisely
US [prɪˈsaɪsli]
adv. Precisely; exactly

arithmetic
US [əˈrɪθmətɪk]
adj. Arithmetic

potentially
US [pə'tenʃəli]
adv. Potentially

encounter
US [ɪnˈkaʊntər]
v. Encounter; run into

combinations
US [kɒmbɪ'neɪʃnz]
n. Combinations; pairings
2.2 Integer Representations

Figure 2.8 lists the mathematical terminology we introduce to precisely define
and characterize how computers encode and operate on integer data. This
terminology will be introduced over the course of the presentation. The figure is
included here as a reference.

mathematical
US [ˌmæθəˈmætɪkl]
adj. Mathematical

terminology
US [ˌtɜːrmɪˈnɑːlədʒi]
n. Terminology; technical terms

precisely
US [prɪˈsaɪsli]
adv. Precisely; exactly

characterize
US [ˈkærəktəraɪz]
v. Characterize; describe the nature of

presentation
US [ˌpriːzenˈteɪʃn]
n. Presentation; exposition

reference
US [ˈrefrəns]
n. Reference; something to consult
2.2.1 Integral Data Types

One important feature to note in Figures 2.9 and 2.10 is that the ranges are not
symmetric—the range of negative numbers extends one further than the range of
positive numbers. We will see why this happens when we consider how negative
numbers are represented.

symmetric
US [sɪ'metrɪk]
adj. Symmetric; balanced
2.2.2 Unsigned Encodings

In the figure, we represent each bit position i by a rightward-pointing blue bar of
length 2^i. The numeric value associated with a bit vector then equals the sum of
the lengths of the bars for which the corresponding bit values are 1.

associate
US [əˈsoʊsieɪt]
v. Associate; connect

vector
US [ˈvɛktɚ]
n. Vector

corresponding
US [ˌkɔːrəˈspɑːndɪŋ]
adj. Corresponding; matching
2.2.3 Two’s-Complement Encodings

For some programs, it is essential that data types be encoded using representations with specific sizes.
For example, when writing programs to enable a machine to communicate over the Internet according
to a standard protocol, it is important to have data types compatible with those specified by the protocol.
We have seen that some C data types, especially long, have different ranges on different machines,
and in fact the C standards only specify the minimum ranges for any data type, not the exact ranges.
Although we can choose data types that will be compatible with standard representations on most
machines, there is no guarantee of portability.

essential
US [ɪˈsenʃl]
adj. Essential; necessary

compatible
US [kəmˈpætəbl]
adj. Compatible; able to work together

guarantee
US [ˌɡærənˈtiː]
n. Guarantee; assurance

2.2.4 Conversions between Signed and Unsigned

C allows casting between different numeric data types. For example, suppose
variable x is declared as int and u as unsigned. The expression (unsigned) x
converts the value of x to an unsigned value, and (int) u converts the value of u
to a signed integer. What should be the effect of casting signed value to unsigned,
or vice versa? From a mathematical perspective, one can imagine several different
conventions. Clearly, we want to preserve any value that can be represented in
both forms. On the other hand, converting a negative value to unsigned might yield
zero. Converting an unsigned value that is too large to be represented in
two’s-complement form might yield TMax. For most implementations of C, however,
the answer to this question is based on a bit-level perspective, rather than on a
numeric one.

declared
US [dɪˈklerd]
v. Declared (past participle of declare)

vice versa
US [ˌvaɪs ˈvɜːrsə]
adv. Vice versa; the other way around

perspective
US [pərˈspektɪv]
n. Perspective; point of view

preserve
US [prɪˈzɜːrv]
v. Preserve; keep unchanged
2.2.5 Signed versus Unsigned in C

Some possibly nonintuitive behavior arises due to C’s handling of expressions
containing combinations of signed and unsigned quantities. When an operation
is performed where one operand is signed and the other is unsigned, C
implicitly casts the signed argument to unsigned and performs the operations
assuming the numbers are nonnegative. As we will see, this convention makes
little difference for standard arithmetic operations, but it leads to nonintuitive
results for relational operators such as < and >. Figure 2.19 shows some sample
relational expressions and their resulting evaluations, when data type int has a
32-bit two’s-complement representation. Consider the comparison -1 < 0U. Since
the second operand is unsigned, the first one is implicitly cast to unsigned, and
hence the expression is equivalent to the comparison 4294967295U < 0U (recall
that T2U_w(−1) = UMax_w), which of course is false. The other cases can be
understood by similar analyses.

intuitive
US [ɪnˈtuːɪtɪv]
adj. Intuitive (the passage uses the negated form, nonintuitive)

operand
US [ˈɑpərænd]
n. Operand

implicitly
US [ɪmˈplɪsɪtlɪ]
adv. Implicitly

convention
US [kənˈvenʃn]
n. Convention; customary practice

evaluations
US [ɪvælj'ʊeɪʃnz]
n. Evaluations; computed results

comparison
US [kəmˈpærɪsn]
n. Comparison

equivalent
US [ɪˈkwɪvələnt]
adj. Equivalent; equal
2.2.6 Expanding the Bit Representation of a Number

For converting a two’s-complement number to a larger data type, the rule
is to perform a sign extension, adding copies of the most significant bit to the
representation, expressed by the following principle. We show the sign bit x_{w−1} in
blue to highlight its role in sign extension. When converting from short to unsigned,
the program first changes the size and then the type.

significant
US [sɪɡˈnɪfɪkənt]
adj. Significant; important

principle
US [ˈprɪnsəpl]
n. Principle; rule
2.2.7 Truncating Numbers

Casting x to be short will truncate a 32-bit int to a 16-bit short. As we saw
before, this 16-bit pattern is the two’s-complement representation of −12,345.
When casting this back to int, sign extension will set the high-order 16 bits to
ones, yielding the 32-bit two’s-complement representation of −12,345.

truncate
US [ˈtrʌŋkeɪt]
vt. Truncate; shorten

representation
US [ˌrɛprɪzɛnˈteʃən]
n. Representation

extension
US [ɪkˈstenʃn]
n. Extension

This work is released under a CC license; reprints must credit the author and link to this article.