Bits, Bytes, and Fundamental Types

ESM 261
Fall 2007
James Frew


Basics: Bits and Bytes

 

Bit: binary (0|1) digit

Byte: 8 "contiguous" bits

  • smallest addressable storage unit
  • holds 1 ISO 8859-1 character
  • portable across different hardware
 

"Word": {2,4,8,…} "contiguous" bytes

  • NOT necessarily portable
    • e.g. byte order may vary
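
A minimal sketch of the byte-order problem, using Python's standard `struct` module. The value `0x0A0B0C0D` is an arbitrary illustration, not from the lecture.

```python
import struct

value = 0x0A0B0C0D  # an arbitrary 4-byte "word"

big = struct.pack(">I", value)     # most significant byte first
little = struct.pack("<I", value)  # least significant byte first

print(big.hex())     # 0a0b0c0d
print(little.hex())  # 0d0c0b0a

# Reading bytes with the wrong byte order silently yields a different number.
misread = struct.unpack("<I", big)[0]
print(hex(misread))  # 0xd0c0b0a
```

The individual bytes survive the trip intact; only their order within the word is ambiguous.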

Numbers

 

Integer

  • signed
  • unsigned
 

Real

  • fixed-point
  • scaled
  • floating-point

Binary Integers

 

For n bits:

  • unsigned: $0 \le i \le 2^n - 1$
  • signed: $-2^{n-1} \le i \le 2^{n-1} - 1$
 

Important values of $2^n$:

$2^{8}$ = 256
$2^{16}$ = 65,536
$2^{32}$ = 4,294,967,296
$2^{64}$ = 18,446,744,073,709,551,616
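
The ranges above can be checked with a few lines of Python. This sketch assumes the signed range refers to two's-complement storage, which matches the limits given; the function names are illustrative.

```python
def unsigned_range(n):
    """Return (min, max) for an n-bit unsigned integer."""
    return 0, 2**n - 1


def signed_range(n):
    """Return (min, max) for an n-bit two's-complement signed integer."""
    return -(2 ** (n - 1)), 2 ** (n - 1) - 1


for n in (8, 16, 32, 64):
    print(n, unsigned_range(n), signed_range(n))
```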

Integer Overflow

Wraparound

Mistake signed for unsigned → "false contour":

[Figure: two panels, labeled "unsigned" and "signed"]
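
A rough sketch of both effects for 8-bit values. Python integers never overflow by themselves, so the wraparound is simulated here by masking; the helper names are hypothetical, and two's-complement storage is assumed.

```python
def wrap_unsigned(i, n=8):
    """Keep only the low n bits, as fixed-width unsigned storage would."""
    return i & (2**n - 1)


def as_signed(u, n=8):
    """Reinterpret an n-bit unsigned pattern as two's-complement signed."""
    return u - 2**n if u >= 2 ** (n - 1) else u


print(wrap_unsigned(255 + 1))  # 0
print(as_signed(127 + 1))      # -128

# A smooth ramp of unsigned values, misread as signed, jumps abruptly.
print([as_signed(u) for u in (126, 127, 128, 129)])  # [126, 127, -128, -127]
```

The jump from 127 to -128 in the last line is presumably the kind of discontinuity that shows up as a "false contour" when image data is read with the wrong signedness.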
 

Storing Fractions in Integers

Fixed point integers

Scaled integers
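
The slide gives only the two names, so the following is one common reading rather than the lecture's own definition: a fixed-point integer has an implied binary point at a fixed bit position, while a scaled integer is converted with an explicit scale and offset. All constants and function names are hypothetical.

```python
FRACTION_BITS = 8  # hypothetical format: 8 bits after the binary point


def to_fixed(x):
    """Store x as an integer count of 1/2**FRACTION_BITS steps."""
    return round(x * 2**FRACTION_BITS)


def from_fixed(i):
    return i / 2**FRACTION_BITS


SCALE, OFFSET = 0.01, -50.0  # hypothetical: steps of 0.01, starting at -50


def to_scaled(x):
    """Store x as an integer count of SCALE-sized steps above OFFSET."""
    return round((x - OFFSET) / SCALE)


def from_scaled(i):
    return i * SCALE + OFFSET


print(to_fixed(3.14159), from_fixed(to_fixed(3.14159)))  # 804 3.140625
print(to_scaled(21.37), from_scaled(to_scaled(21.37)))
```

In both cases the stored integer is exact, but the value it stands for is rounded to the nearest step.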

Floating-Point Numbers

$(-1)^{\text{signbit}} \times 2^{(\text{exponent} - E)} \times (1 + \text{fraction} \times 2^{-F})$

 

Exponent

  • 8 bit: (1..254) - E, with E = 127
    • $2^{-126}$..$2^{127}$
  • 11 bit: (1..2046) - E, with E = 1023
    • $2^{-1022}$..$2^{1023}$
 

Fraction

  • 23 bit (F:23) or 52 bit (F:52)
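
As a sketch, the three fields of a 32-bit float can be pulled apart with `struct` and fed back through the formula. The function names are illustrative, and `evaluate` covers only normal numbers (exponent field 1..254).

```python
import struct


def decode_float32(x):
    """Split x (stored as a 32-bit float) into sign bit, exponent, fraction."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    signbit = bits >> 31
    exponent = (bits >> 23) & 0xFF
    fraction = bits & 0x7FFFFF
    return signbit, exponent, fraction


def evaluate(signbit, exponent, fraction, E=127, F=23):
    """Apply the formula to the raw fields of a normal number."""
    return (-1) ** signbit * 2.0 ** (exponent - E) * (1 + fraction * 2.0**-F)


s, e, f = decode_float32(0.15625)
print(s, e, f)            # 0 124 2097152
print(evaluate(s, e, f))  # 0.15625
```

Here 0.15625 = 1.25 × 2^-3, so the stored exponent is 127 - 3 = 124 and the fraction encodes the 0.25.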

Floating-Point Numbers

$(-1)^{\text{signbit}} \times 2^{(\text{exponent} - E)} \times (1 + \text{fraction} \times 2^{-F})$

 

Special values of (exponent, fraction)

  • (max, 0): INF: infinity
    • e.g. 1/0
  • (max, !0): NaN: not a number
    • e.g. √(-1)
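
A small sketch that builds the two special values directly from their bit fields, assuming the 32-bit layout (1 sign bit, 8 exponent bits, 23 fraction bits). The helper name is hypothetical.

```python
import math
import struct


def float32_from_fields(signbit, exponent, fraction):
    """Assemble a 32-bit float from raw field values."""
    bits = (signbit << 31) | (exponent << 23) | fraction
    return struct.unpack(">f", struct.pack(">I", bits))[0]


inf = float32_from_fields(0, 255, 0)        # (max, 0)
nan = float32_from_fields(0, 255, 1 << 22)  # (max, !0)

print(inf, math.isinf(inf))  # inf True
print(nan, math.isnan(nan))  # nan True
print(nan == nan)            # False: NaN compares unequal even to itself
```

Note that Python itself raises `ZeroDivisionError` for `1/0` instead of returning INF, so the values are constructed from bits here.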
 

Decimal ranges

  • 32-bit: $10^{\pm 38}$
  • 64-bit: $10^{\pm 308}$

NB: quantization is nonlinear:

  • each binary order of magnitude (power-of-2 interval) gets $2^F$ counts
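
One way to see the nonlinear quantization is to measure the gap between a 32-bit float and its nearest larger neighbor. This sketch relies on the fact that, for positive finite floats, adding 1 to the bit pattern gives the next representable value; the function name is illustrative.

```python
import struct


def next_float32(x):
    """Return the next representable 32-bit float above a positive x."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    return struct.unpack(">f", struct.pack(">I", bits + 1))[0]


for x in (1.0, 2.0, 1024.0, 1048576.0):
    print(x, next_float32(x) - x)
```

The gap doubles each time the value crosses a power of 2: it is 2^-23 at 1.0 but already 0.125 at 1048576.0.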


Numeric Type Tradeoffs

 

Integer

  • (can be) compact
  • portable
    • (except byte order)
  • exact representation
  • uniform quantization
  • (unsigned) directly displayable as pixels
 

Floating-point

  • automatic scaling
  • widest range of values
  • well-defined arithmetic
    • over/underflow
    • singular values
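
A few of these tradeoffs, sketched with Python's built-in types. Python floats are 64-bit IEEE values; Python integers are arbitrary-precision, so they stand in here only for the "exact representation" point.

```python
import math

# Integers are exact; floats run out of precision past 2**53.
print(2**53 + 1)                  # 9007199254740993
print(float(2**53 + 1) == 2**53)  # True: the +1 is lost in conversion

# Many decimal fractions have no exact binary floating-point form.
print(0.1 + 0.2 == 0.3)           # False

# Overflow and underflow are well defined rather than wrapping around.
overflowed = 1e308 * 10
underflowed = 1e-308 / 1e20
print(overflowed, underflowed)    # inf 0.0
```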

Text

 

Character

  • ISO 8859-1
    • 8 bits/character
      • 256 possible characters
    • encodes Latin alphabet
      • e.g. works for French, but not for Russian
    • most widely supported encoding in US
  • Unicode
    • 8..32 bits/character, depending on the encoding (UTF-8, UTF-16, UTF-32)
      • 1,114,112 possible code points (U+0000..U+10FFFF)
    • encodes (potentially) all human language characters
      • (and even some nonhuman ones...)
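
As an illustration, the same character occupies a different number of bytes under different encodings. The sample characters are arbitrary choices; the codec names are Python's standard ones.

```python
text = "é"  # U+00E9, present in both ISO 8859-1 and Unicode

print(text.encode("latin-1").hex())    # e9
print(text.encode("utf-8").hex())      # c3a9
print(text.encode("utf-16-be").hex())  # 00e9
print(text.encode("utf-32-be").hex())  # 000000e9

russian = "Я"  # U+042F, a Cyrillic letter
print(russian.encode("utf-8").hex())   # d0af
try:
    russian.encode("latin-1")
except UnicodeEncodeError:
    print("not representable in ISO 8859-1")
```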
 

String: sequence of characters

  • portable, if you also know:
    • order
    • length, from one of:
      • count ("here come N characters")
      • delimiter (end-of-string character)
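
The two ways of conveying length might look like this. It is a sketch only: the 1-byte count and the zero-byte delimiter are assumptions chosen for brevity, and the function names are hypothetical.

```python
def pack_counted(s):
    """Prefix the bytes with a 1-byte count (so at most 255 characters)."""
    data = s.encode("latin-1")
    return bytes([len(data)]) + data


def unpack_counted(buf):
    n = buf[0]
    return buf[1 : 1 + n].decode("latin-1")


def pack_delimited(s, delimiter=b"\x00"):
    """Append an end-of-string byte; the string itself must not contain it."""
    return s.encode("latin-1") + delimiter


def unpack_delimited(buf, delimiter=b"\x00"):
    return buf[: buf.index(delimiter)].decode("latin-1")


print(pack_counted("frew"))    # b'\x04frew'
print(pack_delimited("frew"))  # b'frew\x00'
```

A count limits the maximum length; a delimiter removes one character from the set the string may contain.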

"Printable" Text

Subset of possible 1-byte characters

Most portable type

"Binary Text"

 

Bitwise conversion: bytes ↔ text

  • E.g.: 4 bits ↔ hexadecimal character
    • 0000..1001 ↔ 0..9
    • 1010..1111 ↔ A..F
 

Most portable "byte stream"

  • inflation: byte becomes >1 character
    • less if larger radix
      • hex → 2x
      • base-64 (e.g. uuencode) → ≈1.33x (3 bytes become 4 characters)
  • need printable chars for delimiters
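
The inflation can be measured with Python's standard `binascii` and `base64` modules. The input is an arbitrary byte pattern, and the base-64 variant used here is the RFC 4648 alphabet rather than uuencode's.

```python
import base64
import binascii

data = bytes(range(256)) * 3  # 768 arbitrary bytes, many not printable

hex_text = binascii.hexlify(data)
b64_text = base64.b64encode(data)

print(len(hex_text) / len(data))  # 2.0
print(len(b64_text) / len(data))  # 1.3333333333333333

# Both conversions are lossless.
assert binascii.unhexlify(hex_text) == data
assert base64.b64decode(b64_text) == data
```

Each hex character carries 4 bits and each base-64 character carries 6, hence 8/4 = 2 and 8/6 ≈ 1.33 characters per byte.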

Same Bits, Different Types

 

binary

  • 11000000010010010000111111011011

hexadecimal

  • C0490FDB

ISO Latin-1

  • À I control-O Û
 

unsigned integer

  • 3,226,013,659

signed integer

  • -1,068,953,637

IEEE floating-point

  • -3.1415927
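
These interpretations can be reproduced with `struct`. The sketch assumes the four bytes are in big-endian order, which the hexadecimal form implies.

```python
import struct

raw = bytes.fromhex("C0490FDB")

print("".join(f"{b:08b}" for b in raw))  # 11000000010010010000111111011011
print(repr(raw.decode("latin-1")))       # 'ÀI\x0fÛ'
print(struct.unpack(">I", raw)[0])       # 3226013659
print(struct.unpack(">i", raw)[0])       # -1068953637
print(struct.unpack(">f", raw)[0])       # approximately -3.1415927
```

Nothing in the bytes themselves says which reading is intended; the type has to be known from somewhere else.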

Floating-Point Example

decimal -3.1415927

binary 1 10000000 10010010000111111011011

Reading

Wikipedia

optional

The Data Handbook, chapters 1 through 6 (pp. 14-81)
(basically a verbose version of this lecture)

David Goldberg, "What every computer scientist should know about floating-point arithmetic", ACM Computing Surveys 23:1, 5-48 (March 1991).
If you're really interested in how floating-point arithmetic works, this is the place to start. It's easier going than the title implies ...