Relational Databases

ESM 261
Fall 2007
James Frew

[single page] [slide show]

What's a Database

  A collection of data?
  • everything you collected for your group project?

A computer information system?

  • file?
  • spreadsheet?
  • GIS?
  Date's criteria
  • Integration
  • Sharing
  • Persistence
  • Entities and Relationships
  • Properties

What's a Relational Database?

Data are:

[Access example]

What's a Table?

  Table
  • set of rows
    • homogeneous (same tuple-type)
  Row
  • no more than one value per column
  • row(column,...) = tuple(element,...)
    • = record(field,...)
  Column
  • unique (within table) name
  • single domain: all values have same type and constraints

[Access example]

What's a Query

A question about a table, expressed as specific constraints on column values.

A query against one table (the base table) produces another table (the answer table) whose rows and columns "answer the question" (satisfy the query constraints)

[Access example]

Problems with Single-Table Databases

(or, why not just keep everything in Excel?)

 

Redundant data

  • attribute(s) applied to a group of rows
    must be repeated in each of the rows

Typographical errors

  • redundant values increase probability of transcription error
 

Updating data

  • if redundant values
    then must change all of them

Modifying data

  • insertion anomaly
    • can't insert partial rows
  • deletion anomaly
    • must delete whole row

[Access example]

Multiple Table Databases

 

Eliminate redundancy
by factoring single table into multiple tables

  • Each table = single kind of thing
    • (class of object)
  • Each row = single thing
    • (object)
 

Preserve relationships
by references between tables

  • Replace redundant values
    with reference to unique values
    • (key)
[Access example] [Access example]

Keys

A key uniquely identifies, and can therefore be used as a reference to, a single row

 

Primary key: column(s) whose values uniquely identify a row

  • Data values that are naturally unique
    • may be more than 1 column
  • Arbitrary value
    • e.g. synthesized by the DBMS
      • e.g. Access AutoNumber
 

Foreign key: reference to another row's primary key

  • Foreign keys are how databases maintain explicit relationships between rows, within or between tables.

[Access example]

Relationships

 

One-to-One

  • E.g. person A has 1 spouse B
  • are A and B really different aspects of the same thing?
    • yes: merge into single table
    • no: like one-to-many

One-to-Many

  • E.g. person A has several children B
  • Most common relationship
    • table B has foreign keys into table A
 

Many-to-Many

  • E.g.
    • student A takes several classe B
    • class B has several students A
  • Needs a third table
    • table C has foreign keys into tables A and B

Database Design

Develop a logical data model, and then translate it into a physical data model.

Entity-Relationship (ER) Models

 

Standard way to represent logical data models

  • Entities: kinds of things
  • Attributes: facts about things
    • "has-a"
  • Relationships: connections between kinds of things
    • "is-a", "is-part-of", "is-a-...-of"
 

Graphical representation

  • Entity
    • node (e.g. box)
  • Attribute
    • label (e.g. text in box)
  • Relationship
    • arc (e.g. line)
    • cardinality (e.g. "crow's foot")

NB: Access "relationships" tool is a physical data model, not an ER model

From an ER Model to a Relational Model

  1. Create 1 table per entity.
  2. If parent ("1:") entity then
    1. create single-column primary key.
  3. Else if child (":n") entity then
    1. Create foreign keys for primary key of each parent.

[Access example]

What is Normalization?

Lossless transformation of a relational data model into one with:

I.e.: the rules for factoring single tables into multiple tables.

First Normal Form

Indivisible attributes

Second Normal Form

Non-key attributes functionally dependent on primary key

Third Normal Form

No transitive dependencies between attributes

Normalization: Summary

Project original relation to eliminate:

  1. Multi-valued attributes
  2. Functional dependencies on non-key attributes
  3. Transitive functional dependencies between attributes

I.e.: “Eliminate all functional dependencies in which the determinant is not a candidate key”

Payback: robust data model

What's a Join?

Used in 2 senses:

Implementing Relationships with Joins

 

One-to-one

  • foreign key with no duplicates
  • [Access example]

One-to-many

 

Many-to-many

  • third table
    • two foreign keys
      • compound primary key
    • NB: the kind of thing is relationship
  • [Access example]

Add Rows, Not Columns

General rule:

Data Integrity

Use the database to help ensure your data is correct and consistent.

Data Integrity Mechanisms

Lookup text

Validation rules

Referential integrity

There's More Than One Way To Join

Inner Join

Outer (Right, Left) Join

[Access example]

Nulls: Threat, or Menace?

 

NULL: special value that means "unknown"

  • NULL <> anything, even another NULL
    • A = B is false if A or B or both are NULL
    • can't join on a NULL foreign key
  • Ignored by aggregate functions
    • SUM, AVG, etc.
  • Test using IsNull(value) function
    • true if value is NULL
  NULL vs. zero-length string: unknown vs. known
  • E.g. a fax number
    • NULL: don't know if there is or isn't one
    • zero-length string: know there isn't one

[Access example]

"Null" Could Mean:

 

Unknown

  • Entity exists, has attribute, don't know value
    • e.g.: unconscious ER patient: name?

Not applicable

  • Entity exists, doesn't have attribute
    • e.g.: male patient: pregnant?

Missing

  • Entity missing, attribute exists, has value
    • e.g.: patient has no wallet: bank balance?
  Unclassified
  • Entity exists, has attribute, value unrepresentable
    • e.g.: patient is green with purple spots: disease code?

Erroneous

  • Entity exists, has attribute, value illegal
    • e.g.: patient's temperature is 100 C

(examples from: Data and Databases: Concepts and Practice, by Joe Celko. ISBN 1-55860-432-4)

Readings and Examples

  recommended   optional