Relational Databases
[single page] [slide show]
What's a Database
| |
A collection of data?
- everything you collected for your group project?
A computer information system?
|
|
Date's criteria
- Integration
- Sharing
- Persistence
- Entities and Relationships
- Properties
|
What's a Relational Database?
Data are:
- organized as tables
- retrievable through queries
[Access example]
What's a Table?
| |
Table
- set of rows
- homogeneous (same tuple-type)
|
|
Row
- no more than one value per column
- row(column,...) = tuple(element,...)
|
|
Column
- unique (within table) name
- single domain: all values have same type and constraints
|
[Access example]
What's a Query
A question about a table, expressed as specific constraints on column
values.
- I.e. change questions like:
- "Which employees are paid more than $50,000?"
- into directives like:
- "Get the employee names from the rows where the salaries are
greater than 50000"
A query against one table (the base table) produces another table (the
answer table) whose rows and columns "answer the question" (satisfy
the query constraints)
[Access example]
Problems with Single-Table Databases
(or, why not just keep everything in Excel?)
| |
Redundant data
- attribute(s) applied to a group of rows
must be repeated in each of the rows
Typographical errors
- redundant values increase probability of transcription error
|
|
Updating data
- if redundant values
then must change all of them
Modifying data
- insertion anomaly
- can't insert partial rows
- deletion anomaly
|
[Access example]
Multiple Table Databases
| |
Eliminate redundancy
by factoring single table into multiple tables
- Each table = single kind of thing
- Each row = single thing
|
|
Preserve relationships
by references between tables
- Replace redundant values
with reference to unique values
|
| [Access example] |
[Access example] |
Keys
A key uniquely identifies, and can therefore be used as a reference
to, a single row
| |
Primary key: column(s) whose values uniquely
identify a row
- Data values that are naturally unique
- may be more than 1 column
- Arbitrary value
- e.g. synthesized by the DBMS
|
|
Foreign key: reference to another row's primary
key
- Foreign keys are how databases maintain explicit relationships
between rows, within or between tables.
|
[Access example]
Relationships
| |
One-to-One
- E.g. person A has 1 spouse B
- are A and B really different aspects of the same thing?
- yes: merge into single table
- no: like one-to-many
One-to-Many
- E.g. person A has several children B
- Most common relationship
- table B has foreign keys into table A
|
|
Many-to-Many
- E.g.
- student A takes several classe B
- class B has several students A
- Needs a third table
- table C has foreign keys into tables A and B
|
Database Design
Develop a logical data model, and then translate it into a physical data
model.
- Logical: database-independent
- entities and attributes
- relationships
- Physical: database-dependent
- tables and columns
- foreign keys
Entity-Relationship (ER) Models
| |
Standard way to represent logical data models
- Entities: kinds of things
- Attributes: facts about things
- Relationships: connections between kinds of things
- "is-a", "is-part-of", "is-a-...-of"
|
|
Graphical representation
- Entity
- Attribute
- Relationship
- arc (e.g. line)
- cardinality (e.g. "crow's foot")
|
NB: Access "relationships" tool is a physical data
model, not an ER model
From an ER Model to a Relational Model
- Create 1 table per entity.
- If parent ("1:") entity then
- create single-column primary key.
- Else if child (":n") entity then
- Create foreign keys for primary key of each parent.
[Access example]
What is Normalization?
Lossless transformation of a relational data model into one with:
- Less redundancy
- Ideal: "one fact on one place"
- Fewer functional dependencies
- Ideal: "Each attribute must represent a fact about the key,
the whole key, and nothing but the key."
- Date (1995), after Kent (1983)
I.e.: the rules for factoring single tables into multiple tables.
First Normal Form
Indivisible attributes
- Indivisible: atomic, scalar
- attribute's internal structure is opaque to the database
- I.e.: no multi-valued attributes
- I.e.: any legal relation is already in 1NF
- Fix by projecting to multiple tables
- Project: extract a subset of a table's columns:
- Old
dingus(dingusID, color, {option, ...})
- New
dingus(dingusID, color)
option(optionID, option, dingusID)
Second Normal Form
Non-key attributes functionally dependent on primary key
- Functional dependence
- B = f(A) means A determines B (usually written: A → B)
- I.e.: non-key attributes must pertain to the table's entity, not some
other entity
- Fix by projecting functional dependencies into separate tables:
- Old
dingus(dingusID, quantity, vendor, city)
- New
dingus(dingusID, quantity, vendorID)
vendor(vendorID, vendor, city)
Third Normal Form
No transitive dependencies between attributes
- Transitive dependency
- If A -> B and B -> C,then may (incorrectly) assume A -> C
- I.e.: no functional dependencies on non-key attributes
- Fix by projecting non-key determinant to new relation
- E.g.: a publisher whose price is a function of #pages
- Old
book(title, publisher, pages, price)
- New
book(title, publisher, pages)
price(publisher, pages, price)
Normalization: Summary
Project original relation to eliminate:
- Multi-valued attributes
- Functional dependencies on non-key attributes
- Transitive functional dependencies between attributes
I.e.: Eliminate all functional dependencies in which the determinant
is not a candidate key
- Date (1995)
- Candidate key: attribute(s) that could be a primary key
Payback: robust data model
- Table design prevents inconsistencies and anomalies
What's a Join?
Used in 2 senses:
- Relationship between two tables (i.e. foreign key → primary key)
- Access: drawing a connection in Tools|Relationships
- Combination of two or more tables in a query
- Access: checking fields from two or more tables in Query Design
view
Implementing Relationships with Joins
| |
One-to-one
- foreign key with no duplicates
- [Access example]
One-to-many
|
|
Many-to-many
- third table
- two foreign keys
- NB: the kind of thing is relationship
- [Access example]
|
Add Rows, Not Columns
General rule:
- If the activity the database is designed for requires adding
or deleting columns,
then you haven't gotten the relationships right.
- Don't use # columns to represent # things
- Don't use columns to represent values
- [Access example]
Data Integrity
Use the database to help ensure your data is correct and consistent.
- Domain error
- constrain data to subset of a built-in type's possible values
- by formula
- by enumeration
- Inter-column inconsistencies
- enforce relationships between values in different columns
- Inter-table inconsistencies
- enforce relationships between values in different tables
Data Integrity Mechanisms
Lookup text
- Force column values to be foreign keys into "domain table" of
all possible values
Validation rules
- Specify functional or logical relationships between columns
Referential integrity
- Foreign/primary key relationship
There's More Than One Way To Join
Inner Join
- Only rows where TableA.PrimaryKey = TableB.ForeignKey
Outer (Right, Left) Join
- Plus all rows from right/left table
- NULLs where keys don't match
[Access example]
Nulls: Threat, or Menace?
| |
NULL: special value that means "unknown"
NULL <> anything, even another NULL
A = B is false if A or B or
both are NULL
- can't join on a
NULL foreign key
- Ignored by aggregate functions
- Test using
IsNull(value) function
|
|
NULL vs. zero-length string: unknown vs.
known
- E.g. a fax number
NULL: don't know if there is or isn't one
- zero-length string: know there isn't one
|
[Access example]
"Null" Could Mean:
| |
Unknown
- Entity exists, has attribute, don't know value
- e.g.: unconscious ER patient: name?
Not applicable
- Entity exists, doesn't have attribute
- e.g.: male patient: pregnant?
Missing
- Entity missing, attribute exists, has value
- e.g.: patient has no wallet: bank balance?
|
|
Unclassified
- Entity exists, has attribute, value unrepresentable
- e.g.: patient is green with purple spots: disease code?
Erroneous
- Entity exists, has attribute, value illegal
- e.g.: patient's temperature is 100 C
(examples from: Data and Databases: Concepts and Practice,
by Joe Celko. ISBN 1-55860-432-4) |
Readings and Examples