http://handoutsuite.googlepages.com

Functional Dependencies & Normalization

Better Design Guidelines:

3.0 Introduction to Normalization

Take care of the meaning of a relation schema. Do not combine attributes from multiple entity types and relationship types into a single relation.

The basic objective of logical modeling is to develop a good description of the data, its relationships and its constraints.

employee (name, roomNr) room (roomNr, telNr)

Assume we have an entity-relationship diagram and we derive a database schema from it. Is the database schema good?

Approach: Decomposition of relations To get the phone number of Noha, perform a join between employee and room.

m

Question: When is a database schema good?

The join is lossless: No loss of information.

ge

s.

Update Anomaly (II)

og

le

pa

Assume we have the following schema employee_department (name, ssn, bdate, address, dNr, dname, dMangName) Suppose, we wish to change the manager of a department. We have to update the tuples of all employees who work in the department.

go

3.0.1 Why we need normalization?

it

e.

Three anomalies and illustrative examples

ut

su

Update Anomaly (I) Assume we have the following schema:

do

employee (name, roomNr, telNr)

co

Normalization Theory The purpose of normalization is to achieve a design that - is highly flexible - reduces redundancy - is free of update, insertion and deletion anomalies

an

Rule: The phone number is associated with room number and not with the name of the person.

If you do not update all records, it results in an inconsistent state in the database.

It would be impossible to tell the true manager for specific employees. This is an Update Anomaly.

/h

Insertion Anomaly

:/

Amira has moved from Room 3.301 to Room 2.205

ht

tp

UPDATE employee SET roomNr = 2.205 WHERE roomNr = 3.301 AND name = 'Amira' ('Amira',2.205,230) is wrong, because telNr 230 is associated with roomNr 3.301. Problem: Relation attributes.

DBMS – Module 3

employee

has

too

many

Assume we have the following schema employee_department(name, ssn, bdate, address, dNr, dname, dMangName) An Insertion Anomaly occurs when we try to insert a new department that has no employee yet. Since the key is ssn, we are not permitted to insert a record with a null value for ssn.

1

http://handoutsuite.googlepages.com

Functional Dependencies & Normalization

Deletion Anomaly

Assume we have the following schema employee_department (name, ssn, bdate, address, dNr, dname, dMangName) When we delete the last employee working for a particular department, a deletion anomaly occurs. We would lose all information about the department. Similarly, if a department will be closed, we lose all information about the employees working in this department.

Answer: Use the store relation. Problem: The meaning of the designs is not the

m

same.

co

Shortly we can keep these points in mind 3.0.2 Informal Design Guidelines

s.

Better Design Guidelines:

ge

Design the relation schema so that no insertion, deletion, or modification anomalies are present in the relations.

le

pa

1. Semantics of the Relation attributes - Whenever we group attributes to form a relation schema, we assume that a certain meaning is associated with the attributes. Design a relation schema so that it is easy to explain its meaning. Do not combine attributes from multiple entity types and relationship types into a single relation.

it

e.

The guidelines may sometimes have to be violated in order to improve the performance of certain queries.

go

og

employee (name, ssn, bdate, address, dNr) department (dname, dNr, dMangName)

do

ut

su

Example: Retrieve information concerning the department of an employee along with employee attributes.

/h

an

Anomalies must be noted and accounted for by using triggers or procedures that would make automatic update: To avoid inconsistencies.

tp

:/

Problems of Decomposition Join with Loss of Information

ht

store (dept, items, color}

We decompose the relation into two relations: simple projection of store: r1 (dept, item) r2 (item, color) Query: Find a department that does not sell red items

DBMS – Module 3

2. Reducing the Redundant information in Tuples and Update Anomalies Design the relation schema so that no insertion, deletion, or modification anomalies are present in the relations. 3. Reducing the Null values in Tuples - Try to avid placing attributes in a base relation whose values may frequently be null. If nulls are un avoidable, make sure that they apply in exceptional cases only and do not apply to majority of Tuples in the relation. 4. Generation of Spurious (fake) Tuples Design relation schema so that they can be JOINed with equality conditions on attributes that are either primary keys or foreign keys in a way that guarantees that no spurious Tuples are generated. Do not have relations that contain matching attributes other than foreign key- primary key combinations. If such relations are unavoidable, do not join them on such attributes. 2

Functional Dependencies & Normalization

http://handoutsuite.googlepages.com

3.0.3 Brief History of Normalization

More formally, suppose we have a relation R. A functional dependency A Æ B between two sets of attributes A and B that are subsets of R holds if For every pair of Tuples t1, t2 ε R that have t1.A = t2.A they must also have t1.B = t2.B

Research into anomalies was first done by Codd. Codd identified the first three normal forms. A relation is in a specific normal form if it satisfies the set of constraints for that form.

A functional dependency is an integrity constraint that every instance of the database must obey.

The constraints are schema constraints. They are properties of the intension not just of a particular extension.

m

Note that A and B can be sets that consist of a single attribute. Functional dependency is a generalization of the notion of a key. If A Æ B is a functional dependency, we will call A determinant and B dependent.

s.

co

Research by Boyce and Codd led to a refinement of the third form.

ge

Research by Fagin, Zaniolo and Delobel resulted in the definition of three new normal forms.

pa

Functional Dependencies - Example

og go

it

e.

The design objective is to put the schema in the highest normal form that is practical and appropriate for the data in the database.

le

All of the normal forms are nested in that each satisfies the constraints of the previous one.

Functional dependencies with a single attribute on the left-hand side: AÆB

3.1

CÆB

an

do

ut

su

In attempting to pinpoint the causes of anomalies, researchers have identified three kinds of dependencies: • Functional Dependencies • Multivalued Dependencies • Join Dependencies.

:/

/h

Functional Dependencies

ht

tp

A Functional dependency describes a relationship between attributes. Definition: If A and B are attributes of a relation R, then B is functionally dependent on A (denoted by A Æ B) if, and only if, each value of A in R is associated with exactly one value of B in R. ƒ ƒ

B

A

Functional dependencies with two attributes on the left-hand side: A,C Æ B B,C

A

The definition says that if two tuples in an extension R have the same value of A, they must have the same values for B. Thus, functional dependency requires that the value of a certain set of attributes determines uniquely the value of another set of attributes.

DBMS – Module 3

3

http://handoutsuite.googlepages.com

Functional Dependencies & Normalization

Techniques to Achieve First Normal Form

3.2 Normal Forms ( 1NF, 2NF, 3NF, BCNF) 3.2.1 First Normal Form (1NF) Assume that a department may exist in different locations.

Remove the attribute dlocation that violates 1NF and place it in a separate relation deptLoc with the primary key dnumber

ge

s.

co

m

If we will try to store multiple locations in the same field of the department record, our table might appear as follows:

le

pa

Expand the key so that there will be a separate tuple in the original relation for each location. Disadvantage: Redundancy

su

go

it

e.

Definition: A relation R is in first normal form (1NF) if the domains of all attributes of R are atomic, i.e. every attribute is single-valued for each tuple.

og

This example violates the definition of first normal form.

do

ut

Each attribute in each row, or each cell of the table, contains only one value.

If a maximum number of values is known, replace the attribute by a maximum number of attributes. Disadvantage: Null Values

/h

an

No sets, lists, repeating fields or groups are allowed in the domain.

:/

Why is First Normal Form Important?

ht

tp

In order that the relational operators will work correctly. Example 1: If we perform the following query SELECT dname FROM department WHERE dlocation = 'Cairo' Should the record Research be included? Example 2: If we were to do a natural join with some other table using dlocation as the joining column, would this record be paired with only those having Cairo, Alex and Sharm as dlocation, or with those having either Cairo or Alex or Sharm. DBMS – Module 3

4

http://handoutsuite.googlepages.com

Functional Dependencies & Normalization Full Functional Dependency and Second Normal Form

Assume we dependencies

have

the

following

Techniques to Achieve Second Normal Form

{ssn, dnumber} Æ hours ... ssn Æ ename dnumber Æ {dname, dlocation}

functional

Decompose the relation schema into a number of 2NF relations in which nonprime attributes will be associated only with part of the primary key on which they are fully functionally dependent.

s.

co

m

{ssn, dnumber} Æ hours {ssn, dnumber} Æ ename ... {ssn, dnumber} Æ dolocation ssn Æ ename dnumber Æ {dname, dlocation}

pa

le

og

go

it

Second Normal Form

su

3.2.2

Make sure to keep a relation with the original key and any attributes that are fully functionally dependent on it.

e.

We say that such attributes are not fully functionally dependent on the combination.

ge

We find attributes that are functionally dependent on the combination {ssn, dnumber} but also functionally dependent on a subset of that combination.

do

ut

A functional dependency X Æ Y is a full functional dependency if removal of any attribute A from X means that the dependency does not hold anymore.

:/

/h

an

An attribute of a relation schema R is a prime attribute of R if it is a member of some candidate key of R.

ht

tp

A relation R is in second normal form (2NF) if every nonprime attribute A in R is fully functionally dependent on the primary key of R.

Decomposition from 1NF to 2NF

Given the relation R(A,B,C,D) where (A, B) is the primary key and A Æ D is a functional dependency. Decomposition (lossless) into: R1 (A, D) with a primary key A. R2 (A, B, C) with a primary key (A,B) and foreign key A.

The test for 2NF involves testing for functional dependencies whose left-hand side attributes are part of the primary key. If the primary key contains a single attribute, the tests need not to be applied at all.

DBMS – Module 3

5

http://handoutsuite.googlepages.com

Functional Dependencies & Normalization Transitive Dependency and Third Normal Form

You can remember the characteristics of third normal form by saying that each non-key attribute must depend on the key, the whole key and nothing but the key. Techniques to Achieve Third Normal Form

Primary key is sid (only candidate key): sid Æ credits

because

s.

Assume that number of credits determines status of a student:

ge

ssn Æ dnumber dnumber Æ dmangssn

pa

credits Æ status

The dependency of dmangssn on dnumber is undesirable since dnumber is not a key of the relation.

og

go

The same holds for the functional dependency dnumber Æ dname.

e.

sid Æ credits ^ credits Æ status ) sid Æ status

ut

su

it

We cannot insert the information that any student with 30 credits has sophomore status until we have such student. If we delete the record of the only student with a certain number of credits, we loose the information about the status associated with those credits

Decompose and set up a relation that includes the nonkey attribute(s) that functionally determine(s) other nonkey attribute(s).

/h

3.2.3 Third Normal Form

an

do

ƒ

le

sid functionally determines status in two ways, directly and transitively, through the non-key attribute status.

ƒ

co

m

ssn Æ dmangssn is transitive through dnumber

tp

:/

Aim of the third normal form is to eliminate certain transitive dependencies.

ht

A functional dependency in a relation schema R is a transitive dependency if there is a set of attributes Z such that X Æ Z and Z Æ Y hold. A functional dependency X Æ Y is trivial if Y

X.

Definition: A relation R is in third normal form (3NF) if, whenever a nontrivial functional dependency X Æ A holds in R, either 1. X is a superkey of R, or 2. A is prime attribute of R, i.e. member of some candidate key. DBMS – Module 3

Decomposition from 2NF to 3NF

Given the relation R(A,B,C) where A is the primary key and B Æ C is a functional dependency. Decomposition (lossless) into: R1(B,C) with a primary key B. R2(A,B) with a primary key A and foreign key B.

6

http://handoutsuite.googlepages.com

Functional Dependencies & Normalization

3.2.4 Boyce-Codd Normal form

Techniques to Achieve Boyce Codd Form

Third normal form is sufficient for relations that have a single candidate key.

Decompose the NewFac relation NewFac(facName, dept, office, rank, dateHired)

Third normal form is deficient when there are multiple candidate keys that are composite and overlapping.

into Fac1 (dept, office) Fac2 (facName, office, rank, dateHired)

A relation schema R is in Boyce-Codd Normal form (BCNF) if whenever a nontrivial functional dependency X Æ A holds in R, then X is a superkey of R.

co

m

Fac1 and Fac2 are in BCNF since their only determinants are their keys, respectively.

s.

If we had chosen {facName, office} as the primary key of NewFac, we would have

ge

Every relation in BCNF is in 3NF; however a relation in 3NF is not necessarily in BCNF.

pa

office Æ dept.

Relation in 3NF but not in BCNF

le

Since office is not a superkey, the relation would not be in BCNF. In fact, it would not be in 2NF since dept would not be fully dependent on the key.

go

Summary of Normal Forms

e.

No two faculty members within a single department have the same name.

og

NewFac(facName, dept, office, rank, dateHired)

do

ut

su

it

Each faculty member has only once office. A department may have several faculty offices Faculty members form the same department may share offices. We have the following functional dependencies:

:/

/h

an

office Æ dept {facName, dept} Æ office, rank, dateHired {facName, office} Æ dept, rank, dateHired

ht

tp

{facName, dept} and {facName, office} are overlapping candidate keys. Assume that {facName, dept} is the primary key.

A well-developed set of rules of thumb is used to guide the normalization process based on primary keys. Normal Form Test 1NF Relation should have no nonatomic attributes 2NF For relations where primary key contains multiple attributes - no nonkey attribute should be functionally dependent on a part of the primary key 3NF Relation should not have a nonkey attribute functionally determined by another nonkey attribute

office is not a superkey. This violates BCNF. The relation is in 3NF since office is part of a candidate key.

DBMS – Module 3

7

http://handoutsuite.googlepages.com

Functional Dependencies & Normalization

3.3 Normalization Process

Reasoning about Functional Dependencies

Normalization can be carried out using the heuristic approach demonstrated previously.

The schema designer specifies some functional dependencies (FDs) that are semantically obvious.

A formal approach is based strictly on functional dependencies and other types of constraints. • The approach uses formal normalization algorithms to create relation schemas. • All the attributes are placed in a single large relation called the universal relation. • Using functional dependencies and other constraints, the universal relation is decomposed into smaller relational schemas until the process reaches point where no further decomposition is preferred.

Given some FDs F, we can usually infer additional FDs that are implied by the set F. 3.3.1 Armstrong's Inference Rules: X then X Æ Y

m

Reflexivity : if Y

co

Augmentation : if X Æ Y then XZ Æ YZ

ge

s.

Transitivity : if X Æ Y and Y Æ Z then X Æ Z

pa

The rules are sound: Any functional dependency that can be derived form a given set of FDs using these rules is true.

og

le

The results of the decomposition process should have some important qualities: Relation schema in BCNF or at least in 3NF.

3.3.2 Closure of a set of Functional Dependencies

it

e.

Attribute Preservation, Dependency Preservation, and Lossless Decomposition

go

The rules are complete: No other rules are needed to be added to increase their effectiveness.

su

Properties of Relational Decompositions



ht

tp

:/

/h

an

do

ut

Attribute Preservation: Every attribute of the universal table appears in at least one of the relations) No data item is lost. • Dependency Preservation: The functional dependencies involve attributes that are all in the same table ) To avoid having to do joins. - It is always possible to find a dependency preserving decomposition that in 3NF. - It is not always possible to find a dependency preserving decomposition that in BCNF. R(J,K,L) JK Æ L and L Æ K Two candidate keys: JK and JL R is not in BCNF Any decomposition of R will fail to preserve JK Æ L. Lossless Decomposition: Natural join of the small relations produces exactly the universal relation ) To preserve all information in the original relation. DBMS – Module 3

For normalization, it is necessary to identify superkeys, candidate keys, and other determinants. This can be done if we have all the functional dependencies implied by a given set of functional dependencies. Closure of F (denoted by F+): is the set of all FDs that are implied by F. We can further simplify the computation of F+ by using additional rules: Union: if X Æ Y and X Æ Z then X Æ YZ Decomposition: if X Æ YZ then X Æ Y and X Æ Z Pseudo-transitivity: if X Æ Y and ZY Æ W then XZ Æ W These rules can be inferred from Armstrong's rules.

8

http://handoutsuite.googlepages.com

Functional Dependencies & Normalization 3.3.3 Minimal Cover for a Set of Functional Dependencies

If we have tuples:

Two sets of functional dependencies are equivalent if E+ = F+. A set of functional dependencies F is minimal if

MVD Rules

F+), i.e. every dependency in G can be inferred from F.

Every FD is an MVD (promotion). ƒ If X ÆY, then swapping Y ’s between two tuples that agree on X doesn’t change the tuples. ƒ Therefore, the “new” tuples are surely in the relation, and we know X ÆY.

pa

og go



Complementation: If X Æ ÆY, and Z is all the other attributes, then XÆ ÆZ.

e. su

it

3.4 Multivalued Dependencies

/h

an

do

ut

Definition A multivalued dependency (MVD) on R, X Æ Æ Y , says that if two tuples of R agree on all the attributes of X, then their components in Y may be swapped, and the result will be two tuples that are also in the relation.

tp

:/

i.e., for each value of X, the values of Y are independent of the values of R-X-Y.

ht

le

We calculate X+ in F and demonstrate that X+ contains the attributes of Y .

Drinkers (name, addr, phones, beersLiked) – A drinker’s phones are independent of the beers they like. ƒ name Æ Æphones and ƒ name Æ Æ beersLiked. – Thus, each of a drinker’s phones appears with each of the beers they like in all combinations. – This repetition is unlike FD redundancy. ƒ name Æ addr is the only FD. Tuples Implied by name Æ Æphones

DBMS – Module 3

b1 b2 b1 b2

ge



Algorithm for Cover: To prove that F is a cover for G, we examine each FD X Æ Y in G.

Example

p1 p2 p2 p1

Then these tuples must also be in the relation.

Cover: F is a cover for G if every FD in G is also in F+ (G

Minimal Cover F for a set of FDs G if F is a cover for G but no proper subset of F is a cover for G.

a a a a

m

-

phones beersLiked

co

-

sue sue sue sue

Every dependency in F has a single attribute for its right-hand side. We cannot replace any dependency X Æ A in F with a dependency Y Æ A where Y is a proper subset of X and still have a set of dependencies that is equivalent to F. We cannot remove any dependency from F and still have a set of dependencies that is equivalent to F.

addr

s.

-

name

Splitting Doesn’t Hold • Like FD’s, we cannot generally split the left side of an MVD. • But unlike FD’s, we cannot split the right side either --- sometimes you have to leave several attributes on the right side. Example Drinkers (name, areaCode, phone, beersLiked, manf) • • •

A drinker can have several phones, with the number divided between areaCode and phone (last 7 digits). A drinker can like several beers, each with its own manufacturer. Since the areaCode-phone combinations for a drinker are independent of the beersLikedmanf combinations, we expect that the following MVD’s hold: name Æ Æ areaCode phone name Æ Æ beersLiked manf 9

http://handoutsuite.googlepages.com

Functional Dependencies & Normalization

phone

beersLiked

555-1111

Bud

555-1111

WickedAle

Decomposition and 4NF • If X ÆÆY is a 4NF violation for relation R, we can decompose R using the same technique as for BCNF. 1. XY is one of the decomposed relations. 2. All but Y – X is the other.

555-9999

Bud

Example

555-9999

WickedAle

Drinkers (name, addr, phones, beersLiked) FD: name Æ addr MVD’s: name ÆÆ phones name Æ Æ beersLiked

Here is possible data satisfying these MVD’s:

But we cannot swap area codes or phones by themselves. That is, neither nameÆ ÆareaCode nor nameÆ Æphone holds for this relation.

s.

co

m

name areaCode manf Sue 650 A.B. Sue 650 Pete’s Sue 415 A.B. Sue 415 Pete’s

Key is {name, phones, beersLiked}. All dependencies violate 4NF.

pa

ge

-

le

3.4.1 Fourth Normal Form

Decompose using name Æ addr:

The redundancy that comes from MVD’s is not removable by putting the database schema in BCNF. There is a stronger normal form, called 4NF, that (intuitively) treats MVD’s as FD’s when it comes to decomposition, but not when determining keys of the relation.

ut

su

it

e.



:/

Nontrivial MVD means that: 1. Y is not a subset of X, and 2. X and Y are not, together, all the attributes. Note that the definition of “superkey” still depends on FD’s only.

-

ht

tp

-

/h

an

do

4NF Definition • A relation R is in 4NF if: whenever XÆÆY is a nontrivial MVD, then X is a superkey.

BCNF versus 4NF •

Remember that every FD X ÆY is also an MVD, X ÆÆY. • Thus, if R is in 4NF, it is certainly in BCNF. - Because any BCNF violation is a 4NF violation (after conversion to an MVD). • But R could be in BCNF and not 4NF, because MVD’s are “invisible” to BCNF. DBMS – Module 3

1. Drinkers1 (name, addr) - In 4NF; only dependency is name Æ addr. 2. Drinkers2 (name, phones, beersLiked) - Not in 4NF. MVD’s name Æ Æ phones and name Æ Æ beersLiked apply. No FD’s, so all three attributes form the key.

go



og

-



Either MVD name Æ Æ phones or name Æ Æ beersLiked tells us to decompose to: - Drinkers3 (name, phones) - Drinkers4 (name, beersLiked)

3.5 Join Dependencies and Fifth Normal Form A relation schema R is in 5 NF with respect to a set F of Functional, Multivalued and join dependencies if, for every nontrivial join dependency JD(R1, R2, …Rn) in F+, every Ri is a superkey of R. 3.5.1 Domain-Key Normal Form ( ULTIMATE NORMAL FORM !! ) Each attribute value in a tuple is of the appropriate domain and that every key constraint is enforced. --Not used because of its practical utility is limited

For normal database applications normalization up to BCNF is enough.

10

http://handoutsuite.googlepages.com

Functional Dependencies & Normalization

3.6 Practical database design tuning

3.6.3 Database design process

3.6.1 The Information System Life Cycle

1. Requirements collection and analysis

1. Feasibility analysis 2. Concepual database design 2. Requirement collection and analysis

3. Design

co

4. Data model mapping

m

3. Choice of a DBMS

4. Implementation

ge

s.

5. Physical database design

5. Validation and acceptance testing

le

pa

6. Database system implementation and tuning

The Database Application System Life

it

3.6.2

e.

go

og

6. Deployment, operation and maintenance

su

Cycle

do

ut

1. System definition

/h

an

2. Database design

tp

:/

3. Database implementation

ht

4. Loading or data conversion

5. Application conversion

6. Testing and validation

7. Operation

8. Monitoring and maintenance DBMS – Module 3

11

http://handoutsuite.googlepages.com

Relational Algebra & SQL

3.7.1.1 Domains, Attributes, Tuples, and Relations

• • •

ge

• •

pa

Theoretically: a relation is a set of tuples; no tuple can occur more than once Mathematicians define a relation to be a subset of a Cartesian product of a list of domains.

le

3.7.1 Relational Model Concepts

go

Example: D1 = {1, 2} and D2 = {a, b, c} D1 X D2 = {(1, a), (1, b), (1, c), (2, a), (2, b), (2, c)}

e.

ut

su



it



The relational model is based on the concept of a relation. A relation is physically represented as a table or two-dimensional array. Tables are used to hold information about objects to be represented in the database.

og

What is a Relational Database?



A relation is represented as a two-dimensional table with columns and rows. Columns are called attributes or fields. The number of these columns is the arity (degree) of the relation The rows are called tuples. The number of rows is called cardinality. Each attribute has values taken from a domain, e.g., subj has domain string Domains: string, integer, real, date, . . . atomic types

m



co

Lessons from the Codd paper: A Relational Model for Data for Large Shared Data Banks • Let us separate physical level from conceptual level • Model the data independently from how it will be used (accessed, printed, etc.) • Describe the data mathematically • A relation describes an association between data items – tuples with attributes • We generally think of tables and rows, but that's somewhat imprecise • Use standard mathematical (logical) operations over the data – these are the relational algebra or relational calculus

s.

3.7 The Relational Data Model (1970)

ht

tp

:/

/h

an

do

Using the terms of EER model, both entity sets and relationships sets are shown using tables.

Describing Relations

A relation is defined by a schema, which specifies domains of each field in the relation. In DBMS, data definition language (DDL) is used like programming language type definitions In relational DBs, we use relation(attribute:domain) Student (sid:int, name:string, gpa:real) Enroll (sid:int, grade:string, cid:string) Course (cid:string, subj:string, sem:string) Teaches (fid:int, cid:string) Professor fid:int, name:string) Database people, when they are discussing design, often assume domains are evident to the reader: Student (sid, name, gpa)

Our focus now is the relational schema: set of tables DBMS – Module 3

12

http://handoutsuite.googlepages.com

Relational Algebra & SQL

3.7.1.3 Integrity Constraints (ICs)

3.7.1.2 Definitions: Schema, Relational database • A relation (instance) is a table (a set of tuples). • A schema is the structure of the table together with a specification of the domains and any other restrictions on possible values. • A (relational) database schema is a collection of schemas. • A relational database (instance) is a collection of tables, each has a distinct name.

m

Integrity Constraint (IC): condition that must be true for any instance of the database. Example: a schema specifies domains of the fields in the relation, and is called domain constraints. • ICs are specified when schema is defined. • Note: this might not be true in Web databases. • ICs are checked when relations are modified. • A legal instance of relation is one that satisfies all specific ICs. • DBMS does not allow illegal instances.

su

s.

ge

pa

le

it

Some attributes appear in more than one relation.

og



If the DBMS checks ICs, stored data is more faithful to real world meaning, that is, for consistency and accuracy. When ICs are enforced, DBMS also avoids data entry errors.

do

ut

Student (studID, lastName, firstname, major) Enroll (studID,classNo, grade)

/h

an

To distinguish between the two appearances of attributes, we use the relation name followed by a period followed by the attribute name.

:/

Enroll.studID

tp

Student.studID

When an attribute appears in more than one relation, its appearance usually represents a relationship or interaction between tuples of the two relations.

ht

3.7.1.4 Key Constraints: Entity Integrity

go



A relational database schema can have any number of relations. Relation schemas can be represented by giving the name of each relation followed by the attribute names in parentheses with the primary key underlined. The clearest way of showing foreign keys is by drawing arrows from the foreign keys to the primary keys they refer to.

A set of fields is a (candidate) key for a relation if

e.



co

Representing Relational Database Schemas

• • •

no two distinct tuples can have the same values in all key fields, any proper subset of the key does not satisfy this condition. If a set of fields satisfies condition 1 but does not satisfy condition 2, then it is called a super key for the relation.

If there are more than one candidate keys for a relation, one is chosen to be the primary key. Given a relation schema, the primary keys of the schema is underlined. Student (sid:int, name:string, gpa:real) Entity Integrity: in a relation no attribute of a primary key can have a null value.

DBMS – Module 3

13

http://handoutsuite.googlepages.com

Relational Algebra & SQL

3.8 The Relational Algebra

3.7.1.5

Key Constraints: Foreign Key Constraint Foreign key: A set of fields in one relation that is used to refer to a tuple in another relation. This set must correspond to the primary key of the second relation.

Queries and query languages

Query: A question about the data in a database. Query: A statement requesting the retrieval of information from a database.

Example: sid is a foreign key in Enroll referring to Student

Example: Find the names of students who are taking CSEN501.

Student (sid:int, name:string, gpa:real)

m

A query is applied to relation instances and the result of a query is also a relation instance.

co

Enroll (sid:int, grade:string, cid:string)

Query language: language in which queries are expressed.

ge

s.

Foreign key constraints: Let S1 be a foreign key in relation R1 referring to R2. Then the values of S1 in R1 must match (be a subset of) the values of S1 in R2.

og go



:/

/h

an

do

ut

su

it

e.

The database does not include any invalid foreign key value. That is, all foreign key constraints are enforced. Suppose Table B has a foreign key that points to a field in Table A. Referential integrity would prevent you from adding a record to Table B that does not exist in Table A. The referential integrity rules might also specify that whenever you delete a record from Table A, any records in Table B that are linked to the deleted record will also be deleted. This is called cascading delete.

Query languages are not intended to be used for complex calculations. Query languages support easy and efficient access to large data sets.

le



Referential Integrity:

pa

Query languages versus programming languages!

tp

3.7.2 Relational Data Manipulation Languages

tables to produce tables. Just as we operate numbers with arithmetic, we operate on tables with relational algebra. SQL is, roughly speaking, a generalization of relational algebra. Internal languages: an SQL query is rewritten as

relational algebra expression, which can in turn be rewritten into a more efficient form and evaluated using a bunch of well developed algorithms.

ht

Variety languages used by relational database management systems

Relational Algebra is a set of 6 operators that act on

Procedural languages: The user tells the system how to manipulate the data, e.g. Relational Algebra Declarative languages: the user states what data is needed but not exactly how it is to be located, e.g. Relational Calculus and SQL Graphical languages: allowing the user to give an example or an illustration of what data should be found, e.g. QBE

DBMS – Module 3

14

http://handoutsuite.googlepages.com

Relational Algebra & SQL

Relational Algebra

3.8.1 Projection

A set of operations (functions), each of which takes a relation (or relations) as input and produces a relation as output.

Given a list of column names A and a relation R. π A(R): extracts the columns in A from the relation R.

Basic operations: using these we can build up sophisticated database queries.

Example:

s.

co

m

Project [ π ] If we are interested in only some attributes of a relation, we use the Project operation to get those columns

ge

Select [ σ ] Is used to select a subset of tuples from a relation that satisfy a selection condition

pa

Result of the query

is

og

le

Union [ U ]

π sid, gpa (Student)

go

Set Difference [ ─ ]

it

e.

Cartesian Product [ X ]

do

ut

su

Rename [ ρ ] We can rename either the relation name, or the attribute names or both.

π A(R) has duplicate values.

Example:

an

Additional operations:

Suppose the result of

/h

Intersection [ ∩ ]

ht

tp

:/

Join [ ] Is used to combine related tuples from two relations into single tuples. and Division.

Among these Union, Intersection, Set Difference, Cartesian product are from mathematical set theory. The other group consists of operations developed specially for relational databases, are : Select, Project, Join

DBMS – Module 3

In relational algebra, the answer is always a set (has to eliminate duplicates). However, SQL and some other languages return, by default, a bag (don't eliminate duplicates)

15

http://handoutsuite.googlepages.com

Relational Algebra & SQL

3.8.2 Selection

3.8.3 Set Operations

Given a condition C and a relation R.

Set operations: S U T, S ─ T, S ∩ T •

σ C(R): extracts those rows from the relation R that satisfy C.



Example:

m



Union (S U T): a relation that includes all tuples that are either in S or in T or in both S and T. Duplicate tuples are eliminated. Intersection (S ∩ T): a relation that includes all tuples that are in both S and T. Difference (S ─ T): a relation that includes all tuples that are in S but not in T.

pa

ge

s.

co

Condition: All these operations must be unioncompatible: Same number of fields (same degree). Corresponding fields have the same domain (same type). Selection - What can go into the condition?

le

Set Operations - Union

e.

go

og

Condition C in σ C(R) is built up from Boolean operations on the field names:

su

it

Example: gpa < 2:0, sname = Ali.

σ gpa<2.0 ^ sname

/h

an

Question: What is the result of = Ali(Student):

v

do

ut

Predicates constructed from these using ^ (and) , (or) , ⌐ (not).

ht

tp

:/

Answer

DBMS – Module 3

16

http://handoutsuite.googlepages.com

Relational Algebra & SQL

Set Operations - Intersection

Set Operations - Intersection In relational algebra, basic set operations are union and set difference only. We can implement the other set operations using those basic operations.

s.

co

m

For example, for any relations S and T, we can already express S ∩ T S ∩ T = S ─ (S ─ T) It is mathematically nice to have fewer operators, however operations like set difference may be less efficient than intersection.

ge

3.8.4 Cartesian Product

og

le

pa

Product S x T connects two relations S and T that are not necessarily union-compatible.

ht

tp

:/

/h

an

do

ut

su

it

e.

go

Set Operations Difference

DBMS – Module 3

17

http://handoutsuite.googlepages.com

Relational Algebra & SQL

3.8.5 Natural Join

3.9 Structured Query Language (SQL)

S T: special case of conditional join, equality on common fields of S and T

3.9.1 Simple SQL Queries

• • •

Projection: Find the names of the students: Recall π sname(Student) In SQL Query :SELECT sname FROM Student

Equality condition only On all common fields Leave only one copy of these fields in the resulting relation.

m

Example:

s.

co

Selection: Find the courses taught by Slim Recall σ teacher = Slim(Course)

le

pa

ge

SELECT * FROM Course WHERE teacher = 'Slim'

go

og

Basic Syntax of SQL Queries

ut

su

it

e.

SELECT [DISTINCT] attribute-list FROM relation-list WHERE condition

an

do

What We Cannot Compute with Relational Algebra?

/h

Arithmetic operations, e.g., 3 + 3.

ht

tp

:/

Aggregate, e.g. the number of students who are taking CSEN501, or the average GPA of all students. IN SQL, these are possible - SQL has numerous extensions to relational algebra. Recursive queries: given a relation parent() compute the ancestor. These are not possible in SQL either. Complex structures, e.g. lists, arrays, nested relations . . . SQL cannot handle complex structures either, but they are possible in object-oriented data models and query languages. DBMS – Module 3

relation-list is a list of relation names, possibly with a range variable after some name. attribute list is a list of attributes of relations in relation-list. A * can be used to denote all attributes. You may rename the attributes condition Comparison: Attr op Const or Attr op Attr Boolean connectives: AND, OR, NOT. Other conditions: like performs pattern matching in string data, e.g. sname like 'f%' (%: one or more characters, _: one character) DISTINCT: is an optional keyword indicating that the answer should not contain duplicates. Default is that duplicates are not eliminated.

18

http://handoutsuite.googlepages.com

Relational Algebra & SQL

3.9.2 Update Language: Inserting New Tuples

3.9.4 Concept of a View in SQL

Single tuple insertion:

A view is a single table that is derived from other tables. These other tables could be base tables or previously defined views. A view doesn’t necessarily exist in physical form; it is considered a virtual table

INSERT INTO Student (sid, sname, gpa) VALUES (3, 'Maria', 0.7); An insert command that causes an IC violation is rejected!

CREATE VIEW AS SELECT … FROM … WHERE …

m

Deletion

ge

s.

co

To delete tuples satisfying a condition from some relation: DELETE FROM WHERE ;

le

pa

DELETE FROM Student WHERE ‘sname’ = ’Maria’ ;

ut

su

it

e.

To change certain attributes in certain tuples of a relation: UPDATE SET WHERE ;

go

og

Updates

an

do

UPDATE Student SET gpa = ’1.5’ WHERE sname = ’Maria’;

:/

/h

3.9.3 Creating tables in SQL

ht

tp

We use CREATE command Example: Create relations:

CREATE TABLE Student (sid INTEGER, sname CHAR(10), gpa REAL) CREATE TABLE Course (cid INTEGER, cname CHAR(10), credit INTEGER, teacher CHAR(10)) CREATE TABLE Enroll (sid INTEGER, cid INTEGER, grade CHAR(1))

DBMS – Module 3

19

http://handoutsuite.googlepages.com

Rule: The phone number is associated with room number and ... Answer: Use the store relation. Problem: .... If we had chosen {facName, office} as the primary.

789KB Sizes 6 Downloads 169 Views

Recommend Documents

No documents