Normalization in database 101

As we read that data modeling is done for:

Reduce data redundancy
Data integrity
Support efficient querying

Normalisation is an essential part of database design that helps ensure that

Structure of your database is efficient
Avoid redundancy.

It builds upon the conceptual & logical data models we've discussed

Does this question ring a bell?

Is the table and their attributes optimal?
Is there any formal mathematical approach to design the data model?

We will consider the whole design as a single table & use some methods to obtain the optimal design.

Data Redundancy:Unnecessary repetition the stored data.

This causes storage overhead, and inconsistency in data

Key Concepts of Normalization

If we have table

staffNo |  name   | position | salary | branchNo | bAddress
  S21    prashant  manager     12000    B005      2200 Blvd
  S22    basnet    manager     22000    B003      Main Street 
  S21    John     Assistant    43000    B003      Hooks

Here if we ant to add new employee in branch B005, then we need to add new tuples

staffNo |  name   | position | salary | branchNo | bAddress
  S21    prashant  manager     12000    B005      2200 Blvd
  S22    basnet    manager     22000    B003      Main Street 
  S23    John     assistant    43000    B003      Hooks
  S24    Bohn     supervisor   30000    B005      Sybil

Insertion anomaly:

While adding new data, we have redundant data like branch Address
Insertion anomaly, mistakenly entering different branch address for new B005 branch.

Deletion Anomaly:

staffNo |  name   | position | salary | branchNo | bAddress
  S21    prashant  manager     12000    B005      2200 Blvd
  S22    basnet    manager     22000    B003      Main Street 
  S23    John     assistant    43000    B003      Hooks
  S24    Bohn     supervisor   30000    B005      Sybil
  S25    Subin    manager      35000    B007      Atlanta

IF we want to remove staffNo, S25, we are also removing branch B007 and branch address i.e atlanta. We don't have any other copy of it.

Modification Anomaly:

staffNo |  name   | position | salary | branchNo | bAddress

Let's say we have 1 million data in this table and 400,000 record are connected with B001, and we move the location of the branch B001, then the branch address needs to be updated for 400,000 datas

What is Normalization?

Splitting of the big table into organised small tables. Ultimate goal of normalization is to split big tables.

It is the process of splitting relations into well structured relations that allow users to insert, delete and update tuples without introducing database inconsistencies.

Staff
StaffNo |  SName | Position | salary | branchNo 

Branch
branchNo  |  bAddress

The concept of functional dependency?

staffNo |  name   | position | salary | branchNo | bAddress
  S21    prashant  manager     12000    B005      2200 Blvd
  S22    basnet    manager     22000    B003      Main Street 
  S23    John     assistant    43000    B003      Hooks
  S24    Bohn     supervisor   30000    B005      Sybil
  S25    Subin    manager      35000    B007      Atlanta

Position is functional dependent on staffNo.

When we know staff no, we can find the position of the that staff

for example S21 is manager.

What about is staffNo functionally dependent on position?

if i say manager, can you find the staffNo

All non key attributes are functionally dependent on key attribute

How do we represent this functional dependency?

R.b -> R.a

If we have the b attribute we can find the a attribute. This a attribute of R table is functional dependent on b table.

Ways to Normalization:

Consider whole database as a big table, break it into tables in steps
Breaking tables is done based on functional dependencies exist between its attributes.
It is done in steps, the result of applying each step is called normal level

We start from unnormalised form UNF,

UNF

If it includes composite attributes
most of the database these days does not support composite value.
for example

branchNo | branchAddress    | telNos
 B001.     8 Jefferson way    503-555-3618, 503-555-2727

1NMF: All relational database are already in 1NF.

if we are using atomic value then that means we are in 1NF.

branchNo   | branchAddress    
 B001.     8 Jefferson way  

branchNo     | telNos
 B001          503-555-3618
 B001          503-555-2727

thus it is 1NF

2NMF: Non key attributes should not have functional dependency to any subset of primary key

Applies to table that has composite primary key

here we can see

Name & assistant depends on staffNo only
Branch address depends on the branchNo only
hourPerWeek depends on (staffNo + branchNo)

So this can be further normalised.

here :

State & Risk is functional dependent on S#
Quantity is dependent on S# & P#

Thus we need to split the table

Let's analyse these tables. We made it to 2nd normal form. But we still see two non key attribute are functional dependent. Risk is dependent to the State. If we know state we can find the Risk factor. S# determines state, state determines Risk

This is also known as transitive dependency.

We will have delete and update anomaly.

3NMF
First the table needs to be in 2NMF.

How do we convert the 2NMF to 3NMF?

We need to break the table further & make sure no functional dependency exist between two non key attributes.