Is there any formal mathematical approach to design the data model?
We will consider the whole design as a single table & use some methods to obtain the optimal design.
Data Redundancy:Unnecessary repetition the stored data.
This causes storage overhead, and inconsistency in data
Key Concepts of Normalization
If we have table
staffNo | name | position | salary | branchNo | bAddress
S21 prashant manager 12000 B005 2200 Blvd
S22 basnet manager 22000 B003 Main Street
S21 John Assistant 43000 B003 Hooks
Here if we ant to add new employee in branch B005, then we need to add new tuples
staffNo | name | position | salary | branchNo | bAddress
S21 prashant manager 12000 B005 2200 Blvd
S22 basnet manager 22000 B003 Main Street
S23 John assistant 43000 B003 Hooks
S24 Bohn supervisor 30000 B005 Sybil
Insertion anomaly:
While adding new data, we have redundant data like branch Address
Insertion anomaly, mistakenly entering different branch address for new B005 branch.
Deletion Anomaly:
staffNo | name | position | salary | branchNo | bAddress
S21 prashant manager 12000 B005 2200 Blvd
S22 basnet manager 22000 B003 Main Street
S23 John assistant 43000 B003 Hooks
S24 Bohn supervisor 30000 B005 Sybil
S25 Subin manager 35000 B007 Atlanta
IF we want to remove staffNo, S25, we are also removing branch B007 and branch address i.e atlanta. We don't have any other copy of it.
Modification Anomaly:
staffNo | name | position | salary | branchNo | bAddress
Let's say we have 1 million data in this table and 400,000 record are connected with B001, and we move the location of the branch B001, then the branch address needs to be updated for 400,000 datas
What is Normalization?
Splitting of the big table into organised small tables. Ultimate goal of normalization is to split big tables.
It is the process of splitting relations into well structured relations that allow users to insert, delete and update tuples without introducing database inconsistencies.
staffNo | name | position | salary | branchNo | bAddress
S21 prashant manager 12000 B005 2200 Blvd
S22 basnet manager 22000 B003 Main Street
S23 John assistant 43000 B003 Hooks
S24 Bohn supervisor 30000 B005 Sybil
S25 Subin manager 35000 B007 Atlanta
Position is functional dependent on staffNo.
When we know staff no, we can find the position of the that staff
for example S21 is manager.
What about is staffNo functionally dependent on position?
if i say manager, can you find the staffNo
All non key attributes are functionally dependent on key attribute
How do we represent this functional dependency?
R.b -> R.a
If we have the b attribute we can find the a attribute. This a attribute of R table is functional dependent on b table.
Ways to Normalization:
Consider whole database as a big table, break it into tables in steps
Breaking tables is done based on functional dependencies exist between its attributes.
It is done in steps, the result of applying each step is called normal level
We start from unnormalised form UNF,
UNF
If it includes composite attributes
most of the database these days does not support composite value.
for example
branchNo | branchAddress | telNos
B001. 8 Jefferson way 503-555-3618, 503-555-2727
1NMF: All relational database are already in 1NF.
if we are using atomic value then that means we are in 1NF.
branchNo | branchAddress
B001. 8 Jefferson way
branchNo | telNos
B001 503-555-3618
B001 503-555-2727
thus it is 1NF
2NMF: Non key attributes should not have functional dependency to any subset of primary key
Applies to table that has composite primary key
here we can see
Name & assistant depends on staffNo only
Branch address depends on the branchNo only
hourPerWeek depends on (staffNo + branchNo)
So this can be further normalised.
here :
State & Risk is functional dependent on S#
Quantity is dependent on S# & P#
Thus we need to split the table
Let's analyse these tables. We made it to 2nd normal form. But we still see two non key attribute are functional dependent. Risk is dependent to the State. If we know state we can find the Risk factor. S# determines state, state determines Risk
This is also known as transitive dependency.
We will have delete and update anomaly.
3NMF
First the table needs to be in 2NMF.
How do we convert the 2NMF to 3NMF?
We need to break the table further & make sure no functional dependency exist between two non key attributes.
As we read that data modeling is done for:
Normalisation is an essential part of database design that helps ensure that
It builds upon the conceptual & logical data models we've discussed
Does this question ring a bell?
We will consider the whole design as a single table & use some methods to obtain the optimal design.
This causes storage overhead, and inconsistency in data
Key Concepts of Normalization
If we have table
Here if we ant to add new employee in branch B005, then we need to add new tuples
Insertion anomaly:
Deletion Anomaly:
IF we want to remove staffNo, S25, we are also removing branch B007 and branch address i.e atlanta. We don't have any other copy of it.
Modification Anomaly:
Let's say we have 1 million data in this table and 400,000 record are connected with B001, and we move the location of the branch B001, then the branch address needs to be updated for 400,000 datas
What is Normalization?
The concept of functional dependency?
Position is functional dependent on staffNo.
When we know staff no, we can find the position of the that staff
What about is staffNo functionally dependent on position?
All non key attributes are functionally dependent on key attribute
How do we represent this functional dependency?
If we have the b attribute we can find the a attribute. This a attribute of R table is functional dependent on b table.
Ways to Normalization:
We start from unnormalised form UNF,
thus it is 1NF
here we can see
So this can be further normalised.
here :
Thus we need to split the table
Let's analyse these tables. We made it to 2nd normal form. But we still see two non key attribute are functional dependent. Risk is dependent to the State. If we know state we can find the Risk factor. S# determines state, state determines Risk
We will have delete and update anomaly.
How do we convert the 2NMF to 3NMF?
We need to break the table further & make sure no functional dependency exist between two non key attributes.