- DBMS Tutorial
- What is Database Management System (DBMS)?
- Components of DBMS
- Applications of DBMS
- Three Schema DBMS Architecture
- Difference between DBMS and RDBMS?
- Difference between File Oriented System and DBMS
- Types of Data Models
- DBMS Schema and Instances
- Data Independence and Data Abstraction
- Database Users and Administrator
- DBMS Languages and Interfaces
DBMS ER Model
DBMS Relational Data Model
What is Normalization?
Normalization is the technique of decomposing relational tables with anomalies to generate smaller and well-organized relations. The fundamental aim of normalization is to reduce redundancy, which represents that data to be saved only once in a relation. Normalization was established by IBM researcher E.F. Codd in the 1970s.
Reasons for Normalization
There are four reasons for normalization are as follows.
- To structure the data so that any pertinent relationship between entities can be represented.
- To allow easy retrieval of data in response to query report requests.
- To facilitate the maintenance of the data through updation, insertions, and deletions.
- To reduce the need to reconstruct or reorganize information when new application requirements arise.
Advantages of Normalization
The following are the advantages of normalization.
- It is a more productive data mechanism.
- To avoid redundant fields or columns.
- It is a more flexible data structure, i.e., and we should be able to insert new rows and data values simply.
- It is easier to maintain the data structure, i.e., it is easy to perform operations, and complex queries can be easily managed.
- To minimizes data duplication.
- Free from modification anomalies.
Disadvantages of Normalization
The normalized database design suffers from some limitations in the following areas.
In a normalized database, extra joins are needed to recover information from numerous tables. Joins need additional I/O and may raise the system overhead than single-table lookups. In addition, more often, a normalized database acquires an extra CPU processor to carry out join logic and to keep up with data and referential integrity.
Proper normalization of the database is very important. There is no tested tool for detailed normalization. Most normalization is done manually. There is a chance of an improperly normalized database, which can cause significant problems.
Proper database design is also important because it can be very difficult to modify the structure of a database after the fact. Database restructuring cannot be easy as the database generally serves as the back-end of a larger application. Changes to the database may scrap the application.
Data Insertion and Maintenance
Though a fully normalized database makes it easy to retrieve data, sometimes insertion and updating rows can be problematic. It is very difficult when they have referential integrity on themselves. The object-oriented database management system (OODBMS) can solve these problems largely. Many of the complex Web applications, particularly banking and high-volume community sites, would be better served by using an OODBMS.
Normalization is a process of decomposing large, inefficiently structured relation schemas (tables) into smaller, more efficiently structured relation schemas (tables) that possess desirable properties without losing any information in the process. Normalization holds the proposition that a well-defined database includes redundant data to an absolute minimum. Thus, in turn, guarantees data integrity and ensures that the information retrieved from the database will be correct, accurate, and reliable.
Normalization eliminates the properties that are undesirable by working through a sequence of phases producing normal forms.
A relation in a specific normal form if it meets a particular prescribed set of conditions. The normal forms present designers of databases with the following:
- A chain of tests can be performed on individual relation schema, making the relational database normalized to a certain degree. A relation that violates the test should be decomposed into an association that meets the normalization tests separately. This occurs primarily if the test is unsuccessful.
- For the analysis of relation by a formal framework, it provides schemas on the basis of their keys and on the dependencies that are functional amongst their attributes.
Normal forms do not assure a high-quality database design even if they are considered in isolation from other factors. It is frequently not satisfactory to analyse independently that every relation schema in the database is, say, in BCNF or 3NF. Alternatively, the technique of normalization through decomposition has to certify the presence of extra features that the relational designs, captured together, need to acquire
Properties of Normal Forms:
- Non-additive join or lossless join property: This assures that the spurious tuple issue does not recur. Spurious tuples are defined as tuples that are not present in the original relation but produced by the normal join of decomposed sub relations.
- Dependency preservation property: This guarantees that all functional dependencies that are functional are characterized in some of the individuals, resulting in relations.
E.F.Codd developed the normalization process in the early 1970s. Normalization is often executed as a sequence of tests on a relational schema to establish whether it violates or satisfies the conditions of a normal form. Codd suggests three normal forms called first (1NF), second (2NF), and third (3NF) normal forms initially. Consequently, R.Boyce and Codd jointly launched a robust definition for the third normal form called Boyce-Codd Normal Form (BCNF).
Anomalies in DBMS
Changing the data in some relations can have undesirable consequences known as modification anomalies. Anomalies can be eliminated by modifying the structure of the relation.
Types of Modification Anomalies
There are three types of Modification Anomalies are as follows.
To understand these anomalies, let’s take an example:
Consider the following two sets of relation schemas and instances for each:
EMPLOYEE (EMP-NO, EMP-NAME, DESG, SALARY, DEPT-NO)
DEPT (DEPT-NO, DEPT-NAME, LOCATION)
The two different sets of relation schemas contain precisely the same information, though in slightly different ways. The EMPLOYEE relation includes many redundant data, and the details of each department are repeated for every employee working at that branch. In DEPT relation, the department details appear only once for each department, and only the department number is repeated in EMPLOYEE relation to indicate where each employee is located. The relations that include redundant information can have problems known as update anomalies, which are categorized as insertion anomalies, deletion anomalies, and modification anomalies.
Suppose we want to insert the details for a new department,”DEPT-No.04″, but no employee members are assigned to it. There is no way, in the EMPLOYEE relation, to represent this information.
We cannot merely enter values for the attributes belonging to employee information because one of these attributes is the key of the relation, and remember that no key attribute can be assigned a null value because this violates referential integrity. The relation in the EMPLOYEE table does not have this problem because the details for the department and the employees are maintained in separate relations. The department details would have to be inserted first in the DEPT relation, and after that, employee information is to be inserted into the EMPLOYEE table.
And if we want to insert the details for a new employee in the EMP relation, the new tuple must also include correct information about the branch where he will be working. If not, the data will become uncertain.
Suppose we want to delete a tuple from the EMP relation whose “EMP-No.002”, from the above table, then it will delete the employee designation, salary, and department information too, which cannot be recovered further.
Suppose we want to change the value of one of the attributes of a particular department. To change the address of a “DEPT-No.03” in the employee table, we have to update on every tuple in the relation instance. If this modification is not carried out on all the relevant tuples in the relation case, the relation and the database will become uncertain. In EMPLOYEE relation, update the location for all employees who work in “DEPT-No.03” requires the updation of only a single tuple in the DEPT relation. Modification anomalies are related to the first type of insertion anomaly.
It is to be noted that the EMPLOYEE relation still has an insertion anomaly to some extent. We cannot insert employee information without knowing his department number. We assume here that “DEPT-No” is the foreign key of the “EMPLOYEE” relation referencing the “DEPT” relation. The following set of relations, “WORK-ON” can be a well-designed database, which suffers no insertion or deletion anomalies and has little updation anomaly.
EMPLOYEE (EMP-NO, EMP-NAME, DESG, SALARY, DEPT-NO)
DEPT (DEPT-NO, DEPT-NAME, LOCATION)
WORK-ON (EMP-NO, DEPT-NO)
Apply now for Advanced DBMS Course