Quick Contact

    What is Normalization?

    Normalization is the technique of decomposing relational tables with anomalies to generate smaller and well-organized relations. The fundamental aim of normalization is to reduce redundancy, which represents that data to be saved only once in a relation. Normalization was established by IBM researcher E.F. Codd in the 1970s.

    Reasons for Normalization

    There are four reasons for normalization are as follows.

    1. To structure the data so that any pertinent relationship between entities can be represented.
    2. To allow easy retrieval of data in response to query report requests.
    3. To facilitate the maintenance of the data through updation, insertions, and deletions.
    4. To reduce the need to reconstruct or reorganize information when new application requirements arise.
    Advantages of Normalization

    The following are the advantages of normalization.

    1. It is a more productive data mechanism.
    2. To avoid redundant fields or columns.
    3. It is a more flexible data structure, i.e., and we should be able to insert new rows and data values simply.
    4. It is easier to maintain the data structure, i.e., it is easy to perform operations, and complex queries can be easily managed.
    5. To minimizes data duplication.
    6. Free from modification anomalies.
    Disadvantages of Normalization

    The normalized database design suffers from some limitations in the following areas.

    1. Performance

      In a normalized database, extra joins are needed to recover information from numerous tables. Joins need additional I/O and may raise the system overhead than single-table lookups. In addition, more often, a normalized database acquires an extra CPU processor to carry out join logic and to keep up with data and referential integrity.

    2. Normalization Procedures

      Proper normalization of the database is very important. There is no tested tool for detailed normalization. Most normalization is done manually. There is a chance of an improperly normalized database, which can cause significant problems.

    3. Database Restructuring

      Proper database design is also important because it can be very difficult to modify the structure of a database after the fact. Database restructuring cannot be easy as the database generally serves as the back-end of a larger application. Changes to the database may scrap the application.

    4. Data Insertion and Maintenance

      Though a fully normalized database makes it easy to retrieve data, sometimes insertion and updating rows can be problematic. It is very difficult when they have referential integrity on themselves. The object-oriented database management system (OODBMS) can solve these problems largely. Many of the complex Web applications, particularly banking and high-volume community sites, would be better served by using an OODBMS.

    Normalization Process

    Normalization is a process of decomposing large, inefficiently structured relation schemas (tables) into smaller, more efficiently structured relation schemas (tables) that possess desirable properties without losing any information in the process. Normalization holds the proposition that a well-defined database includes redundant data to an absolute minimum. Thus, in turn, guarantees data integrity and ensures that the information retrieved from the database will be correct, accurate, and reliable.

    Normalization eliminates the properties that are undesirable by working through a sequence of phases producing normal forms.

    A relation in a specific normal form if it meets a particular prescribed set of conditions. The normal forms present designers of databases with the following:

    • A chain of tests can be performed on individual relation schema, making the relational database normalized to a certain degree. A relation that violates the test should be decomposed into an association that meets the normalization tests separately. This occurs primarily if the test is unsuccessful.
    • For the analysis of relation by a formal framework, it provides schemas on the basis of their keys and on the dependencies that are functional amongst their attributes.

    Normal forms do not assure a high-quality database design even if they are considered in isolation from other factors. It is frequently not satisfactory to analyse independently that every relation schema in the database is, say, in BCNF or 3NF. Alternatively, the technique of normalization through decomposition has to certify the presence of extra features that the relational designs, captured together, need to acquire

    Properties of Normal Forms:
    1. Non-additive join or lossless join property: This assures that the spurious tuple issue does not recur. Spurious tuples are defined as tuples that are not present in the original relation but produced by the normal join of decomposed sub relations.
    2. Dependency preservation property: This guarantees that all functional dependencies that are functional are characterized in some of the individuals, resulting in relations.

    E.F.Codd developed the normalization process in the early 1970s. Normalization is often executed as a sequence of tests on a relational schema to establish whether it violates or satisfies the conditions of a normal form. Codd suggests three normal forms called first (1NF), second (2NF), and third (3NF) normal forms initially. Consequently, R.Boyce and Codd jointly launched a robust definition for the third normal form called Boyce-Codd Normal Form (BCNF).

    What is Normalization
    Anomalies in DBMS

    Changing the data in some relations can have undesirable consequences known as modification anomalies. Anomalies can be eliminated by modifying the structure of the relation.

    Types of Modification Anomalies

    There are three types of Modification Anomalies are as follows.

    What is Normalization

    To understand these anomalies, let’s take an example:

    Example

    Consider the following two sets of relation schemas and instances for each:

    EMPLOYEE (EMP-NO, EMP-NAME, DESG, SALARY, DEPT-NO)

    DEPT (DEPT-NO, DEPT-NAME, LOCATION)

    EMPLOYEE
    EMP-NO EMP-NAME DESG SALARY DEPT-NO
    001 Gautam Manager 30,000 01
    002 Rahul Developer 35,000 03
    003 Rajiv Teacher 20,000 02
    004 Neha System Analyst 22,000 03
    DEPT
    DEPT-NO DEPT-NAME LOCATION
    01 Administration Delhi
    02 Personnel Mumbai
    03 Project Noida
    EMPLOYEE
    EMP-NO EMP-NAME DESG SALARY DEPT-NO DEPT-NAME
    001 Gautam Manager 30,000 01 Administration
    002 Rahul Developer 35,000 03 Project
    003 Rajiv Teacher 20,000 02 Personnel
    004 Neha System Analyst 22,000 03 Project

    The two different sets of relation schemas contain precisely the same information, though in slightly different ways. The EMPLOYEE relation includes many redundant data, and the details of each department are repeated for every employee working at that branch. In DEPT relation, the department details appear only once for each department, and only the department number is repeated in EMPLOYEE relation to indicate where each employee is located. The relations that include redundant information can have problems known as update anomalies, which are categorized as insertion anomalies, deletion anomalies, and modification anomalies.

    Insertion Anomaly

    Suppose we want to insert the details for a new department,”DEPT-No.04″, but no employee members are assigned to it. There is no way, in the EMPLOYEE relation, to represent this information.

    EMP-NO EMP-NAME DESG SALARY DEPT-NO DEPT-NAME
    001 Gautam Manager 30,000 01 Administration
    002 Rahul Developer 35,000 03 Project
    003 Rajiv Teacher 20,000 02 Personnel
    004 Neha System Analyst 22,000 03 Project
    NULL NULL NULL NULL 04 SALES

    We cannot merely enter values for the attributes belonging to employee information because one of these attributes is the key of the relation, and remember that no key attribute can be assigned a null value because this violates referential integrity. The relation in the EMPLOYEE table does not have this problem because the details for the department and the employees are maintained in separate relations. The department details would have to be inserted first in the DEPT relation, and after that, employee information is to be inserted into the EMPLOYEE table.

    And if we want to insert the details for a new employee in the EMP relation, the new tuple must also include correct information about the branch where he will be working. If not, the data will become uncertain.

    Deletion Anomaly

    Suppose we want to delete a tuple from the EMP relation whose “EMP-No.002”, from the above table, then it will delete the employee designation, salary, and department information too, which cannot be recovered further.

    EMP-NO EMP-NAME DESG SALARY DEPT-NO DEPT-NAME
    001 Gautam Manager 30,000 01 Administration
    002 Rahul Deletion Anomaly
    003 Rajiv Teacher 20,000 02 Personnel
    004 Neha System Analyst 22,000 03 Project
    Updation Anomaly

    Suppose we want to change the value of one of the attributes of a particular department. To change the address of a “DEPT-No.03” in the employee table, we have to update on every tuple in the relation instance. If this modification is not carried out on all the relevant tuples in the relation case, the relation and the database will become uncertain. In EMPLOYEE relation, update the location for all employees who work in “DEPT-No.03” requires the updation of only a single tuple in the DEPT relation. Modification anomalies are related to the first type of insertion anomaly.

    It is to be noted that the EMPLOYEE relation still has an insertion anomaly to some extent. We cannot insert employee information without knowing his department number. We assume here that “DEPT-No” is the foreign key of the “EMPLOYEE” relation referencing the “DEPT” relation. The following set of relations, “WORK-ON” can be a well-designed database, which suffers no insertion or deletion anomalies and has little updation anomaly.

    Example

    EMPLOYEE (EMP-NO, EMP-NAME, DESG, SALARY, DEPT-NO)

    DEPT (DEPT-NO, DEPT-NAME, LOCATION)

    WORK-ON (EMP-NO, DEPT-NO)

    EMPLOYEE
    EMP-NO EMP-NAME DESG SALARY
    001 Gautam Manager 30,000
    002 Rahul Developer 35,000
    003 Rajiv Teacher 20,000
    004 Neha System Analyst 22,000
    DEPT
    DEPT-NO DEPT-NAME LOCATION
    01 Administration Delhi
    02 Personnel Mumbai
    03 Project Noida
    WORK-ON
    EMP-NO DEPT-NO
    001 01
    002 03
    003 02
    004 03

    Enroll Yourself in Live Training: DBMS Training

    Copyright 1999- Ducat Creative, All rights reserved.