Previous Table of Contents Next


What if your goal is to simplify the data structure by removing several of the one-to-many relationships? Adding redundancy poses two problems: You need additional space for the redundant item, as well as a technique to update the redundant item if it changes. One solution is to build a table of columns that roll the CITY and STATE tables into the CUSTOMER table. For the example in Table 1.1, assume that the STATE table contains 50 rows, the CITY table has 2,000 rows, and the CUSTOMER table has 10,000 rows.

From Table 1.1, you can see that the CITY and STATE tables can be removed entirely for a total savings of 400,000 bytes (refer to Figure 1.10). What about the cost_of_living field? If we choose to eliminate the CITY table, and duplicate cost_of_living in every CUSTOMER row, it would be necessary to visit each and every CUSTOMER row—which means changing the cost of living 10,000 times. Before this change, the following SQL was used to update each CITY table:


Figure 1.10  A denormalized E/R model sales database.

Table 1.1 Redundancy matrix to determine optimal normalization.
Column Size Duplication Total Space Change
state_bird 10 10,000 100,000 Rare
state_flower 10 10,000 100,000 Rare
region_name 2 10,000   20,000 Never
cost_of_living 8 10,000   80,000 Quarterly
city_mascot 10 10,000 100,000 Rare
UPDATE CITY SET cost_of_living = :var1
WHERE CITY = :var2;
2000 ROWS UPDATED

While the management of redundancy seems a formidable challenge, the following SQL UPDATE statement makes this change easily, and we can make the change to all affected rows:

UPDATE CUSTOMER SET cost_of_living = :var1
WHERE CITY = :var2;
100,000 ROWS UPDATED

Using this same state_bird query as before, we see how it is simplified by removing the extra tables:

SELECT state_bird
FROM CUSTOMER, ORDER, QUANTITY, ITEM
WHERE
item_name = 'BIRDSEED'
AND
ITEM.item_nbr = QUANTITY.item_nbr
AND
QUANTITY.order_nbr = ORDER.order_nbr
AND
ORDER.cust_nbr = CUSTOMER.cust_nbr;

It is still necessary to join three tables together, but this results in a much faster, simpler query than the original five-way table join. You can carry this concept to the point where this model is condensed into a single, highly redundant table.

Misleading Data Relationships

When creating an E/R model, it is often tempting to look at the data model from a purely logical perspective without any regard for the physical implications of the model. The designer strives to recognize and establish all of the logical relationships in the model while sometimes finding that the relationships are misleading. A relationship can be misleading when the relationship actually exists, but the application may have no need to reference this relationship. Consider the E/R model for a university shown in Figure 1.11.


Figure 1.11  An example of misleading data relationships.

Consider the association of the hair_color attribute to the student entity. Does a many-to-many data relationship really exist between hair_color and student? Many students have blonde hair, and blonde hair is common to many students. Why not create a many-to-many relationship between student and hair_color? The solution depends upon whether any other non-key data items exist within the hair_color entity.

If many other data items relating to hair color are present, then it is perfectly appropriate to create another entity called hair_color. But in this case, even though a many-to-many relationship exists between hair_color and student, hair_color is a stand-alone data attribute, so it is unnecessary to create an additional data structure.

Another example is the zip_code attribute in the student entity. At first glance, it appears that a violation of Third Normal Form (that is, a transitive dependency) has occurred between city and zip_code. In other words, it appears that a zip_code is paired with the city of residence for the student. If each city has many zip_codes, while each zip_code refers only to one city, it makes sense to model this as a one-to-many data relationship (see Figure 1.11). The presence of this data relationship requires creating a separate entity called zip with attached student entities. However, this is another case where the zip entity lacks key attributes, making it impractical to create the zip entity. In other words, zip_code has no associated data items. Creating a database table with only one data column would be nonsense.

This example demonstrates that it is not enough to group together “like” items and then identify the data relationships. A practical test must be made regarding the presence of no-key attributes within an entity class. If an entity has no attributes (that is, the table has only one field), the presence of the entity is nothing more than an index to the foreign key in the member entity. It can be removed from the E/R model. This technique not only simplifies the number of entities, but it creates a better environment for a client/server architecture. More data is logically grouped together, resulting in less data access effort. Figure 1.12 shows an example of correct relationships.


Figure 1.12  An example of correct many-to-many relationships.


Previous Table of Contents Next