Previous Table of Contents Next


For example, the SQL to calculate the sum of all orders in the western region might look very complex, involving a five-way table join:

CREATE TABLE TEMP AS
SELECT (QUANTITY.quantity_sold * ITEM.list_price) line_total
FROM QUANTITY, ITEM, CUSTOMER, CITY, STATE
WHERE
QUANTITY.item_nbr = ITEM.item_nbr    /* join QUANTITY and ITEM */
AND
ITEM.cust_nbr = CUSTOMER.cust_nbr    /* join ITEM and CUSTOMER */
AND
CUSTOMER.cust_nbr = CITY.cust_nbr    /* join CUSTOMER and CITY */
AND
CITY.state_name = STATE.state_name   /* join CITY and STATE */
AND
STATE.region_name = 'WEST';

In the real world, of course, we would introduce enough redundancy to eliminate the city and state tables. The point is clear: A manager who wants to analyze a series of complete order totals would need to do a huge amount of realtime computation. Here we arrive at the basic tradeoff: If we want true freedom from redundant data, we must pay the price at query time.

Remember, the rules of database design have changed. Ten years ago, normalization theory emphasized the need to control redundancy and touted the benefits of a structure that was free of redundant data. Today, with disk prices at an all-time low, the attitude toward redundancy has changed radically. The relational vendors are offering a plethora of tools to allow snapshots and other methods for replicating data. Other vendors, such as UniSQL, are offering database products that allow for non–First Normal Form implementations. Today, it is perfectly acceptable to create First Normal Form implementations of normalized databases—prejoining the tables to avoid the high performance costs of runtime SQL joins.

The basic principle behind the STAR query schema is the introduction of highly redundant data for performance reasons. Let’s evolve the 3NF database into a STAR schema by creating a fact table to hold the quantity for each item sold. Essentially, a fact table is a First Normal Form representation of the database, with a very high degree of redundancy being added into the tables. This un-normalized design, shown in Figure 10.7, greatly improves the simplicity of the design—but at the expense of redundant data.


Figure 10.7  The completed STAR schema.

At first glance, it is hard to believe that this representation contains the same data as the fully normalized database. The new fact table will contain one row for each item on each order, resulting in a tremendous amount of redundant key information. Of course, the STAR query schema is going to require far more disk space than the 3NF database from which it was created, and the STAR schema would most likely be a read-only database due to the widespread redundancy that has been introduced into the model. Also, the widespread redundancy would make updating difficult, if not downright impossible.

Also note the dimension tables surrounding the fact table. Some of the dimension tables contain data that can be added to queries with joins, while other dimensions such as REGION do not contain any data and only serve as indexes to the data.

Considering the huge disk space consumption and read-only restriction, what does this STAR schema really buy for us? The greatest benefit is the simplicity of data retrieval. Now that we have a STAR schema, we can formulate SQL queries to quickly get the information that we desire.

For example, getting the total cost for an order now becomes simple:

SELECT sum(total_cost) order_total
FROM FACT
WHERE
FACT.order_nbr = 123;

By doing some of the work up front, the realtime query become both faster and simpler.

Now, let’s consider what would happen if the user of this schema wanted to analyze information by aggregate values. Assume our manager wants to know the breakdown of sales by region. The data is not organized by region, but the fact table can easily be queried to find the answer.

At this point, retrieving the sum of all orders for the western region becomes trivial, as shown in the following snippet:

SELECT sum(total_cost)
FROM FACT
WHERE
region = 'WEST';

In addition to making the query simpler in structure, all of the table joining has been eliminated, so we can easily get the extracted information from our STAR schema.


Note:  A value such as region would be an ideal candidate for the use of Oracle 7.3 bitmapped indexes. Columns that have less than three distinct values can see dramatic performance improvements by utilizing the bitmapped index technique. Bitmapped indexes are described later in this chapter.

The natural consequence of this approach is that many IS shops will keep two copies of their production databases: one in Third Normal Form for online transaction processing, and another denormalized version of the database for decision support and data warehouse applications.

Populating STAR Schemas With Distributed SQL

Although it is evident at this point that having several copies of the same database can sometimes be desirable, problems arise with this dual approach when attempting to keep the STAR schema in sync with the operational database. Fortunately, Oracle provides several mechanisms to assist in this synchronization. Since it is safe to assume that the STAR schema will be used by executives for long-range trend analysis, it is probably not imperative that the STAR schema be completely up-to-date with the operational database. Consequently, we can develop an asynchronous method for updating the STAR schema.

If we make this assumption, then a single SQL statement can be used to extract the data from the operational database and populate the new rows into the STAR schema. In Listing 10.1, we assume that the STAR schema resides at our corporate headquarters in London. The table is called the FACT_TABLE.

Listing 10.1 Updating the STAR schema.

INSERT INTO FACT_TABLE@london
VALUES
(SELECT
    order_year,
    order_quarter,
    order_month,
    order_nbr,
    salerperson_name,
    customer_name,
    customer_city,
    customer_state,
    customer_region,
    item_nbr,
    quantity_sold,
    price*quantity_sold
FROM QUANTITY, ITEM, CUSTOMER, CITY, STATE
WHERE
QUANTITY.item_nbr = ITEM.item_nbr     /* join QUANTITY and ITEM */
AND
ITEM.cust_nbr = CUSTOMER.cust_nbr     /* join ITEM and CUSTOMER */
AND
CUSTOMER.city_name = CITY.city_name   /* join CUSTOMER and CITY */
AND
CITY.state_name = STATE.state_name    /* join CITY and STATE */
AND
order_date = SYSDATE                  /* get only today's transactions */
);


Previous Table of Contents Next