

This is a simple method for extracting, normalizing, and inserting the operational data into the STAR schema. By referencing SYSDATE in the WHERE clause, we ensure that only the current day's transactions are extracted and loaded into the STAR schema FACT_TABLE. Of course, we are still performing a very large five-way table join, but we would hope to run this extraction during off hours, when the retrieval would not impact the production users.
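For illustration, a minimal sketch of such an extraction follows. The operational table and column names (CUSTOMER, SALESPERSON, ITEM, ORDER_LINE, the join keys, and the derivation of total_cost) are assumptions for the sake of the example, not the actual schema from the preceding pages:

INSERT INTO FACT_TABLE@london
SELECT c.region_name,               -- assumed FACT_TABLE column order
       c.state_name,
       i.item_type,
       s.salesperson_name,
       o.order_nbr,
       ol.quantity * i.item_price   -- assumed derivation of total_cost
FROM   ORDER o, CUSTOMER c, SALESPERSON s, ITEM i, ORDER_LINE ol
WHERE  o.cust_nbr        = c.cust_nbr          -- assumed join keys
AND    o.salesperson_nbr = s.salesperson_nbr
AND    ol.order_nbr      = o.order_nbr
AND    ol.item_nbr       = i.item_nbr
AND    o.order_date     >= TRUNC(SYSDATE);     -- only today's orders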

But what about rows that have been deleted? Although cancellations are uncommon, we still need to account for the possibility that some orders will be canceled after they have been loaded. We need a mechanism for updating the STAR schema to reflect these deletions.

The most obvious method for removing deleted orders from the STAR schema is to create a DELETE trigger on the ORDER table of the operational system. This trigger fires after each deletion and issues a remote delete to remove the rows from the STAR schema that are no longer valid:

CREATE OR REPLACE TRIGGER delete_orders
    AFTER DELETE ON ORDER
    FOR EACH ROW                      -- fire once per deleted order row
BEGIN
    DELETE FROM FACT_TABLE@london     -- remove the matching fact rows
    WHERE order_nbr = :OLD.order_nbr;
END;

We now have a mechanism for keeping our data warehouse in relative synchronization with the operational database.

What are we going to do as the FACT_TABLE expands beyond normal table capacity? Let’s assume that our organization processes 20,000 orders daily, which amounts to 7.3 million rows per year. Even with Oracle’s efficient indexing, a table this large creates unique performance problems, primarily because the B-tree index must spawn many levels to properly index 7.3 million rows. Whereas a typical query might involve three index reads, a query against a 7-million-row table might require five index reads before the target row is fetched.
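If you want to see how many levels an index has actually grown to, Oracle will report the B-tree height after a structure validation. A minimal sketch follows; fact_pk_idx is a hypothetical index name:

ANALYZE INDEX fact_pk_idx VALIDATE STRUCTURE;  -- populates INDEX_STATS

SELECT height, lf_rows     -- B-tree levels and leaf-row count
FROM   index_stats;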

To alleviate this problem, many designers will partition the table into smaller subtables, using the date as the distinguishing factor. As such, we might have one table for each month, with names such as FACT_TABLE_1_96, FACT_TABLE_2_96, and so on.
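A minimal sketch of how one of these monthly subtables might be split out follows, assuming the fact table carries the order_month and order_year columns referenced in the ORDER BY clause below:

CREATE TABLE FACT_TABLE_1_96 AS
SELECT *
FROM   FACT_TABLE
WHERE  order_month = 1      -- January
AND    order_year  = 96;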

Whenever we need to address multiple tables in a single operation, we can use the SQL UNION ALL verb to merge the tables together:

SELECT * FROM FACT_TABLE_1_96
UNION ALL
SELECT * FROM FACT_TABLE_2_96
UNION ALL
SELECT * FROM FACT_TABLE_3_96
ORDER BY order_year, order_month;


Note:  In addition to having the benefit of smaller table indexes, this type of table partitioning, combined with the UNION ALL statement, has the added benefit of allowing Oracle’s parallel query engine to simultaneously perform full-table scans on each of the subtables. In this case, a separate process would be invoked to handle each of the three table scans, and the Oracle query manager would then gather the result data and sort it according to the ORDER BY clause. In the above example, we could expect a 50 percent performance improvement over the same query against a single FACT_TABLE.
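One convenient refinement, offered here only as a sketch, is to hide the monthly subtables behind a view so that end users can query a single logical table; the view name is hypothetical:

CREATE VIEW FACT_TABLE_Q1_96 AS    -- one logical table for the quarter
SELECT * FROM FACT_TABLE_1_96
UNION ALL
SELECT * FROM FACT_TABLE_2_96
UNION ALL
SELECT * FROM FACT_TABLE_3_96;

Queries against such a view can still take advantage of the parallel scans described in the note, while insulating users from the monthly naming convention.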

Aggregation, Roll-Ups, And STAR Schemas

We have now defined and populated a STAR schema that contains the total_sales for each order for each day. While it is now easy to see the total for each order, the users of a decision support system rarely require this level of detail. Most managers would be more interested in knowing the sum of sales or units sold, aggregated by month, quarter, region, and so on. Even with a STAR schema, these types of aggregations would be hard to compute at runtime with an acceptable response time. Essentially, we can either aggregate at runtime or preaggregate the data offline, making the totals available without realtime computation.

One simple alternative to realtime aggregation is to write SQL to preaggregate the data according to the dimensions that the end user may want to see. In our example, let’s assume that management wants to aggregate monthly sales by region, state, item type, and salesperson. Since we have four possible dimensions, taking them two at a time yields the following list of six aggregate tables to precreate (a table-definition sketch follows the list). Assume that all of these tables would have a month_year field as part of their primary key:

  Region by state—This table would have region_name, state_name, and monthly_sales as columns.
  Region by item_type—This table would have region_name, item_type, and monthly_sales as columns.
  Region by salesperson—This table would have region_name, salesperson_name, and monthly_sales as columns.
  State by item_type—This table would have state_name, item_type, and monthly_sales as columns.
  State by salesperson—This table would have state_name, salesperson_name, and monthly_sales as columns.
  Item_type by salesperson—This table would have item_type, salesperson_name, and monthly_sales as columns.
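As promised above, here is a minimal table-definition sketch for one of the six aggregates. The datatypes and the composite primary key are assumptions:

CREATE TABLE REGION_ITEM_TYPE (
    month_year     VARCHAR2(5),      -- e.g., '3/96'
    region_name    VARCHAR2(20),
    item_type      VARCHAR2(20),
    monthly_sales  NUMBER(12,2),
    PRIMARY KEY (month_year, region_name, item_type)
);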

The SQL to produce these tables can easily be run as a batch task during end-of-month processing. For example, the SQL to populate the REGION_ITEM_TYPE table for March 1996 might look like this:

INSERT INTO REGION_ITEM_TYPE
SELECT   '3/96',                -- month_year for March 1996
         region_name,
         item_type,
         SUM(total_cost)        -- roll up to monthly_sales
FROM     FACT_TABLE_3_96
GROUP BY region_name, item_type;

The sample REGION_ITEM_TYPE table might look like the one shown in Table 10.3.

Table 10.3 A sample REGION_ITEM_TYPE table.
MONTH_YEAR   REGION   TYPE       MONTHLY_SALES
3/96         WEST     Clothes         $113,999
3/96         WEST     Hardware         $56,335
3/96         WEST     Food             $23,574
3/96         EAST     Clothes          $45,234
3/96         EAST     Hardware         $66,182
3/96         EAST     Food            $835,342
3/96         SOUTH    Clothes           $1,223
3/96         SOUTH    Hardware         $56,392
3/96         SOUTH    Food              $9,281
3/96         NORTH    Clothes         $826,463
3/96         NORTH    Hardware         $77,261
3/96         NORTH    Food             $43,383

These aggregate tables can be built in the middle of the night if need be—right after the master fact tables have been populated with the day’s sales. The next morning, the prior day’s sales will have been rolled up into these summaries, giving management an accurate, fast, and easy-to-use tool for decision support.
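For example, a manager’s query against one of these aggregates reduces to a trivial scan of a few dozen presummarized rows. This is only an illustrative sketch against the REGION_ITEM_TYPE table defined earlier:

SELECT   region_name,
         SUM(monthly_sales)     -- total March sales per region
FROM     REGION_ITEM_TYPE
WHERE    month_year = '3/96'
GROUP BY region_name;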

