This is a simple method for extracting, normalizing, and inserting the operational data into the STAR schema. By specifying SYSDATE in the WHERE clause, we ensure that only the day's transactions are extracted and loaded into the STAR schema FACT_TABLE. Of course, we are still performing a very large five-way table join, but we would hope to run this extraction during off-hours, when the retrieval would not impact the production users.
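The daily extract can be sketched as a single INSERT ... SELECT statement. The operational table and column names below (ORDER, CUSTOMER, SALESPERSON, ITEM, QUANTITY, and their join keys) are assumptions for illustration only; the actual five-way join depends on the operational schema:

```sql
-- Sketch of the daily extract/load, assuming hypothetical operational
-- tables and keys; only today's orders are selected via SYSDATE.
INSERT INTO fact_table@london
   SELECT
      o.order_nbr,
      o.order_date,
      c.region_name,
      c.state_name,
      s.salesperson_name,
      i.item_type,
      q.quantity_sold * i.unit_price   -- total_cost for this order line
   FROM  "ORDER"     o,
         customer    c,
         salesperson s,
         item        i,
         quantity    q
   WHERE o.order_date       = TRUNC(SYSDATE)   -- today's transactions only
     AND o.cust_nbr         = c.cust_nbr
     AND o.salesperson_nbr  = s.salesperson_nbr
     AND q.order_nbr        = o.order_nbr
     AND q.item_nbr         = i.item_nbr;
```

Note that TRUNC(SYSDATE) strips the time component so the comparison matches any order placed during the current calendar day.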
But what about rows that have been deleted? Although cancellations are uncommon, we still need to account for the possibility that some orders may be canceled, so we need a mechanism for updating the STAR schema to reflect these deletions.
The most obvious method for removing deleted orders from the STAR schema is to create a DELETE trigger on the ORDER table of the operational system. This trigger fires a remote DELETE to remove all rows from the STAR schema that are no longer valid:
CREATE TRIGGER delete_orders
AFTER DELETE ON "ORDER"
FOR EACH ROW
BEGIN
   DELETE FROM fact_table@london
   WHERE order_nbr = :old.order_nbr;
END;
We now have a mechanism for keeping our data warehouse in relative synchronization with the operational database.
What are we going to do as the FACT_TABLE expands beyond normal table capacity? Let's assume that our organization processes 20,000 orders daily, leading to 7.3 million rows per year. Even with Oracle's efficient indexing, a table this large can create unique performance problems, primarily because the index must spawn many levels to properly index 7.3 million rows. Whereas a typical query might involve three index reads, a query against a 7-million-row table might involve five index reads before the target row is fetched.
To alleviate this problem, many designers will partition the table into smaller subtables, using the date as the distinguishing factor. For example, we might have a table for each month, with names such as FACT_TABLE_1_96, FACT_TABLE_2_96, and so on.
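Creating one such monthly subtable can be sketched as follows; the date predicate and the practice of generating this DDL once per month are assumptions for illustration:

```sql
-- Sketch: carve January 1996 rows out of the master fact table into
-- its own monthly subtable (repeated, with new bounds, for each month).
CREATE TABLE fact_table_1_96 AS
   SELECT *
   FROM   fact_table
   WHERE  order_date >= TO_DATE('01-JAN-1996', 'DD-MON-YYYY')
     AND  order_date <  TO_DATE('01-FEB-1996', 'DD-MON-YYYY');
```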
Whenever we need to address multiple tables in a single operation, we can use the SQL UNION ALL verb to merge the tables together:
SELECT * FROM FACT_TABLE_1_96
UNION ALL
SELECT * FROM FACT_TABLE_2_96
UNION ALL
SELECT * FROM FACT_TABLE_3_96
ORDER BY order_year, order_month;
Note: In addition to having the benefit of smaller table indexes, this type of table partitioning, combined with the UNION ALL statement, has the added benefit of allowing Oracle's parallel query engine to simultaneously perform full-table scans on each of the subtables. In this case, a separate process would be invoked to handle each of the three table scans. The Oracle query manager would then gather the result data and sort it according to the ORDER BY clause. In the above example, we could expect a 50 percent performance improvement over a query against a single FACT_TABLE.
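A common refinement of this approach, sketched below, is to hide the monthly subtables behind a view so that end users can address the partitioned fact table as a single logical table (the view name here is an assumption):

```sql
-- Sketch: a view over the monthly subtables lets users query one
-- logical fact table while the optimizer still sees the partitions.
CREATE VIEW fact_table_q1_96 AS
   SELECT * FROM fact_table_1_96
   UNION ALL
   SELECT * FROM fact_table_2_96
   UNION ALL
   SELECT * FROM fact_table_3_96;
```

A query such as SELECT * FROM fact_table_q1_96 then behaves exactly like the UNION ALL query above, without the user needing to know the subtable names.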
We have now defined and populated a STAR schema that contains the total_sales for each order for each day. While it is now easy to see the total for each order, rarely do the users of a decision support system require this level of detail. Most managers would be more interested in knowing the sum of sales or units sold, aggregated by month, quarter, region, and so on. Even with a STAR schema, these types of aggregation would be hard to compute at runtime with an acceptable response time.

Essentially, we can either aggregate at runtime or preaggregate the data offline, making the totals available without real-time computation. One simple alternative to real-time aggregation is to write SQL to preaggregate the data according to the dimensions that the end user may want to see. In our example, let's assume that management would want to aggregate monthly sales by region, state, item type, and salesperson. Since we have four possible dimensions, we can precreate the six aggregate tables that correspond to each pair of dimensions: REGION_STATE, REGION_ITEM_TYPE, REGION_SALESPERSON, STATE_ITEM_TYPE, STATE_SALESPERSON, and ITEM_TYPE_SALESPERSON. Assume that all of these tables would have a month_year field as their primary key.
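Before they can be loaded, the aggregate tables must be created. A sketch of the DDL for one of them follows; the column names and datatypes are assumptions modeled on the sample data in Table 10.3:

```sql
-- Sketch: one of the six pairwise aggregate tables, keyed on the
-- month_year field plus the two dimension columns it summarizes.
CREATE TABLE region_item_type (
   month_year     VARCHAR2(5),     -- e.g. '3/96'
   region_name    VARCHAR2(10),
   item_type      VARCHAR2(20),
   monthly_sales  NUMBER(12,2),
   PRIMARY KEY (month_year, region_name, item_type)
);
```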
The SQL to produce these tables can easily be run as a batch task at end-of-month processing. For example, the SQL to populate the REGION_ITEM_TYPE table might look like this:
INSERT INTO REGION_ITEM_TYPE
   SELECT '3/96', region_name, item_type, SUM(total_cost)
   FROM FACT_TABLE_3_96
   GROUP BY region_name, item_type;
The sample REGION_ITEM_TYPE table might look like the one shown in Table 10.3.
DATE | REGION | TYPE | MONTHLY_SALES |
---|---|---|---|
3/96 | WEST | Clothes | $113,999 |
3/96 | WEST | Hardware | $56,335 |
3/96 | WEST | Food | $23,574 |
3/96 | EAST | Clothes | $45,234 |
3/96 | EAST | Hardware | $66,182 |
3/96 | EAST | Food | $835,342 |
3/96 | SOUTH | Clothes | $1,223 |
3/96 | SOUTH | Hardware | $56,392 |
3/96 | SOUTH | Food | $9,281 |
3/96 | NORTH | Clothes | $826,463 |
3/96 | NORTH | Hardware | $77,261 |
3/96 | NORTH | Food | $43,383 |
These aggregate tables can be built in the middle of the night if need be, right after the master fact tables have been populated with the day's sales. The next morning, the prior day's sales will have been rolled up into these summaries, giving management an accurate, fast, and easy-to-use tool for decision support.