Previous | Table of Contents | Next |
When embarking on a data warehousing project, many pitfalls can cripple the project. Characteristics of successful data warehouse projects generally include the following:
One of the most fundamental principles of the multidimensional database is the idea of aggregation. As we know, managers at different levels require different levels of summarization to make intelligent decisions. To allow the manager to choose the level of aggregation, most warehouse offerings have a drill-down feature, allowing the user to adjust the level of detail, eventually reaching the original transaction data. For obvious performance reasons, the aggregations are precalculated and loaded into the warehouse during off hours.
Of the several types of aggregation, the most common is called a roll-up aggregation. An example of this type of aggregation would be taking daily sales totals and rolling them up into a monthly sales tablemaking this type relatively easy to compute and run. The more difficult type is the aggregation of boolean and comparative operators. For example, assume that a salesperson table contains a boolean column called turkey. A salesperson is a turkey if his or her individual sales are below the group average for the day. A salesperson may be a turkey on 15 percent of the individual days; however, when the data is rolled up into a monthly summary, a salesperson may become a turkeyeven though they only had a few (albeit very bad) sales days. Table 10.2 shows the differences between the presentation and display styles for OLAP and MDDB servers.
OLAP | MDDB | |
---|---|---|
Presentation display | List | Crosstab |
Extraction | Select | Compare |
Series variable dimensions | Columns | User-defined |
The base rule is simple: If the data looks like it would fit well into a spreadsheet, it is probably well-suited to an MDDBor at least an MDDB representation.
Dr. Ralph Kimball, founder of Red Brick Systems, popularized the term STAR schema to describe a denormalization process that simulates the structure of a multidimensional database. With a STAR schema, the designer can simulate the functions of a multidimensional database without having to purchase expensive third-party software. Kimball describes denormalization as the prejoining of tables, such that the runtime application does not have to join tables. At the heart of the STAR schema is a fact table, usually comprised entirely of key values and raw data. A fact table is generally very long and may have millions of rows.
Surrounding the fact table is a series of dimension tables that serve to add value to the base information in the fact table. For example, consider the E/R model for a sales database shown in Figure 10.6.
Figure 10.6 A sample fully normalized schema design.
Here we see a standard Third Normal Form (3NF) database to represent the sales of items. No redundant information is given; therefore, salient data such as the total for an order would have to be computed from the atomic items that comprise the order. In this 3NF database, a list of line items would need to be created, multiplying the quantity ordered by the price for all items that belong in order 123.
In the following example, an intermediate table called TEMP is created to hold the result list:
CREATE TABLE TEMP AS SELECT (quantity.QUANTITY_sold * ITEM.list_price) line_total FROM QUANTITY, ITEM WHERE QUANTITY.order_nbr = 123 AND QUANTITY.item_nbr = ITEM.item_nbr; SELECT sum(line_total) FROM TEMP;
Also note that the state-city table hierarchy in this example is very deliberate. In order to be truly in Third Normal Form, we do not allow any redundant information (except, of course, foreign keys). Given that this example has been fully normalized into five tables, a query that would appear very simple to the end user would have relatively complex SQL.
Previous | Table of Contents | Next |