Previous Table of Contents Next


When embarking on a data warehousing project, many pitfalls can cripple the project. Characteristics of successful data warehouse projects generally include the following:

  Clear business justification for the project—Measurable benefits must be defined for the warehouse project (e.g., sales will increase by 10 percent, customer retention will increase by 15 percent). Warehouses are expensive, and the project must be able to measure the benefits.
  Staff is properly trained—Warehousing involves many new technologies, including SMP, MPP, and MDDB; the staff must be trained and comfortable with the new tools.
  Ensuring data quality and consistency—Since warehouses deal with historical data from a variety of sources, care must be taken to create a metadata manager that ensures common data definitions and records changes of historical data definitions.
  Subject privacy is ensured—Gathering data from many sources can lead to privacy violations. A good example is the hotel chain that targeted frequent hotel customers and sent a frequent-user coupon to their home addresses. Some spouses intercepted these mailings, leading to numerous divorces.
  The warehouse starts small and evolves—Some projects fail by defining too broad of a scope for the project. Successful projects consider their first effort a “prototype” and continue to evolve from that point.
  Intimate end-user involvement—Data warehouses cannot be developed in a vacuum. The system must be flexible to address changing end-user requirements, and the end users must understand the architecture so they are aware of the limitations of their warehouse.
  Properly planning the infrastructure—A new infrastructure must be designed to handle communications between data sources. Parallel computers must be evaluated and installed, and staff must be appropriately educated.
  Proper data modeling and stress testing—The data model must be validated and stress tested so that the finished system performs at acceptable levels. A model that works great at 10 gigabytes may not function as the warehouse grows to 100 gigabytes.
  Choosing the wrong tools—Many projects are led astray because of vendor hype. Unfortunately, many vendors inappropriately label their products as being for warehouse applications, or they exaggerate the functionality of their tools.

Data Aggregation And Drill-Down

One of the most fundamental principles of the multidimensional database is the idea of aggregation. As we know, managers at different levels require different levels of summarization to make intelligent decisions. To allow the manager to choose the level of aggregation, most warehouse offerings have a “drill-down” feature, allowing the user to adjust the level of detail, eventually reaching the original transaction data. For obvious performance reasons, the aggregations are precalculated and loaded into the warehouse during off hours.

Of the several types of aggregation, the most common is called a roll-up aggregation. An example of this type of aggregation would be taking daily sales totals and rolling them up into a monthly sales table—making this type relatively easy to compute and run. The more difficult type is the aggregation of boolean and comparative operators. For example, assume that a salesperson table contains a boolean column called “turkey.” A salesperson is a turkey if his or her individual sales are below the group average for the day. A salesperson may be a turkey on 15 percent of the individual days; however, when the data is rolled up into a monthly summary, a salesperson may become a turkey—even though they only had a few (albeit very bad) sales days. Table 10.2 shows the differences between the presentation and display styles for OLAP and MDDB servers.

Table 10.2 Differences between OLAP and MDDB.
OLAP MDDB
Presentation display List Crosstab
Extraction Select Compare
Series variable “dimensions” Columns User-defined

The base rule is simple: If the data looks like it would fit well into a spreadsheet, it is probably well-suited to an MDDB—or at least an MDDB representation.

Relational Answers To MDDB

Dr. Ralph Kimball, founder of Red Brick Systems, popularized the term STAR schema to describe a denormalization process that simulates the structure of a multidimensional database. With a STAR schema, the designer can simulate the functions of a multidimensional database without having to purchase expensive third-party software. Kimball describes denormalization as the prejoining of tables, such that the runtime application does not have to join tables. At the heart of the STAR schema is a fact table, usually comprised entirely of key values and raw data. A fact table is generally very long and may have millions of rows.

Surrounding the fact table is a series of dimension tables that serve to add value to the base information in the fact table. For example, consider the E/R model for a sales database shown in Figure 10.6.


Figure 10.6  A sample fully normalized schema design.

Here we see a standard Third Normal Form (3NF) database to represent the sales of items. No redundant information is given; therefore, salient data such as the total for an order would have to be computed from the atomic items that comprise the order. In this 3NF database, a list of line items would need to be created, multiplying the quantity ordered by the price for all items that belong in order 123.

In the following example, an intermediate table called TEMP is created to hold the result list:

CREATE TABLE TEMP AS
SELECT (quantity.QUANTITY_sold * ITEM.list_price) line_total
FROM QUANTITY, ITEM
WHERE
QUANTITY.order_nbr = 123
AND
QUANTITY.item_nbr = ITEM.item_nbr;
SELECT sum(line_total) FROM TEMP;

Also note that the state-city table hierarchy in this example is very deliberate. In order to be truly in Third Normal Form, we do not allow any redundant information (except, of course, foreign keys). Given that this example has been fully normalized into five tables, a query that would appear very simple to the end user would have relatively complex SQL.


Previous Table of Contents Next