Previous Table of Contents Next


Of course, these tables are two-dimensional, but they can easily be massaged by an application to provide a tabular representation of the variables. Table 10.4 presents this tabular form.

Table 10.4 REGION versus TYPE.
Clothes Food Hardware
WEST $113,999 $23,574 $56,335
EAST $45,234 $835,342 $66,182
NORTH $826,463 $43,383 $77,261
SOUTH $1,223 $9,281 $56,392


Note:  This technique replicates the functionality of a multidimensional database where an end user can specify the axis of interest and the multidimensional database (MDDB) will build a tabular representation of the data.

But what if management wants to look at quarterly summaries instead of monthly summaries? What about yearly summaries? Of course, this same technique can be used to roll up the monthly summary tables into quarterly summaries, yearly summaries, and so on, according to the demands of the end user.

History Of OLAP

Dr. E. F. (Ted) Codd first coined the term OLAP in a 1993 report that was sponsored by Arbor Software. In addition to coining the term, Codd also went on to create 12 rules for OLAP. Despite the claims of a new technology, some offerings such as IRI date to the early 1970s.


Note:  The Internet offers a popular forum that discusses OLAP issues. It is called comp.database.olap.

Simulation Of Cubic Databases (Dimensionality)

For an illustrative example, consider the sample customer table in Table 10.5. Assume that this table is physically stored in data order. We can imagine how this data might look as a cubic table by reviewing Figure 10.8.


Figure 10.8  Cubic representation of relational data.

Table 10.5 A sample customer table.
Customer Name # Sales YY-MM City State
Bob Papaj & Assoc. 300 91-01 NY NY
Mark Reulbach Inc. 400 91-01 San Fran CA
Rick Willoughby Co. 120 91-02 NY NY
Kelvin Connor Co. 300 91-02 San Fran CA
Jame Gaston Inc. 145 91-03 NY NY
Linda O’Dell Assoc. 337 91-03 Fairport NY
Rick Wahl & Assoc. 134 91-03 San Fran CA

Of course, the cubic representation would require that the data be loaded into a multidimensional database or a spreadsheet that supported pivot tables. When considering an MDDB, two arguments emerge. The relational database vendors point out that MDDBs are proprietary—they feel that the more open relational databases should be used. The MDDB vendors point out some serious inadequacies with SQL that make it very difficult to use a relational database.

Keep in mind that dimensions may be hierarchical in nature, adding further confusion. A time dimension, for example, may be represented as a hierarchy with year, quarter, month, and day. Each of these “levels” in the dimension hierarchy may have its own values. In other words, a cubic representation with time as a dimension may be viewed in two ways:

  A series of cubes—one for year, another for quarter, and another with full_date
  A five-dimension table

MDDBs are most commonly used with data that is a natural fit for pivot tables, and it should come as no surprise that most MDDB sites are used with finance and marketing applications. Unfortunately, most multidimensional databases do not scale up well for warehouse applications. For example, the largest supported database for Essbase is about 20 gigabytes, whereas data warehouses with sizes measured in terabytes are not uncommon.

It is important to note that defining aggregation of a multidimensional database is no different than defining aggregate tables to a relational database. At load time, the database will still need to compute the aggregate values. MDDBs also employ the concept of sparse data. Since data is aggregated and presliced, some cells on a cube may not contain data. For example, consider a cube that tracks sales of items across a large company. The cells representing sales of thermal underwear would be null for Hawaii, while the sales of surfboards in Wyoming would also be null. Nearly all of the product offerings are able to maintain a mechanism for compressing out these types of null values.

Alternatives To Cubic Data Representation

Many traditional database designs can be used to simulate a data cube. One alternative to the cubic representation would be to leave the table in linear form, using SQL to join the table against itself to produce a result, as shown in Figure 10.9.


Figure 10.9  Joining a relational table against itself.

Let’s take a look at a query that might require the self-joining of a table:

  Show all customers in Hawaii who purchased our product more than 500 times.
  Show all customers in L.A. who purchase less than 10 times/month.
  Show all large customers (buying more than 100 items per month) in Alaska whose usage has dropped more than 10 percent in 1990.
  Show all customers in New York whose usage in March of 1990 deviated more than 20 percent from their usage in March of 1991.
  Show all customers in California where the company name contains “Widget” and usage has dropped more than 20 percent in 1991.

In our example, we compare all user sites where usage of our product has experienced a negative usage variance of greater than 5 percent between October of 1990 compared to December of 1990. A subset of this data can easily be extracted such that only California sites with more than 100 uses per month are displayed. For display, the user chose percentage variance, number of requests, site number, ZIP code, and city. Note the sort order of the report: It is sorted first by ZIP, followed by city, and then by percentage variance within city:

SELECT INTEGER
(((E.NUMBER_OF_SALES - S.NUMBER_OF_SALES) / S.NUMBER_OF_SALES) * 100) ,
E.CUSTOMER_NAME , E.CITY_NAME , E.ZIP , S.NUMBER_OF_SALES , E.NUMBER_OF_SALES

FROM DETAIL S , DETAIL E

WHERE
S.CUSTOMER_NAME = E.CUSTOMER_NAME
 AND
E.STATE_ABBR = 'CA'
 AND
E.DATE_YYMM = 9101
 AND
S.DATE_YYMM = 9201
 AND
E.NUMBER_OF_SALES < S.NUMBER_OF_SALES - (.05 * S.NUMBER_OF_SALES)

ORDER BY E.ZIP ASC , E.CITY_NAME ASC , 1 ;


Previous Table of Contents Next