Previous | Table of Contents | Next |
Of course, these tables are two-dimensional, but they can easily be massaged by an application to provide a tabular representation of the variables. Table 10.4 presents this tabular form.
Clothes | Food | Hardware | |
---|---|---|---|
WEST | $113,999 | $23,574 | $56,335 |
EAST | $45,234 | $835,342 | $66,182 |
NORTH | $826,463 | $43,383 | $77,261 |
SOUTH | $1,223 | $9,281 | $56,392 |
Note: This technique replicates the functionality of a multidimensional database where an end user can specify the axis of interest and the multidimensional database (MDDB) will build a tabular representation of the data.
But what if management wants to look at quarterly summaries instead of monthly summaries? What about yearly summaries? Of course, this same technique can be used to roll up the monthly summary tables into quarterly summaries, yearly summaries, and so on, according to the demands of the end user.
Dr. E. F. (Ted) Codd first coined the term OLAP in a 1993 report that was sponsored by Arbor Software. In addition to coining the term, Codd also went on to create 12 rules for OLAP. Despite the claims of a new technology, some offerings such as IRI date to the early 1970s.
Note: The Internet offers a popular forum that discusses OLAP issues. It is called comp.database.olap.
Simulation Of Cubic Databases (Dimensionality)
For an illustrative example, consider the sample customer table in Table 10.5. Assume that this table is physically stored in data order. We can imagine how this data might look as a cubic table by reviewing Figure 10.8.
Figure 10.8 Cubic representation of relational data.
Customer Name | # Sales | YY-MM | City | State |
---|---|---|---|---|
Bob Papaj & Assoc. | 300 | 91-01 | NY | NY |
Mark Reulbach Inc. | 400 | 91-01 | San Fran | CA |
Rick Willoughby Co. | 120 | 91-02 | NY | NY |
Kelvin Connor Co. | 300 | 91-02 | San Fran | CA |
Jame Gaston Inc. | 145 | 91-03 | NY | NY |
Linda ODell Assoc. | 337 | 91-03 | Fairport | NY |
Rick Wahl & Assoc. | 134 | 91-03 | San Fran | CA |
Of course, the cubic representation would require that the data be loaded into a multidimensional database or a spreadsheet that supported pivot tables. When considering an MDDB, two arguments emerge. The relational database vendors point out that MDDBs are proprietarythey feel that the more open relational databases should be used. The MDDB vendors point out some serious inadequacies with SQL that make it very difficult to use a relational database.
Keep in mind that dimensions may be hierarchical in nature, adding further confusion. A time dimension, for example, may be represented as a hierarchy with year, quarter, month, and day. Each of these levels in the dimension hierarchy may have its own values. In other words, a cubic representation with time as a dimension may be viewed in two ways:
MDDBs are most commonly used with data that is a natural fit for pivot tables, and it should come as no surprise that most MDDB sites are used with finance and marketing applications. Unfortunately, most multidimensional databases do not scale up well for warehouse applications. For example, the largest supported database for Essbase is about 20 gigabytes, whereas data warehouses with sizes measured in terabytes are not uncommon.
It is important to note that defining aggregation of a multidimensional database is no different than defining aggregate tables to a relational database. At load time, the database will still need to compute the aggregate values. MDDBs also employ the concept of sparse data. Since data is aggregated and presliced, some cells on a cube may not contain data. For example, consider a cube that tracks sales of items across a large company. The cells representing sales of thermal underwear would be null for Hawaii, while the sales of surfboards in Wyoming would also be null. Nearly all of the product offerings are able to maintain a mechanism for compressing out these types of null values.
Many traditional database designs can be used to simulate a data cube. One alternative to the cubic representation would be to leave the table in linear form, using SQL to join the table against itself to produce a result, as shown in Figure 10.9.
Figure 10.9 Joining a relational table against itself.
Lets take a look at a query that might require the self-joining of a table:
In our example, we compare all user sites where usage of our product has experienced a negative usage variance of greater than 5 percent between October of 1990 compared to December of 1990. A subset of this data can easily be extracted such that only California sites with more than 100 uses per month are displayed. For display, the user chose percentage variance, number of requests, site number, ZIP code, and city. Note the sort order of the report: It is sorted first by ZIP, followed by city, and then by percentage variance within city:
SELECT INTEGER (((E.NUMBER_OF_SALES - S.NUMBER_OF_SALES) / S.NUMBER_OF_SALES) * 100) , E.CUSTOMER_NAME , E.CITY_NAME , E.ZIP , S.NUMBER_OF_SALES , E.NUMBER_OF_SALES FROM DETAIL S , DETAIL E WHERE S.CUSTOMER_NAME = E.CUSTOMER_NAME AND E.STATE_ABBR = 'CA' AND E.DATE_YYMM = 9101 AND S.DATE_YYMM = 9201 AND E.NUMBER_OF_SALES < S.NUMBER_OF_SALES - (.05 * S.NUMBER_OF_SALES) ORDER BY E.ZIP ASC , E.CITY_NAME ASC , 1 ;
Previous | Table of Contents | Next |