When we contrast the data warehouse with a transaction-oriented online system, the differences become apparent. These differences are summarized in Table 10.1.
Table 10.1 Differences between OLTP and data warehouse.
| Characteristic | OLTP | Data Warehouse |
| --- | --- | --- |
| Normalization | High (3NF) | Low (1NF) |
| Table sizes | Small | Large |
| Number of rows/table | Small | Large |
| Size/duration of transactions | Small | Large |
| Number of online users | High (1,000s) | Low (< 100) |
| Updates | Frequent | Nightly |
| Full-table scans | Rarely | Frequently |
| Historical data | < 90 days | Years |
Aside from the different uses for data warehouses, many developers use relational databases to build their warehouses, applying design techniques that simulate multiple dimensions. This push toward STAR schema design has been somewhat successful, especially since designers do not have to buy a multidimensional database or invest in an expensive front-end tool. In general, using a relational database for OLAP is achieved by some combination of the following techniques:
- Prejoining tables together: This is an obtuse way of saying that a denormalized table is created from the normalized online database. A large prejoin of several tables is sometimes called a fact table in a STAR schema.
- Presummarization: This prepares the data for any drill-down requests that may come from the end user. Essentially, the different levels of aggregation are identified, and the aggregate tables are computed and populated when the data is loaded. (A sketch of prejoining and presummarization follows the tool list below.)
- Massive denormalization: The side effect of very inexpensive disks has been a rethinking of the merits of Third Normal Form. Today, redundancy is widely accepted, as seen in the popularity of replication tools, snapshot utilities, and non-First Normal Form databases. If you can precreate every possible result table at load time, your end users will enjoy excellent response time when making queries. The STAR schema is an example of massive denormalization.
- Controlled periodic batch updating: New detail data is rolled into the aggregate tables on a periodic basis while the online system is down, with all summarizations recalculated as the new data is introduced into the database.

While data loading is important, it is only one part of building the warehouse. Several categories of tools are used to populate it, including:
- Data extraction tools: Extract data from the different hardware platforms and databases
- Metadata repository: Holds common definitions
- Data cleaning tools: Ensure uniform data quality
- Data sequencing tools: Enforce referential integrity (RI) rules for the warehouse
- Warehouse loading tools: Populate the data warehouse
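To make prejoining and presummarization concrete, here is a minimal sketch using Python's built-in sqlite3 module. The customer, product, and orders tables and every column name are invented for illustration; the point is only to show a denormalized fact table being built from normalized sources and an aggregate table being computed at load time.

```python
import sqlite3

# Hypothetical normalized OLTP tables: customer, product, and orders.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE customer (cust_id INTEGER PRIMARY KEY, region TEXT);
    CREATE TABLE product  (prod_id INTEGER PRIMARY KEY, category TEXT);
    CREATE TABLE orders   (order_id INTEGER PRIMARY KEY, cust_id INTEGER,
                           prod_id INTEGER, order_date TEXT, amount REAL);
""")

# Prejoining: build one denormalized fact table from the normalized sources.
con.execute("""
    CREATE TABLE fact_sales AS
    SELECT o.order_date, c.region, p.category, o.amount
    FROM   orders o
    JOIN   customer c ON c.cust_id = o.cust_id
    JOIN   product  p ON p.prod_id = o.prod_id
""")

# Presummarization: compute the aggregate table at load time so that
# rollup queries never have to scan the detail rows.
con.execute("""
    CREATE TABLE agg_sales_by_region_category AS
    SELECT region, category,
           SUM(amount) AS total_amount,
           COUNT(*)    AS order_cnt
    FROM   fact_sales
    GROUP  BY region, category
""")
con.commit()
```

Because the aggregate is built during the load window, end-user rollup queries by region and category can be answered without touching the detail rows.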
As we know, most data warehouses are loaded in batch mode after the online system has been shut down. In this sense, a data warehouse is bi-modal, with a highly intensive load window at night and an intensive read-only window during the day. Because many data warehouses collect data from nonrelational databases such as IMS or CA-IDMS, there is no standard method for extracting data to load into a warehouse. However, we do have a few common techniques for extracting and loading data:
- Log sniffing: Apply the archived redo logs from the OLTP system to the data warehouse.
- Update, insert, and delete triggers: Fire off a distributed update to the data warehouse.
- Snapshot logs: Update the replicated tables and change log files to populate the data warehouse.
- Nightly extract/load programs: Retrieve the operational data and load it into the warehouse (a minimal sketch follows this list).
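A nightly extract/load program of the kind just listed might look roughly like the sketch below. It assumes the hypothetical warehouse tables from the earlier sketch already exist, that the operational data sits in an assumed table called sales_detail, and that order_date is a usable high-water mark; none of these names come from a real system.

```python
import sqlite3

# Hypothetical connections: one to the operational source, one to the warehouse.
oltp = sqlite3.connect("oltp.db")
dw   = sqlite3.connect("warehouse.db")

# Extract: pull only the rows created since the previous load window.
last_load = dw.execute(
    "SELECT COALESCE(MAX(order_date), '1900-01-01') FROM fact_sales"
).fetchone()[0]
new_rows = oltp.execute(
    "SELECT order_date, region, category, amount"
    "  FROM sales_detail WHERE order_date > ?",
    (last_load,),
).fetchall()

# Load: append the new detail to the fact table.
dw.executemany("INSERT INTO fact_sales VALUES (?, ?, ?, ?)", new_rows)

# Recalculate the summary table from scratch while the online system is down.
dw.execute("DELETE FROM agg_sales_by_region_category")
dw.execute("""
    INSERT INTO agg_sales_by_region_category
    SELECT region, category, SUM(amount), COUNT(*)
    FROM   fact_sales
    GROUP  BY region, category
""")
dw.commit()
```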
Several methods can be used to aggregate data within OLAP servers, as shown in Figure 10.4. One method extracts the detail data from the relational engine at query time and summarizes it for display. Another popular method preaggregates the data and keeps the summarized data ready for retrieval; a short sketch contrasting the two approaches follows the figure.
Figure 10.4 Aggregation and OLAP servers.
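The two approaches can be contrasted with a short sketch against the same hypothetical warehouse: the first query asks the relational engine to summarize the detail rows at request time, while the second simply reads the summary that was precomputed at load time.

```python
import sqlite3

dw = sqlite3.connect("warehouse.db")  # the hypothetical warehouse from the earlier sketches

# Method 1: aggregate at query time -- the server summarizes the detail
# rows in the relational engine every time a rollup is requested.
on_the_fly = dw.execute(
    "SELECT region, SUM(amount) FROM fact_sales GROUP BY region"
).fetchall()

# Method 2: preaggregation -- the summary was computed at load time, so the
# same answer is a cheap read of the much smaller aggregate table.
precomputed = dw.execute(
    "SELECT region, SUM(total_amount)"
    "  FROM agg_sales_by_region_category GROUP BY region"
).fetchall()
```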
Two major changes have occurred over the past several years that have driven the movement toward data warehousing:
- Disk space became inexpensive: One gigabyte of disk carried a price tag of $100,000 in 1988. Today, one gigabyte is less than $1,000. To support large data warehouses, it is not uncommon to require terabytes of disk storage.
- The movement into open systems: The migration away from centralized processors has led to data residing on a plethora of different computer and database architectures.
Since data is collected from a variety of sources, many warehouse installations find it necessary to create a metadata repository. What, then, is the role of a metadata repository? When data is consolidated from a variety of diverse systems, many cross-system problems can arise. These issues include the following (a toy repository sketch appears after the list):
- Homonyms: Different columns with the same name
- Synonyms: The same column with different names
- Unit incompatibilities: Inches versus centimeters, dollars versus yen
- Enterprise referential integrity: Business rules that span operational systems
- The warehouse design rules: Determining how the tables will be built, for example:
  - Horizontal denormalization (fat tables)
  - Vertical denormalization (chunking tables based on time periods)
- Using multidimensional front ends
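To illustrate what a metadata repository actually records, the toy sketch below maps source-system column names (synonyms) onto canonical warehouse columns and converts incompatible units at load time. The column names, units, and conversion factors are invented for the example.

```python
# Toy metadata repository: each canonical warehouse column records the
# source-system names that map onto it (synonyms) and the unit it is stored
# in, so unit incompatibilities can be converted during the load.
REPOSITORY = {
    "order_amount": {"synonyms": {"amt", "order_total"}, "unit": "USD"},
    "part_length":  {"synonyms": {"length", "part_len"}, "unit": "cm"},
}

# Conversion factors into the warehouse's canonical units (illustrative values only).
TO_CANONICAL = {("inches", "cm"): 2.54, ("yen", "USD"): 1 / 150.0}


def resolve(source_name: str, value: float, source_unit: str) -> tuple[str, float]:
    """Map a source column onto its canonical name and convert its unit."""
    for canonical, meta in REPOSITORY.items():
        if source_name == canonical or source_name in meta["synonyms"]:
            factor = TO_CANONICAL.get((source_unit, meta["unit"]), 1.0)
            return canonical, value * factor
    raise KeyError(f"{source_name!r} is not defined in the metadata repository")


print(resolve("part_len", 10.0, "inches"))   # ('part_length', 25.4)
```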
There are alternatives to using a pure multidimensional database (MDDB). One common approach is to insert a metadata server between the OLTP relational database and the query tool, as shown in Figure 10.5; a minimal sketch of the idea follows the figure.
Figure 10.5 Using metadata repositories for multidimensional databases.
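The sketch below illustrates the general idea of such a metadata layer; it is not a description of how any particular product works. A small mapping of dimension and measure names onto the star-schema columns is used to generate the SQL that the query tool would otherwise have to construct itself.

```python
# Toy "metadata server": dimension and measure names are mapped onto the
# warehouse's star-schema columns, and SQL is generated for the query tool.
DIMENSIONS = {"region": "c.region", "category": "p.category"}
MEASURES   = {"sales": "SUM(o.amount)"}


def dimensional_query(dims: list[str], measure: str) -> str:
    """Translate a (dimensions, measure) request into SQL over the star schema."""
    cols = ", ".join(DIMENSIONS[d] for d in dims)
    return (
        f"SELECT {cols}, {MEASURES[measure]} AS {measure} "
        "FROM orders o "
        "JOIN customer c ON c.cust_id = o.cust_id "
        "JOIN product  p ON p.prod_id = o.prod_id "
        f"GROUP BY {cols}"
    )


print(dimensional_query(["region", "category"], "sales"))
```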
Examples of this approach include:
- DSS Agent by MicroStrategy
- MetaCube by Stanford Technology Group
- HOLOS by Holistic Systems