Previous Table of Contents Next


Oracle Clusters

Clustering is a very important concept for improving client-server performance. When traversing a database, reducing I/O always improves throughput. The concept of clustering is very similar to the use of the VIA set in the CODASYL Network database model, where member records are stored physically near their parent records. For Oracle, clusters can be used to define common one-to-many access paths, and the member rows can be stored on the same database block as their owner row. For example, assume that we have a one-to-many relationship between customers and orders. If our application commonly access the data from customer to order, we can cluster the order rows on the same database block as the customer row. In this way, we’ll receive the list of all orders for a customer in a single I/O. (See Figure 2.5.) Of course, we will need to size the database blocks with db_block_size so that an entire order will fit onto a single database block. For more information on db_block_size, refer to Chapter 6, Oracle DBA Performance And Tuning.

We need to note one important issue, however: While a cluster will tremendously improve performance in one direction, queries in the other direction will suffer. For example, consider the many-to-many relationship between customers and orders. We have the junction table, ORDER_LINE, at the intersection of this many-to-many relationship and we need to decide which owner, ORDER or ITEM, will be the anchor for our cluster. If we commonly traverse from order to item (e.g., displaying an order form), it would make sense to cluster the ORDER_LINE records on the same database block as their ORDER owner. If, on the other hand, we commonly traversed from ITEM to ORDER (e.g., requesting the details for all orders containing widgets), we would cluster the ORDER_LINE rows near their ITEM owner. If we cluster on the ORDER owner, database queries that display order forms will be very fast, while queries in the other direction will have to do additional I/O.

Oracle Parallel Query

One of the most exciting performance features of Oracle version 7.3 and above is the ability to partition an SQL query into subqueries, and dedicate separate processors to concurrently service each subquery. At this time, parallel query is only useful for queries that perform full-table scans on long tables, but the performance improvements can be dramatic. Here’s how it works.

Instead of having a single query server to manage the I/O against the table, parallel query allows the Oracle query server to dedicate many processors to simultaneously access the data, as shown in Figure 2.6.


Figure 2.6  A sample parallel query.

In order to be most effective, the table should be partitioned onto separate disk devices, such that each process can do I/O against its segment of the table without interfering with the other simultaneous query processes. However, the client-server environment of the 1990s relies on RAID or a logical volume manager (LVM), which scrambles datafiles across disk packs in order to balance the I/O load. Consequently, full utilization of parallel query involves “striping” a table across numerous data files, each on a separate device.

Even if your system uses RAID or LVM, some performance gains are still available with parallel query. In addition to using multiple processes to retrieve the table, the query manager will also dedicate numerous processes to simultaneously sort the result set. (See Figure 2.7.)


Figure 2.7  A sample parallel sort.

However, parallel query works best with symmetric multiprocessor (SMP) boxes, which have more than one internal CPU. Also, it is important to configure the system to maximize the I/O bandwidth, either through disk striping or high-speed channels. Because of the parallel sorting feature, it is also a good idea to beef up the memory on the processor.

While sorting is no substitute for using a presorted index, the parallel query manager will service requests far faster than a single process. While the data retrieval will not be significantly faster because all of the retrieval processes are competing for a channel on the same disk, each sort process has its own sort area (as determined by the sort_area_size init.ora parameter)—which speeds along the sorting of the result set. In addition to full-table scans and sorting, the parallel query option also allows for parallel processes for merge joins and nested loops.

To invoke the parallel query option requires that all indexing is bypassed. The most important is that the execution plan for the query specifies a full-table scan. If the output of the explain plan does not indicate a full-table scan, the query can be forced to ignore the index by using query hints.

The number of processors that are dedicated to service an SQL request is ultimately determined by Oracle’s query manager, but the programmer can specify the upper limit on the number of simultaneous processes. When using the cost-based optimizer, the PARALLEL hint can be embedded into the SQL to specify the number of processes. For example:

SELECT /*+ FULL(employee_table) PARALLEL(employee_table,  4) */
        employee_name
        FROM
        employee_table
        WHERE
        emp_type = 'SALARIED';

If you are using SMP with many CPUs, you can issue a parallel request and leave it up to each Oracle instance to use their default degree of parallelism:

SELECT /*+ FULL(employee_table) PARALLEL(employee_table,  DEFAULT, DEFAULT) */
        employee_name
        FROM
        employee_table
        WHERE
        emp_type = 'SALARIED';

Several important init.ora parameters have a direct impact on parallel query:

  sort_area_size—The higher the value, the more memory available for individual sorts on each parallel process. Note that the sort_area_size parameter allocates memory for every query on the system that invokes a sort. For example, if a single query needs more memory and you increase the sort_area_size, all Oracle tasks will allocate the new amount of sort area, regardless of whether or not they will use all of the space.
  parallel_min_servers—The minimum number of query servers that will be active on the instance. System resources are involved in starting a query server, so having the query server started and waiting for requests will speed up processing. Note that if the actual number of required servers is less than the value of parallel_min_servers, the idle query servers will be consuming unnecessary overhead and the value should be decreased.
  parallel_max_servers—The maximum number of query servers allowed on the instance. This parameter will prevent Oracle from starting so many query servers that the instance is unable to service all of them properly.


Previous Table of Contents Next