Oracle7 Server Tuning

Tuning for the Parallel Query Option

Four questions are very important in tuning when using the parallel query option:

what systems are appropriate for the parallel query option?

how does I/O and file striping relate to the parallel query option?

how should I set the degree of parallelism?

how many query server processes do I need on my system?

This section details some of the factors to consider when answering the above questions. This section also details the EXPLAIN PLAN facility for examining a query's execution plan and parallelization scheme. Some factors involved in determining the answers to the questions above are operating system dependent.

Additional Information: See your operating system-specific Oracle documentation for more information about tuning while using the parallel query option.

What Systems Benefit?

The parallel query option can help the most on the following systems:

symmetric multiprocessors (SMP) or massively parallel systems

systems with high I/O bandwidth (that is, many datafiles on many different disk drives)

systems with underutilized or intermittently used CPUs (for example, systems where CPU usage is typically less than 30%)

systems with sufficient memory to devote to sorting space for queries

If any one of these conditions is not true for your system, the parallel query option may not significantly help performance. In fact, on over-utilized systems or systems with small I/O bandwidth, the parallel query option can impede system performance.

It is important to realize that Oracle cannot return results to a user process in parallel. Therefore, if a query returns a large number of rows, the execution of the query will be faster, but the user process receives the rows serially. The parallel query option is best for queries that involve computing summary information for a large amount of data.

What Are the I/O Considerations?

The parallel query option works best on files that are spread or striped across many disk drives. This allows the query servers to maximize concurrent access to the datafiles.

Automatic file striping by the operating system is nearly as good as manually striping your tables across datafiles, so take advantage of automatic file striping if supported in your operating system. Operating system striping is usually easy to configure and provides automatic I/O balancing. Operating system striping is also optimal for random I/O operations such as sorts using temporary segments. Manual table striping is optimal for sequential I/O and may improve the performance for table scans.

File stripe size should be a few multiples larger than the size of DB_FILE_MULTIBLOCK_READ_COUNT blocks (the amount of blocks that Oracle attempts to read with a single I/O).

Additional Information: For a discussion of manually striping tables across datafiles, refer to "Striping Table Data" . For more information about automatic file striping and tools to use to determine I/O distribution among your devices, refer to your operating system documentation.

It is also important to note that large contiguous extents can help the query coordinator break up scan operations more efficiently for the query servers. During a scan operation, the query coordinator identifies all of the contiguous ranges of blocks and then breaks up these ranges into large, medium, and small groups of blocks. Each query server is given a large group of contiguous blocks to begin with and then is given successively smaller groups of blocks until the scan is completed. This ensures that all of the query servers complete the scan at approximately the same time. If there are several large extents in a table, the query coordinator can easily find groups of contiguous blocks to dispatch to the query servers.

The temporary tablespace used for sorting should also be striped across several disks (by the operating system, if possible). This tablespace should also contain several large extents.

How Should I Set the Degree of Parallelism?

After handling the I/O distribution, the degree of parallelism is the most important factor in tuning the parallel query option. You must first determine the maximum number of query servers your system can support and then divide those query servers among the estimated number of concurrent queries. The maximum number of query servers your system can support depends upon combinations of the following factors:

the number and capacity of CPUs on your system

the limitations on the number of processes on your system

if tables are striped, the number of disk drives the table is striped across

the amount of query processing on your system

the location of the data (that is, whether it is cached or on disk)

the type of operations in a query (for example, sorts or full table scans)

While some of these factors are operating system dependent, most of these factors are dependent upon the characteristics of your specific system and the type of operations involved in each query. Therefore, it is difficult to make generic recommendations for the degree of parallelism. It is often best to pick a degree of parallelism for a query, determine the execution time, and then determine the execution time of several higher and lower degrees of parallelism to find the optimal number.

Additional Information: To determine whether your system is being fully utilized, there are several graphical system monitors available on most operating systems. These monitors often give you a better idea of CPU utilization and system performance than monitoring the execution time of a query. Consult your operating system documentation to determine if your system supports graphical system monitors.

For single-user parallel execution, one guideline for your initial estimate of the degree of parallelism is to use one or two times the number of CPUs on your system (depending on the capacity of your CPUs). This is a good guess for sort-intensive operations because those tend to be CPU-bound. If your query is I/O intensive (that is, more scanning of tables than sorting), you may want to use one or two times the number of disk drives involved in the scans as your initial guess at the degree of parallelism.

If you have several concurrent queries that all require parallel execution, you should consider adding CPUs and/or disk drives to your system to allow for a greater number of query servers per query. If you cannot expand your system to accommodate multiple users, you may need to consider limiting parallel query execution to fewer users or lowering the degree of parallelism for each user's query.

Guidelines for Creating and Populating Tables and Indexes in Parallel

When you create a table and populate it with a subquery (CREATE TABLE...AS SELECT) in parallel, you want to evenly divide the population operation among all available processors. The goal is to distribute the I/O among the processors so that there is little contention for the I/O devices. A guideline for setting the degree of parallelism is two to four query servers per processor on your system.

You also want to avoid fragmentation by having as many files in your tablespace as the degree of parallelism for index or table creation. Each query server allocates an extent in a file when building a table or index. If more than one query server uses the same file, there may be gaps where one query server did not fill an extent completely. Try to have the degree of parallelism for parallel index and table creation come as close as possible to the number of files in the tablespace.

Memory Requirements for Sort Operations

The SORT_AREA_SIZE initialization parameter specifies the amount of memory to allocate per query server for sort operations. Setting this parameter to a larger value can dramatically increase the performance of sort operations since the entire sort is more likely to be performed in memory. Keep in mind that the actual amount of memory used for a sort is the value of SORT_AREA_SIZE multiplied by the number of query servers (for example, 10 query servers each with 1M for sorts requires 10M of memory allocated for the sort).

If memory is abundant on your system, you can benefit from setting SORT_AREA_SIZE to a large value. However, if memory is a concern for your system, you may want to limit the amount of memory allocated for sorts and increase the size of the buffer cache so that data blocks from temporary sort segments can be cached in the buffer cache.

Using Direct Disk I/O for Sorts with the Parallel Query Option

If memory and temporary space are abundant on your system, and you perform many large sorts to disk, you can set the initialization parameter SORT_DIRECT_WRITES to increase sort performance. Note that each query server will allocate its own set of direct write buffers, so the total amount of memory allocated for buffers is the number of query servers multiplied by SORT_WRITE_BUFFERS, the number of direct write buffers, multiplied by SORT_WRITE_BUFFER_SIZE. In addition to the increased memory usage, sorts that use direct writes will tend to consume more temporary segment space on disk.

Tuning the Query Servers

The V$PQ_SYSSTAT view contains statistics that are useful for determining the appropriate number of query server processes for an instance. The statistics that are particularly useful are "Servers Busy", "Servers Idle", "Servers Started", and "Servers Shutdown".

Frequently, you will not be able to increase the maximum number of query servers for an instance because the maximum number is heavily dependent upon the capacity of your CPUs and your I/O bandwidth. However, if servers are continuously starting and shutting down, you should consider increasing the value of the parameter PARALLEL_MIN_SERVERS.

For example, if you have determined that the maximum number of concurrent query servers that your machine can manage is 100, you should set PARALLEL_MAX_SERVERS to 100. Next determine how many query servers the average query needs, and how many queries are likely to be executed concurrently. For this example, let's assume you'll have two concurrent queries with 20 as the average degree of parallelism. Thus, at any given point in time, there could be 80 query servers busy on an instance. Thus you should set the parameter PARALLEL_MIN_SERVERS to be 80.

Now you should periodically examine V$PQ_SYSSTAT to determine if the 80 query servers for the instance are actually busy. To determine if the instance's query servers are active, issue the following query:

SELECT * FROM V$PQ_SYSSTAT 
	WHERE statistic = "Servers Busy";

STATISTIC             VALUE
--------------------- -----------
Servers Busy          70

If you find that there are typically fewer than PARALLEL_MIN_SERVERS busy at any given time, your idle query servers are additional system overhead that is not being used. You should then consider decreasing the value of the parameter PARALLEL_MIN_SERVERS. If you find that there are typically more query servers active than the value of PARALLEL_MIN_SERVERS and the "Servers Started" statistic is continuously growing, then you should consider increasing the value of the parameter PARALLEL_MIN_SERVERS.

EXPLAIN PLAN

The EXPLAIN PLAN facility details a query's execution plan and how a query is parallelized. See Appendix A, "Performance Diagnostic Tools", for a detailed description of the EXPLAIN PLAN facility.

The OBJECT_NODE and OTHER columns of the PLAN_TABLE describe the parallelism (if any) of a given query. For non-distributed queries, the OBJECT_NODE column describes the order in which the output from operations is consumed. The OTHER column describes the text of the query that is used by query servers for each operation. For example, in the following query:

EXPLAIN PLAN FOR
	SELECT dname, MAX(sal), AVG(sal)
		FROM emp, dept
		WHERE emp.deptno = dept.deptno
		GROUP BY dname;

the contents of the plan table may look something like this:

QUERY PLAN            OBJECT_NODE   OTHER
--------------------- ------------- --------------------------
GROUP BY SORT         :Q704003      select c0, avg(c1), max(c1) from
                                       :Q704002 group by c0
    MERGE JOIN        :Q704002      select /*+ ordered use_merge(a2) */
                                       a2.c1 c0, a1.c1 c1 from
                                       :Q704001 a1, :Q704000 a2
                                       where a1.c0 = a2.c0
      JOIN SORT       :Q704002     operation combined with parent
         FULL SCAN    :Q704001     select /*+ rowid(a1) */ a1.deptno c0,
                                      a1.sal c1 where rowid between :1
                                      and :2
      JOIN SORT       :Q704002     operation combined with parent
         FULL SCAN    :Q704000     output consumed in parallel

From this output you can construct the data flow diagram for the query. Figure 6 - 6 illustrates the data flow diagram.

Figure 6 - 6. Data Flow Diagram with OBJECT_NODE Information

Combining Operations within a Tree

The EXPLAIN PLAN output for the preceding example join query describes that the MERGE JOIN operation also requires a JOIN SORT operation on both tables involved in the join. Because the MERGE JOIN operation requires the information from both JOIN SORT operations, the MERGE JOIN is considered the parent operation of the JOIN SORT. The query coordinator process compares the partitioning requirement of each operator with that of its parent.

If they are compatible, the operators can be combined and performed together as a single operator.

If they are incompatible, the query coordinator repartitions the rows returned by the operator so that the rows meet the partitioning requirements of its parent operator. The rows are then sent from the operator to its parent.

In the example above, the two JOIN SORT operations are compatible with their parent operation and therefore are combined into the MERGE JOIN. All other operations require communication of the results to the parent operation.