Additional Information: See your operating system-specific Oracle documentation for more information about tuning while using the parallel query option.
It is important to realize that Oracle cannot return results to a user process in parallel. Therefore, if a query returns a large number of rows, the execution of the query will be faster, but the user process receives the rows serially. The parallel query option is best for queries that involve computing summary information for a large amount of data.
Automatic file striping by the operating system is nearly as good as manually striping your tables across datafiles, so take advantage of automatic file striping if supported in your operating system. Operating system striping is usually easy to configure and provides automatic I/O balancing. Operating system striping is also optimal for random I/O operations such as sorts using temporary segments. Manual table striping is optimal for sequential I/O and may improve the performance for table scans.
File stripe size should be a few multiples larger than the size of DB_FILE_MULTIBLOCK_READ_COUNT blocks (the amount of blocks that Oracle attempts to read with a single I/O).
Additional Information: For a discussion of manually striping tables across datafiles, refer to "Striping Table Data" . For more information about automatic file striping and tools to use to determine I/O distribution among your devices, refer to your operating system documentation.
It is also important to note that large contiguous extents can help the query coordinator break up scan operations more efficiently for the query servers. During a scan operation, the query coordinator identifies all of the contiguous ranges of blocks and then breaks up these ranges into large, medium, and small groups of blocks. Each query server is given a large group of contiguous blocks to begin with and then is given successively smaller groups of blocks until the scan is completed. This ensures that all of the query servers complete the scan at approximately the same time. If there are several large extents in a table, the query coordinator can easily find groups of contiguous blocks to dispatch to the query servers.
The temporary tablespace used for sorting should also be striped across several disks (by the operating system, if possible). This tablespace should also contain several large extents.
Additional Information: To determine whether your system is being fully utilized, there are several graphical system monitors available on most operating systems. These monitors often give you a better idea of CPU utilization and system performance than monitoring the execution time of a query. Consult your operating system documentation to determine if your system supports graphical system monitors.
For single-user parallel execution, one guideline for your initial estimate of the degree of parallelism is to use one or two times the number of CPUs on your system (depending on the capacity of your CPUs). This is a good guess for sort-intensive operations because those tend to be CPU-bound. If your query is I/O intensive (that is, more scanning of tables than sorting), you may want to use one or two times the number of disk drives involved in the scans as your initial guess at the degree of parallelism.
If you have several concurrent queries that all require parallel execution, you should consider adding CPUs and/or disk drives to your system to allow for a greater number of query servers per query. If you cannot expand your system to accommodate multiple users, you may need to consider limiting parallel query execution to fewer users or lowering the degree of parallelism for each user's query.
You also want to avoid fragmentation by having as many files in your tablespace as the degree of parallelism for index or table creation. Each query server allocates an extent in a file when building a table or index. If more than one query server uses the same file, there may be gaps where one query server did not fill an extent completely. Try to have the degree of parallelism for parallel index and table creation come as close as possible to the number of files in the tablespace.
If memory is abundant on your system, you can benefit from setting SORT_AREA_SIZE to a large value. However, if memory is a concern for your system, you may want to limit the amount of memory allocated for sorts and increase the size of the buffer cache so that data blocks from temporary sort segments can be cached in the buffer cache.
Frequently, you will not be able to increase the maximum number of query servers for an instance because the maximum number is heavily dependent upon the capacity of your CPUs and your I/O bandwidth. However, if servers are continuously starting and shutting down, you should consider increasing the value of the parameter PARALLEL_MIN_SERVERS.
For example, if you have determined that the maximum number of concurrent query servers that your machine can manage is 100, you should set PARALLEL_MAX_SERVERS to 100. Next determine how many query servers the average query needs, and how many queries are likely to be executed concurrently. For this example, let's assume you'll have two concurrent queries with 20 as the average degree of parallelism. Thus, at any given point in time, there could be 80 query servers busy on an instance. Thus you should set the parameter PARALLEL_MIN_SERVERS to be 80.
Now you should periodically examine V$PQ_SYSSTAT to determine if the 80 query servers for the instance are actually busy. To determine if the instance's query servers are active, issue the following query:
SELECT * FROM V$PQ_SYSSTAT WHERE statistic = "Servers Busy";
STATISTIC VALUE --------------------- ----------- Servers Busy 70
If you find that there are typically fewer than PARALLEL_MIN_SERVERS busy at any given time, your idle query servers are additional system overhead that is not being used. You should then consider decreasing the value of the parameter PARALLEL_MIN_SERVERS. If you find that there are typically more query servers active than the value of PARALLEL_MIN_SERVERS and the "Servers Started" statistic is continuously growing, then you should consider increasing the value of the parameter PARALLEL_MIN_SERVERS.
The OBJECT_NODE and OTHER columns of the PLAN_TABLE describe the parallelism (if any) of a given query. For non-distributed queries, the OBJECT_NODE column describes the order in which the output from operations is consumed. The OTHER column describes the text of the query that is used by query servers for each operation. For example, in the following query:
EXPLAIN PLAN FOR SELECT dname, MAX(sal), AVG(sal) FROM emp, dept WHERE emp.deptno = dept.deptno GROUP BY dname;
the contents of the plan table may look something like this:
QUERY PLAN OBJECT_NODE OTHER --------------------- ------------- -------------------------- GROUP BY SORT :Q704003 select c0, avg(c1), max(c1) from :Q704002 group by c0 MERGE JOIN :Q704002 select /*+ ordered use_merge(a2) */ a2.c1 c0, a1.c1 c1 from :Q704001 a1, :Q704000 a2 where a1.c0 = a2.c0 JOIN SORT :Q704002 operation combined with parent FULL SCAN :Q704001 select /*+ rowid(a1) */ a1.deptno c0, a1.sal c1 where rowid between :1 and :2 JOIN SORT :Q704002 operation combined with parent FULL SCAN :Q704000 output consumed in parallel
From this output you can construct the data flow diagram for the query. Figure 6 - 6 illustrates the data flow diagram.
Figure 6 - 6. Data Flow Diagram with OBJECT_NODE Information