Oracle7 Server Tuning

Contents Index Home Previous Next

Parallelizing SQL Statements

When a statement is parsed, the optimizer determines the execution plan of a statement. Optimization is discussed in Chapter 5, "The Optimizer". After the optimizer determines the execution plan of a statement, the query coordinator process determines the parallelization method of the statement. Parallelization is the process by which the query coordinator determines which operations can be performed in parallel and then enlists query server processes to execute the statement. This section tells you how the query coordinator process decides to parallelize a statement and how the user can specify how many query server processes can be assigned to each operation in an execution plan (that is, the degree of parallelism).

To decide how to parallelize a statement, the query coordinator process must decide whether to enlist query server processes and, if so, how many query server processes to enlist. When making these decisions, the query coordinator uses information specified in hints of a query, the table's definition, and initialization parameters. The precedence for selecting the degree of parallelism is described later in this section. It is important to note that the optimizer attempts to parallelize a query only if it contains at least one full table scan operation.

Each query undergoes an optimization and parallelization process when it is parsed. Therefore, when the data changes, if a more optimal execution plan or parallelization plan becomes available, Oracle can automatically adapt to the new situation.

In the case of creating a table in parallel, the subquery in the CREATE TABLE statement is parallelized and the actual population of the table is parallelized, as well as any enforcement of NOT NULL or CHECK constraints.

Parallelizing Operations

Before enlisting query server processes, the query coordinator process examines the operations in the execution plan to determine whether the individual operations can be parallelized. The Oracle Server can parallelize these operations:

For a description of these and all operations, see Appendix A, "Performance Diagnostic Tools".

Partitioning Rows to Each Query Server

The query coordinator process also examines the partitioning requirements of each operation. An operation's partitioning requirement is the way in which the rows operated on by the operation must be divided, or partitioned, among the query server processes. The partitioning can be any of the following:

After determining the partitioning requirement for each operation in the execution plan, the query coordinator determines the order in which the operations must be performed. With this information, the query coordinator determines the data flow of the statement. Figure 6 - 4 illustrates the data flow of the following query:

SELECT dname, MAX(sal), AVG(sal)
	FROM emp, dept
		WHERE emp.deptno = dept.deptno
		GROUP BY dname;

Figure 6 - 4. Data Flow Diagram for a Join of the EMP and DEPT Tables

Operations that require the output of other operations are known as parent operations. In Figure 6 - 4 the GROUP BY SORT operation is the parent of the MERGE JOIN operation because GROUP BY SORT requires the MERGE JOIN output.

Parallelism Between Operations

Parent operations can begin processing rows as soon as the child operations have produced rows for the parent operation to consume. In the previous example, while the query servers are producing rows in the FULL SCAN DEPT operation, another set of query servers can begin to perform the MERGE JOIN operation to consume the rows. When the FULL SCAN DEPT operation is complete, the FULL SCAN EMP operation can begin to produce rows.

Each of the two operations performed concurrently is given its own set of query server processes. Therefore, both query operations and the data flow tree itself have degrees of parallelism. The degree of parallelism of an individual operation is called intra-operation parallelism and the degree of parallelism between operations in a data flow tree is called inter-operation parallelism.

Due to the producer/consumer nature of the Oracle Server's query operations, only two operations in a given tree need to be performed simultaneously to minimize execution time.

To illustrate intra-operation parallelism and inter-operator parallelism, consider the following statement:

SELECT * FROM emp ORDER BY ename;

The execution plan consists of a full scan of the EMP table followed by a sorting of the retrieved rows based on the value of the ENAME column. For the sake of this example, assume the ENAME column is not indexed. Also assume that the degree of parallelism for the query is set to four, which means that four query servers can be active for any given operation. Figure 6 - 5 illustrates the parallel execution of our example query.

Figure 6 - 5. Inter-Operator Parallelism and Dynamic Partitioning

As you can see from Figure 6 - 5, there are actually eight query servers involved in the query even though the degree of parallelism is four. This is because a parent and child operator can be performed at the same time. Also note that all of the query servers involved in the scan operation send rows to the appropriate query server performing the sort operation. If a row scanned by a query server contains a value for the ENAME column between A and G, that row gets sent to the first ORDER BY query server. When the scan operation is complete, the sorting query servers can return the sorted results to the query coordinator, which in turn returns the complete query results to the user.

Note: When a set of query servers completes its operation, it moves on to operations higher in the data flow. For example, in the previous diagram, if there was another ORDER BY operation after the ORDER BY, the query servers performing the table scan perform the second ORDER BY operation after completing the table scan.


Contents Index Home Previous Next