Previous | Table of Contents | Next |
To speed up the replication of two tables, a Unix shell script directs CPU-A to begin the copy of Table A as a background task. The script then directs CPU-B to issue a remote mount to Table B, making Table B addressable as if it were a local disk to CPU-B. The script then issues a copy of Table B, and the tables are copied simultaneously, reducing the overall processing time. (See Figure 7.6.)
Figure 7.6 Linear versus parallel processing.
Of course, the overall elapsed time will not be half of the time required for a linear processthe remote mount still requires the database on CPU-A to manage the I/O against Table B. The benefit lies in having the second processor (CPU-B) handle all of the processing for the unload of Table B.
SMP, or symmetrical multiprocessing, describes an architecture where many CPUs share a common memory area and I/O buffer. This type of architecture is not scalable, as additional processors must compete for the shared memory and I/O resources. On the other hand, massively parallel processors (MPP) describes an architecture where many independent processors share nothing, operating via a common I/O bus. An MPP system can add processors without impeding performance, and performance will actually increase as processors are added.
With Oracle version 7.2, some powerful new features have been introduced to allow parallel processes to be used against the Oracle database. These features include:
Please note that the new features of Oracle 7.2 and Oracle 7.3 will not be activated unless the following init.ora parameter has been used:
COMPATIBILITY=7.3.0.0.0
Also, note that it is not necessary to have parallel processors (SMP or MPP) in order to use and benefit from parallel processing. Even on the same processor, multiple processes can be used to speed up queries. Oracle parallel query option can be used with any SQL SELECT statementthe only restriction being that the query performs a full-table scan on the target table.
Parallel queries are most useful in distributed databases where a single logical table has been partitioned into smaller tables at each remote node. For example, a customer table that is ordered by customer name may be partitioned into a customer table at each remote database, such that we have a phoenix_customer, a los_angeles_customer, and so on. This approach is very common with distributed databases where local autonomy of processing is important. However, what about the needs of those in corporate headquarters? How can they query all of these remote tables as a single unit and treat the logical customer table as a single entity?
While this splitting of a table according to a key value violates normalization theory, it can dramatically improve performance for individual queries. For large queries that may span many logical tables, the isolated tables can be easily reassembled using Oracles parallel query facility:
CREATE VIEW all_customer AS SELECT * FROM phoenix_customer@phoenix UNION ALL SELECT * FROM los_angeles_customer@los_angeles UNION ALL SELECT * FROM rochester_customer@rochester;
Note: The @ references refer to SQL*Net service names for the remote hosts.
We can now query the all_customer view as if it were a single database table, and Oracle parallel query will automatically recognize the UNION ALL parameter, firing off simultaneous queries against each of the three base tables. It is important to note that the distributed database manager will direct each query to be processed at the remote location, while the query manager waits until each remote node has returned its result set. For example, the following query will assemble the requested data from the three tables in parallel, with each query optimized separately. The result set from each subquery is then merged by the query manager:
SELECT customer_name FROM all_customer WHERE total_purchases > 5000;
Note: For more details on using Oracles parallel query facility, refer to Chapter 2, Physical Performance Design For Oracle Databases.
One of the biggest problems in performance and tuning is the problem of planning for growth and ensuring that your distributed database continues to perform at an acceptable level. As we know, databases run within a well-defined domain of systems resources, and a shortage of these system resources can lead to performance degradation. The trick is to design a database system with the ability to add resources on an as-needed basis without interrupting processing.
Most databases have other high-impact resources, such as the recovery logs and transaction logs. Most systems have more than one recovery log. It is a good idea to locate these logs on different disk devices, as well as on devices that do not have any other high-impact data tables.
Growth of a database can occur in several areas. As the physical size of the database increases, so does the need for disk storage. As the volume of users increases, so does the need for increased buffer and lock pool storage. As network traffic increases, an increasing demand falls on the routers and bandwidth may need to be increased.
Unlike the CODASYL databases of the 1980s, todays relational databases allow for tables to grow according to specified rules and procedures. In the relational model, one or more tables may reside in a tablespace. A tablespace is a predefined container for the tables that map to fixed files of a finite size. Tables that are assigned to the tablespace may grow according to the growth rules that are specified, but the size of the tablespace supersedes the expansion rules. In other words, a table may, according to the table definition, have more extents available, but there may not be room in the tablespace to allocate those extents.
Several allocation parameters influence table growth:
Previous | Table of Contents | Next |