Previous Table of Contents Next


In the above example of count(*) with location transparency, the end user has no interest in any individual databases when servicing this request, defining it as a global transaction. The transaction manager has the responsibility to query all of the distributed INVENTORY tables and collect the counts from each table, merging them into a single result set.

In many relational databases, such as Oracle with SQL*Net, creating “database links” to the remote database and assigning a global synonym to the remote tables achieves transparency. Later in this chapter, all examples showing distributed database requests will use the Oracle database link approach to illustrate connecting remote databases.

A location suffix associated with a Telnet name creates database links in Oracle. At the lowest level, computers are assigned an internal address, or IP address, which gives the computer a distinct number. An example would be 150.223.90.27. The Telnet name translates into the IP (Internal Protocol) address for the computer. In the following example, london_unix translates into the IP address 143.32.142.3:1:

CREATE PUBLIC DATABASE LINK london.com
     CONNECT TO london_unix USING oracle_user_profile;

You can now include any tables from the London sites by qualifying the remote site name in the SQL query, as in the following:

SELECT     CUSTOMER.customer_name,
           ORDER.order_date
     FROM  customer@london.com,
           ORDER
     WHERE
     CUSTOMER.cust_number = ORDER.customer_number;

But where is the location transparency? To make the location of the CUSTOMER table transparent, the DBA assigns synonyms for the CUSTOMER table in London, giving the query the appearance of being local:

CREATE SYNONYM CUSTOMER for customer@london.com;

The query can now run with complete location transparency, since the SQL has no need to reference either the IP address of the computer or the name of the database on that computer:

SELECT      CUSTOMER.customer_name,
            ORDER.order_date
     FROM   CUSTOMER,
            ORDER
     WHERE
     CUSTOMER.cust_number = ORDER.customer_number;

Oracle stored procedures can also be defined for the remote table without any reference to its physical location. For example, this procedure can be called with the following statement:

add_customer("Burleson")
CREATE PROCEDURE add_customer (cust_name char(8)) AS
   BEGIN
      INSERT INTO CUSTOMER VALUES(cust_name);
   END;

Many sites recognize the need to track the locations of their remote database while still providing location transparency to the users and programmers. The concept of database domains and hierarchies of physical locations are especially important in situations of horizontal partitioning where tables with identical names are kept at numerous locations. Domains establish a logical hierarchy of physical locations for the enterprise. This sample database establishes a domain hierarchy for its remote databases (see Figure 1.4).


Figure 1.4  A sample hierarchy of database domains.

The DBA at each node in the network assigns synonyms for all unique tables within the distributed network, creating abbreviated domain names for the duplicate table structures that exist at remote locations. For example, assume both Japan and Ohio have a CUSTOMER table that is identical in structure but contains different rows. Assign Oracle synonyms as follows:

CREATE SYNONYM japan_customer FOR     customer@hq.sales.asia.japan;
CREATE SYNONYM ohio_customer  FOR     customer@hq.mtfg.ohio;

Distributed Database Performance And Tuning Methods

The ability to identify and correct performance problems has plagued distributed database systems since their genesis. Even within the context of a single transaction, distributed query optimization can be a formidable challenge. On a single database, SQL query tuning takes place by running an SQL EXPLAIN and performing the appropriate tuning. However, when a single query is split into distributed databases, the overall query tuning becomes far more complex. When a query spans hosts, several distributed database managers are required to take the distributed query and partition it into subqueries, which are then independently optimized and run (sometimes simultaneously) on the distributed databases. The query is considered complete when the last subquery has completed successfully and results are returned to the user. This approach is sometimes dubbed “the weakest link” architecture. The longest running of any number of partitioned subqueries determines the overall performance of the entire query. This is true despite the execution speed of any other partitioned subqueries involved. An excellent example of this approach is Oracle’s parallel query facility in version 7.3.

Tuning a distributed query requires the user to consider the following:

  The physical location of the database
  The availability of multiple CPUs

Today, tools are available to perform load balancing, whereby a processor may borrow CPU cycles from underutilized processors to balance the query and achieve maximum throughput.

Distributed databases need to be able to address information regardless of the hardware platform or the architecture of the database, especially in volatile environments where hardware and network configurations may change frequently. The two types of interoperability that emerge with distributed databases fall into the general categories of database and hardware:

  Database Interoperability—The ability of a database to function autonomously, allowing the distributed database to access numerous types of databases within the domain of a unified environment. The UniFace and PowerBuilder tools attempt to serve this market, automatically providing mechanisms for subtasking database queries and merging result sets. Chapter 2, Physical Performance Design For Oracle Databases, discusses how Oracle parallel query techniques partition a single Oracle query into multiple processes.
  Hardware Interoperability—The ability of the distributed system to address resources at many locations on an as-needed basis. At the hardware level, a single subquery of a distributed query runs on numerous processors, and load balancing tools assign multiple processors to a single database.


Previous Table of Contents Next