Previous | Table of Contents | Next |
In the above example of count(*) with location transparency, the end user has no interest in any individual databases when servicing this request, defining it as a global transaction. The transaction manager has the responsibility to query all of the distributed INVENTORY tables and collect the counts from each table, merging them into a single result set.
In many relational databases, such as Oracle with SQL*Net, creating database links to the remote database and assigning a global synonym to the remote tables achieves transparency. Later in this chapter, all examples showing distributed database requests will use the Oracle database link approach to illustrate connecting remote databases.
A location suffix associated with a Telnet name creates database links in Oracle. At the lowest level, computers are assigned an internal address, or IP address, which gives the computer a distinct number. An example would be 150.223.90.27. The Telnet name translates into the IP (Internal Protocol) address for the computer. In the following example, london_unix translates into the IP address 143.32.142.3:1:
CREATE PUBLIC DATABASE LINK london.com CONNECT TO london_unix USING oracle_user_profile;
You can now include any tables from the London sites by qualifying the remote site name in the SQL query, as in the following:
SELECT CUSTOMER.customer_name, ORDER.order_date FROM customer@london.com, ORDER WHERE CUSTOMER.cust_number = ORDER.customer_number;
But where is the location transparency? To make the location of the CUSTOMER table transparent, the DBA assigns synonyms for the CUSTOMER table in London, giving the query the appearance of being local:
CREATE SYNONYM CUSTOMER for customer@london.com;
The query can now run with complete location transparency, since the SQL has no need to reference either the IP address of the computer or the name of the database on that computer:
SELECT CUSTOMER.customer_name, ORDER.order_date FROM CUSTOMER, ORDER WHERE CUSTOMER.cust_number = ORDER.customer_number;
Oracle stored procedures can also be defined for the remote table without any reference to its physical location. For example, this procedure can be called with the following statement:
add_customer("Burleson") CREATE PROCEDURE add_customer (cust_name char(8)) AS BEGIN INSERT INTO CUSTOMER VALUES(cust_name); END;
Many sites recognize the need to track the locations of their remote database while still providing location transparency to the users and programmers. The concept of database domains and hierarchies of physical locations are especially important in situations of horizontal partitioning where tables with identical names are kept at numerous locations. Domains establish a logical hierarchy of physical locations for the enterprise. This sample database establishes a domain hierarchy for its remote databases (see Figure 1.4).
Figure 1.4 A sample hierarchy of database domains.
The DBA at each node in the network assigns synonyms for all unique tables within the distributed network, creating abbreviated domain names for the duplicate table structures that exist at remote locations. For example, assume both Japan and Ohio have a CUSTOMER table that is identical in structure but contains different rows. Assign Oracle synonyms as follows:
CREATE SYNONYM japan_customer FOR customer@hq.sales.asia.japan; CREATE SYNONYM ohio_customer FOR customer@hq.mtfg.ohio;
The ability to identify and correct performance problems has plagued distributed database systems since their genesis. Even within the context of a single transaction, distributed query optimization can be a formidable challenge. On a single database, SQL query tuning takes place by running an SQL EXPLAIN and performing the appropriate tuning. However, when a single query is split into distributed databases, the overall query tuning becomes far more complex. When a query spans hosts, several distributed database managers are required to take the distributed query and partition it into subqueries, which are then independently optimized and run (sometimes simultaneously) on the distributed databases. The query is considered complete when the last subquery has completed successfully and results are returned to the user. This approach is sometimes dubbed the weakest link architecture. The longest running of any number of partitioned subqueries determines the overall performance of the entire query. This is true despite the execution speed of any other partitioned subqueries involved. An excellent example of this approach is Oracles parallel query facility in version 7.3.
Tuning a distributed query requires the user to consider the following:
Today, tools are available to perform load balancing, whereby a processor may borrow CPU cycles from underutilized processors to balance the query and achieve maximum throughput.
Distributed databases need to be able to address information regardless of the hardware platform or the architecture of the database, especially in volatile environments where hardware and network configurations may change frequently. The two types of interoperability that emerge with distributed databases fall into the general categories of database and hardware:
Previous | Table of Contents | Next |