Oracle7 Server Distributed Systems Volume II: Replicated Data

Replication Models

With symmetric replication, applications can be built that employ both standard primary site replication techniques and advanced replication techniques.

Basic Replication

There are a variety of usage scenarios that can be implemented using read-only snapshots, or basic replication. Some examples are described below. Each of these scenarios describes a form of primary site data replication.

With primary site replication, each piece of information is owned by one site, and this ownership never changes. Other sites "subscribe" to the data owned by the primary site, which means that they have access to read-only copies of the replicated data. With primary site replication, you never need to worry about any discrepancies between data that is being updated at two different locations.

Information Offloading

Because snapshots can provide a local copy of your data, they can be accessed faster than remote data. For example, if your order entry site requires fast online transaction processing (OLTP) capability for order entry against the inventory table, and the marketing/decision support site requires lengthy queries against the same table, you can use information offloading. Information offloading, in this example, would provide the decision support site with a separate and full copy of the inventory table as a read-only snapshot for local analysis (see Figure 2 - 1).

Figure 2 - 1. Information Offloading

Information Distribution

In the example above, the read-only replica is a full copy of the original table. You can also choose to replicate only selected portions of the table at each site. For example, suppose that a central copy of all of your customer information is maintained at your headquarters in New York.

Portions of this information might be used by your sales offices around the world. Instead of replicating the entire table at each sales office, you need only replicate the portions appropriate for that region, as shown in Figure 2 - 2.
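
The idea of distributing only the rows appropriate to each region can be modeled outside the database. The following Python code is a minimal sketch, not Oracle functionality: the row layout, the sample data, and the function name build_regional_snapshot are assumptions made for illustration.

```python
from types import MappingProxyType

# Hypothetical master rows held at headquarters; the column names are assumptions.
MASTER_CUSTOMERS = [
    {"id": 1, "name": "Acme", "region": "North East"},
    {"id": 2, "name": "Globex", "region": "Europe"},
    {"id": 3, "name": "Initech", "region": "North East"},
]


def build_regional_snapshot(master_rows, region):
    """Return read-only copies of the master rows that belong to one region."""
    # Each row is copied and wrapped in a read-only view, so the sales office
    # can query its subset but cannot change it or the master rows.
    return tuple(
        MappingProxyType(dict(row))
        for row in master_rows
        if row["region"] == region
    )


north_east = build_regional_snapshot(MASTER_CUSTOMERS, "North East")
print([row["id"] for row in north_east])  # prints [1, 3]
```

Each sales office receives only its own subset, and an attempt to assign to a column of a snapshot row raises a TypeError, which mirrors the read-only nature of the replica.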

Advanced Primary Site Models

Read-only snapshots support a primary site ownership model only. Other replication technologies can support this model by restricting updates at the application level to a single site.

For performance reasons, however, you might want to allow local updates of the data, thus allowing multiple sites to have access to a single table. You can still successfully avoid conflicts by implementing an advanced form of primary site ownership.

Instead of designating one site as the owner of the entire table, each site is allowed to "own" a distinct portion of this table. That is, each site would be allowed to modify only a subset of the rows or columns in each table. You might think of this as allowing each site to own a distinct horizontal or vertical partition of the data in a single table.

Ownership can be enforced either by your application or by a combination of triggers, views, procedures, and horizontally partitioned updatable snapshots.

Horizontal Partitioning

For example, you could implement a distributed order entry system such that each order entry site in each sales office owned distinct horizontal partitions of tables (such as CUSTOMERS, ORDERS, and ITEMS) that contain the orders and customer information for the customers serviced by that office. Your central headquarters site could then maintain a read-only view of the master table containing all orders and customer information across all sales offices.

The CREATE statement for a snapshot of your CUSTOMERS table might look like this:

CREATE SNAPSHOT customers FOR UPDATE AS 
	SELECT * FROM customers@hq.com WHERE region = 'North East';
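
If you choose to enforce ownership in your application rather than in the database, the check might resemble the following Python code. This is a minimal sketch under stated assumptions: the region column as the static partitioning key, the OwnershipError class, and the apply_local_update function are all hypothetical names, not part of any Oracle interface.

```python
class OwnershipError(Exception):
    """Raised when a site tries to change a row it does not own."""


def apply_local_update(site_region, row, changes):
    """Return an updated copy of row if the updating site owns its partition."""
    if row["region"] != site_region:
        raise OwnershipError(
            f"site for {site_region!r} does not own rows in {row['region']!r}"
        )
    # The partitioning column is treated as static here, so a site cannot
    # hand a row to another partition by rewriting its region.
    if changes.get("region", row["region"]) != row["region"]:
        raise OwnershipError("the partitioning column cannot be changed")
    updated = dict(row)
    updated.update(changes)
    return updated


customer = {"id": 7, "name": "Acme", "region": "North East"}
renamed = apply_local_update("North East", customer, {"name": "Acme Corp"})
print(renamed["name"])  # prints Acme Corp
```

Because every row belongs to exactly one region, two sites can never update the same row, so no update conflicts can arise.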

Vertical Partitioning

You can further subdivide the ownership of a table by allowing a site to modify only selected columns of a given row. For example, suppose you have a stock table. You might want to allow different regional sites to update the AMOUNT_AVAILABLE column for different items (rows), but only allow someone from your headquarters site to update the item description columns for every row.

Note: Ownership of vertical partitions requires the use of column groups. Column groups are described elsewhere in this manual.
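
The stock table example combines row ownership with column ownership. The following Python code is a minimal sketch of that rule only; it does not model Oracle column groups. The site names, the owning_site column, and the may_update function are assumptions made for illustration.

```python
HEADQUARTERS = "hq"  # hypothetical name for the headquarters site

# Hypothetical split of the stock table's columns into two vertical partitions.
HQ_COLUMNS = {"description"}
REGIONAL_COLUMNS = {"amount_available"}


def may_update(site, row, column):
    """Return True if the given site is allowed to change this column of this row."""
    if column in HQ_COLUMNS:
        # Description columns belong to headquarters for every row.
        return site == HEADQUARTERS
    if column in REGIONAL_COLUMNS:
        # Quantity columns belong to the regional site that owns the item.
        return site == row["owning_site"]
    # Columns outside both partitions are treated as not updatable.
    return False


item = {"item_id": 10, "description": "Widget",
        "amount_available": 25, "owning_site": "boston"}
print(may_update("boston", item, "amount_available"))  # prints True
print(may_update("boston", item, "description"))       # prints False
```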

Dynamic Ownership

In addition to basing ownership on a static column as shown in the primary site ownership examples, you can also base ownership on a field that can be updated. This would result in a form of dynamic ownership of data. With dynamic ownership, the ability or right to update replicated data moves from site to site while ensuring that at any given point in time only one site may update the data.

Work Flow

One form of dynamic ownership is work flow. Work flow is a simple form of exclusive ownership commonly used by business applications. To implement a work flow model of conflict avoidance, your application must guarantee the following:

The control of ownership is ordered.

Each site can only update data that it owns; that is, only rows with a given status are updatable.

Each site must push the ownership to the next site by updating the status to the next state.

For example, within an order processing system, the processing of orders typically follows a well-ordered series of steps such as: entered, approved, shipped, billed, collected, and accounted for. Sophisticated centralized systems allow the application modules that perform these steps to act on the same data contained in one integrated database.

Each application module acts on an order, that is, performs updates to the order data, when the state of the order indicates that the previous processing steps have been completed. For example, the application module that ships an order will do so only after the order has been entered and approved.

By employing a dynamic ownership replication technique, such a system can be distributed across multiple sites and databases. Application modules can reside on different systems. For example, order entry and approval can be performed on one system, shipping on another, billing on another, and so on. Order data is replicated to a site when its state indicates that it is ready for the processing step performed by that site. Data may also be replicated to sites that need read-only access to the data. For example, order entry sites may wish to monitor the progression of processing steps for the orders they enter.
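
The three work flow guarantees (ordered control, updates only by the current owner, and ownership pushed by changing the status) can be sketched as a small state machine. The following Python code is an illustration under stated assumptions: the state names come from the order processing example above, but the site names, the mapping of states to sites, and the advance function are hypothetical.

```python
# Ordered series of states taken from the order processing example.
WORKFLOW = ["entered", "approved", "shipped", "billed", "collected", "accounted for"]

# Hypothetical assignment: the site that owns an order while it is in a given
# state. The final state has no owner because no further step follows it.
OWNER_OF_STATE = {
    "entered": "order_entry",   # the order entry system also approves orders
    "approved": "shipping",
    "shipped": "billing",
    "billed": "collections",
    "collected": "accounting",
}


def advance(order, site):
    """Return a copy of the order pushed to the next state by its current owner."""
    status = order["status"]
    owner = OWNER_OF_STATE.get(status)
    if owner is None:
        raise ValueError(f"order {order['id']} is already in its final state")
    if owner != site:
        raise PermissionError(f"{site} does not own orders with status {status!r}")
    # Updating the status is what hands ownership to the next site.
    next_status = WORKFLOW[WORKFLOW.index(status) + 1]
    return {**order, "status": next_status}


order = {"id": 501, "status": "entered"}
order = advance(order, "order_entry")
order = advance(order, "shipping")
print(order["status"])  # prints shipped
```

At any moment exactly one site is permitted to update the order, so conflicts are avoided even though ownership moves from site to site.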

Shared Ownership

All of the usage methodologies described thus far, that is, primary site ownership and dynamic ownership, share a common property: at any given point in time, only one site may update the data while the other sites have read-only access to replicated copies of the data.

In some situations, however, it is desirable to allow multiple sites to update the same data, potentially at the same time. For example, it may be desirable to replicate customer data across multiple sites and systems rather than maintaining customer data centrally or maintaining it separately and redundantly within each system. Different sites, though, may need to update this data.

Update Conflicts

Suppose that you replicate customer data across sales office order entry sites and headquarters sites. One element of the customer data is the customer address. What happens if a customer's address is changed at both a sales office and a headquarters site at the same time?

This occurrence is known as an update conflict. Replicated data has become inconsistent because the replicated data was updated at multiple sites. If you cannot tolerate such inconsistencies, you must either carefully partition ownership of your data or only allow for synchronous propagation of changes between sites. If all sites in your replicated environment are propagating changes to one another synchronously, update conflicts cannot occur. If, however, you have even one site sending or receiving changes asynchronously (for example, if you have an updatable snapshot site), you have the potential for conflicts. For some applications, these temporary inconsistencies can be permitted as long as they can be detected and resolved to ensure that over time the replicated data converges to a consistent state at all sites.
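
One way to picture conflict detection is to compare the old values that accompany an incoming change with the values currently stored at the receiving site. The following Python code is a minimal sketch of that idea. The shape of the change record and the detect_update_conflict function are assumptions made for illustration, and the sketch should not be read as a description of Oracle's internal mechanism.

```python
def detect_update_conflict(change, local_row):
    """Return the columns whose local value differs from the sender's old value."""
    # If the receiving site still holds the value the sender started from,
    # the change can be applied cleanly. Any difference means that the row
    # was also updated locally, which is an update conflict.
    return sorted(
        column
        for column, old_value in change["old"].items()
        if local_row[column] != old_value
    )


# The sales office changed the address from "1 Main St" to "2 Oak Ave" ...
incoming_change = {
    "site": "sales_office",
    "old": {"address": "1 Main St"},
    "new": {"address": "2 Oak Ave"},
}
# ... while headquarters had already changed its own copy to "9 Elm Rd".
hq_row = {"customer_id": 42, "address": "9 Elm Rd"}

conflicts = detect_update_conflict(incoming_change, hq_row)
print(conflicts)  # prints ['address']
```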

Update Conflict Detection, Notification and Resolution

The symmetric replication facility supports update conflict detection, notification, and resolution. For example, in the scenario described previously, Oracle detects that the update conflict on the customer's address has occurred and automatically invokes an application-specific conflict resolution routine to restore the replicated data to a consistent state. Oracle can also invoke a notification routine to send an alert that a conflict has occurred.

The symmetric replication facility provides a number of standard resolution routines from which the application developer can select. Standard resolution routines include: selecting the most recent update as determined by timestamp, commutative resolution of additive updates, applying the change from the site with the highest priority value, and min/max selection of updates. Alternatively, for more specialized cases, application developers can write their own routines.

In the scenario above, a routine that uses timestamps to determine the most recent update can be employed so that the customer's address converges to the most recent update of the address at all sites. Update conflicts on the address will be automatically detected and immediately resolved at each site by selecting the most recent of the updates.
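
The logic behind the standard resolution strategies named above can be sketched in a few lines. The following Python code illustrates the ideas only; it is not the interface of the symmetric replication facility. The record layout, the integer timestamps, the site names, and all function names are assumptions. The timestamp routine breaks ties by site name, an added assumption that keeps both sites choosing the same winner.

```python
def resolve_latest_timestamp(local, incoming):
    """Keep the version with the most recent timestamp (ties broken by site name)."""
    def key(version):
        return (version["timestamp"], version["site"])
    return incoming if key(incoming) > key(local) else local


def resolve_additive(local_value, incoming_old, incoming_new):
    """Apply the sender's net change on top of the local value."""
    # Additions commute, so the order in which sites apply them does not matter.
    return local_value + (incoming_new - incoming_old)


def resolve_site_priority(local, incoming, priority):
    """Keep the version from the site with the higher priority value."""
    # Assumes every site has a distinct priority.
    return incoming if priority[incoming["site"]] > priority[local["site"]] else local


def resolve_maximum(local_value, incoming_value):
    """Keep the larger of the two values (use min for the opposite rule)."""
    return max(local_value, incoming_value)


sales = {"site": "sales_office", "timestamp": 105, "address": "2 Oak Ave"}
hq = {"site": "hq", "timestamp": 110, "address": "9 Elm Rd"}

# Both sites pick the same winner regardless of which copy is "local".
print(resolve_latest_timestamp(sales, hq)["address"])  # prints 9 Elm Rd
print(resolve_latest_timestamp(hq, sales)["address"])  # prints 9 Elm Rd
```

Whatever routine is chosen, it must produce the same result at every site. Otherwise the replicated data will not converge.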

Sophisticated Uses of Shared Ownership

Shared ownership allows symmetric replication to be employed where primary site ownership and dynamic ownership methodologies would be too restrictive. As such, in those cases where temporary inconsistencies can be permitted and conflict resolution routines devised, it can offer greater flexibility.

For example, an earlier discussion described how a distributed order entry system could be implemented using primary site replication techniques with horizontal partitioning.

In this scenario each sales office owned a distinct horizontal partition of the tables containing orders and customer information for the customers serviced by that office. Each sales office entered orders for its customers, but no others.

For some businesses, though, this is not the model. For example, a retail chain may have several stores in a metropolitan area. Customers may frequent the store closest to where they live, but they will also go into other stores, and these other stores will want to take their orders when they do. If multiple stores perform updates to the same customer and order data, as illustrated in Figure 2 - 4, update conflicts could occur. Sophisticated application developers can identify these conflicts and either select standard resolution routines or devise their own to implement such systems.