Oracle7 Server Distributed Systems Volume I: Distributed Data

Contents Index Home Previous Next

Troubleshooting Distributed Update Problems

A network or system failure can cause the following types of problems:

The following sections describe these situations.

Failures that Interrupt Prepare/Commit

The user program that commits a distributed transaction is informed of a problem by one of the following error messages:

ORA-02050: transaction ID rolled back,
           some remote dbs may be in-doubt
ORA-02051: transaction ID committed,
           some remote dbs may be in-doubt
ORA-02054: transaction ID in-doubt

A robust application should save information about a transaction if it receives any of the above errors. This information can be used later if manual distributed transaction recovery is desired.

Note: The failure cases that prompt these error messages are beyond the scope of this book and are unnecessary to administer the system.

No action is required by the administrator of any node that has one or more in-doubt distributed transactions due to a network or system failure. The automatic recovery features of Oracle7 transparently complete any in-doubt transaction so that the same outcome occurs on all nodes of a session tree (that is, all commit or all roll back) once the network or system failure is resolved.

However, in extended outages, the administrator may wish to force the commit or rollback of a transaction to release any locked data. Applications must account for such possibilities.

Failures that Prevent Access of Data

When a user issues a SQL statement, Oracle7 attempts to lock the required resources to successfully execute the statement. However, if the requested data is currently being held by statements of other uncommitted transactions and continues to remained locked for an excessive amount of time, a time-out occurs. Consider the following two scenarios.

Transaction Time-Out

A DML SQL statement that requires locks on a remote database may be blocked from doing so if another transaction (distributed or non-distributed) currently own locks on the requested data. If these locks continue to block the requesting SQL statement, a time-out occurs, the statement is rolled back, and the following error message is returned to the user:

ORA-02049: time-out: distributed transaction waiting for lock

Because no data has been modified, no actions are necessary as a result of the time-out. Applications should proceed as if a deadlock has been encountered. The user who executed the statement can try to re-execute the statement later. If the lock persists, the user should contact an administrator to report the problem.

The timeout interval in the above situation can be controlled with the initialization parameter DISTRIBUTED_LOCK_TIMEOUT. This interval is in seconds. For example, to set the time-out interval for an instance to 30 seconds, include the following line in the associated parameter file:

DISTRIBUTED_LOCK_TIMEOUT=30

With the above time-out interval, the time-out errors discussed in the previous section occur if a transaction cannot proceed after 30 seconds of waiting for unavailable resources.

Additional Information: For more information about initialization parameters and editing parameter files, see the Oracle7 Server Reference.

Lock From In-Doubt Transaction

A query or DML statement that requires locks on a local database may be blocked from doing so indefinitely due to the locked resources of an in-doubt distributed transaction. In this case, the following error message is immediately returned to the user:

ORA-01591: lock held by in-doubt distributed transaction ID

In this case, the SQL statement is rolled back immediately. The user who executed the statement can try to re-execute the statement later. If the lock persists, the user should contact an administrator to report the problem, including the ID of the in-doubt distributed transaction.

The chances of the above situations occurring are very rare, considering the low probability of failures during the critical portions of the prepare/commit phases. Even if such a failure occurs and assuming quick recovery from a network or system failure, problems are automatically resolved without manual intervention. Thus problems usually resolve before they can be detected by users or database administrators.


Contents Index Home Previous Next