

For uploads, the laptop tracks all changes since the last upload, first extracting and then transferring them to the centralized server. The synchronization manager then inserts new information and carefully checks the date-time stamp on all rows that are marked for update. If a row has been updated since the laptop user’s last synchronization, the data is handled according to the programmed rules in the synchronization manager. For most implementations, the safest method is to reject all potential anomalies, which triggers an exception report and requires manual resolution of the update. In practice, one of the largest problems is motivating mobile laptop users to dial in frequently for their updates.
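The reject-all-anomalies rule amounts to a timestamp check at the server. The following is a minimal sketch, not any particular vendor’s synchronization manager; the row layout and function names are assumptions made for the example.

```python
from datetime import datetime

def apply_upload(row_key, new_values, client_sync_time, server_rows, exceptions):
    """Apply one uploaded change, rejecting any row that was modified
    after the laptop user's last synchronization (the safest rule:
    reject all potential anomalies and report them for manual review)."""
    row = server_rows.get(row_key)
    if row is not None and row["updated_at"] > client_sync_time:
        # Potential anomaly: the row changed on the server after the
        # client last synchronized, so defer to manual resolution.
        exceptions.append({"key": row_key, "attempted": new_values})
        return False
    server_rows[row_key] = {"values": new_values,
                            "updated_at": datetime.now()}
    return True
```

A rejected update lands on the exception report (here, the `exceptions` list) rather than silently overwriting a newer server row.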

These types of systems recognize the problems that arise when trying to distribute data to mobile clients, and attempt to address them with custom rules and procedures. It is interesting to note that they have proven very successful in systems where up-to-the-minute information is not required for decision making, but users still require mobile access to information.

Parallelism And Client/Server

The widespread acceptance of distributed processing and multitasking operating systems has heralded a new mode of designing and implementing business systems. Instead of the traditional “linear” design of systems, tomorrow’s systems will incorporate massively parallel processing. The result? Many tasks may be concurrently assigned to service a database request. Indeed, the entire definition of data processing is changing. The corporate data resource has been expanded to include all sources of information, not just databases. Corporate information lies within email, Lotus Notes, and many other nontraditional sources. Many companies are collecting this information without fully exploiting its value, and multiprocessing is an ideal technique for searching these huge amounts of free-form corporate information.

Multitasking And Multiprocessing

Before we even tackle this subject, a distinction needs to be made between multitasking and multiprocessing. Multitasking refers to the ability of a software package to manage multiple concurrent processes, giving the appearance of simultaneous processing on a single processor. Although OS/2 and Windows NT are good examples of this technology, multitasking can be found within all midrange and mainframe databases. Multiprocessing refers to the use of multiple CPUs within a distributed environment, where a master program directs parallel operations against numerous machines. Multiprocessing is possible at two levels. The first is at the hardware level, where arrays of CPUs are offered. The second is at the software level, where a single CPU can be partitioned into separate “logical” processors. The Prism software in the IBM mainframe environment is an example of this technology.
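In modern Python terms, the distinction can be sketched as follows: threads within one process illustrate multitasking (one program managing several concurrent tasks), while a process pool illustrates multiprocessing (a master program farming work out to several CPUs). The `scan` workload is a made-up stand-in for a database operation.

```python
import multiprocessing as mp
import threading

def scan(chunk):
    # Stand-in for a CPU-bound database scan of one chunk of rows.
    return sum(x * x for x in chunk)

def multitasked(chunks):
    """Multitasking: one process interleaves several concurrent tasks."""
    results = [None] * len(chunks)

    def worker(i, chunk):
        results[i] = scan(chunk)

    threads = [threading.Thread(target=worker, args=(i, c))
               for i, c in enumerate(chunks)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results

def multiprocessed(chunks):
    """Multiprocessing: a master program directs work across several CPUs."""
    with mp.Pool() as pool:
        return pool.map(scan, chunks)
```

Both functions return the same answers; the difference is whether the concurrent tasks share one processor or are spread across many.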

In any case, programming for multiprocessors is quite different from traditional sequential programming techniques. Multiprocessing programming falls into two areas: data parallel programming and control parallel programming. Data parallel programming partitions the data into discrete pieces, running the same program in parallel against each piece. Control parallel programming identifies independent functions that are simultaneously solved by independent CPUs (Figure 7.3).


Figure 7.3  An example of a parallel query.
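The two styles described above can be sketched side by side. This is an illustrative example only; the sales rows and the statistics computed are assumptions made for the sketch.

```python
from concurrent.futures import ThreadPoolExecutor

def total_sales(rows):
    return sum(r["amount"] for r in rows)

def data_parallel(partitions):
    """Data parallelism: the SAME program runs against each piece of data."""
    with ThreadPoolExecutor() as ex:
        return sum(ex.map(total_sales, partitions))

def control_parallel(rows):
    """Control parallelism: INDEPENDENT functions run concurrently."""
    with ThreadPoolExecutor() as ex:
        f_sum = ex.submit(lambda: sum(r["amount"] for r in rows))
        f_max = ex.submit(lambda: max(r["amount"] for r in rows))
        f_count = ex.submit(len, rows)
        return f_sum.result(), f_max.result(), f_count.result()
```

In the data parallel case every worker runs `total_sales` against its own partition; in the control parallel case each worker runs a different function against the same data.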

One of the greatest problems with implementing parallel processing systems is the identification of parallelism. Parallelism refers to the ability of a computer system to perform processing on many data sources at the same instant in time. Whereas many of the traditional database applications were linear in nature, today’s systems have many opportunities for parallel processing.

Parallelism is especially important to scientific applications that could benefit from having hundreds or even thousands of processors working together to solve a problem. But the same concept applies to very large databases. If a query can be split into subqueries, where each subquery is assigned to its own processor, the response time for the query can be reduced dramatically, in the ideal case by a factor equal to the number of processors (Figure 7.4).


Figure 7.4  The performance benefits of adding processors.
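Splitting a query into per-partition subqueries can be sketched with a process pool. The table, partition scheme, and predicate here are all invented for the illustration.

```python
import multiprocessing as mp

ORDERS = list(range(1_000))  # stand-in for a very large table

def subquery(bounds):
    """One subquery: scan a single partition for matching rows."""
    lo, hi = bounds
    return [row for row in ORDERS[lo:hi] if row % 97 == 0]

def parallel_query(n_workers=4):
    """Split the full scan into one subquery per worker, then merge."""
    step = len(ORDERS) // n_workers
    parts = [(i * step, len(ORDERS) if i == n_workers - 1 else (i + 1) * step)
             for i in range(n_workers)]
    with mp.Pool(n_workers) as pool:
        pieces = pool.map(subquery, parts)
    return [row for piece in pieces for row in piece]
```

Because the partitions do not overlap and together cover the whole table, merging the per-partition results yields exactly what a single sequential scan would.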

A review of the past 30 years makes it clear that tremendous improvements have been made in the speed of processors. At the same time, the prices of processors have continued to decline. However, this trend cannot continue forever. The physical nature of silicon processors has been pushed to its limit and is now reaching a point of diminishing returns. In order to continue to enjoy increases in performance, we either need to replace silicon as a medium or devise ways to exploit parallelism in processing.

Parallelism is an issue of scale. Where a linear process may solve a problem in one hour, a parallel system with 60 processors should, ideally, be able to solve the problem in 1 minute, as demonstrated in Figure 7.5. Of course, this logic has its limits, just as the fact that one woman takes nine months to have a baby does not mean that nine women can produce a baby in one month. Speed can only be improved in those situations where parallel processing is appropriate, which excludes traditional linear systems where one process may not begin until the preceding one ends.


Figure 7.5  Parallelism across multiple CPUs.

Other facets of parallel processing can be extremely valuable. A query against a very large database can be dramatically improved if the data is partitioned. For example, if a query against a text database takes 1 minute to scan a terabyte, then partitioning the data and processing across 60 processors should reduce the retrieval time to about 1 second. Another key issue is the balancing of CPU processing with I/O processing. In a traditional data processing environment, the systems are not computationally intensive, and most of the elapsed time is spent waiting on I/O. However, this does not automatically exclude business systems from taking advantage of multiprocessing.
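The idealized arithmetic behind the 60-way partitioning example above can be written down directly. Real speedups fall short of the ideal once coordination costs are counted, which the optional overhead term hints at; the function itself is an assumption-laden simplification, not a performance model.

```python
def parallel_scan_time(total_seconds, partitions, overhead_per_partition=0.0):
    """Idealized elapsed time for a scan split across equal partitions:
    all partitions are scanned concurrently, plus any fixed startup
    overhead paid by each parallel worker."""
    return total_seconds / partitions + overhead_per_partition
```

With the chapter’s numbers, a 60-second terabyte scan split 60 ways comes out to 1 second; adding even half a second of per-partition startup overhead pushes it to 1.5 seconds.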

A continuum of processing architectures exists for parallel processing. On one end of the spectrum we find a few powerful CPUs that are loosely coupled, while on the other end we see a large number of small processors that are tightly coupled.

Parallelism can be easily identified in a distributed database environment. For the database administrator, routine maintenance tasks such as export/import operations can be run in parallel, reducing the overall time required for system maintenance.
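Because each export touches a different table, such maintenance tasks are independent and can run side by side. A minimal sketch, with `export_table` standing in for a real export utility and the table names invented for the example:

```python
from concurrent.futures import ThreadPoolExecutor
import time

TABLES = ["customer", "orders", "inventory", "shipments"]

def export_table(name):
    """Stand-in for a real export utility run against one table."""
    time.sleep(0.1)  # simulated export work
    return f"{name}.dmp"

def serial_export():
    """One export after another: total time is the sum of all exports."""
    return [export_table(t) for t in TABLES]

def parallel_export():
    """All exports at once: total time approaches the slowest single export."""
    with ThreadPoolExecutor(max_workers=len(TABLES)) as ex:
        return list(ex.map(export_table, TABLES))
```

The outputs are identical; only the elapsed wall-clock time differs, which is exactly the benefit claimed for parallel maintenance.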

In an open systems environment, parallelism may be easily simulated by using a remote mount facility. With a remote mount, a data file may be directly addressed from another processor, even though the data file physically resides on another machine. This can be an especially useful technique for speeding up table replication to remote sites, as shown in Figure 7.5.

