

While each vendor offers different tools to achieve CPU load sharing, the Load Sharing Facility (LSF) provides insight into the internal functioning of these types of tools. LSF is a distributed computing system that turns a cluster of Unix computers from several vendors into a “virtual supercomputer.”

Platform’s LSF supports fully transparent load sharing across Unix systems from different vendors, and it represents the enabling technology for the rapidly emerging cluster computing market. “There is a tremendous movement across industries to downsize from mainframes and supercomputers to RISC-based open systems,” said Songnian Zhou, President of Platform Computing. “With LSF, resources across the network become transparently accessible to users. We have seen interactive response time of key applications reduced by 30 to 40 percent, and batch job throughput doubled, at large corporate sites.”

The performance of low-cost workstations has been improving rapidly, and a cluster of workstations represents a tremendous amount of computing power. Up to now, however, such computing resources have been scattered over the network. Harnessing them to run user jobs has proved to be difficult.

Platform’s LSF automates cluster computing by hiding the network and the heterogeneous computers from users. Instead of running every compute job on the local computer, as most Unix networks do, LSF transparently distributes jobs throughout the network, taking into consideration each job’s required architecture, operating system, and resources such as memory, disk space, and software licenses. LSF supports all types of applications, parallel and serial, submitted either interactively or in batch mode.
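
This placement decision can be pictured with a simplified sketch in Python. The host names, resource figures, and the pick_host helper below are hypothetical and greatly simplified; LSF’s actual scheduler considers many more factors. The point is only that a job’s stated requirements (architecture, memory, disk) are checked against each host’s attributes before load is compared.

    # Hypothetical, simplified sketch of resource-aware job placement.
    # Names and figures are illustrative; LSF's real scheduler is far richer.
    from dataclasses import dataclass

    @dataclass
    class Host:
        name: str
        arch: str           # e.g., "sparc", "hppa"
        free_mem_mb: int
        free_disk_mb: int
        cpu_load: float     # lower means less loaded

    @dataclass
    class Job:
        arch: str
        mem_mb: int
        disk_mb: int

    def pick_host(job, hosts):
        """Return the least-loaded host that satisfies the job's requirements."""
        eligible = [h for h in hosts
                    if h.arch == job.arch
                    and h.free_mem_mb >= job.mem_mb
                    and h.free_disk_mb >= job.disk_mb]
        return min(eligible, key=lambda h: h.cpu_load, default=None)

    hosts = [Host("sun1", "sparc", 256, 2000, 0.4),
             Host("sun2", "sparc", 512, 4000, 1.8),
             Host("hp1",  "hppa",  512, 4000, 0.1)]
    print(pick_host(Job("sparc", 200, 500), hosts))  # -> sun1: right arch, lowest load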

Distributed computing has gained importance over the last decade as a preferred alternative to centralized computing. It has been widely observed that the usage of computing resources in a distributed environment is usually “bursty” over time and uneven among hosts. A user of a workstation may not use the machine all the time, but may require more than it can provide while actively working. Some hosts may be heavily loaded while others remain idle. Even as hardware costs have decreased dramatically, the resource demands of applications have increased steadily, with new, resource-intensive applications being introduced rapidly. It is now, and will remain, too expensive to dedicate a sufficient amount of computing resources to each and every user.

Load sharing is the process of redistributing the system workload among hosts to improve performance and accessibility to remote resources. Intuitively, avoiding load imbalances and exploiting powerful hosts should lead to better job response times and resource utilization, and numerous studies of load sharing in the 1980s confirmed this intuition. Most of the existing work, however, has been confined to small clusters of homogeneous hosts and has focused on sharing processing power (CPU). With the proliferation of distributed systems supporting medium to large organizations, system scale has grown from a few time-sharing hosts, to tens of workstations supported by a few server machines, to hundreds and thousands of hosts. For effective load sharing, computing resources beyond processing power, such as memory frames, disk storage, and I/O bandwidth, should also be considered.
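
A load index that looks beyond the CPU can be sketched as follows. The weights and normalizing constants are assumptions chosen only for illustration, not values used by LSF or any particular system; the idea is simply that run-queue length, free memory, swap space, and disk I/O are folded into one comparable figure.

    # Hypothetical composite load index; the weights and scaling constants
    # below are illustrative assumptions, not values from any real system.
    def load_index(run_queue, free_mem_mb, free_swap_mb, disk_io_kbps):
        """Combine several resource measures into one figure (higher = busier)."""
        return (1.0 * run_queue                                   # runnable processes
                + 1.0 * (1.0 - min(free_mem_mb / 512.0, 1.0))     # memory pressure
                + 0.5 * (1.0 - min(free_swap_mb / 1024.0, 1.0))   # swap pressure
                + 0.5 * min(disk_io_kbps / 5000.0, 1.0))          # disk I/O activity

    # A host with a busy CPU, little free memory, and heavy I/O scores high.
    print(load_index(run_queue=2.0, free_mem_mb=64,
                     free_swap_mb=256, disk_io_kbps=3000))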

Heterogeneity

Another important development in distributed systems is heterogeneity, which can take a number of forms. In configurational heterogeneity, hosts differ in processing power, memory space, and disk storage; hosts of different architectures may also be unable to execute the same compiled code. Operating system heterogeneity occurs when the system facilities on different hosts vary and may be incompatible. Although heterogeneity imposes limitations on resource sharing, it also presents substantial opportunities. First, even if both a local workstation and a remote, more powerful host are idle, the performance of a job may still be improved by executing it on the remote host rather than the local workstation. Second, by providing transparent resource-locating and remote execution mechanisms, any job can be initiated from any host without regard to the location of the resources the task needs. Thus, a CAD package that can only be executed on a Sun host can be initiated from an HP workstation.
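
The CAD example can be made concrete with a small sketch. The host attributes and the find_host helper are hypothetical and do not reflect LSF’s actual resource-requirement syntax; they only show how a task started on one host type can be matched to a different host that has the architecture and license it needs.

    # Hypothetical sketch of transparent placement across heterogeneous hosts.
    # Host attributes and matching rules are illustrative, not LSF syntax.
    hosts = {
        "sun1": {"type": "sparc-sunos", "cad_license": True,  "load": 0.7},
        "sun2": {"type": "sparc-sunos", "cad_license": False, "load": 0.2},
        "hp1":  {"type": "hppa-hpux",   "cad_license": False, "load": 0.1},
    }

    def find_host(requirements):
        """Pick the least-loaded host whose attributes satisfy the requirements."""
        candidates = [name for name, attrs in hosts.items()
                      if all(attrs.get(k) == v for k, v in requirements.items())]
        return min(candidates, key=lambda n: hosts[n]["load"], default=None)

    # Initiated from hp1, but the CAD package needs a licensed Sun host.
    print(find_host({"type": "sparc-sunos", "cad_license": True}))  # -> sun1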

Little research has been conducted on the issues of large scale and heterogeneity in load sharing, yet they represent two of the most important research problems for load sharing in current and future distributed systems. As systems grow, existing approaches to load sharing become inadequate and new research issues emerge.

Besides demonstrating the feasibility of a general-purpose load-sharing system for large, heterogeneous distributed systems, one that is usable in diverse system and application environments, the two main contributions of this research to the field of resource sharing in distributed systems are:

  Algorithms for distributing load information in systems with thousands of hosts, and for placing tasks based on their resource demands and the hosts’ load information (a schematic sketch follows this list).
  A collection of remote execution mechanisms that are flexible and efficient, enabling interactive tasks that require a high degree of transparency, as well as the relatively fine-grained tasks of parallel applications, to be executed remotely and efficiently.
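
The first contribution can be pictured with a schematic sketch. The Coordinator class below is a simplification invented for illustration and is not LSF’s actual load-information protocol; it only conveys the idea that hosts report their load periodically to a per-cluster collector, so that placement queries need not poll thousands of hosts directly.

    # Schematic sketch (not LSF's actual protocol): each host periodically
    # reports a load figure to a per-cluster coordinator, and placement
    # queries consult the coordinator rather than every host.
    class Coordinator:
        def __init__(self):
            self.load = {}                      # host name -> latest load index

        def report(self, host, load_index):
            """Called periodically by each host in the cluster."""
            self.load[host] = load_index

        def place(self):
            """Answer a placement query with the least-loaded known host."""
            return min(self.load, key=self.load.get, default=None)

    coord = Coordinator()
    for host, load in [("ws01", 1.4), ("ws02", 0.2), ("ws03", 0.9)]:
        coord.report(host, load)
    print(coord.place())                        # -> ws02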

Summary

Considering the myriad factors that contribute to system performance, it is not surprising that there is no magic formula that can be applied to distributed databases to ensure acceptable performance. In one sense, performance and tuning for distributed systems is easier than for centralized systems, since each remote site can be isolated and analyzed independently of the others. Yet the complex nature of distributed processing ensures that performance and tuning will remain a complex endeavor, and only with a complete understanding of the nature of performance can effective measurement methods be devised.

