Previous | Table of Contents | Next |
Todays relational database environment offers a wide number of books with tips on the effective design of relational database systems. Unfortunately, many of these texts are purely theoretical and fail to take into account the real-world issues that are involved in designing a high-performance Oracle database. At the same time, dozens of Computer Aided Systems Engineering (CASE) vendors claim that their tools are indispensable for database design. In reality, taking a design through the performance stage is the only way to fully understand the issues. Real-world experience has no substitute in this realm.
It is critical to remember that database distribution is a design issuenot an analysis issue. The structured specification for the systems analysis phase specifies the logical data stores, but has no reference to the physical placement or type for the data stores. In the analysis phase, it doesnt matter whether data storage is in a VSAM file on a mainframe, an Oracle database on a VAX, or on a Rolodex file on the users desk. The concern is documenting the data flows between the logical processes and showing how data interacts with the logical data repositories. While it can be very tempting for some analysts to address physical design issues in the middle of an analysis, the disconnection of analysis and design is critical to the development of an effective Oracle system. In fact, this contamination of a logical analysis with a physical design issue is one of the main reasons that projects suffer from analysis paralysis. Analysis paralysis usually occurs when tasks overwhelm the analysis team, shifting the focus from How does it function? to How can we implement the function?
If we accept the premise that creating a high-performance database is a physical design issue, it follows that any generally accepted analysis methodology will suffice for documenting the requirements of the system. Structured specifications with data flow diagrams (for example, the popular DeMarco style or Gane & Sarson methods of systems analysis), along with a data dictionary and process logic specifications, represent an excellent starting point for system design.
One of the primary reasons that companies abandon centralized mainframes is the promise of running open systems with cheaper hardware and software. With the costs of an IBM mainframe data center approaching $500,000 per month, it is not surprising that many top IS managers force organizations to undertake the long march into open, distributed databases. In addition to hardware costs, database software costs are dramatically more for mainframe systems. A large mainframe DBMS package easily reaches the $250,000 price range, yet a good relational database for a Unix-based midrange computer costs as little as $10,000.
As attractive as these savings may sound, they still must be balanced against the costs of maintaining the distributed systems. As processors are added to remote locations, human functionssystem administration, LAN management, and database administrationmust also be replicated. And while the costs of a database server may be low, many companies lapse into shock when they are ready to attach 1,000 PC workstations to the server and find the cost for PC seats soaring up to $2,000 per database desktop client. PC workstations using a GUI with multiple database connections carry a price tag in excess of $5,000 per PC, a figure that often prompts managers to question the economic justification for downsizing. Performing a valid cost-benefit analysis (see Figure 1.1) requires the identification and quantification of all of the potential costs and savings for hardware, software, and human resources.
Figure 1.1 Downsizing cost-factor analysis for database systems.
Many IS managers do not realize that their staff sizes may more than double, depending upon the type of open systems database migration. The open system approach has a tremendous initial cost, and the savings accrue several years after implementation. When a mainframe database is partitioned into 20 remote Oracle servers, the database administration and system administration support staff may actually triple in size.
Saving money is not the only reason for downsizing, however. Companies relying on competitive information are often forced to the new midrange platforms in order to use the most advanced software. For example, the advanced object-oriented databases are found almost exclusively on midrange Unix systems. Consequently, no company can capitalize on object-oriented technology while using mainframe technology.
Location transparency refers to the distributed databases ability to function as a unified whole, appearing to end users as a single unified system. Where the database resides, what type of database is used, or the required database access method are all unimportant to the end user. With Oracle, this type of transparency is definitely possiblebut not without expense. Even a distributed system composed entirely of relational databases must deal with different dialects of SQL (see Figure 1.2).
Figure 1.2 An example of single-architecture database queries.
Managing the various dialects of SQL is one of the most pressing problems with distributed databases. Each major database vendor (ostensibly to improve its implementation of SQL) adds its own features and extensions. Oracle is one of the most prolific at adding SQL extensions. It is true that Oracle implements some outstanding and useful extensions such as the DECODE function, but queries to non-Oracle databases using these features can mean failure.
These SQL dialect problems are even more aggravating when the distributed database is composed of databases from nonrelational architectures. In order to achieve transparency, sophisticated techniques are necessary to interrogate the distributed query, identifying which data components reside in what architecture and decomposing the subqueries into the appropriate access language (see Figure 1.3).
Figure 1.3 An example of multi-architecture database queries.
To understand the concept of location transparency, consider this example: Assume a part inventory system with separate databases in Washington, Boise, New York, and Albuquerque. The manager needs the number of widgets in stock at all the locations. The manager issues the following SQL command to the distributed database manager:
SELECT count(*) FROM INVENTORY WHERE partname = 'widget';
Previous | Table of Contents | Next |