Oracle7 Server Administrator's Guide
Guidelines for Managing Clusters
A cluster provides an optional method of storing table data. A cluster is comprised of a group of tables that share the same data blocks, which are grouped together because they share common columns and are often used together. For example, the EMP and DEPT table share the DEPTNO column. When you cluster the EMP and DEPT tables (see Figure 14 - 1), Oracle physically stores all rows for each department from both the EMP and DEPT tables in the same data blocks. You should not use clusters for tables that are frequently accessed individually.
Because clusters store related rows of different tables together in the same data blocks, properly used clusters offer two primary benefits:
- Disk I/O is reduced and access time improves for joins of clustered tables.
- The cluster key is the column, or group of columns, that the clustered tables have in common. You specify the columns of the cluster key when creating the cluster. You subsequently specify the same columns when creating every table added to the cluster. Each cluster key value is stored only once each in the cluster and the cluster index, no matter how many rows of different tables contain the value.
Therefore, less storage might be required to store related table and index data in a cluster than is necessary in non-clustered table format. For example, notice how each cluster key (each DEPTNO) is stored just once for many rows that contain the same value in both the EMP and DEPT tables.
After creating a cluster, you can create tables in the cluster. However, before any rows can be inserted into the clustered tables, a cluster index must be created. Using clusters does not affect the creation of additional indexes on the clustered tables; they can be created and dropped as usual.
Figure 14 - 1. Clustered Table Data
The following sections describe guidelines to consider when managing clusters, and includes the following topics:
See Also: For more information about clusters, see the Oracle7 Server Concepts manual.
Cluster Appropriate Tables
Use clusters to store one or more tables that are primarily queried (not predominantly inserted into or updated) and for which the queries often join data of multiple tables in the cluster or retrieve related data from a single table.
Choose Appropriate Columns for the Cluster Key
Choose cluster key columns carefully. If multiple columns are used in queries that join the tables, make the cluster key a composite key. In general, the characteristics that indicate a good cluster index are the same as those for any index.
A good cluster key has enough unique values so that the group of rows corresponding to each key value fills approximately one data block. Having too few rows per cluster key value can waste space and result in negligible performance gains. Cluster keys that are so specific that only a few rows share a common value can cause wasted space in blocks, unless a small SIZE was specified at cluster creation time (see below).
Too many rows per cluster key value can cause extra searching to find rows for that key. Cluster keys on values that are too general (for example, MALE and FEMALE) result in excessive searching and can result in worse performance than with no clustering.
A cluster index cannot be unique or include a column defined as LONG.
See Also: For information about characteristics of a good index, see "Guidelines for Managing Indexes" .
Specify Data Block Space Use
By specifying the PCTFREE and PCTUSED parameters during the creation of a cluster, you can affect the space utilization and amount of space reserved for updates to the current rows in the data blocks of a cluster's data segment. Note that PCTFREE and PCTUSED parameters set for tables created in a cluster are ignored; clustered tables automatically use the settings set for the cluster.
See Also: For more information about setting PCTFREE and PCTUSED, see "Managing the Space Usage of Data Blocks" .
Specify the Space Required by an Average Cluster Key and Its Associated Rows
The CREATE CLUSTER command has an optional argument, SIZE, which is the estimated number of bytes required by an average cluster key and its associated rows. Oracle uses the SIZE parameter when performing the following tasks:
- estimating the number of cluster keys (and associated rows) that can fit in a clustered data block
- limiting the number of cluster keys placed in a clustered data block; this maximizes the storage efficiency of keys within a cluster
SIZE does not limit the space that can be used by a given cluster key. For example, if SIZE is set such that two cluster keys can fit in one data block, any amount of the available data block space can still be used by either of the cluster keys.
By default, Oracle stores only one cluster key and its associated rows in each data block of the cluster's data segment. Although block size can vary from one operating system to the next, the rule of one key per block is maintained as clustered tables are imported to other databases on other machines.
If all the rows for a given cluster key value cannot fit in one block, the blocks are chained together to speed access to all the values with the given key. The cluster index points to the beginning of the chain of blocks, each of which contains the cluster key value and associated rows. If the cluster SIZE is such that more than one key fits in a block, blocks can belong to more than one chain.
Specify the Location of Each Cluster and Cluster Index
If you have the proper privileges and tablespace quota, you can create a new cluster and the associated cluster index in any tablespace that is currently online. Always specify the TABLESPACE option in a CREATE CLUSTER/INDEX statement to identify the tablespace to store the new cluster or index.
The cluster and its cluster index can be created in different tablespaces. In fact, creating a cluster and its index in different tablespaces that are stored on different storage devices allows table data and index data to be retrieved simultaneously with minimal disk contention.
Estimate Cluster Size and Set Storage Parameters
The benefits of estimating a cluster's size before creating one follow:
- You can use the combined estimated size of clusters, along with estimates for indexes, rollback segments, and redo log files, to determine the amount of disk space that is required to hold an intended database. From these estimates, you can make correct hardware purchases and other decisions.
- You can use the estimated size of an individual cluster to better manage the disk space that the cluster will use. When a cluster is created, you can set appropriate storage parameters and improve I/O performance of applications that use the cluster.
Whether or not you estimate table size before creation, you can explicitly set storage parameters when creating each non-clustered table. Any storage parameter that you do not explicitly set when creating or subsequently altering a table automatically uses the corresponding default storage parameter set for the tablespace in which the table resides. Clustered tables also automatically use the storage parameters of the cluster.
See Also: For information about estimating the size of schema objects, including clusters, see Appendix A.