Oracle7 Server SQL Reference

CREATE CLUSTER

Purpose

To create a cluster. A cluster is a schema object that contains one or more tables that all have one or more columns in common.

Prerequisites

To create a cluster in your own schema, you must have CREATE CLUSTER system privilege. To create a cluster in another user's schema, you must have CREATE ANY CLUSTER system privilege. Also, the owner of the schema to contain the cluster must have either space quota on the tablespace containing the cluster or UNLIMITED TABLESPACE system privilege.

If you are using Trusted Oracle7 in DBMS MAC mode, your DBMS label must dominate the label of the tablespace to contain the cluster. To create a cluster in another user's schema, your DBMS label must dominate the creation label of the owner of the schema.

Syntax

Keywords and Parameters

schema

is the schema to contain the cluster. If you omit schema, Oracle7 creates the cluster in your current schema.

cluster

is the name of the cluster to be created.

column

is the name of a column in the cluster key.

datatype

is the datatype of a cluster key column. A cluster key column can have any datatype except LONG or LONG RAW. You cannot use the HASH IS clause if any column datatype is not INTEGER or NUMBER with scale 0. For information on datatypes, see the section "Datatypes" .

PCTUSED

specifies the limit that Oracle7 uses to determine when additional rows can be added to a cluster's data block. The value of this parameter is expressed as a whole number and interpreted as a percentage.

PCTFREE

specifies the space reserved in each of the cluster's data blocks for future expansion. The value of the parameter is expressed as a whole number and interpreted as a percentage.

INITRANS

specifies the initial number of concurrent update transactions allocated for data blocks of the cluster. The value of this parameter for a cluster cannot be less than 2 or more than the value of the MAXTRANS parameter. The default value is the greater of the INITRANS value for the cluster's tablespace and 2.

MAXTRANS

specifies the maximum number of concurrent update transactions for any given data block belonging to the cluster. The value of this parameter cannot be less than the value of the INITRANS parameter. The maximum value of this parameter is 255. The default value is the MAXTRANS value for the tablespace to contain the cluster.

For a complete description of the PCTUSED, PCTFREE, INITRANS, and MAXTRANS parameters, see the CREATE TABLE command .

SIZE

specifies the amount of space in bytes to store all rows with the same cluster key value or the same hash value. You can use K or M to specify this space in kilobytes or megabytes. If you omit this parameter, Oracle7 reserves one data block for each cluster key value or hash value.

TABLESPACE

specifies the tablespace in which the cluster is created.

STORAGE

specifies how data blocks are allocated to the cluster. See the STORAGE clause .

INDEX

creates an indexed cluster. In an indexed cluster, rows are stored together based on their cluster key values.

HASHKEYS

creates a hash cluster and specifies the number of hash values for a hash cluster. Oracle7 rounds the HASHKEYS value up to the nearest prime number to obtain the actual number of hash values. The minimum value for this parameter is 2. If you omit both the INDEX option and the HASHKEYS parameter, Oracle7 creates an indexed cluster by default.

HASH IS

specifies a expression to be used as the hash function for the hash cluster.

The expression must:

evaluate to a positive value

contain one or more columns of datatype INTEGER or datatype NUMBER with scale 0.

The expression:

cannot reference user defined PL/SQL functions

cannot reference the following: SYSDATE, USERENV, TO_DATE, UID, USER, LEVEL, ROWNUM

cannot evaluate to a constant value

cannot contain a subquery

cannot contain columns qualified with a schema or object name (other than the cluster name)

If you omit the HASH IS clause, Oracle7 uses an internal hash function for the hash cluster.

The cluster key of a hash column can have one or more columns of any datatype. Hash clusters with composite cluster keys or cluster keys made up of non-integer columns must use the internal hash function.

PARALLEL

specifies the degree of parallelism to use when creating the cluster and the default degree of parallelism to use when querying the cluster after creation. See the parallel_clause .

CACHE

specifies that the blocks retrieved for this table are placed at the most recently used end of the LRU list in the buffer cache when a full table scan is performed. This option is useful for small lookup tables.

NOCACHE

specifies that the blocks retrieved for this table are placed at the least recently used end of the LRU list in the buffer cache when a full table scan is performed. This is the default behavior.

Usage Notes

A cluster is a schema object that contains one or more tables that all have one or more columns in common. Rows of one or more tables that share the same value in these common columns are physically stored together within the database.

Clustering provides more control over the physical storage of rows within the database. Clustering can reduce both the time it takes to access clustered tables and the space needed to store the table. After you create a cluster and add tables to it, the cluster is transparent. You can access clustered tables with SQL statements just as you can non-clustered tables.

If you cannot fit all rows for one hash value into a data block, do not use hash clusters. Performance is very poor in this circumstance because an insert or update of a row in a hash cluster with a size exceeding the data block size fills the block and row chaining to contain the rest of the row.

Generally, you should only cluster tables that are frequently joined on the cluster key columns in SQL statements. While clustering multiple tables improves the performance of joins, it is likely to reduce the performance of full table scans, INSERT statements, and UPDATE statements that modify cluster key values. Before clustering, consider its benefits and tradeoffs in light of the operations you plan to perform on your data. For more information on the performance implications of clustering, see the "Tuning SQL Statements" chapter of Oracle7 Server Tuning.

When you create a cluster in Trusted Oracle7, it is labeled with your DBMS label.

Cluster Keys

The columns defined by the CREATE CLUSTER command make up the cluster key. These cluster columns must correspond in both datatype and size to columns in each of the clustered tables, although they need not correspond in name.

You cannot specify integrity constraints as part of the definition of a cluster key column. Instead, you can associate integrity constraints with the tables that belong to the cluster.

Types of Clusters

A cluster can be one of the following types:

indexed cluster

hash cluster

Indexed Clusters

In an indexed cluster, Oracle7 stores rows having the same cluster key value together. Each distinct cluster key value is stored only once in each data block, regardless of the number of tables and rows in which it occurs. This saves disk space and improves performance for many operations.

You may want to use indexed clusters in the following cases:

Your queries retrieve rows over a range of cluster key values.

Your clustered tables may grow unpredictably.

After you create an indexed cluster, you must create an index on the cluster key before you can issue any Data Manipulation Language statements against a table in the cluster. This index is called the cluster index. For information on creating a cluster index, see the CREATE INDEX command

. As with the columns of any index, the order of the columns in the cluster key affects the structure of the cluster index.

A cluster index provides quick access to rows within a cluster based on the cluster key. If you issue a SQL statement that searches for a row in the cluster based on its cluster key value, Oracle7 searches the cluster index for the cluster key value and then locates the row in the cluster based on its ROWID.

Hash Clusters

In a hash cluster, Oracle7 stores together rows that have the same hash key value. The hash value for a row is the value returned by the cluster's hash function. When you create a hash cluster, you can either specify a hash function or use the Oracle7 internal hash function. Hash values are not actually stored in the cluster, although cluster key values are stored for every row in the cluster.

You may want to use hash clusters in the following cases:

Your queries retrieve rows based on equality conditions involving all cluster key columns.

Your clustered tables are static or you can determine the maximum number of rows and the maximum amount of space required by the cluster when you create the cluster.

The hash function provides access to rows in the table based on the cluster key value. If you issue a SQL statement that locates a row in the cluster based on its cluster key value, Oracle7 applies the hash function to the given cluster key value and uses the resulting hash value to locate the matching rows. Because multiple cluster key values can map to the same hash value, Oracle7 must also check the row's cluster key value. Note that this process often results in less I/O than the process for the indexed cluster because the index search is not required.

Oracle7's internal hash function returns values ranging from 0 to the value of HASHKEYS - 1. If you specify a column with the HASH IS clause, the column values need not fall into this range. Oracle7 divides the column value by the HASHKEYS value and uses the remainder as the hash value. The hash value for null is HASHKEYS - 1. Oracle7 also rounds the HASHKEYS value up to the nearest prime number to obtain the actual number of hash values. This rounding reduces the likelihood of hash collisions, or multiple cluster key values having the same hash value.

You cannot create a cluster index for a hash cluster, and you need not create an index on a hash cluster key.

Cluster Size

Oracle7 uses the value of the SIZE parameter to determine the space reserved for rows corresponding to one cluster key value or one hash value. This space then determines the maximum number of cluster or hash values stored in a data block. If the SIZE value is not a divisor of the data block size, Oracle7 uses the next largest divisor. If the SIZE value is larger than the data block size, Oracle7 uses the operating system block size, reserving at least one data block per cluster or hash value.

Oracle7 also considers the length of the cluster key when determining how much space to reserve for the rows having a cluster key value. Larger cluster keys require larger sizes. To see the actual size, query the KEY_SIZE column of the USER_CLUSTERS data dictionary view. This does not apply to hash clusters because hash values are not actually stored in the cluster.

Although the maximum number of cluster and hash key values per data block is fixed on a per cluster basis, Oracle7 does not reserve an equal amount of space for each cluster or hash key value. Varying this space stores data more efficiently because the data stored per cluster or hash key value is rarely fixed.

A SIZE value smaller than the space needed by the average cluster or hash key value may require the data for one cluster key or hash key value to occupy multiple data blocks. A SIZE value much larger results in wasted space.

When you create a hash cluster, Oracle7 immediately allocates space for the cluster based on the values of the SIZE and HASHKEYS parameters. For more information on how Oracle7 allocates space for clusters, see the "Schema Objects" chapter of Oracle7 Server Concepts.

Adding Tables to a Cluster

You can add tables to an existing cluster by issuing a CREATE TABLE statement with the CLUSTER clause. A cluster can contain as many as 32 tables, although the performance gains of clustering are often negated in clusters of more than four or five tables.

All tables in the cluster have the cluster's storage characteristics as specified by the PCTUSED, PCTFREE, INITRANS, MAXTRANS, TABLESPACE, and STORAGE parameters.

Example I

The following statement creates an indexed cluster named PERSONNEL with the cluster key column DEPARTMENT_NUMBER, a cluster size of 512 bytes, and storage parameter values:

CREATE CLUSTER personnel 
	( department_number  NUMBER(2) ) 
	SIZE 512 
	STORAGE (INITIAL 100K NEXT 50K PCTINCREASE 10)

The following statements add the EMP and DEPT tables to the cluster:

CREATE TABLE emp 
	(empno     NUMBER        PRIMARY KEY, 
	 ename     VARCHAR2(10)  NOT NULL 
	                         CHECK (ename = UPPER(ename)), 
	 job       VARCHAR2(9), 
	 mgr       NUMBER        REFERENCES scott.emp(empno), 
	 hiredate  DATE          CHECK (hiredate >= SYSDATE), 
	 sal       NUMBER(10,2)  CHECK (sal > 500), 
	 comm      NUMBER(9,0)   DEFAULT NULL, 
	deptno   NUMBER(2)    NOT NULL ) 
	CLUSTER personnel (deptno) 
 
CREATE TABLE dept 
	(deptno  NUMBER(2), 
	 dname   VARCHAR2(9), 
	 loc     VARCHAR2(9))
	CLUSTER personnel (deptno)

The following statement creates the cluster index on the cluster key of PERSONNEL:

CREATE INDEX idx_personnel ON CLUSTER personnel

After creating the cluster index, you can insert rows into either the EMP or DEPT tables.

Example II

The following statement creates a hash cluster named PERSONNEL with the cluster key column DEPARTMENT_NUMBER, a maximum of 503 hash key values, each of size 512 bytes, and storage parameter values:

CREATE CLUSTER personnel

	( department_number  NUMBER )

	SIZE 512  HASHKEYS 500 
	STORAGE (INITIAL 100K  NEXT 50K  PCTINCREASE 10)

Because the above statement omits the HASH IS clause, Oracle7 uses the internal hash function for the cluster.

Example III

The following statement creates a hash cluster named PERSONNEL with the cluster key comprised of the columns HOME_AREA_CODE and HOME_PREFIX, and uses a SQL expression containing these columns for the hash function:

CREATE CLUSTER personnel 
 ( home_area_code  NUMBER,
   home_prefix     NUMBER ) 
  HASHKEYS 20
  HASH IS MOD(home_area_code + home_prefix, 101)