The Parallel Query Option (PQO) is best used for systems where large, intense queries are being performed or where large amounts of data are being loaded, indexed and queried, such as in data warehousing.
In general, 80% of the work involved with the parallel query option usage will come on the front end of the process, the design and physical implementation of the data base itself.
The remaining 20% of the work will be the actual application and its use of hints and the PARALLEL table and cluster creation options.
Parallel Query Option
Oracle can use the parallel query option for:
- SELECT statements
- Subqueries in UPDATE and DELETE statements
- Subqueries in INSERT and CREATE TABLE statements
- CREATE INDEX statements
- When doing parallel loads of data
The system consists of a Query Coordinator (QC) and a set of query servers. All queries are resolved through the QC and it makes the decision whether the query should be split or left alone. The number of query servers is dynamic and controlled by the load as monitored by the QC. Oracle also uses dynamic partitioning for the CPUs and load balancing.
Under some parallel architectures CPU A manages rows 1-10,000, CPU B manages 10,001-20,000, etc. If you need data from rows not managed by a specific CPU, it sits idle. On Oracle all CPUs participate in a parallel operation, speeding up all operations.
The QC process breaks the query into parallel pieces which are then sent to the query servers. The query servers perform their part of the transaction and then return the results to the QC for assembly into the finished results. The QC handles the parsing of the statement.
The number of query servers utilized by the QC in processing a single query is the parallelism of the query. This is determined by:
- Hints specified in the query
- Table definition
- Initialization parameters
Generally speaking the query must involve at least one full table scan before it is considered for parallel processing. Once a statement is parsed and the execution plan is determined, certain parts of an execution plan are candidates for parallel execution. These are:
- AGGREGATE (GROUP BY)
- MERGE JOIN
- NESTED LOOPS
- SORT (GROUP BY)
- SORT (JOIN)
- SORT (ORDER BY)
- SORT (UNIQUE)
- TABLE ACCESS (FULL)
The QC looks at partitioning requirements as well as execution plans. Based on partitioning requirements, the order of operations is determined by the QC. Each major operation such as a full table scan or merge join may have multiple servers assigned to it, thus each level of an execution plan my have its own degree of parallelization.
A developer can force parallelization through the use of the PARALLEL hint.
An additional hint NOPARALLEL is also provided which equates to:
/*+ PARALLEL (table_name, 1, 1) */
In addition to hints a table can be created with a specific degree of parallelism by using the PARALLEL clause in the CREATE TABLE, ALTER TABLE, CREATE CLUSTER, and ALTER CLUSTER commands.
Within the PARALLEL clause, the INSTANCES parameter is used to split the table blocks among the SGAs of a set of parallel instances in the parallel server. The number of instances participating in this parallel caching activity are specified with the new initialization parameters PARALLEL_MAX_SERVERS and PARALLEL_DEFAULT_MAX_INSTANCES or with the ALTER SYSTEM command option SET SCAN_INSTANCES and SET CACHE_INSTANCES.
The following initialization parameters determine the number of query servers and when the servers are reduced in number:
If an insufficient number of servers is specified in the PARALLEL_MAX_SERVERS parameter, a new query will be processed sequentially.
The parallel query option is most beneficial for:
- SMP or MPP systems
- Systems with high I/O bandwidth (many datafiles on many disks)
- Systems where CPU usage is generally under 30%
- Systems with sufficient memory to support multiple sorts
The amount of memory consumed by sorts for a parallel query is the number of servers multiplied by the SORT_AREA_SIZE parameter. This is important when sizing your system memory.
The parallel query option is monitored through the view V$PQ_SYSSTAT.
When using the parallel load utility through SQL*Loader, disable the automatic indexing features and create the indexes in parallel, then enable the constraints for the indexes. This will dramatically increase the index build speeds for primary and unique key constraints.