The purpose of this post is to show the relationship between the above 3 and the adverse effects it can have on database performance in Exadata Machines.
Situation - Client brings in a new Exadata Engineered System Box and based on the POC conducted is expecting a leap in batch processing performance. It's a x5-2 one eight rack with 2 node RAC.
Reality - Batch Processes go into endless loop taking more time then their current production box.
How we came to such a performance dip on Exadata -
In the initial rounds of testing when generating explain plans for certain queries, we came to know that the AUTO DOP feature was not being used by the optimizer.
How - You get the following line at the end of explain plan output
automatic DOP: skipped because of IO calibrate statistics are missing
On referring Oracle Support, the doc id 1269321.1 highlights the steps to take to generate calibration statistics.
The same were generated for diskgroup data, by finding its underlying disk count through v$asm_disk view.
Post This Operation, the parameter PARALLEL_DEGREE_POLICY was set to AUTO.
refer - http://docs.oracle.com/cd/E11882_01/server.112/e40402/initparams181.htm
Now the client was asked to run the batch process again and BOOM ! suddenly there are 800 plus parallel threads combined, running on both instances of the database.
1st Round of Performance Tuning -
we suspected the sequences and ITL waits to be causing the issue. So they were increased in cache value and initrans respectively.
2. Round of Performance Tuning -
The only waits observed this time were parallel processing waits. Still the process was taking more time than production.
Why a 4 year old box is giving better performance than Exadata
The answer was quite astonishing, It just had a cpu_count of 10.
Wow ! so you ask how a 10 CPU machine can finish jobs in less than the current generation Exadata ?
The answer was that It could just not afford to run queries in excessive parallel threads.
Lets simulate something similar on Exadata and see what happens
Step 1 - remove IO calibration statistics. How ?
delete from resource_io_calibrate$
Step 2 - change init parameters responsible for parallelism
parallel_max_servers set to 400 - original value 970
parallel_servers_target set to 200 - original value 576.
parallel_degree_policy left to AUTO.
The Batch Processing was run again and here is a sample of time comparison
Process Production Time Exadata Time
pre-process1 2hr 33 min 40 min
lp 5hr 26 min 1hr 14 min
Note - The stark performance improvement above was without doing any kind of query tuning/ indexing etc.
So Who is the main culprit ? Exadata ? Parallelism ?
The answer is neither Exadata nor Parallelism. Its the excessive parallelism caused by queries with just parallel hint (without any degree limit) fired on Tables with parallel enabled when combined with AUTO DOP feature of Exadata and a large CPU Count.
Think of it as Mentos popped in Cola bottles !
Situation - Client brings in a new Exadata Engineered System Box and based on the POC conducted is expecting a leap in batch processing performance. It's a x5-2 one eight rack with 2 node RAC.
Reality - Batch Processes go into endless loop taking more time then their current production box.
How we came to such a performance dip on Exadata -
In the initial rounds of testing when generating explain plans for certain queries, we came to know that the AUTO DOP feature was not being used by the optimizer.
How - You get the following line at the end of explain plan output
automatic DOP: skipped because of IO calibrate statistics are missing
On referring Oracle Support, the doc id 1269321.1 highlights the steps to take to generate calibration statistics.
The same were generated for diskgroup data, by finding its underlying disk count through v$asm_disk view.
Post This Operation, the parameter PARALLEL_DEGREE_POLICY was set to AUTO.
refer - http://docs.oracle.com/cd/E11882_01/server.112/e40402/initparams181.htm
Now the client was asked to run the batch process again and BOOM ! suddenly there are 800 plus parallel threads combined, running on both instances of the database.
1st Round of Performance Tuning -
we suspected the sequences and ITL waits to be causing the issue. So they were increased in cache value and initrans respectively.
2. Round of Performance Tuning -
The only waits observed this time were parallel processing waits. Still the process was taking more time than production.
Why a 4 year old box is giving better performance than Exadata
The answer was quite astonishing, It just had a cpu_count of 10.
Wow ! so you ask how a 10 CPU machine can finish jobs in less than the current generation Exadata ?
The answer was that It could just not afford to run queries in excessive parallel threads.
Lets simulate something similar on Exadata and see what happens
Step 1 - remove IO calibration statistics. How ?
delete from resource_io_calibrate$
Step 2 - change init parameters responsible for parallelism
parallel_max_servers set to 400 - original value 970
parallel_servers_target set to 200 - original value 576.
parallel_degree_policy left to AUTO.
The Batch Processing was run again and here is a sample of time comparison
Process Production Time Exadata Time
pre-process1 2hr 33 min 40 min
lp 5hr 26 min 1hr 14 min
Note - The stark performance improvement above was without doing any kind of query tuning/ indexing etc.
So Who is the main culprit ? Exadata ? Parallelism ?
The answer is neither Exadata nor Parallelism. Its the excessive parallelism caused by queries with just parallel hint (without any degree limit) fired on Tables with parallel enabled when combined with AUTO DOP feature of Exadata and a large CPU Count.
Think of it as Mentos popped in Cola bottles !