Oracle Performance Tuning — Trust but Verify

During an Oracle performance tuning engagement, I learned a valuable lesson about validating the information you are given during a database evaluation.

About ten years ago, a client called me with a serious problem. They had just completed development on a new Oracle application at a large pharmaceutical company and the new system had been in production for a week or two.

They could not understand why performance had degraded so rapidly during that period, and the system was almost unusable by the time I became involved.

The developers had been working in the development environment with similar data volumes on a system that was a mirror of production, and yet the performance on the production machine was very poor and was getting worse by the day. They called us in to help them find out what was wrong.

Of course, one of the first things that I asked about was the gathering of database statistics. Stale statistics can cause very poor Oracle query performance and the fact that the performance was getting worse daily was a pretty good indication of bad statistics.

I asked the DBA whether statistics were up to date and he assured me that they were. He said that he ran a job daily to collect statistics and that he had confirmed that it was running.

With that assurance, I went about seeking other causes for the problems. I checked the SGA size, index usage, and other factors that can lead to poor Oracle database performance.

After some time, I decided to check the stats myself because things just were not adding up. By looking in the Oracle data dictionary, I found that statistics had never been gathered on one of the key tables!
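Checking this yourself takes only a quick query against the data dictionary. A sketch (the schema name is illustrative):

```sql
-- When were statistics last gathered? A NULL last_analyzed means never.
SELECT table_name,
       num_rows,
       last_analyzed
FROM   dba_tables
WHERE  owner = 'APP_OWNER'          -- illustrative schema name
ORDER  BY last_analyzed NULLS FIRST;
```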

With this information, I asked the DBA again to confirm that stats were being collected. He confirmed that they were, and even showed me the log file of the job that runs each night to collect statistics. This was before Oracle had the dbms_stats package, and he was using the Analyze command to collect statistics.

He was correct: there was an Analyze command running every night. However, the command he was running included the ‘validate structure’ option. When this option is used, the Analyze command does just that: it validates the structure of the object, but does not collect optimizer statistics.
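The difference is easy to see side by side; the table name below is illustrative:

```sql
-- Validates the physical structure of the table only;
-- gathers NO optimizer statistics:
ANALYZE TABLE app_owner.orders VALIDATE STRUCTURE;

-- The old way to actually gather optimizer statistics:
ANALYZE TABLE app_owner.orders COMPUTE STATISTICS;

-- On any recent release, dbms_stats is the supported method:
BEGIN
  dbms_stats.gather_table_stats(ownname => 'APP_OWNER', tabname => 'ORDERS');
END;
/
```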

We removed the ‘validate structure’ option, reran the job to gather statistics, and the system performance returned to the expected level.

So the DBA answered in good faith that the stats were being collected because he thought they were. But they were not because he was using the wrong command.

This story illustrates an important point: when doing a database performance review or a database security audit, even though you trust the people responsible for the system, you must verify the results yourself.

Oracle Performance Tuning: Periodically reorganize database objects

One of our Oracle consultants shared a story with me today that illustrates the importance of periodically reorganizing objects in an Oracle database.

He was examining a performance problem with a very small Oracle table. The table had only 154 rows! Hardly a large table, and yet a full table scan was taking over a minute.

After some investigation, he found that the table had previously contained over a million rows. Those rows had been deleted, but the table was never reorganized to release the unused space. The table was taking up over 130,000 blocks of space for 154 rows.
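The data dictionary makes this kind of bloat easy to spot, because a full scan reads every block up to the table's high-water mark, no matter how few rows remain. A sketch (schema and table names are illustrative):

```sql
-- A tiny num_rows alongside a huge blocks count suggests unreclaimed space
SELECT table_name,
       num_rows,
       blocks            -- blocks below the high-water mark
FROM   dba_tables
WHERE  owner = 'APP_OWNER'
AND    table_name = 'SMALL_TABLE';
```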

Not only was this a great deal of wasted space that could have been used for something else, it also caused a serious performance problem.

He copied the table to a temporary table, truncated the original, and then re-inserted the data to reorganize the table, and the performance problem was solved.

There are a few methods that can be used to reorganize an Oracle table. These include:

  1. Using ‘Create table as select * from …’, truncating the table, and then re-inserting from the copy, as was done in this case.
  2. Using export/import to dump the table out to an external file and then bring it back in.
  3. Unloading the data to a flat file and using SQL*Loader to bring it back in.
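Method 1 might be sketched as follows, assuming the application can be kept off the table while it runs (table names are illustrative):

```sql
-- 1. Preserve the data
CREATE TABLE small_table_copy AS SELECT * FROM small_table;

-- 2. Truncate rather than delete: truncate resets the high-water mark
TRUNCATE TABLE small_table;

-- 3. Re-insert and clean up
INSERT INTO small_table SELECT * FROM small_table_copy;
COMMIT;
DROP TABLE small_table_copy;
```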

Each of these options has its pros and cons and we will explore these in more detail in an upcoming white paper.

The important thing to remember is that no matter how you reorganize your database objects, best practice is to be sure that this is done on a periodic basis to avoid space and performance problems.

Oracle Performance Tuning — Log Sync Waits

I recently worked with a client to resolve a significant performance issue related to painfully slow queries in an Oracle database. One of the queries that they wished to execute was taking over five hours to run.

This is an Oracle 10g database running on a Solaris platform. The server has 8 processors, so there is plenty of processing power. The database is very large, about 10 terabytes, with a very high insertion rate. Most of the data is inserted into one large table that is partitioned by date, with a few partitions per day. Therefore the query, which specifies a small date range, should only need to look in one or two partitions, so queries should be relatively fast.

Before I became involved, the in-house DBA had looked at a number of the typical issues. He confirmed that there were appropriate indexes on the table and that the query was expected to access only one partition. The cause of the long execution time was not clear.

When I became involved, I was given limited access to the server. I was not given DBA access but I did have select access to the catalog tables so I was able to poke around a bit.

I also received execute privilege to dbms_workload_repository so I could create AWR snapshots and reports. Fortunately, the client had set up hourly snapshots so I was able to select snapshots before and after the query ran. The resulting report is where I found the clue to one cause of the problem.
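With that privilege, taking snapshots and generating a report looks roughly like this (the dbid and snapshot IDs are illustrative):

```sql
-- Take an on-demand snapshot
EXEC dbms_workload_repository.create_snapshot;

-- Produce a text report between two existing snapshots
SELECT output
FROM   TABLE(dbms_workload_repository.awr_report_text(
               l_dbid     => 1234567890,   -- from v$database.dbid
               l_inst_num => 1,
               l_bid      => 100,          -- begin snapshot id
               l_eid      => 109));        -- end snapshot id
```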

I ran a report for a nine-hour period during which the query executed.

The top wait event in the AWR report for the day was log file sync. These waits were very high: 1,104,069 waits with a total wait time of 72,781 seconds.

There are a number of causes of log file wait events. Some of these are described in this excellent post.

However, in this case, the cause was something else.

Because of a similar situation I had seen in the past with very high insertion rates, I thought that frequent log file switches might be the cause. I checked v$loghist and found that the logs were switching every five seconds! A common rule of thumb is one log switch every twenty minutes or so, so this was clearly a problem.
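One quick way to see the switch rate is to count switches per hour from the log history. A sketch (on recent releases, v$log_history serves the same purpose):

```sql
-- Log switches per hour: dozens per hour indicates undersized logs
SELECT TO_CHAR(first_time, 'YYYY-MM-DD HH24') AS hour,
       COUNT(*)                               AS switches
FROM   v$log_history
GROUP  BY TO_CHAR(first_time, 'YYYY-MM-DD HH24')
ORDER  BY hour;
```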

The log size was set at 131 MB, which was fine in the past when the insertion rates were lower, but as the rates grew, the logs were never resized to handle the higher volume.

We first increased the logs to 1 GB, and the log switches dropped to every 2-3 minutes. Much better, but still far too frequent. We increased them again to 4 GB, and now the logs switch every 30-40 minutes. This is longer than it needs to be, and we will now reduce the size a bit in order to reduce the recovery time in the event of instance failure.
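Redo logs cannot be resized in place, so the change is made by adding larger groups and then dropping the old ones once they are inactive. A sketch (group numbers and file paths are illustrative):

```sql
-- Add new, larger groups
ALTER DATABASE ADD LOGFILE GROUP 4 ('/u01/oradata/PROD/redo04.log') SIZE 4G;
ALTER DATABASE ADD LOGFILE GROUP 5 ('/u01/oradata/PROD/redo05.log') SIZE 4G;
ALTER DATABASE ADD LOGFILE GROUP 6 ('/u01/oradata/PROD/redo06.log') SIZE 4G;

-- Switch out of the old groups, then drop each once its status is INACTIVE
ALTER SYSTEM SWITCH LOGFILE;
ALTER DATABASE DROP LOGFILE GROUP 1;
```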

After this change, log file sync disappeared from the top wait events and system performance has improved significantly. The query that was taking 4-5 hours now takes about 3 minutes.

I cannot be certain that insufficient log file size was the only cause of the performance problems, because other steps were taken simultaneously, including refreshing statistics, which can have a significant impact as well.

However, I am certain that the insufficient log size was the cause of the log file sync waits. And with a total of 72,781 seconds of wait time over a nine-hour period (about 2.25 hours of wait per processor, if distributed evenly), it very likely had a significant impact on overall system performance as well.

Five reasons why I avoid database query hints

I know that some will disagree with me on this but I have never been a fan of database query hints.

Most database vendors give you the ability to override the query optimizer with hints that you specify in a query. Each query can be executed in a variety of ways by the database engine. Normally, the optimizer evaluates the statistics about the data in each of the tables involved in the query and makes a decision about the best way to execute that query. Hints enable you to direct the database engine to use a specific approach, even if it prefers a different one.
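In Oracle, for example, a hint is embedded in a comment within the statement itself (the table and index names here are illustrative):

```sql
-- Left alone, the optimizer chooses the access path:
SELECT ename FROM emp WHERE deptno = 10;

-- The INDEX hint forces the use of a specific index, whether or not
-- it remains the best choice as the data changes:
SELECT /*+ INDEX(e emp_deptno_idx) */ ename
FROM   emp e
WHERE  e.deptno = 10;
```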

Here are five reasons that I avoid hints.

Reason Number 1: I view hints as a workaround for a problem that should be solved in other ways. A few underlying problems may lead to the need for a hint:

  1. The database statistics may not be up to date, which prevents the optimizer from making the best choices. The obvious solution is to ensure that all appropriate statistics are collected frequently.
  2. The database or application design may be at fault. If the query optimizer cannot choose a path that executes the query quickly enough, we may need to look at the use of indexes or even the data model itself. Does the data model actually represent the true relationships in the data, or is the model causing the application to do more work than it should?
  3. All major database vendors upgrade and improve their optimizers over time. For example, Oracle’s optimizer has improved very significantly over the past few major releases. If you are not on a recent release of your database software, upgrade! When you do, rip out your hints and see how the new optimizer does; you may be surprised.
Reason Number 2: Database vendors change their optimizers with new releases. Hints that have good results in one release can have poor results or even cause a query to break in later releases.

Reason Number 3: As data volumes grow and statistics are updated in the database, hints prevent the optimizer from selecting new and better execution paths as the characteristics of the data change.

Reason Number 4: As much as we would like to think that we are smarter than Microsoft or Oracle, most of us have not put millions of dollars and many years into query optimization technology as they have. Since we are paying for this through our license fees, we should take advantage of that investment.

Reason Number 5: Now, the reason that led me to write this post. This excellent blog post describes situations where a hint can cause a query to succeed when one set of criteria is given and to fail when other criteria are given. This type of problem is very difficult to diagnose because it is sporadic and depends on the specific criteria. The problem occurs on SQL Server, and in that environment this issue alone is enough to rule out the use of query hints.

Bottom line: use query hints only as a last resort. Getting to and fixing the root cause of the problem will ensure that the query will continue to perform well into the future.