Total System Failure: Why you should validate your database backup

Total System Failure: Why you should validate your database backup

Every now and then things happen that make us realize how important double checking things can be.

I had a recent experience in which four unrelated parties (including me) did the wrong thing regarding one business transaction. If any one of us had done the right thing, there would have been no problem. But all four parties dropped the ball, causing potentially serious problem. This is why it is important to leave nothing to chance. Here is the story.

A few months ago, our health insurance company asked us to send it a report detailing who was on our payroll last year. The following sequence of mistakes turned this simple request into potentially serious problem for our business.

Mistake number 1: I put that request that I received from the insurance company to the side and did not send the report to them by the due date. This would normally be OK, because the insurance company sends out a certified letter in the event that they do not receive the report in time.

But…

Mistake number 2: We have a mailing address and a different physical address. For some inexplicable reason, the insurance company sent out the certified letter to our physical address instead of our mailing address, where they send all other correspondence, and where we would expect important business mail to arrive. This would normally be OK because if we receive mail to our physical address, it will eventually get to our bookkeeper and be addressed.

But…

Mistake number 3: The US Postal service delivered the certified letter notices to wrong suite They delivered it to the neighboring business instead of to us. This happens occasionally. I dont understand why this happens but since most of our mail comes to our mailing address, and our neighbor usually brings our mail over when this happens, it is normally not a problem.

But…

Mistake number 4: For some reason that puzzles me most of all, the neighboring business never bothered to tell me or the mailman that they received two certified letter notices for me over a month ago! While I acknowledge that this party had the least obligation in the transaction, I am shocked at the lack of consideration that they showed.

So, what does all this have to do with your Database?

All four parties involved in the above scenario had a well defined job to do, or at least a well understood and expected course of action. All are trusted parties that usually do what is expected of them. We had every reason to believe that all four parties involved would perform their jobs properly. And yet not one of the parties involved did the right thing.

The same can be true of your disaster recovery processes. Just because you have specified what should happen with your database backup, and who should do it, does not mean it us actually being done!

As I mentioned in an earlier post, when we examine a client’s disaster recovery posture, we find in a large percentage of systems that the backups are not working the way the client thinks they are, and in many cases, they are not working at all!

So be sure to have an independent party validate that you are really as protected as you think you are.

Important Steps to take to validate database backups.

  1. Validate Database Backup Schedules.
  2. Validate that ALL required databases and Schemas are being backed up
  3. Validate that ALL database stored procedures, etc are being backed up
  4. Validate that all external files (such as Oracle control and redo log files and SQL Server transaction logs) are included properly in the backup.
  5. Validate that the database files are placed into a backup-safe state (eg, Hot Backups in Oracle) prior to backing them up.
  6. Perform Test Restores using actual backup files from the nightly production backup routine to ensure that the backups are usable in the event of a disaster.

Bottom line: Validate that your backups really do what you think they are doing, and that they will enable you to recover fully when the inevitable system failure occurs.

Total System Failure: Why you should validate your database backup

Database Downtime: Prepare For The Unexpected

Test your Assumptions: Database Backup and Recovery

Every now and then something happens that really drives home the need to test and validate the assumptions that we have about the availability of our IT assets. I had an experience this week that brought to mind the importance of testing backup and recovery and business continuity plans.

Planning

At the beginning of each week, I look at each day’s activities and any travel that I need to do that week, and I plan out how I can be most productive each day. I find myself on the train often now between our offices in New Jersey and Boston and I have come to take my wifi service for granted.  I rarely have down time when traveling that way any longer.

Last month, while traveling to San Antonio by air, I was able to use wifi in the air, just as I can on the ground on the train.

Then last week, while planning a trip by air to Austin from Philadelphia,  I decided to make good use of the flight time. I planned to use the roughly four hours of flight time to get some work done that I will need for the next day.

Assuming

After I boarded the flight however, I found out that a huge assumption that I made was not correct. I found that not all airlines have wifi!

So now as I sit on the plane writing this post into a word document,  I am completely disconnected from the web, from my office, from my clients!

The problem here is not that I am not connected for a few hours,  it is that I did not anticipate that this might happen, and so I planned to use that time to get some important work done.  I assumed that the flight had wifi, and I did not validate that assumption!

Think about what will happen if you (or your customers) don’t have access to your servers for a few hours.  It can be that the connectivity to the servers was interrupted, as in my case,  or that the servers are completely down, or that your database software is hanging.  Ask yourself what will happen during those hours, and what you can do to avoid them in the first place.

Validating

Validating your assumptions is key to productivity. In this case, it is only one person whose productivity is compromised today, but consider the cost if your whole company is down for a few hours. What are you taking for granted?

So what does this have to do with the database?

This is a database blog, so you might be asking what this has to do with the database.  In the database business, we see people fail to validate assumptions all the time. A typical (and very dangerous) assumption people make is that their database is properly backed up and ready to restore in a disaster. As I describe in this blog post, this is rarely the case.  This is one of the most important assumptions for any company to validate.

If you haven’t tested your backup procedures lately we can help you validate that your database is indeed protected the way you assume it is.

Query Rewrite: What is MySQL Doing To My Queries?

The Impact of Query Rewrite on MySQL Query Optimization

When processing a query, every relational database vendor takes a number of similar steps to parse, validate, and optimize a query. MySQL does the following:

  1. Validates the query syntax — does it use valid operators, are the clauses in the proper order, etc.
  2. Validates the query semantics — are the the objects (tables, views, columns) valid database objects in the database, etc.
  3. Transforms the query — rewrites the query so the optimizer can more easily choose the best path.
  4. Optimizes the query — determines the best execution path based on a large number of factors including statistics about the table and index data.

When analyzing a database performance problem, it helps to have an understanding of what is happening during step 3 and 4 above.

Occasionally I find a blog article that does a great job of explaining one these processes for a particular database vendor and I would like to share one that I found today. This excellent post was written by Guilhem Bichot and does a great job of illustrating the transformation (query rewrite) step and the impact of this step on the execution path selection.

This post describes the process in MySQL but the principles are similar for any database vendor.

Note that the example that Guilhem uses shows how the database itself may introduce a query hint while rewriting the query. I have mentioned in the past that I avoid hints because over time the usefulness and necessity of the hint may change and it may indeed hurt you. Of course, since the query rewrite process is executed in real time it is not subject to this problem.

Advantages and Disadvantages of Disk-Based Backup and Recovery

Advantages and Disadvantages of Disk-Based Backup and Recovery

Data Backup Advantages & Disadvantages

While tape-based backup approaches may still be optimal for long-term data retention, more and more database storage/backup administrators are moving from tape to disk technology for everyday data backup and data recovery. This is not surprising as disk is faster to write to and faster to recover from, as well as more reliable overall.

A further advantage of disk backup is the ability to perform deduplication on the data being backed up, which reduces both backup/recovery times and – perhaps most importantly – data volumes. Deduplication usually takes place during data transfer to disk. Some tools ship with the disk hardware and others with data backup software.

Many products are currently available that offer deduplication at the backup server level, which reduces network traffic between the backup server and the backup target but not between the database server and the backup server. Some products still perform deduplication post-transfer, at the backup target. While slower overall, this approach eliminates the chance that CPU-intensive deduplication processes will create a bottleneck between the backup server and the secondary storage target.

Emerging technology enables deduplication at the source (that is, on the server that hosts the database).  Ultimately, deduplication will likely take place as a function of primary storage rather than as a backup/archiving function.

Like many new technologies before them, deduplication and disk-based backup/recovery fill a long-standing need for improved speed and reduced cost. But many DBAs remain skeptical, and debate exists about how best to use these solutions and their impact on data protection and the rest of the IT environment.

Data Recovery Advantages & Disadvantages

The problem many DBAs have with disk-based backup is that it is not specific to the database. Unlike traditional logical/physical, file-based backups performed with database tools, with disk-based backups the database is not “aware” that a backup is occurring, so its state at the time of the backup may be unknown. Therefore, it is important to test and confirm that your databases can recover and come back up after being backed up in this new way.

While cutting-edge disk-based recovery approaches can restore gigabytes of data in seconds, they generally can’t restore a single database table. In other words, disk-based backup doesn’t (yet) replace the logical backup methods that DBAs have used for years.

At least one product family, EMC RecoverPoint, addresses the need for logical backups by enabling “any-point-in-time recovery to protect data against loss or corruption.” RecoverPoint’s features include policy-based management. Data files can be isolated in logical groupings within the backup, to facilitate recovery of specific data.

A holistic approach to protecting your data can include both logical backups at the database level, combined with disk-based backups of your raw data.

For more information on what disk-based backup and deduplication approach is right for your business, contact Buda Consulting.

Five Leading Causes of Oracle Database Performance Problems

Five Leading Causes of Oracle Database Performance Problems

As we evaluate our clients’ Oracle databases and help them solve problems relating to performance and stability, we see many causes for these issues.

However, there are a small number of basic issues that tend to cause problems on the majority of systems.

Our Tech Tips document describes these issues and discusses methods of identifying them and resolving them.

Topics include:

  • Statistics Gathering
  • Sga Sizing
  • Redo Log Sizing
  • Index Usage
  • Extent Management