Database architecture is critical to achieving many business objectives. These include application performance, business continuity, security, and geographic distribution. This article will explore another often overlooked objective that can be influenced by the database architecture: backup and recovery efficiency.
Very large databases are common in today’s business environment. Multi-terabyte databases are prevalent in all but the smallest organizations. Despite these large sizes, we tend to find that data that pertains to earlier time periods tends not to change, and tends to be used less frequently than more current data. When architecting the database, the age of the data, and therefore the possibility of data changing can be used as a factor in the physical database design that can improve backup and recovery efficiency and performance.
It is a given that all data must be backed up. But taking that backup puts load on the database that impacts application performance during the time that the backup is running. Common approaches to mitigating this impact include backing up from a standby database rather than the production database, taking offline database backups when the application is not available, and restricting the backup time to non-peak times so the machine resource usage is minimized.
However, in some environments, those options are not available. The first option, backup up from a standby database, may not be an option of you don’t have a high availability environment. Bringing the database down is not an option in a 24×7 production environment. And there are many databases that are so large that the time it takes to back up the entire database is simply too long and exceeds the non-peak times for the application.
Partitioning and Archiving
Another technique that may be used is to build partitioning and archiving into the database architecture. We can partition the data into physically separate tablespaces, and place each partition into a separate tablespace. This allows us to isolate data from past time periods that are kept for archiving purposes but are not frequently queried and are never going to be updated. These separate tablespaces can be backed up when the data reaches the point that it will not be changed again, and then it can be excluded from the normal backup routine. In many databases, older data represents a very large portion of the overall database, so such a scheme can significantly reduce the backup time, thereby significantly reducing the impact on the application.
There are a number of ways to exclude tablespaces from the backup after they have reached the point where they will not be updated again, including:
- Making the tablespaces readonly, and configuring Backup Optimization in Oracle RMAN. After this, RMAN will backup the the tablespace enough times to satisfy the retention policy and then will exclude them on subsequent backups.
- Using the RMAN command CONFIGURE EXCLUDE FOR TABLESPACE command. Once configured, the specified tablespace will be excluded from future backups. These tablespaces can be manually included explicitly in other backup sets to ensure that the data is backed up but they can be excluded from full backups.
Here is an example of how we might use this: lets say that we have an Oracle Database Application that collects traffic sensor data. Each day we collect a large set of data from traffic sensors from municipalities around the country. We have very large tables that contain hundreds of datapoints per sensor. Each table contains hundreds of gigabytes of data stretching back 10 years. The tables are partitioned so that a new partition is created for each month, and as the data is collected, it is automatically placed into the proper partition. At the beginning of each year, we can take a single backup of all the tablespaces that hold the data from the prior year. We know that data will never change, so we do not have to include those tablespaces in future backups. We can set these tablespaces as readonly, and then with backup optimization turned on, RMAN will then exclude them from subsequent backups, but will still enforce the backup retention policy so you wont lose backup sets that are necessary to restore those tablespaces. An added benefit is that the backup set each week will be significantly smaller thereby reducing disk requirements for the ongoing backup sets.
In addition to significantly reduced backup time, partitioning the data in this way also improves the efficiency of the restore process because if one partition fails, the others do not need to be restored. This can result in significant time savings during a restore.
There are other benefits to partitioning your data beyond the improvements to the backup and restore process. By separating older data which typically does not change, and is accessed less frequently, from the newer data, we have the ability to place the older data on less costly media. And regardless of the media type, there are performance benefits to separating data onto different drives/controllers (particularly useful when using separate storage arrays as opposed to SAN environments).
When architecting database, think about what the impact of the backup and recovery process will look like after 10 years. Architecting the backup and restore efficiency into the database design at that time will save lots of redesign later on.
If you are struggling with cumbersome backup and restore processes or are about to do a database design or redesign, please give us a call at (888) 809-4803 x 700 and if you have further thoughts on the topic, please add comments!
If you enjoyed this article please like and share!