Donuts and Data Backups

The year was 1982. I was a Computer Science student by day and a baker by night (well, early morning). My boss Al at the Tiffany’s Bakery in the Staten Island Mall had asked me to help him figure out how much each of his products actually cost him, so he could understand which were the most profitable.

With my TRS-80 Model II computer and its 8-inch floppy disk drive, I got to work developing a program to provide the information Al needed.

It was a pretty cool program—the user would enter all the suppliers and the prices for the ingredients they supply, and the program would calculate an average price for a pound of flour, etc., across all suppliers.

The user would also enter the recipe and the yield for all products; i.e., how much sugar, flour, etc. went into the cake recipe and how many cakes the recipe yielded.

Out would pop the cost of each cake, Danish, cupcake and donut that the bakery sold.

It was a great little program built (I think) in Pascal. This was before database management systems like Oracle or SQL Server, even before dBASE and R:BASE, so I built my own database into the application.

I was so proud of my creation. Then the day came for me to demonstrate the product to the boss. I still remember vividly how the night before I was working feverishly in my mother’s basement on a few last-minute touchups, getting everything ready for the big reveal.

But then…

I accidentally pressed Delete instead of Save. Sheer panic! That moment is seared into my memory.

I had no backup. I don’t even know if backup was “a thing” at the time. I didn’t even have an old copy of the program saved under a different name anywhere on the floppy disk. My program was gone, the whole thing! Weeks, maybe months, of hard work disappeared in an instant.

Worse yet, I had already missed the first, and maybe even the second, deadline. Al had been very patient, but I had promised him it was really done this time, and now I had nothing to give him!

A wise friend once told me that nothing is ever as good or as bad as it seems. That was true of this disaster. Al was very understanding, and though it took many more hours of my time than I would have liked, I was able to rebuild the application, probably better than it was before. And it turned out to be very valuable to the bakery.

But I would not wish that feeling of dread on anyone. Ever since then, it has been my passion to make sure that everyone is protected against losing the applications or the data they spent their precious time creating.

So back up your work, double-check your backups, and test them on a regular basis.

Then go have a cup of coffee and a donut and think of this story with a smile knowing you are safe.

Why Your Database Management Team Should Regularly Double-Check Your Backups

I’ve blogged before about the importance of checking database backups. Over 90% of new clients that we assess have backups that are either incomplete or totally unusable (true statistic!).

The following story of a recent backup failure (and the successful double-check by our DBA, Scott) provides a noteworthy example of how bad backups can happen even when you think your process is working.

Recently we had a client that was looking to reduce storage costs for their Oracle RDBMS while still meeting a legally mandated seven-year retention requirement. They were also looking to leverage AWS S3 and Glacier.

The majority of their Oracle data resided in a single history table partitioned on date, and this data was rarely, if ever, accessed once it was more than one year old. Thus S3 and Glacier were a perfect fit for this use case.

It was decided that data would be retained in the RDBMS until it was three years old. After that, the data would be exported via Oracle Data Pump and compressed via the Linux zip utility. (A side note: in case you’re wondering why we didn’t just use Data Pump’s native compression functionality, in our testing it yielded a compression ratio about 20% lower than the Linux zip utility.)

Scott set about finding a way to automate the process, using Oracle Data Pump to export partitions starting with the oldest partitions first. To get started, he did what any good DBA does: he scoured the internet and came up with some great example code posted by Michael Dinh to use as a starting point.

The process we decided to use was to export the identified partition, check the job state returned by dbms_datapump.wait_for_job to ensure the export completed successfully, and then drop the partition from the database.
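
Here is a simplified PL/SQL sketch of that flow, loosely modeled on that kind of example code. The schema, table, partition, and directory names (APP.SALES_HISTORY, P2015, DUMP_DIR) are hypothetical placeholders, and the real script handled looping over partitions, file naming, and error reporting far more thoroughly:

    DECLARE
      h         NUMBER;
      job_state VARCHAR2(30);
    BEGIN
      -- Create a table-mode Data Pump export job
      h := DBMS_DATAPUMP.OPEN(operation => 'EXPORT', job_mode => 'TABLE');

      -- Write the dump file and log file to the DUMP_DIR directory object
      DBMS_DATAPUMP.ADD_FILE(h, 'sales_history_p2015.dmp', 'DUMP_DIR',
                             filetype => DBMS_DATAPUMP.KU$_FILE_TYPE_DUMP_FILE);
      DBMS_DATAPUMP.ADD_FILE(h, 'sales_history_p2015.log', 'DUMP_DIR',
                             filetype => DBMS_DATAPUMP.KU$_FILE_TYPE_LOG_FILE);

      -- Limit the job to a single table and a single partition
      DBMS_DATAPUMP.METADATA_FILTER(h, 'SCHEMA_EXPR', '= ''APP''');
      DBMS_DATAPUMP.METADATA_FILTER(h, 'NAME_EXPR',   '= ''SALES_HISTORY''');
      DBMS_DATAPUMP.DATA_FILTER(h, 'PARTITION_LIST', '''P2015''',
                                'SALES_HISTORY', 'APP');

      DBMS_DATAPUMP.START_JOB(h);
      DBMS_DATAPUMP.WAIT_FOR_JOB(h, job_state);  -- blocks until the job finishes

      -- Only prune the partition if Data Pump reports the job completed
      IF job_state = 'COMPLETED' THEN
        EXECUTE IMMEDIATE 'ALTER TABLE app.sales_history DROP PARTITION p2015';
      ELSE
        RAISE_APPLICATION_ERROR(-20001, 'Export ended in state ' || job_state);
      END IF;
    END;
    /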

After many modifications to the example code, it was time to test. Scott tested what would happen if everything went well. He also tested what would happen:

    • If the utility tried to export to a missing OS directory
    • If the directory was full
    • If the directory was read-only
    • If the export dump file was removed during the export process
    • If the export process was killed while the export job was running

The testing went smoothly: in each failure scenario, dbms_datapump.wait_for_job returned a state other than COMPLETED. Since the process would only drop the partition when the returned state was COMPLETED, it appeared we were ready to put this process into production use.

What we did not account for was the possibility that an Oracle bug would somehow cause dbms_datapump to fail to export the table partition rows but still return a COMPLETED code to the calling process—which is exactly what was about to happen.

The routine ran perfectly for a few weeks. Fortunately, Scott continued to closely monitor the job runs. Then he noticed that all of the export files had suddenly started to report the exact same, suspiciously small size.

After checking the process, we found the issue and opened a ticket with Oracle support. They informed us that they believed an Oracle bug was to blame and recommended we upgrade the database to a later release.

No reason was ever found for why this suddenly started to happen after weeks of continuous use. We did, however, learn an important lesson: when it comes to dropping table partitions after a Data Pump export, never rely solely on the dbms_datapump.wait_for_job return state. Always take the extra step of interrogating the export log file for the number of rows exported and for the "successfully completed" message.
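
A check along those lines can be scripted directly in PL/SQL. The following is only a sketch, using the same hypothetical names as above (DUMP_DIR and sales_history_p2015.log); a production version would also parse the actual row count out of the ". . exported" line and sanity-check it against the partition's expected row count:

    DECLARE
      f          UTL_FILE.FILE_TYPE;
      line       VARCHAR2(32767);
      rows_seen  BOOLEAN := FALSE;
      completed  BOOLEAN := FALSE;
    BEGIN
      f := UTL_FILE.FOPEN('DUMP_DIR', 'sales_history_p2015.log', 'r', 32767);
      BEGIN
        LOOP
          UTL_FILE.GET_LINE(f, line);
          IF line LIKE '%. . exported%rows%' THEN
            rows_seen := TRUE;          -- the partition's rows were actually written
          ELSIF line LIKE '%successfully completed%' THEN
            completed := TRUE;          -- the job-level success message is present
          END IF;
        END LOOP;
      EXCEPTION
        WHEN NO_DATA_FOUND THEN NULL;   -- reached the end of the log file
      END;
      UTL_FILE.FCLOSE(f);

      IF rows_seen AND completed THEN
        DBMS_OUTPUT.PUT_LINE('Log check passed; safe to drop the partition.');
      ELSE
        RAISE_APPLICATION_ERROR(-20002, 'Log check failed; do NOT drop the partition.');
      END IF;
    END;
    /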

In addition, the fact that we had a good, current RMAN backup of the database, one that still contained the data from the dropped partitions, made this a learning experience instead of a tragedy. Nevertheless, the experience illustrates the importance of frequently checking your exports.

By checking that the exports were good, Scott was able to ensure that we retained the most recent backup taken prior to the pruning, so that we could recover the data in those partitions if we ever needed to.

This is an extreme case because Oracle actually returned an incorrect success code. In most of the cases we see, the problem is one of the following:

    • The backups were never configured to run at all, or
    • The backups were configured to exclude part of the database, or
    • The backup files were relocated to new volumes that were not included in the operating system backup process, or
    • An operating system backup was taken without informing the DBMS that it was happening, rendering the backup unusable.

Whatever the reasons, they are very hard to predict. The only safeguard is to check weekly (or daily, in some mission-critical cases) to make sure that what you think is happening really is.
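
For Oracle databases backed up with RMAN, for example, a quick weekly query against the backup views can confirm that jobs are actually running and completing. A simple sketch (thresholds and retention expectations will differ by environment):

    -- List the past week of RMAN backup jobs. A FAILED status is an obvious
    -- problem, but so is a day with no row at all for a backup you expected.
    SELECT start_time,
           end_time,
           input_type,   -- e.g. DB FULL, DB INCR, ARCHIVELOG
           status,       -- COMPLETED, FAILED, COMPLETED WITH ERRORS, ...
           ROUND(output_bytes / 1024 / 1024 / 1024, 1) AS output_gb
    FROM   v$rman_backup_job_details
    WHERE  start_time > SYSDATE - 7
    ORDER  BY start_time DESC;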

If you don’t double-check your backups, the false sense of security you have today will end in sudden panic when you need to restore the database and can’t.

Why You Should Not Thin Provision Your Production Storage

Virtualization brought with it a number of other powerful technologies, one of which is thin provisioning for storage. Thin provisioning offers significant value by allowing an administrator to present a server with a large supply of storage while actually allocating only what is currently needed.

Thin provisioning is a good option in a development environment or other scenario where only test data resides on the virtual server. But it has real disadvantages in production environments because of the potential for downtime and data loss.

Over-Subscription

Thin provisioning causes trouble when more storage is provisioned than is available in the underlying hardware. For instance, when you thin provision, you can store a 4 TB virtual disk on a 400 GB physical volume. As long as you use less than 400 GB of space inside the virtual volume, the setup works well. But what happens when you use 401 GB?

Over-subscription is what happens when you provision more storage than is physically available. Once actual usage exceeds the physical capacity, the server starts receiving I/O errors, which can lead to irreparable damage to your virtual server and/or application. The damage is exacerbated in a production environment because it can result in partial or total data loss.

Two Ways To Remedy Over-Subscription

There are two approaches that help to remedy an over-subscription problem.

Increase Your Virtual Disk Repository

The first is to increase the over-subscribed virtual disk repository. This usually requires a SAN administrator to increase the repository volume in the SAN storage manager, after which the host rescans the SCSI bus so the larger volume is detected.

Create A New Virtual Disk Repository

If you cannot extend the initial volume, the second approach is to create a new virtual disk repository and migrate some of your virtual servers to this new storage.

One of our Oracle clients is using Oracle virtualization software for storage. The original DBA (before we were involved) created 30 TB of thinly provisioned vdisks, but the total size of the underlying volume on which those vdisks were provisioned was only 20 TB. Everything worked fine until the combined actual usage of the vdisks reached 20 TB, at which point I/O errors signaled that something was very wrong.

After gaining access to Oracle’s VM manager, we were able to see that the volume was oversubscribed. We corrected the problem using the second approach described above. We created a new disk repository and moved data files one at a time until everything was moved over and the space issue was resolved.

Conclusion

In summary, thin provisioning can be wonderful if managed correctly with proper forecasting. But if not managed carefully, its disadvantages can outweigh the storage-saving benefits.

If you’re using thin provisioning today, or looking for other ways to make the best use of your physical and virtual storage, contact Buda Consulting to talk over potential options and what’s right for your environment.

A Simple Way to Improve Data Security

Shrink Your Sensitive Data Footprint

Protecting data is hard. Knowing for sure that you have identified and mitigated every vulnerability takes a lot of work and constant vigilance. The more servers you have to harden and the more databases you have to protect, the more work it is, and the more likely it is that you will leave a hole somewhere. So it makes sense to reduce your sensitive data footprint as much as possible.

Data Masking

The concept of data masking has been around for a long time, but we have found that many companies don’t yet use it in their development environments. By keeping full production data sets in dev and QA environments, we needlessly expand our sensitive data footprint and add significant risk.

Think about it: our development environments are the least protected of all our systems, and they are the most exposed. Developers frequently have full access to all the data in the development environment, and they can often extract data, typically loading subsets of it into their own, even less protected, systems for local development and testing. Many companies now use cloud-based development environments as well, which sit outside their corporate firewalls and add even more risk.

This significant risk is relatively easily mitigated by masking all data in development and QA environments.

How It Works

Typically, data masking vendors provide a tool that enables the discovery of sensitive data. Discovering this data is helpful even if you don’t plan to mask it. The tools ship with default search patterns for common types of sensitive data, and you can add your own if your system contains sensitive data that is not covered by the default patterns.
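
As a rough illustration of what discovery means, the query below scans the Oracle data dictionary for column names that suggest sensitive data. It is only a crude, hand-rolled approximation; the vendors' tools go much further, for example by pattern-matching the data itself rather than just the column names:

    SELECT owner, table_name, column_name, data_type
    FROM   dba_tab_columns
    WHERE  REGEXP_LIKE(column_name, 'SSN|SOCIAL|PHONE|EMAIL|CREDIT|BIRTH|SALARY', 'i')
    ORDER  BY owner, table_name, column_name;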

After discovering the data, you can specify a pattern or function that will be used to alter or obfuscate the data as it is stored in the database (or, in the case of SQL Server, as it is retrieved from the database). For example, you can choose to randomize the digits within a phone number field to change the values but keep the formatting, so development and testing can still be done using the data.
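
To make the phone number example concrete, here is a minimal hand-rolled masking function in Oracle PL/SQL. The CUSTOMERS table and PHONE column are hypothetical, and in practice the masking tools provide this kind of function for you; the point is simply that the values change while the formatting survives:

    -- Replace every digit with a random digit, leaving separators intact,
    -- so '(609) 555-1234' keeps its shape but not its value.
    CREATE OR REPLACE FUNCTION mask_phone (p_phone IN VARCHAR2) RETURN VARCHAR2 IS
      v_masked VARCHAR2(100) := '';
      v_ch     VARCHAR2(1);
    BEGIN
      FOR i IN 1 .. NVL(LENGTH(p_phone), 0) LOOP
        v_ch := SUBSTR(p_phone, i, 1);
        IF v_ch BETWEEN '0' AND '9' THEN
          v_masked := v_masked || TO_CHAR(TRUNC(DBMS_RANDOM.VALUE(0, 10)));
        ELSE
          v_masked := v_masked || v_ch;   -- keep parentheses, dashes, spaces
        END IF;
      END LOOP;
      RETURN v_masked;
    END mask_phone;
    /

    -- Applied once to the refreshed development copy, never to production:
    UPDATE customers SET phone = mask_phone(phone);
    COMMIT;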

Data Masking Offerings by Oracle and SQL Server

Oracle has a very robust data masking solution called Oracle Data Masking and Subsetting. It is not cheap, but the cost is small compared to the costs associated with a data breach. And after you thoroughly mask the data, you no longer have to spend as much time and effort hardening the target environments, so the costs are further mitigated. It works by identifying and then masking data before it is placed into development environments, so the development environments never hold the real data.

SQL Server’s data masking offering, Dynamic Data Masking (DDM), masks data as it is retrieved by SQL queries. This is less secure because the target databases still contain the real data. Only users with the UNMASK privilege can view the real data, but that requires additional security administration in the development environments. So while this is an improvement over having no masking at all, it is not as secure as Oracle’s approach.
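
For comparison, here is what DDM looks like in SQL Server's SQL dialect, on the same hypothetical Customers.Phone column (dev_lead is a placeholder principal). The real values stay on disk; the mask is applied to query results for anyone who lacks the UNMASK permission:

    -- Mask the phone column so most of it is hidden in query results
    ALTER TABLE dbo.Customers
      ALTER COLUMN Phone ADD MASKED WITH (FUNCTION = 'partial(0,"XXX-XXX-",4)');

    -- A developer without UNMASK sees something like XXX-XXX-1234
    SELECT Phone FROM dbo.Customers;

    -- A trusted principal can be allowed to see the real data
    GRANT UNMASK TO dev_lead;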

Third Party Data Masking Offerings

There are also third-party tools for data masking, such as DataVeil, which has a free version that includes standard masking functions and a paid version that adds more advanced masking functions. DataVeil works on both Oracle and SQL Server, as well as other databases.

Just Mask It

So the bottom line is that there are good options available for masking data in your development environments, and doing so can significantly reduce your vulnerability to data leaks. With any of these tools, it takes a little work up front to discover the sensitive data and determine the proper masking functions, but once that is done, the masking can be re-applied easily each time you refresh your dev environments. It is definitely worth the effort.

If you know of any other great data masking tools please leave a comment. If you would like to chat about data masking or have any questions you can find us at www.budaconsulting.com.

Time to Validate Your Database Backup

Another New Client, Another Bad Backup

I originally wrote this post in July, but I am updating it today because it is so important. Yesterday we did an initial Oracle Database Health Check for another new client and found yet again that their backups are unusable. This time the client was taking backups regularly, but was not preparing the database first and was simply copying the database files while the database was running. Unfortunately, this renders the backup useless.
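
For Oracle, the minimum preparation for a raw file copy looks something like the following sketch (it also requires the database to be running in ARCHIVELOG mode, and RMAN is almost always the better answer):

    -- Put the database into backup mode before copying data files at the OS level
    ALTER DATABASE BEGIN BACKUP;

    -- ... copy the data files with OS tools while backup mode is in effect ...

    -- Take the database back out of backup mode
    ALTER DATABASE END BACKUP;

    -- The archived redo logs generated during the copy window must be backed up
    -- too, or the copied files cannot be recovered to a consistent state.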

Here is the original post:

One of the first things we do when engaging a new client for Oracle or SQL Server database administration support is to take an inventory of their current databases and validate that a good backup and recovery scheme is in place. The vast majority of the time, we find that either no backups are being taken at all, or that not all of the data and configuration files are being backed up properly, even though the client thinks they are.

Often, a proper backup was in place at some point in the past, but over time the configuration of the system changed, rendering the backup useless. Typical examples include changes in disk configuration or the addition of data files: changes that were made at the operating system level and configured properly in the database, but never reflected in the backup configuration.

It is a simple oversight that can have catastrophic effects. We recently spent over three months helping a new client recover many years’ worth of data that they thought they were backing up properly, but they were wrong!

A regularly scheduled review of the backup scheme, along with a test recovery, is the best way to ensure that your Oracle or SQL Server database is being properly backed up.

Contact Buda Consulting today for a comprehensive Disaster Recovery review of your Oracle or SQL Server Databases.