Why Your Database Management Team Should Regularly Double-Check Your Backups

I’ve blogged before about the importance of checking database backups. Over 90% of new clients that we assess have backups that are either incomplete or totally unusable (true statistic!).

The following story of a backup failure that happened recently—and the successful double-check by our DBA, Scott—provides a noteworthy example of how bad backups can happen even when you think your process is working.

Recently we had a client that was looking to reduce storage costs for their Oracle RDBMS system while still meeting a legally mandated seven-year retention requirement. They were also looking to leverage AWS S3 and Glacier.

The majority of their Oracle data resided in a single history table partitioned on date; and this data was rarely, if ever, accessed once it was more than one year old. Thus S3 and Glacier were a perfect fit for this use case.

It was decided that data would be retained in the RDBMS until it was three years old. After that, the data would be exported via Oracle Data Pump and zipped via the Linux zip utility. (A side note: in case you’re wondering why we didn’t just use Data Pump’s native compression functionality, our testing showed it yielded a roughly 20% lower compression ratio than the Linux zip utility.)

Scott set about finding a way to automate the process, using Oracle Data Pump to export partitions starting with the oldest partitions first. To get started, he did what any good DBA does—he scoured the internet using Google and came up with this great example code posted by Michael Dinh to use as a starting point.

The process we decided to use was to export the identified partition, check the job_state value returned by dbms_datapump.wait_for_job to ensure the export completed successfully, and then drop the partition from the database.
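
To make the flow concrete, here is a minimal PL/SQL sketch of that pattern. The schema, table, partition, and directory names are hypothetical, and this is an illustration of the approach rather than the exact code we ran:

    -- Minimal sketch: export one partition, then drop it only if Data Pump
    -- reports COMPLETED. Schema, table, partition, and directory names are
    -- hypothetical.
    DECLARE
      l_handle    NUMBER;
      l_job_state VARCHAR2(30);
    BEGIN
      l_handle := dbms_datapump.open(operation => 'EXPORT',
                                     job_mode  => 'TABLE',
                                     job_name  => 'EXP_HIST_P2015Q1');

      dbms_datapump.add_file(l_handle, 'hist_p2015q1.dmp', 'EXPORT_DIR',
                             filetype => dbms_datapump.ku$_file_type_dump_file);
      dbms_datapump.add_file(l_handle, 'hist_p2015q1.log', 'EXPORT_DIR',
                             filetype => dbms_datapump.ku$_file_type_log_file);

      -- Limit the job to a single partition of a single table
      dbms_datapump.metadata_filter(l_handle, 'SCHEMA_EXPR', 'IN (''APP'')');
      dbms_datapump.metadata_filter(l_handle, 'NAME_EXPR', 'IN (''HISTORY'')');
      dbms_datapump.data_filter(l_handle,
                                name        => 'PARTITION_LIST',
                                value       => 'P2015Q1',
                                table_name  => 'HISTORY',
                                schema_name => 'APP');

      dbms_datapump.start_job(l_handle);

      -- Block until the job finishes and capture its final state
      dbms_datapump.wait_for_job(l_handle, l_job_state);
      dbms_datapump.detach(l_handle);

      -- Prune the partition only when Data Pump reports success
      IF l_job_state = 'COMPLETED' THEN
        EXECUTE IMMEDIATE 'ALTER TABLE app.history DROP PARTITION p2015q1';
      ELSE
        raise_application_error(-20001,
          'Export ended in state ' || l_job_state || '; partition not dropped');
      END IF;
    END;
    /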

After many modifications to the example code, it was time to test. Scott tested what would happen if everything went well. He also tested what would happen:

    • If the utility tried to export to a missing OS directory
    • If the directory was full
    • If the directory was read-only
    • If the export dump file was removed during the export process
    • If the export process was killed while the export job was running

The testing went smoothly: in each failure scenario, dbms_datapump.wait_for_job returned a state other than COMPLETED. The process would only drop the partition when the returned state was COMPLETED, so it appeared we were ready to put this process into production use.

What we did not account for was the possibility that an Oracle bug would somehow cause dbms_datapump to fail to export the table partition rows but still return a COMPLETED code to the calling process—which is exactly what was about to happen.

The routine ran perfectly for a few weeks. Fortunately, Scott continued to closely monitor the job runs. Then he noticed that all of the export files had started to report the exact same, suspiciously small size.

After checking the process, we found the issue and opened a ticket with Oracle support. They informed us that they believed an Oracle bug was to blame and recommended we upgrade the database to a later release.

No reason was ever found for why this suddenly started to happen after weeks of continuous use. We did, however, learn an important lesson: when it comes to dropping table partitions after a Data Pump export, never rely solely on the dbms_datapump.wait_for_job return state. Always take the extra step of interrogating the export log file for the number of rows exported and for the “successfully completed” message.
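
One way to add that extra check is to scan the Data Pump log before issuing the drop. The sketch below assumes the same hypothetical directory and log file names used earlier; the patterns would need to be adjusted to match your actual log output:

    -- Sketch: scan the Data Pump log for a non-zero row count and the
    -- "successfully completed" message before allowing the partition drop.
    -- Directory and file names are hypothetical.
    DECLARE
      l_file      utl_file.file_type;
      l_line      VARCHAR2(4000);
      l_has_rows  BOOLEAN := FALSE;
      l_completed BOOLEAN := FALSE;
    BEGIN
      l_file := utl_file.fopen('EXPORT_DIR', 'hist_p2015q1.log', 'r');
      BEGIN
        LOOP
          utl_file.get_line(l_file, l_line);
          -- ". . exported ... N rows" lines carry the per-partition row counts
          IF regexp_like(l_line, 'exported .* [1-9][0-9]* rows') THEN
            l_has_rows := TRUE;
          END IF;
          IF l_line LIKE '%successfully completed%' THEN
            l_completed := TRUE;
          END IF;
        END LOOP;
      EXCEPTION
        WHEN no_data_found THEN NULL;  -- end of log file reached
      END;
      utl_file.fclose(l_file);

      IF NOT (l_has_rows AND l_completed) THEN
        raise_application_error(-20002,
          'Export log does not confirm exported rows; do not drop the partition');
      END IF;
    END;
    /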

In addition, the fact that we had a good current RMAN backup of the database that contained the data in the dropped partitions made this a good learning experience instead of a tragedy.  Nevertheless, this experience illustrates the importance of frequently checking your exports.

By checking to make sure that the exports were good, Scott was able to ensure that we saved the most recent backup prior to the pruning so that we could recover the data in those partitions if we ever needed to.

This is an extreme case because Oracle actually returned an incorrect success code. In most of the cases we see, the problem is more mundane. For example, the backups were:

    • Not configured to run at all, or
    • Configured to exclude part of the database, or
    • Written to new volumes that were not included in the operating system backup process, or
    • Taken at the operating system level without informing the DBMS, rendering the backup unusable.

Whatever the reasons, they are very hard to predict. The only safeguard is to check weekly (or daily, in some mission-critical cases) to make sure that what you think is happening really is.

If you don’t double-check your backups, the false sense of security you have today will end in sudden panic when you need to restore the database and can’t.

An Overlooked Oracle Management Task—Support Identifier Management

One of the most overlooked tasks that can become an issue outside of your RDBMS environment is managing your Oracle customer support identifiers (CSIs). CSIs are added to your support.oracle.com account, and they allow you to perform activities ranging from creating a service request to looking up information in Oracle’s global knowledge base. When a CSI is added to an account for the first time, the person applying automatically becomes the main administrator for that CSI.

While it makes sense that a manager/supervisor would become the administrator for a CSI, this is often where things take a turn for the worse, because a manager/supervisor is less likely to be the one actually creating service requests or assigning accounts to services such as Oracle Enterprise Manager or Oracle Secure Backup Web Services. Also, what happens when a manager/supervisor retires or moves to a new position? Often the administrator privilege is not passed to another employee, which puts your account at risk: it leaves you unable to make changes to your systems or users in Oracle’s CSI Administrator portal.

If no one has administrator access, new employees cannot be granted access to your CSI, old employees cannot be removed from it, and service request (SR) details cannot be changed. Leaving former employees with access to your CSI also poses a security threat, especially if they hold administrative rights.

Oracle Customer Support Identifier Best Practices

To mitigate these risks, I recommend the following best practices:

      • Make all members of your database administration group CSI administrators – this ensures your company will always be able to make CSI changes even if any one employee leaves.
      • Set account expiration dates when a user is leaving – this automatically disables the user’s access to your CSI.
      • Create a service account that has only USER level access to your CSI. This account can be used to connect services such as Oracle Enterprise Manager or the Oracle Secure Backup Web Services module, allowing first- and third-party services to continue operating even when an employee leaves.

If you are looking for a best-practice approach to on-site or remote DBA services, contact Buda Consulting. Our experienced staff of certified Oracle professionals can address all your DBA requirements.

This post was written by Willie Gray, a member of Buda Consulting’s DBA team.

Does Data Governance Make You Shudder?

At a recent vendor conference, I found myself talking with a varied group of technology professionals. Two were technology generalists, one was a data engineer, one was responsible for transportation technology at a major university (think autonomous vehicles, traffic sensors, etc.), another was responsible for university student and teacher data (lesson plans, research findings, etc.), and one was responsible for his organization’s IT security. 

During the conversation, someone mentioned data governance. Immediately there was a conspicuous and collective sigh around the table.
Our group clearly found the subject intimidating and uncomfortable.

Why does the mere mention of data governance invoke that kind of response? 

One reason is probably that the potential scope of a data governance effort is so wide. It could basically involve every possible task associated with data management. 

Further, the word “governance” emphasizes the importance of taking those tasks seriously, and getting them right. So when you combine “there’s a lot to do” with “and it’s all important,” fear kindles in the hearts of those responsible.

And rightly so: the consequences of poor data governance are significant. They range from regulatory fines and sanctions for failing to adequately protect data or for noncompliance, to the insidious costs of bad data quality, such as missed business opportunities due to poor decision-making or lost customers due to low service levels.

But there are a lot of “big and important” topics in IT, and they don’t all make a diverse group of seasoned professionals wince. I decided to do some research and dig a little deeper into why data governance seems to be outside our collective comfort zone.

One thing that came up right away is that data governance is defined and described in diverse ways. Moreover, the terms used to describe the activities or responsibilities that comprise data governance aren’t defined or used the same way by everyone. Anytime I tried to define a term, I’d find another term that meant the same thing… sometimes, depending on context. In other words, the definitions tend to morph depending on where you stand when you look at them.

That variability and inconsistency made just framing this blog post difficult—never mind a program that “…includes the people, processes and technologies needed to manage and protect the company’s data assets…” and impacts an organization at strategic, tactical and operational levels. 

Indeed, there’s an axiom in management theory that “You can’t manage what you can’t name.” Further, “You can’t properly manage what you don’t define explicitly.” In other words, how you define a data governance program will significantly impact your ability to manage it successfully.

Given that a key element of data governance is ensuring the consistency of data definitions across an organization, I find it ironic that we don’t have a consistent, agreed-upon definition of terms for the components of data governance itself.

Normally when I write about a complex topic, I break it down into a list of subtopics and then decompose each of those—similar to how I would attack a complex software development project or database design endeavor. But all the variability and overlap among terms that I encountered around data governance forced me to change not only my approach to writing this post, but the whole focus of the post. 

Instead of working top-down, I had to work bottom-up. Below I list some subheadings that are parts of data governance, followed by the tasks and responsibilities that relate to them. Your mission—if you choose to accept it—is to take a few minutes to decide under which subheading you would place each task.

So here are the subheadings that I started with:

  • Data Management (aka Database Management)
  • Data Security
  • Data Stewardship
  • Data Quality
  • Master Data Management
  • Regulatory Compliance (GDPR, PCI, HIPAA)

Here is my list of many (but by no means all) of the critical tasks that need to be completed in order to ensure that your data is relevant, available, secure, and optimized (i.e., “governed”). 

Under which subheading would you put each of these tasks if you were to document your data governance activities?

  • Data Encryption
  • Data Masking
  • Data Access Control
  • High Availability
  • Disaster Recovery
  • Data Lifecycle Management
  • Data Version Tracking
  • Data Custody Tracking and Control
  • Data Provenance Tracking
  • Change Tracking and Management
  • Data Access Auditing
  • Data Update Auditing
  • Data Validation
  • Define Business Rules for Data
  • Metadata Management and managing consistent data definitions
  • Managing Taxonomies and Naming Conventions

Some of the tasks seem to relate to an obvious subheading, such as Metadata Management and Taxonomies and Naming Conventions being grouped under Master Data Management. Or grouping Data Encryption, Data Masking and Data Access Control under Data Security.

But you could group Data Access Control under Data Stewardship as well, along with many other tasks. In fact, Data Stewardship is used somewhat interchangeably with Data Governance… sometimes. And which tasks fit under Compliance? Maybe all of them? 

My personal takeaway from all this is that it may be better to look at this particular issue from the bottom up instead of the top down. When wrapping our minds around data governance, we might want to look at all the relevant lower-level tasks (lower in this hierarchy, not in importance), and think about what each involves and what tools can help us implement it.

Don’t get too caught up with definitions of terms or with categorizing tasks into subgroupings, as I did for the purposes of discussion. At least when it came to writing this blog post, I found that to be the most intimidating part.

Are you looking for strategic, tactical and/or operational support around a data governance program or related initiative? Contact Buda Consulting and let’s talk about where you are, where you need to be and how we can help.

SQL Server Contained Databases and Oracle Multi-tenant: Advantages Compared

Oracle introduced multi-tenant databases in version 12c. SQL Server added contained databases in SQL Server 2012.

These capabilities add efficiencies and reduce deficiencies, but in different ways. Pre-multitenant Oracle required more administration for individual databases than SQL Server, but offered more isolation and control for each database. Before contained databases, SQL Server required less per-database administration than Oracle, but offered less isolation and control for each database.

Essentially, these new functionalities moved both Oracle and SQL Server to a more satisfying middle ground.

SQL Server Vs. Oracle Multi-tenant

With the new changes:

  • They both allow individual databases to stand alone and be ported from one environment to another more easily.
  • They separate much of the admin work that needs to be done at the software level (patching, configuration, etc.) from the work that is more database-specific (backup/recovery, performance tuning, etc.).
  • They isolate all of the metadata about a particular database within the database itself, as opposed to intermingling the metadata from multiple databases.

Oracle Multi-tenant Advantages

Each approach also touts some specific advantages. For example, Oracle Multi-tenant is said to be “designed for the cloud” because it simplifies database consolidation. A multi-tenant container database can hold multiple pluggable databases, and application code connects to specific pluggable databases. This makes it easier to consolidate multiple databases onto private clouds without changing their applications, while still controlling resource prioritization as you did before.

Another advantage of the Oracle Multi-tenant approach is easier provisioning and cloning. Moving and cloning databases is an everyday admin task that multitenancy can help accelerate. For example, you can quickly copy a database and “plug it in” to a different container. Filesystems that support copy on write allow you to clone pluggable databases almost instantly.
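
As a simple illustration (the pluggable database names and file paths are hypothetical), a clone can be created with a single statement from the container’s root; on copy-on-write storage the optional SNAPSHOT COPY clause makes the clone nearly instantaneous:

    -- Clone an existing pluggable database within the same container database
    CREATE PLUGGABLE DATABASE sales_test FROM sales_prod
      FILE_NAME_CONVERT = ('/u01/oradata/sales_prod/', '/u01/oradata/sales_test/');

    -- The clone is created closed; open it for use
    ALTER PLUGGABLE DATABASE sales_test OPEN;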

SQL Server Advantages

SQL Server’s contained database feature offers advantages in the area of user authentication. A contained database is isolated from other databases and from the SQL Server instance that hosts it. Support for user authentication at the database level for contained databases reduces a database’s dependency on the logins of the SQL Server instance. You can also have separate firewall rules for contained database users versus users who authenticate at the server level. This user model makes it much easier to move databases between systems, including Microsoft Azure.
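
A rough T-SQL sketch of this model (the database, user, and password below are made up): enable contained database authentication at the instance level, create a partially contained database, and add a user whose credentials live inside the database itself rather than in the instance’s logins:

    -- Allow contained databases to authenticate their own users (instance-level setting)
    EXEC sp_configure 'contained database authentication', 1;
    RECONFIGURE;
    GO

    -- Create a partially contained database
    CREATE DATABASE SalesApp CONTAINMENT = PARTIAL;
    GO

    USE SalesApp;
    GO

    -- Contained user: no server login needed, so authentication moves with the database
    CREATE USER app_reader WITH PASSWORD = 'Str0ngP@ssw0rd!';
    ALTER ROLE db_datareader ADD MEMBER app_reader;
    GO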

In short, the main goal of multitenancy is to facilitate database consolidation, while the main purpose of contained databases is to enable isolation and thus portability. Yet each approach addresses both these concerns.

If you are considering using either or both of these models, or would benefit from guidance on how to use either or both more efficiently or effectively, contact Buda Consulting to schedule a free consultation.

How to Resolve Oracle Database Licensing Issues with Processor Cores in Virtualized Environments

Virtualization technology revolutionized the RDBMS world by allowing database administrators and system administrators to utilize up to 90% of a server’s hardware resources, versus an average of 10%–30% utilization in non-virtualized environments. But with the move from a physical to a virtualized environment comes a host of new management tasks. One of those tasks is trying to demystify your licensing.

Oracle Virtual Server Licensing 

In most circumstances, your virtualized environment will be running on a hypervisor. Your virtual servers are then allocated a set of resources from a pool defined in the hypervisor manager. One of the most important settings is how many virtual CPUs (vCPUs) your server is allocated.

The reason vCPUs are such an important virtual resource is that this setting maps directly to the physical processor and its cores. Processor cores are the metric many vendors, including Oracle and Microsoft, use to license parts of their software stacks. This post focuses on Oracle virtual server licensing, which can be licensed by processor core, and explains one of the most common mistakes administrators make when setting up an Oracle database in a virtualized environment.

A key point to understand is how the database software is licensed. Most servers purchased these days run Intel Xeon processors with high core counts of 12 or more. Let’s say you purchased a server with two Intel Xeon 4-core processors. Oracle assigns most Intel Xeon cores a licensing factor of 0.5, meaning 2 Xeon cores equal 1 core license. So for a system with two Xeon 4-core processors, which is 8 cores total, you would need to purchase an Oracle Database core license count of 4.

But what happens when you decide to upgrade to new servers with two Xeon 12-core processors, for less than the original cost of the two Xeon 4-core processors? Oracle requires that all processor cores in a server be licensed for its database product when using core licensing. If you were running your database environment on physical (non-virtualized) servers, you would need to purchase 8 more core licenses (24 cores × 0.5 = 12, versus the 4 you already own), increasing your operating cost.

The only way to continue with your 4 core licenses would be to virtualize your database environment—and the only way to comply with Oracle in an on-premise setting outside of an Oracle engineered system is to run Oracle VM.

Oracle VM is Oracle’s own virtualization hypervisor, and it is the only hypervisor for which Oracle honors the CPU pinning option, also called processor affinity, for licensing purposes. This option allows you to permanently allocate a physical processor core to a vCPU on a virtual server. Once a core is “pinned” to a vCPU on a virtual server, no other virtual server will use that core.

This last point is very important, because anyone familiar with creating virtual servers knows that you will be asked the following question, and the answer helps determine whether you are compliant with your Oracle virtual server licensing:

  • How many vCPUs would you like to allocate to the virtual server?

vCPU Allocation

In most environments, 1 vCPU is equal to 0.5 of a physical core. Do you notice a trend yet? If we wanted to create a virtual server that complied with an Oracle Database licensing core count of 4, the virtual server would need 8 vCPUs. 

But what happens if we create the virtual machine with the correct number of vCPUs but never enable the CPU pinning option? According to Oracle documentation, this means you are not in compliance with your core licensing; to become compliant, you must guarantee that this virtual machine only has access to the number of cores specified in your core licensing.

To correct this problem, you will need to go into Oracle VM’s management tooling and assign physical CPU cores 0 to 7 (core numbering starts at 0) to lock the virtual server so it can only use those 8 cores. Keep in mind that this virtual server will be the only server allowed to use cores 0 to 7; any other virtual servers hosted by the hypervisor will use physical cores 8 to 23 (remember, you upgraded your servers to two 12-core Xeon processors).
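
The exact steps depend on your Oracle VM release and tooling, but conceptually the guest’s configuration ends up with Xen-style settings along these lines (an illustrative fragment with hypothetical values, not the definitive procedure):

    # Database guest configuration fragment (illustrative values)
    vcpus = 8          # the guest is allocated 8 virtual CPUs
    cpus  = "0-7"      # those vCPUs may only run on physical cores 0 through 7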

After pinning the licensed number of cores to the virtual server hosting your database environment, you are in compliance with Oracle licensing and will avoid the costly penalties that can result from an Oracle software audit.

So when building out your virtual environments, make sure you know the requirements of your software’s licensing. Forgetting to pin CPUs to your virtual servers could end up costing your company significant money.

If you like this article, please share it with your colleagues and subscribe to our blog to get the latest updates.