Is a Cloud Data Warehouse Right for My Company?

You’ve probably heard the catchphrase, “Data is the new oil.” You need it to run your business, and its potential value is huge… But first you have to drill for it.

Companies that can collect, process and analyze their data resources efficiently and effectively will gain competitive advantage. A robust data warehouse is often critical to success. 

Data warehouses have traditionally been on-premises systems. But a cloud-based or hybrid alternative may now be a better choice for many organizations. This post compares on-premises versus cloud data warehouse architectures and explains the pros and cons of each.

What is a data warehouse?

A data warehouse is a central repository that stores and processes both historical and new data from multiple databases and applications (including potentially SaaS platforms) to improve the speed and scope of problem-solving and decision-making. Data scientists and other roles direct queries, reporting and analytics requests to the data warehouse via business intelligence (BI) and analytics tools and dashboards. Data warehouses support business decisions in many areas, from how to price products to where to cut costs to whether to launch a new service.

Columnar versus row-based data storage

Why not just query your relational databases (RDBMS) directly? One reason is that BI/analytics requests could significantly impact performance of business-critical RDBMS and associated “systems of record.”

Also, the row-based structure of RDBMS, which is ideal for transactional usage, isn’t as well suited to analytical processing. This is why many data warehouses, especially cloud-based offerings, store data in a columnar format.

For example, aggregation queries (like “What is the average age of customers who bought widgets?”) can be much faster in a columnar database because the data elements involved are stored closer together on the storage media, making read times (latency) lower. But if you need to query multiple fields per record (like “What is the name, email address and date of birth of each person who bought a widget in 2018?”) the advantages of columnar databases diminish.

Similarly, write times can be much slower in columnar databases. This is less problematic in data warehouses versus transactional applications because new data is generally added less frequently, via so-called “data dumps.”
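
To make the distinction concrete, here is a rough sketch of both query shapes; the table and column names (customers, orders, etc.) are invented for illustration:

    -- Hypothetical tables: customers(id, name, email, birth_date, age)
    -- and orders(customer_id, product, order_date).

    -- Aggregation over one narrow column: a columnar store reads only the
    -- age and join-key columns, so it scans far less data.
    SELECT AVG(c.age) AS avg_widget_buyer_age
    FROM customers c
    JOIN orders o ON o.customer_id = c.id
    WHERE o.product = 'widget';

    -- Wide, per-record retrieval: several columns of every matching row are
    -- needed, so the read-time advantage of columnar storage largely disappears.
    SELECT c.name, c.email, c.birth_date
    FROM customers c
    JOIN orders o ON o.customer_id = c.id
    WHERE o.product = 'widget'
      AND o.order_date BETWEEN DATE '2018-01-01' AND DATE '2018-12-31';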

Data warehouse structures

The more data sources you need to access and the more kinds of questions you want to answer, the more complex your data warehouse structure might need to become. 

A basic data warehouse provides summary data derived from similar types of operational systems, like RDBMS. As you add more data sources, you may need to set up a “staging area” to convert diverse data into a more homogeneous structure that is easier to query. Some solutions then add “data marts,” which tailor data from individual lines of business to support the specific analysis needs of different user roles. 

ELT or ETL?

Most data warehouses pull data from different line-of-business sources, such as marketing, sales, accounting, etc., and/or from the organization overall. All this data can either be aggregated within a central repository and then transformed into a suitable format (an Extract-Load-Transform (ELT) approach); or initially pulled into a temporary database to be transformed before being moved to a central repository (an Extract-Transform-Load (ETL) process).

The ETL approach is typically the choice for on-premises data warehouses, as hardware limitations dictate processing the data off-platform. ETL can also help ensure privacy and compliance by cleansing the data before loading. The massive scalability of cloud data warehouse platforms makes them amenable to ELT, which can enable users to transform raw data according to their specific needs after it’s loaded.
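
As a minimal ELT sketch (the raw_orders staging table and its columns are invented for illustration), the transform step runs inside the warehouse after the raw data has been loaded:

    -- Extract + Load: source rows are bulk-loaded as-is into a staging table,
    -- e.g. raw_orders(order_id, customer_id, amount_text, order_date_text).

    -- Transform, using the warehouse's own compute: cleanse and reshape the
    -- raw data into an analytics-friendly table.
    CREATE TABLE orders_clean AS
    SELECT order_id,
           customer_id,
           CAST(amount_text     AS DECIMAL(12,2)) AS amount,
           CAST(order_date_text AS DATE)          AS order_date
    FROM raw_orders
    WHERE amount_text IS NOT NULL;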

On-premises data warehouse architectures

An on-premises data warehouse requires dedicated storage, servers and administrative resources. As such, it can be slow and costly to scale up as data volumes expand.

Traditional data warehouses have a three-tier architecture:

    1. The bottom tier consists of a database server that extracts data from multiple sources; e.g., RDBMS and SaaS applications
    2. The middle tier consists of an online analytical processing (OLAP) server that transforms the data into a more analytics-friendly structure
    3. The top tier consists of the client-facing applications, which provide tools for data analysis, reporting and data mining

Cloud data warehouse architectures

A cloud data warehouse, also called Data Warehouse-as-a-Service (DWaaS), is a SaaS offering. You pay as you go for storage and compute power, and your data is stored, queried and analyzed over the internet. 

Since the service provider manages the physical hardware and IT resources, cost-effective scalability and much lower setup costs are among the key advantages of cloud data warehouses. Gartner estimates that 75% of all databases will be deployed on or migrated to the cloud by 2022, with cost and flexibility being perhaps the primary drivers for this rapid shift.

Some generic architecture components common to many cloud data warehouse architectures include:

    • Clusters: large groupings of shared computing resources called nodes
    • Nodes: discrete computing resources with dedicated CPU, RAM and storage. Queries are executed across multiple nodes in a cluster, depending on what data is stored on them.
    • Slices: partitions within each node that operate in parallel to accelerate query execution.

These elements together enable massively parallel processing, another big performance advantage of cloud data warehouses. 

Cloud data warehouse architectures can vary significantly. Amazon Redshift uses a model similar to the above, and requires users to provision resources more like an on-premises data warehouse. Google BigQuery, in contrast, is serverless; Google manages system resources dynamically with no user input or visibility.

Pros and cons of on-premises and cloud data warehouses

Where to host your data warehouse? That depends on a balance of cost, scalability/growth, query efficiency and other factors that are unique to your business. Here are the top issues to consider:

Cost

Because you pay only for the storage and compute you actually use, and there are no physical machines to buy and no maintenance, admin or upgrade costs, cloud data warehouses are much less expensive—especially upfront. According to Forrester Research, most companies save between 20% and 80% on cost with a cloud data warehouse solution.

Security and compliance

Security and compliance concerns are the top roadblocks to moving data warehouses to the cloud. Leading DWaaS providers offer robust security controls that are probably better than many businesses could deliver on their own. However, in virtualized cloud environments you may not know where your data resides or where and when it is moved. You also may not know when a vendor has entrusted your data to a “fourth party” whose security posture might be suspect. Concerns like these can make it hard to prove compliance.

Some of the security topics to explore with DWaaS providers include:

    • Data encryption at rest and in transit
    • User identity management and authentication
    • Security and privacy policies for replication and archiving of data
    • Compliance with regulations like HIPAA and PCI DSS, and with cybersecurity frameworks like SOC 2 and ISO 27001

Scalability

To scale an on-premises data warehouse solution, you need to buy new hardware and/or software (often more than you need right away). In the cloud, you get the capacity you need on demand, both to support growth and to handle peak and off-peak loads.

Reliability and availability

With on-premises systems, reliability and availability depend on your IT resources. In cloud environments, you can hold the service provider to SLAs, which are typically pretty stringent (e.g., 99.9% uptime). DWaaS providers may also be able to offer high availability or improved resilience by replicating data across different geographical regions.

Data loading

Third-party tools can make data loading faster and easier. However, it is generally more of a challenge with cloud platforms because data has to travel over external networks. Further, cloud data loading tools are often vendor-specific, which can make moving to a new DWaaS a technological and contractual struggle.

Time to market

Thanks to faster setup and rollout times, DWaaS typically offers faster time to market (and faster time to value) than an on-premises data warehouse you must stand up from scratch. 

Data latency

The volume, type(s) and location of data all influence latency, which is the time required to store or retrieve data. Because of their distributed architectures, cloud data warehouses generally have more latency issues than onsite data warehouses. The fresher you need your data to be (one minute versus one hour, for instance), the bigger problem latency can present. Especially if you need near real-time data analysis, it’s important to test and measure latency before choosing a cloud provider.

Integrations

The leading cloud service providers increasingly offer pre-built integrations with growing ecosystems of cloud services that could help you load or process your data (assuming no compliance problems); e.g., identity and access management and BI/big data analytics tools. In an on-premises scenario, comparable integrations could be largely on you.

Choosing the right data warehouse solution

Before you implement a data warehouse, carefully consider your unique technical, business and user requirements and research likely options in detail. Given the high levels of time, effort and cost involved, you don’t want any surprises.

If you don’t have database architecture and design expertise in-house, a trusted advisor can help ensure you ask all the right questions and make all the right moves. 

For a free consultation to discuss your needs and questions in-depth, contact Buda Consulting.

 

Schemas, Databases and Instances—Defined and Discussed

What is a schema as opposed to a database as opposed to an instance? And how do schemas differ between Oracle and MSSQL? Or between Oracle and PostgreSQL? Or between PostgreSQL and MongoDB?

These terms can be confusing, but they are very important when planning a database architecture. So let’s define these terms and discuss conceptually how they are similar and how they differ between database software implementations. This post will focus mostly on schemas, with some references to the other terms for context. 


What is a database?

A database is the collection of database files that contain the data being stored. These files hold both the user data and the metadata (data dictionary) that the database needs to make sense of the user data.  The metadata includes the schema definitions (where applicable) as described below.


What is a database instance?

A database instance is the collection of all of the database software processes plus any memory structures required by those processes, plus the database files where the database data is stored. (See diagram)

The different software vendors treat the relationship between databases and instances in different ways.

Oracle supports one database per instance unless you are working with 12c and above and using Oracle Multitenant.

PostgreSQL supports multiple databases per instance.  Some system catalogs are shared across all databases in an instance.

MSSQL supports multiple databases per instance. Each instance has a set of system databases that are shared across all databases served by that instance. 

MongoDB supports multiple databases per instance.


What is a schema?  

The concept of a schema can be a little confusing because there are three different relevant uses of the word “schema” in the context of an IT project. 

    1. Merriam-Webster defines a schema as “a structured framework or plan, an outline.”
    2. In the realm of database technology, a schema means a structural definition of the data that you are storing. This essentially defines the datatypes of the data you are storing, and the organization of that data (into tables, documents, indexes and constraints, etc.). This can be expressed in the form of a diagram such as an entity relationship diagram (ERD), in a set of data definition language (DDL) statements, or in a JSON object. 
    3. Some database vendors have extended the concept of a schema to include not just a definition of the structure of a set of data, but also a particular collection of objects that contain the data (tables, etc), and even the data itself. This is sometimes a named collection and is typically based on one of these factors: 
      • Who owns the objects (a database user)
      • Who should have access to the objects (e.g., a database role that may be assigned to users)
      • What the objects are used for (e.g., all objects for a given application or function within an application)   

When implemented in this fashion, a schema can also be thought of as a namespace. An object can have the same name in two different schemas and the two objects will be distinct from each other. 
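
As a small illustration (PostgreSQL-style syntax; the schema and table names are invented), the same table name can live in two schemas as two distinct objects:

    CREATE SCHEMA sales;
    CREATE SCHEMA hr;

    -- Two distinct tables that happen to share a name.
    CREATE TABLE sales.employees (id INT PRIMARY KEY, region TEXT);
    CREATE TABLE hr.employees    (id INT PRIMARY KEY, salary NUMERIC(12,2));

    -- Schema-qualified names remove any ambiguity.
    SELECT COUNT(*) FROM sales.employees;
    SELECT COUNT(*) FROM hr.employees;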

It is interesting to note that MongoDB, which is a document database as opposed to a relational database, is sometimes called a schema-less database. MongoDB also has the concept of a schema, but it is purely a description of the structure of the data, more like definition 2 above than 3. A MongoDB schema does not represent the actual instance of the data, as it does with the relational databases mentioned. 

To summarize, within the context of database management software, a schema is either a set of objects that contain data that is related in some logical way (user, access, application), or simply a definition of the structure of data. 

Examples

Here are some examples of schemas (see the diagram below):

    1. Schema JSMITH: A schema that contains all of the tables that belong to user JSMITH. This schema is typically named the same as the user and is often created automatically when that database user account is created. When the user connects to the database, this will typically be their default schema, so any objects they create will automatically be part of it. When the user issues a query, unless they specify a schema name as part of the object name or change their schema search path (this is done differently by each database vendor), the result set will come from the object of that name in their default schema. 
    2. Schema PAYROLL: A schema that contains all of the tables for the payroll application. This schema would typically be named for the application or a functional area within the application. When accessing data from the Payroll schema, users will need to either set their schema search path to the Payroll schema, or prefix all object names in the query with PAYROLL (both patterns are sketched after this list).
    3. Schema DBO: This built-in schema in MSSQL is the default for all users unless otherwise specified. In many SQL Server databases, almost all objects end up here. This is similar to the public schema in PostgreSQL.
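
To make example 2 concrete, here is a rough sketch of both access patterns (the table and column names are made up; the search-path syntax shown is PostgreSQL and Oracle, and it varies by vendor):

    -- Option 1: qualify every object with the schema name.
    SELECT employee_id, gross_pay FROM payroll.paychecks;

    -- Option 2: make PAYROLL the current/search schema, then use unqualified names.
    SET search_path TO payroll;                    -- PostgreSQL
    ALTER SESSION SET CURRENT_SCHEMA = payroll;    -- Oracle
    SELECT employee_id, gross_pay FROM paychecks;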


How does a schema differ between database vendors? 

MSSQL and PostgreSQL have an actual object in the database called a schema. You can create and drop a schema, and you can assign access rights and ownership to a schema as a whole. In these environments, there is a loose connection between a schema and a database user: a schema may be owned by a database user, but a database user does not have to own any schemas, and a schema may instead be owned by a role. If you want to drop a user that owns a schema, you must first transfer ownership of the schema to another user or role.
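
For instance, in PostgreSQL (SQL Server syntax is similar), that handoff might look roughly like this; the schema, user and role names are made up:

    -- Create a schema owned by a specific database user.
    CREATE SCHEMA reporting AUTHORIZATION jsmith;

    -- Before dropping jsmith, hand the schema to another user or role.
    ALTER SCHEMA reporting OWNER TO analytics_role;                  -- PostgreSQL
    -- ALTER AUTHORIZATION ON SCHEMA::reporting TO analytics_role;   -- SQL Server

    -- Now the user can be dropped (assuming jsmith owns nothing else).
    DROP USER jsmith;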

Oracle has the concept of a schema, but there is no separate schema object in the database; it is more conceptual. In Oracle, each database user may own objects, and the collection of objects owned by a given user is considered that user’s schema. A schema in Oracle does not exist independently of a database user, and if a user is dropped, all objects owned by that user (that user’s schema) must be dropped first, or dropped along with the user. Oracle does have a CREATE SCHEMA statement, but it does not create a user or a standalone schema; it is essentially a wrapper that lets an existing user create a set of tables, views and grants in a single transaction. There is no separate DROP SCHEMA command; dropping the user is what removes the schema. Ownership of an object cannot be transferred from one user to another; the new user would need to recreate the object (a CREATE TABLE AS SELECT (CTAS) query may be helpful here).
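
A minimal Oracle sketch of these points, with made-up user and table names (the statements assume the necessary privileges, and CREATE SCHEMA must be run while connected as that user):

    -- CREATE SCHEMA does not create the user; run it while connected as the
    -- (already existing) user PAYROLL to create several objects in one transaction.
    CREATE SCHEMA AUTHORIZATION payroll
        CREATE TABLE paychecks  (employee_id NUMBER, gross_pay NUMBER(12,2))
        CREATE TABLE deductions (employee_id NUMBER, amount    NUMBER(12,2));

    -- There is no ALTER ... OWNER in Oracle; "moving" a table to another user
    -- is a copy (CTAS) plus a drop, run by a user with access to both schemas.
    CREATE TABLE hr.paychecks AS SELECT * FROM payroll.paychecks;
    DROP TABLE payroll.paychecks;

    -- Dropping the user is what removes the schema and everything in it.
    DROP USER payroll CASCADE;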

MongoDB, as mentioned earlier, uses the concept of a schema in database design and in the validation of the structure of incoming data. But there is no object in the database known as a schema.

I hope this post helps pull together the concept of a schema and the way the different vendors have implemented schemas.

Please comment with any questions or examples that you think might be helpful, including for database vendors that are not listed here. Also, if you disagree with the way I defined schema, please let me know how you see it.

To talk over any questions you may have around schemas or database architecture in general, contact Buda Consulting.

Does Data Governance Make You Shudder?

At a recent vendor conference, I found myself talking with a varied group of technology professionals. Two were technology generalists, one was a data engineer, one was responsible for transportation technology at a major university (think autonomous vehicles, traffic sensors, etc.), another was responsible for university student and teacher data (lesson plans, research findings, etc.), and one was responsible for his organization’s IT security. 

During the conversation, someone mentioned data governance. Immediately there was a conspicuous and collective sigh around the table. Our group clearly found the subject intimidating and uncomfortable.

Why does the mere mention of data governance provoke that kind of response?

One reason is probably that the potential scope of a data governance effort is so wide. It could basically involve every possible task associated with data management. 

Further, the word “governance” emphasizes the importance of taking those tasks seriously, and getting them right. So when you combine “there’s a lot to do” with “and it’s all important,” fear kindles in the hearts of those responsible.

And rightly so: the consequences of poor data governance are significant. They range from regulatory fines and sanctions for failing to adequately protect data or for noncompliance, to the insidious costs of bad data quality, such as missed business opportunities due to poor decision-making or lost customers due to low service levels.

But there are a lot of “big and important” topics in IT, and they don’t all make a diverse group of seasoned professionals wince. I decided to do some research and dig a little deeper into why data governance seems to be outside our collective comfort zone.

One thing that came up right away is that data governance is defined and described in diverse ways. Moreover, the terms used to describe the activities or responsibilities that comprise data governance aren’t defined or used the same way by everyone. Anytime I tried to define a term, I’d find another term that meant the same thing… sometimes, depending on context. In other words, the definitions tend to morph depending on one’s viewpoint.

That variability and inconsistency made just framing this blog post difficult—never mind a program that “…includes the people, processes and technologies needed to manage and protect the company’s data assets…” and impacts an organization at strategic, tactical and operational levels. 

Indeed, there’s an axiom in management theory that “You can’t manage what you can’t name.” Further, “You can’t properly manage what you don’t define explicitly.” In other words, how you define a data governance program will significantly impact your ability to manage it successfully.

Given that a key element of data governance is ensuring the consistency of data definitions across an organization, I find it ironic that we don’t have a consistent, agreed-upon definition of terms for the components of data governance itself.

Normally when I write about a complex topic, I break it down into a list of subtopics and then decompose each of those—similar to how I would attack a complex software development project or database design endeavor. But all the variability and overlap among terms that I encountered around data governance forced me to change not only my approach to writing this post, but the whole focus of the post. 

Instead of working top-down, I had to work bottom-up. Below I list some subheadings that are parts of data governance, followed by the tasks and responsibilities that relate to them. Your mission—if you choose to accept it—is to take a few minutes to decide under which subheading you would place each task. 

So here are the subheadings that I started with:

  • Data Management (aka Database Management)
  • Data Security
  • Data Stewardship
  • Data Quality
  • Master Data Management
  • Regulatory Compliance (GDPR, PCI, HIPAA)

Here is my list of many (but by no means all) of the critical tasks that need to be completed in order to ensure that your data is relevant, available, secure, and optimized (i.e., “governed”). 

Under which subheading would you put each of these tasks if you were to document your data governance activities?

  • Data Encryption
  • Data Masking
  • Data Access Control
  • High Availability
  • Disaster Recovery
  • Data Lifecycle Management
  • Data Version Tracking
  • Data Custody Tracking and Control
  • Data Provenance Tracking
  • Change Tracking and Management
  • Data Access Auditing
  • Data Update Auditing
  • Data Validation
  • Define Business Rules for Data
  • Metadata Management and managing consistent data definitions
  • Managing Taxonomies and Naming Conventions

Some of the tasks seem to relate to an obvious subheading, such as Metadata Management and Taxonomies and Naming Conventions being grouped under Master Data Management. Or grouping Data Encryption, Data Masking and Data Access Control under Data Security. 

But you could group Data Access Control under Data Stewardship as well, along with many other tasks. In fact, Data Stewardship is used somewhat interchangeably with Data Governance… sometimes. And which tasks fit under Compliance? Maybe all of them? 

My personal takeaway from all this is that it may be better to look at this particular issue from the bottom up instead of the top down. When wrapping our minds around data governance, we might want to look at all the relevant lower-level tasks (lower in this hierarchy, not in importance), and think about what is involved in each and what tools can help us implement them.

Don’t get too caught up with definitions of terms or with categorizing tasks into subgroupings, as I did for the purposes of discussion. At least when it came to writing this blog post, I found that to be the most intimidating part.

Are you looking for strategic, tactical and/or operational support around a data governance program or related initiative? Contact Buda Consulting and let’s talk about where you are, where you need to be and how we can help.

Architect Your Oracle Database for Efficient Backup and Recovery

Architecting for Backup and Recovery Efficiency with Oracle RMAN

Database architecture is critical to achieving many business objectives. These include application performance, business continuity, security, and geographic distribution.  This article will explore another often overlooked objective that can be influenced by the database architecture: backup optimization and recovery efficiency.

Very large databases are common in today’s business environment. Multi-terabyte databases are prevalent in all but the smallest organizations. Despite these large sizes, data from earlier time periods tends not to change and tends to be used less frequently than more current data. When architecting the database, the age of the data (and therefore the likelihood that it will change) can be used as a factor in the physical database design to optimize backup and recovery efficiency and performance.

RMAN Backup Optimization

It is a given that all data must be backed up. But taking that backup puts load on the database that impacts application performance during the time that the backup is running. Common approaches to mitigating this impact include backing up from a standby database rather than the production database, taking offline database backups when the application is not available, and restricting the backup time to non-peak times so the machine resource usage is minimized.

However, in some environments those options are not available. The first option, backing up from a standby database, may not be possible if you don’t have a high availability environment. Bringing the database down is not an option in a 24×7 production environment. And many databases are so large that the time it takes to back up the entire database simply exceeds the non-peak window for the application.

Partitioning and Archiving

Another technique is to build partitioning and archiving into the database architecture. We can partition large tables by time period and place each partition in its own tablespace. This allows us to isolate data from past time periods that is kept for archival purposes but is not frequently queried and will never be updated. These tablespaces can be backed up once the data reaches the point where it will not change again, and then excluded from the normal backup routine. In many databases, older data represents a very large portion of the overall database, so such a scheme can significantly reduce the backup time, and thereby the impact on the application.
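
As a rough sketch of this idea in Oracle (the table, column and tablespace names are made up, and the tablespaces are assumed to already exist), each month’s data lands in its own tablespace:

    CREATE TABLE sensor_readings (
        sensor_id     NUMBER,
        reading_time  DATE,
        reading_value NUMBER
    )
    PARTITION BY RANGE (reading_time) (
        PARTITION p_2018_12 VALUES LESS THAN (DATE '2019-01-01') TABLESPACE ts_2018_12,
        PARTITION p_2019_01 VALUES LESS THAN (DATE '2019-02-01') TABLESPACE ts_2019_01,
        PARTITION p_2019_02 VALUES LESS THAN (DATE '2019-03-01') TABLESPACE ts_2019_02
    );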

There are a number of ways to exclude tablespaces from the backup after they have reached the point where they will not be updated again, including:

  • Making the tablespaces read-only and configuring Backup Optimization in Oracle RMAN. After this, RMAN will back up the tablespaces enough times to satisfy the retention policy and then exclude them from subsequent backups.
  • Using the RMAN CONFIGURE EXCLUDE FOR TABLESPACE command. Once configured, the specified tablespace will be excluded from future backups. Such tablespaces can still be included explicitly in other backup sets to ensure the data is backed up, while being excluded from full backups.

Here is an example of how we might use this: let’s say we have an Oracle database application that collects traffic sensor data. Each day we collect a large set of data from traffic sensors in municipalities around the country. We have very large tables that contain hundreds of datapoints per sensor, and each table holds hundreds of gigabytes of data stretching back 10 years. The tables are partitioned so that a new partition is created for each month, and as data is collected it is automatically placed into the proper partition. At the beginning of each year, we can take a single backup of all the tablespaces that hold the prior year’s data. We know that data will never change, so we do not have to include those tablespaces in future backups. We can set these tablespaces read-only, and with backup optimization turned on, RMAN will exclude them from subsequent backups while still enforcing the backup retention policy, so you won’t lose the backup sets needed to restore those tablespaces. An added benefit is that each week’s backup set will be significantly smaller, reducing disk requirements for the ongoing backups.
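
In command terms, the yearly step described above might look roughly like this (the tablespace name and retention policy are illustrative):

    -- In SQL*Plus: freeze last year's partition tablespace once its data is final.
    ALTER TABLESPACE ts_2018_12 READ ONLY;

    -- In RMAN: with backup optimization on, read-only tablespaces are skipped
    -- once enough backups exist to satisfy the retention policy.
    CONFIGURE BACKUP OPTIMIZATION ON;
    CONFIGURE RETENTION POLICY TO REDUNDANCY 2;

    -- Alternatively, back the tablespace up one last time and then exclude it
    -- from future full backups explicitly.
    BACKUP TABLESPACE ts_2018_12;
    CONFIGURE EXCLUDE FOR TABLESPACE ts_2018_12;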

Restore Efficiency

In addition to significantly reducing backup time, partitioning the data in this way also improves the efficiency of the restore process: if the storage behind one partition’s tablespace is lost, the other tablespaces do not need to be restored. This can result in significant time savings during a restore.

Other Benefits

There are other benefits to partitioning your data beyond the improvements to the backup and restore process. By separating older data, which typically does not change and is accessed less frequently, from newer data, we can place the older data on less costly media. And regardless of the media type, there are performance benefits to separating data onto different drives/controllers (particularly useful when using separate storage arrays as opposed to SAN environments).

Thinking Ahead

When architecting a database, think about what the backup and RMAN database recovery process will look like after 10 years of growth. Building backup and restore efficiency into the database design up front will save a lot of redesign later on.

If you are struggling with cumbersome backup and restore processes, or are about to undertake a database design or redesign, please give us a call at (888) 809-4803 x700. And if you have further thoughts on the topic, please add a comment!

If you enjoyed this article please like and share!

Using Oracle GoldenGate for Real-Time Data Replication and Integration

Oracle GoldenGate is Oracle’s best-of-breed, strategic solution for real-time data replication and integration. It offers a rapid, robust and low-overhead approach for moving small or large volumes of transactional data between diverse operational and analytical systems—including not just various Oracle Database versions but also non-Oracle databases like SQL Server, Sybase, IBM DB2 and others.

Though extremely powerful and flexible, Oracle GoldenGate is straightforward to install and configure. Its transformation, filtering and custom processing features make it suitable for a wide range of use cases, including:

  • Data warehousing and analytics/decision support
  • High availability and disaster recovery (HA/DR) scenarios
  • Database loading and migration

One of our DBA consultants has been working with a complex, mission-critical, enterprise-wide GoldenGate implementation that illustrates how much you can do with this offering. It is being used to enable a “Know Your Customer” application, where data is replicated in real-time between production and DR (“hot-hot”) sites. The client opted to use GoldenGate over Oracle’s physical standby capability because they needed read-write access at both sites.

GoldenGate replicates data among four main regions and three satellite regions worldwide in a complex, multi-master source/target architecture. Data can flow in real-time from any site to any site. Intercontinental and regional data restrictions mean that sensitive data must be masked before sending it out of some regions. This requirement ruled out most competing solutions, which provide only storage-level replication with little flexibility for transforming or filtering the data in between sites.

This implementation also leverages GoldenGate’s DDL replication support, which further supports high availability. Anytime a table is created or modified, the DDL (and hence the modified data) is replicated across the topology. For even greater data reliability, the implementation has been extended via a “self-healing” agent that identifies any data inconsistencies and attempts to fix them automatically, with no human intervention.

As you can see, GoldenGate has a very flexible architecture. It can handle many data extraction/transformation or conflict resolution scenarios out-of-the-box or with minimal customization using built-in functions. It plays very well with Oracle RDBMS, and offers endless possibilities for writing custom PL/SQL functions or native C language routines to get the functionality you need.

Click Here to find out more about Oracle GoldenGate use cases and benefits, and how to get started.

Architecting to Maximize Recovery Options in Oracle

I recently received a frantic call from a client that believed they had been hacked and needed to recover data quickly. They said that data belonging to two or more of their customers had been lost.

Our customer functions essentially as an application service provider (ASP). Their customers’ data is in an Oracle database that our client manages. Our client organizes this database such that each of its customers’ applications is served by a separate schema and they all share one database.

We have advised this client on numerous occasions to separate each of their customers’ data into separate databases, or at least separate tablespaces. This is a good idea for several reasons, one of which is recoverability. Unfortunately, they resisted our suggestions and today are probably regretting that decision.

Oracle Recovery Manager (RMAN) offers a few different options for recovery. You can recover an entire database, an individual tablespace or an individual data file. But you cannot recover an individual schema (table owner) and its objects unless they are in their own tablespace.

In the case of our client, it seems that the tables were lost at some time on a Sunday night, just prior to the nightly logical backup (export). The last good logical backup was from Saturday night.

The database is in ARCHIVELOG mode, meaning that RMAN could restore the database to any point in time, including right up to the point of the data loss. However, since the schemas (each of which serves a different customer) all share the same set of tablespaces, this type of recovery would wipe out any data since that point in time—even for the schemas (customers) that were not impacted by the loss.

Because our client’s customers that were not impacted had activity since the data loss event, we had one less tool in our recovery arsenal. If our client’s customer data had been separated into separate tablespaces or databases, we could have recovered data for the customers that suffered loss without impacting the others at all.
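
For instance, had each customer lived in its own tablespace, RMAN tablespace point-in-time recovery (TSPITR) could have rewound just the affected customer’s tablespace, along these lines (the tablespace name, timestamp and auxiliary destination are made up):

    -- Rewind only the damaged customer's tablespace to just before the loss;
    -- other customers' tablespaces are untouched.
    RECOVER TABLESPACE customer_a
        UNTIL TIME "TO_DATE('2019-06-02 21:00:00','YYYY-MM-DD HH24:MI:SS')"
        AUXILIARY DESTINATION '/u01/app/oracle/aux';

    -- TSPITR leaves the tablespace offline; take a fresh backup, then bring it
    -- back online (ALTER TABLESPACE customer_a ONLINE; from SQL*Plus).
    BACKUP TABLESPACE customer_a;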

We are now in the process of recovering the lost data from the Saturday logical backups. When that is complete, we will be doing a full RMAN restore to another location, where we will attempt to recover any lost data since the logical backup was taken. This will be a very arduous and time-consuming process.

The moral of the story is to consider recoverability when architecting your Oracle database. If you have users or applications that stand alone and may need to be recovered without impacting others, separate them at least by tablespace; and, if possible, use a separate database. The more you separate, the greater your recoverability options are.

It’s worth noting that the pluggable database option in Oracle 12c might assist in recoverability even further, if the reason for not separating your schemas into separate databases was ease of maintenance or resource concerns. With 12c you can create a pluggable database for each of your logically separate applications or users, while keeping the administration centralized and the overhead minimized. Check it out.

If you have had similar difficulties with restoring data due to the database architecture, please share your experiences. To talk over your database architecture considerations, contact Buda Consulting.