Schemas, Databases and Instances—Defined and Discussed

What is a schema as opposed to a database as opposed to an instance? And how do schemas differ between Oracle and MSSQL? Or between Oracle and PostgreSQL? Or between PostgreSQL and MongoDB?

These terms can be confusing, but they are very important when planning a database architecture. So let’s define these terms and discuss conceptually how they are similar and how they differ between database software implementations. This post will focus mostly on schemas, with some references to the other terms for context. 


What is a database?

A database is the collection of database files that contain the data being stored. These files hold both the user data and the metadata (data dictionary) that the database needs to make sense of the user data.  The metadata includes the schema definitions (where applicable) as described below.


What is a database instance?

A database instance is the collection of all of the database software processes plus any memory structures required by those processes, plus the database files where the database data is stored. (See diagram)

The different software vendors treat the relationship between databases and instances in different ways.

Oracle supports one database per instance unless you are working with 12c or later and using Oracle Multitenant.

PostgreSQL supports multiple databases per instance.  Some system catalogs are shared across all databases in an instance.

MSSQL supports multiple databases per instance. Each instance has a set of system databases that are shared across all databases served by that instance. 

MongoDB supports multiple databases per instance.


What is a schema?  

The concept of a schema can be a little confusing because there are three different relevant uses of the word “schema” in the context of an IT project. 

    1. Merriam-Webster defines a schema as “a structured framework or plan, an outline.”
    2. In the realm of database technology, a schema means a structural definition of the data that you are storing. This essentially defines the datatypes of the data you are storing, and the organization of that data (into tables, documents, indexes, constraints, etc.). This can be expressed in the form of a diagram such as an entity relationship diagram (ERD), or in a set of data definition language (DDL) statements, or in a JSON object.
    3. Some database vendors have extended the concept of a schema to include not just a definition of the structure of a set of data, but also a particular collection of objects that contain the data (tables, etc), and even the data itself. This is sometimes a named collection and is typically based on one of these factors: 
      • Who owns the objects (a database user)
      • Who should have access to the objects (e.g., a database role that may be assigned to users)
      • What the objects are used for (e.g., all objects for a given application or function within an application)   

When implemented in this fashion, a schema can also be thought of as a namespace. An object can have the same name in two different schemas and the two objects will be distinct from each other. 
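The namespace idea can be tried out in a few lines. The following is only a minimal sketch: it uses SQLite through Python's built-in sqlite3 module as a stand-in, because SQLite addresses each attached database by a schema name. SQLite is not one of the products discussed in this post, and the schema and table names (payroll, jsmith, employees) are made up for illustration.

```python
import sqlite3

# SQLite is used only as a convenient stand-in; each attached database
# is addressed by a schema name, which gives two separate namespaces.
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS payroll")
conn.execute("ATTACH DATABASE ':memory:' AS jsmith")

# The same table name exists in both schemas without conflict.
conn.execute("CREATE TABLE payroll.employees (name TEXT)")
conn.execute("CREATE TABLE jsmith.employees (name TEXT)")

conn.execute("INSERT INTO payroll.employees VALUES ('Alice')")
conn.execute("INSERT INTO jsmith.employees VALUES ('Bob')")

# Qualifying the table name with the schema selects the intended object.
payroll_rows = conn.execute("SELECT name FROM payroll.employees").fetchall()
jsmith_rows = conn.execute("SELECT name FROM jsmith.employees").fetchall()
print(payroll_rows)
print(jsmith_rows)
```

Each query returns only the row that was inserted into its own schema, even though both tables are named employees.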

It is interesting to note that MongoDB, which is a document database as opposed to a relational database, is sometimes called a schema-less database. MongoDB also has the concept of a schema, but it is purely a description of the structure of the data, more like definition 2 above than 3. A MongoDB schema does not represent the actual instance of the data, as it does with the relational databases mentioned. 

To summarize, within the context of database management software, a schema is either a set of objects that contain data that is related in some logical way (user, access, application), or simply a definition of the structure of data. 

Examples

Here are some examples of schemas (see the diagram below):

    1. Schema JSMITH:  A schema that contains all of the tables that belong to user Jsmith. This schema would typically simply be named the same as the user, and is often created automatically when that database user account is created. When the user connects to the database, this will typically be his default schema. So any objects that he creates will automatically be part of that schema. When he issues a query, unless he specifies a schema name as part of the name of the object he is querying, or changes his schema search path (this is done differently by each database vendor),  the result set will come from the object by that name that exists in his default schema. 
    2. Schema PAYROLL: A schema that contains all of the tables for the payroll application. This schema would typically be named for the application or a functional area within the application. When accessing data from the Payroll schema, users will need to either set their schema search path to the Payroll schema, or prefix all object names in the query with PAYROLL.
    3. Schema DBO: This built-in schema in MSSQL is the default for all users unless otherwise specified. In many SQL Server databases, almost all objects end up here.  This is similar to the public schema in PostgreSQL.
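The way an unqualified object name is matched against a default schema or search path can be sketched roughly as follows. This is a simplified illustration, not how any particular vendor implements it; real products add further rules (synonyms, case folding, temporary objects and so on). The catalog contents and the resolve function are hypothetical.

```python
# Hypothetical catalog: schema name -> set of table names it contains.
catalog = {
    "JSMITH": {"NOTES", "EMPLOYEES"},
    "PAYROLL": {"EMPLOYEES", "PAYCHECKS"},
}


def resolve(name, search_path, catalog):
    """Return (schema, table) for a possibly schema-qualified table name."""
    if "." in name:
        # Qualified name: the schema prefix decides, the search path is ignored.
        schema, table = name.split(".", 1)
        if table in catalog.get(schema, set()):
            return schema, table
        raise LookupError(f"{name} not found")
    # Unqualified name: the first schema on the search path that has it wins.
    for schema in search_path:
        if name in catalog.get(schema, set()):
            return schema, name
    raise LookupError(f"{name} not found on search path {search_path}")


# JSMITH's default schema is searched first, so his own table is found.
print(resolve("EMPLOYEES", ["JSMITH"], catalog))
# Prefixing with PAYROLL reaches the payroll table of the same name.
print(resolve("PAYROLL.EMPLOYEES", ["JSMITH"], catalog))
```

Adding PAYROLL to the search path would let the user reach PAYCHECKS without a prefix, which is the effect of setting the schema search path described in example 2.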


How does a schema differ between database vendors? 

MSSQL and PostgreSQL have an actual object in the database called a schema. You can create and drop a schema, and you can assign access rights and ownership to a schema as a whole. In these environments, there is a loose connection between a schema and a database user. A schema may be owned by a database user. But a database user does not have to own any schemas. A schema may also be owned by a role instead of an individual user. If you want to drop a user that owns a schema, you must first transfer ownership of the schema to another user or role (or drop the schema).

Oracle has the concept of a schema but it does not really have an object in the database called a schema. It is more conceptual. In Oracle, each database user may be an owner of objects, and the collection of objects owned by a given user is considered a schema. A schema in Oracle does not exist independently of a database user. If a database user is to be dropped, all objects owned by that user (in that user’s schema) must either be dropped first or be dropped along with the user by using DROP USER ... CASCADE. Oracle does have a CREATE SCHEMA statement, but despite its name it does not create a schema or a user. It lets an existing user create multiple tables and views, and issue grants, within their own schema in a single transaction. There is no DROP SCHEMA statement; removing a schema means dropping the user. One cannot transfer ownership of an object from one user to another. The new user would need to recreate the object. (A CREATE TABLE AS SELECT, or CTAS, statement may be helpful here.)

MongoDB, as mentioned earlier, uses the concept of a schema in database design and in the validation of the structure of incoming data. (Nice blog on this here.) But there is no object in the database known as a schema. 
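To make the "schema as a description of structure" idea concrete, here is a small sketch of a structural definition and a check of incoming documents against it. It is a hand-rolled illustration only, loosely inspired by JSON Schema; it is not MongoDB's validation mechanism, and the employee_schema layout and validate function are hypothetical.

```python
# A hypothetical structural definition: which fields must be present
# and which Python type each known field should have.
employee_schema = {
    "required": ["name", "salary"],
    "properties": {"name": str, "salary": (int, float), "department": str},
}


def validate(document, schema):
    """Return a list of problems; an empty list means the document conforms."""
    problems = []
    for field in schema["required"]:
        if field not in document:
            problems.append(f"missing required field: {field}")
    for field, expected in schema["properties"].items():
        if field in document and not isinstance(document[field], expected):
            problems.append(f"wrong type for field: {field}")
    return problems


print(validate({"name": "Alice", "salary": 5000}, employee_schema))
print(validate({"name": "Bob"}, employee_schema))
```

Note that the schema here describes the shape of the data but holds none of it, which is the distinction between definition 2 and definition 3 above.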

I hope this post helps pull together the concept of a schema and the way the different vendors have implemented schemas.

Please comment with any questions or examples that you think might be helpful, including for database vendors that are not listed here. Also, if you disagree with the way I defined schema, please let me know how you see it.

To talk over any questions you may have around schemas or database architecture in general, contact Buda Consulting.

Architecting to Maximize Recovery Options in Oracle

I recently received a frantic call from a client who believed they had been hacked and had to quickly recover data. They said that data belonging to two or more of their customers had been lost.

Our customer functions essentially as an application service provider (ASP). Their customers’ data is in an Oracle database that our client manages. Our client organizes this database such that each of its customers’ applications is served by a separate schema and they all share one database.

We have advised this client on numerous occasions to separate each of their customers’ data into separate databases, or at least separate tablespaces. This is a good idea for several reasons, one of which is recoverability. Unfortunately, they resisted our suggestions and today are probably regretting that decision.

Oracle Recovery Manager (RMAN) offers a few different options for recovery. You can recover an entire database, an individual tablespace or an individual data file. But you cannot recover an individual schema (table owner) and its objects unless they are in their own tablespace.

In the case of our client, it seems that the tables were lost at some time on a Sunday night, just prior to the nightly logical backup (export). The last good logical backup was from Saturday night.

The database is in ARCHIVELOG mode, meaning that RMAN could restore the database to any point in time, including right up to the point of the data loss. However, since the schemas (each of which serves a different customer) all share the same set of tablespaces, this type of recovery would wipe out any changes made since that point in time, even for the schemas (customers) that were not impacted by the loss.

Because our client’s customers that were not impacted had activity since the data loss event, we had one less tool in our recovery arsenal. If our client’s customers’ data had been separated into separate tablespaces or databases, we could have recovered data for their customers that suffered loss without impacting the others at all.
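The effect of the tablespace layout on recovery options can be illustrated with a rough sketch. It only models which schemas share tablespaces with the damaged ones; it is a simplification that ignores the many other constraints of a real point-in-time recovery. The customer names, tablespace names and the collateral_schemas function are hypothetical.

```python
# Hypothetical layouts: schema (customer) -> tablespaces holding its objects.
shared_layout = {
    "CUSTOMER_A": {"USERS"},
    "CUSTOMER_B": {"USERS"},
    "CUSTOMER_C": {"USERS"},
}
separated_layout = {
    "CUSTOMER_A": {"TS_A"},
    "CUSTOMER_B": {"TS_B"},
    "CUSTOMER_C": {"TS_C"},
}


def collateral_schemas(damaged, layout):
    """Schemas that would also be rolled back if the damaged schemas'
    tablespaces were recovered to an earlier point in time."""
    recovered = set()
    for schema in damaged:
        recovered |= layout[schema]
    return sorted(
        schema
        for schema, spaces in layout.items()
        if schema not in damaged and spaces & recovered
    )


# Shared tablespaces: recovering one customer drags the others back in time.
print(collateral_schemas({"CUSTOMER_A"}, shared_layout))
# One tablespace per customer: no other customer is affected.
print(collateral_schemas({"CUSTOMER_A"}, separated_layout))
```

With the shared layout every other customer is collateral damage, while the separated layout yields an empty list.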

We are now in the process of recovering the lost data from the Saturday logical backups. When that is complete, we will be doing a full RMAN restore to another location, where we will attempt to recover any lost data since the logical backup was taken. This will be a very arduous and time-consuming process.

The moral of the story is to consider recoverability when architecting your Oracle database. If you have users or applications that stand alone and may need to be recovered without impacting others, separate them at least by tablespace; and, if possible, use a separate database. The more you separate, the greater your recoverability options are.

It’s worth noting that the pluggable database option in Oracle 12c might assist in recoverability even further, if the reason for not separating your schemas into separate databases was ease of maintenance or resource concerns. With 12c you can create a pluggable database for each of your logically separate applications or users, while keeping the administration centralized and the overhead minimized. Check it out.

If you have had similar difficulties with restoring data due to the database architecture, please share your experiences. To talk over your database architecture considerations, contact Buda Consulting.

3 Oracle DBA Strategic Responsibilities to Outsource for Maximum Business Value

Many organizations have limited their Oracle DBA outsourcing to tactical/operational functions like monitoring, backup and patching. But the changing technology landscape is transforming the DBA role and requiring greater expertise.

Today’s enterprise data stores are increasingly massive and architecturally complex. They must support e-commerce and other critical applications that are web-based and demand continuous availability, on-demand scalability, robust security, maximum performance and mobile device support. Further, more and more of the data that drives these complex, critical applications resides in the cloud.

To meet these new demands, organizations of all sizes increasingly need strategic Oracle DBA capabilities in addition to tactical DBA roles. But it can be hard to hire and retain expert Oracle DBAs with these skills, especially when IT budgets are constrained and the skills are in short supply.

In this increasingly common scenario, outsourcing expert Oracle DBA skills to a trusted, onshore partner can be a lifesaver. Not only can you potentially save money, but you can also improve the availability, security and business value of your most valuable asset: your data. This can give you an edge in the marketplace by improving decision-making, shortening time-to-market for new applications and delivering better performance and service to customers.

What strategic Oracle DBA services should you consider outsourcing? Start with the capabilities that are the most in demand, are the hardest to cultivate in-house and offer the greatest benefit. These three will top the list for many organizations:

One: Data integration

Big data and analytics are all about integrating multiple data sources to streamline access while reducing management and storage complexity. At many companies data is siloed, making it difficult to get a comprehensive view of the business. When competitors are successfully leveraging more accurate and comprehensive intelligence, can your business afford not to take these steps? Hiring an expert Oracle DBA consultant can reduce cost, risk and time-to-value.

Two: Database architecture and design

Virtualization, cloud services and clustering offer new ways to derive greater value from existing infrastructure investments. An expert Oracle DBA can help you design and implement a massively scalable and available database system that meets the dynamic needs of a global business. The ability to understand, articulate and address the requirements of both the business and its customers is key to this strategic role.

Three: Cloud services

Setting up, managing and scaling an Oracle RDBMS in the cloud takes more than basic capacity planning and administration. It requires a database that has been properly architected and performance tuned to deal with the cloud’s challenges, especially security and compliance. This is perhaps more obvious in regulated industries where compliance concerns are paramount, but many businesses moving data to the cloud face comparable concerns.

There’s a lot of buzz these days about making your business “data driven.” Highly experienced senior database professionals will be key to moving in that direction, and they’re in increasingly short supply. Contact Buda Consulting to discuss your Oracle DBA strategy and needs.