Roles vs Direct Database Privileges

A colleague asked me today for my opinion on database security and the best way to grant certain database privileges to a few users in a PostgreSQL database. I will share my thoughts here, and I welcome yours as well. The basic database security concepts here apply to any relational database, including Oracle, SQL Server, and MySQL, or to any database that implements roles for security. They also apply to application security roles, where access control is managed in the application rather than the database, as is often the case.

My colleague needed to give certain users the ability to kill other processes, and he was struggling with how to structure the privilege. In PostgreSQL, the privilege to instruct another process to terminate is conferred by the default role pg_signal_backend. He was deciding between granting that role directly to the users in question, or creating a role called something like Manage_Other_Processes and granting that to them.

Here is how I think about using roles. 

A role is really a business role

Basically, one should grant a privilege to a role rather than directly to a user when that privilege will be granted to a group of users rather than just one, specifically a group of users that perform the same business function. One benefit of this approach is that it simplifies replicating one user's privileges for another user, as when one user leaves the company and is replaced by another.

A privilege should also be granted to a role when that privilege enables the user to perform a certain function, and when it is likely that other privileges will also be required in order for a user to perform that same function.

These considerations really get to the whole idea of roles in the first place. A role really refers to the role that the individual receiving the privilege plays in the organization. I think its original intent was not to be a database construct, but that is how many think of it now. This misalignment is particularly reflected in the naming of the pg_signal_backend role in PostgreSQL; more on that later.

Database Privileges, Security Best Practices, Keeping it Organized

A key benefit of using roles is organization. A given user may have many privileges: update, delete, insert, and select, each on tables, views, stored procedures, and so on. Add in system privileges and a typical user has a lot of privileges. Managing privileges on that many objects is a challenge, and the best way to manage a large number of things is to categorize and label them. That is what roles accomplish.

For example, I can group together all the privileges on stored procedures, tables, views, and other database objects required to manage client records, and grant them to a role called manage_client_records. And I can group together all of the privileges required to manage employee records, and grant them to a role called manage_employee_records.
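
As a minimal sketch of what that grouping looks like in SQL (the role, table, and function names below are hypothetical, and the syntax shown is PostgreSQL-flavored):

  -- Bundle the privileges needed to manage client records into one role
  CREATE ROLE manage_client_records;
  GRANT SELECT, INSERT, UPDATE, DELETE ON clients, client_addresses TO manage_client_records;
  GRANT EXECUTE ON FUNCTION merge_client_accounts(integer, integer) TO manage_client_records;

  -- A separate role for the employee-records business function
  CREATE ROLE manage_employee_records;
  GRANT SELECT, INSERT, UPDATE, DELETE ON employees, employee_benefits TO manage_employee_records;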

Database Security and adding new users

Rather than remembering that I need to grant execute permissions on 2 stored procedures and 10 tables for managing the employee records, and on 3 procedures, and 15 tables for managing customer records, I can simply grant all of those privileges to the appropriate roles once, and grant those roles to the proper users in one simple statement.
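
A sketch of what onboarding then looks like, assuming the hypothetical roles above and a hypothetical new user:

  -- New hire who needs both business functions: two grants instead of dozens
  CREATE USER jsmith WITH PASSWORD 'change-me';   -- PostgreSQL syntax; adjust per platform
  GRANT manage_employee_records TO jsmith;
  GRANT manage_client_records TO jsmith;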

Ease of removing or changing user access

Perhaps most importantly, I can revoke all those privileges by simply revoking the roles, enhancing security by reducing the possibility of human error resulting in dangling privileges when someone changes roles in the company. 
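
The cleanup is just as small (again using the hypothetical names above):

  -- Job change or departure: revoke the role, and every underlying privilege goes with it
  REVOKE manage_client_records FROM jsmith;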

Ease of managing application enhancements and changes

If the developers add functionality to the application, resulting in new tables, views, or other database objects that will require access by certain application users, these new privileges can be granted to the appropriate roles, and all users that have that role will receive that privilege. No need to individually grant the privileges to individual users.
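
For example, if a release adds a new table to the client-records function (a hypothetical name below), a single grant to the role covers every user who holds it:

  -- One grant to the role; no per-user changes needed
  GRANT SELECT, INSERT, UPDATE, DELETE ON client_preferences TO manage_client_records;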

Discovery and User Access reporting

When we do database security assessments, we often generate reports that show which users have privilege to access tables, execute stored procedures, and change system configuration.

What management really wants to know, however, is not what table a user can access, they want to know what business functions each user can perform and what data they can read or edit in that capacity. Here is where using roles really shines.

A report showing the set of users that can view or manage client accounts is much more useful to management than a report that shows a set of users that have select or edit privilege on the client table, and the client address table, and the client account table, and the client transaction table, etc. Management needs to be able to see quickly what capabilities users have, and roles make it much easier for them to see that. Imagine a report showing 10 users that have been granted the manage_client_data role, and 15 that have been granted the view_client_data role. Twenty-five lines that tell the complete story. Contrast that with a report of hundreds of lines showing all the tables and stored procedures that all users have access to. Of course, a detail report will be useful as well for deep analysis, and that can be generated when using roles, too.

Database Privileges and System Roles

I used application-related roles as examples in this article, but the same concepts apply to system roles and application-owner roles like the one my colleague asked about, which motivated me to write this article. This deserves a little more discussion: some readers may disagree with my thoughts here, and I was definitely on the fence about it myself. Please comment and add your thoughts if you think differently.

The privilege he asked about was actually already a role, not a privilege. pg_signal_backend is a role that enables the user to terminate processes owned by other users (except superusers). While it is already a role, it is so narrowly defined that it does not satisfy the real intent of a role as I discussed above. It would not be surprising if other privileges (or roles) of this nature turn out to be needed by the same users, given that they need to control other processes. I would rather see a better defined (and better named) role, like Manage_Other_Processes, that includes this role and any others that end up being necessary. That role can then be granted to any other users that need this capability.
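
In PostgreSQL that wrapper role is only a few statements (the role and user names are hypothetical; note that unquoted identifiers fold to lowercase):

  -- Wrap the narrow built-in role in a business-level role
  CREATE ROLE manage_other_processes;
  GRANT pg_signal_backend TO manage_other_processes;

  -- Grant the business role, not the built-in one, to the users who need it
  GRANT manage_other_processes TO ops_user1, ops_user2;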

Similar to my discussion about user access reporting above, a role with a name like Manage_Other_Processes will tell much more during a user access report than one with the name pg_signal_backend.  

To Role or not to Role

So at the end of the day, when designing a security scheme, I try to use roles wherever it is likely that the same business function requires multiple privileges, or where the same privileges are likely to be assigned to multiple users. Please share your thoughts and contact us for more information.

SQL Server Vulnerability Assessment – Keep Your SQL Database Safe With This Microsoft Tool

By now you all know how hackers are having their way with businesses all over the world. I don't need to give examples to remind you of that. Some are mentioned here and here, and I've written a number of blogs about the importance of protecting the database here, here, and here.

So instead of talking about those issues again, let's dive right in and discuss one of the simplest ways to identify typical vulnerabilities in your SQL Server database. This is a tool that is already available to you and that can significantly reduce your risk.

Microsoft provides a tool called the Vulnerability Assessment tool that will scan your databases for typical vulnerabilities. These include configuration errors, excessive permissions, and permissions granted to users rather than to roles, among others. The checks look for violations of best practices for managing a database. Before this tool was released, one had to use a third-party vulnerability assessment tool like Trustwave's AppDetective Pro, or manually run scripts to find such vulnerabilities.

How This Assessment Tool Compares With Third-Party Tools

I have used third-party tools like Trustwave's and Imperva's to identify vulnerabilities in customer systems, and I have used Microsoft's Vulnerability Assessment (VA) as well. While I have not produced a master list of the checks each tool performs in order to do a direct comparison, my sense is that VA checks for fewer vulnerabilities. AppDetective Pro also adds other features like a discovery tool, a penetration test, and a user rights review (more on that later), but here we will focus mostly on the vulnerability assessment itself.

If you have not taken any steps to secure your database, then using the SQL Server Vulnerability Assessment tool, and taking action based on its recommendations, will probably get you 90% of the way to a secure database. I am not suggesting that you should stop there. 90% is not good enough. But 90% is much better than 0%, which is where you might be if you haven’t run any vulnerability scan at all.

An Overview Of The SQL Server Vulnerability Assessment Tool

I will mention a few highlights here to give a sense of what kinds of things are covered and will provide a link below to a comprehensive guide provided by Microsoft.

The SQL Vulnerability Assessment tool compares the configuration of your database to Microsoft's best-practice rules for managing a database from a security perspective. According to Microsoft's guide, 87 rules are currently checked, though some only apply to later versions of SQL Server. The rules are broken down into six categories.

Authentication and Authorization

These rules ensure that only the right people are able to connect to your database.  These address the confidentiality and integrity principles of the Information Security Triad. Authentication deals with ensuring that the users are who they represent themselves to be, and Authorization deals with what data assets they should have access to. Here are a few important rules that are checked in this category:

  • Password expiration check should be enabled for all SQL logins
  • Database principals should not be mapped to the sa account. 
  • Excessive permissions should not be granted to PUBLIC role on objects or columns
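
As a rough illustration of what the first of these checks looks for, a query along these lines flags SQL logins that do not enforce password expiration (a sketch, not the tool's actual implementation):

  -- SQL logins without password expiration enforcement
  SELECT name
  FROM sys.sql_logins
  WHERE is_expiration_checked = 0
    AND name NOT LIKE '##%';   -- skip internal certificate-mapped logins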

Auditing and Logging

These rules check to ensure that what gets done and seen in the database is traceable and provable. This addresses the non-repudiation principle of information security and enables forensic analysis in the event of a suspected security breach. A few sample rules checked in this category include:

  • Auditing of both successful and failed login attempts should be enabled
  • Auditing should be enabled at the server level
  • There should be at least 1 active audit in the system
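
A quick way to see for yourself whether any server-level audit is enabled (again, a sketch of the idea rather than the tool's exact check):

  -- Server audits and whether each is currently enabled
  SELECT name, is_state_enabled
  FROM sys.server_audits;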

Data Protection

Data protection rules are primarily related to encryption. Addressing the confidentiality principle, these rules ensure that data is protected at rest and in transit. Rules such as these are checked:

  • Transparent data encryption should be enabled
  • Database communication using TDS should be protected through TLS
  • Database Encryption Symmetric Keys should use AES algorithm
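
The TDE rule can be spot-checked with a simple query like this sketch:

  -- Which databases have Transparent Data Encryption enabled (1 = encrypted)
  SELECT name, is_encrypted
  FROM sys.databases
  ORDER BY name;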

Installation Updates and Patches

This category would be very helpful, but I am not sure the results can be trusted. When I ran the tool against a SQL Server 2012 database, the check for patches did not appear to execute; it showed up in the result set neither as passed nor as failed. So I do not recommend using this tool to determine whether you are up to date on your patches until this is resolved.

Surface Area Reduction

Rules in this category address all three principles in the information security triad. They focus on protecting the database environment by reducing the threat vectors posed by external interfaces and integrations. Some interesting rules in this category include:

  • CLR should be disabled
  • Unused service broker endpoints should be removed
  • SQL Server instance shouldn’t be advertised by the SQL Server Browser service
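
The CLR rule, for example, corresponds to a server configuration option you can inspect directly; a sketch:

  -- Check whether CLR integration is enabled (0 = disabled, which is what we want)
  SELECT name, value_in_use
  FROM sys.configurations
  WHERE name = 'clr enabled';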

Comprehensive List Of Vulnerabilities

Microsoft provides this reference guide describing all of the vulnerabilities that VA checks, and it indicates which versions of SQL Server each rule applies to. The guide provides a lot of good information about the tool and about securing your SQL Server database in general. It is not perfect, of course; for example, the description of the check related to patches seems to have some cut-and-paste remnants, but there is good information there.

How Buda Employs A Vulnerability Scanner To Protect Our Customers’ Data Assets

When we perform a database security assessment for one of our customers using this tool or one of the other vulnerability scanners, we start of course by running the tool. We then examine the result set and determine the actual risk posed by each of the reported vulnerabilities in the context of the specific database, application, and customer. Often, some of the reported vulnerabilities are mitigated by processes that the organization has in place, or by the nature of the application or the data. After filtering out those that do not represent a real threat, we create a report for management that shows the action items that need to be taken, which may include further analysis. 

For example, some of the rules may fail because no baseline has been created for which users should have access to a given role. Addressing this will involve a study of what roles should be active in the system and who should be granted access to them. This can result in creation of baselines for use in future scans. 

Trustwave's AppDetective Pro, which I mentioned earlier, provides a user rights review report that may be useful for creating those baselines.

Application Authorization Schemes

The authorization- and logging-related checks that these scanners perform (and the Trustwave user rights review) are with respect to actual database users. Many applications, however, use application-based authorization. These vulnerability scanners will not be able to provide insight into user authentication or logging in those cases.

In these cases, we create a user rights review report that identifies what data assets a given application user can access, and we ensure that application logging is robust enough to provide the necessary level of granularity to support the security objectives. 

Where to find it

The Microsoft Vulnerability Assessment tool is available in SSMS v17.x and above, so you may have to upgrade your SSMS (it's free) in order to get this tool. The good news is that it works with all currently supported SQL Server versions.

In Summary

Running the Microsoft Vulnerability Scanner can be an important part of a robust security plan for your SQL Server Databases. Running this scanner is an excellent first step to identify many vulnerabilities, some of which can be easily remediated. 

It is important that an experienced SQL Server database expert implement the recommendations and that additional analysis be done beyond the results produced by the tool. Additionally, when using applications that use application level authentication, deeper study must be done to ensure the security of the data in those applications.

If you like this article, please share it with your colleagues and subscribe to our blog to get the latest updates. If you have any questions, don’t hesitate to contact us as well! 

The Elements Of A Good Disaster Recovery Plan

What is a Disaster Recovery Plan

A disaster recovery plan (also called a business continuity plan) is a document that describes how an organization will survive a disaster. Disasters can be natural events like hurricanes or fires, man-made events like terrorist attacks, or any other event that may prevent the business from operating. A good disaster recovery plan covers everything from how to replace lost personnel to how to relocate everything in the event that an entire building is lost. The plan addresses human loss, product loss, customer loss, technology hardware loss, and data loss. This article will discuss only hardware loss and data loss, but it is important to think of these pieces in the larger context of the overall plan.

Why is a good disaster recovery plan important

When disaster strikes, time is critical, resources are stretched, and capabilities are limited.  Planning ahead for a disaster ensures minimum disruption by identifying everything that might be needed beforehand and ensuring that there is redundancy in each element. For example, a good disaster recovery plan will identify individuals responsible for restoring a database, and a backup to that individual in case the primary individual is lost. The backup individuals can then be trained properly to take action when it becomes necessary.  Once the crisis starts, it is too late to take these steps. 

Elements of a Disaster Recovery Plan

A disaster recovery plan includes many elements that help us be prepared in a crisis. The purpose of identifying all of these up front is to ensure that we have primary and backup people trained for each task that will need to be performed in a crisis, and that we have reliable backups in place for all physical and technical resources (applications, databases, servers, networks, buildings, vehicles, machinery) that will be required in order to stay in business or get back in business after a disaster. Some of the more critical elements of the plan follow. Since this is a database blog, the remainder of this article will focus on applications and databases.

Scenarios

We want to enumerate as many possible disaster scenarios as we can in order to ensure a robust plan. As we describe each scenario in detail, we will find blind spots and address them. Each scenario must describe what may happen, what that will look like, exactly what steps we will need to take to get back in business, and exactly who will take them. Examples of technology-related disasters:

  • Main data center is hit by extended power outage due to flooding damage to regional power grid
  • Infrastructure is hit with ransomware attack
  • Hurricane cuts connectivity to main data center
  • Human error causes loss of a large data table for mission critical applications.
  • Storage system firmware update causes corruption in production database

Inventory of applications (including dependencies on databases) 

Include nameless applications (reporting or analytical tools used against a data mart or data warehouse). Collecting this information on each application will help us know exactly who to call when disaster strikes, saving valuable time. Ensure that every known database is referenced here.

  • Application Owner
  • Recovery Time Objective
  • Recovery Point Objective
  • Responsible IT persons (primary and backup)
    • Application
    • Network
    • Cloud Infrastructure
    • Storage
    • Server
    • Database
    • Backup Maintenance

Test the Elements of a Disaster Recovery Plan

  • Test Procedures for each application in inventory
    • Identify systems to be used for the test restore, if applicable
      • Responsible party to provision these systems
    • Example pre-testing steps
      • Determine which applications/databases are in scope of this test
      • Gather data points to validate. This typically involves finding an example of both recently entered or modified data and old data, to ensure that a full range of timeframes is represented and continues to be available after the recovery.
    • Example steps for conducting the test (some or all of these may be applicable)
      • Fail over to the backup database
      • Restore the backup database
      • Point the application to the test database
    • Example post-testing steps (some or all of these may be applicable)
      • Validate the data points
      • Switch back to the primary
      • Repoint the applications to the primary database
  • Update the Disaster Recovery plan to reflect any lessons learned, staff changes, and new, changed, or decommissioned databases, applications, or hardware.
  • Testing Schedule
    • When will tests be conducted?
      • Frequency (a minimum of twice per year is recommended)
      • What point in the quarter, month, or week
      • Time of day
  • Test Cases
    • Screens/reports to review
    • Data points to validate
  • Responsible parties
    • Who will be responsible for conducting the test?
    • Who will be responsible for validating the results?

Living Document

As with many documents critical to our businesses, this must be a living document. It contains names and contact information for key personnel who must be called in a time of crisis, so it is critical that it be updated regularly to reflect changes in staff and responsibilities. New applications and databases are added regularly as well; these must also be kept current in the plan. A best practice is to update the document each time the tests are conducted.

Database Disaster recovery tools

One key aspect of the recovery plan from a database perspective is the designation of a tool or tools to create standby databases that can be used in the event of a failure of the primary database. Most database vendors provide tools to do this. We will discuss the tools provided by Oracle for this purpose as well as a third-party tool (Dbvisit). Future articles will describe DR options for SQL Server and other databases.

Oracle Data Guard

Oracle provides a tool called Oracle Data Guard that can be used to configure and manage standby databases. Oracle Data Guard provides the capability to create and manage local or remote standby databases, to manage the transition from primary to standby and back, and to create either logical or physical standbys. At the center of Oracle Data Guard is a set of command-line utilities entered at the Oracle console (the SQL prompt). Oracle's enterprise manager tool (Cloud Control) provides a graphical interface on top of Data Guard and simplifies its use.
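
As a quick illustration, once a standby is in place, a couple of standard queries show what role each instance is currently playing and whether redo is being applied on the standby (a sketch; real monitoring goes deeper than this):

  -- On either instance: what role is this database currently playing?
  SELECT database_role, open_mode, protection_mode FROM v$database;

  -- On the standby: is managed recovery applying redo?
  SELECT process, status, sequence# FROM v$managed_standby;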

Oracle Data Guard is included with Oracle Enterprise Edition. A more powerful option that enables greater use of standby databases, called Active Data Guard, is also available for Enterprise Edition. Unlike basic Oracle Data Guard, Active Data Guard carries additional license fees.

DBVisit

Both of the Oracle Data Guard options previously mentioned require Oracle Enterprise Edition; there is no DR solution available from Oracle for Standard Edition. Fortunately, Dbvisit offers one. Dbvisit provides the functionality to create and manage standby databases for Oracle Standard Edition, and it offers a graphical user interface that makes creating and managing a DR solution for Standard Edition simple. The licensing also costs far less than upgrading to Oracle Enterprise Edition. If the only reason for needing Oracle Enterprise Edition is the DR capability of Oracle Data Guard, Dbvisit is a good option.

These are the Elements Of A Disaster Recovery Plan

In summary, a good DR plan should include everything about what an organization must do to recover from an emergency. This includes the who, what, when, where and how for the entire process from the moment that an emergency occurs to when the organization is fully recovered. 

If you would like to discuss creating and implementing a Disaster Recovery Plan, especially the Database Related components of your plan, give us a call and we can talk about the best approach.

Also, please leave a comment with your thoughts about disaster recovery planning. Let me know if you include things I didn't mention, or share stories about how a plan helped in a disaster, or how the absence of a plan hurt 🙁

And if you like this article, please share it with your colleagues and subscribe to our blog.

Database Maintenance Tips: Weeding the Database

This time of year I spend a lot of time in the garden. Early mornings are peaceful and quiet, and the perfect time to reflect on the garden and clean it up. This is a discipline that I also apply to the databases that I manage. I will describe the process that I follow when tending to the garden, and then talk about how a similar process can help keep your database healthy, functional, and efficient.

How I maintain the garden

I look carefully at each bed, and think about what belongs and what does not. What is contributing to a healthy garden, and what is sucking energy away from what I really want to grow. There are multiple levels to this process:

I start by defining the edges of each bed, making sure there is a well-defined edge where the bed ends and the lawn begins. Then I remove the weeds. Weeds, of course, are just plants that I don't want there. They are not bad things; they just don't contribute to the overall purpose of the bed, and they suck energy from the plants that do. They also make things look messy and compromise both the maintenance and the enjoyment or utility of the bed, because they force us to think about what belongs and what does not every time we look at it or need to work in it.

After we are done defining the edge, we drill into each individual plant in the bed. We look for dead blooms or branches that can be removed so they don't suck energy from the living parts of the plant. We stand back, determine how we want the plant to look, and prune so it does not get messy. Essentially, we are ensuring that each plant plays the part we wish it to in the overall function of the bed.

As we do this, we ask questions like “what is the function of the bed: is it to screen for privacy, to provide color, or to absorb wind?” and “what is the function of each plant within the bed?” The answers to these questions determine how tall or wide we let the bushes grow, how close together we let them get, and which ones will be allowed to take the most room.

Finally we lay mulch to minimize the weeds and to conserve the moisture in order to minimize maintenance costs and to protect these valuable plants.

What does this have to do with a database?

Managing a database is similar to managing a garden bed. Let’s look at how the above process is similar to proper and thorough database maintenance.

Defining the edges

This is a critical step for securing and maintaining our databases. There are at least three ways that the edges of our databases get messy (compromised).

User Access

Users, of course, are given access to our data. Initially this is done with intention. But invariably, someone needs some data for a report, a request is made for access, and the access is granted, often with the intent that it be temporary. From then on, data may leak from our database to external systems without real intention. In a well-secured database, all access must be identified, documented, approved, and intended.

Power User Data

Often, data must be combined with external data for a specific report. A power user will build a table in our database for that purpose, and a script will be created to keep that “external table” up to date in our database. This external table may grow in unexpected ways because the application owner does not control it. In a well managed database, all data coming into the database must be identified, documented, approved, and intended.

Database links

Links may be set up to enable another application to either feed data to the subject database or to query data from it. These links must be examined on a regular basis to ensure they are still necessary, and that the linked database is as secure as the subject database is.
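
In Oracle, for example, taking inventory of the links that exist today is one query away (a sketch; SQL Server has an analogous view, sys.servers, for linked servers):

  -- Every database link defined in this Oracle database, and when it was created
  SELECT owner, db_link, host, created
  FROM dba_db_links
  ORDER BY owner, db_link;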

Removing the weeds

Once the edges are defined, we must look at all database objects and make sure they are contributing to the purpose of the database. We want to remove any tables, stored procedures, or functions that were introduced during tests or trials, any that serve obsolete functions, and any that were created by power users and are no longer needed or were not authorized. This is both for security purposes and to ensure that no resources are being wasted.
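
On SQL Server, one aid in this weeding is the index usage DMV, which can hint at tables nobody has read since the instance last started (a sketch; absence of recorded usage is a clue, not proof, that an object is dead):

  -- Tables with no recorded reads since the last instance restart (SQL Server)
  SELECT t.name
  FROM sys.tables t
  LEFT JOIN sys.dm_db_index_usage_stats s
         ON s.object_id = t.object_id
        AND s.database_id = DB_ID()
  GROUP BY t.name
  HAVING COALESCE(SUM(s.user_seeks + s.user_scans + s.user_lookups), 0) = 0;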

Pruning

Now that we only have valid objects in the database, we apply the same process within the remaining objects. This is a more time-consuming and difficult process, but it is important to ensure that the system remains secure and valuable to the organization. That means ensuring that tables only contain data that is relevant and that contributes value. There are two dimensions to this, and they are most easily thought of in slightly technical terms: rows and columns.

Rows (records)

Are all rows or records in the database valuable and relevant? For the most part, this will be time based. If the system is now 10 years old, do we need all 10 years of data in the database? If not, can we purge some of that old data, or archive it out of the primary database environment in order to improve performance and decrease resource consumption? Beyond time-based concerns, there may be other cases, such as products that are no longer sold but for which we still store a lot of data; can these be purged? Or perhaps we spun off an entire division of the company, but the old data remains, taking up space and impacting performance.
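
A minimal sketch of time-based archiving, using hypothetical table names and SQL Server date syntax (in practice this would be batched, wrapped in a transaction, or replaced by partition switching):

  -- Copy orders older than seven years to an archive table, then remove them
  INSERT INTO orders_archive
  SELECT * FROM orders
  WHERE order_date < DATEADD(YEAR, -7, GETDATE());

  DELETE FROM orders
  WHERE order_date < DATEADD(YEAR, -7, GETDATE());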

Columns (fields)

Are there unnecessary data points stored in any of our tables? In customer data there may be PII that is not needed and that poses a security risk. In chemical lab data there may be unnecessary test results in a large table that will never be used but that take up space in the database and drag down performance needlessly.

Laying the mulch

After we have cleaned up a garden bed, we lay mulch to reduce weeds, to improve the overall function and appearance of the bed, and to both protect the health of the bed and improve the efficiency of the bed by conserving water.

The equivalent to mulch in database maintenance is a well designed and implemented maintenance plan. This means regularly scheduled health checks (looking for weeds), maintenance jobs (pruning the dead branches),  and validation of backups and disaster recovery processes (Ok, I am struggling for an analogy for that one – maybe ensuring that the sprinkler system works in case there is no rain?).

It’s worth it

We hope you found these database management tips useful. A well managed database is like a beautiful garden. It requires hard work and discipline but the rewards are plentiful. A well maintained garden provides healthy fruits and vegetables and beautiful colors and scents. It attracts songbirds and butterflies, enhances curb appeal, and adds value to your home. A well maintained database provides accurate and timely information to your team to accelerate growth and ensure excellence in execution, and provides critical insights to management that are necessary to keep the business healthy and responsive to changing markets and regulations.

So get out there and bring your clippers!

7 Ways To Improve SQL Query Performance

How do you improve SQL query performance? That is a big question, and one that we get asked all the time. There is no one answer, but there is a process that we apply to make a difference in query performance. In this post, I will discuss some of the questions we ask, some of the diagnostics we run, and some of the steps we take to reduce the amount of time a query takes. 

The questions to ask are similar for any relational database software, so this discussion will apply to Oracle, SQL Server, MySQL, PostgreSQL, and others. I may mention tools or processes by a database-vendor specific name but, for the most part, each software vendor has something that is equivalent. 

Query tuning is a complex and iterative process, so no blog post, including this one, can be comprehensive. The objective here is to help you think about tuning from a broader perspective rather than looking only at the query in question; this is more about concepts than syntax.

Questions to Ask When Looking to Improve SQL Query Performance

To narrow down where the problems are with a SQL query, we start with some basic questions about the query and how it is being executed. I will discuss each question and talk about why we ask it, and what information the answer might give us. None of these questions will tell us definitively what the problem is, but they can point us quickly in the right direction and save precious time when a client is waiting for improved response time.

Timeframe 

Is the query that we are interested in (hereafter referred to as “our query”) executed during a period when the system is heavily taxed by other processes?

  • Why we ask: If our query is executed during a very busy time, then the problem may not be with our query at all.  Reducing load on the system by examining other queries first (using this same strategy) may be more effective. So we would start by identifying and examining the most resource intensive queries first, to try to reduce overall system load. 
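
On SQL Server, for example, a first pass at finding the heaviest queries might look like the sketch below; Oracle's AWR reports and PostgreSQL's pg_stat_statements extension serve the same purpose.

  -- Top 10 cached plans by total CPU time (SQL Server)
  SELECT TOP 10
         qs.total_worker_time / 1000 AS total_cpu_ms,
         qs.execution_count,
         SUBSTRING(st.text, 1, 200) AS query_text
  FROM sys.dm_exec_query_stats qs
  CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) st
  ORDER BY qs.total_worker_time DESC;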

Proximity and Size

Does our query take the same amount of time whether it is executed locally or remotely?

  • Why we ask: If our query is executed remotely (executed in a browser or application on a server other than the database server) and if it returns a large number of rows, then it is possible that the data transfer is the bottleneck, rather than the retrieval of the data from the database. Asking this question may help us take the network out of the equation.

Result Set Characteristics 

When our query completes, does it return a large number (millions?) of rows?

  • Why we ask: When executing our query locally, if it takes a long time to complete, there are two possibilities. Either it takes a long time for the database software to find the data to return, or it takes a long time to return the data to the screen or the application. The former can be fixed by tuning the query; the latter may mean that our query is returning too many rows to be practical. In the latter case, we should revisit the intent of the query to see if an aggregated form of the data would be more usable, or if breaking the result set up into more manageable chunks makes sense. Also, a very large result set may be an indication of an error in the query itself, perhaps a missing join, or missing criteria resulting in a Cartesian product. In this case, we would look at the logic being expressed in the query and ensure that it matches the intent of the query. 

Is the result set both large and aggregated or sorted?

  • Why we ask:  Aggregation and sorting on large result sets require significant temporary space. If this is a significant part of the query operations, we want to look at the management of memory buffers, and temp space (System Global Area (SGA), Program Global Area (PGA) and temporary segments or their equivalents). We want to make sure that enough memory is allocated so that we are not excessively writing out to temp space, and that temp space is optimally sized and located.

Is the result set a (relatively) small subset of a large amount of data?

  • Why we ask:  If the database is very large, and if our query returns a small subset of the data, there are two broad solutions that may be applicable: adding or optimizing indexes, and adding or optimizing partitioning. Up to a certain data size, proper indexing alone can provide adequate performance. When data gets very large, however, a combination of indexes and partitions will be necessary to provide adequate performance when querying a subset of the data. 
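
As a minimal sketch of the indexing side (hypothetical table and column names): if the query filters on customer_id and a date range, a composite index that matches that predicate is the first thing to try.

  -- Support queries that filter on customer_id and order_date
  CREATE INDEX ix_orders_customer_date
      ON orders (customer_id, order_date);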

Historical

Has the performance of the query degraded over time?

  • Why we ask:  If the query performed well in the past, but no longer does, look at the growth rates of data in the tables referenced by the query. If the amount of data has increased significantly, new indexes may be required that were not necessary when less data was referenced. Significant data growth may also result in optimizer statistics that no longer reflect the characteristics of the data, requiring a refresh of these statistics if they are not automatically refreshed.
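
Refreshing statistics is usually a one-liner; the syntax differs by platform (a sketch with a hypothetical table name):

  -- SQL Server
  UPDATE STATISTICS dbo.orders;

  -- PostgreSQL
  ANALYZE orders;

  -- Oracle
  EXEC DBMS_STATS.GATHER_TABLE_STATS(ownname => 'APP', tabname => 'ORDERS');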

Does the data being queried involve many updates or deletes (as opposed to mostly inserts)?

  • Why we ask: Data that is frequently updated may result in index or tablespace fragmentation. This may also result in invalid statistics as in the case of significant data growth. 

Conclusion

Query tuning is an iterative process and there are many other questions to ask as we get into the details. But the above questions help us see the big picture and can steer us in the right direction very quickly and help prevent us from going down the wrong path and wasting time.

If you have other questions that you like to ask when tuning, or if you have an interesting tuning story, please share it in the comments.