Using the Data Dictionary to find hidden data in SQL Server

A client asked me recently how he could find a string in his SQL Server database without knowing what table or column it was in. The string was a translation of a code that appeared on one of the UI screens. This was a packaged CRM database and he had no documentation on the schema. With hundreds of tables whose names were not obvious, he had no idea where the translation might be kept, but wanted to do some reporting using the translations. It was like finding a needle in a haystack!

In order to help, I wrote a utility to find the data for him. Using the data dictionary tables and a little dynamic SQL, I created a script that would search for a string in every column of every table in the database.

We can do this by using the information_schema.columns view to generate statements that insert search results into a results table. For each match we record the table, the column, and the full contents of the matching column, so the results provide context.

There are some complications to address as we do this. First, because the LIKE comparison cannot be used against XML and certain other data types, we must restrict the search to char and varchar columns. That restriction works in this case because I was searching for a string, and it was very unlikely that this string would be embedded in an XML field. Second, to prevent errors caused by spaces, hyphens, or other special characters in table or column names, we must surround the object names with brackets, which is always good practice when using dynamic SQL with SQL Server.
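
Here is a minimal sketch of the approach (the actual find-it procedure may differ in detail). It assumes the search_findings table has table_name, column_name, and column_value columns:

declare @search varchar(100) = 'needle';  -- the string to look for
declare @table_schema sysname, @table_name sysname, @column_name sysname;
declare @sql nvarchar(max);

-- Walk every char/varchar column of every base table
declare col_cursor cursor for
    select c.table_schema, c.table_name, c.column_name
    from information_schema.columns c
    join information_schema.tables t
      on t.table_schema = c.table_schema
     and t.table_name = c.table_name
    where t.table_type = 'BASE TABLE'
      and c.data_type in ('char', 'varchar');

open col_cursor;
fetch next from col_cursor into @table_schema, @table_name, @column_name;

while @@fetch_status = 0
begin
    -- QUOTENAME adds the brackets that guard against special characters
    set @sql = 'insert into search_findings (table_name, column_name, column_value) '
             + 'select ''' + @table_name + ''', ''' + @column_name + ''', '
             + quotename(@column_name)
             + ' from ' + quotename(@table_schema) + '.' + quotename(@table_name)
             + ' where ' + quotename(@column_name)
             + ' like ''%' + @search + '%''';
    exec sp_executesql @sql;
    fetch next from col_cursor into @table_schema, @table_name, @column_name;
end;

close col_cursor;
deallocate col_cursor;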

Since this utility uses the LIKE comparison with a wildcard before and after the search string, indexes cannot be used, so performance will be an issue. The utility is best run during non-production hours and may take a long time to complete. This can be mitigated by modifying the script to remove the leading wildcard, which allows indexes on the searched columns to be used; however, it will then only find strings that appear at the beginning of the column value being checked.
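
For example, a generated predicate would change from the double-wildcard form to a prefix form (the column name here is hypothetical):

-- Double wildcard: full scan, but finds the string anywhere in the value
where [notes] like '%needle%'

-- Leading wildcard removed: an index on [notes] can be used,
-- but only values starting with the string are found
where [notes] like 'needle%'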

We now have a nice utility that gives a report of all the places where the string lives. The utility can easily be extended to handle numbers, dates, and other data types as necessary. This script works with SQL Server, but similar scripts can be created for any major database with a user-accessible data dictionary.

Note that this procedure does not automatically delete the prior contents of the search_findings table. You may wish to add that to the script if you don’t want the results to be cumulative.
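
One way to do that, assuming it is safe to discard the earlier results, is to clear the table at the top of the procedure:

-- Start each search with an empty results table
truncate table search_findings;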

The instructions follow.

1. Start by downloading find-it

2. Create the table that will hold the search results using create_search_results_table.sql

3. Create the find-it procedure using create_find-it_procedure.sql

4. Issue the search with the following command:

exec find_it 'search string'
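
When the procedure finishes, the matches can be reviewed with an ordinary query (the column names shown match the sketch above):

select table_name, column_name, column_value
from search_findings
order by table_name, column_name;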

Five Leading Causes of Oracle Database Performance Problems

As we evaluate our clients’ Oracle databases and help them solve problems relating to performance and stability, we see many causes for these issues.

However, there are a small number of basic issues that tend to cause problems on the majority of systems.

Our Tech Tips document describes these issues and discusses methods of identifying them and resolving them.

Topics include:

  • Statistics Gathering (see the example after this list)
  • SGA Sizing
  • Redo Log Sizing
  • Index Usage
  • Extent Management
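
As a quick illustration of the first topic, stale optimizer statistics are commonly refreshed with the DBMS_STATS package. A minimal example for a hypothetical APP_OWNER schema:

begin
    dbms_stats.gather_schema_stats(
        ownname => 'APP_OWNER',  -- hypothetical schema name
        cascade => true);        -- gather index statistics as well
end;
/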

Oracle Security — Data Masking using Enterprise Manager

Database administrators often have to provide test data sets to developers, QA teams, or UAT teams.

This can be done by manually creating test data sets, which can take a very long time, especially when the goal is to stress test a new system and a large amount of data is required.

An alternative approach often taken by database administrators is to copy the production database. Too often, an exact copy of the production data is used for this purpose, which of course raises a number of database security risks.

To mitigate these risks, database administrators typically mask or obfuscate the production data set before sending it out.

This process involves writing a set of scripts that apply a mask or function to each sensitive piece of data so that the true data is hidden. Writing these scripts is very time consuming because there are many factors to consider, including the need to preserve relationships between tables so that the masked data still makes sense.
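
A hand-written mask might look like this hypothetical example, which replaces customer names and keeps only the last four digits of a card number (table and column names are made up for illustration):

update customers
   set last_name   = 'CUSTOMER_' || customer_id,
       card_number = '************' || substr(card_number, -4);
commit;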

Oracle Enterprise Manager can now significantly reduce the amount of work necessary for this task. Using the data masking capability in OEM, a database administrator can specify the mask or function that must be applied to each sensitive column of the database. Oracle data masking will then generate a set of PL/SQL scripts that perform the masking for you.

It automatically handles the preservation of relationships between tables, as specified either by existing foreign key constraints in the database or by relationships that the DBA can specify in the Oracle masking tool. It also enables the DBA to specify fields that must remain logically in sync after the masking operation, such as a city and state. The generated scripts ensure that those relationships remain consistent in the masked data.
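
As a rough sketch of what preserving a relationship involves (the generated scripts handle this automatically; the table names here are hypothetical), a mapping table records a replacement for each original key, and both the parent and child tables are updated from the same mapping:

-- Record a new surrogate value for every existing key
create table cust_id_map as
    select customer_id as old_id,
           rownum      as new_id
    from customers;

-- Foreign key constraints would typically be disabled during the updates
update orders o
   set o.customer_id = (select m.new_id
                        from cust_id_map m
                        where m.old_id = o.customer_id);

update customers c
   set c.customer_id = (select m.new_id
                        from cust_id_map m
                        where m.old_id = c.customer_id);
commit;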

This Oracle masking feature makes it much easier for Oracle database administrators to quickly provide data sets to development and testing teams while reducing the risk of exposing sensitive data.

An excellent article about this feature, with additional usage detail, can be found at the following link: datamasking.html

Storing documents in a MongoDB database instead of the file system

We are currently working on an application that uses a MySQL database. The application produces a large number of CSV files that must be kept around for a few weeks, and then removed after being used in various ways by the users of the system.

We need to keep track of all of these files and present a list of the files to the user so they can download them to their local machine, or send them to other systems as needed. We need to track when each file was created, who created it, and other information.

We considered a few alternatives for managing these files. We could keep them on the file system and use scripts to keep the database synchronized with the file system, or we could load each file into a BLOB column and keep it in the MySQL database. Either option would work if we are careful, but we are looking for something more.
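
For context, a hypothetical MySQL table covering either alternative might look like this:

create table csv_files (
    file_id    int auto_increment primary key,
    file_name  varchar(255) not null,
    created_at datetime     not null,
    created_by varchar(100) not null,
    file_path  varchar(500),  -- option 1: the file stays on the file system
    contents   longblob       -- option 2: the file is stored in the database
);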

Our research led us to MongoDB, an open-source, document-oriented database.

MongoDB claims to bridge the gap between key-value stores, which are typically used by high-volume web applications with simple data storage requirements, and traditional relational database systems.

In addition to simple queries based on key-value pairs, you can use boolean expressions, regular expressions, set operations, sorting, and limiting. MongoDB also has the ability to grab documents directly from the file system using its GridFS functionality.

We have not implemented MongoDB for this project yet but it looks very promising.

I will send updates as we get closer.