Using the Data Dictionary to find hidden data in SQL Server

Using the Data Dictionary to find hidden data in SQL Server

A client asked me recently how he could find a string in his SQL Server database without knowing what table or column it was in. The string was a translation of a code that appeared on one of the UI screens. This was a packaged CRM database and he had no documentation on the schema. With hundreds of tables whose names were not obvious, he had no idea where the translation might be kept, but wanted to do some reporting using the translations. It was like finding a needle in a haystack!

In order to help, I wrote a utility to find the data for him. Using the data dictionary tables and a little dynamic sql,  I created a script that would search for a string in every column of every table in the database.

We can do this by using the information_schema.columns view to create statements that insert search results into a temporary table. We want to record the table, the column, and the full contents of any column that contained the string we wanted to search for in order to provide context for the search results.

There are some complications that we have to address as we do this.  First, since we want to do a like comparison against any of the fields we must restrict the search to char and varchar fields. This is necessary because the like comparison cannot be used against xml and some other datatypes. That restriction works in this case because I was searching for a string and it was very unlikely that this string would be embedded in an xml field. Second, to prevent errors resulting from spaces, hyphens, or other special characters in table or field names, we must surround the object names with brackets — this is always a good practice when using dynamic sql with sql server.

Since this utility uses the like comparison with a wilcdard before and after the search string, indexes will not be used so performance will be an issue. This utility is best run during non-production hours and may take a long time to complete. This can be mitigated by modifying the application to remove the leading wildcard and then indexes on any of the columns would be used, but this will only find strings that start at the beginning of the column value that is being checked.

We now have a nice utility that give a report of all places where the string lives. The utility can easily be extended to handle numbers, dates, and other data types as necessary. This script works with SQL Server but similar scripts can be created for any major database that has a user accessible data dictionary.

Note that this procedure does not automatically delete the prior contents of the search_findings table. You may wish to add that to the script if you don’t want the results to be cumulative.

The instructions follow.

1. Start by downloading find-it

2. Create the table that will hold the search results using create_search_results_table.sql

3. Create the find-it procedure using create_find-it_procedure.sql

4. Issue the search with the following command:

exec find_it  ‘search string’