Using Oracle GoldenGate for Real-Time Data Replication and Integration

Oracle GoldenGate is Oracle’s best-of-breed, strategic solution for real-time data replication and integration. It offers a rapid, robust and low-overhead approach for moving small or large volumes of transactional data between diverse operational and analytical systems—including not just various Oracle Database versions but also non-Oracle databases like SQL Server, Sybase, IBM DB2 and others.

Though extremely powerful and flexible, Oracle GoldenGate is straightforward to install and configure. Its transformation, filtering and custom processing features make it suitable for a wide range of use cases, including:

  • Data warehousing and analytics/decision support
  • High availability and disaster recovery (HA/DR) scenarios
  • Database loading and migration

One of our DBA consultants has been working with a complex, mission-critical, enterprise-wide GoldenGate implementation that illustrates how much you can do with this offering. It is being used to enable a “Know Your Customer” application, where data is replicated in real-time between production and DR (“hot-hot”) sites. The client opted to use GoldenGate over Oracle’s physical standby capability because they needed read-write access at both sites.

GoldenGate replicates data among four main regions and three satellite regions worldwide in a complex, multi-master source/target architecture. Data can flow in real-time from any site to any site. Intercontinental and regional data restrictions mean that sensitive data must be masked before sending it out of some regions. This requirement ruled out most competing solutions, which provide only storage-level replication with little flexibility for transforming or filtering the data in between sites.

This implementation also leverages GoldenGate’s DDL replication support, which further supports high availability. Anytime a table is created or modified, the DDL (and hence the modified data) is replicated across the topology. For even greater data reliability, the implementation has been extended via a “self-healing” agent that identifies any data inconsistencies and attempts to fix them automatically, with no human intervention.

As you can see, GoldenGate has a very flexible architecture. It can handle many data extraction/transformation or conflict resolution scenarios out-of-the-box or with minimal customization using built-in functions. It plays very well with Oracle RDBMS, and offers endless possibilities for writing custom PL/SQL functions or native C language routines to get the functionality you need.


Using the Data Dictionary to find hidden data in SQL Server


A client asked me recently how he could find a string in his SQL Server database without knowing what table or column it was in. The string was a translation of a code that appeared on one of the UI screens. This was a packaged CRM database and he had no documentation on the schema. With hundreds of tables whose names were not obvious, he had no idea where the translation might be kept, but wanted to do some reporting using the translations. It was like finding a needle in a haystack!

To help, I wrote a utility to find the data for him. Using the data dictionary tables and a little dynamic SQL, I created a script that searches for a string in every column of every table in the database.

We can do this by using the information_schema.columns view to generate statements that insert search results into a temporary table. We want to record the table, the column, and the full contents of any column that contains the search string, in order to provide context for the results.

There are some complications to address as we do this. First, since we want to do a like comparison against every field, we must restrict the search to char and varchar fields; the like comparison cannot be used against xml and some other datatypes. That restriction works in this case because I was searching for a string, and it was very unlikely that the string would be embedded in an xml field. Second, to prevent errors resulting from spaces, hyphens, or other special characters in table or column names, we must surround the object names with brackets; this is always good practice when using dynamic SQL with SQL Server.

Since this utility uses the like comparison with a wildcard before and after the search string, indexes will not be used, so performance will be a concern. The utility is best run during non-production hours and may take a long time to complete. You can mitigate this by removing the leading wildcard so that indexes on the searched columns can be used, but then the search will only find strings that start at the beginning of the column value.

We now have a nice utility that gives a report of every place where the string lives. It can easily be extended to handle numbers, dates, and other data types as necessary. This script works with SQL Server, but similar scripts can be created for any major database that has a user-accessible data dictionary.
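To make the approach concrete, here is a minimal sketch of the same idea in Python against SQLite, whose data dictionary (sqlite_master and PRAGMA table_info) stands in for SQL Server's information_schema.columns. The find_it function and the crm_codes demo table are illustrative stand-ins, not the downloadable script itself:

```python
import sqlite3

def find_it(conn, search_string):
    """Search every text-typed column of every table for search_string,
    recording hits (table, column, full value) in a search_findings table."""
    cur = conn.cursor()
    cur.execute("CREATE TABLE IF NOT EXISTS search_findings "
                "(table_name TEXT, column_name TEXT, full_value TEXT)")
    tables = [r[0] for r in cur.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name <> 'search_findings'")]
    for table in tables:
        # PRAGMA table_info plays the role of information_schema.columns;
        # restrict to char/varchar/text columns, since LIKE cannot be used
        # against xml and some other datatypes.
        cols = [r[1] for r in cur.execute(f'PRAGMA table_info("{table}")')
                if r[2].upper().startswith(("CHAR", "VARCHAR", "TEXT"))]
        for col in cols:
            # Quote identifiers (brackets in T-SQL, double quotes here) so
            # spaces or special characters in names do not break the SQL.
            cur.execute(
                f'INSERT INTO search_findings '
                f'SELECT ?, ?, "{col}" FROM "{table}" WHERE "{col}" LIKE ?',
                (table, col, f"%{search_string}%"))
    conn.commit()

# Demo against a tiny stand-in for the undocumented CRM schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE crm_codes (code TEXT, translation TEXT)")
conn.execute("INSERT INTO crm_codes VALUES ('GLD', 'Gold Tier Member')")
find_it(conn, "Gold")
hits = conn.execute(
    "SELECT table_name, column_name, full_value FROM search_findings").fetchall()
```

The real T-SQL version builds equivalent INSERT ... SELECT statements as strings and executes them with dynamic SQL, which is why bracketing object names matters there.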

Note that this procedure does not automatically delete the prior contents of the search_findings table. You may wish to add that to the script if you don’t want the results to be cumulative.

The instructions follow.

1. Start by downloading find-it

2. Create the table that will hold the search results using create_search_results_table.sql

3. Create the find-it procedure using create_find-it_procedure.sql

4. Issue the search with the following command:

exec find_it 'search string'

MySQL Parser Deficiencies


MySQL is a very popular database for many applications. But its parser has some important deficiencies that developers need to be aware of. The parser is inconsistent and not robust in the way it enforces statement syntax, which can lead to ambiguity and application defects that are very difficult to find. I include one example below and link to a blog article that discusses a second.

Inadequate Group By Parsing

MySQL's group by clause does not require that all non-aggregated elements in the select list be present in the group by clause. This causes an ambiguity: if a given element in the select list is neither present in the group by clause nor aggregated in the select list, an arbitrary value is chosen (it actually seems to be the first value, but I don't know if that is always the case). Consider this example table, which contains the number of games a set of players has played in each of four seasons.

We can see that each player has played 100 games in each of the four seasons.

Now consider this simple query to get a sum of all the games that the player has played across all seasons:

As expected, we can see that each player has played 400 games.

But watch what happens if a novice developer places the season number field in the select list without placing it in the group by clause as well:

If the developer does not realize his error, he will think that each player played 400 games in season 1!
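Since the original screenshots are not reproduced here, the behavior can be sketched in Python with SQLite, which is similarly permissive about non-aggregated columns under group by (the table and player names are illustrative, not from the original post):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE player_games "
             "(player TEXT, season_number INT, games_played INT)")
# Two players, 100 games in each of four seasons.
conn.executemany("INSERT INTO player_games VALUES (?, ?, ?)",
                 [(p, s, 100) for p in ("Alice", "Bob") for s in (1, 2, 3, 4)])

# Correct query: 400 games per player across all seasons.
totals = conn.execute("SELECT player, SUM(games_played) "
                      "FROM player_games GROUP BY player").fetchall()

# Ambiguous query: season_number is neither aggregated nor in the GROUP BY,
# yet the statement runs; the season value comes from an arbitrary row.
ambiguous = conn.execute("SELECT player, season_number, SUM(games_played) "
                         "FROM player_games GROUP BY player").fetchall()
# A novice reading this result would conclude each player played all 400
# games in whichever single season happens to be reported.
```

For what it's worth, MySQL 5.7 and later enable the ONLY_FULL_GROUP_BY SQL mode by default, which rejects the ambiguous query outright.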

The parser should have prevented the query from being executed with season_number in the select list when it was neither aggregated in some form nor present in the group by clause.

This type of error is very difficult to find because the resulting data looks plausible unless you really know the data.

Function Syntax Inconsistency

The next example relates to the way the parser enforces one statement syntax versus another: its tolerance of whitespace (line breaks) when specifying the SUM function versus the FORMAT function. This inconsistency does not concern me as much as the group by problem, because it results in an error that is visible to the developer, but it does illustrate a lack of consistency and robustness in the parser.

As MacLoghLainns Weblog describes, the SUM function requires that the opening parenthesis be on the same line as the SUM keyword, while the FORMAT function has no such restriction. See the blog article for more details.

Please reply back with comments or other issues that you have found that might help others avoid unexpected problems in their MySQL applications.

Visit www.budaconsulting.com for more information about Buda Consulting.

Total System Failure: Why you should validate your database backup


Every now and then things happen that make us realize how important double-checking can be.

I had a recent experience in which four unrelated parties (including me) did the wrong thing regarding one business transaction. If any one of us had done the right thing, there would have been no problem. But all four parties dropped the ball, causing a potentially serious problem. This is why it is important to leave nothing to chance. Here is the story.

A few months ago, our health insurance company asked us to send it a report detailing who was on our payroll last year. The following sequence of mistakes turned this simple request into a potentially serious problem for our business.

Mistake number 1: I set the insurance company's request aside and did not send the report by the due date. This would normally be OK, because the insurance company sends a certified letter in the event that it does not receive the report in time.

But…

Mistake number 2: We have a mailing address and a different physical address. For some inexplicable reason, the insurance company sent the certified letter to our physical address instead of our mailing address, where they send all other correspondence and where we would expect important business mail to arrive. This would normally be OK, because mail received at our physical address eventually gets to our bookkeeper and is addressed.

But…

Mistake number 3: The US Postal Service delivered the certified letter notices to the wrong suite. They delivered them to the neighboring business instead of to us. This happens occasionally; I don't understand why, but since most of our mail goes to our mailing address, and our neighbor usually brings our mail over when it happens, it is normally not a problem.

But…

Mistake number 4: For some reason that puzzles me most of all, the neighboring business never bothered to tell me or the mailman that they received two certified letter notices for me over a month ago! While I acknowledge that this party had the least obligation in the transaction, I am shocked at the lack of consideration that they showed.

So, what does all this have to do with your Database?

All four parties involved in the above scenario had a well defined job to do, or at least a well understood and expected course of action. All are trusted parties that usually do what is expected of them. We had every reason to believe that all four parties involved would perform their jobs properly. And yet not one of the parties involved did the right thing.

The same can be true of your disaster recovery processes. Just because you have specified what should happen with your database backup, and who should do it, does not mean it is actually being done!

As I mentioned in an earlier post, when we examine a client’s disaster recovery posture, we find in a large percentage of systems that the backups are not working the way the client thinks they are, and in many cases, they are not working at all!

So be sure to have an independent party validate that you are really as protected as you think you are.

Important steps to take to validate database backups:

  1. Validate database backup schedules.
  2. Validate that ALL required databases and schemas are being backed up.
  3. Validate that ALL database stored procedures, etc., are being backed up.
  4. Validate that all external files (such as Oracle control and redo log files and SQL Server transaction logs) are included properly in the backup.
  5. Validate that the database files are placed into a backup-safe state (e.g., hot backup mode in Oracle) prior to backing them up.
  6. Perform test restores using actual backup files from the nightly production backup routine to ensure that the backups are usable in the event of a disaster.
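Some of these checks can be automated. As a minimal sketch (the paths and the 26-hour freshness window are illustrative assumptions), a script can at least confirm that last night's backup files exist, are non-empty, and are recent. This supplements, but never replaces, the test restores in the last step:

```python
import os
import time

def validate_backups(backup_paths, max_age_hours=26):
    """Return (path, problem) pairs; an empty list means every file checked out."""
    problems = []
    now = time.time()
    for path in backup_paths:
        if not os.path.exists(path):
            problems.append((path, "missing"))   # backup never landed
        elif os.path.getsize(path) == 0:
            problems.append((path, "empty"))     # job ran but wrote nothing
        elif now - os.path.getmtime(path) > max_age_hours * 3600:
            problems.append((path, "stale"))     # schedule silently stopped
    return problems

# Example: check a hypothetical nightly backup set.
report = validate_backups(["/backups/prod_full.bak", "/backups/prod_log.trn"])
for path, problem in report:
    print(f"ALERT: {path} is {problem}")
```

A check like this catches the common failure mode where a backup job has quietly stopped running, but only a real test restore proves the files are usable.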

Bottom line: Validate that your backups really do what you think they are doing, and that they will enable you to recover fully when the inevitable system failure occurs.


Test your Assumptions: Database Backup and Recovery

Every now and then something happens that really drives home the need to test and validate the assumptions that we have about the availability of our IT assets. I had an experience this week that brought to mind the importance of testing backup and recovery and business continuity plans.

Planning

At the beginning of each week, I look at each day's activities and any travel I need to do that week, and I plan how I can be most productive each day. I find myself on the train often now between our offices in New Jersey and Boston, and I have come to take my wifi service for granted. I rarely have downtime any longer when traveling that way.

Last month, while traveling to San Antonio by air, I was able to use wifi in the air, just as I can on the ground on the train.

Then last week, while planning a trip by air from Philadelphia to Austin, I decided to make good use of the flight time. I planned to use the roughly four hours in the air to get some work done that I would need the next day.

Assuming

After I boarded the flight, however, I found out that a huge assumption I had made was not correct. I found that not all airlines have wifi!

So now, as I sit on the plane writing this post into a Word document, I am completely disconnected from the web, from my office, and from my clients!

The problem here is not that I am disconnected for a few hours; it is that I did not anticipate that this might happen, and so I planned to use that time to get some important work done. I assumed that the flight had wifi, and I did not validate that assumption!

Think about what will happen if you (or your customers) don't have access to your servers for a few hours. It could be that connectivity to the servers is interrupted, as in my case, or that the servers are completely down, or that your database software is hanging. Ask yourself what will happen during those hours, and what you can do to avoid them in the first place.

Validating

Validating your assumptions is key to productivity. In this case, it is only one person whose productivity is compromised today, but consider the cost if your whole company is down for a few hours. What are you taking for granted?

So what does this have to do with the database?

This is a database blog, so you might be asking what this has to do with the database. In the database business, we see people fail to validate assumptions all the time. A typical (and very dangerous) assumption people make is that their database is properly backed up and ready to restore in a disaster. As I describe in this blog post, this is rarely the case. This is one of the most important assumptions for any company to validate.

If you haven’t tested your backup procedures lately we can help you validate that your database is indeed protected the way you assume it is.


Why I Avoid Database Hints: a followup

In an earlier blog post about database hints, I gave a number of reasons why I don't like to use hints in SQL queries. Today I want to follow up and share one situation where I think hints are OK, even necessary.

As described very well in an article by Andrew Kelly in SQL Server Magazine (InstantDoc 129916), hints are sometimes necessary when we wish to minimize logging during bulk loading operations. In particular, when issuing a bulk insert, the TABLOCK hint is necessary in order to reduce logging to the minimum amount, even when the recovery model is set to simple.

I still feel strongly that we should avoid optimizer hints in select statements embedded in applications, for the reasons I mention in my original post, but Andrew's article describes one situation where I think they are acceptable.

If you have comments about hints that you would like to share, please reply.