MySQL is a very popular database for many applications. But the parser has some important deficiencies that developers need to be aware of. The parser is inconsistent and not robust in the way that it enforces statement syntax. This may lead to ambiguity and application defects that are very difficult to find. I include one example below and link to another blog article that discusses another.

Inadequate Group By Parsing

The MySQL group by function does not require that all non-aggregated elements in the select list be present in the group by clause. This causes an ambiguity because if a given element in the select list is neither present in the group by clause, nor aggregated in the select list, then an arbitrary value is chosen (it actually seems to be the first value but I don’t know if that is always the case).  Consider this example table that contains the number of games a set of players has played in each of four seasons.

We can see that each player has played 100 games in each of the four seasons.

Now consider this simple query to get a sum of all the games that the player has played across all seasons:

As expected, we can see that each player has played 400 games.

But watch what happens if a novice developer places the season number field in the select list without placing it in the group by clause as well:

If the developer does not realize his error, he will think that each player played 400 games in season 1!

The parser should have prevented the query from being executed with the season_number in the select list when it was neither aggregated in some form or present in the group by clause.

This type of error is very difficult to find because the resulting data looks plausible unless you really know the data.

Function Syntax Inconsistency

The next example is related to the way that the parser enforces one statement syntax vs another. This involves the tolerance of whitespace (line breaks) when specifying the sum function vs the format function. This inconsistency does not concern me as much as the group by problem because it does result in an error that is visible to the developer, but it does illustrate a lack of consistency and robustness in the parser.

As MacLoghLainns Weblog describes, the SUM function requires that the opening parentheses be on the same line as the SUM keyword, while the FORMAT function does not have that restriction. View the blog article for more details.

Please reply back with comments or other issues that you have found that might help others avoid unexpected problems in their MySQL applications.

Visit www.budaconsulting.com for more information about Buda Consulting.