NoSQL | Buda Consulting

7 Ways To Improve SQL Query Performance

by Robert Buda | May 14, 2021 | MySQL, NoSQL, PostgreSQL, SQL Server

How do you improve SQL query performance? That is a big question, and one that we get asked all the time. There is no one answer, but there is a process that we apply to make a difference in query performance. In this post, I will discuss some of the questions we ask, some of the diagnostics we run, and some of the steps we take to reduce the amount of time a query takes.

The questions to ask are similar for any relational database software, so this discussion will apply to Oracle, SQL Server, MySQL, PostgreSQL, and others. I may mention tools or processes by a database-vendor specific name but, for the most part, each software vendor has something that is equivalent.

Query tuning is a complex and iterative process, so no blog post, including this one, would be comprehensive. The objective is to help you understand how to think about tuning from a broader perspective rather than looking only at the query in question, and is more about concepts than syntax.

Questions to Ask When Looking to Improve SQL Query Performance

To narrow down where the problems are with a SQL query, we start with some basic questions about the query and how it is being executed. I will discuss each question and talk about why we ask it, and what information the answer might give us. None of these questions will tell us definitively what the problem is, but they can point us quickly in the right direction and save precious time when a client is waiting for improved response time.

Timeframe

Is the query that we are interested in (hereafter referred to as “our query”) executed during a period when the system is heavily taxed by other processes?

Why we ask: If our query is executed during a very busy time, then the problem may not be with our query at all. Reducing load on the system by examining other queries first (using this same strategy) may be more effective. So we would start by identifying and examining the most resource intensive queries first, to try to reduce overall system load.

Proximity and Size

Does our query take the same amount of time whether it is executed locally or remotely?

Why we ask: If our query is executed remotely (executed in a browser or application on a server other than the database server) and if it returns a large number of rows, then it is possible that the data transfer is the bottleneck, rather than the retrieval of the data from the database. Asking this question may help us take the network out of the equation.

Result Set Characteristics

When our query completes, does it return a large number (millions?) of rows?

Why we ask: When executing our query locally, if it takes a long time to complete, there are two possibilities. Either it takes a long time for the database software to find the data to return, or it takes a long time to return the data to the screen or the application. The former can be fixed by tuning the query; the latter may mean that our query is returning too many rows to be practical. In the latter case, we should revisit the intent of the query to see if an aggregated form of the data would be more usable, or if breaking the result set up into more manageable chunks makes sense. Also, a very large result set may be an indication of an error in the query itself, perhaps a missing join, or missing criteria resulting in a Cartesian product. In this case, we would look at the logic being expressed in the query and ensure that it matches the intent of the query.

Is the result set both large and aggregated or sorted?

Why we ask: Aggregation and sorting on large result sets require significant temporary space. If this is a significant part of the query operations, we want to look at the management of memory buffers, and temp space (System Global Area (SGA), Program Global Area (PGA) and temporary segments or their equivalents). We want to make sure that enough memory is allocated so that we are not excessively writing out to temp space, and that temp space is optimally sized and located.

Is the result set a (relatively) small subset of a large amount of data?

Why we ask: If the database is very large, and if our query returns a small subset of the data, there are two broad solutions that may be applicable: adding or optimizing indexes, and adding or optimizing partitioning. Up to a certain data size, proper indexing alone can provide adequate performance. When data gets very large, however, a combination of indexes and partitions will be necessary to provide adequate performance when querying a subset of the data.

Historical

Has the performance of the query degraded over time?

Why we ask: If the query performed well in the past, but no longer does, look at the growth rates of data in the tables referenced by the query. If the amount of data has increased significantly, new indexes may be required that were not necessary when less data was referenced. Significant data growth may also result in optimizer statistics that no longer reflect the characteristics of the data, requiring a refresh of these statistics if they are not automatically refreshed.

Does the data being queried involve many updates or deletes (as opposed to mostly inserts)?

Why we ask: Data that is frequently updated may result in index or tablespace fragmentation. This may also result in invalid statistics as in the case of significant data growth.

Conclusion

Query tuning is an iterative process and there are many other questions to ask as we get into the details. But the above questions help us see the big picture and can steer us in the right direction very quickly and help prevent us from going down the wrong path and wasting time.

If you have any other questions that you like to ask when tuning that you’d like to share, or if you have an interesting tuning story, please share in the comments.

Beyond “Fast and Simple”: Top 5 Use Cases for NoSQL Database Technology

by Robert Buda | Apr 27, 2021 | NoSQL

Since I first blogged about NoSQL back in 2013, the use of “Not Only SQL” databases for mission-critical applications has proliferated across businesses of all sizes and sectors. With trends like big data analytics, cloud, mobile and IoT driving the modern digital enterprise, requirements for greater application performance and scalability combined with falling storage costs increasingly bias the playing field toward non-relational databases.

Is relational database (RDBMS) technology still the “reigning data champion”? Yes, but the gap has closed significantly. NoSQL’s more flexible data storage formats are helping to solve a range of problems that traditional relational databases struggle to address.

What are the top NoSQL use cases these days? Here are five of the most prevalent:

1. Real-time/Near Real-time Big Data Processing

The faster a company can process and act on fresh data, the greater its operational efficiency and business agility, and the greater the bottom-line value of its data. A typical approach to real-time big data processing uses stream processing to ingest new data combined with Apache Hadoop for analyzing historical data, plus a NoSQL database that integrates with both.

Payments processing leader PayPal is a prime example of a leading digital enterprise that processes big data on-the-fly and leverages it in multiple ways. PayPal captures vast quantities of raw clickstream data—more than 20 TB per day!—in multiple formats by processing it through Hadoop and Apache HBase NoSQL databases, and storing it all in the cloud for worldwide access by business analysts and data scientists. Fraud detection, data mining, customer segmentation and delivering personalized ads to customers are just some of the differentiating capabilities that PayPal has built on NoSQL.

2. Internet of Things

As of 2021, it’s estimated that about 46 billion IoT devices, from smartphones to home appliances to healthcare systems to factory sensors to smart vehicles, are now connected to the Internet. The amount of semi-structured data these devices continuously generate adds up to something like 847 zettabytes.

NoSQL databases are better suited than their relational counterparts to scale out to ingest this endless fire hose of diverse data. Freshub, a smart kitchen web platform for food shopping, is one example among many of an application that successfully processes data from huge numbers of IP-connected appliances. The Freshub solution maintains a MongoDB NoSQL cloud database of over 1 million grocery products gleaned from online catalogs in real-time. In this use case, NoSQL is well suited to integrate diverse and unpredictable data schemas from all these sources. NoSQL also scales out horizontally across an arbitrary number of cloud database nodes as Freshub grows its customer and data footprint.

3. Content Management

Online shopping now surpasses brick-and-mortar sales, and “content is king” across thousands of online marketplaces and web storefronts. Online sales leaders curate a selection of multimedia content (including user-generated and social media content like reviews, photos and videos) and deliver it to shoppers “at the moment of interaction.”

NoSQL document databases offer a flexible, open-ended data model that is ideal for storing a mix of structured, semi-structured and/or unstructured content. NoSQL also makes it possible to aggregate data that serves multiple business applications within a single catalog database. Whereas RDBMS with its fixed data models tend to result in the proliferation of multiple, overlapping catalogs for different purposes.

Forbes.com, which lives on viewership and ad revenue, exemplifies the use of NoSQL technology for content management. Forbes quickly built a custom content management system based on MongoDB in just a few months, giving them greater agility—including the ability to incorporate contributor content and analyze social sharing within clickstream data—at a lower cost. The same data store also feeds their mobile site, which now gets 50% of their total traffic.

4. Mobile Apps With Huge Numbers Of Users

Mobile phone and tablet use recently surpassed desktops as the top online platform for searching, shopping and otherwise viewing web content. Interestingly, as much of 90% of mobile data is served via apps and only 10% through browsers, an overwhelming shift in recent years.

Rapidly scaling mobile apps globally to serve tens of millions of users with acceptable performance (think mobile gaming or popular social media apps) often calls for distributed databases, which in turn calls for NoSQL. Flexible NoSQL data models also support rapid app update cycles better than relational data models in many cases.

For these reasons, more and more businesses looking to monetize web content are using NoSQL data stores for their apps. A popular case in point is The Weather Channel, whose MongoDB database instance handles millions of requests per minute while also processing user data and juggling weather updates from tens of thousands of worldwide locations.

5. Enriching The Digital Customer Experience

An engaging differentiating digital customer experience is built on data-intensive, time-critical capabilities like personalization, user profile management and a unified view of the customer across all your touch points. A lot of this demographic, behavioral and logistical data comes from the online clickstream, creating a write-intensive, multi-schema workload that taxes “scale-up” RDBMS infrastructure. A distributed NoSQL database can scale more cost-effectively, manage an ever-growing number of attributes with less administrative hassle, and often delivers lower latency—the Holy Grail of online interactions where you’re trying to get ads, recommendations, coupons, etc. in front of users in real-time.

Multimedia service provider Comcast uses a Couchbase NoSQL platform to deliver a positive customer support experience across multiple lines of business. A core goal of the platform is to capture data from huge numbers of omnichannel interactions (phone calls, online help, chatbots, etc.) and relate it all back to individual customers’ accounts and service status. Scalability and resilience are also critical concerns, as in any customer experience scenario. Especially because the better your solution works, the more customers will use it.

Other NoSQL Use Cases & Conclusion

Real-time updates and queries
Discussion thread hierarchy
Data caching and archiving
Simple data collection and analysis functions associated with voting and surveys
Cross over data analysis that cannot be conducted in relational environments
Online gaming where numerous simple queries need to run in fractions of a second

As these current NoSQL use cases illustrate, the strengths that I highlighted back in 2013, like flexible data models, low latency, ease of delivery/maintenance and the ability to integrate structured, semi-structured and unstructured elements, continue to make NoSQL a preferred choice for “digital transformation” across industries. The ongoing success of NoSQL innovators and early adopters like Netflix, Amazon, Twitter, Facebook and AOL continues to pave the way for new solutions.

Wondering if NoSQL technology is right for your application, how to architect a new NoSQL solution or how to move your current RDBMS to a NoSQL alternative? Contact Buda Consulting for a 15-minute “database discussion” to explore whether we can help.

MySQL Fabric: The Best of NoSQL and Relational Databases

by Robert Buda | Jul 12, 2014 | MySQL, NoSQL, Oracle DBA

Oracle Corp. is currently the world’s second-largest software vendor—and it isn’t going to let a little thing like unstructured data stand in its way. With the recent release of its MySQL Fabric technology, which is meant to meet the demands of cloud- and web-based applications, Oracle is positioning itself to dominate the big data landscape.

Most enterprise data is still stored in relational databases written in SQL. To handle diverse data types and increase the flexibility of database structures, database developers are increasingly employing newer, open source DBMSs, especially MySQL (which Oracle maintains) and more recently NoSQL.

MySQL is currently the world’s most popular open source database. An RDBMS-based SQL implementation designed to support web as well as embedded database applications, MySQL drives some of the world’s largest websites including Google, Facebook, Twitter and YouTube. It has proven to be easy-to-use, reliable and scalable.

Despite the promise it offers for big data and real-time web applications, NoSQL has yet to evolve to deliver enterprise-grade reporting and manageability. MySQL Fabric is designed to solve these problems by delivering the best of NoSQL and SQL/RDBMS.

The new MySQL Fabric open source framework seeks to combine the flexibility of NoSQL with the robust speed of RDBMS. It should also simplify the management and scaling of MySQL databases by making it easy to manage them in groups.

MySQL Fabric offers high availability through failure detection and failover, by automatically promoting a slave database to be the new master if the master database goes down. It also offers enhanced scalability through automated data sharding, a process of separating database tables into multiple sections. Sharding helps you manage MySQL databases that are too large (or frequently accessed) for a single server.

Other key features include:

Automatic routing of transactions to the current master database, combined with load balancing of queries across slave databases
Extensions to PHP, Python and Java connectors to route transactions and queries directly to the correct MySQL server, eliminating the latency associated with passing through a proxy

By enabling multiple copies of a MySQL database to work together, MySQL Fabric will make it easier to perform live backups and scale MySQL databases across multiple servers. This, in turn, will make it easier to safely “scale out” MySQL applications in both on-premise and cloud implementations.

The new framework will support the growing use of MySQL for high-traffic, business-critical web applications. MySQL Fabric also positions Oracle strongly against NoSQL databases like MongoDB and MySQL add-on providers like Percona. Prior to the release of MySQL Fabric, DBAs had to write code or buy third-party software to create a MySQL server cluster.

You can download the new framework as part of the MySQL Utilities 1.4.3 package at: http://dev.mysql.com/downloads/fabric/

Note that Oracle also offers the MySQL Cluster version of MySQL, which offers some advantages over MySQL Fabric, such as faster failover times and a two-phase commit to ensure that each transaction is fully recognized.

Contact Buda Consulting to talk over how these technologies can help maximize the performance and reliability of your critical, customer-facing applications.