Beyond “Fast and Simple”: Top 5 Use Cases for NoSQL Database Technology

Since I first blogged about NoSQL back in 2013, the use of “Not Only SQL” databases for mission-critical applications has proliferated across businesses of all sizes and sectors. With trends like big data analytics, cloud, mobile and IoT driving the modern digital enterprise, requirements for greater application performance and scalability combined with falling storage costs increasingly bias the playing field toward non-relational databases.

Is relational database (RDBMS) technology still the “reigning data champion”? Yes, but the gap has closed significantly. NoSQL’s more flexible data storage formats are helping to solve a range of problems that traditional relational databases struggle to address. 

What are the top NoSQL use cases these days? Here are five of the most prevalent:

1. Real-time/Near Real-time Big Data Processing

The faster a company can process and act on fresh data, the greater its operational efficiency and business agility, and the greater the bottom-line value of its data. A typical approach to real-time big data processing uses stream processing to ingest new data combined with Apache Hadoop for analyzing historical data, plus a NoSQL database that integrates with both. 

Payments processing leader PayPal is a prime example of a leading digital enterprise that processes big data on the fly and leverages it in multiple ways. PayPal captures vast quantities of raw clickstream data in multiple formats (more than 20 TB per day!), processes it through Hadoop and the Apache HBase NoSQL database, and stores it all in the cloud for worldwide access by business analysts and data scientists. Fraud detection, data mining, customer segmentation and delivering personalized ads to customers are just some of the differentiating capabilities that PayPal has built on NoSQL.

2. Internet of Things

As of 2021, an estimated 46 billion IoT devices, from smartphones to home appliances to healthcare systems to factory sensors to smart vehicles, are connected to the Internet. The semi-structured data these devices continuously generate adds up to something like 847 zettabytes per year.

NoSQL databases are better suited than their relational counterparts to scale out to ingest this endless fire hose of diverse data. Freshub, a smart kitchen web platform for food shopping, is one example among many of an application that successfully processes data from huge numbers of IP-connected appliances. The Freshub solution maintains a MongoDB NoSQL cloud database of over 1 million grocery products gleaned from online catalogs in real time. In this use case, NoSQL is well suited to integrate diverse and unpredictable data schemas from all these sources. NoSQL also scales out horizontally across an arbitrary number of cloud database nodes as Freshub grows its customer and data footprint.

3. Content Management

Online shopping now surpasses brick-and-mortar sales, and “content is king” across thousands of online marketplaces and web storefronts. Online sales leaders curate a selection of multimedia content (including user-generated and social media content like reviews, photos and videos) and deliver it to shoppers “at the moment of interaction.” 

NoSQL document databases offer a flexible, open-ended data model that is ideal for storing a mix of structured, semi-structured and/or unstructured content. NoSQL also makes it possible to aggregate data that serves multiple business applications within a single catalog database, whereas an RDBMS, with its fixed data models, tends to spawn multiple overlapping catalogs for different purposes.

Forbes.com, which lives on viewership and ad revenue, exemplifies the use of NoSQL technology for content management. Forbes built a custom content management system based on MongoDB in just a few months, gaining greater agility—including the ability to incorporate contributor content and analyze social sharing within clickstream data—at a lower cost. The same data store also feeds the Forbes mobile site, which now accounts for 50% of total traffic.

4. Mobile Apps With Huge Numbers Of Users

Mobile phone and tablet use recently surpassed desktops as the top online platform for searching, shopping and otherwise viewing web content. Interestingly, as much as 90% of mobile data is served via apps and only 10% through browsers, an overwhelming shift in recent years.

Rapidly scaling mobile apps globally to serve tens of millions of users with acceptable performance (think mobile gaming or popular social media apps) often calls for distributed databases, which in turn calls for NoSQL. Flexible NoSQL data models also support rapid app update cycles better than relational data models in many cases. 

For these reasons, more and more businesses looking to monetize web content are using NoSQL data stores for their apps. A popular case in point is The Weather Channel, whose MongoDB database instance handles millions of requests per minute while also processing user data and juggling weather updates from tens of thousands of worldwide locations.

5. Enriching The Digital Customer Experience

An engaging, differentiating digital customer experience is built on data-intensive, time-critical capabilities like personalization, user profile management and a unified view of the customer across all your touchpoints. A lot of this demographic, behavioral and logistical data comes from the online clickstream, creating a write-intensive, multi-schema workload that taxes “scale-up” RDBMS infrastructure. A distributed NoSQL database can scale more cost-effectively, manage an ever-growing number of attributes with less administrative hassle, and often delivers lower latency—the Holy Grail of online interactions where you’re trying to get ads, recommendations, coupons, etc. in front of users in real time.

Multimedia service provider Comcast uses a Couchbase NoSQL platform to deliver a positive customer support experience across multiple lines of business. A core goal of the platform is to capture data from huge numbers of omnichannel interactions (phone calls, online help, chatbots, etc.) and relate it all back to individual customers’ accounts and service status. Scalability and resilience are also critical concerns, as in any customer experience scenario, especially because the better your solution works, the more customers will use it.

Other NoSQL Use Cases & Conclusion

  • Real-time updates and queries
  • Discussion thread hierarchy
  • Data caching and archiving
  • Simple data collection and analysis functions associated with voting and surveys
  • Crossover data analysis that cannot be conducted in relational environments
  • Online gaming where numerous simple queries need to run in fractions of a second

As these current NoSQL use cases illustrate, the strengths that I highlighted back in 2013, like flexible data models, low latency, ease of delivery/maintenance and the ability to integrate structured, semi-structured and unstructured elements, continue to make NoSQL a preferred choice for “digital transformation” across industries. The ongoing success of NoSQL innovators and early adopters like Netflix, Amazon, Twitter, Facebook and AOL continues to pave the way for new solutions.

Wondering if NoSQL technology is right for your application, how to architect a new NoSQL solution or how to move your current RDBMS to a NoSQL alternative? Contact Buda Consulting for a 15-minute “database discussion” to explore whether we can help. 

Relational Database Design: It’s All About The Set

The Lost Science Of Relational Algebra And Set Theory

I originally wrote this post in 2011. Much has changed in the database technology landscape since then. Big data technologies such as Hadoop have gone mainstream, and cloud technology is changing how and where we think about hosting our databases.

But relational databases are still relied upon as the best option for rich transactional data.

So, since this technology is still the foundation of our mission critical systems, we should understand how to take advantage of one of the foundational elements of relational technology: The Set.

The SQL language (Structured Query Language) was built upon relational algebra. This rigorous approach to query definition is largely about set theory. This post is not a detailed technical discussion of relational algebra or set theory; instead, it is about the way that relational databases are often misused.

The purpose of this article is to discuss the central theme of relational database technology and one of its greatest strengths, one that is often overlooked by those practicing Oracle or SQL Server database design and development: set theory. Relational databases like Oracle and SQL Server are built and optimized to process sets of rows, as opposed to individual rows. Many application developers, even those that use these relational tools, struggle to think beyond the individual row. That is why the major relational database vendors have created very powerful procedural languages such as PL/SQL and T-SQL.

In many cases, developers use these tools to step row by row through a dataset (by using cursors) because they may not understand how the set operators work. This approach leads to unnecessary development and higher maintenance costs, as well as poor performance.

There are definitely times when a procedural process is necessary. But often there are set-based alternatives that would be more efficient and much easier to develop.
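For example, here is a hypothetical sketch (the staging_orders and orders tables are invented for illustration) contrasting a PL/SQL cursor loop with an equivalent set-based statement:

-- Row-by-row approach: a cursor loop the developer must write,
-- debug, and maintain, issuing one INSERT per row.
BEGIN
  FOR r IN (SELECT id, amount FROM staging_orders) LOOP
    INSERT INTO orders (id, amount) VALUES (r.id, r.amount);
  END LOOP;
END;
/

-- Set-based equivalent: one declarative statement that the
-- database engine can optimize as a whole.
INSERT INTO orders (id, amount)
SELECT id, amount FROM staging_orders;

The set-based version is not only less code; it hands the optimizer the entire operation at once rather than a stream of single-row commands.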

In this post, I will focus on three core set operators: Union, Intersect, and Difference.

First some definitions:

Given two sets, Set A and Set B

Union: All records from Set A and all records from Set B. If a record exists in both sets, it will only appear once in the Union (Areas A, B, and C in figure 1).

Intersection: The unique set of records that exist in both set A and set B (Area C in figure 1).

Difference: The difference between Set A and Set B is all the records in Set A except those that also exist in Set B (Area A in figure 1).
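To make these definitions concrete, here is a minimal sketch in Oracle syntax. (SYS.ODCINUMBERLIST is just a convenient built-in collection type for supplying literal sets inline; any two queries with compatible columns behave the same way.)

-- Set A = {1, 2, 3}, Set B = {2, 3, 4}

-- Union: returns 1, 2, 3, 4
SELECT COLUMN_VALUE FROM TABLE(SYS.ODCINUMBERLIST(1, 2, 3))
UNION
SELECT COLUMN_VALUE FROM TABLE(SYS.ODCINUMBERLIST(2, 3, 4));

-- Intersection: returns 2, 3
SELECT COLUMN_VALUE FROM TABLE(SYS.ODCINUMBERLIST(1, 2, 3))
INTERSECT
SELECT COLUMN_VALUE FROM TABLE(SYS.ODCINUMBERLIST(2, 3, 4));

-- Difference (A minus B): returns 1
SELECT COLUMN_VALUE FROM TABLE(SYS.ODCINUMBERLIST(1, 2, 3))
MINUS
SELECT COLUMN_VALUE FROM TABLE(SYS.ODCINUMBERLIST(2, 3, 4));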

Vendor Differences

Relational databases implement these operators in different ways, but they all provide a relatively simple way to combine and compare similar sets of data. Oracle has the Union, Intersect, and Minus operators. SQL Server has Union, Intersect, and Except operators.

MySQL has the ability to perform these operations as well, but it is more complex. For example, in order to do a difference operation, you must use a NOT EXISTS or NOT IN operator, resulting in a more complex SQL statement.
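As a sketch of that workaround (reusing the Event_Source tables defined below, and assuming equivalent tables exist in MySQL), the difference from Example 3 could be written as:

-- Rows in Event_Source_2 with no matching row in Event_Source_1,
-- emulating Oracle's MINUS with a correlated NOT EXISTS.
-- DISTINCT mirrors MINUS's duplicate elimination.
SELECT DISTINCT s2.Data_Source_Location_Orig
FROM Event_Source_2 s2
WHERE NOT EXISTS (
  SELECT 1
  FROM Event_Source_1 s1
  WHERE s1.Data_Source_Location_Orig = s2.Data_Source_Location_Orig
);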

Example

Let’s examine how Oracle implements each of these set operations with a simple example.

This post is intended to discuss the concepts so I did not include the data and the actual query results in the post. But you can download the script to create and populate the tables with test data and run the queries here: set_tables_sql

Suppose you collect bank account events (debits, credits) from multiple sources. You place them into one common table, but also maintain the original source records in separate tables for historical purposes.  The original source records never change, but the events in the common table can be modified as necessary by the users.

Now suppose that occasionally you need to compare the transactional data in the common table to the original source data to see which rows have been changed. This is very easy using set operators.

The tables that we will use for this example follow. I used different column names in each table to illustrate that the column names do not need to be the same in each set that you are comparing. However, the number of columns and their data types must match across the queries being combined; otherwise Oracle rejects the query (for example, with ORA-01789 when the column counts differ).

Table Definitions

CREATE TABLE Event
(
  Event_Id             NUMBER,
  Event_Name           VARCHAR2(30),
  Event_Description    VARCHAR2(255),
  Data_Source_Location VARCHAR2(30),
  Event_Date           DATE
);

CREATE TABLE Event_Source_1
(
  Event_Id_Orig             NUMBER,
  Event_Name_Orig           VARCHAR2(30),
  Event_Description_Orig    VARCHAR2(255),
  Data_Source_Location_Orig VARCHAR2(30),
  Event_Date_Orig           DATE
);

CREATE TABLE Event_Source_2
(
  Event_Id_Orig             NUMBER,
  Event_Name_Orig           VARCHAR2(30),
  Event_Description_Orig    VARCHAR2(255),
  Data_Source_Location_Orig VARCHAR2(30),
  Event_Date_Orig           DATE
);

Example 1 — Union: Now suppose you needed to display all event names that appear in either Event Source 1 or Event Source 2. The Union operator will display records from both tables, but records appearing in both tables will only appear once (unless the UNION ALL operator is specified, in which case duplicates will be displayed).

SELECT Event_Name_Orig FROM Event_Source_1
UNION
SELECT Event_Name_Orig FROM Event_Source_2;

Example 2 — Intersection: Now suppose you needed to display only events from Source 1 that have remained unchanged in the Event table. This can be done with an intersection between Event and Event_Source_1.

SELECT Event_Name, Event_Description, Data_Source_Location FROM Event
INTERSECT
SELECT Event_Name_Orig, Event_Description_Orig, Data_Source_Location_Orig FROM Event_Source_1;

Example 3 — Difference: Now suppose you want to know all Data Source Locations that appear in the original Data Source 2 data but not in the original Data Source 1 data. This can be done by using the difference operation, implemented in Oracle with the MINUS operator, which takes all the records from one set and subtracts those that also exist in another set.

SELECT Data_Source_Location_Orig FROM Event_Source_2
MINUS
SELECT Data_Source_Location_Orig FROM Event_Source_1;

Database Design Considerations

These powerful operators can be used to reduce or eliminate the need for cursors in many cases. The usefulness of these operators is dependent on sound database design and a well-normalized table structure. For example, a table that has repeating columns designating the same data element (as opposed to using multiple rows) will render these operators much less useful.
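As a hypothetical illustration (these customer tables are invented for this sketch), compare a design with repeating columns to a normalized alternative:

-- Repeating columns: the same data element (a phone number) is
-- spread across three columns, so set operators cannot compare
-- phone numbers across customers directly.
CREATE TABLE Customer_Flat
(
  Customer_Id NUMBER,
  Phone_1     VARCHAR2(20),
  Phone_2     VARCHAR2(20),
  Phone_3     VARCHAR2(20)
);

-- Normalized design: one row per phone number, so the set
-- operators apply naturally.
CREATE TABLE Customer_Phone
(
  Customer_Id  NUMBER,
  Phone_Number VARCHAR2(20)
);

-- With the normalized design, finding phone numbers shared by
-- two customers is a single INTERSECT:
SELECT Phone_Number FROM Customer_Phone WHERE Customer_Id = 1
INTERSECT
SELECT Phone_Number FROM Customer_Phone WHERE Customer_Id = 2;

Against Customer_Flat, the same question requires comparing every phone column to every other, and the query grows with each repeating column.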

Conclusion

With careful database design and a good understanding of the set-processing tools provided by the relational vendors, we can simplify and speed development and reduce maintenance costs. Let’s think in terms of sets and get the most out of our relational database investment!

If you would like to discuss set theory or relational database design, please give me a call at (888) 809-4803 x 700, and if you have further thoughts on the topic, please add comments!

If you enjoyed this article please like and share!

Advantages, Disadvantages, & Risks Of Virtualization For Your Database — And How To Get It Right!

Types Of Virtualization

There are many types of virtualization: storage, network, server, and database. For the purposes of this article, I will discuss server virtualization, with a special focus on servers that will house databases.

Server virtualization essentially refers to abstracting the services that make up a computer server from the underlying hardware resources. Database virtualization on the other hand refers to abstracting the services that make up a database system from the servers that provide those services. It is essentially another layer of abstraction.

I am choosing to write about server virtualization in this article because it has been more widely adopted so far than database virtualization and is implemented in a database-agnostic way. In other words, when you virtualize your database servers, the advantages and disadvantages will apply to any database that you are using on that server.

Advantages Of Virtualization

There are many advantages to server virtualization. I will discuss two key ones here.

Rapid Provisioning

Probably the greatest advantage of server virtualization is rapid provisioning. Virtualization platforms like VMware enable us to build new servers in seconds based on existing servers or server templates, a major improvement over configuring each server individually as we did in the past. This saves time and money and, perhaps most importantly, improves consistency, and it can be used to enforce policies if administered and controlled properly. By narrowly configuring servers to handle one database or a collection of related databases, we can extend the benefits of rapid provisioning to the database, facilitating rapid creation or refresh of test, dev, or QA database environments, for example.

Resource Utilization

Another advantage of virtualization is greater resource utilization. Servers on many virtualization platforms can be configured to use resources such as memory on an as-needed basis. This minimizes the amount of resources that must be maintained for burst times, assuming that all servers do not burst at the same time. Of course, taking advantage of this capability requires careful planning and an understanding of the resource usage profiles of your servers.

Disadvantages/Risks Of Virtualization

While there are many advantages to virtualization, there are also key disadvantages, which come mostly in the form of risk. These disadvantages are not inherent problems with virtualization; rather, they tend to result from a lack of strict planning and management of the virtual environment.

Management and Accountability

Rapid and simple provisioning comes with a cost. The ease and speed of spinning up new servers tends to promote server and database sprawl, causing management and accountability problems. When virtualizing, strict policies and procedures must be implemented and enforced to avoid future problems, especially in environments with multiple system managers.

Hardware Cost

In addition to management and accountability problems, actual costs can spin out of control. In a cloud environment like AWS (one type of virtualization), cloud provider costs that seem small on a server-by-server basis quickly add up as server sprawl kicks in. Similarly, with in-house virtualization infrastructure, easily created servers eventually overwhelm the resources in the system and more hardware must be purchased, often with difficulty tracing those costs to specific projects or departments.

Quality

Database and system management involves a number of skills. There are hard technical skills, like knowing what commands to execute to download and install a Linux distribution or an Oracle patch. And for every one of those hard skills, there are a hundred soft skills, like knowing what downstream impact a Linux patch may have, what the likely security implications are of granting access to a folder required by a piece of software that needs to be installed, or how to determine the most efficient way to configure resources for Oracle. A huge risk in a virtualized environment is that the ease and speed of provisioning may give the false impression that the need for highly skilled system and database managers has diminished. On the contrary, I think the ease and speed of provisioning increases the need for those skill sets, because the potential to propagate a poor configuration throughout the system is much greater in a virtualized environment. And fixing 10 servers later is much more expensive than provisioning the first server properly in the beginning.

Licensing

Major database vendors price their software based on the underlying resources of the machine it runs on. In a virtual environment, we can assign a certain amount of compute power to a database server, and that can be a small fraction of the total compute power of the virtualization cluster. But the vendors don’t see it that way. Oracle, for example, bases the cost on the total CPU power across the whole cluster, regardless of how much power we assign to a given server; on a hypothetical 10-host cluster with 16 cores per host, that means licensing against all 160 cores, not just the handful assigned to your database VM. This is true unless we use Oracle’s own virtualization platform, which honors the resource partitioning of the virtualization layer. Misunderstanding this licensing model has caused many companies to be unexpectedly charged very large back licensing fees. Note that a potential solution may be to create a separate virtualization cluster for the database environment, but this limits some of the advantages described above.

Security

Security is always a concern when provisioning a server or database. There are many configuration settings, folder access rights restrictions, and OS and database users that need to be deactivated, removed, or restricted. The rapid cloning and perceived lower skill requirement for provisioning new servers can take a small security problem and rapidly propagate it throughout the environment. So while operating in a virtual environment may not introduce new security vulnerabilities per se, as with all of the other disadvantages and risks I mentioned, the risks are magnified in a rapid provisioning environment.

How To Get It Right?

I spoke to two experts who are responsible for virtualization platforms for their organizations or for client organizations that run mission critical applications. I wanted to find out the keys to success in building and maintaining a solid virtualization platform. Here are some of the takeaways.

I first spoke to the CIO of a financial institution that runs their entire shop on virtualized servers. He said that in-house server virtualization is a mature technology and risks are low for an organization with a relatively stable application mix and resource load, and with a small system management staff. He also feels that compromises, like dedicating a whole blade server to a specific application, are sometimes worth it to limit the risk of resource contention, even though they may reduce the benefit of efficient resource utilization.

I also spoke with Rocco Guerriero, CEO of Contour Data Solutions. Rocco says that having the right policies and procedures in place is the key to ensuring a trouble-free environment. For example, he points out that if you are implementing a mission-critical database server with a standby database, it is important to ensure that the virtual server holding the primary database will never be migrated to the same physical server that holds the standby. This can be done with rules specified in the virtualization tools.

Rocco also cautions that a good rule of thumb is to have enough resources in each cluster that you don’t exceed 50% usage during typical load. This ensures that servers can acquire the resources they need when demand spikes.

And finally, he recommends carefully assigning priorities to virtual machines if you need to ensure that certain servers always get the resources they need even at the expense of others. 

Conclusion

Server virtualization can be an effective way to reduce costs and speed up provisioning of hardware and software for our IT projects. But we must mitigate the risks from the beginning. Here are a few steps that I believe should be part of the management plan for any virtualized environment.

  • Establish policies and procedures that must be followed for all servers to be provisioned. Based on the conversations I had with these experts, and on my experience administering databases in virtual environments, I think policies should require the following: evaluation and implementation of necessary virtualization rules as described above; change control; security review; database license review; before-and-after cluster resource capacity review; and a record for each server stating what application(s) it will be used for, its resource cost estimate, the provenance of the server image, backup and restore requirements, and other data that will assist with management.
  • Ensure that only highly skilled system and database administrators are responsible for provisioning and configuring new servers and databases. Resist the temptation to let regular users or developers who lack system management experience do this.

At the end of the day, the thing to remember is that provisioning servers is very easy, and it is also very easy to get wrong.

4 Keys to Avoiding the Number 1 Cause of Database Project Failure

We all know that database projects and other technical/IT projects often fail. They are never completed, the results fall far short of expectations, nobody uses the new application, and so on.

Why? At the end of the day, if we look beneath the surface-level issues, the main reason for database project failure — by far — is poor communication. 

Case in point: If a project fails because of technical errors or deficiency, it’s either because the technical resources did not have the right skill set, or because the requirements they were working from were incorrect or incomplete.

If it’s the former, then there was a breakdown in communication between the resources and the project manager regarding the set of abilities that the resources have, or there was a breakdown between the project manager and the business analyst regarding what skill sets were needed for the project. If it’s the latter then there was a breakdown in communication between the business analyst and the project manager regarding what the overall system requirements were.

Another typical project failure involves missing deadlines. Typical causes include resources not being available when needed, the infrastructure not being ready in time, or the business users not being ready when needed for testing or migration activities.

Again, in all of these cases the root cause is communication. If one of the parties is not ready when they need to be, it is either because they didn’t know when they would be needed, or they incorrectly stated their availability. If the infrastructure is not available when it is needed, then either the requirements or the deadline for the infrastructure were not properly communicated to the infrastructure team, or the infrastructure team miscommunicated their ability to get the work done in time.

If you look deeper and break down the presenting problems, in almost all cases the root cause of project failures is communication. Often the communication failures occur in the very beginning of the project, during the scoping and estimate or quotation process.

Here are 4 key approaches that I use to mitigate the significant risks to project success caused by poor communication:

  1. When asking someone for a decision on an important point, I always ask twice. If the two answers differ, I ask a third time, and I continue that process until the answers become consistent. If I receive two different answers from two different critical stakeholders, I will find a reason to send a joint email or have a conversation with both present, and I will re-ask the question in hopes of gaining consensus. (Political sensitivity and tact are critical here… Perhaps that’s the subject of another blog post…)
  2. When nailing down an important decision, I follow up in writing to validate and underscore everyone’s understanding, especially for something for which I have received two different answers over time.
  3. I treat decisions differently than statements of fact. If I ask a client, “Do your customers connect directly to your database?”, this is a statement of fact. There is a right and wrong answer to this question, and it can be validated independently. However, if I ask the customer, “How many customers do you want the database to support in five years?”, this is a decision or a target. There is no right or wrong answer. This cannot be validated except by the same individual (assuming they are the decision-maker).

    I treat statements of fact very differently from decisions/targets:

    • I validate a statement of fact in a variety of ways. I might look at the user accounts on the existing system, or I might ask someone else in the organization, or I might look at the application for clues. 
    • For decisions or targets, validation can be more difficult. As mentioned above, I ask at least twice for any decision that can impact the scope of the project. If the answers differ, or if I feel like the answer is not solid and may change (based on my client’s tone of voice, hesitation, inconsistencies with other statements or requests, or other factors), I will ask again until I am satisfied that the answer is solid.
  4. For all important points that can impact the project time or cost estimate, or the database design or implementation, I always validate them in one fashion or another before we act on them. And if I can’t validate them for some reason, I call them out separately as an assumption in the estimate or quote in order to bring it to the client’s attention and to the team’s attention, and then I mention it directly when reviewing the document with them.

To sum up: as you might expect, the antidote to poor communication is good communication. Especially going into a project, keep the above in mind. Get clarity and validate what you’re hearing. This will make you look good, your customers and technical team members will appreciate it, and your projects are much more likely to succeed.

To get optimum value and results from your database project investments, contact Buda Consulting.

Database Patch News — December 2020 (Issue 5)

Welcome to Database Patch News, Buda Consulting’s newsletter of current patch information for Oracle and Microsoft SQL Server. Here you’ll find information recently made available on patches—including security patches—and desupported versions.

Why should you care about patching vulnerabilities and bugs? Two big reasons:

  1. Unpatched systems are a top cyber attack target. Patch releases literally advertise vulnerabilities to the hacker community. The longer you wait to patch, the greater your security risk. 
  2. Along with running a supported database version, applying the latest patches ensures that you can get support from the vendor in case of an issue. Patching also helps eliminate downtime and lost productivity associated with bugs. 

Here are the latest patch updates for Oracle and SQL Server:

Oracle Patches:

October 20, 2020 Quarterly Patch Updates:

19c – Release Update 19.9 is available (31771877 & 31668882).

18c – Release Update 18.12 is available (31730250 & 31668892).

12cR2 – Release Update 201020 is available (31741641 & 31668898). Regular support ends March 2023; extended support ends March 2026.

12cR1 – Release Update 201020 is available (31550110 & 31668915). Regular support ended July 2019; extended support ends July 2021.

11gR2 – Patch Set Update 201020 is available (31720776). Regular support ended October 2018; extended support ends December 2020.

SQL Server Patches:

SQL Server 2019 – Cumulative Update 8 (latest build), released Oct 1, 2020. Mainstream support ends Jan 7, 2025; extended support ends Jan 8, 2030.

SQL Server 2017 – Cumulative Update 22 (latest build), released Sept 10, 2020. Mainstream support ends Oct 11, 2022; extended support ends Oct 12, 2027.

SQL Server 2016 Service Pack 2 – Cumulative Update 15, released Sept 28, 2020. Mainstream support ends Jul 13, 2021; extended support ends Jul 14, 2026.

SQL Server 2014 Service Pack 3 – Cumulative Update 4, released Feb 11, 2019. Mainstream support ended Jul 9, 2019; extended support ends Jul 9, 2024.

SQL Server 2012 Service Pack 4 – Released Oct 5, 2017. Mainstream support ended Jul 11, 2017; extended support ends Jul 12, 2022.


Note: All other SQL Server versions not mentioned are no longer supported.