Cloud Database Wars: Google Spanner vs. Microsoft CosmosDB
One of the reasons cloud computing is such a powerful force in the industry today is the innovation the providers are delivering. AWS is famous for the staggering pace at which it releases new features and services (see Figure 1).
Figure 1: AWS yearly feature improvements
Google recently delivered Spanner, a remarkably innovative SQL database service that provides global consistency, leveraging GPS and atomic clocks.
Not to be left out, Microsoft responded with CosmosDB, a database service that, while quite different from Spanner, is every bit as innovative in its own way. I regard CosmosDB as a powerful storage service that offers enormous scale and flexibility.
Because CosmosDB’s capabilities are unique, the service can be harder to comprehend. In this regard it differs from Spanner, which benefits from the fact that relational databases are functionally well understood. Let’s break down the key differences between the two database services to understand the benefits and attributes of each.
The Difference Between Spanner and CosmosDB
It’s relatively easy to understand the unique aspects of Spanner—how it extends relational database technology in ways that are noteworthy, and addresses shortcomings that bedevil application developers. In other words, Spanner is like what developers have always used, only much better.
CosmosDB, on the other hand, supports a wide range of use cases and offers multiple options for how data is accessed and how current that data is. Both capabilities are important, but they differ from what has been available in the past, so it’s necessary to understand how the service works before one can recognize what makes CosmosDB so innovative.
Let’s start by discussing the CosmosDB architecture. Microsoft describes it as structured around storage containers. (Note: these do not appear to have anything to do with execution containers like Docker; the term instead implies a pool of storage not bound to specific servers or storage devices, i.e., a virtual storage construct.)
“An Azure Cosmos DB container is a schema-agnostic container of arbitrary user-generated entities and stored procedures, triggers and user-defined-functions (UDFs),” according to Microsoft.
In other words, a CosmosDB container is a schema-agnostic collection of data that can be operated on in a variety of ways. Here is a figure that depicts the CosmosDB architecture:
(Image Courtesy Microsoft)
The key term here is “projections.” Internally, CosmosDB lumps all the data into a container in which Microsoft automagically tracks the individual data attributes and their relationships. But here is where its innovation shows: the data can be projected as a key-value, document, or graph database, each of which can be accessed through a use-case-specific API.
In other words, you can use CosmosDB as whichever of these database types best serves your application, but under the covers it’s all one melange of data. And if you use CosmosDB as a document database, Azure provides a SQL query dialect along with stored procedures and triggers (written in JavaScript). Again, this is extremely innovative and very useful.
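To make the idea of projections concrete, here is a minimal Python sketch of one pool of items viewed three ways. It is a toy model under our own assumptions, not the Cosmos DB SDK or a description of its internals: the class names (`Container`, `KeyValueView`, `DocumentView`, `GraphView`) and the convention that graph edges are items carrying `source` and `target` fields are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional

@dataclass
class Container:
    """Hypothetical schema-agnostic pool of items (not the Cosmos DB SDK)."""
    items: Dict[str, Dict[str, Any]] = field(default_factory=dict)

    def upsert(self, item: Dict[str, Any]) -> None:
        # Every item needs an "id"; no other structure is enforced.
        self.items[item["id"]] = dict(item)

class KeyValueView:
    """Projects the container as a key-value store: id -> item."""
    def __init__(self, container: Container) -> None:
        self.container = container

    def get(self, key: str) -> Optional[Dict[str, Any]]:
        return self.container.items.get(key)

class DocumentView:
    """Projects the container as a document store with equality filters."""
    def __init__(self, container: Container) -> None:
        self.container = container

    def find(self, **criteria: Any) -> List[Dict[str, Any]]:
        return [doc for doc in self.container.items.values()
                if all(doc.get(k) == v for k, v in criteria.items())]

class GraphView:
    """Projects items that have 'source'/'target' fields as directed edges."""
    def __init__(self, container: Container) -> None:
        self.container = container

    def neighbors(self, vertex_id: str) -> List[str]:
        return [item["target"] for item in self.container.items.values()
                if item.get("source") == vertex_id]

# One pool of data, three projections over it.
c = Container()
c.upsert({"id": "alice", "type": "person", "city": "Dallas"})
c.upsert({"id": "bob", "type": "person", "city": "Mumbai"})
c.upsert({"id": "e1", "type": "knows", "source": "alice", "target": "bob"})

print(KeyValueView(c).get("bob")["city"])                      # Mumbai
print([d["id"] for d in DocumentView(c).find(type="person")])  # ['alice', 'bob']
print(GraphView(c).neighbors("alice"))                         # ['bob']
```

The design point is that no view owns the data: each one is just a different way of reading the same underlying items.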
There is more to CosmosDB than clever storage projected through a variety of use case-specific APIs.
Naturally, CosmosDB can mirror data across the world, to allow for local low-latency access. That raises the issue of how quickly changes to the data (or the schema that describes the data) can be propagated.
Just as Spanner leverages Google’s globe-spanning fiber network to reduce latency, CosmosDB leverages Microsoft’s global network. When users make a schema change in one location, that change is propagated to every other Azure region that provides access to a mirrored copy of the database. And the change propagates quickly, on the order of milliseconds, which means that applications are never far out of date regarding the structure of the data they can work with.
(NOTE: Microsoft states that the service is schemaless and that every piece of data is indexed, but the point is that when you add a field to a key-value database in one location, every copy of that database knows about and can work with that field very quickly. I call that schema propagation.)
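The indexing half of that note can be sketched in a few lines. In this hypothetical Python toy (`AutoIndexedStore` and the slash-separated property paths are our own illustrative choices, not how Azure implements its indexer), every scalar property of every item is indexed on write, so a field that no earlier item had becomes queryable without declaring a schema first. It models a single copy only; replicating the data to other regions is not shown.

```python
from collections import defaultdict
from typing import Any, DefaultDict, Dict, Iterator, List, Set, Tuple

def property_paths(item: Dict[str, Any], prefix: str = "") -> Iterator[Tuple[str, Any]]:
    """Yield (path, value) for each scalar leaf, e.g. ('address/city', 'Dallas')."""
    for key, value in item.items():
        path = prefix + key
        if isinstance(value, dict):
            yield from property_paths(value, path + "/")
        elif isinstance(value, (str, int, float, bool)) or value is None:
            yield path, value
        # Arrays and other non-scalar values are skipped to keep the sketch short.

class AutoIndexedStore:
    """Hypothetical store that indexes every scalar property of every item on write."""
    def __init__(self) -> None:
        self.items: Dict[str, Dict[str, Any]] = {}
        self.index: DefaultDict[Tuple[str, Any], Set[str]] = defaultdict(set)

    def upsert(self, item: Dict[str, Any]) -> None:
        item_id = item["id"]
        old = self.items.get(item_id)
        if old is not None:
            # Remove index entries that belonged to the previous version.
            for entry in property_paths(old):
                self.index[entry].discard(item_id)
        self.items[item_id] = item
        for entry in property_paths(item):
            self.index[entry].add(item_id)

    def query(self, path: str, value: Any) -> List[str]:
        """Return ids of items whose property at `path` equals `value`."""
        return sorted(self.index.get((path, value), set()))

store = AutoIndexedStore()
store.upsert({"id": "1", "name": "Ann", "address": {"city": "Dallas"}})
# "loyalty" appears for the first time here; no schema change is declared.
store.upsert({"id": "2", "name": "Raj", "address": {"city": "Mumbai"}, "loyalty": "gold"})

print(store.query("loyalty", "gold"))         # ['2']
print(store.query("address/city", "Dallas"))  # ['1']
```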
A second latency issue concerns individual data items. If an application located in Dallas changes the data it works with, how soon will a mirrored copy of that data located in, say, Mumbai see the change? As you’ll recall, Spanner addresses this with a very clever combination of two-phase commit and TrueTime, a clock API backed by GPS receivers and atomic clocks, to ensure consistency.
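For readers who haven’t seen the Spanner side of this, the key trick is “commit wait.” The Python sketch below is a simplified illustration of the idea described in Google’s Spanner paper, not Spanner’s code: `tt_now()` is a hypothetical stand-in for TrueTime that returns an interval meant to contain the true time, and `EPSILON` is an assumed uncertainty bound. Two-phase commit and Paxos replication are omitted; only the timestamp rule is shown.

```python
import time
from dataclasses import dataclass
from typing import Callable, List, Tuple

EPSILON = 0.005  # assumed clock-uncertainty bound in seconds (illustrative value)

@dataclass(frozen=True)
class TTInterval:
    earliest: float
    latest: float

def tt_now() -> TTInterval:
    """Stand-in for TrueTime's now(): an interval meant to contain the true time.
    Here it is simply the local monotonic clock +/- EPSILON."""
    t = time.monotonic()
    return TTInterval(t - EPSILON, t + EPSILON)

def tt_after(t: float) -> bool:
    """True once timestamp t has definitely passed."""
    return tt_now().earliest > t

def commit(apply_writes: Callable[[float], None]) -> float:
    # Choose a commit timestamp no earlier than the latest possible current time.
    s = tt_now().latest
    # Commit wait: hold everything until s is certainly in the past,
    # so any transaction that starts afterwards will pick a larger timestamp.
    while not tt_after(s):
        time.sleep(EPSILON / 10)
    # Only now are the writes made visible and the commit acknowledged.
    apply_writes(s)
    return s

log: List[Tuple[str, float]] = []
s1 = commit(lambda ts: log.append(("withdraw", ts)))
s2 = commit(lambda ts: log.append(("deposit", ts)))
print(s1 < s2)  # True
```

Because each commit waits until its timestamp is definitely in the past, a transaction that starts after another one finishes always receives a larger timestamp, which is what lets every replica agree on the order of changes.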
CosmosDB approaches the problem differently, offering five data consistency levels: strong, bounded staleness, session, consistent prefix, and eventual.
So, in our example, the application could request strong consistency. A query against the data would then return the most recently committed version, no matter which mirrored copy the user happens to be served by, and the results would stay consistent throughout the use of the data: changes that arrive after the query begins are not mixed into its results.
This ensures that no inconsistent data can sneak in during a time-bounded use of the data. If someone wants to see the current balance of a bank account and the application uses strong consistency, transactions that arrive after the balance query are excluded from the returned data. Consistent data is obviously important for many transactional applications.
On the other hand, if an application displays vacation photos and the collection is being updated while a user is viewing it, a newly added photo that does not show up immediately is probably not a big deal, so a weaker level such as eventual consistency is good enough.
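The strong-versus-eventual contrast can be illustrated with a small simulation. This Python sketch is a toy under stated assumptions: `Account`, `Replica`, and the read-time catch-up are hypothetical. The real service achieves strong consistency by replicating writes synchronously before acknowledging them, not by catching a replica up when it is read, and the three intermediate levels are listed in the enum but not modeled.

```python
from dataclasses import dataclass, field
from enum import IntEnum
from typing import Dict, List, Optional, Tuple

class Consistency(IntEnum):
    """The five Cosmos DB consistency levels, weakest to strongest."""
    EVENTUAL = 0
    CONSISTENT_PREFIX = 1
    SESSION = 2
    BOUNDED_STALENESS = 3
    STRONG = 4

@dataclass
class Replica:
    applied: int = 0  # number of log entries this replica has applied
    data: Dict[str, int] = field(default_factory=dict)

class Account:
    """Hypothetical data item written in one region, replicated asynchronously."""
    def __init__(self, replica_regions: List[str]) -> None:
        self.log: List[Tuple[str, int]] = []  # committed writes, in order
        self.replicas = {region: Replica() for region in replica_regions}

    def write(self, key: str, value: int) -> None:
        self.log.append((key, value))  # committed in the write region only

    def replicate(self, region: str, n: int = 1) -> None:
        """Apply up to n pending writes to one replica (simulates lag)."""
        rep = self.replicas[region]
        for key, value in self.log[rep.applied:rep.applied + n]:
            rep.data[key] = value
            rep.applied += 1

    def read(self, region: str, key: str, level: Consistency) -> Optional[int]:
        rep = self.replicas[region]
        if level is Consistency.STRONG:
            # Toy shortcut: catch the replica up with every committed write first.
            self.replicate(region, len(self.log) - rep.applied)
        elif level is not Consistency.EVENTUAL:
            raise NotImplementedError("only STRONG and EVENTUAL are modeled here")
        return rep.data.get(key)

acct = Account(["Mumbai"])
acct.write("balance", 100)
acct.replicate("Mumbai")   # Mumbai has caught up: balance 100
acct.write("balance", 40)  # a withdrawal commits in Dallas; Mumbai lags

print(acct.read("Mumbai", "balance", Consistency.EVENTUAL))  # 100 (stale)
print(acct.read("Mumbai", "balance", Consistency.STRONG))    # 40
```

The Mumbai reader at eventual consistency sees the pre-withdrawal balance of 100; at strong consistency it sees 40. For the vacation-photo viewer the stale answer is harmless; for the bank balance it is not.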
The Bottom Line
CosmosDB is a remarkably innovative offering for myriad reasons, including:
- The use of data containers that can be projected as different types of databases is unique in my experience.
- Its SQL interface to document projections, including triggers and stored procedures, puts CosmosDB within the skill set of a large percentage of the world’s technical staff. In other words, despite the impressive technical underpinnings of the service, its use is not limited to super-programmers.
- The flexible consistency model makes CosmosDB a good fit for a wide variety of application use cases ranging from the most stringent to the least restrictive data requirements.
From a larger perspective, CosmosDB is a perfect illustration of how the big cloud providers (AWS, Azure, and Google, or AAG) are changing the very nature of the IT industry.
No end user, no matter how large, could ever hope to implement a storage system like CosmosDB. Its scale and operation would be beyond the talent pool and budget of any individual IT organization.
This is why savvy IT organizations are shifting their investment from infrastructure to applications and focusing their efforts on leveraging the innovative services emerging from the AAG cloud providers.