One of the reasons cloud computing is such a powerful force in the industry today is the innovation the providers are delivering. AWS is famous for the staggering pace at which new features and services are released (see Figure 1).
Figure 1: AWS yearly feature improvements
Google recently delivered Spanner, a remarkably innovative SQL database service that provides global consistency, leveraging GPS and atomic clocks.
Not to be left out, Microsoft responded with CosmosDB, a database service that, while quite different from Spanner, is tremendously innovative in its own way. I regard CosmosDB as a powerful storage service that offers tremendous scale and flexibility.
CosmosDB’s power can make the service harder to comprehend, because its capabilities are unique. In this regard it differs from Spanner, which benefits from the fact that relational databases are functionally well understood. Let’s break down the key differences between the two database services to fully understand the benefits and attributes of each.
The Difference Between Spanner and CosmosDB
It’s relatively easy to understand the unique aspects of Spanner - how it extends relational database technology in ways that are noteworthy, and addresses shortcomings that bedevil application developers. In other words, Spanner is like what developers have always used, only much better.
CosmosDB, on the other hand, supports highly flexible use cases and provides multiple options for data state access. Both of these are important, but they’re different from what has been available in the past, so it’s necessary to understand the functionality before one can recognize what makes CosmosDB so innovative.
Let’s start by discussing CosmosDB architecture. Microsoft describes this as structured around storage containers (Note: this does not appear to have anything to do with execution containers like Docker, but rather implies a pool of storage not bound to specific servers or storage devices, i.e., a virtual storage construct).
“An Azure CosmosDB container is a schema-agnostic container of arbitrary user-generated entities and stored procedures, triggers and user-defined-functions (UDFs),” according to Microsoft.
In other words, a CosmosDB container is a schema-agnostic pool of data that can be operated on in a variety of ways. Here is a figure that depicts CosmosDB architecture:
(Image Courtesy Microsoft)
The key term here is “projections.” Internally, CosmosDB lumps all the data into a container in which Microsoft automagically tracks the individual data attributes and their relationships, but, and here is where its innovation shows, the data can be projected as a key-value, document, or graph database, each of which can be accessed through a use case-specific API.
In other words, you can use CosmosDB as any of these types of databases, depending on what your application is best served by, but under the covers, it’s all one melange of data. And, by the way, if you use CosmosDB as a document database, Azure provides SQL capabilities, including triggers and stored procedures. Again, this is extremely innovative and very useful.
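To make the idea of projections concrete, here is a minimal Python sketch of one pool of data exposed through three access styles. It is a toy model, not the Azure SDK: the `ToyContainer` class, its method names, and the sample items are all hypothetical and exist only for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class ToyContainer:
    """Hypothetical stand-in for a schema-agnostic container (not the Azure SDK)."""

    items: Dict[str, Dict[str, Any]] = field(default_factory=dict)

    def put(self, item_id: str, body: Dict[str, Any]) -> None:
        # No schema is declared; any fields are accepted.
        self.items[item_id] = body

    def get(self, item_id: str) -> Dict[str, Any]:
        # Key-value projection: look an item up by its id.
        return self.items[item_id]

    def find(self, **criteria: Any) -> List[str]:
        # Document projection: return ids of items whose fields match.
        return sorted(
            item_id
            for item_id, body in self.items.items()
            if all(body.get(k) == v for k, v in criteria.items())
        )

    def neighbors(self, item_id: str, edge_field: str) -> List[str]:
        # Graph projection: treat a list-valued field as outgoing edges.
        return list(self.items[item_id].get(edge_field, []))


container = ToyContainer()
container.put("alice", {"city": "Dallas", "follows": ["bob"]})
container.put("bob", {"city": "Mumbai", "follows": []})
container.put("carol", {"city": "Dallas", "follows": ["alice", "bob"]})

by_key = container.get("bob")                       # key-value style
by_field = container.find(city="Dallas")            # document style
by_edge = container.neighbors("carol", "follows")   # graph style
```

The same three items answer all three kinds of question without being copied into separate stores, which is the essence of the projection idea described above.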
There is more to CosmosDB than clever storage projected through a variety of use case-specific APIs.
Naturally, CosmosDB can mirror data across the world, to allow for local low-latency access. That raises the issue of how quickly changes to the data (or the schema that describes the data) can be propagated.
Just as Spanner leverages Google’s globe-spanning fiber network to reduce latency, CosmosDB leverages Microsoft’s. When users make a schema change in one location, that change is propagated to every other Azure location that is set to provide database access via a mirrored version. And the schema change is fast, on the order of milliseconds, which means that applications are never far out of date in terms of the structure of data they can work with.
(NOTE: Microsoft states that the service is schemaless and that each piece of data is indexed, but the point is that when you add a field to a key-value database in one location, every copy of that database knows about and can work with that field very quickly. I call that schema propagation).
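What I call schema propagation can be sketched in a few lines of Python. This is a toy model under stated assumptions, not how CosmosDB is implemented: `ToyReplica` and `replicate` are hypothetical names, replication here is an instantaneous loop rather than a network hop, and field values are assumed to be hashable.

```python
from collections import defaultdict
from typing import Any, Dict, List, Set, Tuple


class ToyReplica:
    """Hypothetical replica that indexes every field of every item it stores."""

    def __init__(self) -> None:
        self.items: Dict[str, Dict[str, Any]] = {}
        # Maps (field name, value) to the ids of items carrying that pair.
        self.index: Dict[Tuple[str, Any], Set[str]] = defaultdict(set)

    def apply(self, item_id: str, body: Dict[str, Any]) -> None:
        # Drop index entries left over from the previous version of the item.
        for key, value in self.items.get(item_id, {}).items():
            self.index[(key, value)].discard(item_id)
        self.items[item_id] = dict(body)
        # Index every field; no schema has to be declared first.
        for key, value in body.items():
            self.index[(key, value)].add(item_id)

    def query(self, key: str, value: Any) -> Set[str]:
        return set(self.index.get((key, value), set()))


def replicate(replicas: List[ToyReplica], item_id: str, body: Dict[str, Any]) -> None:
    # Toy replication: apply the same write to every mirror.
    for replica in replicas:
        replica.apply(item_id, body)


dallas, mumbai = ToyReplica(), ToyReplica()
replicate([dallas, mumbai], "order-1", {"status": "open"})
# A brand-new field appears in a later write; no mirror needs a migration.
replicate([dallas, mumbai], "order-1", {"status": "open", "priority": "high"})
found_in_mumbai = mumbai.query("priority", "high")
```

Once the write reaches a mirror, the new `priority` field is queryable there immediately, because each mirror indexes whatever fields arrive.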
A second latency issue concerns individual items of data; in other words, if an application located in Dallas changes the data it works with at one point in time, how soon will a mirrored version of that data located in, say, Mumbai see that changed data? As you’ll recall, Spanner addresses this with a very clever two-phase commit protocol powered by atomic clocks to ensure consistency.
CosmosDB approaches it somewhat differently, by offering five different data consistency choices, ranging from strong to eventual.
So, in our example, the application could request strong consistency, which ensures that a query returns consistent data no matter which storage mirror the application user happens to run against, for as long as that data is in use. Any changes that arrive after the query is issued, even ones that would otherwise satisfy its terms, are excluded from the results.
This ensures that no inconsistent data can sneak in during a time-bounded use of the data. So if someone wants to see the current balance of a bank account and the application is set to strong consistency, transactions that flow in after the balance query would be excluded from the returned data. Obviously, consistent data is important for many transactional applications.
On the other hand, if an application displays vacation photos and the collection is being updated while a user is viewing it, a photo being added to the display is probably not a big deal.
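The contrast between the two ends of that range can be illustrated with a small Python sketch. It is a deliberately simplified toy, not CosmosDB's protocol: `ToyReplicatedStore` is a hypothetical class, only the "strong" and "eventual" choices are modeled (the three intermediate levels are omitted), and a strong read is approximated by applying all pending replication before answering.

```python
from typing import Dict, List, Optional, Tuple


class ToyReplicatedStore:
    """Hypothetical multi-region store with lazy replication between mirrors."""

    def __init__(self, regions: List[str]) -> None:
        self.replicas: Dict[str, Dict[str, int]] = {r: {} for r in regions}
        # Writes that have not yet reached a remote mirror: (region, key, value).
        self.pending: List[Tuple[str, str, int]] = []

    def write(self, origin: str, key: str, value: int) -> None:
        # The local mirror sees the write at once; the others see it later.
        self.replicas[origin][key] = value
        for region in self.replicas:
            if region != origin:
                self.pending.append((region, key, value))

    def sync(self) -> None:
        # Deliver queued writes in order, as background replication would.
        for region, key, value in self.pending:
            self.replicas[region][key] = value
        self.pending.clear()

    def read(self, region: str, key: str, consistency: str = "eventual") -> Optional[int]:
        if consistency not in ("strong", "eventual"):
            raise ValueError("this toy models only 'strong' and 'eventual'")
        if consistency == "strong":
            # Simplification: catch every mirror up before answering.
            self.sync()
        return self.replicas[region].get(key)


store = ToyReplicatedStore(["dallas", "mumbai"])
store.write("dallas", "balance", 100)
eventual_read = store.read("mumbai", "balance", "eventual")  # may lag behind Dallas
strong_read = store.read("mumbai", "balance", "strong")      # reflects the Dallas write
```

In this toy the eventual read from Mumbai can miss the Dallas write, which is acceptable for the photo collection, while the strong read cannot, which is what the bank balance needs.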
The Bottom Line
CosmosDB is a remarkably innovative offering for myriad reasons, including:
- The use of data containers that can be projected as different types of databases is unique in my experience.
- Its provision of an SQL interface to document projections, including triggers and stored procedures, will bring the use of CosmosDB within the skills of a large percentage of the world’s technical staff. In other words, despite the impressive technical underpinnings of the service, its use is not limited to super-programmers.
- The flexible consistency model makes CosmosDB a good fit for a wide variety of application use cases ranging from the most stringent to the least restrictive data requirements.
No end-user, no matter how large, could ever hope to implement a storage system like CosmosDB. Its scale and operation would be beyond the talent pool and budget of any individual IT organization.
This is why savvy IT organizations are shifting their investment from infrastructure to applications and focusing their efforts on leveraging the innovative services emerging from the AAG (Amazon, Azure, Google) cloud providers.