Why Google’s New Spanner May Be the Database of Your Future
Google gets a bad rap in the cloud computing world. For a long time its commitment to the market seemed uncertain, although that has changed dramatically since the arrival of Diane Greene.
But even when people questioned Google’s commitment to cloud computing, everyone recognized that the company is a technology powerhouse.
Building services like Search, AdWords, and YouTube requires world-class technical chops. Because these services are accessed through a browser, it’s easy to overlook just how much smart software sits behind the connection between a Google user and a service. But trust me, there is unbelievable functionality inside Google’s data centers.
The company just announced a cloud offering that reflects Google’s heritage of computing innovation: Spanner. Spanner shows that Google is much more than a late-arriving copycat; in fact, Spanner just might move Google ahead of the competition in storage.
What is Spanner? Simply stated, it is a globally consistent SQL database. Sounds simple, right? Make no mistake, though. Spanner is an amazing offering that hides a ton of technology behind a standard SQL interface.
Why is Spanner such a big deal? To understand that, let’s look at the biggest challenges organizations face in building widely distributed applications.
The key challenge for this type of application has always been keeping data consistent across broad geographies. Any time someone accesses a given piece of data, its accuracy comes into question. Did someone update it in Argentina, but that change hasn’t yet reached Germany, where someone else is reading it?
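To see the problem concretely, here is a toy sketch of the Argentina/Germany scenario under plain asynchronous replication, where a write is applied locally first and shipped to other locations later. This is not how Spanner works: the AsyncReplicatedStore class, its replicate() step, and the account key are hypothetical, and "later" is modeled as an explicit method call rather than real network delay.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    name: str
    data: dict = field(default_factory=dict)


class AsyncReplicatedStore:
    """Hypothetical toy store: writes hit one replica now, the others later."""

    def __init__(self, *names: str) -> None:
        self.replicas = {n: Replica(n) for n in names}
        self.pending = []  # (target replica, key, value) not yet delivered

    def write(self, at: str, key: str, value: str) -> None:
        self.replicas[at].data[key] = value
        for name in self.replicas:
            if name != at:
                self.pending.append((name, key, value))

    def read(self, at: str, key: str):
        # Serves whatever the local replica holds, even if it is out of date.
        return self.replicas[at].data.get(key)

    def replicate(self) -> None:
        # Deliver every queued update (in a real network this takes time).
        for name, key, value in self.pending:
            self.replicas[name].data[key] = value
        self.pending.clear()


store = AsyncReplicatedStore("argentina", "germany")
store.write("argentina", "balance:42", "100")
store.replicate()
store.write("argentina", "balance:42", "250")
print(store.read("germany", "balance:42"))  # stale: prints 100
store.replicate()
print(store.read("germany", "balance:42"))  # prints 250
```

Between the second write and the next replicate() call, a reader in Germany gets an answer that is simply wrong, which is exactly the gap a globally consistent database has to close.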
In practice, this makes it very difficult to build globally distributed systems, especially systems in which the integrity of data is critical, e.g., financial transaction systems. But Spanner holds the potential to let users solve a class of problems for which there has been no satisfactory solution.
Google accomplishes this by using GPS receivers and atomic clocks in each of its data centers, as described in this Wired article. (If you want the nitty-gritty on Spanner, here is a link to the Google white paper describing its inner workings; be forewarned, it’s not for the faint of heart.) Spanner timestamps updates and uses those timestamps to evaluate whether a given location’s data is up to date when it’s accessed.
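The white paper calls the clock service behind this TrueTime. Instead of a single instant, it reports the current time as an interval [earliest, latest] that is guaranteed to contain the real time, and a "commit wait" rule holds back each commit until its timestamp is certainly in the past. The sketch below is a local simulation of that idea, not Google’s code: the SimulatedTrueTime class, the 7 ms uncertainty bound, and the polling loop are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TTInterval:
    earliest: float
    latest: float


class SimulatedTrueTime:
    """Hypothetical stand-in for a TrueTime-style clock.

    `true_time` is the real time, which no node can read directly;
    `epsilon` is an assumed bound on clock uncertainty, in seconds.
    """

    def __init__(self, epsilon: float) -> None:
        self.epsilon = epsilon
        self.true_time = 0.0

    def now(self) -> TTInterval:
        # The real time is guaranteed to lie somewhere inside this interval.
        return TTInterval(self.true_time - self.epsilon,
                          self.true_time + self.epsilon)

    def after(self, t: float) -> bool:
        # True only once t has definitely passed.
        return self.now().earliest > t

    def sleep(self, seconds: float) -> None:
        self.true_time += seconds


def commit(tt: SimulatedTrueTime, step: float = 0.001) -> float:
    """Pick a commit timestamp, then wait until it is certainly in the past."""
    s = tt.now().latest
    while not tt.after(s):  # "commit wait"
        tt.sleep(step)
    return s


tt = SimulatedTrueTime(epsilon=0.007)  # assumed 7 ms of uncertainty
t1 = commit(tt)
t2 = commit(tt)  # starts only after the first commit has finished
print(t1 < t2)  # True
```

Each commit waits roughly twice the uncertainty bound, which is why shrinking that uncertainty with GPS and atomic clocks matters: a tighter clock means shorter waits, while timestamps still respect the real-time order of commits.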
Behind the scenes, the data update process relies on a two-phase procedure:
1. When a data element is updated somewhere, Spanner sends a notification of the update to every location where that data is stored, marking the local copy as out of date and recording the Spanner timestamp and the location of the updated data. If a request for the data arrives at a location other than where the updated version resides, Spanner does not serve it from that location; instead, it retrieves the data from the location holding the most recently updated version.
2. The changed data itself then flows across the Spanner network, updating each location as it arrives (so the timing varies with each location’s distance from where the change was made). When a given location receives the data and updates its local copy, that copy is marked as up to date, and any subsequent nearby requests for the data are served from that location.
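The two-phase procedure can be modeled in a few lines. This is a toy, single-process sketch of the steps as this article describes them, not Spanner’s actual replication protocol (the white paper covers that); the class names, the integer counter standing in for real timestamps, and the location names are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Copy:
    value: Optional[str] = None
    timestamp: int = 0              # stand-in for a Spanner timestamp
    up_to_date: bool = True
    fresh_at: Optional[str] = None  # location holding the newest version


class ToyGlobalStore:
    """Hypothetical model of the notify-then-ship update procedure."""

    def __init__(self, locations):
        self.copies = {loc: Copy() for loc in locations}
        self.clock = 0  # simple counter standing in for real timestamps

    def update(self, origin: str, value: str) -> int:
        self.clock += 1
        self.copies[origin] = Copy(value, self.clock, True, origin)
        # Phase 1: notify every other location that its copy is out of date,
        # recording the new timestamp and where the fresh data lives.
        for loc, copy in self.copies.items():
            if loc != origin:
                copy.up_to_date = False
                copy.timestamp = self.clock
                copy.fresh_at = origin
        return self.clock

    def read(self, at: str) -> tuple:
        """Return (value, location that actually served the read)."""
        copy = self.copies[at]
        if copy.up_to_date:
            return copy.value, at
        # Stale copy: fetch from the location holding the newest version.
        return self.copies[copy.fresh_at].value, copy.fresh_at

    def deliver(self, at: str) -> None:
        # Phase 2: the changed data arrives and the local copy is refreshed.
        copy = self.copies[at]
        if copy.up_to_date:
            return
        copy.value = self.copies[copy.fresh_at].value
        copy.up_to_date = True


store = ToyGlobalStore(["argentina", "germany", "japan"])
store.update("argentina", "price=10")
print(store.read("germany"))  # ('price=10', 'argentina'): forwarded
store.deliver("germany")
print(store.read("germany"))  # ('price=10', 'germany'): served locally
```

This model only captures the simplified description above; it ignores concurrent writes from two locations and machine or network failures, which is exactly where a real system such as Spanner needs far more machinery.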
The magnitude of this accomplishment cannot be overstated. In one of my past jobs, I ran part of engineering for a database vendor, and we spent endless amounts of developer time addressing data consistency. We never really solved it for distributed systems that resided in different data centers.
So, besides Spanner being a laudable technical achievement by Google, what does it make possible for users?
Here are three use cases that are ideal candidates for Spanner:
1. Global ecommerce sites. Ecommerce is the very definition of a transactional system: placing orders for goods, taking payment, and updating inventory. Many ecommerce sites have large national or international footprints. Making transactional databases globally consistent is a huge benefit because it ensures data integrity, not to mention reducing the need for clever software-engineering workarounds to mitigate data consistency issues.
2. Dynamic data-driven websites with regionally dispersed users. Dynamic websites construct pages on the fly based on user identity, past history, current session clicks, and much more. They are heavily data-reliant, and the data they use to construct a page varies widely: one page might include breaking news, and a minute later the same user might be served a page listing sushi restaurant locations, depending on what content the user is interacting with. Being able to draw consistent data from a single database no matter where the user resides would be extremely useful to such sites.
3. Disaster recovery (DR). DR is important but extremely difficult to implement, particularly if you follow the best practice of using geographically remote DR sites to protect against regional disasters. In practice, many IT organizations do backups and pray they never have to use them. Spanner offers a real path forward for DR. Instead of recovering an application by rebuilding it with data retrieved from a backup tape, you can simply replicate the data in a Spanner database to a remote location. If the primary site becomes unavailable, spin up new computing resources at a new location and have them connect to the secondary Spanner database. This simplifies DR and brings it within reach of many more IT groups.
Spanner is a truly remarkable offering and well represents Google’s impressive technical capabilities. If Google continues to roll out such innovative offerings, it will soon move out of the category of cloud also-ran and into the position of a contender.