NoSQL Database by Christof Strauch - HTML preview

4. Key-/Value-Stores

With the common concepts, techniques and patterns discussed, this chapter investigates the first category of NoSQL datastores. Key-/value-stores share a simple data model: a map or dictionary that allows clients to put and request values per key. Beyond this data model and API, modern key-/value-stores favor high scalability over consistency; most of them therefore omit rich ad-hoc querying and analytics features (joins and aggregate operations in particular are set aside). The length of keys is often limited to a certain number of bytes, while values are less restricted (cf. [Ipp09], [Nor09]).
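The data model described above can be sketched in a few lines. This is only an illustration, assuming an in-memory Python dictionary; the class name `KeyValueStore` and the 256-byte default key limit are hypothetical choices and are not taken from any particular store.

```python
class KeyValueStore:
    """In-memory map offering put/get per key, with a bounded key length."""

    def __init__(self, max_key_bytes=256):
        # The 256-byte default is an assumed value for illustration only.
        self.max_key_bytes = max_key_bytes
        self._data = {}

    def put(self, key: bytes, value: bytes) -> None:
        # Keys are size-limited; values are stored as opaque bytes.
        if len(key) > self.max_key_bytes:
            raise ValueError("key exceeds the configured length limit")
        self._data[key] = value

    def get(self, key: bytes):
        # Lookup is by exact key only: no joins, no ad-hoc queries.
        return self._data.get(key)
```

A client would call `store.put(b"cart:42", b"...")` and later `store.get(b"cart:42")`; anything beyond lookup by key has to be built by the application itself.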

Key-/value-stores have existed for a long time (e.g. Berkeley DB [Ora10d]), but many of the stores of this class that have emerged in the last couple of years have been heavily influenced by Amazon’s Dynamo, which is investigated thoroughly in this chapter. Among the great variety of free and open-source key-/value-stores, Project Voldemort is examined further. The chapter closes with a brief look at some other notable key-/value-stores.

4.1. Amazon’s Dynamo

Amazon Dynamo is one of several databases used at Amazon for different purposes (others are e.g. SimpleDB or S3, the Simple Storage Service, cf. [Ama10b], [Ama10a]). Because of its influence on a number of NoSQL databases, especially key-/value-stores, the following section investigates Dynamo’s influencing factors, applied concepts, system design and implementation in more detail.

4.1.1. Context and Requirements at Amazon

The technological context in which these storage services operate can be outlined as follows (according to Amazon’s Dynamo paper by DeCandia et al., cf. [DHJ+07, p. 205]):

  • The infrastructure is made up of tens of thousands of servers and network components located in many datacenters around the world.
  • Commodity hardware is used.
  • Component failure is the “standard mode of operation”.
  • “Amazon uses a highly decentralized, loosely coupled, service oriented architecture consisting of hundreds of services.”

Apart from these technological factors, the design of Dynamo is also influenced by business considerations (cf. [DHJ+07, p. 205]):

  • Strict, internal service level agreements (SLAs) regarding “performance, reliability and efficiency” have to be met at the 99.9th percentile of the distribution. DeCandia et al. consider “state management”, as offered by Dynamo and the other databases, crucial to meeting these SLAs in services whose business logic is in most cases rather lightweight at Amazon (cf. [DHJ+07, p. 207–208]).
  • One of the most important requirements at Amazon is reliability “because even the slightest outage has significant financial consequences and impacts customer trust”.
  • “[To] support continuous growth, the platform needs to be highly scalable”.
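To make the 99.9th-percentile requirement from the list above concrete, the following sketch contrasts a median with a 99.9th-percentile latency. It is an illustration only: the function name `percentile`, the nearest-rank method and the sample latencies are assumptions, not taken from the Dynamo paper.

```python
import math
from fractions import Fraction


def percentile(samples, p):
    """Nearest-rank percentile; pass p as an int or a string such as "99.9"."""
    ordered = sorted(samples)
    # Exact rational arithmetic avoids rounding surprises near boundaries.
    rank = math.ceil(Fraction(p) / 100 * len(ordered))
    return ordered[max(rank, 1) - 1]


# Hypothetical latencies in milliseconds: 997 fast requests, 3 slow ones.
latencies = [10] * 997 + [500] * 3
median = percentile(latencies, 50)
tail = percentile(latencies, "99.9")
```

With these made-up samples the median looks healthy, while the 99.9th percentile exposes the slow requests. An SLA stated at that percentile therefore constrains the tail of the distribution rather than the typical case.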

At Amazon “Dynamo is used to manage the state of services that have very high reliability requirements and need tight control over the tradeoffs between availability, consistency, cost-effectiveness and performance”. DeCandia et al. furthermore argue that a lot of services only need access via primary key (such as “best seller lists, shopping carts, customer preferences, session management, sales rank, and product catalog”) and that the usage of a common relational database “would lead to inefficiencies and limit scale and availability” (cf. [DHJ+07, p. 205]).

4.1.2. Concepts Applied in Dynamo

Of the concepts presented in chapter 3, Dynamo uses consistent hashing along with replication as a partitioning scheme. Objects stored in partitions among nodes are versioned (multi-version storage). To maintain consistency during updates, Dynamo uses a quorum-like technique and a (not further specified) protocol for decentralized replica synchronization. To manage membership and detect machine failures, it employs a gossip-based protocol which allows servers to be added and removed with “a minimal need for manual administration” (cf. [DHJ+07, p. 205–206]).
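The combination of consistent hashing and replication can be sketched as follows. This is a simplified illustration under stated assumptions, not Dynamo's implementation: the names `HashRing`, `preference_list` and `quorum_overlaps` are hypothetical, MD5 is merely one possible way to place names on the ring, each node occupies a single ring position, and the inequality R + W > N is the usual overlap condition for quorum systems rather than something this section specifies.

```python
import bisect
import hashlib


def ring_position(name: str) -> int:
    # Assumed hash function: MD5 maps a name to a position on the ring.
    return int(hashlib.md5(name.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Nodes on a hash ring; a key is stored on its clockwise successors."""

    def __init__(self, nodes, replicas=3):
        self.replicas = replicas
        self._ring = sorted((ring_position(n), n) for n in nodes)
        self._positions = [pos for pos, _ in self._ring]

    def preference_list(self, key: str):
        # First node clockwise from the key, then the following nodes.
        start = bisect.bisect_right(self._positions, ring_position(key))
        count = min(self.replicas, len(self._ring))
        return [self._ring[(start + i) % len(self._ring)][1]
                for i in range(count)]


def quorum_overlaps(n: int, r: int, w: int) -> bool:
    # Read and write sets of sizes r and w out of n replicas share a node
    # exactly when r + w > n.
    return r + w > n
```

With four nodes and `replicas=3`, `preference_list("cart:42")` returns three distinct nodes. Removing a node that is not among them leaves the result unchanged, which is the property that makes consistent hashing attractive when servers join or leave.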

DeCandia et al. note that their contribution to the research community is showing that Dynamo as an “eventually-consistent storage system can be used in production with demanding applications”.

4.1.3. System Design Considerations and Overview

DeCandia et