NoSQL Database by Christof Strauch - HTML preview

7. Conclusion

 

The aim of this paper was to give a thorough overview of and introduction to the NoSQL database movement, which has appeared in recent years to provide alternatives to the predominant relational database management systems. Chapter 2 discussed reasons, rationales and motives for the development and usage of non-relational database systems. These can be summarized as the need for high scalability, the processing of large amounts of data, the ability to distribute data among many (often commodity) servers and, consequently, a distribution-aware design of DBMSs (instead of adding such facilities on top), as well as a smooth integration with programming languages and their data structures (instead of, e.g., costly object-relational mapping). As shown in chapter 2, relational DBMSs have certain flaws and limitations regarding these requirements, as they were designed at a time when hardware (especially main memory) was expensive and full dynamic querying was expected to be the most important use case. As Stonebraker et al. argue, the situation today is very different, so a complete redesign of database management systems is suggested. Because of the limitations of relational DBMSs and today’s needs, a wide range of non-relational datastores has emerged. Chapter 2 outlines several attempts to classify and characterize them.

Chapter 3 introduced concepts, techniques and patterns that are commonly used by NoSQL databases to address consistency, partitioning, storage layout, querying, and distributed data processing. Important concepts in this field, such as eventual consistency and ACID vs. BASE transaction characteristics, have been discussed along with a number of notable techniques such as multi-version storage, vector clocks, state vs. operational transfer models, consistent hashing, MapReduce, and row-based vs. columnar vs. log-structured merge tree persistence.
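To make one of the techniques named above concrete, the following is a minimal sketch of consistent hashing in Python. It is an illustration under stated assumptions, not the implementation of any particular datastore: the class name `HashRing`, the `replicas` parameter (virtual points per node) and the use of MD5 as the ring hash are all choices made for this example.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    # Map a string to a position on the ring. MD5 is an assumption of this
    # sketch, chosen only because it spreads keys evenly.
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Hypothetical consistent-hash ring with virtual nodes."""

    def __init__(self, nodes=(), replicas=64):
        self.replicas = replicas  # virtual points placed per physical node
        self._points = []         # sorted ring positions
        self._owner = {}          # ring position -> node name
        for node in nodes:
            self.add(node)

    def add(self, node):
        # Place `replicas` virtual points for the node on the ring.
        for i in range(self.replicas):
            point = _hash(f"{node}#{i}")
            self._owner[point] = node
            bisect.insort(self._points, point)

    def remove(self, node):
        # Take the node's virtual points off the ring again.
        for i in range(self.replicas):
            point = _hash(f"{node}#{i}")
            del self._owner[point]
            self._points.remove(point)

    def lookup(self, key):
        # A key belongs to the first point clockwise from its hash,
        # wrapping around at the end of the ring.
        if not self._points:
            raise LookupError("ring is empty")
        idx = bisect.bisect_right(self._points, _hash(key)) % len(self._points)
        return self._owner[self._points[idx]]
```

The property that matters for partitioning is that removing a node only reassigns the keys that node owned; all other keys keep their owner, which a modulo-based scheme (`hash(key) % n`) would not guarantee.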

As a first class of NoSQL databases, key-/value-stores have been examined in chapter 4. Most of these datastores borrow heavily from Amazon’s Dynamo, a proprietary, fully distributed, eventually consistent key-/value-store which has been discussed in detail in this paper. The chapter also looked at popular open-source key-/value-stores such as Project Voldemort, Tokyo Cabinet/Tyrant, Redis, and MemcacheDB.

Chapter 5 discussed document stores by examining CouchDB and MongoDB as the two major representatives of this class of NoSQL databases. These document stores provide the abstraction of documents, which are flat or nested namespaces for key-/value-pairs. CouchDB is a document store written in Erlang and accessible via a RESTful HTTP interface, providing multi-version concurrency control and replication between servers. MongoDB is a datastore with additional features such as nested documents, rich dynamic querying capabilities and automatic sharding.
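The notion of a document as a flat or nested namespace for key-/value-pairs can be illustrated with plain Python dictionaries. This is only a sketch: the sample documents and the helper `get_path` are hypothetical and do not correspond to the API of CouchDB or MongoDB, although the dotted path is modeled loosely on the dot notation that document stores commonly offer for addressing nested fields.

```python
# A flat document: every key maps directly to a scalar value.
flat_doc = {"title": "Example", "pages": 120}

# A nested document: values may themselves be namespaces of key-/value-pairs.
nested_doc = {
    "title": "Example",
    "author": {"name": "A. Writer", "address": {"city": "Somewhere"}},
}


def get_path(doc, path, default=None):
    """Resolve a dotted path such as 'author.address.city' in a nested dict."""
    current = doc
    for part in path.split("."):
        # Stop as soon as the path leaves the nested namespaces.
        if not isinstance(current, dict) or part not in current:
            return default
        current = current[part]
    return current
```

For example, `get_path(nested_doc, "author.address.city")` walks two levels of nesting, while a path that does not exist yields the supplied default instead of raising an error.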

In chapter 6, column-stores have been discussed as a third class of NoSQL databases. Besides pure column-stores for analytics, datastores integrating column- and row-orientation can be subsumed in this field. An important representative of the latter is Google’s Bigtable, which allows storing multidimensional maps indexed by row, column-family, column and timestamp. Via a central master server, Bigtable automatically partitions and distributes data among the multiple tablet servers of a cluster. The design and implementation of the proprietary Bigtable have been adopted by open-source projects like Hypertable and HBase. The chapter concludes with an examination of Apache Cassandra, which integrates the full distribution and eventual consistency of Amazon’s Dynamo with the data model of Google’s Bigtable.
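The multidimensional map described above can be sketched as an in-memory structure keyed by row, column-family, column and timestamp. This is a toy model for illustration only: the class `SparseTable` and its methods are assumptions of this example and say nothing about how Bigtable, Hypertable or HBase store or distribute data.

```python
from collections import defaultdict


class SparseTable:
    """Hypothetical in-memory map: (row, family, column, timestamp) -> value."""

    def __init__(self):
        # row key -> (family, column) -> list of (timestamp, value), newest first
        self._rows = defaultdict(lambda: defaultdict(list))

    def put(self, row, family, column, timestamp, value):
        cells = self._rows[row][(family, column)]
        cells.append((timestamp, value))
        # Keep versions ordered from newest to oldest.
        cells.sort(key=lambda cell: cell[0], reverse=True)

    def get(self, row, family, column, timestamp=None):
        """Return the newest value at or before `timestamp`, or None."""
        cells = self._rows.get(row, {}).get((family, column), [])
        for ts, value in cells:
            if timestamp is None or ts <= timestamp:
                return value
        return None
```

Keeping several timestamped versions per cell is what allows a read to ask either for the latest value or for the value as of an earlier point in time.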