NoSQL Database by Christof Strauch - HTML preview


Failure Types in Distributed Systems Chang et al. criticize the assumption, made in many distributed protocols, that large distributed systems are vulnerable to only a few kinds of failure, such as “the standard network partitions and fail-stop failures”. In contrast, they encountered many more issues: “memory and network corruption, large clock skew, hung machines, extended and asymmetric network partitions, bugs in other systems that we are using (Chubby for example), overflow of GFS quotas, and planned and unplanned hardware maintenance”. Hence, they argue that such sources of failure also have to be addressed when designing and implementing distributed systems protocols. Examples implemented at Google include checksumming for RPC calls as well as removing assumptions in one part of the system about other parts of the system (e.g. that only a fixed set of errors can be returned by a service such as Chubby).
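The checksumming of RPC calls mentioned above can be pictured as framing each message with a digest of its body, so that corruption in transit or in memory is detected instead of silently propagating. The following is a minimal sketch, not Google's actual mechanism; the wire format (hex SHA-256 digest, newline, JSON body) and the function names are illustrative assumptions.

```python
import hashlib
import json


def frame_message(payload: dict) -> bytes:
    """Serialize an RPC payload and prepend a checksum of the body.

    Hypothetical wire format for illustration only: 64 hex characters
    of SHA-256, a newline, then the JSON-encoded body.
    """
    body = json.dumps(payload, sort_keys=True).encode("utf-8")
    digest = hashlib.sha256(body).hexdigest().encode("ascii")
    return digest + b"\n" + body


def parse_message(frame: bytes) -> dict:
    """Verify the checksum before trusting the body.

    A flipped bit anywhere in the body raises an error here instead of
    surfacing later as inexplicable application-level misbehavior.
    """
    digest, _, body = frame.partition(b"\n")
    if hashlib.sha256(body).hexdigest().encode("ascii") != digest:
        raise ValueError("RPC frame failed checksum verification")
    return json.loads(body)
```

The point of the sketch is the failure mode it converts: without the digest, a corrupted frame would be deserialized and acted upon; with it, the receiver rejects the call and the caller can retry.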

Feature Implementation A lesson learned at Google while developing Bigtable is to implement new features into such a system only once the actual usage patterns for them are known. A counterexample Chang et al. mention are general-purpose distributed transactions, which were planned for Bigtable but never implemented, as there never was an immediate need for them. It turned out that most applications using Bigtable only needed single-row transactions. The only use case for distributed transactions that came up was the maintenance of secondary indices, which can be dealt with by a “specialized mechanism [...that] will be less general than distributed transactions, but will be more efficient”. Hence, according to Chang et al., general-purpose implementations should be avoided when no actual requirements and usage patterns have been specified.
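The single-row transactions that most applications turned out to need can be understood as an atomic read-modify-write confined to one row, requiring no cross-row or cross-machine coordination. A toy sketch under that assumption (the class, method names, and per-row locking scheme are illustrative, not Bigtable's implementation):

```python
import threading
from collections import defaultdict


class RowStore:
    """Toy table illustrating single-row transactions: each row has its
    own lock, so a check-and-mutate on one row is atomic without any
    distributed (multi-row) transaction machinery."""

    def __init__(self):
        self._rows = defaultdict(dict)            # row key -> {column: value}
        self._locks = defaultdict(threading.Lock)  # row key -> lock

    def check_and_put(self, row, column, expected, new_value):
        """Atomically write new_value if the column currently holds
        `expected`; return True on success, False otherwise."""
        with self._locks[row]:
            if self._rows[row].get(column) != expected:
                return False
            self._rows[row][column] = new_value
            return True

    def get(self, row, column):
        with self._locks[row]:
            return self._rows[row].get(column)
```

Because the lock scope never spans rows, this primitive stays cheap and simple, which is precisely why it sufficed where general-purpose distributed transactions were never built.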

System-Level Monitoring A practical suggestion is to monitor the system as well as its clients in order to detect and analyze problems. In Bigtable, e.g., the RPC system it uses produces “a detailed trace of the important actions”, which helped to “detect and fix many problems such as lock contention on tablet data structures, slow writes to GFS while committing Bigtable mutations, and stuck accesses to the METADATA table when METADATA tablets are unavailable”.
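The kind of detailed trace of important actions described here can be approximated by instrumenting each operation with its name and duration, so that slow writes or contended locks show up in the recorded data. A minimal sketch, assuming a simple in-process trace list and a hypothetical `commit_mutation` operation (neither is Bigtable's actual tracing infrastructure):

```python
import functools
import time

TRACE = []  # in a real system this would be a bounded, sampled trace log


def traced(action_name):
    """Decorator recording the name and duration of each important
    action, e.g. to surface lock contention or slow commits."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            start = time.monotonic()
            try:
                return fn(*args, **kwargs)
            finally:
                TRACE.append({
                    "action": action_name,
                    "duration_s": time.monotonic() - start,
                })
        return wrapper
    return decorator


@traced("commit_mutation")
def commit_mutation(row, value):
    time.sleep(0.01)  # stand-in for a write to the underlying filesystem
    return True
```

Scanning such a trace for outliers in `duration_s` is what makes problems like "slow writes to GFS" diagnosable after the fact rather than invisible.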

Value Simple Designs In the eyes of Chang et al., the most important lesson to be learned from Bigtable’s development is that simplicity and clarity in design as well as in code are of great value, especially for big and unexpectedly evolving systems like Bigtable. As an example, they mention the tablet-server membership protocol, which was designed too simply at first, then refactored iteratively until it became too complex and too dependent on seldom-used Chubby features, and in the end was redesigned “to a newer simpler protocol that depends solely on widely-used Chubby features” (see section 6.1.4).

6.2. Bigtable Derivatives

As neither the Bigtable code nor the components required to operate it are available under an open source or free software licence, open source projects have emerged that adopt the concepts described in the Bigtable paper by Chang et al. Notable in this field are Hypertable and HBase.

Hypertable

Hypertable is modelled after Google’s Bigtable and inspired by “our own experience in solving large-scale data-intensive tasks” according to its developers. The project’s goal is “to set the open source standard for highly available, petabyte scale, database systems”. Hypertable is almost completely written in C++ and relies on a distributed filesystem such as Apache Hadoop’s HDFS (Hadoop Distributed File System) as well as on a distributed lock manager. Regarding its data model, it supports all abstractions available in Bigtable; in contrast to HBase, column families with an arbitrary number of distinct columns are available in Hypertable. Tables are partitioned by ranges of row keys (as in Bigtable), and the resulting partitions are replicated between servers. The data representation and processing at runtime are also borrowed from Bigtable: “[updates] are done in memory and later flushed to disk”. Hypertable has its own query language called HQL (Hypertable Query Language) and exposes a native C++ as well as a Thrift