NoSQL Database by Christof Strauch - HTML preview

Logging is done by traditional DBMSs in addition to modifications of relational data on each transaction. As log files are persisted to disk to ensure durability, logging is expensive and decreases transaction performance.

Locking of datasets to be manipulated causes overhead as write operations in the lock-table have to occur before and after the modifications of the transaction.

Latching because of shared data structures (e.g. B-trees, the lock-table, resource-tables) inside an RDBMS incurs further transaction costs. As these data structures have to be accessed by multiple threads, short-term locks (aka latches) are often used to provide parallel but careful access to them.

Buffer Management finally also plays its part when it comes to transaction overhead. As data in traditional RDBMSs is organized in fixed pages, work has to be done to manage the disk pages cached in memory (done by the buffer pool), to resolve database entries to disk pages (and back), and to identify field boundaries.
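The four sources of overhead described above can be made concrete with a minimal sketch. It is not taken from the survey or from any real DBMS; the class `ToyEngine`, its counters, the four-page layout and the pool size are all hypothetical and chosen only to show where each cost arises in a single write transaction.

```python
import os
import tempfile
import threading
import zlib
from collections import OrderedDict


class ToyEngine:
    """Hypothetical toy store that counts the four overhead sources."""

    def __init__(self, log_path, pool_size=2):
        self.log = open(log_path, "ab")    # logging: append-only log file
        self.lock_table = {}               # locking: key -> owning transaction id
        self.latch = threading.Lock()      # latching: guards shared structures
        self.pool = OrderedDict()          # buffer pool: page id -> page, in LRU order
        self.pool_size = pool_size
        self.disk = {}                     # stand-in for pages stored on disk
        self.counts = {"log_flushes": 0, "lock_ops": 0,
                       "latch_ops": 0, "page_faults": 0}

    def _page_for(self, key):
        # Assumption: four fixed pages, chosen by a stable hash of the key.
        return zlib.crc32(key.encode("utf-8")) % 4

    def _fetch_page(self, page_id):
        # Buffer management: resolve a page id to a cached page, evicting LRU.
        if page_id in self.pool:
            self.pool.move_to_end(page_id)
        else:
            self.counts["page_faults"] += 1
            if len(self.pool) >= self.pool_size:
                old_id, old_page = self.pool.popitem(last=False)
                self.disk[old_id] = old_page   # write the evicted page back
            self.pool[page_id] = self.disk.get(page_id, {})
        return self.pool[page_id]

    def write(self, txn_id, key, value):
        with self.latch:                       # latching: short-term exclusion
            self.counts["latch_ops"] += 1
            owner = self.lock_table.get(key)
            if owner is None:                  # locking: lock-table write before
                self.lock_table[key] = txn_id  # the modification
                self.counts["lock_ops"] += 1
            elif owner != txn_id:
                raise RuntimeError(f"{key!r} is locked by transaction {owner}")
            self.log.write(f"{txn_id} SET {key}={value}\n".encode("utf-8"))
            self._fetch_page(self._page_for(key))[key] = value

    def commit(self, txn_id):
        # Logging: force the log to disk so the transaction is durable.
        self.log.write(f"{txn_id} COMMIT\n".encode("utf-8"))
        self.log.flush()
        os.fsync(self.log.fileno())
        self.counts["log_flushes"] += 1
        with self.latch:
            self.counts["latch_ops"] += 1
            # Locking: lock-table write after the modification (release).
            for key in [k for k, t in self.lock_table.items() if t == txn_id]:
                del self.lock_table[key]
                self.counts["lock_ops"] += 1

    def read(self, key):
        with self.latch:
            self.counts["latch_ops"] += 1
            return self._fetch_page(self._page_for(key)).get(key)


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        engine = ToyEngine(os.path.join(tmp, "toy.log"))
        engine.write(1, "a", "x")
        engine.write(1, "b", "y")
        engine.commit(1)
        value = engine.read("a")
        engine.log.close()
        return value, engine.counts
```

Calling `demo()` runs one two-record transaction and returns the value read back together with the counters. Even in this toy, a single commit touches the lock table twice per record, takes the latch on every operation and forces one log flush to disk.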

As stated before, Stonebraker considers communication the main source of overhead: it by far outweighs the other four (taken from the survey [HAMS08]), which contribute almost equally to total transaction costs. Besides avoiding communication between the application and the database, all four other sources of performance overhead have to be eliminated in order to considerably improve single-node performance.

Now, datastores, whether relational or not, have specific themes in common, and NoSQL databases also have to address the components of performance overhead mentioned above. In this context, Stonebraker raises the following examples:

  • Distribution of data among multiple sites and a shared-nothing approach are provided by relational as well as non-relational datastores. “Obviously, a well-designed multi-site system, whether based on SQL or something else, is way more scalable than a single-site system” according to Stonebraker.
  • Many NoSQL databases are disk-based, implement a buffer pool and are multi-threaded. When providing these features and properties, two of the four sources of performance overhead still remain (locking and buffer management) and cannot be eliminated.
  • Transaction-wise, many NoSQL datastores provide only single-record transactions with BASE properties (see chapter 3 on that). In contrast to relational DBMSs, ACID properties are sacrificed in favor of performance.
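The first point, distributing data among sites that share nothing, can be illustrated with a minimal sketch. It is an assumption-laden toy rather than the design of any particular datastore: the class `ShardedStore` is hypothetical, each "site" is just an in-process dictionary, and routing uses plain modulo hashing, which real systems often replace with consistent hashing or range partitioning.

```python
import zlib


class ShardedStore:
    """Hypothetical key-value store that routes each key to exactly one node."""

    def __init__(self, node_count):
        # One private dictionary per site; nodes share no state with each other.
        self.nodes = [{} for _ in range(node_count)]

    def _node_for(self, key):
        # Stable hash, so a key always maps to the same node.
        return zlib.crc32(key.encode("utf-8")) % len(self.nodes)

    def put(self, key, value):
        self.nodes[self._node_for(key)][key] = value

    def get(self, key):
        return self.nodes[self._node_for(key)].get(key)


store = ShardedStore(node_count=3)
for name in ["alice", "bob", "carol", "dave"]:
    store.put(name, name.upper())
```

Because every operation touches a single node, reads and writes on different keys need no coordination between sites. This is also why such a layout only offers single-record guarantees unless a cross-node protocol is added on top.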

Stonebraker consequently summarizes his considerations as follows: “However, the net-net is that the single-node performance of a NoSQL, disk-based, non-ACID, multithreaded system is limited to be a modest factor faster than a well-designed stored-procedure SQL OLTP engine. In essence, ACID transactions are jettisoned for a modest performance boost, and this performance boost has nothing to do with SQL”. From his point of view, the real tasks in speeding up a DBMS are the elimination of locking, latching, logging and buffer management as well as support for stored procedures, which compile a high-level language (such as SQL) into low-level code. What such a system can look like is described in the paper “The end of an architectural era: (it’s time for a complete rewrite)” ([SMA+07]), which has already been discussed above.

Stonebraker also does not expect SQL datastores to die but rather states: “I fully expect very high speed, open-source SQL engines in the near future that provide automatic sharding. [...] Moreover, they will continue to provide ACID transactions along with the increased programmer productivity, lower maintenance, and better data independence afforded by SQL.” Hence “high performance does not require jettisoning either SQL or ACID transactions”. It rather “depends on removing overhead” caused by traditional implementations of ACID transactions, multi-threading and disk management. The removal of these sources of overhead “is possible in either a SQL context or some other context”, Stonebraker concludes.

2.2.6. Requirements of Administrators and Operators

In his blog post “The dark side of NoSQL” (cf. [Sch09]), Stephan Schmidt argues that the NoSQL debate is dominated by a developer’s view of the topic, which usually dwells on properties and capabilities developers like (e.g. performance, ease of use, schemalessness, nice APIs), whereas in his view the needs of operations people and system administrators are often forgotten. He reports that companies encounter difficulties especially in the following fields:

Ad Hoc Data Fixing To allow for ad hoc data fixing there first has to be some kind of query