High Availability
The next topics Stonebraker et al. address are high availability and failover. Again, they outline the historical development of these issues: from log tapes that organizations sent off site and ran on newly delivered hardware in case of disaster, via disaster recovery services that installed log tapes on remote hardware, to the hot standby or multiple-site solutions that are common today. Stonebraker et al. regard high availability and built-in disaster recovery as crucial features for DBMSs which, like the other design issues they mention, have to be considered in the architecture and design of these systems. They particularly require DBMSs in the OLTP field to
1. “keep multiple replicas consistent, requiring the ability to run seamlessly on a grid of geographically dispersed systems”
2. “start with shared-nothing support at the bottom of the system” instead of gluing “multi-machine support onto [...] SMP architectures.”
3. support a shared-nothing architecture in the best way by using “multiple machines in a peer-to-peer configuration” so that “load can be dispersed across multiple machines, and inter-machine replication can be utilized for fault tolerance”. In such a configuration all machine resources can be utilized during normal operation, and failures only cause degraded operation as fewer resources are available. In contrast, today's HA solutions with a hot standby utilize only part of the hardware resources in normal operation, as the standby machines merely wait for the live machines to go down. Stonebraker et al. conclude this aspect by stating that “[these] points argue for a complete redesign of RDBMS engines so they can implement peer-to-peer HA in the guts of a new architecture”.
In the highly available system that Stonebraker et al. call for, they see no need for a redo log, since after a failure a dead site resuming activity “can be refreshed from the data on an operational site”. Thus, only an undo log is needed to roll back transactions. Such an undo log does not have to be persisted beyond a transaction and therefore “can be a main memory data structure that is discarded on transaction commit”. As “[in] an HA world, one is led to having no persistent redo log, just a transient undo one”, Stonebraker et al. see further potential to remove the complex code needed for recovery from a redo log; they also admit, however, that the recovery logic only changes “to new functionality to bring failed sites up to date from operational sites when they resume operation”.
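The idea of a transient undo log combined with recovery from a peer can be illustrated with a minimal sketch. It is not taken from Stonebraker et al.; the class `Site` and its methods (`begin`, `write`, `commit`, `rollback`, `refresh_from`) are hypothetical names chosen for illustration, and the sketch assumes a single-threaded key/value store per site, ignoring replication protocols and concurrency entirely.

```python
class Site:
    """Hypothetical in-memory replica; illustrative only, not from the paper."""

    def __init__(self):
        self.data = {}
        self._undo = None  # transient undo log; exists only during a transaction

    def begin(self):
        # Start a transaction with an empty main-memory undo log.
        self._undo = []

    def write(self, key, value):
        if self._undo is None:
            raise RuntimeError("no active transaction")
        # Remember whether the key existed and its old value before overwriting.
        self._undo.append((key, key in self.data, self.data.get(key)))
        self.data[key] = value

    def commit(self):
        if self._undo is None:
            raise RuntimeError("no active transaction")
        # The undo log is simply discarded; nothing is persisted.
        self._undo = None

    def rollback(self):
        if self._undo is None:
            raise RuntimeError("no active transaction")
        # Undo the writes in reverse order to restore the pre-transaction state.
        for key, existed, old_value in reversed(self._undo):
            if existed:
                self.data[key] = old_value
            else:
                self.data.pop(key, None)
        self._undo = None

    def refresh_from(self, peer):
        # A failed site resuming activity copies the state of an operational
        # site instead of replaying a persistent redo log.
        self.data = dict(peer.data)
        self._undo = None


if __name__ == "__main__":
    live = Site()
    live.begin()
    live.write("x", 1)
    live.commit()

    live.begin()
    live.write("x", 2)
    live.write("y", 3)
    live.rollback()      # restores x and removes y

    recovered = Site()
    recovered.refresh_from(live)
    print(live.data, recovered.data)
```

In this sketch the only recovery code left is `refresh_from`, which mirrors the observation that the recovery logic does not disappear but shifts to bringing failed sites up to date from operational ones.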
No Knobs
Finally, Stonebraker et al. point out that current RDBMSs were designed in an “era, [when] computers were expensive and people were cheap. Today we have the reverse. Personnel costs are the dominant expense in an IT shop”. They especially criticize that “RDBMSs have a vast array of complex tuning knobs, which are legacy features from a bygone era” but are still in use, as the automatic tuning aids of RDBMSs “do not produce systems with anywhere near the performance that a skilled DBA can produce”. Instead of providing such features, which only try to figure out a better configuration for a number of knobs, Stonebraker et al. require a database to have no such knobs at all but to be “‘self-everything’ (self-healing, self-maintaining, self-tuning, etc.)”.
Considerations Concerning Transactions, Processing and Environment
Having discussed the historical development of the IT business since the 1970s, when RDBMSs were designed, and the consequences this development should have had on their architecture, Stonebraker et al. now turn towards other issues that impact the performance of these systems negatively: