Basser Seminar Series

Snapshot Isolation and Serializable Execution

Speaker: Michael Cahill and Alan Fekete
School of IT, The University of Sydney

Time: Wednesday 3 September 2008, 4:00-5:00pm

Location: The University of Sydney, School of IT Building, Lecture Theatre (Room 123), Level 1

Abstract

The idea of ACID transactions is a powerful way to design database- using application programs, but this depends on having correct and efficient concurrency control algorithms in the dbms engine. Traditional engines use strict two-phase locking, which guarantees correct (serializable) execution, but often suffers from low throughput. Snapshot Isolation (SI) is a multiversion concurrency control mechanism that is widely available (for example, in Oracle, PostgreSQL and MSSQL Server). It generally gets good performance, and it avoids the classical interleaving anomalies such as Lost Update or Inconsistent Reads. However, unlike traditional two-phase locking or timestamp concurrency control, SI does not guarantee that arbitrary executions are serializable. In particular, it is possible for a concurrent execution to violate undeclared data integrity constraints, even though each application program preserves the constraints when run alone in the system.

In this talk we present a body of work that studies different ways how to ensure serializable executions, when executing on a platform offering SI. The key to this is a theory that shows how a particular set of transactions may have a pattern of data-conflicts which prevents non-serializable execution. This pattern can be detected by drawing a graph (a variant of the usual serializability graph for an execution) and checking for a special feature, called a dangerous structure. If a set of programs does not have a dangerous structure in its graph, then all executions will be serializable.

Given a set of programs, there are a number of ways we can modify the programs to remove all the dangerous structures. These modifications introduce extra conflicts between certain pairs of transactions while not changing the meaning of any program, for example, by adding SQL statements to update a table not used for any other purpose. We have studied the performance of these modifications, and we find that some choices give performance almost as good as for the unmodified programs, as well as removing all possibility of concurrency anomalies.

Another approach is to change the dbms engine, to introduce dynamic checks that prevent anomalies. We find that despite these checks, the variant engine can keep most of the performance of SI, while guaranteeing true serializable execution. In contrast to program modification, this doesn't require pre-knowledge of the transactions that will run in the system, but it does require access to the source of the engine.

This talk is based on a series of papers which appeared in TODS, PODS'05, VLDB'07, DASFAA'08, AICCSA'08, ICDE'08 and SIGMOD'08 (this latter paper was given a "Best Paper" award at the conference). The coauthors of these papers are Mohammad Alomari, Sudhir Jorwekar, Dimitrios Liarokapis, Patrick O'Neil, Elizabeth O'Neil, Krithi Ramamritham, Uwe Roehm, Dennis Shasha and S. Sudarshan (from Sydney University, UMass Boston, IIT Bombay and NYU).

Speaker's biography

Alan Fekete completed a PhD in Mathematics at Harvard University in 1987, after an undergraduate education in Computer Science and Mathematics at Sydney University. He has been an academic in Computer Science at Sydney since 1988, and he has held visiting appointments at Cornell, MIT, UWashington and Microsoft Research. His research is focussed on the use of formal methods to understand infrastructure “systems” software, especially database transaction management. Fekete has an H-index of 18 according to Google Scholar, he is recognized as a Distinguished Scientist by the ACM, and he has supervised 11 completed PhD students.

Michael Cahill completed his undergraduate education with a University Medal in Computer Science at Sydney University in 1997 after completing a project on exchange at Harvard University with Margo Seltzer's systems research group. He has almost 20 years of experience as a professional software engineer, having worked for several Sydney software companies, including three years as CTO of startup Bullant, before joining Seltzer's Sleepycat Software in 2001. Sleepycat was acquired by Oracle in 2006, where Michael is now a senior engineer for the open source embedded database, Berkeley DB. He has extensive experience building scalable, reliable and extensible software systems. He is active in introducing computer science to high school students through the National Computer Science School (NCSS) and the NCSS Challenge, and is in the final year of his PhD in the area of database transaction management.