Research and Research-Linked projects for sem1 2011, supervised by Alan Fekete
Measuring Isolation Anomalies for Replicated Data
A recurring theme in the design of distributed systems is the tradeoff between data consistency and performance.
Especially with replicated data, mechanisms that keep the data consistent have a serious impact on performance.
The performance impact is easy to quantify, but the amount of consistency given up by
high-performing mechanisms is usually described only by hand-waving. This project aims to measure the amount
of inconsistency produced by common data replication mechanisms, such as lazy propagation of updates.
The project will build on recent work by Fekete, Goldrei and Asenjo (VLDB09) which defined
a benchmark to measure inconsistency in a single-site database. The project will define a
new benchmark for replicated data, and use this to measure the impact of lazy propagation of updates.
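As a rough illustration of what "measuring inconsistency" under lazy propagation could mean, the sketch below counts how many reads at a replica return a value that differs from what the primary would have returned at the same moment. It is a toy simulation, not the benchmark from the VLDB09 paper: the `Update` record, the fixed propagation `delay`, and the stale-read fraction are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class Update:
    commit_time: float  # when the primary commits the write
    key: str
    value: int


def replica_value(updates, key, read_time, delay):
    """Value returned for `key` at `read_time` by a copy that sees each
    update `delay` time units after it commits (delay=0 models the primary)."""
    visible = [u for u in updates
               if u.key == key and u.commit_time + delay <= read_time]
    if not visible:
        return None
    return max(visible, key=lambda u: u.commit_time).value


def stale_read_fraction(updates, reads, delay):
    """Fraction of replica reads that differ from the primary's answer."""
    stale = 0
    for key, read_time in reads:
        fresh = replica_value(updates, key, read_time, delay=0.0)
        seen = replica_value(updates, key, read_time, delay=delay)
        if seen != fresh:
            stale += 1
    return stale / len(reads)


# Hypothetical workload: three committed updates and four replica reads.
updates = [Update(1.0, "x", 10), Update(2.0, "x", 20), Update(3.0, "y", 5)]
reads = [("x", 1.5), ("x", 2.2), ("y", 3.1), ("y", 4.0)]
```

With a propagation delay of 0.5, two of the four reads above arrive before the latest update has reached the replica, so `stale_read_fraction(updates, reads, 0.5)` is 0.5. A real benchmark would also need to capture anomalies that span several items, which this single-item measure ignores.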
This project is suitable for Research Track (18crpts), or Honours.
This project would be part of the activity of
the Database research group of the School. This project requires good
awareness of performance issues, skill working with database systems internals,
some programming (Java and perhaps also C), and plenty of data analysis.
A performance model for multiversion concurrency control algorithms
In multiversion storage management, changes to a record produce a new version,
rather than modifying the original storage location. These techniques are used in several
important database engines,
including InnoDB, PostgreSQL, Microsoft SQL Server, and Oracle. Some platforms use multiversioning to
offer an isolation level called
snapshot isolation (SI), while recent research has shown how to provide serializable isolation. In particular,
Michael Cahill, a PhD student here, introduced a new multiversion concurrency control algorithm called
SerializableSI, and implemented it in the InnoDB storage engine (Cahill, Roehm
and Fekete, ACM TODS 2009). In ICDE'11, a paper by Revilak, O'Neil and O'Neil will describe
another algorithm (Precisely Serializable Snapshot Isolation) implemented
in the same platform. To understand the tradeoffs between the various algorithms,
it would be valuable to create a predictive model of their performance, that is, one that
predicts throughput in terms of features of the application code (such as the
amount of data read and written in each transaction, and the amount of conflict between transactions),
features of the hardware (such as the bandwidth to the disk and the speed of the CPU),
and key features of the algorithms (such as the per-operation overhead of concurrency control checks).
The project will begin by measuring performance of very simple microbenchmark code
(e.g., a program that just reads N items and writes M of them). We will see which features matter
and what types of functions can approximate the observed performance, and then generalize this to a model
that can be evaluated on more complex programs.
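To make the microbenchmark idea concrete, here is a minimal sketch of a transaction that reads N items and writes M of them against a toy in-memory multiversion store. The store implements plain snapshot isolation with a first-committer-wins check; it is not SerializableSI or Precisely Serializable Snapshot Isolation, and every name in it (`MultiversionStore`, `run_microbenchmark`, the key naming) is hypothetical. A real study would run against an actual engine such as InnoDB with concurrent clients.

```python
import time


class MultiversionStore:
    """Toy store: each committed write appends a (commit_ts, value) version."""

    def __init__(self, keys):
        self.versions = {k: [(0, 0)] for k in keys}
        self.clock = 0  # timestamp of the latest commit

    def begin(self):
        return {"snapshot": self.clock, "writes": {}}

    def read(self, txn, key):
        if key in txn["writes"]:
            return txn["writes"][key]
        # Return the newest version committed at or before the snapshot.
        for ts, value in reversed(self.versions[key]):
            if ts <= txn["snapshot"]:
                return value
        raise KeyError(key)

    def write(self, txn, key, value):
        txn["writes"][key] = value  # buffered until commit

    def commit(self, txn):
        # First-committer-wins: abort if a written key changed after the snapshot.
        for key in txn["writes"]:
            if self.versions[key][-1][0] > txn["snapshot"]:
                return False
        self.clock += 1
        for key, value in txn["writes"].items():
            self.versions[key].append((self.clock, value))
        return True


def run_microbenchmark(store, n_reads, m_writes, n_txns):
    """Run n_txns transactions, each reading n_reads items and writing m_writes."""
    keys = sorted(store.versions)
    committed = 0
    start = time.perf_counter()
    for i in range(n_txns):
        txn = store.begin()
        chosen = [keys[(i + j) % len(keys)] for j in range(n_reads)]
        values = [store.read(txn, k) for k in chosen]
        for k, v in zip(chosen[:m_writes], values):
            store.write(txn, k, v + 1)
        if store.commit(txn):
            committed += 1
    elapsed = time.perf_counter() - start
    throughput = committed / elapsed if elapsed > 0 else float("inf")
    return committed, throughput


store = MultiversionStore([f"k{i}" for i in range(10)])
committed, throughput = run_microbenchmark(store, n_reads=4, m_writes=2, n_txns=5)
```

Because this loop is single-threaded, no transaction ever aborts; varying `n_reads` and `m_writes` only changes the per-transaction work. Fitting a function to throughput measured over a grid of N and M values is the kind of first step the project description has in mind.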
This project is suitable for Research Track (18crpts), or Honours. A cut-down version would be suitable as research-linked (12crpts or TSP).
This project would be part of the activity of
the Database research group of the School. This project requires good
awareness of performance issues, skill working with database tuning,
some programming (Java and perhaps also C), plenty of data analysis, and good mathematical modeling.
Web Services for Advanced Patent Search and Analysis (supervised by Dr Vladimir Tosic at NICTA)
NICTA recently implemented innovative Web-based patent search and analysis software that can
programmatically retrieve relevant international patent information from free patent databases
(the European Patent Office's esp@cenet Web services and the US Patent and Trademark Office's PatFT/AppFT)
and other sources. It then stores the pertinent patent information in a relational database,
performs various patent analyses to determine patenting trends, and graphically visualizes analysis results.
The main task in this thesis project is to design, implement, and test several additional modules in the
previously described software. Chief among these modules will be those performing
additional advanced patent analysis procedures that will help inventors make better decisions about their
patent portfolio (e.g., making patenting decisions suited to a company's business strategy).
Another important group of new modules will provide innovative integration of NICTA's patent search
and analysis software with software for systematic review of academic literature.
This project would be based at NICTA's ATP site. Involvement will require agreeing to
appropriate conditions regarding Intellectual Property. It would be suitable for Research Track (18crpts)
or Honours; a cut-down version would be suitable as a research-linked (12crpts or TSP) project. This project requires good skill in programming and maturity in analysis and evaluation, but
it does not require prior knowledge of patent law. The student would be expected to learn about data mining
and related topics during the project, so a background in AI would be useful.
Autonomic Business-Driven IT Management of Cloud Computing Systems (supervised by Dr Vladimir Tosic at NICTA)
Business-driven IT management (BDIM) aims to determine mappings between technical metrics (e.g.,
response time, availability) and business metrics (e.g., profit, customer satisfaction) and leverage them
to make run-time IT system and/or service management decisions that maximize business value.
Ideally, IT systems should be self-managing (a.k.a. autonomic), or at least managed with minimal
human intervention. Cloud computing technologies enable provisioning of computing infrastructure (e.g.,
memory storage), platforms (e.g., virtualized Linux desktops), and software applications (e.g.,
customer relationship management suites) over the Internet, as a utility that can be bought on demand.
Cloud computing is (to some extent) already provided by major computing companies and
used for diverse purposes, but there is a strong need for additional innovation.
NICTA previously developed IT system/service management tools for autonomic BDIM
and started applying them to cloud computing systems. Students working on this project will design,
implement, and test innovative extensions of this NICTA software. Their work will result in novel
architectures, components, algorithms, and data structures for autonomic management
that uses various metrics to make decisions best for the business.
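The idea of mapping technical metrics to business metrics and then acting on the result can be sketched in a few lines. The example below is purely illustrative and does not describe the NICTA tools: it assumes load is split evenly over identical servers, uses the standard M/M/1 mean response time formula 1/(mu - lambda) per server, and invents a simple pricing scheme (revenue per request, a penalty when a response-time target is missed, and a cost per server). All function and parameter names are hypothetical.

```python
def response_time(arrival_rate, service_rate, servers):
    """Technical metric: mean response time under an assumed M/M/1 model per server."""
    per_server = arrival_rate / servers
    if per_server >= service_rate:
        return float("inf")  # overloaded: the queue grows without bound
    return 1.0 / (service_rate - per_server)


def profit_per_second(arrival_rate, service_rate, servers, revenue_per_request,
                      sla_seconds, penalty_per_request, cost_per_server):
    """Business metric: revenue minus SLA penalties minus server cost."""
    rt = response_time(arrival_rate, service_rate, servers)
    per_request = revenue_per_request
    if rt > sla_seconds:
        per_request -= penalty_per_request
    return arrival_rate * per_request - servers * cost_per_server


def best_server_count(max_servers, **params):
    """Management decision: the server count that maximizes profit."""
    return max(range(1, max_servers + 1),
               key=lambda n: profit_per_second(servers=n, **params))


# Hypothetical workload and prices.
params = dict(arrival_rate=90.0, service_rate=50.0, revenue_per_request=0.01,
              sla_seconds=0.1, penalty_per_request=0.008, cost_per_server=0.2)
best = best_server_count(max_servers=6, **params)
```

With these made-up numbers, one or two servers miss the response-time target and lose money, three servers meet it, and each further server only adds cost, so `best` is 3. An autonomic manager would re-evaluate such a decision at run time as the measured arrival rate changes.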
This project would be based at NICTA's ATP site. Involvement will require agreeing to
appropriate conditions regarding Intellectual Property. It would be suitable for Research Track (18crpts)
or Honours; a cut-down version would be suitable as a research-linked (12crpts or TSP) project.
This project requires considerable skill in Java programming and understanding of distributed
computing and networks, as well as maturity in analysis and evaluation,
but it does not require prior experience with cloud computing. A background in
information systems or business topics would be useful.
Joint research
I am also involved in research on astroinformatics (with Dr Tara Murphy) and
on static analysis for approximate data in
sensor networks (with Dr Bernhard Scholz). Please consult the project lists of these colleagues.