Database Sampling and Applications

Speaker: Associate Professor Gautam Das
Computer Science and Engineering Department, The University of Texas at Arlington, USA

Time: Thursday 21 August 2008, 4-5pm (note different day)

Location: The University of Sydney, School of IT Building, Lecture Theatre (Room 123), Level 1


In recent years, advances in data collection technologies and increasingly affordable secondary storage have led to a proliferation of very large databases. These arise in diverse applications, including commercial data warehouses (such as the sales data of large retailers), scientific databases (such as satellite imagery and biological genome databases), data collected from sensor networks, user click-through data collected at large websites, and other domains. However, while collecting massive data sets has become relatively straightforward, effective data analysis has proven more difficult to achieve, primarily because query processing over such large databases is prohibitively expensive. Recent efforts to overcome this problem include sampling-based approaches, in which a small sample of the data is acquired for further analysis, at the cost of small inaccuracies in the answers. The seminar will touch upon emerging research directions in database sampling, such as sampling for aggregation queries; sampling over disk-based systems, data streams, sensor networks, and distributed P2P systems; and query processing over imprecise data. It will introduce database researchers to important data modeling, statistical, and approximation techniques that are expected to find increasing use in database and data mining research and applications.

Gautam Das is an Associate Professor in the Computer Science and Engineering Department of the University of Texas at Arlington. Prior to joining UTA, Dr. Das held positions at Microsoft Research, Compaq Corporation and the University of Memphis. He graduated with a BTech in computer science from IIT Kanpur, India, and with a PhD in computer science from the University of Wisconsin-Madison.

Dr. Das's research interests span data mining, information retrieval, databases, approximate query processing, applied graph and network algorithms, and computational geometry. His research has resulted in over 75 papers, many of which have appeared in premier conferences and journals such as SIGMOD, VLDB, ICDE, KDD, TODS, and TKDE. Dr. Das has served as the Program Chair of CIT 2004, as well as on the program committees of premier conferences such as SIGMOD, PODS, ICDE, KDD, and ICML. His research has been supported by grants from NSF, ONR, Cadence, Apollo, and Microsoft.