Cloud Computing, Grid Computing, Green Computing

1. Scheduling and Load Balancing in Large Scale Distributed Computing Environments

Large scale distributed systems (e.g. Grid Computing, Cloud Computing) are quite prevalent today. These systems provide high performance capabilities to a wide range of applications, which normally have different, and sometimes conflicting, requirements. This necessitates the development of more flexible scheduling techniques. Another factor that is detrimental to the performance of these systems is the dynamic nature of their heterogeneous resources, which are, most of the time, located in disparate locations. In addition, the fact that resources (e.g. computational, storage) are available some of the time does not mean that they will be available all of the time. Such conditions add further complexity to the design of schedulers for these systems, and suggest the need for suites of schedulers that can be used in different operating scenarios. This project deals with the study and development of a variety of scheduling scenarios and algorithms that can help achieve the ultimate goal of furthering our understanding of scheduling in large scale distributed systems.

2. Quality of Service in Distributed Computing Systems

There is a need to develop a comprehensive framework to determine what QoS means in the context of distributed systems and the services that will be provided through such infrastructure. What complicates matters is the fact that distributed systems will provide a whole range of services, not only high performance computing. There is a great need for the development of different QoS metrics for distributed systems that can capture this complexity and provide meaningful measures for a wide range of applications. This will possibly mean that new classes of algorithms and simulation models need to be developed. These should be able to characterize the variety of workloads and applications, so that the behaviour of distributed computing systems under different operating conditions can be better understood.

3. Healing and Self-Repair in Large Scale Distributed Computing Systems

As the complexity of distributed systems increases over time, there will be a need to endow such systems with capabilities that allow them to keep operating in disaster scenarios. What makes this problem very complex is the heterogeneous nature of today's distributed computing environments, which could be made up of hundreds or thousands of components (computers, databases, etc.). In addition, a user in one location might not have control over other parts of the system. There is therefore a need for "smart" algorithms (protocols) that can achieve an acceptable level of fault tolerance and account for a variety of disaster recovery scenarios.

4. Application Isolation Techniques in Cloud Computing Platforms

The cloud computing model allows people to use CPU, storage and even network bandwidth from remote resource providers. These resource providers often host many third-party applications on tens of thousands of machines in their data centres. Because many third-party applications share physical CPUs, storage and networks, isolating these applications from one another becomes an important issue. The project will review the technologies, such as virtual machines, used by existing "cloud computing" infrastructure providers to achieve application-level isolation, and examine the effectiveness of these technologies. It will also investigate how to make a "cloud computing" platform trustworthy.

5. Accountability in Distributed Systems for Bioinformatics Data Management

As a growing number of scientific activities involve computers, there is an increasing need to make important activities accountable in computer systems. For example, people may be interested in how a conclusion was drawn, what data support a claim and what tools were used to process the data. Existing efforts related to this problem include data provenance management in the database area and scientific workflow management in the Grid computing area. However, these efforts do not explicitly address the accountability problem. The project intends to produce mechanisms for managing distributed workflows and data in order to attest complex bioinformatics computing processes.

6. Application-Specific Service Level Agreement and Energy-Efficiency Improvement in Cloud Computing Platforms

Cloud computing environments are gaining popularity as the de facto platforms for many applications. These systems bring together a range of heterogeneous resources that should be able to function continuously and autonomously. However, these systems consume a great deal of energy. This project therefore aims to develop new algorithms and tools for energy-aware resource management and allocation in large-scale distributed systems, enabling these systems to become environmentally friendly. The proposed framework will be 'holistic' in nature, seamlessly integrating a set of both site-level and system-level/service-level energy-aware resource allocation schemes that address a range of complex scenarios and different operating conditions.

Last changed: October 10, 2009