Project offerings for Semester 1, 2017

Click on the supervisor's name for a description of the projects they are offering.

Projects will be added over the coming weeks.


Supervisors and the projects they are offering (credit points in parentheses):

Wei Bao
  • Resource scheduling in big data analytics system (12 or 18cp)
  • Internet of Things boosted Smart Post-Sales (18cp)

Tom Cai
  • Content Analysis for Large-scale Biomedical Imaging Data (12 or 18cp)
  • Machine Learning based Image Pattern Classification (12 or 18cp)
  • Mapping the Brain White Matter Pathways with Neuroimaging Data (12 or 18cp)
  • Neuroimaging Computing in Automated Detection of the Longitudinal Brain Changes (12 or 18cp)
  • Curvilinear Network Reconstruction from 2D / 3D Images (18cp)
  • Context Modeling for Medical Image Retrieval (18cp)
  • Neuroimaging Computing for Early Detection of Dementia (18cp)

Joseph Davis
  • Detecting Citation Cartels and Rings in Citation Networks (12cp)

Phillip Gough
  • Statistical Uncertainty and Chronic Disease Maps: How to best communicate the risk of cancer using spatial visualisations (12cp)

Joachim Gudmundsson
  • Evaluating team-sports performance using network measurements (12 or 18cp)

Ralph Holz
  • Honeypots for blockchains (12 or 18cp)
  • Scanning Robots (12 or 18cp)
  • Analysing the traffic of mobile messengers (12 or 18cp)
  • Passive monitoring of blockchain protocols (12 or 18cp)

Seokhee Hong
  • Scalable Visual Analytics (12 or 18cp)
  • Visualisation and Analysis of Massive Complex Social Networks and Biological Networks (12 or 18cp)
  • Navigation and Interaction Techniques for 2.5D Network Visualisation (12 or 18cp)

Bryn Jeffries
  • Data science in Alertness and Fatigue (12 or 18cp)

Judy Kay
  • Evaluating the Actual and Perceived Exertion Provided by Virtual Reality Games (12cp)

David Lowe
  • Lab augmentation 2: Development of a laboratory augmentation prototype that demonstrates the feasibility of using current mobile phones to support augmentation of standard laboratory experiments (12 or 18cp)
  • MOOLS: Massive Open Online Labs (12 or 18cp)
  • Using virtual reality augmentation to support simultaneous use of physical equipment (12 or 18cp)
  • Enhancing laboratory learning through scripted guidance using Smart Sparrow (12 or 18cp)

Josiah Poon
  • Semantic search criteria (12 or 18cp)
  • Detect & extract numbers from paragraphs (12 or 18cp)
  • Finding relation of symptoms and drugs (12 or 18cp)
  • Information extraction from Discharge Summary (12cp)
  • Finding the relationship between TCM & WM on stroke (12cp)

Simon Poon
  • Data visualisation: examining the healthiness of the food supply and the global burden of nutrition-related disease (with George Institute's Food Policy Division) (18cp)

Uwe Roehm
  • Web Database Vulnerability Analysis & Improvement (12cp)
  • Web Content Mining Database (12cp)

Bernhard Scholz
  • Matching Service (with RateSetter Australia) (12cp)
  • An Integrated Development Environment for the Cloud (12cp)
  • Performance Benchmark Suite for Logic Oriented Programs (12cp)
  • Python Language Bindings for Soufflé (12cp)
  • Debugger for Soufflé (12cp)
  • Smart Contracts: Blockchain Security (12cp)
  • Making Security User Interfaces of Internet Browsers Securer (Google Industry Project) (12cp)

Xiuying Wang
  • Learning based approach for medical image segmentation (12 or 18cp)
  • Target object segmentation and understanding via affinity estimation (12 or 18cp)
  • Deep learning based method for lesion detection and classification from biomedical image (12 or 18cp)

Zhiyong Wang
  • Multimedia Data Summarization (12 or 18cp)
  • Human Motion Analysis, Modeling, Animation, and Synthesis (12 or 18cp)
  • Predictive Analytics of Big Time Series Data (12 or 18cp)
  • Remote-sensing Image Analysis (12 or 18cp)
  • Audio Data Analysis (12 or 18cp)

Bing Zhou
  • Parallel computing in large-scale fMRI data with high-dimensions (18cp)
  • Transcription factor network visualisation based on chromatin states (18cp)

Ying Zhou
  • Visualizing Wikipedia Co-authorship Network (12 or 18cp)
  • Large Scale Graph Algorithms on GraphX and Giraph (18cp)

Albert Zomaya and MohamadReza Hoseiny
  • Designing a QoS-Aware Controller for Dynamic Scheduling in processing of cloud-based data streaming platform (12cp)
  • MPC-based model to Trade-off Utilization and Quality of Service in large-scale cloud-based Lambda platforms (12cp)

Albert Zomaya and Wei Li
  • The Publication Ranking and Recommendation Web Portal (12cp)
  • Implementation and performance evaluation of Internet of Things applications on Raspberry PI (12cp)
  • Distributed Big Data-Streams Processing for Internet of Things (18cp)
  • Task Scheduling Algorithms for the Internet of Things (18cp)

Projects supervised by Wei Bao

Resource scheduling in big data analytics system (12 or 18cp)
Big data analytics have become a necessity for businesses all over the world. Data centers in the cloud host huge volumes of data and process them using thousands of machines. Companies may simply rent cloud platforms (without the need to build their own computation infrastructure) to run data analytics, gaining new insights into their customers and exploring new market opportunities.

To improve the performance of a data analytics cluster, the data analytics system should account for heterogeneity of the environment and the workloads. First, data analytics workloads demand different types of resources. Workloads may be CPU, I/O, memory, or storage intensive, and some might also require high network bandwidth. Second, the computing environment is heterogeneous. The cluster may consist of multiple types of servers with different capacities and performance; some machines are more suitable for storing large data, whereas others run computations faster. Therefore, we must wisely schedule data analytics jobs in this complicated environment to optimize performance (e.g., delay, cost). Each job should receive a "fair" share of resources to make progress while providing good performance.

The student is expected to implement cutting-edge multi-resource scheduling algorithms, and to examine and compare their real-world performance. The student is also expected to design a new algorithm to achieve even better performance.
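
As a flavour of the algorithms involved, below is a minimal Python sketch of dominant resource fairness (DRF), a well-known multi-resource scheduling policy; the cluster capacities and job demands are hypothetical, and a real project would evaluate published schedulers on a real cluster.

    # Minimal sketch of dominant resource fairness (DRF) scheduling.
    # Capacities and per-task demands below are hypothetical.
    capacity = {"cpu": 64, "memory": 256}
    demands = {
        "job_a": {"cpu": 1, "memory": 4},   # memory-intensive tasks
        "job_b": {"cpu": 3, "memory": 1},   # CPU-intensive tasks
    }
    allocated = {job: 0 for job in demands}     # tasks granted per job
    used = {r: 0 for r in capacity}

    def dominant_share(job):
        """Largest fraction of any one resource the job currently holds."""
        return max(allocated[job] * demands[job][r] / capacity[r]
                   for r in capacity)

    while True:
        # DRF rule: offer the next task slot to the job with the
        # smallest dominant share.
        job = min(demands, key=dominant_share)
        need = demands[job]
        if any(used[r] + need[r] > capacity[r] for r in capacity):
            break                               # cluster is full
        for r in capacity:
            used[r] += need[r]
        allocated[job] += 1

    print(allocated, used)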

Requirements: Good programming skills in languages such as Python, Matlab, or Java. A strong mathematical background is a plus.

Internet of Things boosted Smart Post-Sales (18cp)
In many applications of the Internet of Things (IoT), there is a need to transmit the data collected by the smart things to the cloud. The emerging Smart Post-Sales (SPS) is one such application: manufacturers have a strong interest in collecting data about their products for analysis.

One obstacle, however, is how to transmit the data from the diverse things to the cloud using different wireless technologies, such as WiFi, ZigBee, Bluetooth, LoRa, and LTE. Different choices have different pros and cons. For example, one may develop a WiFi network, but WiFi needs additional infrastructure: the vendor must deploy its own WiFi access points. Another choice is transmission through LTE cellular networks, yet cellular channels are expensive. Therefore, we must wisely design a new architecture that can take advantage of different technologies, to support SPS in a cost-effective way.

The student is expected to build a new SPS system by combining different IoT modules (e.g., sensing and data collection) and communication modules (e.g., WiFi, LoRa, and LTE). Experiments will be conducted to examine the performance of the designed system.

Requirements: Good programming skills in languages such as Python, Matlab, or Java. A strong mathematical background is a plus.

Project supervised by Weidong (Tom) Cai

Content Analysis for Large-scale Biomedical Imaging Data (12 or 18cp)
Great advances in biological tissue labeling and automated microscopic imaging have revolutionized how biologists visualize molecular, sub-cellular, cellular, and super-cellular structures and study their respective functions. How to interpret such image datasets in a quantitative and automatic way has become a major challenge in current computational biology. The essential methods of bioimage informatics involve generation, visualization, analysis and management. This project aims to develop automatic or semi-automatic approaches for content analysis in microscopic images, such as detection of certain cell structures, and tracing of cell changes over time. The studies will focus on computer vision algorithms and interactive framework development.

Preferred Programming Language: Matlab

Machine Learning based Image Pattern Classification (12 or 18cp)
Image pattern classification has a wide variety of applications, such as object detection and scene classification. The classification performance is largely dependent on the descriptiveness and discriminativeness of feature representation. Consequently, how to best model the complex visual features is crucial. Currently many different ways of image feature extraction have been proposed in the literature, yet their performance is still unsatisfactory and feature extraction remains a hot topic in computer vision. This project aims to study the various techniques of contextual feature representation and evaluate their effectiveness for different image applications.

Preferred Programming Language: Matlab

Mapping the Brain White Matter Pathways with Neuroimaging Data (12 or 18cp)
Advances in MR imaging sequences enable the modeling of brain white matter pathways in vivo, an important technique for advancing our understanding of the human brain under normal and pathological conditions. Current studies on brain white matter focus mainly on voxel-level measures, but ignore morphological information, such as the curvature and torsion of individual tracts, and the dispersion of tract bundles. This project aims to develop novel computational methods to reconstruct the major white matter pathways and to investigate the morphology of the white matter tracts and their association with the connected cortex. These methods would advance our knowledge of brain structure, with various potential translational applications.

Preferred Programming Language: Python / Matlab

Neuroimaging Computing in Automated Detection of the Longitudinal Brain Changes (12 or 18cp)
Neuroimaging technologies, such as MRI, have transformed the way we study the brain under normal or pathological conditions. As imaging facilities become increasingly accessible, more and more imaging data are collected from patients with chronic disorders in longitudinal settings. Such big neuroimaging data enables new possibilities to study the brain with high translational impact, such as early detection of longitudinal changes in the brain, and large-scale evaluation of imaging-based biomarkers. This project aims to develop novel computational methods to automatically detect longitudinal changes in the brain from large-scale longitudinal neuroimaging data using machine-learning and deep-learning techniques.

Preferred Programming Language: Python / Matlab

Curvilinear Network Reconstruction from 2D / 3D Images (18cp)
Networks of curvilinear structures are pervasive both in nature and in man-made systems. They appear at all possible scales, ranging from nanometers in Electron Microscopy image stacks of neurons to petameters in dark matter arbors binding massive galaxy clusters. Modern imaging techniques are capable of acquiring a vast amount of image-based data containing such network structures. The network structures can be extracted from the images and used for further topology analysis. However, in spite of many years of sustained effort, fully automated network reconstruction remains elusive, especially when the images are noisy and the linear structures exhibit a complex morphology. This project aims to develop fully automated methods to extract curvilinear network structures from 2D / 3D images with machine learning and image processing algorithms. The resulting methods are expected to be used on biological, medical and satellite/aerial images.

Context Modeling for Medical Image Retrieval (18cp)
Content-based medical image retrieval is a valuable mechanism to assist patient diagnosis. Different from text-based search engines, similarity of images is evaluated based on comparison between visual features. Consequently, how to best encode the complex visual features in a comparable mathematic form is crucial. Different from the image retrieval techniques proposed for general imaging, in the medical domain, disease-specific contexts need to be modeled as the retrieval target. This project aims to study the various techniques of visual feature extraction and context modeling in medical imaging, and to develop new methodologies for content-based image retrieval of various medical applications.

Neuroimaging Computing for Early Detection of Dementia (18cp)
Dementia is one of the leading causes of disability in Australia, and the socioeconomic burden of dementia will be aggravated over the forthcoming decades as people live longer. So far, there is no cure for dementia, and current medical interventions may only halt or slow down the progression of the disease. Therefore, early detection of dementia symptoms is the most important step in the management of the disease. Multi-modal neuroimaging has been increasingly used in the evaluation of patients with early dementia in the research setting, and shows great potential in mental health and clinical applications. The objective of this project is to design and develop novel neuroimaging computing models and methods to investigate patterns of dementia pathology, with a focus on early detection of the disease.

Project supervised by Joseph Davis

Detecting Citation Cartels and Rings in Citation Networks (12cp)
Project description: It is widely known that one of the negative consequences of exclusive reliance on metrics for performance assessment of researchers is the increasing prevalence of excessive self-citations, citation cartels by small groups of co-authors and other researchers, h-index manipulation, and citation stacking by some journals to increase the journal impact factor. While there have been several reports of journal citation stacking, citation rings have tended to be more elusive. This project involves reviewing existing and developing new social network analysis (SNA) approaches for systematically identifying anomalous patterns in citation networks.
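
As a toy illustration of the SNA angle (not the project's prescribed method), reciprocated citations can be isolated and dense cliques among them flagged; the edge list below is made up.

    import networkx as nx

    # Hypothetical directed citation network: edge (a, b) means a cites b.
    G = nx.DiGraph([("p1", "p2"), ("p2", "p1"), ("p1", "p3"), ("p3", "p1"),
                    ("p2", "p3"), ("p3", "p2"), ("p4", "p1")])

    # Keep only reciprocated citations, a crude signature of mutual citing.
    mutual = nx.Graph((u, v) for u, v in G.edges if G.has_edge(v, u))

    # Dense cliques in the reciprocal graph are candidate citation rings
    # (networkx's find_cliques implements the Bron-Kerbosch algorithm).
    rings = [c for c in nx.find_cliques(mutual) if len(c) >= 3]
    print(rings)   # [['p1', 'p2', 'p3']]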

Requirements: Good data analysis, algorithmic and programming background. Experience working with large social network data is desirable.

Project supervised by Phillip Gough

Statistical Uncertainty and Chronic Disease Maps: How to best communicate the risk of cancer using spatial visualisations (12cp)

This project will investigate the way that end-users understand and interpret statistical uncertainty in chronic disease maps. Data previously collected from two sources, an online game and focus groups, will be used to inform a design guide for chronic disease maps. The role of the student will be to assist in processing the data from the game and focus groups. These sources will inform user personas, supporting a human-centred design process for a spatial visualisation of relative cancer risk. The project will contribute to a design guide for interactive, online chronic disease maps.

Special Requirements: Strong skills in HCI and coding, preferably JavaScript.

Contact: If you are interested in this project, please send your expression of interest and your CV to: Evelyn.Riegler@sydney.edu.au

Project supervised by Joachim Gudmundsson

Evaluating team-sports performance using network measurements (12 or 18cp)

Understanding the interaction between football players is one of the more important and complex problems in sports science. Player interaction can give insight into a team's playing style, or be used to assess the importance of individual players to the team, so capturing the interactions between individuals is a central goal. In the last decade numerous papers have appeared that apply social network analysis to team sports. Two types of networks have dominated the research literature to date: passing networks and transition networks, with passing networks the most frequently studied. Many properties of these networks have been studied, among them centrality, density, heterogeneity, entropy, and Nash equilibria. The aim of the project is to implement and experimentally study the above measurements, and finally to evaluate how well they measure the performance of a player.
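
For instance, a passing network is just a weighted directed graph, and several of the measures above are one call away in a library such as networkx (the pass counts here are invented):

    import networkx as nx

    # Hypothetical passing network: edge weight = completed passes.
    G = nx.DiGraph()
    G.add_weighted_edges_from([("Alice", "Bob", 12), ("Bob", "Carol", 7),
                               ("Carol", "Alice", 9), ("Bob", "Alice", 5)])

    print("density:", nx.density(G))
    # Betweenness treats weights as distances, so a serious study would
    # first invert pass counts; this is only to show the mechanics.
    print(nx.betweenness_centrality(G, weight="weight"))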

Good programming skills are required.

Projects supervised by Ralph Holz

Honeypots for blockchains (12 or 18cp)
Blockchains have become popular platforms for trading assets. They are best-known for their two main representatives, Bitcoin and Ethereum.

Blockchains are operated by tens of thousands of clients, akin to peer-to-peer networks. This makes the clients attractive targets for criminals, and attacks, intrusion, and theft have indeed already been reported.

In this project, we are going to build a honeypot for a popular blockchain. This is software that imitates the behaviour of a blockchain client and is seemingly open to attack; however, its true purpose is to log the activities of an attacker trying to compromise the "blockchain client". Such data gives deep insights into attacker behaviour and motivation.

This project is suitable for several students as we can develop honeypots for different blockchains that might share parts of the code.
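
A bare-bones starting point, assuming nothing about the eventual design: a listener on Bitcoin's default port (8333) that records who connects and what they send. A real honeypot would additionally speak enough of the peer-to-peer protocol to keep an attacker engaged.

    import datetime, socket

    # Sketch: accept connections on Bitcoin's default port and log activity.
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind(("0.0.0.0", 8333))
    srv.listen(5)
    while True:
        conn, addr = srv.accept()
        data = conn.recv(4096)               # first bytes the peer sends
        with open("honeypot.log", "a") as log:
            log.write(f"{datetime.datetime.utcnow()} {addr} {data!r}\n")
        conn.close()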

Scanning robots (12 or 18cp)
Websites may maintain a file called robots.txt on their web servers that tells crawlers such as Google, Bing, and Baidu not to index portions of their sites. By convention, crawlers follow this advice.

Interestingly, robots.txt reveals information about the operation of the website itself. It can tell us to what degree a website is dynamic, or which content the website operator considers private or non-indexable, for whatever reason.

In this project, we are going to extend an Internet scanner to detect robots.txt on a webserver. We download the files and analyse them to identify prevalence, functionality, misconfigurations, and possible security holes.
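
The parsing side is straightforward; Python's standard library already understands the format, and the raw file is easy to dissect for the analysis described above (example.com stands in for a scanned host):

    import urllib.request, urllib.robotparser

    # Ask the standard-library parser whether a path may be crawled.
    rp = urllib.robotparser.RobotFileParser()
    rp.set_url("https://example.com/robots.txt")
    rp.read()
    print(rp.can_fetch("*", "https://example.com/private/page"))

    # For our analysis we also want the raw Disallow rules themselves.
    raw = urllib.request.urlopen("https://example.com/robots.txt").read()
    disallowed = [line.split(":", 1)[1].strip()
                  for line in raw.decode(errors="replace").splitlines()
                  if line.lower().startswith("disallow")]
    print(disallowed)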

Analysing the traffic of mobile messengers (12 or 18cp)
Mobile messengers first replaced text messages (SMS), and more recently they have become serious competition for established social networks such as Facebook. In this work, we are going to analyse the network traffic that mobile messengers produce in the network of the University of Sydney.

We will develop a protocol dissector for the Bro Intrusion Detection System. This system is used to analyse traffic passing through the network of the University of Sydney. We run the dissector to obtain data about the prevalence of mobile messaging, its usage patterns (time, frequency), and interaction with remote servers.

Note: the privacy of university members is not infringed in this project as we do not identify network users and measurement data is not released. We hold ethical clearance for the operation of Bro.

Passive monitoring of blockchain protocols (12 or 18cp)
Blockchains have become popular platforms for trading assets. They are best-known for their two main representatives, Bitcoin and Ethereum.

Blockchains are operated by tens of thousands of clients, akin to peer-to-peer networks. It is unknown, however, how these networks form precisely and what blockchain activity occurs in the network of the University of Sydney.

In this project, we are going to make use of our network monitor that we operate on the university's link to the Internet. We are going to develop protocol dissectors for the Bro network monitor. We will then monitor traffic over several weeks to obtain a better understanding of blockchain activity in our network and how it relates to the global blockchains.

This project can be taken by more than one student as there are several blockchains to explore.

Projects supervised by Seokhee Hong

Scalable Visual Analytics (12 or 18cp)
Technological advances such as sensors have increased data volumes in the last few years, and now we are experiencing a “data deluge” in which data is produced much faster than it can be used by humans.
Further, these huge and complex data sets have grown in importance due to factors such as international terrorism, the success of genomics, increasingly complex software systems, and widespread fraud on stock markets.

We aim to develop new visual representation, visualization and interaction methods for humans to find patterns in huge abstract data sets, especially network data sets.

These data sets include social networks, telephone call networks, biological networks, physical computer networks, stock buy-sell networks, and transport networks.

These new visualization and interaction methods are in high demand by industry.

Visualisation and Analysis of Massive Complex Social Networks and Biological Networks (12 or 18cp)
Recent technological advances have led to many massive complex network models in many domains, including social networks, biological networks, webgraphs and software engineering.

Visualization can be an effective analysis tool for such networks. Good visualisation reveals the hidden structure of the networks and amplifies human understanding, thus leading to new insights, new findings and possible prediction of the future.

However, visualisation of such massive complex networks is very challenging due to the scalability and the visual complexity.

This project addresses the challenging issues for visualisation and analysis of massive complex networks by designing and evaluating new efficient and effective algorithms for massive complex social networks and biological networks.

In particular, the integration of good analysis methods with good visualisation methods will be the key approach to solving the research challenge.

Navigation and Interaction Techniques for 2.5D Network Visualisation (12 or 18cp)
Recent technological advances have led to many large and complex network models in many domains, including social networks, biological networks, webgraphs and software engineering.

Visualization can be an effective analysis tool for such networks; good visualisation may reveal the hidden structure of the networks and amplify human understanding, leading to new insights, new findings and possible prediction of the future.

However, visualisation itself cannot serve as an effective and efficient analysis tool for large and complex networks, if it is not equipped with suitable interaction and navigation methods.

Well designed and easy-to-use navigation and interaction techniques can enable users to communicate with the visualization much faster and more effectively to perform various analysis tasks such as finding patterns, trends and unexpected events.

Recently, 2.5D graph visualization methods have been successfully applied for visualization of large and complex networks, arising from biological networks, social networks and internet networks.
However, the corresponding navigation methods have not yet been investigated.

This project aims to design, implement and evaluate new methods for navigating 2.5D layouts of large and complex networks, enabling users to perform analytical tasks.

Project supervised by Judy Kay

Evaluating the Actual and Perceived Exertion Provided by Virtual Reality Games (12cp)
Supervisors: Prof J Kay and Ms S Yoo (PhD Student)
Virtual Reality games have shown potential in making exercise fun and engaging. Many virtual reality games for head mounted displays like the HTC Vive offer engaging experiences where interaction requires the player to physically move around, performing arm and body movements, and these can actually provide quite good levels of exercise.
In this project, the student will work with the research team to refine a set of criteria to evaluate the actual and perceived exertion provided by a set of carefully chosen existing virtual reality games. This will involve gaining familiarity with the literature and broader publications on exergames in virtual reality, refining the set of criteria to evaluate them and then running evaluations. It may also optionally involve work with open source games, modifying them to support the goals of the research by improving the logging of activity to match heart-rate measures and integrating user feedback. The work involves studies with people who play the games. The goal is to assess perceived exertion, and to analyse data from heart rate sensors as a measure of actual exertion. This project is well suited to either an individual student or a small group, and will be customised to match the skills of the student, with varying levels of programming, data analysis and studies.

Essential skills:

  • Excellent problem solving skills.
  • Programming knowledge to support analysis of data – Python or Java preferred.

Desirable skills:

  • Experience with the Unity Game Engine.
  • Experience with VR.
  • Experience in user studies.

Developmental outcomes for student:

  • Exposure to high-impact research, scientific expertise in multiple disciplines, and scientific infrastructure.
  • Experience designing and developing virtual reality apps (optional).
  • Experience in studies of people using virtual reality exergames.
  • Improved skills in scientific communication (e.g. writing scientific reports and oral presentations).

Project supervised by Bryn Jeffries

Data science in Alertness and Fatigue (12 or 18cp)
How can we measure alertness? How do our bodies change as we stay awake for extended periods? Can we predict when we will be unsafe to work or drive? Data to explore these questions and more has been collected from several recent studies performed by the Alertness CRC, but many questions are yet to be answered, and require advanced analytical techniques. In this project you will collaborate with researchers in Psychology to explore data from new angles. This is well suited to students with good skills in Data Science, Data Mining or Database Systems.

Projects supervised by David Lowe

Lab augmentation 2 : Development of a laboratory augmentation prototype that demonstrates the feasibility of using current mobile phones to support augmentation of standard laboratory experiments (12 or 18cp)
The outcome will be a prototype phone app that allows students to point their phone camera at a set of laboratory apparatus and have it supplement the display with additional information related to the apparatus. (Extension) The augmented information varies depending on additional information retrieved live from an external source (nominally connected to the equipment), so that the information represents the current state of the apparatus.

MOOLS: Massive Open Online Labs (12 or 18cp)
This project involves investigation of strategies that allow multiple users to share control of a single item of physical laboratory equipment, with the objective of allowing each user to feel as though they are an active participant in the resultant behaviour of the equipment. The core outcome will be development of the software interface for an online heat transfer experiment that allows gamified shared control of a set of laboratory equipment.

Using virtual reality augmentation to support simultaneous use of physical equipment (12 or 18cp)
This project will adapt concepts from earlier work on the use of augmentation of an experimental environment to allow multiple users to simultaneously undertake experimentation on the same item of laboratory equipment. The equipment will be designed so that each user has their own virtual software agent (manifested just to them using augmented reality) which reacts to the behaviour of the equipment. The outcome will be a simple prototype that demonstrates the feasibility of the approach.

Enhancing laboratory learning through scripted guidance using Smart Sparrow (12 or 18cp)
Investigation of the feasibility of using Smart Sparrow to provide adaptive guidance in carrying out a physical laboratory experiment. This will require consideration of the ways in which the Smart Sparrow adaptation engine can respond to events drawn from the real world (and in particular from the equipment under exploration). The outcome will be an implementation and evaluation of a proof-of-concept prototype and a set of recommendations regarding feasibility and possible design issues.

Projects supervised by Josiah Poon

Background for Project 1&2
Systematic review is a popular method in medical and health research. It is a rigorous method and demands the submission of a protocol before such a review proceeds. It superficially resembles a literature review in that relevant articles are searched for and downloaded. However, it is more rigorous in the inclusion and exclusion step. The selected articles are later aggregated and information is extracted from them for meta-analysis. The whole process is very time-consuming and can take years to complete.

Research Area: Text Mining (TM) & Data Analytic/Mining (DA)

Project 1 (TM): Semantic search criteria (12 or 18 cps)
One of these tedious steps is deciding whether an article should be included in the next step. Although a researcher can specify key terms to search for relevant articles, there are certain semantics that cannot be articulated in current search engines. Even so, a search engine does not understand the full paper and so cannot help make the decision. For example, in a lung cancer study, the following are inclusion rules that cannot be specified in an existing search:

  • Cell cycle arrest in G0 / G1 phase
  • P27 up-regulates after intervention
  • c-MYC down-regulates after intervention

The aim of this project is to apply text mining to this particular step to help reduce the workload of the researchers in systematic review.
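
To make the task concrete, here is a deliberately naive sketch that encodes rules like the ones above as regular expressions; the point of the project is precisely that real inclusion criteria need richer semantics than this.

    import re

    # Hypothetical inclusion rules encoded as regular expressions.
    rules = {
        "cell cycle arrest in G0/G1": r"cell\s+cycle\s+arrest.*G0\s*/?\s*G1",
        "P27 up-regulation":          r"p27.*up-?regulat",
        "c-MYC down-regulation":      r"c-?myc.*down-?regulat",
    }

    def matches_criteria(abstract):
        """Return the inclusion rules an abstract appears to satisfy."""
        return [name for name, pat in rules.items()
                if re.search(pat, abstract, re.IGNORECASE)]

    print(matches_criteria("We observed cell cycle arrest in G0/G1 phase; "
                           "p27 was up-regulated after intervention."))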

Project 2 (TM): Detect & extract numbers from paragraphs (12 or 18 cps)
Since aggregation of numbers is part of the activity in a systematic review, it is important to understand the semantics of numbers in a paper and to extract them where necessary. For example:

  • Total response rates of A and B groups were 70.0% and 78.9%.
  • The mean survival was 11 months in A group and 10 months in B group.
  • Six, twelve and eighteen months cumulative survival rates of A and B groups were 75.0%, 42.5%, 26.2% and 81.6%, 26.4%, 10.5%

An automated method not only saves researchers the time needed to find and transcribe these numbers for analysis; it also means that if a paper does not contain the required numbers, the paper can be ignored.
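
A first cut at the numeric patterns in the examples above can be made with regular expressions, as sketched below; attaching each number to the right group and outcome is the genuinely hard part the project tackles.

    import re

    sentence = "Total response rates of A and B groups were 70.0% and 78.9%."
    # Find percentage mentions.
    print([float(m) for m in re.findall(r"(\d+(?:\.\d+)?)\s*%", sentence)])

    # Spelled-out numbers ("Six, twelve and eighteen months") need a lookup.
    words = {"six": 6, "twelve": 12, "eighteen": 18}
    text = "Six, twelve and eighteen months cumulative survival rates"
    print([words[w.lower()] for w in re.findall(r"[A-Za-z]+", text)
           if w.lower() in words])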

Project 3 (DA): Finding relation of symptoms and drugs (12 or 18 cps)
This is a health-related data analytics project. We tend to make the naïve assumption that there exists only a one-to-one relationship between symptoms and drugs, i.e. we somehow hold the view that a drug is only able to address one symptom (target) displayed by a patient. This assumption is not true in reality. It is not uncommon to find multi-drug, multi-symptom relationships. The aim of this project is to compare different computational approaches to finding this kind of relationship, e.g. bipartite graphs, the Bron-Kerbosch algorithm and other network approaches.
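
The bipartite-graph framing can be prototyped quickly with networkx, whose find_cliques routine implements the Bron-Kerbosch algorithm mentioned above; the prescriptions below are invented.

    import networkx as nx
    from networkx.algorithms import bipartite

    # Hypothetical drug-symptom bipartite graph.
    B = nx.Graph([("aspirin", "headache"), ("aspirin", "fever"),
                  ("ibuprofen", "headache"), ("ibuprofen", "fever"),
                  ("statin", "high cholesterol")])
    drugs = {"aspirin", "ibuprofen", "statin"}

    # Project onto drugs: edge weight = number of shared symptoms.
    P = bipartite.weighted_projected_graph(B, drugs)
    print(list(P.edges(data=True)))

    # Maximal cliques in the projection (Bron-Kerbosch) hint at
    # multi-drug groups acting on overlapping symptom sets.
    print(list(nx.find_cliques(P)))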

Project 4 (TM): Information extraction from Discharge Summary (12 cps)
A discharge summary is a report prepared by a clinician when a patient leaves a hospital or completes a series of treatments. It details a patient's complaints, diagnostic findings, the prescribed therapies and the patient's response, and it contains recommendations on discharge. There is useful information that can contribute to data mining but, unfortunately, it is recorded in an unstructured manner. It would help a lot if this information could be extracted from these summaries. However, this is not an easy task: besides spelling errors, the summaries contain many special codes and abbreviations, and often do not follow grammatical rules. The aim of this project is to develop a tool to extract the desired information from a set of discharge summaries from the Stroke Department for data analysis.

Project 5 (TM): Finding the relationship between TCM & WM on stroke (12 cps)
The aim of this text-mining project is to compare the topology of stroke between traditional Chinese medicine (TCM) and western medicine (WM). This is achieved through the analysis of literature from these two fields. Deliverables include, but are not limited to, the identification of disease-treatment pairs, as well as the overlaps/differences between WM and TCM.

Projects supervised by Simon Poon

Data visualisation: examining the healthiness of the food supply and the global burden of nutrition-related disease (18cp)
(in collaboration with George Institute’s Food Policy Division)
The George Institute’s Food Policy Division works in Australia and internationally to reduce rates of death and disease caused by diets high in salt, saturated fat and sugar or excess energy, by undertaking research and advocating for a healthier food environment. The Group’s main focuses are food reformulation, monitoring changes in the food supply, and developing and testing innovative approaches to encourage consumers towards better food choices. A key example of innovation and collaboration is FoodSwitch, an award winning smartphone app that helps consumers to make healthier food choices.

The FoodSwitch database contains around 80,000 food and beverage products assigned to more than 600 food categories. The database contains a large amount of information about each food and beverage item, with up to 100 data points per individual product. Currently the data have a front-end “CMS” (Content Management System) where the values can be viewed by the user. Information for each product such as nutrient information, brand name, manufacturer name, ingredient lists, allergen information, serving sizes and many other pieces of data will be utilised in this project to better visualise what our food supply looks like and where policymakers should target improvements.

This project has been established to explore innovative approaches to viewing information about the healthiness of the food supply that can then be used by consumers, researchers and policy makers alike.
Relevant skills: Web programming, Data Visualisation using D3, business intelligence and Dashboard Development.

Projects supervised by Uwe Roehm

Web Database Vulnerability Analysis & Improvement (12cp)
Web security is crucial nowadays. The goal of this project is to conduct a security analysis of a web database application, and to improve the implementation such that the identified security weaknesses are fixed. The web database in question is hosted within the School of IT, written in PHP and Javascript, and runs on top of a MySQL database. This project will consist of two phases: In the first phase, the student will conduct a 'white-box' vulnerability analysis of the existing system with regard to code implementation, design issues, known security threats, and data privacy requirements. This will include inspecting the existing code base and system architecture, as well as an analysis of the system design against known vulnerabilities. In the second phase, the student shall fix any high-priority vulnerabilities found during the analysis, and implement a logging component which shall keep track of data changes during runtime.
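
To illustrate the class of fix the second phase targets, here is the parameterized-query pattern in Python with SQLite (the actual system is PHP on MySQL, where PDO prepared statements play the same role):

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, role TEXT)")
    conn.execute("INSERT INTO users VALUES ('alice', 'admin')")

    user_input = "alice' OR '1'='1"      # classic injection attempt

    # Vulnerable pattern (string concatenation) would let the input
    # rewrite the query. The safe pattern binds it as pure data:
    rows = conn.execute("SELECT role FROM users WHERE name = ?",
                        (user_input,))
    print(rows.fetchall())               # [] - the injection finds nothing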

Skills needed: PHP and Javascript, good knowledge in web technologies and SQL databases.

Suitable for the following majors in the MIT: Databases, Software Engineering, IT Security

Web Content Mining Database (12cp)
The content of most websites changes constantly - this is in particular true for forums or news websites.

The goal of this project is to implement an automated web content mining system that can follow and analyse the changes of a given website over time.

The intended system consists of two parts: The first part is a website tracker that periodically captures the content of a given website and stores the web content in a temporal text database. The student shall compare different open source solutions and if possible adapt one of them for this project.

The second part is to perform proof-of-concept website monitoring with some simple explorative analysis of the captured content, such as: When is a site most active in terms of updates? Which topics are most popular? How can authors be classified by the articles they are writing?
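
The tracker half of the system can be sketched in a few lines: poll the page, hash the content, and record a snapshot whenever the hash changes. A real implementation would write to a temporal text database rather than a dict.

    import hashlib, time, urllib.request

    snapshots = {}   # url -> (content hash, html); stand-in for a database

    def poll(url):
        html = urllib.request.urlopen(url).read()
        digest = hashlib.sha256(html).hexdigest()
        if snapshots.get(url, (None,))[0] != digest:
            snapshots[url] = (digest, html)
            print(time.ctime(), "change detected at", url)

    while True:
        poll("https://example.com/news")   # placeholder target site
        time.sleep(3600)                   # capture hourly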

Skills needed: Python, good knowledge in web technologies and SQL databases.

Suitable for the following majors: Databases, Software Engineering

Projects supervised by Bernhard Scholz

Matching Service (12cp)
(in collaboration with RateSetter Australia)
RateSetter has borrowers and lenders that need to be matched together in a market place. If a match is made the lender will have loaned their money to the borrower. A borrower market order will contain a rate, amount and various other attributes such as borrowing term, purpose of loan, risk category. Lender market orders will contain a rate, amount and rules to target specific attributes of borrower orders.

In order for this market to be flexible, the rules defined on a lender order can be an arbitrarily complex predicate. An example being:
And(EqualTo(LoanPurpose, Business), Or(GreaterThan(Business.Age, 2), EqualTo(Business.Age, 2)))

The borrower order ‘profile’ can contain arbitrary information such as: { Name: {First: ‘John’, Last: ‘Smith’}, Purpose: ‘Car’, Business: {Age: ‘5’}}. The core of this problem is to match lender orders against borrower orders and vice versa based on these predicates.

It is trivial to evaluate lender predicates against borrower orders in a naïve way; however, when there are hundreds of thousands of existing lender orders and a new borrower order arrives, the algorithm and data structures must be capable of finding all matches in real time.
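
A naive evaluator for such predicate trees is easy to write, as the Python sketch below shows (the tuple encoding is made up for illustration); the project's real difficulty is indexing hundreds of thousands of predicates so that matching does not become an O(n) scan per incoming order.

    # Sketch: evaluate a lender predicate tree against a borrower profile.
    def evaluate(pred, profile):
        op, *args = pred
        if op == "And":
            return all(evaluate(p, profile) for p in args)
        if op == "Or":
            return any(evaluate(p, profile) for p in args)
        path, value = args
        field = profile
        for key in path.split("."):          # e.g. "Business.Age"
            field = field.get(key, {})
        if op == "EqualTo":
            return str(field) == str(value)
        if op == "GreaterThan":
            return float(field) > float(value)
        raise ValueError(op)

    borrower = {"Purpose": "Business", "Business": {"Age": "5"}}
    rule = ("Or", ("GreaterThan", "Business.Age", 2),
                  ("EqualTo", "Business.Age", 2))
    print(evaluate(("And", ("EqualTo", "Purpose", "Business"), rule),
                   borrower))               # True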

Deliverables:

  • Solution design covering algorithms / data structures
  • Implementation of design
  • Performance results

Industry Supervisor: Kym McGain

An Integrated Development Environment for the Cloud (12cp)
Integrated Development Environments (IDEs) are essential for modern software engineering. They accelerate the development cycle from the source code to the binary, and facilitate debugging. However, IDEs are normally executed on a personal desktop or laptop computer.

This project investigates how standard technologies can be fused into a simple IDE in the cloud. The standard technologies comprise a remote UNIX shell, a remote editor, and some visualisation techniques using HTML, Javascript, and web services.

Required Skills: web technologies, HTML, Javascript

Performance Benchmark Suite for Logic Oriented Programs (12cp)
The performance of logic oriented programs is not well understood. To obtain deep insight into the common runtime behaviour of logic programs, benchmarks are required to test various aspects of the runtime behaviour. To collate, archive and present benchmark results in a systematic way, a benchmark harness is required. The benchmark harness is the driver for the benchmark execution.

This project designs and implements a benchmark harness for logic-oriented programming engines. The harness should be written in a scripting language such as Python. The presentation of the benchmark results should be performed with web technologies such as HTML, and Javascript.
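
A minimal harness is little more than a timed loop around subprocess calls with archived results, as sketched below; the benchmark commands are placeholders, and presentation would be layered on top with HTML/Javascript.

    import json, subprocess, time

    # Placeholder benchmark commands for a logic-oriented engine.
    benchmarks = {"reachability": ["./reach", "graph.facts"],
                  "points-to":    ["./points_to", "prog.facts"]}

    results = {}
    for name, cmd in benchmarks.items():
        start = time.perf_counter()
        subprocess.run(cmd, check=True)          # run one benchmark
        results[name] = time.perf_counter() - start

    with open("results.json", "w") as f:
        json.dump(results, f, indent=2)          # input for an HTML report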

Required Skills: scripting language such as python

Python Language Bindings for Soufflé (12cp)
Soufflé is an open-source translator for a logic-oriented programming language. The translator compiles a declarative specification into parallel C++ code. Soufflé's applications include security specifications for SDNs, and checking security properties of large-scale Java programs. Soufflé can be fully integrated with other languages; it currently has a JNI and a C++ interface. To enhance the interoperability of Soufflé, we would like to have interfaces for Python.

In this project, we implement a language binding for Python. The challenges of this project will be to replicate relational data-structures in the Python language and implement high-performance interfaces such that information can be exchanged between the Python language and Soufflé efficiently.
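
One plausible route is a thin C shim around the generated C++ program, driven from Python via ctypes. None of the function names below exist in Soufflé today; designing such an interface is part of the project.

    import ctypes

    # Hypothetical C shim around a Soufflé-generated program.
    lib = ctypes.CDLL("./libsouffle_prog.so")
    lib.prog_new.restype = ctypes.c_void_p
    lib.prog_insert.argtypes = [ctypes.c_void_p, ctypes.c_char_p,
                                ctypes.c_char_p]
    lib.prog_run.argtypes = [ctypes.c_void_p]

    prog = lib.prog_new()
    lib.prog_insert(prog, b"edge", b"a\tb")   # feed one input tuple
    lib.prog_run(prog)                        # evaluate the Datalog rules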

Requirements: some Python and C/ C++ knowledge

Debugger for Soufflé (12cp)
Soufflé is an open-source translator for a logic-oriented programming language. The translator compiles a declarative specification into parallel C++ code. Soufflé's applications include security specifications for SDNs, and checking security properties of large-scale Java programs. Soufflé has no dynamic query interface for querying tuples.

The aim of this project is to implement a simple query language that can retrieve information from the computed logical relations. The query language

Requirements: good C++ knowledge

Smart Contracts: Blockchain Security (12cp)
Smart contracts are computer programs that enforce a set of rules and work seamlessly in conjunction with the blockchain. Although the idea was mentioned on occasion in the early 2000s, the concept of smart contracts was popularised by the Ethereum programmer Vitalik Buterin in late 2013. These smart contracts are programs in a new “scripting language” that can construct new applications on top of blockchains. The applications can “privatise” the blockchain for various purposes outside the scope of bitcoins. Proof-of-concept implementations of smart contracts for various applications, including financial auditing, sports betting, and music distribution, are already available. The latest contender is, indeed, an entire company whose steering is determined with the use of smart contracts for voting on projects to be funded (the so-called DAO). Hence, with smart contracts, agreements can be transferred from the hands of lawyers and paper documents to the digital world, using the blockchain as a virtual machine and relying on the distributed trust it enables. However, there is an inherent issue with smart contracts: who can ensure that the smart contracts are correctly programmed? This fundamental question is of paramount importance for legal and financial applications using smart contracts.

In this project, we will extend a program analysis framework for Smart Contracts with Soufflé. The framework will report on rules in the smart contract that are wrongly implemented. Hence, the abstract interpretation framework will prevent the loss of money caused by wrongly implemented rules.

Requirements: programming knowledge in C++ / some basics of logic

Making Security User Interfaces of Internet Browsers Securer (12cp)
(Google Industry Project)
Internet browsers present various dialogues to the user to control the security settings of the browser. These dialogues are quite often cryptic and incomprehensible to non-IT users. Unfortunately, existing technology does not permit full automation of security settings, and the user is still required to give feedback about the security settings of the browser. The consequences of wrong security settings can be drastic, e.g., the security of the system can be jeopardised.

In this project, we will study user interfaces for security settings in the Chrome internet browser. We would like to study rich user interfaces that give users the right context about their security settings and inform them about how those settings may affect their system.

Requirements: programming knowledge in JavaScript/HTML/CSS, some basics of user interfaces

Industrial Supervisor: Raymes Khoury

Projects supervised by Xiuying Wang

Learning based approach for medical image segmentation (12 or 18cp)
Accurate medical image segmentation is an essential procedure for precision oncology. Computerized learning approaches are well recognized in the research community for the recognition and segmentation of tumors/tissues from medical images. This project will utilize a learning based approach to incorporate different categories of knowledge. Multi-modality and multi-channel medical images will be investigated for effective learning and segmentation of target objects. A more feasible learning scheme is to be designed to combine the attributes of knowledge from different aspects. The results may contribute to the design of a medical image segmentation, analysis and assessment platform.

Target object segmentation and understanding via affinity estimation (12 or 18cp)
User inference and prior knowledge play an essential role in computing-based medical image understanding, analysis and other processing procedures. This project will utilize information theories and statistical models to model the similarities between data points. Exemplar-based and user-friendly learning techniques will be investigated for effective target object extraction and analytics. A robust affinity estimation scheme is to be developed for prior knowledge incorporation and association. The improved accuracy, efficiency and general applicability will show its broad value in many kinds of target object extraction and analysis.

Deep learning based method for lesion detection and classification from biomedical image (12 or 18cp)
Cancer is one of the biggest threats to public health and quality of life. Early diagnosis and treatment in clinical practice are critical to give patients the best chance of recovery and survival. This project is to develop an automated algorithm that accurately detects lesions in tissues and classifies them from biomedical images such as CT and MRI. The project will utilize deep learning, which has proved powerful for recognition tasks, for lesion detection and classification.

Requirements: The student is expected to have solid software development skills.

Projects supervised by Zhiyong Wang

Multimedia Data Summarization (12 or 18cp)
Multimedia data is becoming the biggest big data, as technological advances have made it ever easier to produce multimedia content. For example, hundreds of hours of video are uploaded to YouTube every minute. While such a wealth of multimedia data is valuable for deriving many insights, it has become extremely time consuming, if not impossible, to watch through a large amount of video content. Multimedia data summarization is to produce a concise yet informative version of a given piece of multimedia content, and is highly demanded to assist human beings to discover new knowledge from massive rich multimedia data. This project is to advance this field by developing advanced video content analysis techniques and identifying new applications.

Human Motion Analysis, Modeling, Animation, and Synthesis (12 or 18cp)
People are the focus of most activities; hence investigating human motion has been driven by a wide range of applications such as visual surveillance, 3D animation, novel human computer interaction, sports, and medical diagnosis and treatment. This project is to address a number of challenging issues in this area in realistic scenarios, including human tracking, motion detection, recognition, modeling, animation, and synthesis. Students will gain comprehensive knowledge in computer vision (e.g. object segmentation and tracking, and action/event detection and recognition), 3D modeling, computer graphics, and machine learning.

Predictive Analytics of Big Time Series Data (12 or 18cp)
Big time series data have been collected to derive insights in almost every field, such as the clicking/viewing behavior of users on social media sites, electricity usage of every household in utility consumption, and traffic flow in transportation, to name a few. Being able to predict the future state of an event is of great importance for effective planning. For example, social media sites such as YouTube will be able to better distribute popular video content to their caching servers in advance so that users can start watching videos with minimal delay. This project is to investigate existing algorithms and develop advanced analytic algorithms for higher prediction accuracy.
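
A useful baseline that any advanced method should beat is a least-squares autoregressive model; the sketch below fits AR(10) to a synthetic series with numpy and makes a one-step-ahead forecast.

    import numpy as np

    np.random.seed(0)
    series = np.sin(np.arange(200) / 5.0) + 0.1 * np.random.randn(200)

    p = 10                                   # autoregressive order
    X = np.column_stack([series[i:len(series) - p + i] for i in range(p)])
    y = series[p:]                           # each value from its p lags
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    print("one-step-ahead forecast:", series[-p:] @ coef)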

Remote-sensing Image Analysis (12 or 18cp)
Remote sensing images have played a key role in many fields such as monitoring and protecting our natural environment, improving agriculture, and assessing water quality. Due to the limitation of the current imaging technology, advanced image analysis techniques such as unmixing and classification are needed to better utilize remote sensing images. Meanwhile, the increasing number of massive remote sensing images demands efficient algorithms in order to support timely decision-making. This project is to investigate efficient and effective approaches to address the emerging issues in remote sensing image analysis. Students will develop strong skills in image analysis, machine learning, and data mining.

Audio Data Analysis (12 or 18cp)
Audio data such as speech and music is very important for our daily life. For example, speech data has rich affective information in addition to the spoken content, and music is very powerful to influence the emotion of individuals. Therefore, it is very helpful to understand the full meaning of a given piece of audio data. This project is to investigate intelligent algorithms for such a purpose and explore the value of audio data. Students will develop strong skills in audio data analysis, machine learning, and data mining, while enjoying the beauty of various sounds.

Projects supervised by Bing Zhou

Parallel computing in large-scale fMRI data with high-dimensions (18cp)
Functional magnetic resonance imaging (fMRI) is a key technique for mapping human brain activity at precise locations. The neuroimaging data generated from fMRI experiments record entire brain patterns as voxels, with their activity profiled through a time-course. To study patients with mental illness and healthy individuals, patterns from millions of voxels need to be mapped to specific brain regions using a brain atlas and compared across the time-course.

The amount of data generated from fMRI is extremely large in scale and inherently high-dimensional. Therefore, parallel and distributed computing is crucial for addressing the computational complexity of fMRI data analysis. In this project, we aim to develop parallel algorithms to compute correlations and associations between different brain regions using high-dimensional fMRI data from patients diagnosed with schizophrenia or bipolar disorder, and to compare these with healthy individuals.
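
As a shape of the computation, the sketch below parallelises a region-by-region correlation matrix with Python's multiprocessing over synthetic data; at real fMRI scale the same pattern would run on an HPC cluster or GPUs.

    import numpy as np
    from multiprocessing import Pool

    np.random.seed(0)                 # same synthetic data in every process
    data = np.random.randn(90, 200)   # 90 brain regions x 200 time points

    def region_correlations(i):
        """Correlate region i's time course with every region's."""
        return [np.corrcoef(data[i], data[j])[0, 1]
                for j in range(data.shape[0])]

    if __name__ == "__main__":
        with Pool() as pool:          # one region per worker task
            corr = np.array(pool.map(region_correlations,
                                     range(data.shape[0])))
        print(corr.shape)             # (90, 90) connectivity matrix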

This project will enable you to explore cutting-edge parallel computing algorithms that handle neuroimaging data generated by the latest brain imaging techniques, and to shed light on mental illness through such an integrative process.

Requirements: This project is mainly on parallel algorithm design, implementation and testing. Thus it will be a good project for you if you are a good programmer and interested in programming on high-performance computing clusters and GPUs.

This project is in collaboration with Dr Pengyi Yang and Professor Jean Yang (School of Mathematics and Statistics).

Transcription factor network visualisation based on chromatin states (18cp)
Chromatin states of DNA encode types of regulation and the accessibility of transcription factors. The Encyclopedia of DNA Elements (ENCODE) project has profiled a large collection of histone modifications that together specify chromatin states genome-wide.

We have previously established a parallel computing framework for reconstructing transcription networks by integrating genome-wide binding profiles (ChIP-Seq) of hundreds of transcription factors. This project aims to extend that work by implementing parallel and distributed computing algorithms to reconstruct and visualise transcription factor networks based on chromatin states learnt from histone modification data using hidden Markov models (HMMs).

This project will allow you the opportunity to develop and apply cutting-edge parallel and distributed computing algorithms and methods to “omics” data generated from the state-of-the-art biological platform. You will get involved in (1) algorithm design, implementation and testing on multicore computers and clusters of PCs; and (2) an interactive graphical user interface design and implementation.
Requirements: good programming skill (essential) and experience in graphical user interface development (desirable).

This project is in collaboration with Dr Pengyi Yang (School of Mathematics and Statistics).

Projects supervised by Ying Zhou

Visualizing Wikipedia Co-authorship Network (12 or 18cp)
Wikipedia is maintained by mass collaborative effort. Each article is created and revised by many editors. Previous research in the academic publication domain suggests that there are clear clusters in academic co-authorship networks, with each cluster representing a research topic. Investigating the Wikipedia co-authorship network presents some unique and interesting challenges, since most articles have a long revision history and involve many editors with unequal contributions. This project aims to extract co-author information from the revision history and to visualise the co-authorship network along the timeline with various degrees of detail.
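
Revision authorship is available through the public MediaWiki API, so the extraction step can start as simply as the sketch below (the article title is arbitrary); weighting editors by how much they contributed is where the project's challenges begin.

    import requests

    # Fetch the editors of an article's 50 most recent revisions.
    resp = requests.get("https://en.wikipedia.org/w/api.php", params={
        "action": "query", "prop": "revisions", "titles": "Graph theory",
        "rvprop": "user|timestamp", "rvlimit": 50, "format": "json",
    }).json()

    page = next(iter(resp["query"]["pages"].values()))
    editors = {rev["user"] for rev in page["revisions"]}
    # Editors sharing an article become linked in the co-authorship network.
    print(sorted(editors)[:10])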

Required skill: Python, MongoDB

Large Scale Graph Algorithms on GraphX and Giraph (18cp)
Large scale graph processing has many use cases. Spark-based GraphX and Hadoop-based Giraph are the two popular large scale graph processing frameworks. Each has implemented a small set of popular algorithms such as PageRank, triangle counting and others. A large number of algorithms are left for developers to implement themselves. This group of projects aims to implement a few other equally prominent algorithms, for instance the HITS algorithm for bipartite graphs, that can be incorporated as general framework APIs.
Each project will focus on a single algorithm on a chosen framework. A variety of implementations should be provided and tested with a large dataset on a cluster of machines.
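
For reference, HITS itself is a short power iteration, shown below in plain Python/numpy on a toy adjacency matrix; the project's work is expressing this update as vertex programs in GraphX or Giraph and scaling it to large graphs.

    import numpy as np

    # Toy link matrix: A[i, j] = 1 if node i links to node j.
    A = np.array([[0, 1, 1, 0],
                  [0, 0, 1, 0],
                  [1, 0, 0, 0],
                  [0, 0, 1, 0]], dtype=float)

    hubs = np.ones(4)
    for _ in range(50):
        auth = A.T @ hubs                 # cited by good hubs
        auth /= np.linalg.norm(auth)
        hubs = A @ auth                   # cite good authorities
        hubs /= np.linalg.norm(hubs)

    print("hubs:", hubs.round(3), "authorities:", auth.round(3))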

Required skill: Java, Hadoop or Spark, basic understanding of graph processing

Projects supervised by Albert Zomaya and Wei Li

The Publication Ranking and Recommendation Web Portal (12cp)
Academic rankings have a huge impact in academia. The rankings of universities across the world published by different organisations always attract much attention. Such information is often the most significant decision-making factor for prospective students, as well as for university presidents and governing boards. One of the important factors affecting university rankings is academic publications.

In this project, you are expected to develop a ranking system mainly targeted at computer science publications. The ranking information is extracted from existing and available ranking systems, e.g. CORE, CCF, JCR, QUALIS and so on. Related information about each publication venue also needs to be provided; for an academic conference, for example: deadline, location, acceptance rate and so on.

Implementation and performance evaluation of Internet of Things applications on Raspberry PI (12cp)
The Internet of Things (IoT) can be considered a highly dynamic and radically distributed networked system. It is composed of a very large number of smart objects which are identifiable and able to communicate and interact, either among themselves, building networks of interconnected objects, or with end-users or other entities in the network. The presence of smart devices able to sense physical phenomena and convert them into a stream of information data, as well as devices able to trigger corresponding actions, maximizes safety, security, comfort, convenience and energy savings. Raspberry Pi is a small, powerful, cheap, hackable and education-oriented computer board introduced in 2012. It operates in the same way as a standard PC, requiring a keyboard for command entry, a display unit and a power supply.

In this project, you are expected to implement a self-chosen/appointed IoT application on the Raspberry PI, with or without the additional Sense HAT component, and demonstrate the performance of your application. You are also encouraged to compare the same application implemented on different microprocessor development boards, e.g. Arduino, Orange Pi, Banana Pi, BeagleBone Black, Phidgets, Udoo etc.

Distributed Big Data-Streams Processing for Internet of Things (18cp)
The fundamental goal of big data-stream processing is to process live data in a fully integrated fashion, providing real-time information and results to end-users, while monitoring and aggregating new information to support decision-making. The high volume of streaming data flowing from the distributed sources of the Internet of Things (IoT) makes it hard to process in batch mode, where all data is first stored on disk and later retrieved for processing and analysis. In addition, continuously sensed data across large, potentially multi-site applications that connect remote sensing and processing needs to rely on a distributed computing infrastructure.
The goal of this project is to develop online distributed data analytics techniques to address the needs of such systems. Skills in signal processing, descriptive statistics, inferential statistics and distributed systems are preferred.

Task Scheduling Algorithms for the Internet of Things (18cp)
Applications for the Internet of Things are typically composed of several tasks or services, executed by devices and other Internet available resources, including resources hosted in a cloud platform. The complete execution of an application in such environments is a distributed, collaborative process. To enable collaborative processing of applications, the following problems must be solved:

(i) assigning tasks to devices (and other physical or virtual resources),

(ii) determining the execution sequence of tasks, and

(iii) scheduling communication between involved devices and resources. Unlike traditional task scheduling, with the involvement of sensors and actuators the types of tasks in IoT applications go beyond computation and communication.

Different allocations of these tasks to nodes consume different amounts of resources on the resource-limited devices and provide different quality of service (QoS) to the applications. Although QoS management in traditional distributed and networked systems is a well-studied topic, in IoT it is still a poorly investigated subject, and the definition of QoS in IoT is still not clear. Traditional QoS attributes such as throughput, delay, or jitter are not sufficient in IoT, where additional attributes matter, such as information accuracy (qualified by the probability that a given accuracy can be reached) and the network resources required. Therefore, a task scheduling algorithm should handle the efficient execution of a large number of applications in the IoT infrastructure, considering multiple types of resources and the QoS parameters specific to IoT environments. Moreover, such an algorithm must consider situations in which multiple applications perform common tasks (such as sensing the same physical variable) that do not need to be performed several times by the devices and physical resources, thereby saving energy. Still, there may be applications with higher priorities in relation to the required response time (e.g., time-critical applications) or in relation to the amount of resources provided to them (e.g., bandwidth, sensing coverage), compared to others sharing the IoT infrastructure, and these priorities must be respected when allocating tasks.

This project aims to study existing task scheduling algorithms, especially those for cloud computing and wireless sensor network environments, and to propose a new one specifically tailored to IoT environments.

Projects supervised by Albert Zomaya and MohamadReza Hoseiny

Designing a QoS-Aware Controller for Dynamic Scheduling in processing of cloud-based data streaming platform (12 or 18cp)
More and more companies are faced with huge amounts of streaming data that need to be quickly processed in real time to extract meaningful information.

Stream data processing is different from well-studied batch data processing, which does not necessarily need to be done in real time (the issue of velocity). In such environments, data must be analyzed/transformed continuously in main memory before it is stored on disk, especially where the value of the analysis decreases with time. Normally, this is done by a cluster of server (worker) nodes that continuously process the incoming stream of data. One of the major issues posed by streaming data processing is keeping the QoS level stable under fluctuations in request rates. Past research showed that a high arrival rate of streaming data within short periods causes serious degradation of the overall performance of the underlying system.

In this project, we aim to create advanced controller techniques that effectively allocate available resources to handle big data streams with complex arrival patterns, in order to preserve the QoS required by end-users.
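
The simplest baseline such a controller must improve on is proportional feedback scaling, sketched below with invented numbers; an "advanced" controller would anticipate arrival patterns rather than merely react to them.

    # Proportional controller: scale workers to hold latency near a target.
    target = 0.5                 # QoS bound: seconds of queueing latency
    workers, gain = 4, 2.0

    for latency in [0.4, 0.9, 1.5, 0.6, 0.3]:   # hypothetical measurements
        error = latency - target
        workers = max(1, workers + round(gain * error))
        print(f"latency={latency:.1f}s -> scale to {workers} workers")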

MPC-based model to Trade-off Utilization and Quality of Service in large-scale cloud-based Lambda platforms (12 or 18cp)
Today, more and more applications perform sophisticated forms of micro-services. There are several platforms that provide the infrastructure required to build such a software architecture; one of the most recent is the Lambda platform. Enterprises can exploit Lambda platforms (e.g. AWS Lambda) to extend other services with custom logic, or create their own back-end micro-services that operate at the cloud's scale, performance, and security. A Lambda platform is a form of server-less computing service that can run a client's code in response to external events and manage the underlying compute resources automatically. In this project, our aim is to propose an effective solution based on an MPC (model predictive control) model that uses the promising technologies offered by the Lambda platform to handle bursts of events arriving at a Lambda cluster, reaching an optimal trade-off between server utilization and the Quality of Service required by each user.