Computer science internships

Supervisor: A/Prof. Chang Xu

Eligibility: Research experience in one of the following areas: deep learning, machine learning. Proficiency in deep learning programming, e.g., PyTorch

Project Description:

Hallucination in vision-language models refers to the phenomenon where the model generates content that is not grounded in the input visual data. Such outputs can include fabricated details or inaccurate interpretations that do not reflect the actual scene, potentially leading to misinformation. This issue emerges when the model extrapolates beyond the visible cues, inadvertently blending plausible but unverified information with genuine content.

Large vision-language models have achieved impressive results in tasks like image captioning, visual question answering, and scene understanding, opening up transformative possibilities across various domains. However, despite these advancements, these models often suffer from hallucination, which can compromise their reliability in critical applications. The challenge lies in ensuring that the model’s outputs are both visually grounded and factually accurate.

The primary focus of this project is to identify and mitigate the root causes of hallucination in large vision-language models. By developing advanced methods to better align visual input with generated language, we aim to enhance the robustness and trustworthiness of these models, thereby improving their applicability in real-world scenarios.

Requirement to be on campus: Yes, *dependent on government’s health advice.

Supervisors: Dr Muhammad Sajjad Akbar and Dr Mohammad Polash

Eligibility: Machine learning; a security background is desirable but not compulsory.

Project Description:

In the evolving field of cybersecurity, bridging the gap between academic learning and industry demands is crucial.

This project aims to enhance cybersecurity education by aligning curriculum design with real-world job market needs.

We plan to conduct a comprehensive analysis of thousands of cybersecurity job postings from platforms like Seek, Jora, Monster, Glassdoor, LinkedIn, and Indeed using supervised machine learning techniques.

The collected data will be categorized to identify key roles, required skills, certifications, and salary trends. The insights generated will help universities refine their cybersecurity curricula, ensuring graduates acquire industry-relevant skills. This project will not only help students understand career pathways in cybersecurity but also assist faculty in focusing on in-demand skills.

By integrating machine learning-driven industry insights into curriculum design, we aim to create a dynamic, industry-aligned learning experience, fostering better employability and professional readiness for students.

Requirement to be on campus: No

Supervisors: Dr Muhammad Sajjad Akbar and Dr Mohammad Polash

Eligibility Criteria: Machine learning; a security background is desirable but not compulsory.

Project Description:

With increasing global data privacy concerns, Australian companies must comply with GDPR regulations to ensure secure and ethical data handling. Many organizations struggle with understanding and implementing GDPR requirements, leading to compliance risks and potential penalties.

This project aims to develop a privacy compliance tool to assist organizations in understanding, assessing, and implementing GDPR regulations effectively. The tool will provide automated compliance checks, risk assessments, and tailored recommendations based on company-specific data practices.

By leveraging machine learning and rule-based frameworks, the tool will help businesses navigate data protection policies, consent management, and security measures. This project ensures seamless GDPR integration into corporate data governance strategies, reducing legal risks while promoting transparency and accountability in data handling. Through this solution, Australian businesses can proactively adapt to evolving privacy regulations, fostering consumer trust and regulatory compliance in an increasingly data-driven world.

Requirement to be on campus: No

Supervisor: Dr Mohammad Polash

Eligibility:

Ability to review current literature on this topic
Have the skillset to implement such a system
Willing to write a research article about this

Project Description:

With the emergence of GenAI, universities are moving towards pen-and-paper exams. For programming exams, however, this creates many issues with marking and feedback. In this work, the aim is to recognize students’ handwritten code using OCR. The challenge is that OCR errors, caused for example by varied handwriting styles and inconsistent quality of the handwritten text, can lead to inaccurate or incomplete recognition of the code.

Therefore, this research focuses on several core challenges. First, we seek to improve the accuracy of OCR in recognizing programming languages by training the model to understand programming-specific syntax and structures (e.g., OOP, SQL). Second, we aim to develop a system that can not only recognize the code but also provide meaningful feedback on its correctness and quality.

Requirement to be on campus: No

Supervisor: Dr Mohammad Polash

Eligibility:

Proficiency in OOP programming
Ability to review current literature on OOP notional machines and have the skillset to implement those
Capable of representing and communicating complex OOP concepts in an accessible, engaging way.
Willing to write a research article about this

Project Description:

This project aims to systematically collect and present notional machines across a diverse range of Object-Oriented Programming (OOP) subtopics, creating a comprehensive repository that supports both educators and researchers. By doing so, we hope to enhance the teaching and learning experience in OOP, making abstract concepts more accessible and tangible for students.

Furthermore, we are dedicated to expanding the scope of notional machines by developing resources for OOP subtopics that currently lack them. Hopefully, this expansion will not only fill existing gaps but also provide educators and researchers with new opportunities to engage with and advance the field of programming education.

Requirement to be on campus: No

Supervisor: A/Prof. Nguyen Tran

Eligibility: Machine learning, Python coding, LLM knowledge

Project Description:

Large language models (LLMs), like the newly released Llama 3.1 with 405 billion parameters, have revolutionized various real-world applications. However, their high-end hardware requirements often limit accessibility for many researchers.

Our project aims to democratize the use of such transformative technology through distributed computing. By leveraging multiple GPUs at edge networks, we enable efficient inference and fine-tuning of LLMs, making cutting-edge AI research feasible and more inclusive. This approach not only enhances computational efficiency but also broadens the scope of research possibilities by reducing the barrier to entry for utilizing state-of-the-art LLMs.

Requirement to be on campus: Yes *dependent on government’s health advice.

Supervisor: A/Prof. Wei Bao

Eligibility: Strong mathematics background, experience in distributed computing, basic knowledge in convex optimization.

Project Description:

In recent years, distributed and edge computing have been widely adopted to protect data privacy and enhance scalability in various fields. This has intensified concerns about algorithmic fairness, emphasizing the need for equal treatment of diverse demographic groups in decision-making processes.

As an example, federated learning (FL) implements distributed computing to leverage decentralized data in machine learning to preserve data privacy. FL enables multiple clients to collaboratively train a global model without sharing private data, making it especially valuable for privacy-sensitive applications such as healthcare and finance.
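
The collaborative training step described above can be pictured with a minimal sketch of weighted model averaging, in the style of federated averaging. This is an illustration under stated assumptions, not the method the project will use: the function name `federated_average`, the flat-list model representation, and the client data are all hypothetical.

```python
from typing import List, Sequence


def federated_average(client_weights: Sequence[Sequence[float]],
                      client_sizes: Sequence[int]) -> List[float]:
    """Combine client parameter vectors into one global vector.

    Each client contributes in proportion to its number of local samples.
    Only parameters are shared; the raw client data never leaves the client.
    """
    if not client_weights or len(client_weights) != len(client_sizes):
        raise ValueError("need at least one client and one sample count per client")
    total = sum(client_sizes)
    dim = len(client_weights[0])
    global_weights = [0.0] * dim
    for weights, size in zip(client_weights, client_sizes):
        for i, w in enumerate(weights):
            global_weights[i] += w * size / total
    return global_weights


# Hypothetical example: three clients with two-parameter local models.
clients = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
sizes = [10, 30, 60]
global_model = federated_average(clients, sizes)
```

Note that the client with the most samples dominates the average, which is one simple way to see how a bias in a large local dataset could carry over into the global model.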

Despite its advantages, FL introduces challenges in ensuring algorithm fairness. Biases present in the local dataset can propagate and amplify in the global model, remaining concealed under privacy protection and potentially leading to discriminatory outcomes. Existing fairness-enhancing techniques often fail to address the unique challenges posed by distributed settings, including heterogeneous data distributions, limited communication, and adversarial attacks.

This project aims to investigate fairness in distributed computing algorithms and develop robust methodologies to mitigate bias, ensuring equitable outcomes across diverse demographic groups. You will design algorithms that address both robustness and fairness in distributed computing, with their performance evaluated through experimental validations against various metrics and benchmarks. Additionally, you will conduct theoretical analyses of the proposed methodologies, including proving generalization bounds and performing convergence analysis.

Requirement to be on campus: Yes *dependent on government’s health advice.

Supervisor: A/Prof. Wei Bao

Eligibility: Familiar with probability theory and convex optimization.

Project Description:

Edge computing has emerged as a critical approach to facilitate smart applications with scalability. However, due to the diversity of computation capacity and energy restrictions, edge devices may take significant time to complete computationally intensive tasks. Two solutions have been adopted to tackle this challenge: 1) We assign heavy tasks to more capable edge machines, leaving light tasks to less capable edge machines; 2) We can reduce task processing time by sacrificing processing quality (i.e., controllable processing time).

In this project, you will solve the task assignment in edge computing under the framework of the online matching problem. You are expected to propose online matching algorithms that strategically assign tasks to machines and balance processing quality against processing time for each task, in order to maximize the overall processing quality. You will formulate the task assignment problem as an optimization problem and derive the theoretical performance of your proposed algorithm. You will also conduct experiments to evaluate your algorithm against performance metrics and benchmarks.

Requirement to be on campus: Yes *dependent on government’s health advice.

Supervisors: Prof Michael Cahill, Prof Alan Fekete, A/Prof Uwe Roehm, Dr Liyanage; in collaboration with Dr Bernhard Scholz [Fantom Foundation]

Eligibility: Some experience with Java is essential, exposure to Go and database internals would be helpful but is not required.

Project Description:

The Ethereum blockchain can be treated as a database with two channels for queries: a fast, untrusted channel and a slow, trusted channel. A query returns results over the fast channel and a certification key over the slow channel, which a client can use to check the validity of the results received. Ethereum uses a data structure called a Merkle Patricia Trie (MPT) to provide efficient lookup and insert operations for immutable data.
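
To illustrate how a client might check untrusted results against a small trusted value, here is a minimal sketch using a plain binary Merkle tree. This is a simplification chosen for illustration: it is not the Merkle Patricia Trie, and treating the trusted value as a root hash is an assumption. The helper names and the sample records are hypothetical.

```python
import hashlib
from typing import List, Tuple

Proof = List[Tuple[bytes, bool]]  # (sibling hash, sibling is on the left)


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_root_and_proof(leaves: List[bytes], index: int) -> Tuple[bytes, Proof]:
    """Return the tree root and an inclusion proof for leaves[index]."""
    if not leaves:
        raise ValueError("need at least one leaf")
    level = [h(leaf) for leaf in leaves]
    proof: Proof = []
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # duplicate the last node on odd levels
        sibling = index ^ 1
        proof.append((level[sibling], sibling < index))
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        index //= 2
    return level[0], proof


def verify(leaf: bytes, proof: Proof, trusted_root: bytes) -> bool:
    """Recompute the root from an untrusted leaf and proof, then compare."""
    node = h(leaf)
    for sibling, sibling_is_left in proof:
        node = h(sibling + node) if sibling_is_left else h(node + sibling)
    return node == trusted_root


# Hypothetical records; the root stands in for the value sent over the trusted channel.
records = [b"k1=v1", b"k2=v2", b"k3=v3"]
root, proof = merkle_root_and_proof(records, 2)
ok = verify(b"k3=v3", proof, root)
tampered_ok = verify(b"k3=XX", proof, root)
```

The client only needs the short root hash from the trusted channel; the record and its proof can arrive over the fast channel, since any tampering changes the recomputed root.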

The goal of this project is to enhance a database benchmark such as YCSB to measure the performance of a tamper-evident database. The benchmark and the database system already exist: this project will connect them, measure the performance and identify bottlenecks.

Requirement to be on campus: No

Supervisors: Prof Michael Cahill, Prof Alan Fekete, A/Prof Uwe Roehm, Dr Rahul Gopinath; in collaboration with Dr Bernhard Scholz [Fantom Foundation]

Eligibility: Some experience with Python is essential, some exposure to Go and database internals would be helpful but is not required.

Project Description:

This project involves testing the correctness of the transaction layer of the StateDB of the Go-Ethereum Geth client. Through fuzz testing, we aim to verify whether the implementation aligns with the specification. This database layer has simple set and get operations combined with complex transaction semantics.

The database backend already exists: this project will apply fuzz testing to generate sequences of operations and verify that the database ends up in the expected state after they are executed.
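
The approach above can be sketched as a model-based fuzz test: random operation sequences are applied both to the store under test and to a deliberately simple reference model, and the two are compared. The classes below are hypothetical toys written for this illustration; they are not the Geth StateDB API, and the snapshot/revert semantics are an assumed stand-in for the real transaction semantics.

```python
import random
from typing import Dict, List, Optional, Tuple


class JournaledStore:
    """Toy store under test: set/get with snapshot/revert via an undo journal."""

    def __init__(self) -> None:
        self._data: Dict[str, int] = {}
        self._journal: List[Tuple[str, Optional[int]]] = []

    def set(self, key: str, value: int) -> None:
        self._journal.append((key, self._data.get(key)))  # remember old value
        self._data[key] = value

    def get(self, key: str) -> Optional[int]:
        return self._data.get(key)

    def snapshot(self) -> int:
        return len(self._journal)

    def revert(self, snap: int) -> None:
        while len(self._journal) > snap:
            key, old = self._journal.pop()
            if old is None:
                self._data.pop(key, None)
            else:
                self._data[key] = old


class ModelStore:
    """Reference model: a snapshot is simply a full copy of the data."""

    def __init__(self) -> None:
        self._data: Dict[str, int] = {}
        self._snaps: List[Dict[str, int]] = []

    def set(self, key: str, value: int) -> None:
        self._data[key] = value

    def get(self, key: str) -> Optional[int]:
        return self._data.get(key)

    def snapshot(self) -> int:
        self._snaps.append(dict(self._data))
        return len(self._snaps) - 1

    def revert(self, snap: int) -> None:
        self._data = dict(self._snaps[snap])
        del self._snaps[snap:]


def fuzz(seed: int, steps: int = 500) -> int:
    """Run one random operation sequence; raise AssertionError on divergence."""
    rng = random.Random(seed)
    impl, model = JournaledStore(), ModelStore()
    keys = ["a", "b", "c", "d"]
    snaps: List[Tuple[int, int]] = []  # paired snapshot ids (impl, model)
    for _ in range(steps):
        op = rng.choice(["set", "get", "snapshot", "revert"])
        if op == "set":
            key, value = rng.choice(keys), rng.randrange(100)
            impl.set(key, value)
            model.set(key, value)
        elif op == "get":
            key = rng.choice(keys)
            assert impl.get(key) == model.get(key), f"seed={seed} key={key}"
        elif op == "snapshot":
            snaps.append((impl.snapshot(), model.snapshot()))
        elif snaps:
            i = rng.randrange(len(snaps))
            impl.revert(snaps[i][0])
            model.revert(snaps[i][1])
            del snaps[i:]  # later snapshots are no longer valid
    for key in keys:  # final-state check
        assert impl.get(key) == model.get(key), f"seed={seed} key={key}"
    return steps
```

Using a seeded generator means any failing sequence can be replayed from its seed, which makes a reported divergence easy to reproduce and shrink.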

Requirement to be on campus: No

Supervisors: Dr Liyanage, Prof Alan Fekete, Prof Michael Cahill, A/Prof Uwe Roehm

Eligibility: Some experience with measuring DBMS performance; experience with PostgreSQL is also desirable.

Project Description:

The supervisors recently proposed extending the YCSB benchmark system to cover situations where some database records get longer over time. This project will use this framework to measure the performance of additional DBMS platforms such as PostgreSQL, and then analyse the results.

Requirement to be on campus: No

Supervisors: Prof Alan Fekete, Prof Michael Cahill, Dr Liyanage, A/Prof Uwe Roehm

Eligibility: Some experience with measuring DBMS performance; experience with PostgreSQL and MongoDB is also desirable.

Project Description:

FerretDB is a “shim” that allows applications coded against MongoDB’s query language to run with data in a PostgreSQL system. The project will investigate which characteristics of the workload affect the performance obtained. This is an extension of a recent honours thesis.

Requirement to be on campus: No

Supervisors: Prof Alan Fekete, Prof Michael Cahill, Dr Liyanage, A/Prof Uwe Roehm; in collaboration with Dr Anna Liu [AWS]

Eligibility: Some experience with using cloud platforms such as AWS, and running machine learning training and inference pipelines.

Project Description:

This project will develop a method for estimating compute resource requirements for given use cases and workloads. The project will describe a class of AI use cases, taking into account both training and inference, then estimate the number of GPUs/GPU hours required and suggest a mapping to optimal GPU instance types and deployment architecture. The benchmarking process will also evaluate the importance of usage factors, such as model type, dataset size, and number of users, for compute requirement estimation.

Requirement to be on campus: No

Supervisors: Prof Alan Fekete, Prof Michael Cahill, Dr Liyanage, A/Prof Uwe Roehm; in collaboration with Dr Anna Liu [AWS]

Eligibility: Experience with using cloud platforms such as AWS (at least some of AWS Lambda, Cognito, DynamoDB, Bedrock), and building data-backed applications. Understanding of the health care system is desirable.

Project Description:

This project will explore the use of Generative AI tools such as AWS Bedrock to create a virtual assistant that can enhance the productivity of health care providers, such as nurses in a hospital or carers in an aged care home.

The assistant would streamline administrative tasks, provide quick access to patient information, assist with documentation, and offer clinical guidelines based on AI-generated responses.

Requirement to be on campus: No

Supervisor: Prof. Eduardo Velloso

Eligibility: WAM>75 and Undergraduate candidates must have already completed at least 96 credit points towards their undergraduate degree at the time of application.

Project Description:

The majority of chatbots are focused on clinical, educational or enterprise outcomes. On character.ai we see the emergence of fantasy chatbots designed for cosy comfort. These experiences, however, are held back by 'general purpose' UX. Users frequently request enhancements like background music, thematic maps, and world-building tools for more immersion.

Simultaneously, we're seeing the rise of conversational role-playing game behaviour that mirrors narrative play in cosy games (Animal Crossing, Stardew Valley, Good Pizza, Great Pizza, etc.). This convergence raises an intriguing research question: How might we reimagine chatbot experiences that focus on cosy comfort and fantasy immersion over productivity and 'assistance'?

This project will involve designing and developing a new chatbot for exploring alternative ways of interacting with LLMs aimed at enabling novel user experiences.

Requirement to be on campus: Yes * dependent on government’s health advice.

Supervisor: Prof. Eduardo Velloso

Eligibility: Preference will be given to candidates demonstrating proficiency with Unity 3D, OpenXR XR Interaction Toolkit, Meta XR SDK, and MR/VR application development experience.

Project Description:

This project introduces "Stand-In," an innovative solution that leverages embodied AI agents to represent absentees in meetings conducted in MR/VR.

Stand-In enables seamless meetings by allowing your personalized AI agent to attend on your behalf when you're unavailable. Rather than reschedule or cancel the meeting, your colleagues can proceed with the planned meeting by engaging with your AI agent.

During the meeting, your Stand-In can:

Provide immediate responses to questions within its knowledge domain
Suggest a delayed response if the question falls outside its knowledge domain
Capture the complete meeting content for your later review

Upon your return, you can immerse yourself in the recorded meeting through MR/VR technology, experiencing the interaction as if you had been present and addressing any questions asynchronously that required your personal attention.

This technology allows participants to interact naturally with representations of absentees while maintaining productive workflows and eliminating scheduling conflicts.

Requirement to be on campus: Yes * dependent on government’s health advice.

Supervisor: Prof. Eduardo Velloso

Eligibility: Preference will be given to candidates demonstrating proficiency with 3D Computer Vision, 3D Reconstruction, Unity 3D, OpenXR XR Interaction Toolkit, Meta XR SDK, and MR/VR application development experience.

Project Description:

Picture two people, working remotely from each other, who want to meet through mixed reality as if they were face-to-face. One can see the other in their own environment while being represented as an avatar, effectively blending their spaces. This works great if the spaces are similar. But what if they are different?

This project addresses this challenge through selective blending of individual objects—particularly specific furniture and collaboration surfaces—to create focused interaction zones while maintaining spatial coherence and workspace awareness between physical and virtual elements.

In addition, users are naturally sensitive to visual artifacts in familiar objects, and conventional approaches often introduce implausible geometries that break immersion. To avoid this, the project preserves high-fidelity visual reconstruction through precise 3D Gaussian representations, capturing geometric features, color, and density that enable natural object recognition and spatial registration.

Our project lets users fine-tune shapes by cutting them out, and match them up in space by registering 3D shapes. This way, things from different places can be easily combined into a cohesive working environment.

Requirement to be on campus: Yes * dependent on government’s health advice.

Supervisor: Prof. Athman Bouguettaya

Eligibility: WAM>80 and Undergraduate candidates must have already completed at least 96 credit points towards their undergraduate degree at the time of application

Project Description:

Social media has become an integral part of our lives, with users constantly uploading content across various platforms. Unfortunately, some of this content includes untrustworthy images.

Traditional approaches to detecting fake images often rely on image processing, which can be costly and computationally demanding. More recent methods that analyse comments on posts may also fall short, as even fake posts can receive supportive comments.

We propose leveraging the metadata of an image, along with the associated post information, to assess its trustworthiness. As part of our approach, we will develop a comprehensive database of online images that can be queried based on their metadata. This database will later serve as a crucial resource for determining the trustworthiness of images. For instance, users will be able to search for images with specific metadata, such as shutter speed, which can aid in identifying potential inconsistencies and modifications in the images.

Requirement to be on campus: Yes * dependent on government’s health advice.

Supervisor: Prof. Athman Bouguettaya

Eligibility: WAM>80 and Undergraduate candidates must have already completed at least 96 credit points towards their undergraduate degree at the time of application.

Project Description:

The proliferation of AI-generated images presents a significant challenge in verifying their authenticity. Traditional image verification methods often fall short against the advanced techniques employed by generative AI, making it difficult to distinguish between genuine and AI-created images.

We propose an innovative approach to address this issue by focusing on the subtle traces that generative AI algorithms leave in the metadata of images. By detecting and analysing these indicators, we aim to assess the trustworthiness of AI-generated images with greater accuracy.

Our method will involve developing specialized tools to identify these metadata patterns, enabling us to differentiate between authentic and AI-generated images more effectively. This approach not only enhances the detection of untrustworthy content but also contributes to maintaining the integrity of online visual media. Through this research, we seek to provide a reliable solution for the growing challenge of AI-generated image verification in an increasingly digital world.

Requirement to be on campus: Yes * dependent on government’s health advice.

Supervisor: Prof. Athman Bouguettaya

Eligibility: WAM>80 and Undergraduate candidates must have already completed at least 96 credit points towards their undergraduate degree at the time of application

Project Description:

As social media continues to evolve, the challenge of identifying fake images becomes increasingly complex. Traditional methods that rely on image processing or basic metadata analysis often fall short in capturing the intricate semantics that differentiate genuine images from manipulated ones.

We propose using advanced models such as RoBERTa, T5, and GPT to assess the trustworthiness of online images by learning their underlying semantics. These models will be trained to understand the deeper contextual meanings within images, enabling more accurate detection of fake content. By leveraging this sophisticated semantic analysis, we aim to improve the reliability of identifying untrustworthy images on social media platforms.

Requirement to be on campus: Yes * dependent on government’s health advice.

Supervisor: Prof. Athman Bouguettaya

Eligibility: Good programming background in either Java or Python, and good knowledge of algorithms.

Project Description:

The composition of crowdsourced IoT services poses several trust-related challenges. New composite services are created when a single service cannot satisfy the consumer's requirements. For instance, consider a crowdsourcing environment where IoT devices provide computing services. In such an environment, IoT service providers may offer computing resources (CPU, memory) to perform processing tasks for other IoT devices.

A computation resource-poor device, like a smartwatch, may use these resources to perform computationally intensive tasks like rendering a map. However, a potential IoT service consumer may have concerns regarding the trustworthiness of the service providers.

A distrustful service provider might not protect the privacy of consumers' data or provide unreliable performance. Similarly, an IoT service provider may have concerns regarding their consumers' trustworthiness. Malicious consumers may misuse IoT services by sending malicious software. Therefore, the trustworthiness of service providers and consumers needs to be assessed before new service compositions.

For such an assessment, we need to store historical information about service providers and consumers. It is important to investigate approaches that could help guarantee the integrity of the stored data (e.g., use blockchain-based approaches).

This project aims to identify or create real-world datasets for evaluating data integrity-preserving approaches, with an additional focus on developing algorithms to detect tampered trust data.

Requirement to be on campus: Yes * dependent on government’s health advice.

Supervisor: Prof. Athman Bouguettaya

Eligibility: Good Programming Skills; Data Handling and Management Skills; Experience in Drone Flight Simulators will be a plus

Project Description:

A continuous expansion of urban areas is leading to an increased demand for instant deliveries from warehouses to customers' doorsteps. Unmanned Aerial Vehicles (UAVs) or drones have the potential to serve customers with timely and cost-effective deliveries. Drones usually operate in a skyway network, which is an interconnected set of nodes. The nodes are building rooftops that serve as recharging stations or delivery destinations for drones.

Drones may recharge at nodes for long-distance deliveries as they are constrained by limited battery capacity. These nodes are connected through skyway segments, which multiple drones may share at the same time to transit between nodes. However, managing drone traffic in congested urban environments presents significant challenges. The risk of aerodynamic interference among multiple drones operating in shared skyway segments may reduce delivery efficiency and, as a result, disrupt the smooth operation of drones in the skyway network.

This project focuses on developing real-time dynamic traffic management algorithms to optimize drone routes and minimize congestion and potential inter-drone interference. We leverage advanced machine learning-based techniques and real-time data analytics to ensure the proactive detection of potential interference zones. The goal is to dynamically adjust drone routes based on the status of drone traffic in potential interference zones.

Requirement to be on campus: Yes * dependent on government’s health advice.

Supervisors: Brandon Syiem, Prof. Eduardo Velloso

Eligibility: WAM>75 and Undergraduate candidates must have already completed at least 96 credit points towards their undergraduate degree at the time of application.

Project Description:

Mixed reality (MR) enables users to access an immersive world that blends the digital and physical. This presents exciting opportunities to enhance the way we perform tasks that require us to analyze and document details about our surroundings.

For instance, imagine enabling criminal investigators to directly record evidence details in the exact location the evidence was found. This would alleviate their task of evidence documentation by removing the need to manually document the spatial layout of the scene and the location of the evidence within that layout.

Another example would be enabling firefighters to quickly highlight and share possible exit points with their team as they perform emergency rescues in an unfamiliar building.

While these interactions are possible with MR technology, research is needed to better understand the ways in which users can create annotations and anchor them meaningfully, effectively and efficiently within space. To address this gap, this project aims to design, develop and explore different ways in which users can create spatial anchors within MR environments.

Requirement to be on campus: Yes *dependent on government’s health advice.

Supervisor: Clément Canonne

Eligibility: WAM>75 and Undergraduate candidates must have already completed at least 96 credit points towards their undergraduate degree at the time of application.

Project Description:

Data privacy is a key concern, with renewed attention due to the ubiquitous use of ML, AI, and broadly speaking data analysis both in companies and to guide public policy. Several approaches to preserve privacy of people’s data have been developed, such as, notably, differential privacy, a principled, mathematical framework for privacy-preserving algorithms; or secure multi-party computation.

One major obstacle to the widespread adoption of these approaches is that the general-public perception of what the privacy risks are is by and large disconnected from the formal guarantees these algorithms provide, often seen as overly conservative or “worst-case” and unrealistic.

This project consists in collating and describing various situations leading to privacy leakages of various kinds, and creating a simple communication tool to describe various privacy risks in the form of “model cards” (which could also be printed and provided in outreach/communication events about data privacy). Each card would cover:

Real-life scenario
Privacy threat
Technology mitigating it

Requirement to be on campus: No.

Supervisor: Dr Hazem El-Alfy

Eligibility: Student took a Machine Learning or AI class and has excellent Python coding skills using the Keras or PyTorch library.

Project Description:

Early detection of dental caries before they spread to multiple teeth can prevent serious oral health problems. This has so far relied on performing regular checkups, including X-rays, and presenting them to experienced dentists. However, this process is costly, and some caries may still be missed.

In this project, we propose using deep learning techniques to analyse dental X-rays for quick and early detection of caries, which removes some of the burden on dentists and lowers the costs involved. The student should be familiar with medical image segmentation techniques and machine learning, and have good coding skills with deep learning libraries.

Requirement to be on campus: No

Supervisor: Dr Hazem El-Alfy

Eligibility: Student took a Machine Learning or AI class and has excellent Python coding skills using the Keras or PyTorch library.

Project Description:

Potholes are a common sight on NSW streets after wet weather. It is estimated that the cost of repairing potholes reached up to $4 billion in 2022. Compensation paid to the owners of damaged cars and the handling of liability claims also add to the bill.

This project aims to devise an artificial intelligence software tool to detect potholes in images. Images can be collected by UAVs or traffic cameras, but image collection is outside the scope of this project. So far, councils have relied on residents to report potholes in their areas, but this is a slow process which results in more cars getting damaged by the time action is taken.

The student participating in this project will survey the recent literature in the area, choose appropriate large image datasets and develop a deep-learning architecture to detect potholes. If promising, we can publish the results of this research in a reputable computer vision conference or journal.

Requirement to be on campus: No

Supervisors: Dr. Hong Jin Kang, Dr. Rahul Gopinath, Dr. Xi Wu, Dr. Huaming Chen

Eligibility:

Fast learner with strong programming skills.
Ability to work independently

Project Description:

This project evaluates the ability of LLMs to reason about software specifications, including constraints on program inputs. While LLMs have shown promise in test generation, they still struggle with complex reasoning and hallucinations.

In this project, you will:

Construct a small benchmark of programs.
Investigate and characterize LLM capabilities in synthesizing inputs that satisfy constraints.
(Stretch Goal) Compare LLM performance against traditional input generation methods.

This project may be extended into an Honours thesis, potentially leading to a publication at a top SE venue.

Requirement to be on campus: Yes *dependent on government’s health advice.

Supervisors: Dr. Hong Jin Kang, Dr. Rahul Gopinath, Dr. Xi Wu, Dr. Huaming Chen

Eligibility:

Fast learner with strong programming skills.
Ability to work independently

Project Description:

This project explores how well LLMs can utilize a debugger for identifying software bugs. While LLM agents can use coding tools, their debugging capabilities remain underexplored.

In this project, you will:

Develop Tooling:
Enable an LLM to interact with a debugger, building on top of an LLM agent framework (e.g., OpenHands).
Evaluate Debugging Capabilities:
Assess whether LLMs can effectively insert breakpoints and analyze debugging sessions.
Determine if they can successfully identify software bugs.
(Stretch Goal) Investigate Debugger-Driven Bug Fixing:
Examine whether debugger use improves LLM-assisted bug fixing.

This project may be extended into an Honours thesis, potentially leading to a publication at a top SE venue.

Requirement to be on campus: Yes *dependent on government’s health advice.

Supervisor: A/Prof. Lijun Chang

Eligibility: Good algorithm design and C (or C++) programming skills

Project Description:

Large-scale social networks with millions or billions of edges are now commonplace. Thus, there is a need to design efficient algorithms for processing large-scale graphs.

In this project, our aim is to design efficient algorithms to speed up graph processing on ever-growing large graph datasets. The problems that we will be investigating can be (1) dense subgraph (e.g., clique, near-clique) computation over a large sparse graph which finds one dense subgraph of the maximum size, or (2) dense subgraph enumeration which enumerates all maximal dense subgraphs.
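
As a point of reference for problem (2), the sketch below enumerates all maximal cliques with the classic Bron-Kerbosch algorithm with pivoting. It is a textbook baseline, not the method this project will develop, and the function name and the small example graph are illustrative assumptions.

```python
from typing import Dict, Iterator, Set


def maximal_cliques(adj: Dict[int, Set[int]]) -> Iterator[Set[int]]:
    """Yield every maximal clique of an undirected graph.

    adj maps each vertex to the set of its neighbours (no self-loops).
    This is Bron-Kerbosch with pivoting, a standard baseline.
    """

    def expand(r: Set[int], p: Set[int], x: Set[int]) -> Iterator[Set[int]]:
        # r: current clique, p: candidates that extend r, x: already explored
        if not p and not x:
            yield set(r)  # nothing can extend r, so it is maximal
            return
        # Pick the pivot with the most neighbours in p to prune branches.
        pivot = max(p | x, key=lambda v: len(adj[v] & p))
        for v in list(p - adj[pivot]):
            yield from expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    yield from expand(set(), set(adj), set())


# Hypothetical example graph: a triangle 0-1-2 with a tail 2-3-4.
example = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4}, 4: {3}}
cliques = sorted(sorted(c) for c in maximal_cliques(example))
largest = max(cliques, key=len)  # problem (1): one dense subgraph of maximum size
```

Here `cliques` holds the triangle and the two tail edges, and `largest` is the triangle. Enumeration like this is exponential in the worst case, which is why scalable algorithms for large sparse graphs are a research topic.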

Requirement to be on campus: No

Supervisors: A/Prof. Uwe Roehm, A/Prof. Prof. Michael Cahill

Eligibility: Some experience with Python is essential, some exposure to C, Unix and database internals would be helpful but is not required.

Project Description:

This project is about the experimental evaluation of NeurDB, a research prototype of an AI-powered autonomous data system by the National University of Singapore. The goal is to verify the performance results reported in their recent CIDR 2025 paper with regard to in-database machine learning with NeurDB’s AI engine.

The NeurDB code and the evaluation datasets are available on GitHub. NeurDB extends PostgreSQL and is written in a combination of C and Python. If sufficient time is available, the project would further compare NeurDB’s performance with the MADlib machine learning extension for PostgreSQL, which is a well established Apache top-level project.

This project can also be conducted by two students, in which case both students would share the system deployment, and then one student would concentrate on evaluating NeurDB’s AI engine, while the second student would investigate the effectiveness of NeurDB’s learned query optimiser with regard to OLAP querying.

Requirement to be on campus: No

Supervisors: Dr. Xi Wu, Dr. Rahul Gopinath, Dr. Hong Jin Kang, Dr. Huaming Chen

Eligibility: Strong knowledge of Mathematics, especially Discrete Mathematics; Good at programming

Project Description:

Node mobility, one of the most important features of Mobile Ad Hoc Networks (MANETs), may affect the reliability of communication links in the networks, leading to abnormalities and decreasing the quality of service provided by MANETs. The mCWQ calculus (i.e., CWQ calculus with mobility) was recently proposed to capture the feature of node mobility and increase the communication quality of MANETs.

In this project, we aim to implement a reasoning system in a proof assistant for the mCWQ calculus to prove its correctness. Our specifications and verifications are based on Hoare Logic.
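
For readers new to Hoare Logic, the block below recalls the standard notation: a triple, the assignment axiom and the sequencing rule. This is textbook material for a simple imperative language. It is not taken from the mCWQ calculus, whose own rules are not given here.

```latex
% Hoare triple: if P holds before command C runs and C terminates, Q holds afterwards.
\{P\}\; C \;\{Q\}

% Assignment axiom: Q[e/x] is Q with every free occurrence of x replaced by e.
\frac{}{\{Q[e/x]\}\; x := e \;\{Q\}}

% Sequencing rule: the postcondition of C_1 serves as the precondition of C_2.
\frac{\{P\}\; C_1 \;\{R\} \qquad \{R\}\; C_2 \;\{Q\}}{\{P\}\; C_1 ; C_2 \;\{Q\}}

% Worked instance of the assignment axiom with Q \equiv (x = 2) and e \equiv x + 1:
\{x + 1 = 2\}\; x := x + 1 \;\{x = 2\}
\quad\text{where } x + 1 = 2 \text{ is equivalent to } x = 1.
```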

Requirement to be on campus: No

Supervisors: Dr. Rahul Gopinath, Dr. Hong Jin Kang, Dr. Xi Wu, Dr. Huaming Chen

Eligibility:

Fast learner with strong programming skills.
Ability to work independently.

Project Description:

Large Language Models (LLMs) are rapidly improving as bug fixers. This project seeks to evaluate their effectiveness in identifying and fixing bugs in unseen programs.

Challenges:

Existing programs cannot be used for evaluation as LLMs might have already been trained on them.
A different approach is required for a fair assessment.

Approach:

Create a Simple Virtual Machine (VM):
- Implement a VM with a minimal number of opcodes that are easy to monitor.
Generate Bitcode and Test Cases:
- Use fast-failure-feedback to generate random streams of bitcode.
- Execute the bitcode using symbolic execution to determine its abstract domain.
- Generate multiple test cases based on this domain.
Introduce Mutations:
- Randomly mutate the bitcode, changing its semantics.
- Generate a test case that differentiates the mutated bitcode from the original.
Evaluate LLM Debugging Capabilities:
- Provide the LLM with the original test cases and the differentiating test case.
- Assess if the LLM can identify and repair the mutation.
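
The pipeline above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the project's design: the three-opcode accumulator machine, all function names and the input range are hypothetical, and a brute-force search over a small input range stands in for the symbolic execution step.

```python
import random
from typing import List, Optional, Tuple

Instr = Tuple[str, int]          # (opcode, constant operand)
OPCODES = ("ADD", "SUB", "MUL")  # hypothetical minimal instruction set


def run(program: List[Instr], x: int) -> int:
    """Execute the program on an accumulator initialised with the input x."""
    acc = x
    for op, k in program:
        if op == "ADD":
            acc += k
        elif op == "SUB":
            acc -= k
        elif op == "MUL":
            acc *= k
        else:
            raise ValueError(f"unknown opcode {op!r}")
    return acc


def random_program(rng: random.Random, length: int) -> List[Instr]:
    """Generate a random instruction stream with operands in 1..9."""
    return [(rng.choice(OPCODES), rng.randint(1, 9)) for _ in range(length)]


def mutate(program: List[Instr], rng: random.Random) -> List[Instr]:
    """Return a copy in which one instruction has a different opcode."""
    mutant = list(program)
    i = rng.randrange(len(mutant))
    op, k = mutant[i]
    mutant[i] = (rng.choice([o for o in OPCODES if o != op]), k)
    return mutant


def differentiating_input(original: List[Instr], mutant: List[Instr],
                          candidates=range(-10, 11)) -> Optional[int]:
    """Brute-force search (instead of symbolic execution) for an input
    on which the two programs produce different results."""
    for x in candidates:
        if run(original, x) != run(mutant, x):
            return x
    return None  # no difference found on the inputs tried


rng = random.Random(0)                       # fixed seed for repeatability
program = random_program(rng, 5)
mutant = mutate(program, rng)
original_tests = [(x, run(program, x)) for x in (0, 1, 2)]
witness = differentiating_input(program, mutant)
```

An LLM under evaluation would then be given the mutant, `original_tests`, and the pair `(witness, run(program, witness))`, and asked to locate and repair the changed instruction.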

The results will evaluate LLM capabilities in debugging novel programs. This project may be extended into an Honours thesis, potentially leading to a publication at a top SE venue.

Requirement to be on campus: Yes *dependent on government’s health advice.

Supervisors: Dr. Rahul Gopinath, Dr. Hong Jin Kang, Dr. Xi Wu, Dr. Huaming Chen

Eligibility:

Fast learner with strong programming skills.
Ability to work independently

Project Description:

Software systems often contain vulnerabilities, and one of the best ways to eliminate them is through automatic testing (fuzzing). However, identifying when an execution is interesting, i.e., when it follows a unique path, is challenging. A possible solution is to characterize usual execution paths and detect deviations.

In this project, you will:

Track execution pathways within a program.
Abstract general execution paths as a context-free grammar (CFG).
Use the CFG to distinguish uncommon paths during fuzzing.
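
One way to picture these steps is the sketch below. It is an illustration under simplifying assumptions: functions are instrumented with a hypothetical `traced` decorator, each observed call becomes a flat rule "function -> sequence of callees", and no generalisation (for example, of repetition) is attempted, which a real grammar-inference step would need.

```python
from functools import wraps


class PathRecorder:
    """Records, per run, rules of the form (function, tuple of callees)."""

    def __init__(self):
        self._stack = []    # one list of callee names per active call
        self.rules = set()  # rules exercised by the current run

    def traced(self, fn):
        @wraps(fn)
        def wrapper(*args, **kwargs):
            if self._stack:
                self._stack[-1].append(fn.__name__)  # note call in the caller
            self._stack.append([])
            try:
                return fn(*args, **kwargs)
            finally:
                callees = self._stack.pop()
                self.rules.add((fn.__name__, tuple(callees)))
        return wrapper

    def collect(self, fn, *args):
        """Run fn once and return the set of rules that execution used."""
        self.rules = set()
        fn(*args)
        return self.rules


rec = PathRecorder()


@rec.traced
def parse_sign(c):
    return -1 if c == "-" else 1


@rec.traced
def parse_digit(c):
    return int(c)


@rec.traced
def parse_number(s):
    sign = 1
    if s[0] in "+-":
        sign, s = parse_sign(s[0]), s[1:]
    value = 0
    for c in s:
        value = value * 10 + parse_digit(c)
    return sign * value


# Learn the "usual" rules from ordinary inputs.
grammar = set()
for usual in ("12", "7"):
    grammar |= rec.collect(parse_number, usual)


def is_uncommon(fn, *args):
    """An execution is uncommon if it uses a rule the grammar lacks."""
    return not rec.collect(fn, *args) <= grammar
```

With this grammar, `is_uncommon(parse_number, "-3")` is true because the sign-handling path was never observed, while `is_uncommon(parse_number, "12")` is false. A fuzzer could use such a signal to prioritise inputs that reach new paths.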

Your deliverable will include an implementation and evaluation of this technique.

This project may be extended into an Honours thesis, potentially leading to a publication at a top SE venue.

Requirement to be on campus: Yes *dependent on government’s health advice.

Supervisors: Dr. Rahul Gopinath, Dr. Hong Jin Kang, Dr. Xi Wu, Dr. Huaming Chen

Eligibility:

Fast learner with strong programming skills.
Ability to work independently

Project Description:

There are fundamentally two ways to implement advanced techniques such as concolic execution and dynamic taint tracking:

Control-flow based: This method adds a shadow instruction to every instruction that carries symbolic values or taints. This ensures that meta-values are carried along, and at the end of execution, the abstract values in the shadow variables correspond to any execution that follows the same path as the concrete execution.
Data-flow based: This method adds meta-information to the variables themselves so that they carry abstract values throughout execution.
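
To make the contrast concrete, the sketch below applies both styles to dynamic taint tracking on the same tiny computation. It is a toy illustration: the three-address program format, the `Tainted` wrapper and the `user_input` label are assumptions made for this example, and only addition and multiplication are handled.

```python
import operator

OPS = {"add": operator.add, "mul": operator.mul}


def run_with_shadow(program, env, shadow):
    """Control-flow style: each instruction is paired with a shadow
    instruction that updates the taint of the destination variable."""
    env, shadow = dict(env), dict(shadow)
    for dest, op, a, b in program:
        env[dest] = OPS[op](env[a], env[b])                # concrete instruction
        shadow[dest] = (shadow.get(a, frozenset())         # shadow instruction
                        | shadow.get(b, frozenset()))
    return env, shadow


class Tainted:
    """Data-flow style: the value itself carries its taint labels."""

    def __init__(self, value, labels=()):
        self.value = value
        self.labels = frozenset(labels)

    def _apply(self, other, op):
        if isinstance(other, Tainted):
            return Tainted(op(self.value, other.value),
                           self.labels | other.labels)
        return Tainted(op(self.value, other), self.labels)

    def __add__(self, other):
        return self._apply(other, operator.add)

    def __mul__(self, other):
        return self._apply(other, operator.mul)

    # Both operations commute on integers, so the reflected forms can reuse them.
    __radd__ = __add__
    __rmul__ = __mul__


# Same computation, z = (x + y) * c, with only x tainted.
program = [("t", "add", "x", "y"), ("z", "mul", "t", "c")]
env, shadow = run_with_shadow(
    program,
    {"x": 3, "y": 4, "c": 2},
    {"x": frozenset({"user_input"})},
)
z = (Tainted(3, {"user_input"}) + 4) * 2
```

Both runs compute 14 and mark the result as derived from `user_input`. The first needs an interpreter or instrumentation pass that sees every instruction, whereas the second only needs the values to be wrapped, which is the kind of trade-off the comparison would examine.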

In this project, you will implement both techniques for concolic execution or dynamic taint tracking and compare their effectiveness. This project may be extended into an Honours thesis, potentially leading to a publication at a top SE venue.

Requirement to be on campus: Yes *dependent on government’s health advice.

Supervisors: Prof. Seokhee Hong, Dr. Amyra Meidiana

Eligibility: Skills Required: Data Structures and Algorithms, and programming (Java, C++, Python, JavaScript)

Project Description:

Technological advances have increased data volumes in the last few years, and now we are experiencing a “data deluge” in which data is produced much faster than it can be understood by humans.

These big complex data sets have grown in importance due to factors such as international terrorism, the success of genomics, increasingly complex software systems, and widespread fraud on stock markets.

Visualisation is a powerful tool that computes good geometric representations of abstract data, helping analysts find insights and patterns in big complex data sets.

This project aims to design, implement and evaluate new visualisation algorithms for scalable and faithful visualisation of big complex data, to enable humans to find ground truth structure in big complex data sets, such as social networks and biological networks.

These new visualisation methods are in high demand by industry for the next generation of visual analytics tools.

Requirement to be on campus: Yes *dependent on government’s health advice.

Supervisor: Dr. Sri AravindaKrishnan Thyagarajan

Eligibility: Background in linear algebra, probability, algorithms will be helpful

Project Description:

We are moving from classical systems to post-quantum systems by replacing cryptographic components with ones that are post-quantum secure. With the recent NIST standardisation, we have new post-quantum candidates for digital signatures and encryption that we anticipate will be in use within the next decade.

This project will involve studying and analysing new applications of these post-quantum cryptographic algorithms, as they are still in their early stages of development. Can we build solutions for achieving fairness in distributed applications with these post-quantum tools?

At the end of the project, the student will be trained in understanding post-quantum security and how modern applications can be made fundamentally more secure and fair.

Requirement to be on campus: No

Supervisor: Dr. Sri AravindaKrishnan Thyagarajan

Eligibility: Background in linear algebra, probability, algorithms will be helpful

Project Description:

We are transitioning to decentralized ecosystems where the control is no longer with singular entities. However, with decentralized trust comes rational behaviours where entities can form coalitions and deviate from honest behaviour if it is in their best interest.

In this project we will study attackers who form rational coalitions and try to subvert honest users. We will focus on decentralized systems like blockchains, where maximal extractable value (MEV) is a major problem, with rational deviations earning millions of dollars.

Requirement to be on campus: No

Supervisor: Dr. Sri AravindaKrishnan Thyagarajan

Eligibility: Background in probability, algorithms will be helpful

Project Description:

Multiparty computation (MPC) is a setting where multiple parties, each holding private data, jointly compute some function. At the end of the computation, all of them receive some secret output. This topic has been studied in cryptography for over four decades.
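
As a small illustration of the setting, the sketch below computes a sum with additive secret sharing, one of the simplest MPC building blocks. It assumes parties that follow the protocol (a passive adversary), simulates all parties in one process, and uses an arbitrarily chosen modulus. The function names are illustrative, and the sketch does not address topology hiding.

```python
import secrets

MODULUS = 2**61 - 1  # public modulus, an arbitrary choice for this sketch


def share(secret: int, n: int) -> list:
    """Split secret into n additive shares that sum to it modulo MODULUS.

    Any n - 1 of the shares are uniformly random and reveal nothing alone.
    """
    shares = [secrets.randbelow(MODULUS) for _ in range(n - 1)]
    shares.append((secret - sum(shares)) % MODULUS)
    return shares


def secure_sum(inputs: list) -> int:
    """Simulate n parties computing the sum of their private inputs."""
    n = len(inputs)
    # Party i splits its input and sends share j to party j.
    dealt = [share(x, n) for x in inputs]
    # Party j adds up the shares it received and publishes only that total.
    partial = [sum(dealt[i][j] for i in range(n)) % MODULUS for j in range(n)]
    # The published totals combine to the sum of all inputs.
    return sum(partial) % MODULUS


total = secure_sum([5, 11, 26])
```

Each party learns the total, here 42, but sees only random-looking shares of the other parties' inputs.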

A new model called topology-hiding MPC was studied recently. In this model, in addition to the privacy of honest users’ data, the topology of the network should remain hidden from an attacker.

This project will involve studying the topology-hiding model and strengthening the privacy guarantees. It will also involve developing new real-world applications and identifying their use in modern technologies such as blockchain and distributed computing.

Requirement to be on campus: No

Supervisor: Dr Vera Chung

Eligibility: WAM>85 and Undergraduate candidates must have already completed at least 96 credit points towards their undergraduate degree at the time of application

Skills Required: Advanced Reinforcement Learning and Multi-Agent Systems Theory

Project Description:

Decentralised multi-agent reinforcement learning is garnering attention for its potential to reveal novel communication protocols and coordination mechanisms.

This project is a literature review focused on the theoretical underpinnings and data collection related to emergent signalling dynamics within these systems.

The aim is to systematically gather and analyse existing studies, synthesising data on communication emergence, decentralised coordination strategies, and the interplay of learning dynamics. Additionally, the review will employ advanced bibliometric techniques and rigorous content analysis to map the evolution of theoretical approaches in this domain.

By integrating insights from diverse sources, the study aims to uncover emerging trends and latent challenges. Ultimately, this evaluation will highlight key opportunities for future research and lay a robust theoretical foundation to guide subsequent empirical investigations in decentralised multi-agent reinforcement learning.

Requirement to be on campus: No

Supervisors: Dr Ying Zhou (Project Lead), A/Prof Simon Poon, Dr Yang Lu

Eligibility: Preference will be given to students with a strong interest in applied data science for business process improvement. Students who have completed third-year data science units of study (UoS) will be at an advantage.

Project Description:

This project explores students’ study pathways over their coursework degree candidature. It analyses students’ enrolment data using process mining, in conjunction with other data-driven techniques and statistical analysis, to identify enrolment patterns that can improve students’ learning outcomes, as well as patterns that allow early detection of students at increased risk of dropping out.

By analysing course selection behaviours, enrolment sequences, and historical academic data, we aim to uncover hidden patterns, which could provide several benefits:

Assessing the conformance of student enrolment patterns to the formal degree process models
Identifying bottlenecks and limitations in the offerings to support future resource planning
Using data-driven techniques to discover other efficient and effective process models to support students’ study pathways

Process-oriented machine learning techniques will be used to help analyse students’ decision pathways (as traces), while predictive models will be used to identify risk and enabling factors that could affect learning outcomes. These insights will enable us to implement proactive “interventions”, such as personalized academic support and early advising, to improve retention and student success.
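
To illustrate what treating pathways as traces can look like, the sketch below counts directly-follows relations, a basic building block of many process-discovery methods. The unit codes and traces are invented for illustration and do not come from real enrolment data, and the project may well use other techniques.

```python
from collections import Counter


def directly_follows(traces):
    """Count how often unit b is taken immediately after unit a."""
    counts = Counter()
    for trace in traces:
        for a, b in zip(trace, trace[1:]):
            counts[(a, b)] += 1
    return counts


# Hypothetical enrolment traces: one ordered list of units per student.
traces = [
    ["UNIT1001", "UNIT2001", "UNIT3001"],
    ["UNIT1001", "UNIT3001"],
    ["UNIT1001", "UNIT2001", "UNIT3001"],
]
dfg = directly_follows(traces)

# Transitions seen only once may indicate unusual pathways worth a closer look.
rare = [pair for pair, n in dfg.items() if n == 1]
```

In this toy data, two students follow the three-unit sequence and one skips the middle unit, so `rare` contains only the transition from `UNIT1001` straight to `UNIT3001`.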

Requirement to be on campus: No

Supervisor: A/Prof. Qiang Tang

Eligibility: WAM above 75 and fast learner

Project Description:

Many basic Internet services, such as cloud storage, key management in custody services and backups for secure messaging, still do not have end-to-end security. The situation gets even worse when we consider broader online collaboration services like Git hosting, Google Docs and many more.

In this project, we will explore how to design and build end-to-end encrypted online services, such as those above, that are compatible with existing infrastructure and incur minimal overhead.

Requirement to be on campus: Yes *dependent on government’s health advice.

Supervisor: Dr. Suranga Seneviratne

Eligibility:

Must be an Australian citizen, Australian permanent resident, or New Zealand Special Category Visa holder
Must be familiar with machine learning and deep learning and have taken related units
Must be fluent in Python programming and PyTorch

Project Description:

Behavioural biometrics are crucial for continuous authentication, particularly as zero-trust architectures gain prominence. Systems can verify user identity in real-time by analysing multimodal data streams – such as keystroke dynamics, mouse movements, gait, and touch patterns.

This continuous verification enhances security by detecting anomalies and preventing unauthorised access. However, real-world data is often noisy, incomplete, and subject to environmental variability, posing significant challenges for reliable classification and detection.

To this end, this project offers two potential directions. The first focuses on developing and evaluating JEPA-based models by Meta to learn robust representations from behavioural biometrics data, addressing inconsistencies and improving predictive accuracy.

The second is to explore whether the in-context learning abilities of sensor foundation models can be used for reliable continuous user authentication. This direction is based on the recent work from Google.

Requirement to be on campus: No

Supervisors: Dr. Sasha Rubin, Pavle Subotic

Eligibility: Candidates should have a mathematical logic background. Experience with model checking, theorem proving or similar would be good. Some programming knowledge would be good.

Project Description:

This project explores compositional approaches to the verification of concurrent systems. The idea is that, given an Owicki-Gries encoding, a parametric system can be proven safe by instantiating only N systems.

These are called N-index invariants. However, for SMT solvers to find such invariants, the system must be encoded in a supported theory, e.g., Presburger arithmetic. Using the encodings suggested in https://psubotic.github.io/papers/HCVS.pdf, the goals of the project are to find out:

To what extent can we encode consensus protocols in recursive CHC schemata? We can start with the simplest encoding (e.g., Ben-Or) and expand to more complex encodings.
If we cannot, why not, and what are the limitations of these encodings?
If we can, can we automatically convert a higher-level spec of the protocol to CHC?

Requirement to be on campus: No

Supervisors: Dr. Sasha Rubin, Pavle Subotic

Eligibility: Experience in Golang or very eager to learn Golang. Experience in programming and software development.

Project Description:

The Sonic consensus protocol is programmed in a way that allows vectorization and the use of Single Instruction Multiple Data (SIMD) operations. However, existing popular libraries are not integer-based. Integer-based libraries exist but lack the full set of operations required and have very limited testing, making them difficult to use. We want to develop a Golang integer SIMD library.

Goals

We need a SIMD library in Golang with the following operations:

vector +/- vector
vector +/- constant
vector * constant
init vector with constant (init + fill)
vector >/>= constant -> bool vector
vector </<= constant -> bool vector

The code must be well tested and ready for use in production. It will likely be open sourced.

The candidate will be supervised day to day by Pavle Subotic (https://psubotic.github.io) and Filip Drobnjakovic of Sonic Labs.

Requirement to be on campus: No

Supervisor: Dr. Sasha Rubin

Eligibility: WAM>75 and Undergraduate candidates must have already completed at least 96 credit points towards their undergraduate degree at the time of application.

Project Description:

Decision problems in logic typically focus on determining whether sentences of first-order logic (FO) selected from a given class are logically valid or not. Famous classes known to have decidable problems are the Gödel class (sentences that start with two existential quantifiers followed by any number of universal quantifiers) and the two-variable fragment of FO (sentences that use at most two individual variables).

Gurevich and Shelah provided a novel proof of the finite model property (that every satisfiable formula has a finite model) and decidability for the Gödel class using probabilistic methods, particularly finite random structures.

The goal of this project is to understand the reach and possible limitations of probabilistic methods in establishing the finite model property and decidability for the two-variable fragment of first-order logic.

This project will suit a student who is very familiar with first-order logic, and has a background in Pure Mathematics or Theoretical Computer Science.

Requirement to be on campus: No

Supervisor: Dr. Sasha Rubin

Eligibility: WAM>75 and Undergraduate candidates must have already completed at least 96 credit points towards their undergraduate degree at the time of application.

Project Description:

Planning is part of the symbolic/logic approach to AI, which involves finding a finite-state program that tells an agent what to do in every state. The states and possible actions are described declaratively.

One approach to building a planner is to reduce the given planning problem to the satisfiability problem, and implement this as a reduction to SAT solvers or in declarative programming languages such as ASP.

There are a few possible topics for this project, depending on the interest and skill of the student.

For example:

Build a declarative planner for modern (decision-theoretic) solutions such as non-dominated solutions.
Build a declarative planner that solves “lifted planning” for standard solutions (such as strong or strong-cyclic) by reducing to automated theorem proving.

All of the work will have some mathematical component, and may include a coding component depending on the topic. This project will suit a student who achieved an HD in a course on Theoretical Computer Science.

Requirement to be on campus: No

List of available projects

CS2025/1 Mitigate Hallucination in Large-Vision-Language Model

CS2025/2 The Crucial Role of Curriculum Design in Bridging the Industrial and Academic Gaps in Applied Units

CS2025/3 GDPR Compliance Assistant: A Smart Tool for Australian Businesses to Navigate Data Privacy Regulations

CS2025/4 Handwritten Code Recognition for Pen-and-Paper based Programming Exam

CS2025/5 Notional Machines for Object Oriented Programming

CS2025/6 Distributed and Optimised LLM Inference and Fine-tuning in Large Language Model

CS2025/7 Investigating Fairness and Robustness in Algorithm Design for Distributed Computing

CS2025/8 Online Task Assignment in Edge Computing with Controllable Processing Time

CS2025/9 Benchmark for tamper-evident databases

CS2025/10 Fuzz testing the Ethereum database layer

CS2025/11 Database Performance when Values Get Longer

CS2025/12 Performance of FerretDB

CS2025/13 AI Infrastructure resource estimation and mapping

CS2025/14 AI-Powered Virtual Assistant for Healthcare Providers Using Generative AI Tools

CS2025/15 Cosy chatboxes

CS2025/16 Embodied AI Agents for Asynchronous Collaboration in Mixed Reality (MR/VR)

CS2025/17 Object-Level Blended Reality via 3D Gaussian Shape Editing and Registration

CS2025/18 Creating a Metadata-Driven Database for Detecting Fake Images on Social Media

CS2025/19 Detecting Fake AI-Generated Images

CS2025/20 Detecting Fake Social Media Images

CS2025/21 Preserving Integrity of Trust-Related Data in Crowdsourced IoT Services

CS2025/22 AI-driven Drone Traffic Management

CS2025/23 Multi-modal Spatial Annotations in Mixed Reality

CS2025/24 Privacy Cards: Communicating Data Privacy Risks

CS2025/25 Dental Caries Detection In X-Ray Images using Deep Learning

CS2025/26 Pothole detection from street images

CS2025/27 Assessing LLMs' Capabilities in Reasoning About Software Specifications

CS2025/28 Assessing LLMs' Capabilities in Debugger Use

CS2025/29 Large-Scale Graph/Social Network Analysis

CS2025/30 Evaluating NeurDB, an AI-powered Autonomous Database

CS2025/31 Implementation of the Hoare-style Reasoning System in Proof Assistant

CS2025/32 Evaluating LLMs in Debugging Unseen Programs

CS2025/33 Characterising Execution Pathways for Fuzzing

CS2025/34 Comparing Control-Flow and Data-Flow Based Execution Tracking

CS2025/35 Scalable and Faithful Visual Analytics of Big Complex Data

CS2025/36 Efficient Post Quantum Fair Exchange

CS2025/37 Rational attackers in distributed applications

CS2025/38 Topology hiding multi party computation

CS2025/39 Investigating Emergent Signalling Dynamics in Decentralised Multi-Agent Reinforcement Learning

CS2025/40 Exploration of Students Enrolments using Process Mining

CS2025/41 End to End Encrypted Online Collaborations

CS2025/42 Multimodal Behavioural Biometrics Under Challenging Conditions

CS2025/43 Compositional Verification of Consensus Protocols

CS2025/44 A SIMD Library for Integers

CS2025/45 Probabilistic Methods in Decision Problems

CS2025/46 Decision-Theoretic Planning
