Generative AI is being readily adopted by numerous industries to help simplify processes, drive innovation, and improve efficiency. However, these real-world gains come at the cost of intense computing demands, which consume vast amounts of energy and hardware.
Artificial intelligence (AI) puts a massive strain on data centres, leading to huge energy consumption and carbon emissions. Models like GPT-3, with 175 billion parameters, require thousands of GPUs to run; training the model consumed over 1,287 megawatt-hours of electricity and emitted an estimated 552 tonnes of CO2 in the process. These figures grow with each new generation of models, expanding their environmental footprint.
As the demand for AI increases, so too does the demand for a more sustainable approach to utilising AI with minimal impact to the environment.
This is where distributed AI systems come into the picture.
Distributed artificial intelligence (AI) systems take their cue from the way humans divide up complex work. The processing effort is spread across a large number of machines – or nodes – instead of being concentrated on a single one.
For example, Siemens has implemented distributed AI across its production lines, boosting efficiency by 20% and cutting unplanned downtime by 50%.
Professor Albert Zomaya is a leading expert in high-performance computing, and his recent studies focus on the high energy demands of generative AI – and how to reduce them.
“As these distributed AI systems scale up, a new issue called ‘Flux Uncertainty’ has surfaced,” he explains. “This impacts the stability of the system and can affect the energy efficiency of the distributed system.”
These energy inefficiencies, combined with the volatile demand for generative AI, place a serious strain on the environment.
Flux Uncertainty refers to the complex interactions between inconsistencies in data, variability in AI models, and the fluctuating availability of computing resources. While traditional systems tend to experience issues in isolation, distributed AI systems are prone to unpredictable disruptions that ripple across the whole system.
Disruptions like data transmission delays or a shortage of computing resources can snowball, causing inefficiencies and lowering performance. Flux Uncertainty can stem from any unknown factor that degrades the system’s decision-making because of incomplete information.
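As a loose illustration of how small disruptions can snowball, the toy Python sketch below adds a random fluctuation to each node in a chain of dependent nodes and shows how the worst-case end-to-end latency drifts far above the average. Every name and number here is hypothetical, not drawn from the research itself.

```python
# Illustrative sketch only: a toy simulation of how small, random per-node
# delays ("flux") compound across a chain of dependent nodes in a
# distributed AI pipeline. All names and numbers are hypothetical.
import random

random.seed(42)

def simulate_pipeline(num_nodes=8, base_latency_ms=10.0, flux_std_ms=4.0):
    """Return total end-to-end latency when each node's delay feeds the next."""
    total = 0.0
    for _ in range(num_nodes):
        # Each node adds its nominal latency plus an unpredictable fluctuation
        # (network jitter, resource contention, stale data, etc.).
        fluctuation = max(0.0, random.gauss(0.0, flux_std_ms))
        total += base_latency_ms + fluctuation
    return total

runs = [simulate_pipeline() for _ in range(1000)]
print(f"mean latency: {sum(runs)/len(runs):.1f} ms, "
      f"worst case: {max(runs):.1f} ms")
```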
Professor Zomaya believes managing this Flux Uncertainty is essential for maintaining the reliability and effectiveness of distributed AI systems.
Research into uncertainty has so far focused on solving individual problems in isolation. While these efforts show some promise in improving system efficiency and resource allocation, a more comprehensive approach is needed.
“This is where our research comes in,” he says.
“We’re aiming to systematically address Flux Uncertainty by developing robust methods to quantify uncertainty and creating efficient resource management strategies for distributed AI systems.”
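A minimal sketch of what quantifying uncertainty can look like in its simplest form, assuming nothing more than a mean and standard deviation over observed runtimes: the node names and figures below are hypothetical and are not taken from the team’s methods.

```python
# A minimal sketch, not the team's actual method: quantifying runtime
# uncertainty from historical observations and using it to make a
# conservative placement decision. All data and names are hypothetical.
import statistics

# Observed runtimes (seconds) of the same AI task on two candidate nodes.
history = {
    "node_a": [12.1, 11.8, 15.4, 12.0, 19.7],   # fast on average but erratic
    "node_b": [14.2, 14.5, 14.1, 14.8, 14.3],   # slower but stable
}

def upper_bound(samples, k=2.0):
    """Mean + k standard deviations: a simple pessimistic runtime estimate."""
    return statistics.mean(samples) + k * statistics.stdev(samples)

# Pick the node whose pessimistic estimate is lowest, trading raw speed
# for predictability under uncertainty.
choice = min(history, key=lambda node: upper_bound(history[node]))
print(choice, {n: round(upper_bound(s), 1) for n, s in history.items()})
```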
Recent advances in AI scheduling and performance improvement have focused on three key areas:
One study improved scheduling performance by using extra information (or “guidance”) beyond the standard problem inputs, resulting in faster execution times. Another focused on selecting the best algorithm for each specific task, showing a 75% improvement in performance compared with using a single, one-size-fits-all algorithm.
The third area, learning-based methods, is where Professor Zomaya and his team’s research has shown particular promise: improving decision-making by learning from previous experiences and speeding up algorithms without requiring significant modifications.
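As a rough illustration of per-task algorithm selection, the sketch below picks whichever scheduling heuristic has the best recorded history for a given task type and falls back to a default when no history exists. The heuristics, task types, and timings are invented for the example, not taken from the studies described above.

```python
# A hedged sketch of per-task algorithm selection: instead of one
# one-size-fits-all scheduler, pick whichever heuristic has worked best
# on similar tasks before. The heuristics and records here are hypothetical.

# Average observed completion times (seconds), keyed by (task_type, algorithm),
# accumulated from previous runs ("learning from experience").
performance_log = {
    ("batch_inference", "round_robin"): 42.0,
    ("batch_inference", "shortest_job_first"): 28.0,
    ("model_training", "round_robin"): 310.0,
    ("model_training", "priority_queue"): 265.0,
}

def select_algorithm(task_type):
    """Return the scheduling algorithm with the best recorded average time."""
    candidates = {algo: t for (ttype, algo), t in performance_log.items()
                  if ttype == task_type}
    if not candidates:
        return "round_robin"  # fall back to a default when no history exists
    return min(candidates, key=candidates.get)

print(select_algorithm("batch_inference"))   # shortest_job_first
print(select_algorithm("model_training"))    # priority_queue
```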
Professor Zomaya’s research in this field is targeted at enhancing system performance by 35-65%, leading to significant productivity gains and operational cost reductions.
“Our research goals look at understanding the nature of Flux Uncertainty and creating scheduling frameworks that account for it,” he explains.
“We’re developing methods for selecting the best algorithms for specific applications. This involves building theoretical models to predict performance reliably and deploying practical solutions in real-world systems.”
Their research aims to broaden this approach, moving away from single-use solutions towards methods that can be applied across different scenarios. These methods will help bridge the gap between data collection and algorithm performance, allowing for more accurate decision-making even when faced with incomplete or uncertain information.
Doing so helps create practical tools and benchmarks for measuring performance in real-world applications, ensuring that the proposed solutions can be effectively deployed in industries such as healthcare, manufacturing, and energy.
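To give a flavour of decision-making under incomplete information, the hypothetical sketch below compares two candidate schedules by sampling plausible completion times and estimating how often each meets a deadline; the plan names and distributions are assumptions for illustration, not results from the research.

```python
# Illustrative only: comparing two candidate schedules under uncertainty by
# sampling plausible completion times and estimating how often each meets a
# deadline. Plan names and distributions are hypothetical.
import random

random.seed(7)
DEADLINE_S = 130.0

def sample_completion_time(plan):
    """Draw one plausible end-to-end completion time (seconds) for a plan."""
    if plan == "pack_onto_few_nodes":
        # Faster on average, but highly variable under resource contention.
        return random.gauss(100, 40)
    # "spread_across_many_nodes": slower on average, but far more predictable.
    return random.gauss(110, 10)

def deadline_hit_rate(plan, samples=10_000):
    hits = sum(sample_completion_time(plan) <= DEADLINE_S for _ in range(samples))
    return hits / samples

for plan in ("pack_onto_few_nodes", "spread_across_many_nodes"):
    print(f"{plan}: meets deadline {deadline_hit_rate(plan):.0%} of the time")
```

Here the plan with the better average turns out to be the riskier choice once its variability is taken into account, which is the kind of trade-off an uncertainty-aware scheduler has to weigh.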
“We address uncertainty metrics, scheduling algorithms, and real-world performance through our research,” says Professor Zomaya.
“We’re looking to build efficient, trustworthy, and resilient distributed AI systems that can handle the complexities of modern computing environments.”
For sectors like energy, health, and manufacturing, and for other cloud- and edge-based services that rely heavily on distributed systems, these improvements could be achieved without significant infrastructure changes, making uncertainty management an attractive upgrade.