Where Should the AI Think? Dynamic Placement of Large Language Model Services in Edge Networks

Background

Large Language Models (LLMs) are rapidly becoming the foundation for intelligent assistants, autonomous systems and interactive applications. However, running advanced AI models requires significant computational resources and often introduces latency that can negatively impact user experience.

Future applications such as real-time translation, intelligent transport systems, augmented reality assistants and emergency response copilots will require AI inference to occur closer to users than traditional cloud architectures allow.

Research Challenge

LLMs are increasingly being integrated into real-time applications, including intelligent assistants, autonomous vehicles, augmented reality systems, digital healthcare services and emergency response platforms. Because these models require substantial computational resources, delivering low-latency AI services to users at scale is difficult.

A key challenge is determining where AI inference should execute across cloud and edge infrastructure. While cloud-based LLMs provide access to powerful computational resources, they can introduce latency, increase network traffic and create privacy concerns. Edge-based deployment can improve responsiveness and reduce data movement but is constrained by limited computing and energy resources.

This project investigates how LLM-based services can be dynamically distributed across cloud-edge environments to optimise performance, cost and resource utilisation. The research will address intelligent inference placement, workload migration, resource-aware model deployment and user mobility, and will examine how to balance the trade-offs between latency, accuracy and operational cost.
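To make the placement question concrete, the following is a minimal illustrative sketch, not the project's method. It assumes each candidate site (edge node or cloud region) can be summarised by a network round-trip time, a per-token generation time, a per-token cost and its free memory, and it picks the feasible site with the lowest weighted sum of estimated latency and cost. All names (`Site`, `Request`, `place`), fields and weights are hypothetical, and accuracy is left out for brevity.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Site:
    """Hypothetical summary of one candidate execution site."""
    name: str
    network_rtt_ms: float      # round-trip time between the user and this site
    ms_per_token: float        # generation time per output token at this site
    cost_per_1k_tokens: float  # operational cost, in arbitrary units
    free_memory_gb: float      # memory currently available for model weights


@dataclass(frozen=True)
class Request:
    """Hypothetical description of one inference request."""
    output_tokens: int
    model_memory_gb: float
    latency_weight: float = 1.0  # assumed weight per millisecond of latency
    cost_weight: float = 1.0     # assumed weight per unit of cost


def estimated_latency_ms(site: Site, req: Request) -> float:
    # Simplified model: one network round trip plus linear generation time.
    return site.network_rtt_ms + site.ms_per_token * req.output_tokens


def estimated_cost(site: Site, req: Request) -> float:
    return site.cost_per_1k_tokens * req.output_tokens / 1000


def place(req: Request, sites: Sequence[Site]) -> Optional[Site]:
    """Return the feasible site with the lowest weighted score, or None."""
    # Resource constraint: the model must fit in the site's free memory.
    feasible = [s for s in sites if s.free_memory_gb >= req.model_memory_gb]
    if not feasible:
        return None

    def score(site: Site) -> float:
        return (req.latency_weight * estimated_latency_ms(site, req)
                + req.cost_weight * estimated_cost(site, req))

    return min(feasible, key=score)
```

Under these assumptions, a short request tends to favour a nearby edge node because the network round trip dominates, while a long generation or a model too large for edge memory shifts the decision to the cloud. A dynamic strategy would re-evaluate such a decision as load, user location and resource availability change.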

Example application domains include real-time language translation, AI copilots for emergency responders, intelligent transport systems, augmented reality assistants, smart-city digital services and next-generation autonomous systems. The work will involve designing and evaluating novel deployment strategies for large-scale generative AI services operating across distributed computing infrastructures.

Topics

Large Language Models; Edge AI; Generative AI systems; Hybrid cloud-edge architectures; AI inference optimisation; Resource-aware deployment

Impact

As generative AI becomes embedded within everyday life, society will increasingly depend on infrastructure capable of delivering intelligent services in real time. This project addresses a central challenge in deploying next-generation AI applications: deciding where inference should run so that services remain responsive, cost-effective and resource-efficient.