Vendor Profile

Speech Graphics: Revolutionizing Facial Animation with Audio-Driven AI

Headquartered in Edinburgh, Speech Graphics is powering immersive characters in gaming, film, and the metaverse.

Speech Graphics, an Edinburgh-based company, is transforming digital character animation through AI that generates realistic facial movements directly from audio input.

By automating lip synchronization and emotional expressions, the technology eliminates the need for costly and time-consuming motion capture, enabling more immersive experiences in gaming, film, and the metaverse.

Founded in 2010 as a spin-out from the University of Edinburgh’s School of Informatics, Speech Graphics draws on over 20 years of research in speech technology, linguistics, AI, and procedural animation.

The company maintains its headquarters in Edinburgh, with additional offices in Budapest, Singapore, and San Francisco. It has raised approximately $9.62 million in funding and employed 42 people as of 2022. Recognized for innovation, Speech Graphics has received the John Logie Baird Award and the TIGA Award for Best Animation Supplier.

Core Technology and Products

At the heart of Speech Graphics’ offerings is its audio-driven approach. Instead of relying on video-based facial capture or manual keyframing, the system analyzes speech audio to produce synchronized lip movements, facial expressions, and emotional nuances. This method ensures natural, believable performances across multiple languages and speaking styles.
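
To make the idea concrete, here is a minimal toy sketch of audio-driven lip sync: classify short audio frames into phonemes, map phonemes to visemes (mouth shapes), and smooth the resulting blendshape weights over time. The phoneme classifier is stubbed out with a random choice, and the table is illustrative; Speech Graphics' actual acoustic and muscle-dynamics models are proprietary and far more sophisticated.

```python
import random

# Illustrative phoneme-to-viseme table: many phonemes share one mouth shape.
PHONEME_TO_VISEME = {
    "p": "lips_closed", "b": "lips_closed", "m": "lips_closed",
    "f": "lip_to_teeth", "v": "lip_to_teeth",
    "aa": "jaw_open", "ae": "jaw_open",
    "uw": "lips_rounded", "ow": "lips_rounded",
    "sil": "rest",
}

def classify_frame(audio_frame: bytes) -> str:
    """Stub for an acoustic model that labels a ~10 ms frame with a phoneme.
    A real system would run a trained neural network here."""
    return random.choice(list(PHONEME_TO_VISEME))

def animate(frames: list[bytes], smoothing: float = 0.6) -> list[dict]:
    """Convert audio frames to per-frame viseme weights with temporal smoothing."""
    weights = {v: 0.0 for v in set(PHONEME_TO_VISEME.values())}
    timeline = []
    for frame in frames:
        target = PHONEME_TO_VISEME[classify_frame(frame)]
        for viseme in weights:
            goal = 1.0 if viseme == target else 0.0
            # Exponential smoothing avoids visible popping between mouth shapes.
            weights[viseme] += (1 - smoothing) * (goal - weights[viseme])
        timeline.append(dict(weights))
    return timeline

if __name__ == "__main__":
    fake_audio = [b"\x00" * 160 for _ in range(20)]  # 20 dummy audio frames
    for i, pose in enumerate(animate(fake_audio)[:3]):
        print(i, {k: round(v, 2) for k, v in pose.items()})
```

The smoothing step is the key difference from naive per-frame classification: without it, mouth shapes would snap between poses every few milliseconds instead of blending naturally.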

The company’s flagship products include:

  • SGX: Designed for offline, high-fidelity animation. It delivers production-quality results ideal for cinematics, cutscenes, and film/TV projects where visual precision is paramount.
  • SG Com: Optimized for real-time applications, achieving low latency of around 50 milliseconds. Running efficiently on CPU, it supports consoles, mobile devices, and custom game engines without heavy GPU demands.
  • Rapport Platform: Extends the technology to interactive, AI-driven characters. This enables dynamic conversations and responses in enterprise settings such as customer service, healthcare training, education, and metaverse environments.

These tools integrate seamlessly into major engines like Unreal and Unity, making adoption straightforward for developers.
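
As a sketch of what a real-time integration loop might look like, the code below streams small audio chunks through a hypothetical `LipSyncEngine` and applies the returned blendshape weights each game tick. The class, its methods, and the placeholder output are assumptions for illustration only, not the actual SG Com API; only the ~50 ms latency budget comes from the description above.

```python
import time
from collections import deque

class LipSyncEngine:
    """Hypothetical stand-in for a real-time audio-to-animation engine.
    Buffers roughly 50 ms of audio before emitting blendshape weights."""

    def __init__(self, sample_rate: int = 16_000, latency_ms: int = 50):
        self.samples_needed = sample_rate * latency_ms // 1000
        self.buffer = deque()

    def push_audio(self, samples: list[float]) -> None:
        self.buffer.extend(samples)

    def poll_weights(self) -> dict | None:
        """Return blendshape weights once enough audio has accumulated."""
        if len(self.buffer) < self.samples_needed:
            return None
        for _ in range(self.samples_needed):
            self.buffer.popleft()
        return {"jaw_open": 0.4, "lips_rounded": 0.1}  # placeholder output

def game_loop(engine: LipSyncEngine, ticks: int = 5) -> None:
    for tick in range(ticks):
        engine.push_audio([0.0] * 320)  # 20 ms of audio per tick at 16 kHz
        weights = engine.poll_weights()
        if weights:
            print(f"tick {tick}: apply {weights} to character rig")
        time.sleep(0.02)  # simulate a ~50 fps update loop

game_loop(LipSyncEngine())
```

The design point this illustrates is why CPU efficiency matters: the engine must keep pace with the game's own update loop, so animation inference has to fit inside a per-tick budget without competing with rendering for the GPU.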

Industry Impact and Notable Clients

Speech Graphics' technology has powered facial animation in numerous high-profile titles, including The Last of Us Part II, Hogwarts Legacy, and Shadow of the Tomb Raider. In Hogwarts Legacy, the system handled dialogue across eight languages, demonstrating its scalability and linguistic versatility. Major clients include Warner Bros., Microsoft, and Square Enix.

Beyond gaming, the technology reduces costs in film and music video production compared to traditional motion capture. In the emerging metaverse and interactive virtual assistant (IVA) sectors, Rapport facilitates lifelike avatars that respond naturally to voice input.

The broader facial animation software market serves entertainment as well as enterprise applications like healthcare communication aids and training simulations. Collaborations, such as those with UC San Francisco and Berkeley on brain-computer interfaces for speech and facial synthesis, highlight potential medical uses for restoring communication abilities.

AI’s Broader Role in Game Development in 2026

Speech Graphics operates within a rapidly evolving AI landscape in game development. Traditional pipelines involve labor-intensive modeling, texturing, scripting, testing, and iteration. In 2026, AI acts as a collaborative partner, speeding up prototyping, generating assets, and enabling more responsive, personalized gameplay.

Procedural content generation (PCG) has long created varied environments—think No Man’s Sky’s vast planets or Minecraft’s worlds—using algorithms and randomness. Generative AI advances this further by producing context-aware assets, including 3D models, textures, animations, dialogue, and adaptive level designs based on natural language prompts. Tools like Rosebud AI, Ludo.ai, and Promethean AI lower barriers for indie developers and small teams.
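
The difference is easy to see in miniature. Classic PCG composes randomness with simple rules, as in the toy height-map generator below (an illustrative sketch, not any shipping game's algorithm); generative AI instead learns such rules from data and can condition them on a natural language prompt.

```python
import random

def smooth_heightmap(size: int = 16, passes: int = 3, seed: int = 42) -> list[list[float]]:
    """Classic PCG: random noise plus a smoothing rule yields rolling terrain."""
    rng = random.Random(seed)
    grid = [[rng.random() for _ in range(size)] for _ in range(size)]
    for _ in range(passes):  # each pass averages neighbours, like a blur
        grid = [
            [
                sum(
                    grid[(y + dy) % size][(x + dx) % size]
                    for dy in (-1, 0, 1) for dx in (-1, 0, 1)
                ) / 9.0
                for x in range(size)
            ]
            for y in range(size)
        ]
    return grid

# Render as ASCII: '#' for hills, '.' for valleys.
for row in smooth_heightmap():
    print("".join("#" if h > 0.5 else "." for h in row))
```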

Key AI applications in the 2026 pipeline include:

  • Asset Creation and Visuals: Generating concept art, upscaling textures, modeling, and animating characters or environments.
  • Smarter NPCs: Dynamic characters with memory, goals, and emotional reactions, leading to emergent storytelling and “agentic” behaviors.
  • Testing and Optimization: Automated QA, bug detection, path simulation, and difficulty balancing (a toy path-simulation sketch follows this list).
  • Code and Design Assistance: Large language models aid scripting, architecture decisions, and rapid iteration.
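
To give the testing bullet some substance, here is a toy path-simulation bot: it random-walks a tile map and reports walkable tiles it never reaches, a crude stand-in for the far more capable automated QA agents studios actually deploy. The level layout and agent are invented for illustration.

```python
import random

LEVEL = [
    "#######",
    "#..#..#",
    "#..#..#",  # the right-hand room is walled off: a reachability bug
    "#######",
]

def reachable_tiles(level: list[str], start: tuple[int, int], steps: int = 10_000) -> set:
    """Random-walk agent: visits whatever tiles luck and walls allow."""
    rng = random.Random(0)
    y, x = start
    seen = {start}
    for _ in range(steps):
        dy, dx = rng.choice([(-1, 0), (1, 0), (0, -1), (0, 1)])
        if level[y + dy][x + dx] != "#":
            y, x = y + dy, x + dx
            seen.add((y, x))
    return seen

walkable = {(y, x) for y, row in enumerate(LEVEL) for x, c in enumerate(row) if c == "."}
visited = reachable_tiles(LEVEL, start=(1, 1))
for tile in sorted(walkable - visited):
    print(f"unreachable tile at {tile}: possible level-design bug")
```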

By the end of 2025, over 4,000 Steam games had incorporated AI elements, with expectations that roughly one in three 2026 releases would involve AI in some capacity. The global games market, valued at $217 billion in 2022, is projected to reach $583 billion by 2030, growing at a 13.2% CAGR—fueling demand for efficient animation tools like those from Speech Graphics.

NVIDIA DLSS 5: A Landmark in Real-Time Graphics

A standout AI advancement in 2026 is NVIDIA’s DLSS 5, unveiled at GTC in March 2026. Building on earlier versions focused on upscaling and frame generation, DLSS 5 introduces real-time neural rendering that dramatically enhances visual fidelity.

DLSS 5 goes further by infusing photorealistic effects—subsurface scattering, fabric details, hair interactions, advanced lighting, shadows, and rim lighting—directly into the rendering pipeline. It uses semantic understanding of scene elements (characters, materials, environments) for temporally stable results that respect artistic intent. Developers retain controls for intensity, color, and masking.
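
A conceptual sketch of that control scheme (emphatically not NVIDIA's implementation, which runs on GPU tensors inside the render pipeline): a learned enhancement is blended into the rendered frame per pixel, scaled by an artist-set intensity dial and a semantic mask, so that, say, skin receives subsurface-style softening while the environment is left untouched.

```python
# Conceptual per-pixel blend; values and masks are invented for illustration.
Frame = list[list[float]]  # grayscale image as nested lists, values in [0, 1]

def apply_enhancement(frame: Frame, enhanced: Frame, mask: Frame, intensity: float) -> Frame:
    """Blend a neural 'enhanced' frame into the raster output.

    mask: 1.0 where the semantic class (e.g. skin) applies, 0.0 elsewhere.
    intensity: artist-facing dial from 0 (off) to 1 (full effect).
    """
    return [
        [
            base + intensity * m * (enh - base)
            for base, enh, m in zip(base_row, enh_row, mask_row)
        ]
        for base_row, enh_row, mask_row in zip(frame, enhanced, mask)
    ]

raster = [[0.2, 0.2], [0.2, 0.2]]     # plain rendered frame
neural = [[0.8, 0.8], [0.8, 0.8]]     # hypothetical enhanced frame
skin_mask = [[1.0, 0.0], [1.0, 0.0]]  # left column is "skin"
print(apply_enhancement(raster, neural, skin_mask, intensity=0.5))
# -> approximately [[0.5, 0.2], [0.5, 0.2]]: only masked pixels move
```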

Demos in titles like Resident Evil Requiem, Hogwarts Legacy, Starfield, and Assassin's Creed Shadows showcase deeper lighting, material realism, and fine details. Optimized for the RTX 50-series (Blackwell architecture), with strong performance on the RTX 5090, DLSS 5 is expected to become a default feature in many engines via NVIDIA Streamline. It represents a potential "GPT moment for graphics," bridging real-time and cinematic quality while maintaining performance on consumer hardware.

Opportunities and Challenges Ahead

AI-driven tools like Speech Graphics and DLSS 5 promise significant benefits. They democratize high-quality production, allowing indie creators to compete on scope and freeing larger studios to focus on creative vision. Hyper-personalized gameplay, adaptive narratives, persistent worlds, and enhanced user-generated content become feasible. Neural rendering and world models could enable infinite, responsive experiences.

However, challenges remain. Critics warn of “AI slop”—generic, inconsistent, or artifact-ridden outputs that dilute artistic quality or produce uncanny results. Maintaining coherence, style, and emotional depth requires strong human oversight; AI functions best as an augmentative “centaur” partnership rather than a full replacement.

Ethical and practical concerns include potential job displacement in entry-level art and testing roles, issues around training data copyright, privacy in personalized experiences, and the risk of over-reliance leading to homogenized content. Technical hurdles involve ensuring frame consistency, managing computational costs, integrating with existing engines, and aligning outputs with specific aesthetics. Consumer reactions are mixed: excitement about innovation alongside worries about authenticity and rushed productions.

Successful implementations emphasize quality control and human guidance to preserve substance alongside speed and scale.

A Collaborative Future for Digital Entertainment

In 2026, AI stands at an inflection point in game development and digital media. Companies like Speech Graphics exemplify how specialized audio-driven AI can solve persistent bottlenecks in character animation, while broader tools like DLSS 5 push real-time visuals toward unprecedented realism.

The future lies in balanced collaboration: AI accelerating production and expanding creative possibilities without supplanting human artistry. This approach can amplify storytelling, deepen player empathy, and open doors for diverse voices and surprising interactive experiences.

As the metaverse grows—projected to reach $1.3 trillion by 2030—and interactive virtual assistants expand, technologies that make digital characters feel truly alive will play a central role. Speech Graphics, rooted in Scottish innovation, is well-positioned to contribute to this more responsive, personal, and immersive era of entertainment and beyond.

AI and the Rise of Lifelike Metaverse Avatars

AI is transforming metaverse avatars from static digital representations into dynamic, lifelike entities capable of natural conversation, emotional expression, and real-time interaction. As virtual worlds evolve in 2026, AI bridges the gap between user intent and immersive presence, making avatars feel more human and enabling deeper social, professional, and entertainment experiences.

Evolution of Avatars in the Metaverse

The concept of avatars in persistent virtual spaces traces back to Neal Stephenson’s 1992 novel Snow Crash, which envisioned diverse, expressive digital selves interacting in a shared “metaverse.” Early platforms like Second Life introduced customizable 3D characters, but they often felt cartoonish or robotic.

In 2026, AI shifts avatars toward hyper-realism and autonomy. User-created avatars now replicate real-time facial expressions, body language, and emotional responses via VR headset cameras or audio input. AI also powers intelligent NPCs (non-player characters) that adapt to conversations, creating emergent storytelling in games and social spaces.

Key enablers include:

  • Generative AI for quick avatar creation from photos, videos, or text prompts.
  • Real-time animation systems that sync lip movements, gestures, and micro-expressions to speech or user input.
  • Multimodal AI combining natural language processing (NLP), computer vision, and emotional intelligence for context-aware behavior.

Core AI Technologies Powering Metaverse Avatars

Several technologies make avatars responsive and believable:

  1. Audio-Driven Facial Animation: Systems analyze speech audio to generate synchronized lip movements and emotional expressions without manual keyframing or heavy motion capture. This runs efficiently in real time (low latency, ~50-100ms) on standard hardware like CPUs, suiting VR/AR headsets and mobile devices.
  2. Generative Models and Neural Rendering: Diffusion models, transformers, and neural radiance fields (NeRFs) create or enhance avatars. Tools generate full-body movements, dynamic backgrounds, or photorealistic details like skin subsurface scattering and hair. NVIDIA’s Audio2Face and RTX technologies exemplify this, animating 2D/3D avatars from audio while delivering real-time path-traced realism.
  3. Emotional and Conversational AI: Avatars detect tone, adapt responses, and display nuanced emotions (frustration, empathy, hesitation). Large language models (LLMs) combined with vision enable “listening” behaviors—head nods, gaze shifts, and reactive expressions—making interactions feel reciprocal (a toy sketch of this mapping follows the list).
  4. Avatar Creation and Personalization: AI tools convert selfies or scans into 3D models with customizable features (hairstyles, clothing, expressions). Platforms support interoperability, allowing avatars to move across worlds while retaining consistent identity and style.
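
As promised in item 3, here is a toy version of mapping conversational sentiment onto expression weights and a listening gesture. The keyword-based scorer and the three-channel expression set are placeholder assumptions; production systems use LLMs, vision models, and much richer facial rigs.

```python
NEGATIVE = {"angry", "broken", "terrible", "refund"}
POSITIVE = {"great", "thanks", "love", "perfect"}

def score_sentiment(utterance: str) -> float:
    """Keyword stand-in for a real sentiment model: -1 (upset) to +1 (happy)."""
    words = set(utterance.lower().split())
    return (len(words & POSITIVE) - len(words & NEGATIVE)) / max(len(words), 1)

def react(utterance: str) -> dict:
    """Map detected sentiment to expression weights and a listening gesture."""
    s = score_sentiment(utterance)
    return {
        "brow_furrow": max(0.0, -s) * 0.8,   # concern when the user is upset
        "smile": max(0.0, s) * 0.8,          # warmth when the user is happy
        "head_nod": 0.3 if s >= 0 else 0.1,  # slower nodding for complaints
    }

for line in ["This product is broken and I am angry",
             "Thanks that worked I love it"]:
    print(line, "->", react(line))
```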

Spotlight: Speech Graphics and the Rapport Platform

Edinburgh-based Speech Graphics leads in audio-driven animation, with technology proven in games like The Last of Us Part II, Hogwarts Legacy, and Shadow of the Tomb Raider. Their Rapport platform extends this to interactive AI avatars for the metaverse and enterprise.

Rapport enables users to build, animate, and deploy emotionally intelligent virtual characters quickly. Key features:

  • Real-time 3D rendering with precise lip-sync and full emotional range.
  • Integration for training, customer service, education, and marketing.
  • Recent partnerships (e.g., with Neuphonic for ultra-low-latency on-device voice) deliver sub-100ms photorealistic digital humans on standard CPU hardware (a rough latency budget sketch follows).
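
To see what "sub-100ms" implies in practice, here is a back-of-envelope latency budget for a conversational avatar. The per-stage figures are illustrative assumptions, not published numbers; only the overall 100 ms target comes from the text above.

```python
# Hypothetical end-to-end budget for a sub-100 ms conversational avatar.
# All per-stage timings are illustrative assumptions, not published figures.
budget_ms = {
    "speech synthesis (on-device)": 30,
    "audio-to-animation": 40,
    "render and display": 25,
}
total = sum(budget_ms.values())
for stage, ms in budget_ms.items():
    print(f"{stage:32s} {ms:3d} ms")
print(f"{'total':32s} {total:3d} ms "
      f"({'within' if total < 100 else 'over'} the 100 ms target)")
```

The point of a budget like this is that every stage competes for the same window: shaving latency in voice generation buys headroom for animation and rendering, which is why on-device voice partnerships matter for the overall experience.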

Rapport avatars react dynamically in conversations, making them suitable for role-play simulations, language learning, medical training, and metaverse social spaces. This moves beyond simple nodding avatars to ones that convey defensiveness, surprise, or empathy.

Broader Ecosystem and Applications in 2026

  • Gaming and Entertainment: AI avatars create smarter NPCs and personalized player experiences. Hyper-realistic characters enhance immersion in persistent worlds.
  • Enterprise and Training: Companies use interactive avatars for simulations, customer support, and virtual meetings. Emotional intelligence builds trust and engagement.
  • Education and Healthcare: Adaptive tutors respond to student emotions; medical avatars simulate patient interactions.
  • Social and Brand Experiences: Virtual influencers and brand ambassadors host events or personalize chats in metaverse environments.
  • Hybrid Realities: Integration with AR/VR and wearables blends physical and digital presence.

NVIDIA’s ACE suite and tools like Synthesia, HeyGen, or Ready Player Me complement these efforts, offering scalable creation pipelines for metaverse-ready assets.

Opportunities

AI democratizes high-quality avatar production, lowering barriers for indie creators and small teams while allowing studios to focus on creativity. Benefits include:

  • Hyper-personalization: Avatars that evolve with users or adapt to contexts.
  • Scalability: Real-time interactions at massive scale without proportional human effort.
  • Accessibility: Tools for users with disabilities (e.g., brain-computer interfaces or voice-driven controls).
  • New Economies: Interoperable avatars, virtual influencers, and user-generated content drive metaverse commerce and social platforms.

The global AI avatar market is expanding rapidly as metaverse and mixed-reality adoption increases.

Looking Ahead

In 2026, AI makes metaverse avatars central to more natural, empathetic digital interactions. Technologies like Speech Graphics’ Rapport, combined with neural rendering and multimodal models, push toward avatars that not only look real but feel responsive and alive.

The future emphasizes balanced human-AI collaboration: AI accelerates creation and responsiveness, while human creativity ensures emotional depth, cultural nuance, and artistic intent. As spatial computing, 5G/edge networks, and interoperable standards mature, avatars could become persistent digital extensions of ourselves—companions, collaborators, and representations across evolving virtual landscapes.

This convergence promises richer storytelling, inclusive training, personalized commerce, and social connections, but success depends on addressing technical, ethical, and creative challenges thoughtfully. Companies rooted in proven animation tech, like Speech Graphics, are well-placed to lead this shift toward more immersive and human-centered virtual worlds.
