Google Launches New LiteRT Accelerator to Speed Up AI Workloads on Snapdragon Android Devices

2025-12-01

Google has introduced a new NPU accelerator for LiteRT built on Qualcomm AI Engine Direct (QNN), designed to enhance AI performance on Qualcomm-powered Android devices featuring Snapdragon 8 SoCs. The accelerator delivers dramatic speedups: up to 100x faster than CPU execution and up to 10x faster than GPU execution.

Although modern Android devices commonly include GPU hardware, Google software engineers Lu Wang, Weiyi Wang, and Andrew Wang note that relying solely on GPUs for AI workloads can create performance bottlenecks. For instance, they explain that running a compute-intensive text-to-image generation model alongside real-time, ML-based camera segmentation can overwhelm even high-end mobile GPUs, causing stuttering and frame drops that degrade the user experience.

Fortunately, many contemporary mobile devices now integrate Neural Processing Units (NPUs)—dedicated AI accelerators that significantly outperform GPUs on AI tasks while consuming far less power.

Developed in close collaboration with Qualcomm, QNN replaces the previous TFLite QNN delegate. It offers a unified and streamlined workflow by integrating a broad set of SoC compilers and runtimes through a simplified API for developers. Supporting 90 LiteRT operations, QNN aims to enable full-model delegation, a critical factor for achieving peak performance. The solution also includes specialized kernels and optimizations that further accelerate large language models like Gemma and FastVLM.
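
In application code, selecting the NPU is expected to look roughly like the following Kotlin sketch. It assumes LiteRT's CompiledModel API with an Accelerator.NPU option and a bundled model file named model.tflite; the exact class names, options, and fallback behavior should be confirmed against the current LiteRT documentation and the NPU acceleration guide.

```kotlin
import com.google.ai.edge.litert.Accelerator
import com.google.ai.edge.litert.CompiledModel

// Sketch: compile a bundled .tflite model for the NPU, falling back to the GPU
// if NPU delegation is unavailable on the current device. Class and method
// names follow the LiteRT CompiledModel interface but may differ from the
// shipped SDK, so treat this as illustrative rather than definitive.
fun runOnNpu(context: android.content.Context, input: FloatArray): FloatArray {
    val model = try {
        CompiledModel.create(
            context.assets,
            "model.tflite",                        // hypothetical asset name
            CompiledModel.Options(Accelerator.NPU) // request QNN-backed NPU execution
        )
    } catch (e: Exception) {
        // Device or driver without NPU support: fall back to the GPU.
        CompiledModel.create(
            context.assets,
            "model.tflite",
            CompiledModel.Options(Accelerator.GPU)
        )
    }

    val inputBuffers = model.createInputBuffers()
    val outputBuffers = model.createOutputBuffers()
    inputBuffers[0].writeFloat(input)          // copy input tensor data
    model.run(inputBuffers, outputBuffers)     // full-model delegation when all ops are supported
    return outputBuffers[0].readFloat()        // read output tensor data
}
```

Requesting the NPU up front while keeping a GPU fallback keeps an app usable on devices where QNN delegation is not yet available.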

Google benchmarked QNN across 72 machine learning models, with 64 successfully achieving full NPU delegation. Results showed performance gains of up to 100x over CPU and up to 10x over GPU execution.

On Qualcomm’s latest flagship SoC, the Snapdragon 8 Elite Gen 5, the improvements are especially striking: more than 56 models ran in under 5 milliseconds on the NPU, compared to only 13 models achieving that speed on the CPU. This breakthrough unlocks real-time AI experiences previously unattainable on mobile devices.

Google engineers also built a proof-of-concept application leveraging an optimized version of Apple's FastVLM-0.5B vision encoder model. The app can interpret live camera scenes nearly instantaneously. On the Snapdragon 8 Elite Gen 5 NPU, it achieves a time-to-first-token (TTFT) of just 0.12 seconds on 1024×1024 images, with prefill speeds exceeding 11,000 tokens per second and decoding throughput surpassing 100 tokens per second. The model was optimized using int8 weight quantization and int16 activation quantization, which Google engineers describe as the key to tapping into the NPU's fastest int16 kernels.
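
For context on those metrics, time-to-first-token covers everything from submitting the image and prompt until the first output token appears, while prefill and decode throughput count tokens processed per second in each phase. A rough measurement harness, shown below with a hypothetical VlmRunner interface standing in for the actual model runner, illustrates how such numbers are typically collected:

```kotlin
// Hypothetical interface standing in for a vision-language model runner;
// the real FastVLM demo uses LiteRT internals that are not reproduced here.
interface VlmRunner {
    fun prefill(promptTokens: IntArray)   // process encoded image + prompt tokens
    fun decodeNext(): Int                 // produce one output token
}

data class Metrics(val ttftSeconds: Double, val prefillTps: Double, val decodeTps: Double)

fun measure(runner: VlmRunner, promptTokens: IntArray, maxNewTokens: Int): Metrics {
    val start = System.nanoTime()
    runner.prefill(promptTokens)          // prefill phase
    runner.decodeNext()                   // first generated token
    val ttftSeconds = (System.nanoTime() - start) / 1e9

    val decodeStart = System.nanoTime()
    var generated = 1
    while (generated < maxNewTokens) {    // decode phase: one token per step
        runner.decodeNext()
        generated++
    }
    val decodeSeconds = (System.nanoTime() - decodeStart) / 1e9

    return Metrics(
        ttftSeconds = ttftSeconds,
        prefillTps = promptTokens.size / ttftSeconds,   // approximation: treats TTFT as prefill time
        decodeTps = (generated - 1) / decodeSeconds
    )
}
```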

QNN currently supports a limited subset of Android hardware, primarily devices powered by Snapdragon 8 and Snapdragon 8+ SoCs. Developers interested in getting started can refer to the NPU acceleration guide and download LiteRT from GitHub.
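
As a starting point, adding LiteRT to an Android project usually begins with a Gradle dependency along the lines of the snippet below; the coordinates and version are indicative only, and the NPU acceleration guide remains the source of truth for the artifacts that enable QNN.

```kotlin
// app/build.gradle.kts - indicative coordinates only; consult the NPU
// acceleration guide for the artifacts and versions that enable QNN.
dependencies {
    // Core LiteRT runtime (formerly TensorFlow Lite)
    implementation("com.google.ai.edge.litert:litert:1.0.1")
    // NPU/QNN support may require additional device- or vendor-specific
    // dependencies distributed via the LiteRT GitHub releases.
}
```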