LiteRT-LM
Google's open-source production inference engine for running LLMs on-device — phone, tablet, Raspberry Pi
LiteRT-LM is Google's newly released open-source inference framework for deploying large language models directly on edge devices: Android phones, iPhones, desktops, and IoT hardware such as the Raspberry Pi. It builds on LiteRT, the successor to TensorFlow Lite, adding an LLM-focused runtime with hardware acceleration on GPU and NPU, cross-platform support, multimodal inputs (vision and audio), and built-in tool use and function calling for agentic workflows. Supported model families include Gemma 4, Llama, Phi-4, and Qwen.

Released in April 2026 alongside the Google AI Edge Gallery app, which lets users try models like Gemma 4 E2B on-device with no cloud dependency, LiteRT-LM is Google's clearest statement yet that on-device AI is a production priority rather than a research demo. The framework targets low latency and privacy preservation, two requirements that cloud inference struggles to satisfy in sensitive domains such as healthcare, finance, and industrial IoT.

For indie developers and small teams, LiteRT-LM removes a major barrier to shipping genuinely private AI applications: there is no cloud infrastructure to manage, no per-token fees, and no user data leaving the device for an external API. A Gemma 4 E4B model running locally via LiteRT-LM on a mid-range Android phone is now a viable production setup. This is infrastructure-level news that will quietly power a generation of privacy-first apps.
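To make the deployment story concrete, here is a minimal sketch of on-device generation from a Kotlin Android app using Google's MediaPipe LLM Inference API (`LlmInference`), which runs on the LiteRT-LM stack under the hood. The model path and file name are illustrative placeholders, and builder options have shifted between MediaPipe releases, so treat this as a sketch rather than a drop-in snippet.

```kotlin
import android.content.Context
import com.google.mediapipe.tasks.genai.llminference.LlmInference

// Minimal sketch: load a locally stored model and generate a response.
// The model path below is a hypothetical example, not a shipped artifact.
fun runLocalInference(context: Context): String {
    val options = LlmInference.LlmInferenceOptions.builder()
        .setModelPath("/data/local/tmp/llm/gemma-e4b.task") // hypothetical on-device model file
        .setMaxTokens(512) // cap on combined prompt + response tokens
        .build()

    // Load the model once; in a real app, keep this instance and reuse it.
    val llm = LlmInference.createFromOptions(context, options)

    // Everything from here runs on-device: no network call, no per-token fees.
    return llm.generateResponse(
        "Summarize this note in one sentence: pick up groceries after work."
    )
}
```

The synchronous `generateResponse` call blocks until the full response is ready; for a chat-style UI you would typically use the API's async variant to stream partial results, and load the model at app startup rather than per request.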