WebLLM: High-Performance In-Browser LLM Inference Engine

What is WebLLM?

WebLLM is an inference engine for deploying and running large language models (LLMs) entirely on the client side, directly within a web browser. Leveraging WebGPU for hardware acceleration, it removes the server dependency and high computational costs typically associated with generative AI models. This approach lets developers integrate powerful AI capabilities into web applications, with benefits such as reduced operational expenses, enhanced user privacy (all data is processed locally), and greater potential for personalization.

The engine is compatible with the OpenAI API standard, supporting features like JSON mode, function calling, and streaming for real-time interactions. WebLLM natively supports a wide array of popular open-source models, such as Llama, Phi, Gemma, and Mistral, and also accepts custom models in the MLC format. Integration is streamlined through standard package managers (NPM, Yarn) or CDN links, complemented by comprehensive examples. It also supports Web Workers and Service Workers for offloading computation and managing model lifecycles, along with support for building Chrome extensions.
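The OpenAI-style API described above can be sketched as follows. This is a minimal example, not the definitive setup: the model ID is illustrative (check WebLLM's model list for current IDs), and the code is browser-only since it requires WebGPU.

```typescript
// Minimal WebLLM sketch: load a model and send an OpenAI-style chat request.
// Browser-only (requires WebGPU); the model ID below is illustrative.
import { CreateMLCEngine } from "@mlc-ai/web-llm";

async function main() {
  // Downloads and compiles the model on first use, reporting progress.
  const engine = await CreateMLCEngine("Llama-3.1-8B-Instruct-q4f32_1-MLC", {
    initProgressCallback: (report) => console.log(report.text),
  });

  // Same request shape as the OpenAI chat completions API.
  const reply = await engine.chat.completions.create({
    messages: [
      { role: "system", content: "You are a helpful assistant." },
      { role: "user", content: "Explain WebGPU in one sentence." },
    ],
  });
  console.log(reply.choices[0].message.content);
}

main();
```

Because the request and response shapes mirror the OpenAI client, existing chat UI code can often be pointed at the local engine with few changes.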

Features

  • In-Browser Inference: Leverages WebGPU for hardware-accelerated LLM operations directly within web browsers.
  • Full OpenAI API Compatibility: Supports JSON mode, function calling, streaming, and more.
  • Extensive Model Support: Natively supports models like Llama, Phi, Gemma, RedPajama, Mistral, Qwen, etc.
  • Custom Model Integration: Allows integration and deployment of custom models in MLC format.
  • Plug-and-Play Integration: Easy integration via NPM, Yarn, or CDN with examples.
  • Streaming & Real-Time Interactions: Enables real-time output generation for interactive applications.
  • Web Worker & Service Worker Support: Offloads computations for optimized UI performance and model lifecycle management.
  • Chrome Extension Support: Enables building Chrome extensions using WebLLM.
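The Web Worker and streaming features above can be sketched together: the engine runs off the main thread while tokens stream into the page. File names are illustrative; the handler class and factory are from the `@mlc-ai/web-llm` package, and this is a browser-only sketch.

```typescript
// worker.ts — runs the engine off the main thread (illustrative file name).
import { WebWorkerMLCEngineHandler } from "@mlc-ai/web-llm";

const handler = new WebWorkerMLCEngineHandler();
self.onmessage = (msg: MessageEvent) => handler.onmessage(msg);
```

```typescript
// main.ts — talks to the worker through the same OpenAI-style interface.
import { CreateWebWorkerMLCEngine } from "@mlc-ai/web-llm";

const engine = await CreateWebWorkerMLCEngine(
  new Worker(new URL("./worker.ts", import.meta.url), { type: "module" }),
  "Llama-3.1-8B-Instruct-q4f32_1-MLC", // illustrative model ID
);

// Streaming keeps the UI responsive while tokens arrive.
const stream = await engine.chat.completions.create({
  messages: [{ role: "user", content: "Write a haiku about browsers." }],
  stream: true,
});
for await (const chunk of stream) {
  document.body.append(chunk.choices[0]?.delta?.content ?? "");
}
```

Moving inference into a worker keeps model loading and token generation from blocking the page's event loop, which is why the UI stays interactive during generation.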

Use Cases

  • Developing privacy-focused personal AI assistants.
  • Building cost-effective chatbot applications without server infrastructure.
  • Creating interactive web applications with real-time LLM responses.
  • Enhancing web browsers with custom AI functionalities via extensions.
  • Integrating custom language models into client-side applications.
  • Enabling offline AI capabilities within web applications.
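The custom-model use case above can be sketched by extending the engine's app config. This assumes the `appConfig.model_list` registration pattern from the WebLLM documentation; all URLs and model IDs below are placeholders, not real artifacts.

```typescript
// Registering a custom MLC-format model (all URLs and IDs are placeholders).
import { CreateMLCEngine, prebuiltAppConfig } from "@mlc-ai/web-llm";

const appConfig = {
  model_list: [
    ...prebuiltAppConfig.model_list,
    {
      // Placeholder locations for MLC-converted weights and the compiled
      // WebGPU model library produced by the MLC toolchain.
      model: "https://huggingface.co/my-org/my-model-q4f16_1-MLC",
      model_id: "my-model-q4f16_1-MLC",
      model_lib: "https://example.com/my-model-webgpu.wasm",
    },
  ],
};

const engine = await CreateMLCEngine("my-model-q4f16_1-MLC", { appConfig });
```

Once the custom entry is registered, it is requested by its `model_id` exactly like a prebuilt model.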




Elite AI Tools

EliteAi.tools is the premier AI tools directory, exclusively featuring high-quality, useful, and thoroughly tested tools. Discover the perfect AI tool for your task using our AI-powered search engine.


© 2025 EliteAi.tools. All Rights Reserved.