Four Sizes

Gemma 4 Model Family

From mobile devices to developer workstations, with a size optimized for every deployment target.

Gemma 4 E2B

Mobile · Edge

Engineered for maximum memory efficiency on edge devices. Activates an effective 2B-parameter footprint to preserve RAM and battery life. Runs completely offline.

Parameters

Effective 2B active

Context

128K tokens

Hardware

Phones, Raspberry Pi, Jetson Nano

Native audio input · Near-zero latency · Fully offline

Gemma 4 E4B

Mobile · Edge

Higher-capability edge model with audio and visual understanding. Integrates with the Android AICore Developer Preview and the ML Kit GenAI Prompt API for production use.

Parameters

Effective 4B active

Context

128K tokens

Hardware

Android, iOS, Edge GPUs

Audio + vision · AICore preview · ML Kit GenAI API

Gemma 4 26B MoE

Workstation · #6 Open

Mixture-of-Experts model that activates only 3.8B of its 26B parameters during inference for exceptional tokens-per-second throughput. Ranked the #6 open model on the Arena AI leaderboard.

Parameters

26B total, 3.8B active (MoE)

Context

256K tokens

Hardware

Consumer GPU, Single H100

Ultra-fast inference · Low latency · MoE efficiency
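
To put the "26B total, 3.8B active" figures in perspective, the sketch below computes the share of parameters used per token. The function name is hypothetical and the calculation is only a ratio of the two published numbers. It assumes the usual Mixture-of-Experts behavior, where per-token compute tracks the active parameters while all 26B weights must still be held in memory; it is not a throughput measurement.

```python
def active_fraction(total_params_b: float, active_params_b: float) -> float:
    """Share of a model's parameters that are used for each token.

    Both arguments are parameter counts in billions. This is a simple
    ratio, not a measurement of speed or memory use.
    """
    if total_params_b <= 0:
        raise ValueError("total_params_b must be positive")
    if not 0 <= active_params_b <= total_params_b:
        raise ValueError("active_params_b must be between 0 and the total")
    return active_params_b / total_params_b


# Figures taken from the Gemma 4 26B MoE card: 26B total, 3.8B active.
share = active_fraction(26.0, 3.8)
print(f"{share:.1%} of parameters active per token")  # prints "14.6% ..."
```

Roughly one parameter in seven takes part in each forward pass, which is where the throughput advantage over a dense model of similar total size would come from.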

Gemma 4 31B Dense

Workstation · #3 Open

Maximum raw quality and capability. The premier model for fine-tuning and research. Fits on a single 80GB H100 GPU. Currently the #3 open model in the world on the Arena AI text leaderboard.

Parameters

31B dense

Context

256K tokens

Hardware

Single 80GB H100 (fp16)

Highest quality · Fine-tuning champion · Arena AI #3
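
The "Single 80GB H100 (fp16)" claim can be sanity-checked with back-of-the-envelope arithmetic. The sketch below is a rough estimate of the weights alone, assuming 2 bytes per parameter for fp16 and decimal gigabytes; the helper name and the byte table are illustrative, and the estimate ignores the KV cache, activations, and framework overhead, all of which grow with context length and batch size.

```python
# Bytes needed to store one parameter at common precisions.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1}


def weight_memory_gb(params_billion: float, dtype: str = "fp16") -> float:
    """Approximate memory for the weights alone, in decimal gigabytes."""
    if dtype not in BYTES_PER_PARAM:
        raise ValueError(f"unknown dtype: {dtype}")
    total_bytes = params_billion * 1e9 * BYTES_PER_PARAM[dtype]
    return total_bytes / 1e9


GPU_MEMORY_GB = 80  # single H100, as stated on the card

weights = weight_memory_gb(31, "fp16")
print(f"weights:  {weights:.0f} GB")                  # 62 GB
print(f"headroom: {GPU_MEMORY_GB - weights:.0f} GB")  # 18 GB
```

Under these assumptions the fp16 weights take about 62 GB, leaving roughly 18 GB for everything else, so very long contexts may need more memory than a single card provides.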

Benchmarks

Benchmark Performance

Evaluated against industry-standard datasets. See the full model card for additional benchmarks.

Benchmark	Task	31B Dense	26B MoE	Gemma 3 27B (prior gen)
Arena AI (Text)	Human preference	1452	1441	1365
MMMLU	Multilingual Q&A	85.2%	82.6%	67.6%
MMMU Pro	Multimodal reasoning	76.9%	73.8%	49.7%
AIME 2026	Mathematics	89.2%	88.3%	20.8%
LiveCodeBench v6	Competitive coding	80.0%	77.1%	29.1%
GPQA Diamond	Scientific knowledge	84.3%	82.3%	42.4%
τ2-bench Agentic	Agentic tool use	86.4%	85.5%	6.6%
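
For readers who want the generation-over-generation change rather than the raw scores, here is a minimal sketch that computes percentage-point gains over Gemma 3 27B. The numbers are copied from the table above; the dictionary layout and function name are illustrative. Arena AI is left out because it is a rating, not a percentage.

```python
# (31B Dense, 26B MoE, Gemma 3 27B) scores in percent, copied from the table.
SCORES = {
    "MMMLU": (85.2, 82.6, 67.6),
    "MMMU Pro": (76.9, 73.8, 49.7),
    "AIME 2026": (89.2, 88.3, 20.8),
    "LiveCodeBench v6": (80.0, 77.1, 29.1),
    "GPQA Diamond": (84.3, 82.3, 42.4),
    "τ2-bench Agentic": (86.4, 85.5, 6.6),
}


def gain_over_prior(benchmark: str, model: str = "31B Dense") -> float:
    """Percentage-point gain of a Gemma 4 model over Gemma 3 27B."""
    dense, moe, prior = SCORES[benchmark]
    if model == "31B Dense":
        current = dense
    elif model == "26B MoE":
        current = moe
    else:
        raise ValueError(f"unknown model: {model}")
    return round(current - prior, 1)


# List benchmarks from largest to smallest gain for the 31B Dense model.
for name in sorted(SCORES, key=gain_over_prior, reverse=True):
    print(f"{name:<18} +{gain_over_prior(name):.1f} points")
```

The largest jumps are on the agentic tool-use and mathematics rows, while multilingual Q&A shows the smallest gain because the prior generation already scored comparatively well there.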