What makes Gemma 4 exceptional
Six breakthrough capabilities that place Gemma 4 at the frontier of open-source AI.
Advanced Reasoning
Multi-step planning and deep logic. Major improvements in math (AIME 2026: 89.2%) and instruction-following benchmarks, enabling complex problem decomposition.
Agentic Workflows
Native function calling, structured JSON output, and system instructions. Build autonomous agents that interact with tools and APIs to execute complex workflows reliably.
Code Generation
High-quality offline code generation. Turn your workstation into a local-first AI code assistant. Scores 80% on LiveCodeBench v6 competitive coding problems.
Multimodal — Vision, Video & Audio
All models natively process images and video at variable resolutions, excelling at OCR and chart understanding. E2B/E4B also support native audio input for speech recognition.
140+ Languages
Natively trained on over 140 languages. Build inclusive, high-performance applications for a global audience with state-of-the-art multilingual understanding (MMMLU: 85.2%).
Ultra-Long Context
Process long-form content in a single pass. Edge models support a 128K-token context window; larger models extend to 256K tokens, enough to pass entire repositories or long documents in one prompt.
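To see whether a repository or document set is likely to fit in one prompt, a rough budget check can help before sending anything to the model. The sketch below is illustrative only: the 4-characters-per-token ratio is a common rule of thumb, not a measurement from Gemma 4's tokenizer, "128K" and "256K" are read conservatively as 128,000 and 256,000 tokens, and the function names and the output reserve are hypothetical.

```python
from pathlib import Path

# Context sizes quoted above, read conservatively as thousands of tokens (assumption).
CONTEXT_TOKENS = {"edge": 128_000, "large": 256_000}

# Rule-of-thumb ratio; NOT derived from the Gemma 4 tokenizer (assumption).
CHARS_PER_TOKEN = 4


def estimate_tokens(text: str) -> int:
    """Estimate the token count from character length, rounding up."""
    return -(-len(text) // CHARS_PER_TOKEN)


def fits_in_context(texts, tier: str, reserve_for_output: int = 4096) -> bool:
    """Return True if the combined texts plausibly fit in the tier's window.

    reserve_for_output keeps some of the window free for the model's reply.
    """
    budget = CONTEXT_TOKENS[tier] - reserve_for_output
    return sum(estimate_tokens(t) for t in texts) <= budget


def read_repo(root: str, suffixes=(".py", ".md")):
    """Yield the text of every file under root whose suffix matches."""
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix in suffixes:
            yield path.read_text(encoding="utf-8", errors="ignore")
```

Calling `fits_in_context(read_repo("."), "edge")` gives a quick yes/no for the current directory. For an exact answer, count tokens with the tokenizer that ships with the model you actually run.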
Build real autonomous agents
Gemma 4's native function-calling and structured output support enables developers to create agents that don't just chat — they take actions. Connect to APIs, read files, navigate the web, and complete multi-step tasks.
- ✓ Native function calling with type-safe JSON schema
- ✓ Structured output for reliable pipeline integration
- ✓ System instruction support for role customization
- ✓ τ2-bench Agentic score: 86.4% (31B)
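On the application side, function calling usually means parsing the model's tool request, checking its arguments against the declared schema, and running the matching function. The sketch below shows that loop with the standard library only. The `{"name": ..., "arguments": ...}` call shape, the `required` field, the `dispatch` and `validate_args` helpers, and the stub `search_web` body are all assumptions for illustration, not Gemma 4's documented output format or API.

```python
import json

# Tool declaration in the JSON-schema style used on this page.
# The "required" list is an addition for this sketch (assumption).
SEARCH_TOOL = {
    "name": "search_web",
    "description": "Search the internet for info",
    "parameters": {
        "type": "object",
        "properties": {"query": {"type": "string"}},
        "required": ["query"],
    },
}

# Maps JSON-schema primitive type names to Python types.
_JSON_TYPES = {
    "string": str,
    "number": (int, float),
    "boolean": bool,
    "object": dict,
    "array": list,
}


def validate_args(schema: dict, args: dict) -> None:
    """Check required keys and primitive types; raise ValueError on mismatch."""
    for key in schema.get("required", []):
        if key not in args:
            raise ValueError(f"missing argument: {key}")
    for key, value in args.items():
        spec = schema.get("properties", {}).get(key)
        if spec is None:
            raise ValueError(f"unexpected argument: {key}")
        if not isinstance(value, _JSON_TYPES[spec["type"]]):
            raise ValueError(f"{key} should be of type {spec['type']}")


def search_web(query: str) -> str:
    """Stand-in implementation; a real agent would call a search API here."""
    return f"results for {query!r}"


# Hypothetical registry linking tool names to (declaration, implementation).
REGISTRY = {"search_web": (SEARCH_TOOL, search_web)}


def dispatch(tool_call_json: str) -> str:
    """Parse a tool call, validate its arguments, and run the matching function."""
    call = json.loads(tool_call_json)
    spec, func = REGISTRY[call["name"]]
    validate_args(spec["parameters"], call["arguments"])
    return func(**call["arguments"])
```

Validating before dispatch keeps a malformed call from reaching real side effects; the resulting string would typically be appended to the conversation as a tool result so the model can continue.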
tools = [
    {
        "name": "search_web",
        "description": "Search the internet for info",
        "parameters": {
            "type": "object",
            "properties": {
                "query": {"type": "string"}
            }
        }
    }
]
# Gemma 4 native function calling
response = model.generate(
    messages=messages,
    tools=tools,
    tool_choice="auto"
)
See, hear, and understand everything
All Gemma 4 models process images and video at variable resolutions. The compact E2B and E4B models extend this with native audio input, enabling real-time speech recognition and multimodal understanding on edge devices.
Explore model sizes →
140+ languages, natively
Not just translation — deep semantic understanding across 140+ languages, scoring 85.2% on MMMLU (multilingual QA benchmark).
Start exploring Gemma 4
See benchmark data and hardware specs for all model sizes.