Core Capabilities

What makes Gemma 4 exceptional

Six breakthrough capabilities that place Gemma 4 at the frontier of open-source AI.

🧠

Advanced Reasoning

Multi-step planning and deep logic. Major improvements in math (AIME 2026: 89.2%) and instruction-following benchmarks, enabling complex problem decomposition.

🤖

Agentic Workflows

Native function calling, structured JSON output, and system instructions. Build autonomous agents that interact with tools and APIs to execute complex workflows reliably.

💻

Code Generation

High-quality offline code generation. Turn your workstation into a local-first AI code assistant. Scores 80% on LiveCodeBench v6 competitive coding problems.

👁️

Multimodal — Vision, Video & Audio

All models natively process images and video at variable resolutions, excelling at OCR and chart understanding. E2B/E4B also support native audio input for speech recognition.

🌍

140+ Languages

Natively trained on over 140 languages. Build inclusive, high-performance applications for a global audience with state-of-the-art multilingual understanding (MMMLU: 85.2%).

📄

Ultra-Long Context

Process long-form content seamlessly. Edge models support a 128K context window; larger models extend to 256K tokens — pass entire repositories or long documents in one prompt.
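The "pass entire repositories in one prompt" workflow above can be sketched as a simple file-packing helper. This is an illustrative sketch, not an official utility: the 256K-token figure comes from the text above, while the 4-characters-per-token ratio is a rough heuristic you would replace with a real tokenizer in practice.

```python
from pathlib import Path

# 256K tokens per the larger Gemma 4 models; ~4 chars/token is a
# rough heuristic, not a tokenizer (assumption for illustration).
CONTEXT_TOKENS = 256_000
CHARS_PER_TOKEN = 4

def pack_repo(root: str, suffixes: tuple = (".py", ".md")) -> str:
    """Concatenate a repository's text files into a single prompt,
    stopping before the estimated context budget is exceeded."""
    budget = CONTEXT_TOKENS * CHARS_PER_TOKEN
    parts = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix not in suffixes:
            continue
        chunk = f"### {path.name}\n{path.read_text(errors='ignore')}\n"
        if len(chunk) > budget:
            break  # next file would overflow the context window
        budget -= len(chunk)
        parts.append(chunk)
    return "".join(parts)
```

The resulting string can be sent as a single prompt; a production version would count real tokens rather than characters.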

Agentic AI

Build real autonomous agents

Gemma 4's native function-calling and structured output support enables developers to create agents that don't just chat — they take actions. Connect to APIs, read files, navigate the web, and complete multi-step tasks.

  • Native function calling with type-safe JSON schema
  • Structured output for reliable pipeline integration
  • System instruction support for role customization
  • τ2-bench Agentic score: 86.4% (31B)
agent_example.py
# Declare a JSON-schema tool the model may call
tools = [
  {
    "name": "search_web",
    "description": "Search the internet for information",
    "parameters": {
      "type": "object",
      "properties": {
        "query": {"type": "string"}
      },
      "required": ["query"]
    }
  }
]

# Gemma 4 native function calling: with tool_choice="auto" the
# model decides when to emit a structured tool call.
# `model` and `messages` are assumed to be set up earlier.
response = model.generate(
  messages=messages,
  tools=tools,
  tool_choice="auto"
)
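Once the model emits a tool call, your application executes it locally and feeds the result back. A minimal dispatch loop might look like the sketch below; the tool-call shape (`name` plus JSON-encoded `arguments`) is an assumption for illustration, so adapt it to your client library's actual response format.

```python
import json

# Local implementation of the tool declared in the schema above
def search_web(query: str) -> str:
    # Stub: a real agent would call a search API here
    return f"Top results for: {query}"

TOOL_REGISTRY = {"search_web": search_web}

def dispatch(tool_call: dict) -> str:
    """Route a structured tool call to the matching local function."""
    fn = TOOL_REGISTRY[tool_call["name"]]
    args = json.loads(tool_call["arguments"])  # model emits JSON arguments
    return fn(**args)

# Simulated model output; the real shape depends on your client library
tool_call = {"name": "search_web",
             "arguments": json.dumps({"query": "Gemma 4 benchmarks"})}
result = dispatch(tool_call)  # append this result to `messages` and re-generate
```

Feeding `result` back as a tool message lets the model continue the multi-step task with real data.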
Multimodal

  • 🖼️ Image understanding: variable resolution
  • 🎬 Video processing: native support
  • 🎙️ Audio input (E2B/E4B): speech recognition
  • 📊 Chart & OCR: MMMU Pro 76.9%

See, hear, and understand everything

All Gemma 4 models process images and video at variable resolutions. The compact E2B and E4B models extend this with native audio input, enabling real-time speech recognition and multimodal understanding on edge devices.
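Mixed-media input like this is typically sent as a single chat turn combining text with encoded media parts. The sketch below builds such a payload; the part schema (`type`, `data`, `mime_type` keys) is an assumption for illustration, not a fixed Gemma 4 API.

```python
import base64

def image_part(image_bytes: bytes, mime_type: str = "image/png") -> dict:
    """Wrap raw image bytes as a base64-encoded content part.
    The part schema here is illustrative, not an official format."""
    data = base64.b64encode(image_bytes).decode("ascii")
    return {"type": "image", "data": data, "mime_type": mime_type}

def build_message(prompt: str, media_parts: list) -> dict:
    # One user turn combining a text prompt with image/video/audio parts
    return {"role": "user",
            "content": [{"type": "text", "text": prompt}] + media_parts}

msg = build_message("What does this chart show?", [image_part(b"\x89PNG\r\n")])
```

Audio parts for the E2B/E4B models would follow the same pattern with an audio MIME type.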

Explore model sizes →
Global Reach

140+ languages, natively

Not just translation — deep semantic understanding across 140+ languages, scoring 85.2% on MMMLU (multilingual QA benchmark).

  • العربية (Arabic)
  • 中文 (Chinese)
  • English
  • Français (French)
  • Deutsch (German)
  • हिन्दी (Hindi)
  • Español (Spanish)
  • +133 more languages

Start exploring Gemma 4

See benchmark data and hardware specs for all model sizes.