Core Capabilities

What makes Gemma 4 exceptional

Six breakthrough capabilities that place Gemma 4 at the frontier of open-source AI.

🧠

Advanced Reasoning

Multi-step planning and deep logic. Major improvements in math (AIME 2026: 89.2%) and instruction-following benchmarks, enabling complex problem decomposition.

🤖

Agentic Workflows

Native function calling, structured JSON output, and system instructions. Build autonomous agents that interact with tools and APIs to execute complex workflows reliably.

💻

Code Generation

High-quality offline code generation. Turn your workstation into a local-first AI code assistant. Scores 80% on LiveCodeBench v6 competitive coding problems.

👁️

Multimodal — Vision, Video & Audio

All models natively process images and video at variable resolutions, excelling at OCR and chart understanding. E2B/E4B also support native audio input for speech recognition.

🌍

140+ Languages

Natively trained on over 140 languages. Build inclusive, high-performance applications for a global audience with state-of-the-art multilingual understanding (MMMLU: 85.2%).

📄

Ultra-Long Context

Process long-form content seamlessly. Edge models support a 128K context window; larger models extend to 256K tokens — pass entire repositories or long documents in one prompt.

Agentic AI

Build real autonomous agents

Gemma 4's native function-calling and structured output support enables developers to create agents that don't just chat — they take actions. Connect to APIs, read files, navigate the web, and complete multi-step tasks.

✓ Native function calling with type-safe JSON schema
✓ Structured output for reliable pipeline integration
✓ System instruction support for role customization
✓ τ2-bench Agentic score: 86.4% (31B)

agent_example.py

tools = [
  {
    "name": "search_web",
    "description": "Search the internet for info",
    "parameters": {
      "type": "object",
      "properties": {
        "query": {"type": "string"}
      }
    }
  }
]

# Gemma 4 native function calling
response = model.generate(
  messages=messages,
  tools=tools,
  tool_choice="auto"
)

🖼️

Variable resolution

Image understanding

🎬

Native support

Video processing

🎙️

Speech recognition

Audio input (E2B/E4B)

📊

MMMU Pro: 76.9%

Chart & OCR

Multimodal

See, hear, and understand everything

All Gemma 4 models process images and video at variable resolutions. The compact E2B and E4B models extend this with native audio input, enabling real-time speech recognition and multimodal understanding on edge devices.

Explore model sizes →

Global Reach

140+ languages, natively

Not just translation — deep semantic understanding across 140+ languages, scoring 85.2% on MMMLU (multilingual QA benchmark).

العربية

Arabic

中文

Chinese

English

Français

French

Deutsch

German

हिन्दी

Hindi

Español

Spanish

+133 more

languages

Start exploring Gemma 4

See benchmark data and hardware specs for all model sizes.

Compare models → Quick start guide