Quick Setup

Start building with Gemma 4

Choose your preferred platform and have Gemma 4 running in minutes.

1. Choose your model size

E2B/E4B for mobile/edge, 26B MoE for fast inference, 31B Dense for maximum quality.

2. Pick your platform

Hugging Face Transformers, Ollama for local inference, or Google AI Studio for instant cloud access.

3. Download & run

Model weights are available under Apache 2.0 and are free for commercial and research use.

Ollama (fastest local setup)
```bash
# Install Ollama first: https://ollama.com
ollama pull gemma4:26b
ollama run gemma4:26b
```
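Once the model is running, Ollama also serves a local REST API (default port 11434). A minimal sketch that only builds the request payload; the actual call is left commented out so nothing is sent until the server is up, and the model tag should match whichever one you pulled:

```python
import json
import urllib.request

# Ollama's default local endpoint for one-shot generation.
url = "http://localhost:11434/api/generate"
payload = {
    "model": "gemma4:26b",  # use whichever tag you actually pulled
    "prompt": "Explain quantum entanglement simply.",
    "stream": False,        # return one JSON object instead of a token stream
}
body = json.dumps(payload).encode("utf-8")
req = urllib.request.Request(
    url, data=body, headers={"Content-Type": "application/json"}
)
print(body.decode())

# With the Ollama server running, send it like this:
# with urllib.request.urlopen(req) as resp:
#     print(json.loads(resp.read())["response"])
```

Setting `"stream": False` is convenient for scripts; the default streaming mode returns one JSON object per token chunk instead.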
Hugging Face Transformers
```python
from transformers import pipeline
import torch

# Load the instruction-tuned model on GPU in bfloat16.
pipe = pipeline(
    "image-text-to-text",
    model="google/gemma-4-31b-it",
    device="cuda",
    torch_dtype=torch.bfloat16,
)

# Chat-style input: a list of messages, each with a list of content parts.
messages = [
    {"role": "user", "content": [
        {"type": "text", "text": "Explain quantum entanglement simply."}
    ]}
]

output = pipe(text=messages, max_new_tokens=512)
# The pipeline returns the full conversation; the last message is the reply.
print(output[0]["generated_text"][-1]["content"])
```
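Because the pipeline task is image-text-to-text, the same messages format can also carry images. A sketch of a multimodal turn, assuming the `url` content key used by Transformers chat messages; the image URL is a placeholder, not a real asset:

```python
# A multimodal user turn: one image plus a text question.
# The "url" key follows the Transformers chat-message convention for images;
# the URL itself is a placeholder.
messages = [
    {"role": "user", "content": [
        {"type": "image", "url": "https://example.com/photo.jpg"},
        {"type": "text", "text": "Describe this image in one sentence."},
    ]}
]

# Passed to the pipeline exactly like the text-only example:
# output = pipe(text=messages, max_new_tokens=512)
```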

Which model for your hardware?

| Hardware | Recommended model | Setup |
|---|---|---|
| Android / iPhone | E2B or E4B | Google AI Edge Gallery app |
| Raspberry Pi / Jetson Nano | E2B | LiteRT-LM or llama.cpp |
| Gaming GPU (8-16 GB VRAM) | 26B MoE (quantized) | Ollama or LM Studio |
| Gaming GPU (24 GB+ VRAM) | 31B Dense (quantized) | Ollama or LM Studio |
| Single H100 80 GB (fp16) | 31B Dense (full precision) | vLLM or Hugging Face |
| Google Colab Free | 26B MoE (int4) | Hugging Face Transformers |
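These recommendations follow from simple memory arithmetic: weight memory is roughly parameter count times bytes per parameter, plus headroom for activations and the KV cache. A back-of-the-envelope helper; the 20% overhead factor is an assumption, not a measured figure, and it counts total parameters (an MoE model loads all experts' weights even though only some are active per token):

```python
def approx_vram_gb(params_billion: float, bits_per_param: int,
                   overhead: float = 1.2) -> float:
    """Rough VRAM estimate: weight bytes padded for activations / KV cache."""
    weight_gb = params_billion * bits_per_param / 8  # 1B params at 8 bits ~ 1 GB
    return weight_gb * overhead

# 31B in fp16 lands around 74 GB with overhead, hence the single H100 80 GB row;
# quantizing to int4 is what brings these models into gaming-GPU range.
for name, params, bits in [("31B fp16", 31, 16), ("31B int4", 31, 4),
                           ("26B int4", 26, 4)]:
    print(f"{name}: ~{approx_vram_gb(params, bits):.0f} GB")
```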

Fine-tune Gemma 4 on your data

Use LoRA / QLoRA to fine-tune Gemma 4 on a single consumer GPU. Compatible with Unsloth, TRL, and Axolotl.
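The reason LoRA fits on one consumer GPU is the parameter math: a rank-r adapter on a d_in x d_out weight trains r * (d_in + d_out) parameters instead of d_in * d_out. A sketch with illustrative dimensions (not the actual Gemma 4 architecture):

```python
def lora_trainable_params(d_in: int, d_out: int, rank: int) -> int:
    """Parameters in one LoRA adapter pair (A: d_in x r, B: r x d_out)."""
    return rank * (d_in + d_out)

# Illustrative transformer projection size, NOT actual Gemma 4 dimensions.
d = 5120
full = d * d                             # full fine-tune of one weight matrix
lora = lora_trainable_params(d, d, 16)   # rank-16 adapter on the same matrix
print(f"full: {full:,}  lora: {lora:,}  ratio: {full / lora:.0f}x")
```

At rank 16 the adapter trains roughly 160x fewer parameters per matrix than a full fine-tune, which is why the optimizer state fits in consumer VRAM; QLoRA shrinks the frozen base weights further by holding them in 4-bit.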