Quick Setup
Start building with Gemma 4
Choose your preferred platform and have Gemma 4 running in minutes.
1. **Choose your model size** — E2B/E4B for mobile/edge, 26B MoE for fast inference, 31B Dense for max quality.
2. **Pick your platform** — Hugging Face Transformers, Ollama for local inference, or Google AI Studio for instant cloud access.
3. **Download & run** — model weights are available under Apache 2.0, free for commercial and research use.
```shell
# Install Ollama: https://ollama.com
ollama pull gemma4:27b
ollama run gemma4:27b
```

Or with Hugging Face Transformers:

```python
from transformers import pipeline
import torch

pipe = pipeline(
    "image-text-to-text",
    model="google/gemma-4-31b-it",
    device="cuda",
    torch_dtype=torch.bfloat16,
)
messages = [
    {"role": "user", "content": [
        {"type": "text", "text": "Explain quantum entanglement simply."}
    ]}
]
output = pipe(text=messages, max_new_tokens=512)
print(output[0]["generated_text"][-1]["content"])
```

Download from your preferred platform — Apache 2.0, available on all major AI platforms.
Which model for your hardware?
| Hardware | Recommended Model | Setup |
|---|---|---|
| Android / iPhone | E2B or E4B | Google AI Edge Gallery app |
| Raspberry Pi / Jetson Nano | E2B | LiteRT-LM or llama.cpp |
| Gaming GPU (8-16GB VRAM) | 26B MoE (quantized) | Ollama or LM Studio |
| Gaming GPU (24GB+ VRAM) | 31B Dense (quantized) | Ollama or LM Studio |
| Single H100 80GB (fp16) | 31B Dense (full precision) | vLLM or Hugging Face |
| Google Colab Free | 26B MoE (int4) | Hugging Face Transformers |
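The table's hardware pairings follow from simple arithmetic on weight size: parameters × bytes per parameter, plus headroom for activations and the KV cache. A minimal sketch, where the 20% overhead factor is an illustrative assumption rather than a measured figure:

```python
def estimate_weight_vram_gb(params_billion: float, bits_per_param: float,
                            overhead: float = 1.2) -> float:
    """Rough VRAM needed to serve a model: weight bytes plus ~20%
    headroom for activations and KV cache (assumed, not measured)."""
    bytes_per_param = bits_per_param / 8
    return params_billion * bytes_per_param * overhead

# 31B dense in bf16 (16-bit): ~74 GB -> fits a single 80 GB H100
print(round(estimate_weight_vram_gb(31, 16), 1))  # 74.4
# 31B dense quantized to int4: ~19 GB -> fits a 24 GB gaming GPU
print(round(estimate_weight_vram_gb(31, 4), 1))   # 18.6
```

The same estimate explains the Colab row: the 26B MoE at int4 lands around 16 GB, within reach of a free-tier GPU.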
Fine-tune Gemma 4 on your data
Use LoRA / QLoRA to fine-tune Gemma 4 on a single consumer GPU. Compatible with Unsloth, TRL, and Axolotl.
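LoRA makes single-GPU fine-tuning feasible by training two small low-rank factors per weight matrix instead of the full matrix. A back-of-the-envelope sketch of the savings, using an illustrative 4096×4096 projection and rank 16 (both assumed values, not Gemma 4 specifics):

```python
def lora_trainable_params(d_in: int, d_out: int, rank: int) -> int:
    """LoRA replaces a full d_in x d_out weight update with two
    low-rank factors: A (d_in x rank) and B (rank x d_out)."""
    return d_in * rank + rank * d_out

full = 4096 * 4096                            # full fine-tune of one projection
lora = lora_trainable_params(4096, 4096, 16)  # rank-16 LoRA adapter
print(full, lora, full // lora)               # ~128x fewer trainable params
```

QLoRA pushes this further by keeping the frozen base weights in 4-bit precision, which is why a 24 GB consumer GPU can fine-tune models of this size.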