Quick Setup

Start Building with Gemma 4

Pick your preferred platform and have Gemma 4 running in minutes.

Choose a model

E2B/E4B for mobile and edge devices, 26B MoE for fast inference, 31B Dense for the highest quality.

Choose a deployment platform

Hugging Face Transformers, Ollama for local inference, or Google AI Studio for instant cloud access.

Download and run

The model weights are released under the Apache 2.0 license, free for both commercial and research use.

Ollama (the fastest way to run locally)

bash

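# Install Ollama first: https://ollama.com
# Download the model, then start an interactive chat in the terminal.
# Check the Ollama library for the gemma4 tag that matches the size you chose.
ollama pull gemma4:27b
ollama run gemma4:27b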
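Besides the interactive CLI, a running Ollama server also exposes a local HTTP API, by default on port 11434. The sketch below uses only the Python standard library to call its `/api/chat` endpoint. It assumes the server is running on the same machine with default settings and reuses the `gemma4:27b` tag from the commands above; substitute whichever tag you actually pulled. The helper names (`build_chat_request`, `extract_reply`, `chat`) are illustrative, not part of Ollama.

```python
import json
import urllib.request

# Default address of a locally running Ollama server (assumption: default port, localhost).
OLLAMA_URL = "http://localhost:11434/api/chat"


def build_chat_request(model: str, prompt: str, url: str = OLLAMA_URL) -> urllib.request.Request:
    """Build a non-streaming POST request for Ollama's /api/chat endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,  # ask for one JSON object instead of a stream of chunks
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_reply(response_body: dict) -> str:
    """Pull the assistant's text out of a non-streaming /api/chat response."""
    return response_body["message"]["content"]


def chat(model: str, prompt: str) -> str:
    """Send one prompt to the local server and return the reply text."""
    with urllib.request.urlopen(build_chat_request(model, prompt)) as resp:
        return extract_reply(json.load(resp))
```

With the server running, `print(chat("gemma4:27b", "Explain quantum entanglement in simple terms."))` prints the model's answer. Keeping request construction separate from sending makes the payload easy to inspect or reuse with another HTTP client.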

Hugging Face Transformers

python

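from transformers import pipeline
import torch

# Multimodal chat pipeline. The 31B instruction-tuned model needs a large GPU
# (see the hardware guide below); bfloat16 halves memory compared with float32.
pipe = pipeline(
    "image-text-to-text",
    model="google/gemma-4-31b-it",
    device="cuda",
    torch_dtype=torch.bfloat16,
)

# Chat-format input: each message's content is a list of typed parts (text, image, ...).
messages = [
    {"role": "user", "content": [
        {"type": "text", "text": "Explain quantum entanglement in simple terms."}
    ]}
]

# generated_text holds the whole conversation; its last entry is the model's reply.
output = pipe(text=messages, max_new_tokens=512)
print(output[0]["generated_text"][-1]["content"])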
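Because the pipeline uses the `image-text-to-text` task, a user turn can mix images and text. The helpers below are hypothetical conveniences (`user_turn` and `build_messages` are not part of Transformers) that assemble the same nested chat structure as the example, optionally adding an image part in the `{"type": "image", "url": ...}` form accepted by Transformers chat templates and an optional system message.

```python
from typing import Optional


def user_turn(text: str, image_url: Optional[str] = None) -> dict:
    """Build one user message; if an image is given, it is placed before the text."""
    content = []
    if image_url is not None:
        content.append({"type": "image", "url": image_url})
    content.append({"type": "text", "text": text})
    return {"role": "user", "content": content}


def build_messages(text: str, image_url: Optional[str] = None,
                   system: Optional[str] = None) -> list:
    """Assemble a chat in the nested-content format used by the pipeline example."""
    messages = []
    if system is not None:
        messages.append({"role": "system", "content": [{"type": "text", "text": system}]})
    messages.append(user_turn(text, image_url))
    return messages
```

Pass the result as `text=` to `pipe` exactly as in the example. Since `output[0]["generated_text"]` contains the whole conversation including the model's reply, a follow-up question can be asked by appending `user_turn(...)` to that list and calling `pipe` again.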


Hardware Guide

Hardware	Recommended model	Suggested runtime
Android / iPhone	E2B or E4B	Google AI Edge Gallery app
Raspberry Pi / Jetson Nano	E2B	LiteRT-LM or llama.cpp
Gaming GPU, 8-16 GB VRAM	26B MoE (quantized)	Ollama or LM Studio
Gaming GPU, 24 GB+ VRAM	31B Dense (quantized)	Ollama or LM Studio
Single H100, 80 GB	31B Dense (unquantized, fp16/bf16)	vLLM or Hugging Face Transformers
Google Colab (free tier)	26B MoE (int4)	Hugging Face Transformers
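A rough way to sanity-check these pairings is to estimate the memory needed for the weights alone: parameter count × bits per parameter ÷ 8. The sketch below does that arithmetic under two stated simplifications: it treats the nominal 26B and 31B sizes as exact parameter counts, and it ignores the KV cache, activations, and runtime overhead, so real usage is higher. For the MoE model all expert weights are counted, since they generally all have to be resident in memory even though only some are used per token.

```python
def weight_memory_gb(num_params: float, bits_per_param: float) -> float:
    """Approximate memory for model weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


# Nominal sizes from the table; actual parameter counts may differ slightly.
ESTIMATES = {
    "31B Dense, 16-bit (fp16/bf16)": weight_memory_gb(31e9, 16),
    "31B Dense, 4-bit": weight_memory_gb(31e9, 4),
    "26B MoE, 4-bit": weight_memory_gb(26e9, 4),
}

for name, gb in ESTIMATES.items():
    print(f"{name}: ~{gb:.1f} GB of weights")
```

This prints about 62, 15.5 and 13 GB. Roughly 62 GB of 16-bit weights fits on an 80 GB H100 with room left for the KV cache, while 4-bit weights of 13 to 15.5 GB explain why the quantized models target consumer cards; at the low end of the 8-16 GB range the weights alone exceed VRAM, so runtimes such as Ollama split layers between GPU and CPU.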
