Infographic: Google Gemini Model Family

GEMINI MODEL FAMILY

An in-depth look at the architecture, performance, and capabilities of Google's multimodal AI.

General Description: The Native Multimodal Brain

Unlike models that stitch together separately trained components for each modality, Gemini was designed from the ground up to be natively multimodal, understanding and reasoning across several kinds of input at once. It is a single, cohesive model, not a collection of parts.

  • 📝
    Text
  • 🖼️
    Images
  • 🔊
    Audio
  • 🎬
    Video
  • 💻
    Code

Key Milestones

Gemini's evolution has been rapid, introducing significant improvements in architecture and capacity in a short period.

December 2023

Gemini 1.0 Launch (Pro, Ultra, Nano).

February 2024

Gemini 1.5 Pro launch with a Mixture-of-Experts (MoE) architecture.

Cutting-Edge Performance: Pushing the Limits

Gemini 1.0 Ultra set a new standard on Massive Multitask Language Understanding (MMLU), a benchmark that assesses knowledge and problem-solving ability across a wide range of subjects.

90.0%

MMLU Score

First model reported to surpass human expert-level performance on MMLU.

The Quantum Leap in the Context Window

The context window defines how much information a model can process in a single query. Gemini 1.5 Pro, with its Mixture-of-Experts (MoE) architecture, represents a major leap forward, enabling the analysis of entire codebases, whole books, or long video recordings in a single prompt.
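To make the idea of "fitting in the context window" concrete, here is a minimal sketch of a pre-flight check. It assumes a rough heuristic of about 4 characters per token, which is only an approximation; a real tokenizer will produce different counts. The function names and the window sizes in the usage note are illustrative and are not taken from the infographic.

```python
import math

# Assumption: ~4 characters per token is a rough rule of thumb for English
# text. Real tokenizers will give different counts.
CHARS_PER_TOKEN = 4


def estimate_tokens(text, chars_per_token=CHARS_PER_TOKEN):
    """Return a rough token estimate for a piece of text."""
    return math.ceil(len(text) / chars_per_token)


def fits_in_context(texts, context_window_tokens, reserved_for_output=0):
    """Check whether a batch of documents fits in one prompt.

    Returns (fits, total_estimated_tokens). `reserved_for_output` leaves
    room in the window for the model's reply.
    """
    total = sum(estimate_tokens(t) for t in texts)
    budget = context_window_tokens - reserved_for_output
    return total <= budget, total
```

Under these assumptions, a 600,000-character book is estimated at 150,000 tokens: it would fit in a hypothetical 1,000,000-token window but not in a 128,000-token one.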

Benchmark Mastery

Gemini 1.0 Ultra has demonstrated state-of-the-art performance on the vast majority of the academic benchmarks most widely used to evaluate LLMs. Out of 32 key tests, it leads in 30.

Architecture and Efficiency

  • ⚙️
    Transformer Base: Optimized for maximum scalability and efficiency.
  • ⚡️
    TPU Infrastructure: Co-designed to run on Google's Tensor Processing Units (TPUs), achieving greater speed and lower cost.
  • 🌍
    Reduced Carbon Footprint: Trained in data centers that operate with a high percentage of carbon-free energy.

Safety and Ethics

  • 🛡️
    Comprehensive Assessments: Adversarial testing ("red teaming") to identify and mitigate risks of bias and toxicity.
  • 🚫
    Safety Classifiers: Active filters to prevent the generation of content that violates usage policies.
  • 💧
    SynthID Watermarking: Embeds an imperceptible digital watermark in generated images to identify them as created by AI.
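
The classifier-based filtering described above can be pictured as a threshold check applied to a candidate response before it is returned. The sketch below is a generic illustration, not Google's implementation: the category names, threshold values, and the `apply_safety_filter` helper are all hypothetical, and the scores are assumed to come from some upstream classifier that is not modeled here.

```python
from dataclasses import dataclass, field

# Hypothetical policy categories and thresholds, chosen only for illustration.
THRESHOLDS = {"toxicity": 0.8, "harassment": 0.7}


@dataclass
class FilterResult:
    allowed: bool
    violations: list = field(default_factory=list)


def apply_safety_filter(scores, thresholds=THRESHOLDS):
    """Block a candidate response if any score meets its category threshold.

    `scores` maps category name -> probability in [0, 1]. Categories that
    are missing from `scores` are treated as 0.0.
    """
    violations = sorted(
        category
        for category, limit in thresholds.items()
        if scores.get(category, 0.0) >= limit
    )
    return FilterResult(allowed=not violations, violations=violations)
```

A response scored `{"toxicity": 0.1, "harassment": 0.2}` passes, while one scored `{"toxicity": 0.9}` is blocked with `toxicity` listed as the violated category.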

Infographic generated from the Google Gemini model's technical specifications. Visualizations created with Chart.js and Tailwind CSS.
