Unveiling "Snowbunny": A Deep Dive into the Leaked Gemini 3.5 Capabilities

In the fast-paced world of Large Language Models (LLMs), the rumor mill is rarely quiet. However, recent activity within the developer community has shifted from speculation to genuine excitement regarding a new, unreleased checkpoint appearing in Google AI Studio.

Affectionately dubbed "Snowbunny" by the community, this model, widely believed to be Gemini 3.5 or a high-powered iteration of Gemini 3, appears to be a significant generational leap rather than a simple incremental update.

Gemini 3.5: The "Snowbunny" Leaks

In this post, we will explore what we know about this mysterious model, its reported "Falcon" variants, and the impressive feats of engineering it has allegedly performed in the wild.


What is Gemini 3.5 / "Snowbunny"?

While Google has officially released impressive specs for Gemini 3 Flash, the "Snowbunny" phenomenon relates to specific internal checkpoints (often identified by IDs starting with DN9 or D13) that users have encountered via A/B testing on Google AI Studio.

Where standard chatbots specialize in conversation, early testers describe Snowbunny as a frontier-class engine built for full-stack creation. It reportedly sits on top of the Gemini 3 family’s multimodal backbone but pushes the boundaries of speed, reasoning depth, and agentic workflows.

The "Falcon" Variants

The leaks suggest a diverse architecture, potentially splitting capabilities across specialized models to optimize performance:

  • Fierce Falcon: A model variant reportedly optimized for pure speed and logical deduction.
  • Ghost Falcon: A creative powerhouse designed to handle UI generation, complex visuals, and audio synthesis.

This separation of concerns suggests a future where users won't just chat with an AI, but will interact with a system that dynamically routes queries to the most capable "brain" for the task.


Key Features & Observed Capabilities

The buzz around Gemini 3.5 isn't just about benchmark numbers; it is about behavior. Here are the most distinctive capabilities reported by early testers:

1. The "One-Shot" Coding Revolution

Perhaps the most startling observation is the model's ability to architect entire systems in a single prompt.

  • 3,000 Lines of Code: Whereas most models truncate output or lose coherence after a few hundred lines, Snowbunny has reportedly generated over 3,000 lines of working code in one go.
  • The Game Boy Emulator: In a widely shared demonstration, the model was asked to build a Nintendo Game Boy emulator. It didn't just write a script; it purportedly architected the CPU emulation, memory management, and display logic, resulting in a functional emulator that could boot ROMs, all from a single prompt (a minimal skeleton of that kind of structure is sketched after this list).
  • OS Simulations: Users have documented the model creating functional simulations of macOS and Windows (complete with working file explorers and calculators) indistinguishable from the real interfaces.
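
To make the reported scope concrete, here is a minimal Python skeleton of the kind of structure such an emulator needs: a CPU with a fetch-decode-execute loop, a memory map, and a display stage. It is an invented illustration of the component layout described in the reports, not the model's actual output, and it implements only two opcodes.

```python
# Hypothetical skeleton of a Game Boy-style emulator's core components.
# Illustrative only -- this is not output from the leaked model.

class Memory:
    """Flat 64 KiB address space; real hardware banks the cartridge ROM."""
    def __init__(self, rom: bytes):
        self.data = bytearray(0x10000)
        self.data[:len(rom)] = rom

    def read(self, addr: int) -> int:
        return self.data[addr & 0xFFFF]

    def write(self, addr: int, value: int) -> None:
        self.data[addr & 0xFFFF] = value & 0xFF


class CPU:
    """Minimal fetch-decode-execute loop; only NOP (0x00) and INC A (0x3C) are handled."""
    def __init__(self, memory: Memory):
        self.mem = memory
        self.pc = 0x0100  # Game Boy cartridge entry point after the boot ROM
        self.a = 0

    def step(self) -> None:
        opcode = self.mem.read(self.pc)
        self.pc = (self.pc + 1) & 0xFFFF
        if opcode == 0x3C:  # INC A
            self.a = (self.a + 1) & 0xFF
        # 0x00 (NOP) and every unimplemented opcode simply fall through


class Display:
    """Stand-in for the PPU: reports CPU state instead of drawing tiles."""
    def render(self, cpu: CPU) -> None:
        print(f"PC={cpu.pc:04X}  A={cpu.a:02X}")


if __name__ == "__main__":
    rom = bytes([0x00] * 0x100 + [0x3C, 0x3C, 0x00])  # padding, then two INC A ops
    cpu, display = CPU(Memory(rom)), Display()
    for _ in range(3):
        cpu.step()
        display.render(cpu)
```

A real emulator would flesh out the full instruction set, interrupts, timers, and a tile-based display pipeline; the point here is only the separation of concerns the reports describe.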

2. "Vibe Coding" and Frontend Mastery

Snowbunny appears to embrace a philosophy developers are calling "Vibe Coding." Instead of requiring rigid technical specifications, the model understands high-level aesthetic descriptions (e.g., "minimalist fintech dashboard with neon accents") and translates them into production-ready HTML/CSS/JS.
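
As a rough illustration of what such a "vibe" prompt could look like in practice, here is a hedged sketch using Google's google-genai Python SDK. The model ID is a placeholder (the leaked checkpoint has no public identifier), and the prompt wording and output handling are assumptions for the example, not a documented workflow.

```python
# Hedged example: asking a Gemini model for a "vibe coded" single-file web page.
# The model ID below is a placeholder; "Snowbunny" has no public identifier yet.
from google import genai

client = genai.Client(api_key="YOUR_API_KEY")

prompt = (
    "Build a minimalist fintech dashboard with neon accents as a single, "
    "self-contained HTML file (inline CSS and JavaScript, no external "
    "dependencies). Return only the HTML."
)

response = client.models.generate_content(
    model="gemini-2.0-flash",  # placeholder until an official checkpoint ships
    contents=prompt,
)

# Note: the reply may be wrapped in markdown code fences, which would need stripping.
with open("dashboard.html", "w", encoding="utf-8") as f:
    f.write(response.text)
```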

This represents a shift from a "coding assistant" that offers snippets to a "junior engineer" that delivers a deployable first draft.

3. True Multimodality: SVG and Native Audio

Gemini 3.5 seems to be breaking down the barriers between different media types.

  • Complex Vector Graphics: In the "Pelican on a Bicycle" test, an informal benchmark for spatial understanding, the model reportedly generates valid, complex SVGs where objects are positioned logically, rather than creating a jumbled mess of paths.
  • Native Audio Composition: Beyond simple text-to-speech, leaks indicate the model can generate original music tracks (waveform-level audio) that fit a specific context, potentially allowing creators to generate code, visuals, and soundtracks within a single workflow.

4. System 2 "Deep Think"

Similar to other reasoning-focused models, Gemini 3.5 reportedly features a "Deep Think" toggle. This allows the model to pause and engage in "System 2" reasoning, breaking down complex logic puzzles step-by-step before answering.
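
There is no confirmed API surface for this toggle, but Google's current google-genai SDK already exposes a per-request "thinking" budget on Gemini 2.5 models. The sketch below assumes the leaked checkpoint would follow the same pattern; the model ID and budget value are placeholders.

```python
# Hedged sketch: requesting extended "System 2"-style reasoning per request.
# Uses the thinking budget available in the google-genai SDK for Gemini 2.5 models;
# whether "Deep Think" on the leaked checkpoint works this way is unconfirmed.
from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_API_KEY")

response = client.models.generate_content(
    model="gemini-2.5-flash",  # placeholder for the leaked checkpoint
    contents="A farmer must ferry a wolf, a goat, and a cabbage across a river...",
    config=types.GenerateContentConfig(
        # Larger budgets let the model spend more tokens on internal reasoning
        # before it writes the visible answer.
        thinking_config=types.ThinkingConfig(thinking_budget=2048),
    ),
)
print(response.text)
```

In the current SDK, setting the budget to 0 disables extended thinking on models that support it, which would map naturally onto an on/off "Deep Think" toggle.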


The Numbers: Benchmarks and Performance

While we must approach leaked benchmarks with caution until officially verified, the reported numbers paint a picture of a new market leader:

  • Reasoning Dominance: Leaked data suggests an 80% score on hard reasoning benchmarks, a significant margin over the ~55% average of current competitors.
  • Humanity’s Last Exam: On this notoriously difficult test, the model reportedly scores between 33.7% and 40% without external tools, vastly outperforming current frontier models, which often sit around 17%.
  • Beating the Unreleased: In head-to-head comparisons found in the leaks, Snowbunny reportedly outperforms hypothetical or unreleased competitors like GPT-5.2 and Claude Opus 4.5.
  • SWE-bench Verified: With a reported score of 78%, the model demonstrates a high capacity for resolving real-world software engineering issues.

A Shift in Philosophy: From Chatbot to Builder

If these leaks are accurate, Gemini 3.5 marks a conceptual shift in how we utilize AI. We are moving away from the era of the "Chatbot" (a tool for Q&A and text summarization) and toward the era of the "Simulation Engine."

The ability to generate comprehensive artifacts, whether they are full web applications, physics simulations, or creative media suites, suggests that Google is aiming to reduce the friction between idea and execution to near zero.


Important Note: Awaiting Confirmation

As excited as we are about these developments, it is crucial to remain objective. The information above is based on leaked internal checkpoints, A/B testing observations, and community reports.

  • Subject to Change: Features, pricing, and specific benchmark scores may change significantly before the official release.
  • Availability: Currently, access to "Snowbunny" appears to be granted at random through A/B tests and is restricted to experimental environments within Google AI Studio.

We recommend treating this information as a preview of what is possible, rather than a guaranteed spec sheet.

Stay Connected: We will continue to monitor the situation closely. Be sure to follow our blog for the official announcement and verified performance metrics as soon as Google makes them public.


Disclaimer: This article is based on leaked information and community testing regarding the unreleased Gemini 3.5 model. All features and figures are subject to confirmation by Google.