Building a Marshmallow Castle in Google's New AI World Generator

2026-01-30

Google DeepMind is granting access to Project Genie, an AI tool capable of crafting interactive game worlds from text prompts or images.

Starting Thursday, Google AI Ultra subscribers in the United States can test the experimental research prototype, which is powered by Google's latest world model, Genie 3, working alongside the company's Nano Banana Pro image generation model and Gemini.

The release comes five months after Genie 3's research preview and is meant to gather user feedback and training data as DeepMind accelerates development of more powerful world models.

World models are AI systems that generate internal representations of environments, enabling them to predict future outcomes and plan actions. Many AI leaders, including DeepMind, consider world models a crucial step toward Artificial General Intelligence (AGI). However, in the near term, labs like DeepMind envision a market strategy starting with video games and other entertainment, later expanding to training embodied agents (i.e., robots) within simulations.
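For intuition, here is a minimal, purely illustrative sketch of what "predict future outcomes and plan actions" means in practice. The toy dynamics and every name below are my own inventions, not DeepMind's code:

```python
# A world model, at its simplest, is a learned transition function:
# next_state = model(state, action). Planning then becomes search over
# imagined rollouts instead of trial and error in the real environment.

def toy_world_model(state, action):
    # Toy dynamics: state is a 1-D position, action is a step of -1, 0, or +1.
    return state + action

def plan(model, state, action_sequences, score):
    """Return the action sequence whose imagined rollout scores best."""
    def rollout(seq):
        s = state
        for a in seq:
            s = model(s, a)  # roll forward in imagination, not in the world
        return score(s)
    return max(action_sequences, key=rollout)

# Example: from position 0, find the three-step plan that lands closest to 2.
best = plan(toy_world_model, 0,
            [(1, 1, 0), (1, -1, 1), (0, 0, 1)],
            score=lambda s: -abs(s - 2))
print(best)  # (1, 1, 0)
```

Genie 3 works at a vastly larger scale, predicting video frames rather than toy positions, but the underlying idea of rolling a model forward is the same.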

DeepMind's release of Project Genie comes as competition in world models intensifies. Late last year, Fei-Fei Li's World Labs launched its first commercial product, Marble. AI video generation startup Runway also recently introduced a world model. And AMI Labs, the startup founded by former Meta chief AI scientist Yann LeCun, will likewise focus on developing world models.

"I think it's exciting to be able to get it into more people's hands and get their feedback," said Shlomi Fruchter, DeepMind's Research Director, his face alight with enthusiasm for Project Genie's launch during a video interview.

DeepMind researchers candidly acknowledge the tool's experimental nature. It can be inconsistent, sometimes impressively generating playable worlds and other times producing confusing results. Here's how it works.

You begin by creating a "world sketch" with a text prompt describing the environment and a protagonist, whom you can control from a first-person or third-person perspective. Nano Banana Pro generates an image from the prompt, which you can, in theory, modify before Genie uses it as the starting point for an interactive world. Modifications often work, but the model occasionally errs, such as giving a character purple hair when you asked for green.

You can also use real photographs as a foundation for the model to build a world, with similarly mixed success. (More on that later.)

Once satisfied with the image, Project Genie creates an explorable world within seconds. You can also remix existing worlds by building upon their prompts or explore curated worlds via a gallery or random tool for inspiration. You can then download a video of the world you just explored.
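Project Genie is a web UI rather than a public API, but the flow it walks you through maps onto a simple three-step pipeline. A rough sketch, where every class and function name is a made-up stand-in:

```python
# Text prompt -> starting image -> explorable world. The stand-in classes
# below only mimic the shape of the pipeline; they are not Google APIs.

class ToyImageModel:          # stands in for Nano Banana Pro
    def generate(self, prompt):
        return f"image({prompt})"

class ToyWorldModel:          # stands in for Genie 3
    def start_session(self, image):
        return f"explorable world seeded by {image}"

def make_world_sketch(prompt, image_model, world_model, edit=lambda img: img):
    image = image_model.generate(prompt)       # step 1: prompt to image
    image = edit(image)                        # step 2: optional user tweaks
    return world_model.start_session(image)    # step 3: image to world

session = make_world_sketch(
    "a marshmallow cloud castle with a chocolate moat, claymation style",
    ToyImageModel(), ToyWorldModel())
print(session)
```

From there, the real product layers on remixing (seeding a new sketch with an existing world's prompt) and exporting a video of your run.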

DeepMind currently limits world generation and navigation to 60 seconds, partly due to budget and computational constraints. Genie 3 is an autoregressive model that demands significant dedicated compute for each session, which caps how much generation time DeepMind can offer each user.

"The reason we limit it to 60 seconds is because we want more users to be able to use it," Fruchter explained. "Essentially, when you use it, there is a chip that is dedicated to you and dedicated to your session."

He added that extending beyond 60 seconds offers diminishing returns for testing.

"The environments are fun, but the dynamism of the environment is somewhat limited due to the level of interactivity. However, we see this as a limitation we want to improve."

Whimsy Works, Realism Doesn't

During my testing, safety guardrails were active. I could not generate anything resembling nudity, or worlds related to Disney or other copyrighted material. (In December, Disney issued a cease-and-desist notice to Google, alleging, among other things, that the company's AI models infringed its copyrights by training on Disney characters and intellectual property and generating unauthorized content from them.) I couldn't even get Genie to generate worlds of mermaids exploring underwater fantasies or ice queens in their winter castles.

Despite this, the demo was impressive. My first world attempted to fulfill a small childhood fantasy: exploring a cloud castle made of marshmallows, surrounded by rivers of chocolate sauce and candy trees. (Yes, I was a chubby kid.) I requested a claymation style, and it delivered a fantastical world my younger self would have adored; the castle's pastel and white spires and turrets looked fluffy and delicious, as if you could tear off a piece to dip in the chocolate moat.

That said, Project Genie has issues to resolve.

The model excels at worlds built from artistic prompts, such as watercolor, anime, or classic cartoon aesthetics. It often falters, however, on photorealistic or cinematic worlds, which tend to look more like video games than like real people and places.

It also doesn't always respond well to real photographs. When I provided a photo of my office and asked it to build a world based on it, I received a world with some of the same furniture—a wooden desk, plants, a gray sofa—but in a different layout. It looked cold, digital, and unreal.

When I gave it a photo of a plush toy on my desk, Project Genie animated the toy navigating the space, with other objects occasionally reacting as it passed by.

That kind of interactivity is something DeepMind is working to improve, as is basic physical consistency: several times, my character walked directly through walls or other solid objects.

When DeepMind first released Genie 3, researchers emphasized that the model's autoregressive architecture lets it remember what it has generated. I wanted to test this by returning to parts of an environment it had already created to see whether they stayed the same. Most of the time, the model succeeded. In one case, I generated a cat exploring another desk; only once, when I turned back to the right side of the desk, did the model slip and generate a second cup.
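That memory is a natural consequence of autoregressive generation: each new frame is conditioned on everything produced so far, which is also why each session is so compute-hungry. A toy sketch of the loop, with hypothetical function names that are not Genie's actual interface:

```python
# Autoregressive generation in miniature: every frame is predicted from the
# full history, so places you visited earlier stay "in context" when you
# turn back toward them. `generate_frame` is a stand-in for a neural net.

def generate_frame(history, action):
    # Placeholder dynamics; a real model would run a network over `history`.
    return f"frame {len(history)} after '{action}'"

def play(actions):
    history = []  # grows every step -- one reason each session needs its own chip
    frames = []
    for action in actions:
        frame = generate_frame(history, action)  # conditioned on all prior frames
        history.append((action, frame))
        frames.append(frame)
    return frames

print(play(["walk forward", "turn left", "turn left"]))
```

Conditioning on the whole history is what keeps a revisited desk looking like the same desk, and a lapse in that conditioning is how a second cup sneaks in.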

The most frustrating part for me was the navigation: arrow keys to look around, the spacebar to jump or ascend, and the W-A-S-D keys to move. As a non-gamer, I didn't find the scheme intuitive, but even allowing for that, the keys were often unresponsive or sent me in the wrong direction. Attempting to walk from one side of a room to a doorway on the other often devolved into a chaotic zigzag, like trying to steer a shopping cart with a broken wheel.

Fruchter assured me his team is aware of these shortcomings, reiterating that Project Genie is an experimental prototype. He said the team aims to enhance realism and improve interactive capabilities in the future, including giving users more control over actions and the environment.

"We don't see [Project Genie] as an end-to-end product people can use every day, but we think there is already something interesting and unique that you cannot get in other ways," he said.