Ai2 Launches Open AI Model to Empower Robot Action Planning in 3D Space

2025-08-12

Seattle-based Allen Institute for Artificial Intelligence (Ai2) today announced the launch of MolmoAct 7B, an open-source embodied AI model that enables robots to "think" before executing physical actions, significantly enhancing their decision-making capabilities. The 7-billion-parameter model represents a shift in robotic intelligence by introducing action reasoning capabilities that go beyond traditional vision-language models.

Conventional robotics models rely on visual inputs to interpret their environment, for example analyzing images to guide furniture assembly or to execute simple object-manipulation tasks. MolmoAct instead introduces Ai2's Action Reasoning Model (ARM) architecture, a framework that transforms natural-language commands into precise physical action sequences through 3D spatial mapping and trajectory planning. As computer vision lead Ranjay Krishna explains: "Once it perceives the environment, the model generates a 3D representation and formulates a motion plan before executing any physical movement."

The model was trained on a curated 18-million-sample dataset of real-world scenarios from kitchens and bedrooms, focusing on goal-oriented tasks like making beds and folding laundry. This contrasts with the black-box approaches common among industry competitors, such as NVIDIA's GR00T-N2-2B (trained on 6 million samples across 1,024 H100 GPUs) or Physical Intelligence's pi-zero: Ai2 provides full transparency by releasing the code, weights, and evaluation metrics as open source.

Key innovations include pre-execution trajectory visualization, which lets users review a planned motion and modify it through natural-language instructions or touchscreen adjustments before the robot moves. On SimPLER benchmark tests simulating common household tasks, MolmoAct achieved a 72.1% success rate, outperforming models from Physical Intelligence, Google, Microsoft, and NVIDIA.

CEO Ali Farhadi emphasized that the team is "not just releasing a model, but laying the foundation for AI's new era in physical world applications," with Krishna adding, "Our mission is real-world deployment - anyone can download and customize our model for specific purposes." This open approach addresses the industry's transparency problem by avoiding the opaque, end-to-end black-box designs typical of commercial solutions. On the infrastructure side, pre-training used 256 NVIDIA H100 GPUs and completed in about a day, while fine-tuning took roughly two hours on 64 GPUs, a notably more efficient training methodology than existing solutions.
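To make the perceive-plan-act flow described above concrete, here is a minimal Python sketch of that kind of loop. It is purely illustrative: the function names, data structures, and placeholder logic are assumptions for this example and are not taken from MolmoAct's actual code or API.

```python
# Illustrative sketch of an action-reasoning loop in the spirit of what the
# article describes: perceive -> build a 3D representation -> plan a
# trajectory -> execute. All names and values here are hypothetical.

from dataclasses import dataclass
from typing import List, Tuple

Waypoint = Tuple[float, float, float]  # x, y, z in the robot's workspace (metres)


@dataclass
class SpatialPlan:
    """A planned motion: the instruction plus the waypoints to visit."""
    instruction: str
    waypoints: List[Waypoint]


def perceive(image_tokens: List[int]) -> List[Waypoint]:
    """Stand-in for perception: turn image features into a sparse 3D scene."""
    # Placeholder keypoints; a real model would regress depth-aware scene tokens.
    return [(0.10, 0.20, 0.05), (0.25, 0.20, 0.05), (0.40, 0.35, 0.10)]


def plan(instruction: str, scene_points: List[Waypoint]) -> SpatialPlan:
    """Stand-in for reasoning: produce a trajectory the user can inspect first."""
    # Placeholder plan: visit the detected keypoints in order.
    return SpatialPlan(instruction=instruction, waypoints=scene_points)


def execute(motion: SpatialPlan) -> None:
    """Stand-in for the low-level controller that follows the trajectory."""
    for i, (x, y, z) in enumerate(motion.waypoints):
        print(f"step {i}: move end-effector to ({x:.2f}, {y:.2f}, {z:.2f})")


if __name__ == "__main__":
    scene = perceive(image_tokens=[])        # perceive the environment
    motion = plan("fold the towel", scene)   # reason and plan before acting
    # A user could review and edit motion.waypoints here, mirroring the
    # natural-language or touchscreen adjustments the article mentions.
    execute(motion)                          # only then act
```

The point of the sketch is the ordering: the trajectory exists as an inspectable object before any motor command is issued, which is what makes pre-execution review and editing possible.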
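Since the release is open, downloading the weights would presumably follow the usual Hugging Face workflow. The snippet below is a sketch under that assumption; the repository id is a placeholder, and the exact id, processor interface, and prompting format should be taken from Ai2's published model card rather than from this example.

```python
# Minimal sketch of pulling open weights with the standard transformers API.
# The model id below is a placeholder, not a confirmed repository name.

import torch
from transformers import AutoModelForCausalLM, AutoProcessor

MODEL_ID = "allenai/MolmoAct-7B"  # placeholder; check the actual release

processor = AutoProcessor.from_pretrained(MODEL_ID, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    trust_remote_code=True,    # custom model code ships with the checkpoint
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

print(f"Loaded {sum(p.numel() for p in model.parameters()) / 1e9:.1f}B parameters")
```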