AWS Expands Nova Foundation Model with Enhanced Multimodal Support

2025-12-03

Amazon Web Services (AWS) has unveiled four new AI models alongside the launch of Nova Forge, a platform for creating customized versions of the Nova foundation models. The new models expand its generative AI offerings in multimodal reasoning, speech processing, and UI automation.

Announced today at AWS’s re:Invent conference in Las Vegas, these new additions to the Nova family are each tailored for distinct levels of reasoning complexity and multimodal capabilities.

From Foundational to Advanced

Nova 2 Lite is positioned as a cost-efficient inference model optimized for everyday workloads. It processes text, images, and video to generate textual outputs for applications such as customer service chatbots, document analysis, and business automation. Users can adjust the depth of step-by-step reasoning the model performs, allowing fine-grained trade-offs between latency and accuracy. Lite also features built-in web grounding and code execution, enabling it to incorporate up-to-date information into its responses.

AWS describes Nova 2 Pro as its most powerful reasoning model to date. Supporting inputs in text, image, video, and speech, it’s engineered for advanced tasks requiring long-horizon planning, intricate instructions, or agent-based coding. Like Lite, Pro includes web search and code execution capabilities. Additionally, it can serve as a “teacher” model for distillation, helping customers develop smaller, specialized variants for specific use cases.

Nova 2 Sonic is a speech-to-speech model that unifies understanding and generation across both text and voice. It enables real-time, multilingual conversational interactions while running background tasks asynchronously. With a context window of 1 million tokens (equivalent to roughly 75,000 lines of code or 1,500 pages of text), Sonic is built for interactive voice systems and integrates with Amazon Connect, telephony partners, and conversational AI frameworks.
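To put the context-window equivalence in perspective, the short Python sketch below works backward from the three figures quoted above (1 million tokens, 75,000 lines of code, 1,500 pages of text) to the per-line and per-page token densities they imply. The helper name is illustrative, and the resulting ratios are simple arithmetic on the article's round numbers, not figures published by AWS.

```python
# Figures quoted in the article; everything derived from them is an estimate.
CONTEXT_TOKENS = 1_000_000
LINES_OF_CODE = 75_000
PAGES_OF_TEXT = 1_500


def tokens_per_unit(total_tokens: int, units: int) -> float:
    """Average number of tokens implied for each unit (line or page)."""
    return total_tokens / units


tokens_per_line = tokens_per_unit(CONTEXT_TOKENS, LINES_OF_CODE)
tokens_per_page = tokens_per_unit(CONTEXT_TOKENS, PAGES_OF_TEXT)

print(f"{tokens_per_line:.1f} tokens per line of code")   # 13.3
print(f"{tokens_per_page:.0f} tokens per page of text")   # 667
```

In other words, the equivalence assumes about 13 tokens per line of code and about 667 tokens per page; actual counts depend on the tokenizer and the content.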

Nova 2 Omni is the first Nova model purpose-built for fully multimodal generation. It accepts text, images, video, and speech as input and can produce both text and images as output. Designed to handle large volumes of mixed-media content, such as lengthy documents, videos, and audio files, within a single workflow, Omni eliminates the need to chain together multiple specialized models. For instance, it can ingest an entire product catalog and generate multi-channel marketing campaigns from the source material.

The new Nova 2 models are now generally available. Developers can prototype applications using Nova tools at nova.amazon.com/dev, while enterprises can deploy the models on Amazon Bedrock with standard enterprise-grade security, privacy, and scalability controls.