DeepSeek has introduced a new experimental model called V3.2-exp aimed at significantly reducing inference costs during long-context operations. The announcement was made via a post on Hugging Face, where the team also shared a link to the corresponding academic paper hosted on GitHub.
The central feature of the new model is DeepSeek Sparse Attention. At its core, a component called the "Lightning Indexer" prioritizes specific excerpts within the context window. A second mechanism, the "Fine-Grained Token Selection System," then picks individual tokens from those excerpts to load into the model's limited attention window. Together, the two stages let the sparse attention model operate over long contexts with comparatively low server overhead; a simplified sketch of the idea follows.
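To make the mechanism concrete, here is a minimal, hypothetical sketch of indexer-guided sparse attention in PyTorch. The function and parameter names (sparse_attention_sketch, index_q, index_k, top_k) are illustrative assumptions rather than DeepSeek's actual implementation: a cheap indexer scores every cached token against each query, and full attention is then computed only over the top-scoring tokens.

```python
import torch

def sparse_attention_sketch(q, k, v, index_q, index_k, top_k=256):
    """
    Illustrative sketch of indexer-guided sparse attention (not DeepSeek's code).
    A lightweight indexer scores every context token per query, and only the
    top-k highest-scoring tokens participate in full attention. Causal masking
    is omitted for brevity.

    q, k, v:           [seq_len, d_model]  full-precision attention inputs
    index_q, index_k:  [seq_len, d_index]  cheap, low-dimensional indexer projections
    """
    seq_len, d_model = k.shape
    top_k = min(top_k, seq_len)

    # 1) Lightweight indexer pass: score each context token against each query token.
    #    This stands in for the role the "Lightning Indexer" plays in the paper.
    index_scores = index_q @ index_k.T                    # [seq_len, seq_len]

    # 2) Fine-grained token selection: keep only the top-k tokens per query.
    selected = index_scores.topk(top_k, dim=-1).indices   # [seq_len, top_k]

    # 3) Full attention restricted to the selected tokens.
    k_sel = k[selected]                                   # [seq_len, top_k, d_model]
    v_sel = v[selected]                                   # [seq_len, top_k, d_model]
    attn = (q.unsqueeze(1) @ k_sel.transpose(1, 2)) / d_model**0.5
    weights = attn.softmax(dim=-1)                        # [seq_len, 1, top_k]
    return (weights @ v_sel).squeeze(1)                   # [seq_len, d_model]

# Toy usage: an 8k-token context, with attention restricted to 256 tokens per query.
q = k = v = torch.randn(8192, 64)
iq = ik = torch.randn(8192, 16)
out = sparse_attention_sketch(q, k, v, iq, ik, top_k=256)
```

Under these assumptions, the expensive attention step scales with top_k rather than with the full context length, which is broadly the kind of long-context saving the preliminary tests describe.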
The benefits of this system are particularly evident in long-context scenarios. Early tests by DeepSeek revealed that the cost of a simple API call could be reduced by up to 50%. While further validation is needed to confirm these results, the model’s open-weight release on Hugging Face means third-party evaluations of the paper’s claims should emerge soon.
This latest advancement is part of a broader wave of innovations targeting inference costs—the server expenses associated with running pre-trained AI models, distinct from training costs. In DeepSeek’s case, researchers have been exploring ways to make the fundamental transformer architecture more efficient, achieving notable success.
Based in China, DeepSeek has played an unconventional role in the AI boom, especially for those viewing AI research as a key front in the U.S.-China tech rivalry. Earlier this year, the company gained attention for its R1 model, which was primarily trained using reinforcement learning at a fraction of the cost of U.S. counterparts. However, the model did not spark the widespread revolution in AI training that some had anticipated, and the company has since stepped back from the spotlight.
While the new “sparse attention” approach may not generate the same level of excitement as R1, it could still offer valuable insights to U.S.-based providers looking to keep inference costs under control.