DeepSeek releases 'Sparse Attention' model that cuts API costs by 50%


Researchers at DeepSeek have introduced a new experimental AI model, V3.2-exp, designed to dramatically reduce inference costs during long-context operations. The announcement was made Monday via a post on Hugging Face, accompanied by a linked academic paper published on GitHub.

A New Take on Attention

The standout feature of V3.2-exp is DeepSeek Sparse Attention, a novel system built to optimize how transformer models handle long sequences of text.

The process has two major components:

  • Lightning Indexer – prioritizes which excerpts from the context window deserve focus.
  • Fine-Grained Token Selection System – further narrows down the important tokens from those excerpts to feed into the model’s limited attention window.

By combining these steps, the model can handle much larger chunks of context without requiring proportionally higher compute power.
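To make the two-stage idea concrete, here is a minimal, hypothetical sketch of how an indexer-plus-token-selection pass might look. This is not DeepSeek's implementation; the block size, the mean-key scoring proxy, the top-k budgets, and the function name are all illustrative assumptions.

```python
# Illustrative two-stage sparse attention sketch (NOT DeepSeek's actual code).
# Stage 1: a cheap "indexer" scores fixed-size blocks (excerpts) of the context.
# Stage 2: tokens from the top-scoring blocks are re-scored, and only the best
#          few hundred are passed to full attention.
import torch
import torch.nn.functional as F

def two_stage_sparse_attention(q, k, v, block_size=64, top_blocks=4, top_tokens=128):
    """q: (d,) query; k, v: (seq_len, d). Returns one attended vector (d,)."""
    seq_len, d = k.shape

    # Stage 1: score each block with a cheap proxy (mean key of the block).
    n_blocks = (seq_len + block_size - 1) // block_size
    pad = n_blocks * block_size - seq_len
    k_padded = F.pad(k, (0, 0, 0, pad))
    block_keys = k_padded.view(n_blocks, block_size, d).mean(dim=1)
    keep_blocks = (block_keys @ q).topk(min(top_blocks, n_blocks)).indices

    # Collect candidate token positions from the selected blocks.
    offsets = torch.arange(block_size)
    cand = (keep_blocks[:, None] * block_size + offsets).flatten()
    cand = cand[cand < seq_len]  # drop padded positions

    # Stage 2: fine-grained selection of the strongest tokens among candidates.
    token_scores = k[cand] @ q
    idx = cand[token_scores.topk(min(top_tokens, cand.numel())).indices]

    # Full attention restricted to the selected tokens only.
    attn = F.softmax((k[idx] @ q) / d**0.5, dim=0)
    return attn @ v[idx]

# Usage: attend over a 4096-token context while computing exact attention
# weights for at most top_tokens positions.
q = torch.randn(64)
k = torch.randn(4096, 64)
v = torch.randn(4096, 64)
print(two_stage_sparse_attention(q, k, v).shape)  # torch.Size([64])
```

The point of the sketch is the cost profile: the expensive softmax attention runs over a small, fixed token budget, so the work grows far more slowly than the full quadratic attention as the context gets longer.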

Big Savings on Inference

In preliminary tests, DeepSeek reported that the cost of a simple API call could be cut by up to 50% in long-context use cases. While these results are early, the company has released the model as open-weight on Hugging Face, allowing independent researchers to quickly test and validate its performance claims.

Inference costs — the server expenses involved in running a trained AI model — have become a major bottleneck for scaling generative AI systems. V3.2-exp represents one of several recent attempts to make the transformer architecture more efficient and cost-effective.

DeepSeek’s Unique Position

DeepSeek, a China-based research company, has drawn global attention for its unconventional approach to AI development. Earlier this year, its R1 model, trained primarily through reinforcement learning at a fraction of the cost of Western competitors' models, stirred excitement about low-cost AI training. Despite the hype, R1 did not trigger the industry-wide shift many expected, and DeepSeek has since receded from the global spotlight.

The new sparse attention technique may not generate the same level of fanfare, but it underscores DeepSeek's ongoing role as an innovator in cost reduction. If widely adopted, the approach could influence providers in the U.S. and elsewhere that are grappling with the rising cost of deploying large AI models.
