DeepSeek releases 'Sparse Attention' model that cuts API costs by 50%


Researchers at DeepSeek have introduced a new experimental AI model, V3.2-exp, designed to dramatically reduce inference costs during long-context operations. The announcement was made Monday via a post on Hugging Face, accompanied by a linked academic paper published on GitHub.

A New Take on Attention

The standout feature of V3.2-exp is DeepSeek Sparse Attention, a novel system built to optimize how transformer models handle long sequences of text.

The process has two major components:

  • Lightning Indexer – prioritizes which excerpts from the context window deserve focus.
  • Fine-Grained Token Selection System – further narrows down the important tokens from those excerpts to feed into the model’s limited attention window.

By combining these steps, the model can handle much larger chunks of context without requiring proportionally higher compute power.
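To make the two-stage idea concrete, here is a minimal, hypothetical sketch of how an indexer-plus-token-selection pass might look. This is not DeepSeek's implementation; the block size, the mean-key scoring proxy, the top-k budgets, and the function name are all illustrative assumptions.

```python
# Illustrative two-stage sparse attention sketch (NOT DeepSeek's actual code).
# Stage 1: a cheap "indexer" scores fixed-size blocks (excerpts) of the context.
# Stage 2: tokens from the top-scoring blocks are re-scored, and only the best
#          few hundred are passed to full attention.
import torch
import torch.nn.functional as F

def two_stage_sparse_attention(q, k, v, block_size=64, top_blocks=4, top_tokens=128):
    """q: (d,) query; k, v: (seq_len, d). Returns one attended vector (d,)."""
    seq_len, d = k.shape

    # Stage 1: score each block with a cheap proxy (mean key of the block).
    n_blocks = (seq_len + block_size - 1) // block_size
    pad = n_blocks * block_size - seq_len
    k_padded = F.pad(k, (0, 0, 0, pad))
    block_keys = k_padded.view(n_blocks, block_size, d).mean(dim=1)
    keep_blocks = (block_keys @ q).topk(min(top_blocks, n_blocks)).indices

    # Collect candidate token positions from the selected blocks.
    offsets = torch.arange(block_size)
    cand = (keep_blocks[:, None] * block_size + offsets).flatten()
    cand = cand[cand < seq_len]  # drop padded positions

    # Stage 2: fine-grained selection of the strongest tokens among candidates.
    token_scores = k[cand] @ q
    idx = cand[token_scores.topk(min(top_tokens, cand.numel())).indices]

    # Full attention restricted to the selected tokens only.
    attn = F.softmax((k[idx] @ q) / d**0.5, dim=0)
    return attn @ v[idx]

# Usage: attend over a 4096-token context while computing exact attention
# weights for at most top_tokens positions.
q = torch.randn(64)
k = torch.randn(4096, 64)
v = torch.randn(4096, 64)
print(two_stage_sparse_attention(q, k, v).shape)  # torch.Size([64])
```

The point of the sketch is the cost profile: the expensive softmax attention runs over a small, fixed token budget, so the work grows far more slowly than the full quadratic attention as the context gets longer.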

Big Savings on Inference

In preliminary tests, DeepSeek reported that the cost of a simple API call could be cut by up to 50% in long-context use cases. While these results are early, the company has released the model as open-weight on Hugging Face, allowing independent researchers to quickly test and validate its performance claims.

Inference costs — the server expenses involved in running a trained AI model — have become a major bottleneck for scaling generative AI systems. V3.2-exp represents one of several recent attempts to make the transformer architecture more efficient and cost-effective.

DeepSeek’s Unique Position

DeepSeek, a China-based research company, has drawn global attention for its unconventional approach to AI development. Earlier this year, its R1 model, trained primarily through reinforcement learning at a fraction of the cost of Western competitors' models, stirred excitement about low-cost AI training. Despite the hype, R1 did not trigger the industry-wide shift many expected, and DeepSeek has since receded from the global spotlight.

The new sparse attention technique may not generate the same level of fanfare, but it underscores DeepSeek's ongoing role as an innovator in cost reduction. If widely adopted, the approach could influence providers in the U.S. and elsewhere that are grappling with the rising cost of deploying large AI models.
