
DeepSeek has recently completed a full-stack performance upgrade that significantly improves response speed and service capacity. The default concurrency limit has been raised to 500 concurrent real-time requests, which markedly reduces end-to-end latency and improves service stability and availability. For high-load workloads, enterprise users can request higher concurrency quotas through official channels as needed, so capacity can keep pace with growing inference demand.
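On the client side, one way to stay within the default quota is to cap the number of requests in flight at once. The sketch below uses Python's `asyncio.Semaphore` for this. It assumes the 500-request limit from the announcement applies per account. `call_api` is a hypothetical placeholder that only simulates latency. It is not a real DeepSeek client or endpoint.

```python
import asyncio

# Assumption: default concurrency quota as stated in the announcement.
MAX_CONCURRENCY = 500


async def call_api(prompt: str) -> str:
    """Hypothetical stand-in for a real API request; only simulates latency."""
    await asyncio.sleep(0.01)
    return f"response to {prompt!r}"


async def bounded_call(sem: asyncio.Semaphore, prompt: str, stats: dict) -> str:
    # Wait for a free slot so no more than `limit` requests are ever in flight.
    async with sem:
        stats["in_flight"] += 1
        stats["peak"] = max(stats["peak"], stats["in_flight"])
        try:
            return await call_api(prompt)
        finally:
            stats["in_flight"] -= 1


async def run_batch(prompts, limit: int = MAX_CONCURRENCY):
    """Send all prompts, capping concurrency; returns (results, peak in-flight count)."""
    sem = asyncio.Semaphore(limit)
    stats = {"in_flight": 0, "peak": 0}
    # gather() preserves input order in its results.
    results = await asyncio.gather(*(bounded_call(sem, p, stats) for p in prompts))
    return results, stats["peak"]


if __name__ == "__main__":
    prompts = [f"q{i}" for i in range(2000)]
    results, peak = asyncio.run(run_batch(prompts))
    print(len(results), "responses; peak concurrency:", peak)
```

A semaphore limits requests on the client side, so bursts never exceed the quota. Any server-side rate-limit errors would still need their own retry handling, which this sketch leaves out.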
At the same time, DeepSeek is adjusting the long-term pricing of the deepseek-v4-pro model API. The current 75% promotional discount expires on May 31, 2026. After that, a new tiered, affordable pricing structure takes effect: cache-hit input costs RMB 0.025 per million tokens (a 75% reduction), cache-miss input costs RMB 3 per million tokens (also a 75% reduction), and output costs RMB 6 per million tokens (a 75% reduction). At these rates, the API remains highly competitive with other providers in the industry.
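To see how the three price components combine, here is a minimal cost estimator. It uses the per-million-token prices above. It assumes billing is strictly linear in token count, with no minimum charge or special rounding. The function name `estimate_cost_rmb` is hypothetical. The token counts would come from your own usage records.

```python
from decimal import Decimal

# Prices for deepseek-v4-pro after May 31, 2026, as stated in the announcement,
# in RMB per million tokens. Decimal avoids binary floating-point rounding on money.
PRICE_CACHE_HIT_INPUT = Decimal("0.025")
PRICE_CACHE_MISS_INPUT = Decimal("3")
PRICE_OUTPUT = Decimal("6")
TOKENS_PER_UNIT = Decimal(1_000_000)


def estimate_cost_rmb(cache_hit_tokens: int, cache_miss_tokens: int, output_tokens: int) -> Decimal:
    """Hypothetical helper: estimated cost in RMB, assuming linear per-token billing."""
    for n in (cache_hit_tokens, cache_miss_tokens, output_tokens):
        if n < 0:
            raise ValueError("token counts must be non-negative")
    total = (
        cache_hit_tokens * PRICE_CACHE_HIT_INPUT
        + cache_miss_tokens * PRICE_CACHE_MISS_INPUT
        + output_tokens * PRICE_OUTPUT
    )
    return total / TOKENS_PER_UNIT


if __name__ == "__main__":
    # 2M cache-hit input, 1M cache-miss input, 0.5M output tokens.
    cost = estimate_cost_rmb(2_000_000, 1_000_000, 500_000)
    print(f"RMB {cost:.2f}")
```

For the example workload, the estimate is 2 × 0.025 + 1 × 3 + 0.5 × 6 = RMB 6.05. Cache-miss input and output account for almost all of that. At these prices, a higher cache-hit rate is the main lever for reducing input cost.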
With higher performance and lower costs, the DeepSeek API is now competitive on both fronts in key areas such as high throughput, low latency, and scalable deployment. It is particularly well suited to enterprise applications that need fast responses and high call volumes, such as real-time financial risk control, intelligent e-commerce customer service, and multimodal content generation. The upgrade strengthens DeepSeek's technical delivery capabilities. It also signals that DeepSeek will keep deepening its commitment to enterprise services, helping businesses adopt AI at lower cost and with greater efficiency. Detailed configuration instructions and the application portal are now available on the official platform.