
According to a recent report from the tech outlet TestingCatalog, OpenAI is accelerating development of a new bidirectional voice model codenamed “gpt-bidi-1,” which the report describes as potentially the most significant advance in ChatGPT’s voice interaction capabilities since its inception.
Unlike traditional unidirectional voice systems, which follow a linear “listen–pause–speak” sequence, gpt-bidi-1 is said to be bidirectional at the architectural level: speech understanding and speech generation run concurrently within the same millisecond-scale time window. As a result, the system can recognize user intent even when the user interrupts, corrects, or modifies a request mid-conversation, and it can restructure its response on the fly without breaking the conversational flow, which should make voice interactions feel more natural and more robust. The project was launched in early 2026. Its core breakthrough is a closed-loop speech processing system that perceives continuously, reasons in real time, and switches seamlessly between listening and speaking, removing the reliance on predefined turn boundaries.
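The difference from a turn-based design can be illustrated with a toy simulation. The sketch below is not OpenAI's implementation, and nothing in it comes from the report; the names `plan_reply` and `run_duplex`, the word-per-tick timing, and the booking scenario are all hypothetical. It only shows the general idea of a loop that keeps listening while a reply is in progress and replans when the user cuts in.

```python
from typing import List, Optional, Tuple


def plan_reply(heard: List[str]) -> List[str]:
    # Hypothetical stand-in for the model: answer using the latest word heard.
    return ["booking", "for", heard[-1]]


def run_duplex(user_stream: List[Optional[str]]) -> Tuple[List[Tuple[int, str]], int]:
    """Simulate one conversation; index = tick, value = user word or None (silence).

    Returns the words the agent spoke as (tick, word) pairs and the number of
    times the user interrupted a reply that was still in progress.
    """
    heard: List[str] = []      # everything perceived so far
    pending: List[str] = []    # words of the current reply not yet spoken
    spoken: List[Tuple[int, str]] = []
    needs_reply = False
    barge_ins = 0

    for tick, word in enumerate(user_stream):
        # Perception runs on every tick, even while a reply is in flight.
        if word is not None:
            heard.append(word)
            if pending:
                pending.clear()    # barge-in: drop the stale remainder
                barge_ins += 1
            needs_reply = True
            continue               # yield the floor while the user talks

        # Silence: replan if new input arrived, then emit one word per tick.
        if needs_reply:
            pending = plan_reply(heard)
            needs_reply = False
        if pending:
            spoken.append((tick, pending.pop(0)))

    return spoken, barge_ins


# The user asks for two, then corrects to four while the agent is mid-reply.
stream = ["table", "for", "two", None, None, "no", "four", None, None, None]
spoken, barge_ins = run_duplex(stream)
print(spoken, barge_ins)
```

In this run the agent starts answering for "two", is interrupted before it says the number, and restarts with "four". A listen–pause–speak system would instead finish the outdated reply before it could take in the correction.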
On the infrastructure front, OpenAI has reportedly completed low-level adaptation for web and mobile platforms, and the relevant modules are already integrated into the current main codebase. At launch, the new model will coexist with the existing advanced voice features as a separate voice mode called “bidi (latest),” and users will be able to switch between the two with a single click. More notably, gpt-bidi-1 introduces a three-tier response strategy, described as a first for voice: “high depth” focuses on complex reasoning and contextual refinement, “medium balance” trades off accuracy against efficiency, and “instantaneous” is optimized for low-latency scenarios. The adjustable tiers give users direct control over how much speed they trade for depth in a conversation.
The upgrade goes well beyond improvements in sound quality or tone; it fills a key gap in OpenAI’s multimodal capabilities. The text-based GPT-5.5 is already firmly established as a leader in logical reasoning and long-range planning, and gpt-bidi-1 is aimed at the remaining weak point, the longstanding inference latency of the voice modality. Its deployment would mark OpenAI’s formal elevation of voice from a supplementary channel to a strategic, native entry point for interaction, closing gaps in the multimodal experience and paving the way for voice-first devices, enterprise-grade real-time voice intelligence platforms, and immersive AI interaction ecosystems.