Tencent Open-Sources OpenSearch-VL, Breaking Through Bottlenecks in Training Multimodal Search AI Agents

AI 05.08.26

On May 8, it was reported that Tencent Hunyuan, together with the University of California, Los Angeles (UCLA), the Chinese University of Hong Kong, and other institutions, has released OpenSearch-VL, an open-source multimodal training framework that uses reinforcement learning (RL) to build state-of-the-art deep search agents.

Multimodal search agents are systems that take inputs in several modalities, such as images and text, and actively call external tools (for example, search engines and image-processing utilities) to carry out multi-step reasoning, evidence verification, and knowledge retrieval. Their goal is to answer complex, knowledge-intensive visual questions. The paper, published on arXiv on May 6, introduces the OpenSearch-VL framework for training such agents. The research team built a data pipeline that uses Wikipedia path sampling and fuzzy entity rewriting to reduce retrieval shortcuts, producing datasets such as SearchVL-SFT-36k.

The research team notes that the main bottleneck holding back state-of-the-art multimodal search agents is the lack of high-quality training data. Most leading systems today are built by commercial companies whose data sources, filtering criteria, and tool-use logs are proprietary, which makes it hard to reproduce their capabilities or study them systematically. To address this, the study proposes OpenSearch-VL, an end-to-end open-source solution spanning data, tools, and training algorithms.

To build the data pipeline, OpenSearch-VL samples multi-hop entity paths from Wikipedia's hyperlink graph, rewrites the intermediate entities as fuzzy descriptions, and grounds the anchor entities in source images. This discourages single-step retrieval shortcuts and pushes the agent to learn multi-hop search and reasoning.
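The path-sampling and fuzzy-rewriting idea described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' pipeline: the toy graph `LINKS`, the helpers `sample_path` and `fuzzify`, the description table, and the choice to keep both endpoints of the path unchanged are all hypothetical.

```python
import random

# Toy hyperlink graph (hypothetical data): page title -> titles it links to.
LINKS = {
    "Eiffel Tower": ["Gustave Eiffel", "Paris"],
    "Gustave Eiffel": ["Statue of Liberty", "Dijon"],
    "Paris": ["Seine", "France"],
    "Statue of Liberty": ["New York City"],
    "Dijon": ["France"],
    "Seine": [],
    "France": [],
    "New York City": [],
}


def sample_path(graph, start, hops, rng):
    """Random walk of up to `hops` links from `start`, never revisiting a page."""
    path = [start]
    for _ in range(hops):
        candidates = [t for t in graph.get(path[-1], []) if t not in path]
        if not candidates:
            break  # dead end: return the shorter path
        path.append(rng.choice(candidates))
    return path


def fuzzify(path, descriptions):
    """Replace intermediate entities with vague descriptions (assumed design:
    the first and last entities are left unchanged)."""
    if len(path) < 3:
        return list(path)
    middle = [descriptions.get(e, "a related entity") for e in path[1:-1]]
    return [path[0]] + middle + [path[-1]]


rng = random.Random(0)
path = sample_path(LINKS, "Eiffel Tower", hops=2, rng=rng)
hints = {
    "Gustave Eiffel": "the engineer whose company built it",
    "Paris": "the city where it stands",
}
fuzzy_path = fuzzify(path, hints)
```

Because the intermediate entity is no longer named, a single keyword search on the question cannot jump straight to the final entity; the agent has to resolve the vague description first.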

The pipeline produces the SearchVL-SFT-36k dataset for supervised fine-tuning, in which each trajectory averages 6.3 tool calls. In addition, a random 10% of the data is deliberately degraded, for example by blurring or downsampling, and paired with image-enhancement tools so that the agent learns to "think while processing images."
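A minimal sketch of the random degradation step, assuming grayscale images stored as nested lists so that no imaging library is needed. The function names, the 3x3 box blur, and the stride-based downsampling are illustrative choices; the article does not specify the actual operators.

```python
import random


def downsample(img, factor=2):
    """Keep every `factor`-th pixel in both directions."""
    return [row[::factor] for row in img[::factor]]


def box_blur(img):
    """Average each pixel with its neighbours in a 3x3 window (clipped at edges)."""
    h, w = len(img), len(img[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            vals = [
                img[j][i]
                for j in range(max(0, y - 1), min(h, y + 2))
                for i in range(max(0, x - 1), min(w, x + 2))
            ]
            row.append(sum(vals) / len(vals))
        out.append(row)
    return out


def maybe_degrade(img, rng, rate=0.10):
    """With probability `rate`, apply one randomly chosen degradation.

    Returns (image, name_of_degradation_or_None).
    """
    if rng.random() >= rate:
        return img, None
    name, fn = rng.choice([("blur", box_blur), ("downsample", downsample)])
    return fn(img), name


rng = random.Random(42)
image = [[float(x + 4 * y) for x in range(4)] for y in range(4)]
degraded, applied = maybe_degrade(image, rng, rate=0.10)
```

Recording which degradation was applied makes it possible to pair the sample with the matching repair tool (for instance, super-resolution for a downsampled image) when building a training trajectory.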

The tool environment goes beyond that of a simple retrieval agent, integrating text search, image search, OCR, cropping, sharpening, super-resolution, and perspective correction. The agent can therefore clean up blurry, low-resolution, or skewed visual inputs before querying external knowledge, combining active perception with knowledge acquisition.
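One common way to expose such a mixed toolset to an agent is a name-to-function registry. The sketch below is an assumption about how this could look, not OpenSearch-VL's actual interface: `TOOLS`, `tool`, and `call_tool` are hypothetical names, `crop` works on nested lists, and `text_search` is a stub that returns a placeholder instead of calling a real search engine.

```python
from typing import Callable, Dict

# Hypothetical registry: tool name -> Python callable.
TOOLS: Dict[str, Callable] = {}


def tool(name):
    """Decorator that registers a function under `name`."""
    def register(fn):
        TOOLS[name] = fn
        return fn
    return register


@tool("crop")
def crop(image, box):
    """Crop a nested-list image; box = (left, top, right, bottom), exclusive ends."""
    left, top, right, bottom = box
    return [row[left:right] for row in image[top:bottom]]


@tool("text_search")
def text_search(query):
    """Stub: a real tool would query a search engine here."""
    return [f"(placeholder result for: {query})"]


def call_tool(name, **kwargs):
    """Dispatch a tool call emitted by the agent, rejecting unknown names."""
    if name not in TOOLS:
        raise ValueError(f"unknown tool: {name}")
    return TOOLS[name](**kwargs)


# Perceive first, then retrieve: crop the region of interest, then search.
image = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
patch = call_tool("crop", image=image, box=(1, 0, 3, 2))
hits = call_tool("text_search", query="landmark shown in the cropped region")
```

The remaining tools named above (image search, OCR, sharpening, super-resolution, perspective correction) would be registered the same way, so the agent sees perception and retrieval tools through one uniform call format.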

Experiments show that the OpenSearch-VL-30B-A3B model raises the baseline's average score from 47.8 to 61.6, a gain of 13.8 points, with substantial improvements on benchmarks such as VDR and MMSearch. Ablation studies confirm that each component contributes: removing source anchoring, fuzzy rewriting, or staged filtering lowers the average score by 8.2 to 11.5 points.
