Tencent, in collaboration with Renmin University of China, has unveiled PlanningBench, an open-source evaluation benchmark that focuses on the systematic assessment of large models’ capabilities in complex task planning

AI 06.08.26

Recently, Tencent’s Hunyuan team, in collaboration with Renmin University of China’s Gaoling School of Artificial Intelligence and several other research institutions, officially released and open-sourced PlanningBench, a new framework for evaluating and training planning capabilities. Anchored in real-world planning problems, the framework establishes a data-generation and evaluation system that is scalable, verifiable, and diverse in its tasks, aiming to systematically measure and improve the structured decision-making of large language models under complex constraints.

Moving beyond traditional single-task evaluations, PlanningBench is the first to fully cover six core planning scenarios: schedule planning, resource allocation, workforce scheduling, route optimization, production management, and emergency response, which together encompass more than 30 sub‑tasks. Its data-generation mechanism does not raise difficulty simply by lengthening prompts; instead, it adjusts difficulty dynamically along structural dimensions such as task topology, multi‑layer constraint coupling, and the degree of tension between resource supply and demand, so that each sample targets a real‑world planning bottleneck. Each instance comes with a structured checklist that validates model outputs at three levels (input consistency, constraint satisfaction, and objective optimality) to identify feasibility issues.

Most notably, the framework introduces a dual-track evaluation paradigm, “local compliance–global feasibility,” which pinpoints typical failure modes such as “steps are correct but overall conflicts persist” or “resource allocation is reasonable yet impractical.” This makes it considerably easier to diagnose a model’s underlying planning logic. Empirical results show that after reinforcement training on verifiable data generated by PlanningBench, models not only perform markedly better on unseen planning benchmarks but also show cross-domain transfer gains in general reasoning and multi‑step tasks. PlanningBench thus forms a complete closed loop (scenario-driven design, data generation, verifiable training, and generalization evaluation), providing a solid foundation for rigorously assessing and efficiently advancing large models’ planning capabilities.
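The difference between local compliance and global feasibility can be illustrated with a toy example. The sketch below is not PlanningBench code: the `Step` type, the four-hour per-step limit, and both checker functions are hypothetical, chosen only to show how a plan whose individual steps all pass can still fail as a whole.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    task: str
    worker: str
    start: int  # start hour, inclusive
    end: int    # end hour, exclusive


def local_compliance(step, max_hours=4):
    """Check one step in isolation: valid interval, within an assumed duration cap."""
    return step.start < step.end and (step.end - step.start) <= max_hours


def global_feasibility(plan):
    """Check the plan as a whole: no worker is booked on two overlapping steps."""
    by_worker = {}
    for step in plan:
        by_worker.setdefault(step.worker, []).append(step)
    for steps in by_worker.values():
        steps.sort(key=lambda s: s.start)
        # After sorting by start time, any overlap shows up between neighbours.
        for earlier, later in zip(steps, steps[1:]):
            if later.start < earlier.end:
                return False
    return True


def evaluate(plan):
    """Report both tracks separately so the failure mode is visible."""
    return {
        "local": all(local_compliance(s) for s in plan),
        "global": global_feasibility(plan),
    }


plan = [
    Step("assemble", "alice", 9, 12),
    Step("inspect", "alice", 11, 13),  # each step is fine alone, but this overlaps "assemble"
    Step("pack", "bob", 9, 11),
]
result = evaluate(plan)
print(result)
```

Here every step passes the local check, yet the plan fails globally because one worker is double-booked, which is the “steps are correct but overall conflicts persist” pattern described above.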

The rumored M6‑based MacBook Pro may, for the first time, feature 5G cellular connectivity

According to multiple supply-chain sources, Apple is accelerating preparations for mass production of the M6‑series MacBook Pro, which is expected to launch o

06.15.26 0

Samsung may equip its widescreen foldable phone with an innovative hinge technology similar to the rumored iPhone Fold, designed to significantly reduce screen creases and enhance overall device reliability

According to the latest report from Korean media outlet ZDNet Korea, Samsung is exploring a more robust display approach for its next-generation vertically fold

06.15.26 1

MIT has developed an innovative dual-mode propulsion system, specifically designed for deep-space missions with microsatellites, achieving, for the first time, a breakthrough in simultaneously enhancing both propulsion performance and energy efficiency at

The Massachusetts Institute of Technology (MIT) aerospace team is breaking through bottlenecks in micro‑ and nano‑satellite propulsion technology, developing

06.15.26 1

Ukraine has, for the first time, deployed “Terminator” AI drones to carry out autonomous target identification and strike missions, successfully killing Russian frontline personnel

According to an exclusive report by New Scientist, Ukraine’s military application of artificial intelligence has reached a historic turning point: in 2024, du

06.15.26 1

Apple has abandoned the Face ID solution for the iPhone Ultra, opting instead for an in‑display side-mounted fingerprint sensor—a groundbreaking design that tech bloggers have hailed as “like a dream”

A tech blogger exclaimed on social media: “Apple has truly turned the $2,000 iPhone Ultra into reality: Touch ID is back on the power button, and Face ID is c

06.15.26 0

Attorneys general from multiple US states have launched a joint investigation into OpenAI, focusing on its advertising practices and measures to safeguard content for minors

A multi-state joint investigative team, composed of attorneys general from several states, has officially launched a formal review of the artificial intelligen

06.15.26 1

The Samsung Galaxy S27 and Xiaomi’s Mi 18 series unexpectedly appeared in the same global certification database, sparking speculation within the industry about potential adjustments to Xiaomi’s product roadmap

Although several months remain before their official unveiling, the next‑generation flagship models from Samsung and Xiaomi have quietly appeared in global ce

06.15.26 1

Apple is secretly developing a brand-new, in-house camera app, which could make its debut alongside the iPhone 18 Pro lineup

At WWDC 2026, while Apple unveiled major system updates like iOS 27, it deliberately held back a key imaging feature: a completely redesigned camera app that o

06.15.26 0

Lenovo has officially launched the new Yoga Pro 7 15-inch laptop, which globally debuts support for dynamic video memory allocation technology, allowing up to 96 GB of system memory to be flexibly allocated as video memory

Lenovo has officially unveiled its new flagship Yoga Pro 7 15ASH11 laptop, built around the AMD Strix Halo platform to redefine the high-end mobile creative ex

06.15.26 1

Huawei’s FreeClip 2 Collector’s Edition has been officially launched, debuting with the HarmonyOS 6 operating system

At 10:08 on June 15, Huawei Terminal officially launched the FreeClip 2 Collector’s Edition earclip-style headphones, with a launch price of 1,499 yuan. This

06.15.26 0

The OnePlus Turbo 6X series has officially launched, pre-installed with the all-new ColorOS 16, delivering a smooth user experience that lasts for six years

On June 15, OnePlus officially launched its brand-new Turbo 6X series and simultaneously kicked off its first-ever omnichannel sales. The lineup includes two f

06.15.26 0

Key specs of the OPPO Find X10 Pro have surfaced: it will debut MediaTek’s flagship chip built on TSMC’s 2nm process, and its imaging system features a dual 200-megapixel ultra‑clear main camera

Recently, the specifications of a flagship Dimensity engineering prototype built on the cutting-edge 2nm process were unexpectedly leaked, drawing widespread a

06.15.26 0

At present, Siri’s intelligent interaction capabilities are roughly on par with the technological level of mainstream AI chatbots from six months ago

Apple’s AI strategy has reached a pivotal turning point: the all-new Siri is redefining the boundaries of smart assistants with its contextual understanding a

06.15.26 1

Goldman Sachs’ latest research report indicates that the market’s current assessment of the true demand for artificial intelligence is markedly conservative, while corporate‑level AI investment continues to gain momentum. The report forecasts that global t

Goldman Sachs’ latest research report points out that the market has significantly misjudged the pace of AI infrastructure expansion: far from peaking, the wa

06.15.26 1

Oracle has announced adjustments to the resource quotas of its permanently free cloud service tiers: once the limits are exceeded, the services will be automatically suspended, and any usage beyond the quota will be billed on a pay-as-you-go basis

Oracle Cloud recently announced an official update to its free-tier policy, stating that starting June 15, 2026, the permanently free Arm‑based plan for globa

06.15.26 0