General-Purpose Text Optimization and a New Perspective on AI Safety

Published: 2026-05-21 06:05

LG - Machine Learning; CV - Computer Vision; CL - Computation and Language; AI - Artificial Intelligence

1. [CL] optimize_anything: A Universal API for Optimizing any Text Parameter
2. [LG] A Bitter Lesson for Data Filtering
3. [AI] Agent Security is a Systems Problem
4. [LG] Optimal Reconstruction from Linear Queries
5. [LG] Density-Ratio Losses for Post-Hoc Learning to Defer

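Summary: a universal API for optimizing text parameters in any setting; a bitter lesson for data filtering; agent security is fundamentally a systems engineering problem; optimal reconstruction from linear queries; density-ratio losses for post-hoc learning to defer.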

L A Agrawal, D Lee, S Tan, W Ma… [UC Berkeley]

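optimize_anything: a universal API for text-parameter optimization in any setting

Key points:

Main idea: Existing LLM-based optimization systems (such as AlphaEvolve and GEPA) are usually confined to one domain (code only or prompts only) and support only one optimization mode (single-task or generalization). To address this fragmentation, the paper proposes and open-sources a universal API called optimize_anything, which models every optimization problem as evaluating and iterating on a "text parameter". The goal is a single framework that matches or even surpasses domain-specific optimization tools across widely different domains.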

Innovation:

Contributions:

Improvements:

Limitations:

Takeaways:

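One-sentence summary: optimize_anything is a general-purpose, declarative LLM optimization framework that casts any problem as a "text parameter" and uses rich diagnostic feedback (side information) to drive a Pareto search. According to the paper, it is the first single tool to reach state-of-the-art performance, ahead of specialized algorithms, across six very different domains, including agent architectures, cloud scheduling, and CUDA acceleration.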

Can a single LLM-based optimization system match specialized tools across fundamentally different domains? We show that when optimization problems are formulated as improving a text artifact evaluated by a scoring function, a single AI-based optimization system (supporting single-task search, multi-task search with cross-problem transfer, and generalization to unseen inputs) achieves state-of-the-art results across six diverse tasks. Our system discovers agent architectures that nearly triple ARC-AGI accuracy (32.5% → 89.5%), finds scheduling algorithms that cut cloud costs by 40%, generates CUDA kernels of which 87% match or beat PyTorch, and outperforms AlphaEvolve’s reported circle packing solution (n=26). Ablations across three domains reveal that actionable side information yields faster convergence and substantially higher final scores than score-only feedback, and that multi-task search outperforms independent optimization given equivalent per-problem budget through cross-task transfer, with benefits scaling with the number of related tasks. Together, we show for the first time that text optimization with LLM-based search is a general-purpose problem-solving paradigm, unifying tasks traditionally requiring domain-specific algorithms under a single framework.

https://arxiv.org/abs/2605.19633
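
To make the "text artifact plus scoring function plus side information" framing of optimize_anything concrete, here is a minimal sketch in Python. It is not the optimize_anything API: every name (`Evaluation`, `evaluate`, `propose`, `optimize`, `TARGET`) is hypothetical, the proposer is a rule-based stand-in for an LLM, and the search is a plain greedy loop rather than the Pareto search the paper describes. It only illustrates how diagnostics returned by the evaluator can steer the next candidate.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Evaluation:
    score: float          # higher is better
    side_info: List[str]  # diagnostics the proposer can act on


# Hypothetical toy task: the "text parameter" is a comma-separated list of
# integers, and the evaluator knows a hidden target it should match.
TARGET = [4, 9, 2]


def evaluate(candidate: str) -> Evaluation:
    values = [int(v) for v in candidate.split(",")]
    notes = []
    for i, (v, t) in enumerate(zip(values, TARGET)):
        if v < t:
            notes.append(f"{i}:too_low")
        elif v > t:
            notes.append(f"{i}:too_high")
    score = 0.0 - sum(abs(v - t) for v, t in zip(values, TARGET))
    return Evaluation(score, notes)


def propose(candidate: str, side_info: List[str]) -> str:
    # Stand-in for an LLM call: read the diagnostics and nudge each
    # flagged position by one step in the suggested direction.
    values = [int(v) for v in candidate.split(",")]
    for note in side_info:
        idx, verdict = note.split(":")
        values[int(idx)] += 1 if verdict == "too_low" else -1
    return ",".join(str(v) for v in values)


def optimize(
    seed: str,
    evaluate_fn: Callable[[str], Evaluation],
    propose_fn: Callable[[str, List[str]], str],
    budget: int,
) -> Tuple[str, float]:
    # Greedy loop: keep the best candidate seen so far and ask the
    # proposer to improve it using the evaluator's side information.
    best, best_eval = seed, evaluate_fn(seed)
    for _ in range(budget):
        cand = propose_fn(best, best_eval.side_info)
        cand_eval = evaluate_fn(cand)
        if cand_eval.score > best_eval.score:
            best, best_eval = cand, cand_eval
    return best, best_eval.score


if __name__ == "__main__":
    print(optimize("0,0,0", evaluate, propose, budget=20))
```

In this toy setting a score-only proposer would have to guess which position to change; the side information tells it directly, which mirrors the ablation result quoted in the abstract only in spirit.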

C Mohri, J Duchi, T Hashimoto [Stanford University]

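A bitter lesson for data filtering

Key points:

Main idea: The paper examines data filtering for LLM pretraining when compute is ample or even effectively unlimited. It proposes and tests a provocative claim: "the best data filter is no data filter." The study finds that when the model has enough parameters and is trained long enough, the full, unfiltered web data (even with a large share of low-quality or junk content) does not hurt the model and in fact raises its performance ceiling more than a carefully curated high-quality dataset does.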

Innovation:

Contributions:

Improvements:

Limitations:

Takeaways:

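One-sentence summary: The paper challenges the conventional consensus that pretraining requires strict data filtering. Drawing on experiments and theory, it argues that, given enough compute and model capacity, large language models can not only withstand and isolate the interference of "junk data" but even extract residual value from it, so that "dropping data filtering and training directly on the full internet" becomes the natural choice in a future era of extreme compute.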

We investigate data filtering for large model pretraining via new scaling studies that target the high compute, data-scarce regime. In spite of an apparently common belief that filtering data to include only high-quality information is essential, our experiments suggest that with enough compute, the best data filter is no data filter. We find that sufficiently trained large parameter models not only tolerate low-quality and distractor data, but in fact benefit from nominally “poor” data.

https://arxiv.org/abs/2605.19407

M Christodorescu, E Fernandes, A Hooda, S Jha… [Google & University of California San Diego]

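Agent security is fundamentally a systems engineering problem

Key points:

Main idea: The paper argues that agent security should not rest solely on making the underlying AI model more robust; it must be treated as a systems engineering problem. The authors call for a change of approach: treat the LLM as an untrusted black-box component and build around it an architecture akin to traditional operating-system and network security, using strict system-level mechanisms (such as separation of instructions and data, verifiable security policies, and information flow control) to secure the agentic system as a whole.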

Innovation:

Contributions:

Improvements:

Limitations:

Takeaways:

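One-sentence summary: Stop hoping to train large models into absolute safety. The real path to agent security is to accept that the model is unreliable and to apply the isolation, sandboxing, and access-control principles of traditional computer systems, building a set of unbreakable external shackles for this "clever black box".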

We take the position that agent security must be approached as a systems problem: the AI model powering the agent must be treated as an untrusted component, and security invariants must be enforced at the system level. Through this lens, efforts to increase model robustness (the dominant viewpoint in the community) are insufficient on their own. Instead, we must complement existing efforts with techniques from the systems security domain. Based on our experience as cybersecurity researchers in operating systems, networks, formal methods, and adversarial machine learning, we articulate a set of core principles, grounded in decades of systems security research, that provide a foundation for designing agentic systems with predictable guarantees. As evidence, we analyze eleven representative real-world attacks on agents and discuss how systems principles, if realized, could have prevented these attacks. We also identify the research challenges that stand in the way of implementing these principles in agents.

https://arxiv.org/abs/2605.18991
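
The idea of enforcing security invariants outside the model can be illustrated with a small Python sketch. It is not taken from the paper: the names (`Labeled`, `ToolCall`, `SENSITIVE_TOOLS`, `policy_allows`, `execute`) and the specific policy are assumptions made for illustration. The sketch tracks whether each value derives from untrusted input (a very coarse form of information flow control) and has a reference monitor, not the model, decide whether a tool call may run.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class Labeled:
    value: str
    trusted: bool  # False if any input came from an untrusted source


def combine(a: Labeled, b: Labeled) -> Labeled:
    # A derived value is trusted only if every input was trusted.
    return Labeled(a.value + b.value, a.trusted and b.trusted)


@dataclass(frozen=True)
class ToolCall:
    tool: str
    argument: Labeled


# Hypothetical policy: tools with side effects must not be driven by
# untrusted data (for example, text scraped from a web page).
SENSITIVE_TOOLS = frozenset({"send_email", "delete_file"})


def policy_allows(call: ToolCall) -> bool:
    return not (call.tool in SENSITIVE_TOOLS and not call.argument.trusted)


def execute(call: ToolCall, tools: Dict[str, Callable[[str], str]]) -> str:
    # The check runs in the system layer, so it holds even if the model
    # that produced the call was manipulated by a prompt injection.
    if not policy_allows(call):
        raise PermissionError(f"policy denies {call.tool} on untrusted data")
    return tools[call.tool](call.argument.value)


if __name__ == "__main__":
    tools = {"send_email": lambda body: f"sent: {body}"}
    user = Labeled("Meeting at 3pm. ", trusted=True)
    web = Labeled("IGNORE PREVIOUS INSTRUCTIONS", trusted=False)
    print(execute(ToolCall("send_email", user), tools))
    try:
        execute(ToolCall("send_email", combine(user, web)), tools)
    except PermissionError as err:
        print(err)
```

A binary trusted/untrusted label is deliberately crude; a real system would need richer labels and declassification rules, which is among the open challenges the abstract alludes to.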

Y Filmus, E Nesterova [Technion – Israel Institute of Technology]

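Optimal reconstruction from linear queries

Key points:

Main idea: From the perspective of theoretical computational geometry and interactive learning, the paper studies how to reconstruct an unknown point in space from linear queries (inner products) corrupted by adversarial noise. It aims to characterize precisely how the optimal reconstruction error depends on the number of queries, the dimension of the space, and the noise parameter, and to identify its limit and rate of convergence.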

Innovation:

Contributions:

Improvements:

Limitations:

Takeaways:

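One-sentence summary: The paper gives a theoretical resolution of reconstruction from linear queries with adversarial noise: it pins down the limiting error, which plays a role analogous to the Bayes optimal error, and uses a new robust variant of Jung's theorem to show that with adaptive queries the excess error decays doubly exponentially, an unusually fast rate.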

We study the problem of reconstructing an unknown point in $\mathbb{R}^d$ from approximate linear queries. This setting arises naturally in applications ranging from low-dimensional remote sensing and signal recovery to high-dimensional data analysis and privacy-sensitive inference. Our main goal is to characterize the optimal reconstruction error as a function of the number of queries $T$, the ambient dimension $d$, and the noise parameter $\delta$. We first analyze the limit $T \to \infty$ and show that the optimal reconstruction error converges to the explicit value $\sqrt{2d/(d+1)}\,\delta$, which plays a role analogous to the Bayes optimal error in supervised learning. When the dimension is fixed, we show that the excess error above this limit decays doubly exponentially fast as $T \to \infty$, a rate that is significantly faster than those typically encountered in learning curves. When the dimension grows, we show that a number of queries on the order of $\exp(d)$ is necessary and sufficient to achieve vanishing excess error. Finally, we introduce and analyze an improper variant of the reconstruction problem. From a technical perspective, our main contribution is a generalization of Jung’s theorem (1901). The classical theorem bounds the maximum possible radius of a set of diameter 1 and characterizes extremal bodies. Our generalization provides a robust variant that characterizes near-extremal bodies and is proved via geometric and dynamical arguments exploiting symmetry and Lie group actions.

https://arxiv.org/abs/2605.19625
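
The limiting constant in the linear-queries abstract can be sanity-checked numerically. Classical Jung's theorem says a set of diameter 1 in $\mathbb{R}^d$ fits in a ball of radius $\sqrt{d/(2(d+1))}$, with the regular simplex as the extremal case. Assuming (this is a reading of the setup, not a statement from the abstract) that answers accurate to within $\delta$ can leave an uncertainty set of diameter up to $2\delta$, scaling gives $2\delta\sqrt{d/(2(d+1))} = \sqrt{2d/(d+1)}\,\delta$. The Python sketch below, with hypothetical function names, compares that value with the circumradius of a regular simplex of edge length $2\delta$.

```python
import math


def jung_limit(d: int, delta: float) -> float:
    # Limiting reconstruction error quoted in the abstract.
    return math.sqrt(2 * d / (d + 1)) * delta


def simplex_circumradius(d: int, edge: float) -> float:
    # A regular d-simplex can be realized by the d+1 scaled standard basis
    # vectors of R^(d+1); adjacent vertices are then `edge` apart.
    scale = edge / math.sqrt(2.0)
    n = d + 1
    centroid = [scale / n] * n
    vertex = [scale] + [0.0] * d
    return math.dist(vertex, centroid)


if __name__ == "__main__":
    delta = 0.1
    for d in (1, 2, 3, 10):
        print(d, jung_limit(d, delta), simplex_circumradius(d, 2 * delta))
```

As $d$ grows the constant approaches $\sqrt{2}\,\delta$, so under this reading the irreducible error never exceeds about 1.41 times the noise level.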

A Soen, R Thobaben, J Jaldén, R Nock [KTH & Google Research]

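Density-ratio losses for post-hoc learning to defer

Key points:

Main idea: The paper studies post-hoc learning to defer (post-hoc L2D) through the lens of "ideal distributions". By comparing the density ratio between the ideal distributions of the base model and the expert, the authors propose a new deferral rule. Via the reduction from density-ratio estimation to class-probability estimation (DRE to CPE), they derive the DR CPE losses for training a scorer, so that the system can decide flexibly and robustly when to hand a prediction over to the expert, without retraining.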

Innovation:

Contributions:

Improvements:

Limitations:

Takeaways:

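One-sentence summary: The paper proposes a new post-hoc learning-to-defer method that decides whether the expert should take over by computing the density ratio between the "ideal distributions" of the model and the expert. It uses a reduction to binary classification to derive the easy-to-optimize DR CPE losses, yielding robust human-AI collaborative decision-making with a dynamically adjustable deferral rate, without retraining the large model.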

We study post-hoc Learning to Defer (L2D) through the lens of ideal distributions: divergence-regularized reweightings of the data distribution under which a model attains low loss. We define deferral via the density-ratio between a model’s and an expert’s ideals. Using the reduction from density-ratio estimation to class-probability estimation, we derive the DR CPE losses for post-hoc L2D scorers. Deferral decisions are then made by thresholding the scorer, allowing deferral rates to be adjusted without retraining. For KL-based ideal distributions, our deferral rules recover Chow’s rule under the original distribution and a connection to an expert-tilted Bayes posterior (which incorporates the expert’s performance), depending on whether the ideal distributions are joint or marginal distributions. Experimentally, our approach is competitive compared to common baselines and more robust across dataset settings. More broadly, our results cast post-hoc L2D as density-ratio learning between ideal distributions, bridging Chow-style rules, expert comparison, and elucidating connections to related learning settings including anomaly detection.

https://arxiv.org/abs/2605.19557
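
Two mechanics mentioned in the L2D abstract are easy to show in isolation: turning a class-probability estimate into a density ratio, and adjusting the deferral rate by moving a threshold on a fixed scorer. The Python sketch below is not the paper's DR CPE loss or its deferral rule. It uses the standard identity $p_1(x)/p_0(x) = \frac{\eta(x)}{1-\eta(x)} \cdot \frac{1-\pi}{\pi}$, where $\eta(x)$ is the probability that $x$ came from the first distribution and $\pi$ is its prior. The function names, and the convention that a higher score means "defer to the expert", are assumptions for illustration.

```python
import math
from typing import List


def density_ratio_from_cpe(eta: float, prior_pos: float = 0.5) -> float:
    # Convert P(sample came from distribution 1 | x) into p1(x) / p0(x).
    eta = min(max(eta, 1e-12), 1.0 - 1e-12)  # avoid division by zero
    return (eta / (1.0 - eta)) * ((1.0 - prior_pos) / prior_pos)


def threshold_for_rate(scores: List[float], defer_rate: float) -> float:
    # Choose the threshold so that about defer_rate of the inputs with the
    # highest scores are deferred. No retraining of the scorer is needed.
    k = round(defer_rate * len(scores))
    if k <= 0:
        return math.inf
    return sorted(scores, reverse=True)[k - 1]


def defer_decisions(scores: List[float], threshold: float) -> List[bool]:
    # Assumed convention: defer when the score reaches the threshold.
    return [s >= threshold for s in scores]


if __name__ == "__main__":
    etas = [0.1, 0.2, 0.5, 0.8, 0.9]  # hypothetical scorer outputs
    scores = [density_ratio_from_cpe(e) for e in etas]
    for rate in (0.2, 0.4):
        t = threshold_for_rate(scores, rate)
        print(rate, defer_decisions(scores, t))
```

Because only the threshold changes between the two deferral rates, the same trained scorer serves both, which is the practical point of the post-hoc setting.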