<rss xmlns:atom="http://www.w3.org/2005/Atom" version="2.0">
    <channel>
        <title>Evaluation Testing - Tag - 编程心语</title>
        <link>https://www.ithome.me/tags/%E8%AF%84%E4%BC%B0%E6%B5%8B%E8%AF%95/</link>
        <description>Evaluation Testing - Tag - 编程心语</description>
        <generator>Hugo -- gohugo.io</generator>
        <language>zh-CN</language>
        <lastBuildDate>Wed, 17 Jun 2026 08:00:00 &#43;0800</lastBuildDate>
        <atom:link href="https://www.ithome.me/tags/%E8%AF%84%E4%BC%B0%E6%B5%8B%E8%AF%95/" rel="self" type="application/rss+xml" />
        <item>
    <title>AI Agent Evaluation Testing: How Can We Measure Agent Performance Rigorously?</title>
    <link>https://www.ithome.me/post/2026/06/17/ai-agent%E8%AF%84%E4%BC%B0%E6%B5%8B%E8%AF%95%E5%A6%82%E4%BD%95%E7%A7%91%E5%AD%A6%E8%A1%A1%E9%87%8F%E6%99%BA%E8%83%BD%E4%BD%93%E8%A1%A8%E7%8E%B0/</link>
    <pubDate>Wed, 17 Jun 2026 08:00:00 &#43;0800</pubDate>
    <author>Simon Chen</author>
    <guid>https://www.ithome.me/post/2026/06/17/ai-agent%E8%AF%84%E4%BC%B0%E6%B5%8B%E8%AF%95%E5%A6%82%E4%BD%95%E7%A7%91%E5%AD%A6%E8%A1%A1%E9%87%8F%E6%99%BA%E8%83%BD%E4%BD%93%E8%A1%A8%E7%8E%B0/</guid>
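    <description><![CDATA[Introduction: As AI Agents powered by large language models (LLMs) are deployed at scale in customer service, coding assistants, data analysis, and other scenarios, a core question has emerged: how do we rigorously measure an Agent's performance? Traditional soft]]></description>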
</item>
</channel>
</rss>
