Software Engineering Institute (SEI) Podcast Series

Members of Technical Staff at the Software Engineering Institute

An LLM Evaluation Framework for High-Stakes AI

JUN 11, 2026 · 16 MIN

Description

Experimentation and validation of LLM performance are critical when building LLM-driven systems that must reliably deliver a service, from customer service chatbots to intelligence analysis tools. To help teams meet the need for rigorous evaluation methods, a research team in the SEI's AI Division led by Violet Turri has developed the Evaluating Large Language Models (ELM) library, which is built on best practices for LLM evaluation and benchmarking. In the latest episode of the podcast series from the Carnegie Mellon University Software Engineering Institute, Turri sits down with Katie Robinson, a design researcher also in the SEI's AI Division, to discuss how the ELM library turns evaluation from an ad hoc process into a repeatable, extensible framework.