<p class="MsoNormal">Experimentation and validation of LLM performance are critical when building LLM-driven systems that must reliably deliver a service, from customer-service chatbots to intelligence analysis tools. To help teams meet the need for rigorous evaluation methods, a research team in the SEI's AI Division led by <a href= "https://www.sei.cmu.edu/authors/violet-turri/">Violet Turri</a> has developed the Evaluating Large Language Models (ELM) library, which is built on best practices for LLM evaluation and benchmarking. In the latest episode from the Carnegie Mellon University Software Engineering Institute, Turri sits down with Katie Robinson, a design researcher also in the SEI's AI Division, to discuss how the ELM library turns evaluation from an ad hoc process into a repeatable, extensible framework.</p>

<description>&lt;p class="MsoNormal"&gt;Experimentation and validation of LLM performance is critical when building LLM-driven systems that must reliably deliver a service, from customer service chat bots to intelligence analysis tools. To help teams meet the need for rigorous evaluation methods, a research team in the SEI's AI Division led by &lt;a href= "https://www.sei.cmu.edu/authors/violet-turri/"&gt;&lt;span style= "mso-bookmark: OLE_LINK51;"&gt;Violet Turri&lt;/span&gt;&lt;/a&gt; &lt;span style= "mso-bookmark: OLE_LINK51;"&gt;has developed the Evaluating Large Language Models &lt;a style= "mso-comment-reference: KR_1; mso-comment-date: 20260610T0907; mso-comment-done: yes;"&gt; (ELM) &lt;/a&gt;&lt;span style= "mso-ansi-font-size: 12.0pt; mso-bidi-font-size: 12.0pt; line-height: 115%;"&gt;&lt;!-- [if !supportAnnotations]--&gt;&lt;/span&gt;library, which is built on best practices for LLM evaluation and benchmarking. In the latest episode from the Carnegie Mellon University Software Engineering Institute, Turri sits down with Katie Robinson, a design researcher also in the SEI's AI division, to discuss the ELM library, which turns evaluation from an ad-hoc process into a repeatable, extensible framework.&lt;/span&gt;&lt;/p&gt;</description>

Software Engineering Institute (SEI) Podcast Series

An LLM Evaluation Framework for High-Stakes AI


Description