<p>Your Java AI application is live in production. But have you tested whether it can be jailbroken, manipulated into revealing its system prompt, or tricked into printing content it should never output?</p><p>In this episode, <strong>Iryna Dohndorf</strong>, Software Engineer at Karakun Group and creator of Tiberius, explains how to bring security testing to LLM-powered Java applications. We cover why traditional unit tests break down with non-deterministic systems, how the Scan-Fixture-Validate workflow works, what buff mutation testing is, and why even well-trained models can be cracked with something as simple as the grandmother attack.</p><p>Topics include:</p><ul><li>Why LLM non-determinism breaks the classic input/output test model</li><li>The Scan-Fixture-Validate principle and sharing test artifacts across teams</li><li>Prompt injection, jailbreaks, and emotional manipulation attacks</li><li>Buff mutation: testing linguistic surface coverage</li><li>Probabilistic security contracts and multi-trial scans</li><li>Fingerprinting and why your model choice should not be detectable</li><li>LLM as a judge: using a second model as a guardrail</li><li>Getting started with Tiberius in Spring Boot and LangChain4j</li></ul><p><strong>Guest</strong><br>Iryna Dohndorf - Software Engineer at Karakun Group<br><a href="https://www.linkedin.com/in/iryna-dohndorf">LinkedIn</a></p><p><strong>Links</strong><br><a href="https://foojay.io/today/tiberius-a-security-testing-framework-for-llm-applications-in-java/">Article on Foojay</a><br><a href="https://github.com/tiberius-security/tiberius">Tiberius on GitHub</a><br><a href="https://github.com/tiberius-security/tiberius/blob/main/docs/SECURITY_TESTING_GUIDE.md">Security Testing Guide</a></p><p><strong>Timestamps</strong><br>00:00 Introduction of topic and guest<br>01:05 The problem Tiberius aims to solve<br>06:39 Why &quot;traditional&quot; unit tests don&#39;t work for LLM integrations<br>10:23 The Scan-Fixture-Validate principle and sharing artifacts<br>15:15 Using different skills, such as the grandmother skill<br>17:33 Testing for required versus forbidden bias<br>19:35 The probes Tiberius uses across nine attack categories<br>20:44 Buff mutation testing<br>26:55 Using Tiberius in your pipelines and when to fail<br>29:35 Using multi-trial scans<br>31:14 Fingerprinting: why the model you use should not be detectable<br>32:55 Combining multiple models: model as a judge<br>34:41 Sharing JSON models to improve tests<br>36:05 How to get started with Tiberius in Spring and with LangChain4j<br>36:41 Quarkus not yet supported, and plans for the future<br>39:07 Conclusions and a call to everyone to become a Foojay author</p>

Foojay.io | Friends of OpenJDK and Java Programming

Testing the Untestable: LLM Security for Java Developers with Tiberius (#99)

JUN 20, 2026 | 41 MIN