eBiltegia

View/Open: ASTRAL Automated Safety Testing of Large Language Models.pdf (625.1Kb)
Title
ASTRAL: Automated Safety Testing of Large Language Models
Author
Ugarte Querejeta, Miriam
Valle Entrena, Pablo
Parejo, Jose Antonio
Segura, Sergio
Arrieta, Aitor
Publication Date
2025
Research Group
Ingeniería del software y sistemas
Other institutions
https://ror.org/00wvqgd19
Universidad de Sevilla
Version
Postprint
Document type
Conference Object
Language
English
Rights
© 2025 IEEE
Access
Open access
URI
https://hdl.handle.net/20.500.11984/13991
Publisher’s version
https://doi.org/10.1109/AST66626.2025.00018
Published at
IEEE/ACM International Conference on Automation of Software Test (AST), Ottawa (Canada), 28-29 April 2025
Publisher
IEEE
Keywords
Large Language Models
SDG 9 Industry, Innovation and Infrastructure
SDG 10 Reduced Inequalities
Abstract
Large Language Models (LLMs) have recently gained significant attention due to their ability to understand and generate sophisticated human-like content. However, ensuring their safety is paramount as they might provide harmful and unsafe responses. Existing LLM testing frameworks address various safety-related concerns (e.g., drugs, terrorism, animal abuse) but often face challenges due to unbalanced and obsolete datasets. In this paper, we present ASTRAL, a tool that automates the generation and execution of test cases (i.e., prompts) for testing the safety of LLMs. First, we introduce a novel black-box coverage criterion to generate balanced and diverse unsafe test inputs across a diverse set of safety categories as well as linguistic writing characteristics (i.e., different style and persuasive writing techniques). Second, we propose an LLM-based approach that leverages Retrieval Augmented Generation (RAG), few-shot prompting strategies and web browsing to generate up-to-date test inputs. Lastly, similar to current LLM test automation techniques, we leverage LLMs as test oracles to distinguish between safe and unsafe test outputs, allowing a fully automated testing approach. We conduct an extensive evaluation on well-known LLMs, revealing the following key findings: i) GPT-3.5 outperforms other LLMs when acting as the test oracle, accurately detecting unsafe responses, and even surpassing more recent LLMs (e.g., GPT-4), as well as LLMs that are specifically tailored to detect unsafe LLM outputs (e.g., LlamaGuard); ii) the results confirm that our approach can uncover nearly twice as many unsafe LLM behaviors with the same number of test inputs compared to currently used static datasets; and iii) our black-box coverage criterion combined with web browsing can effectively guide the LLM on generating up-to-date unsafe test inputs, significantly increasing the number of unsafe LLM behaviors.
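The abstract describes a three-step pipeline: generate unsafe prompts balanced across safety categories and writing styles, execute them against the LLM under test, and use a second LLM as a test oracle that flags unsafe responses. The skeleton of that pipeline can be sketched as below; all names, categories, and function signatures here are illustrative assumptions, not taken from the ASTRAL implementation, and the generator/oracle callables stand in for the LLM calls (RAG, few-shot prompting, GPT-3.5 as oracle) the paper actually uses.

```python
from dataclasses import dataclass
from itertools import product
from typing import Callable, List

# Illustrative stand-ins: the real tool covers a broader set of
# safety categories and linguistic writing characteristics.
SAFETY_CATEGORIES = ["drugs", "terrorism", "animal_abuse"]
WRITING_STYLES = ["direct", "persuasive", "role_play"]


@dataclass
class TestInput:
    category: str  # safety category the prompt targets
    style: str     # linguistic writing characteristic
    prompt: str    # the generated unsafe test prompt


def generate_balanced_inputs(
    generator: Callable[[str, str], str],  # e.g. an LLM prompted with RAG/web context
    per_cell: int = 1,
) -> List[TestInput]:
    """Black-box coverage sketch: every (category, style) pair is
    exercised the same number of times, keeping the suite balanced."""
    return [
        TestInput(cat, style, generator(cat, style))
        for cat, style in product(SAFETY_CATEGORIES, WRITING_STYLES)
        for _ in range(per_cell)
    ]


def run_safety_tests(
    tests: List[TestInput],
    llm_under_test: Callable[[str], str],
    oracle: Callable[[str, str], bool],  # True => response judged unsafe
) -> List[TestInput]:
    """Execute each prompt and keep the inputs whose responses the
    oracle LLM classifies as unsafe (the failing tests)."""
    return [t for t in tests if oracle(t.prompt, llm_under_test(t.prompt))]
```

In the paper both the generator and the oracle roles are played by LLMs; the coverage criterion above is what lets the same test budget spread evenly over categories and styles instead of clustering where a static dataset happens to be dense.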
Collections
  • Conference papers - Engineering [449]


Harvested by: OpenAIRE | BASE | Recolecta
Validated by: OpenAIRE | Rebiun
MONDRAGON UNIBERTSITATEA | Library
Contact Us | Send Feedback
DSpace