| dc.contributor.author | Ugarte Querejeta, Miriam | |
| dc.contributor.author | Valle Entrena, Pablo | |
| dc.contributor.author | Parejo, Jose Antonio | |
| dc.contributor.author | Segura, Sergio | |
| dc.contributor.author | Arrieta, Aitor | |
| dc.date.accessioned | 2025-11-25T11:44:39Z | |
| dc.date.available | 2025-11-25T11:44:39Z | |
| dc.date.issued | 2025 | |
| dc.identifier.isbn | 979-8-3315-0179-2 | en |
| dc.identifier.issn | 2833-9061 | en |
| dc.identifier.other | https://katalogoa.mondragon.edu/janium-bin/janium_login_opac.pl?find&ficha_no=200362 | en |
| dc.identifier.uri | https://hdl.handle.net/20.500.11984/13991 | |
| dc.description.abstract | Large Language Models (LLMs) have recently gained significant attention due to their ability to understand and generate sophisticated, human-like content. However, ensuring their safety is paramount, as they might provide harmful and unsafe responses. Existing LLM testing frameworks address various safety-related concerns (e.g., drugs, terrorism, animal abuse) but often face challenges due to unbalanced and obsolete datasets. In this paper, we present ASTRAL, a tool that automates the generation and execution of test cases (i.e., prompts) for testing the safety of LLMs. First, we introduce a novel black-box coverage criterion to generate balanced and diverse unsafe test inputs across a diverse set of safety categories as well as linguistic writing characteristics (i.e., different styles and persuasive writing techniques). Second, we propose an LLM-based approach that leverages Retrieval Augmented Generation (RAG), few-shot prompting strategies, and web browsing to generate up-to-date test inputs. Lastly, similar to current LLM test automation techniques, we leverage LLMs as test oracles to distinguish between safe and unsafe test outputs, enabling a fully automated testing approach. We conduct an extensive evaluation on well-known LLMs, revealing the following key findings: i) GPT-3.5 outperforms other LLMs when acting as the test oracle, accurately detecting unsafe responses and even surpassing more recent LLMs (e.g., GPT-4) as well as LLMs specifically tailored to detect unsafe LLM outputs (e.g., LlamaGuard); ii) the results confirm that our approach can uncover nearly twice as many unsafe LLM behaviors with the same number of test inputs compared to currently used static datasets; and iii) our black-box coverage criterion combined with web browsing can effectively guide the LLM in generating up-to-date unsafe test inputs, significantly increasing the number of unsafe LLM behaviors. | en |
| dc.language.iso | eng | en |
| dc.publisher | IEEE | en |
| dc.rights | © 2025 IEEE | en |
| dc.subject | Large Language Models | en |
| dc.subject | ODS 9 Industria, innovación e infraestructura | es |
| dc.subject | ODS 10 Reducción de las desigualdades | es |
| dc.title | ASTRAL: Automated Safety Testing of Large Language Models | en |
| dcterms.accessRights | http://purl.org/coar/access_right/c_abf2 | en |
| dcterms.source | IEEE/ACM International Conference on Automation of Software Test (AST) | en |
| local.contributor.group | Ingeniería del software y sistemas | es |
| local.description.peerreviewed | true | en |
| local.identifier.doi | https://doi.org/10.1109/AST66626.2025.00018 | en |
| local.contributor.otherinstitution | https://ror.org/00wvqgd19 | es |
| local.contributor.otherinstitution | https://ror.org/03yxnpp24 | es |
| local.source.details | Ottawa (Canada), 28-29 April 2025 | en |
| oaire.format.mimetype | application/pdf | en |
| oaire.file | $DSPACE\assetstore | en |
| oaire.resourceType | http://purl.org/coar/resource_type/c_c94f | en |
| oaire.version | http://purl.org/coar/version/c_ab4af688f83e57aa | en |
| dc.unesco.tesauro | http://vocabularies.unesco.org/thesaurus/concept450 | en |
| oaire.funderName | Comisión Europea | en |
| oaire.funderName | Gobierno de España | en |
| oaire.funderName | Gobierno Vasco | en |
| oaire.funderIdentifier | https://ror.org/00k4n6c32 / http://data.crossref.org/fundingdata/funder/10.13039/501100000780 | en |
| oaire.funderIdentifier | https://ror.org/038jjxj40 / http://data.crossref.org/fundingdata/funder/10.13039/501100010198 | en |
| oaire.funderIdentifier | https://ror.org/00pz2fp31 / http://data.crossref.org/fundingdata/funder/10.13039/501100003086 | en |
| oaire.fundingStream | HORIZON-CL4-2021-HUMAN-01 | en |
| oaire.fundingStream | Plan Estatal 2021-2023 - Proyectos Investigación No Orientada | en |
| oaire.fundingStream | Ikertalde Convocatoria 2022-2023 | en |
| oaire.awardNumber | 101069364 | en |
| oaire.awardNumber | PID2021-126227NB-C22 | en |
| oaire.awardNumber | IT1519-22 | en |
| oaire.awardTitle | Next Generation Internet Discovery and Search (NGI Search) | en |
| oaire.awardTitle | Mejorando el desarrollo, fiabilidad y gobierno de servicios digitales por medio de la colaboración bot-humano | en |
| oaire.awardTitle | Ingeniería de Software y Sistemas (IKERTALDE 2022-2023) | en |
| oaire.awardURI | https://doi.org/10.3030/101069364 | en |
| oaire.awardURI | No information | en |
| oaire.awardURI | No information | en |
| dc.unesco.clasificacion | http://skos.um.es/unesco6/120317 | en |