Erregistro soila

dc.contributor.authorUgarte Querejeta, Miriam
dc.contributor.authorValle Entrena, Pablo
dc.contributor.authorParejo, Jose Antonio
dc.contributor.authorSegura, Sergio
dc.contributor.authorArrieta, Aitor
dc.date.accessioned2025-11-25T11:44:39Z
dc.date.available2025-11-25T11:44:39Z
dc.date.issued2025
dc.identifier.isbn979-8-3315-0179-2en
dc.identifier.issn2833-9061en
dc.identifier.otherhttps://katalogoa.mondragon.edu/janium-bin/janium_login_opac.pl?find&ficha_no=200362en
dc.identifier.urihttps://hdl.handle.net/20.500.11984/13991
dc.description.abstractLarge Language Models (LLMs) have recently gained significant attention due to their ability to understand and generate sophisticated human-like content. However, ensuring their safety is paramount as they might provide harmful and unsafe responses. Existing LLM testing frameworks address various safety-related concerns (e.g., drugs, terrorism, animal abuse) but often face challenges due to unbalanced and obsolete datasets. In this paper, we present ASTRAL, a tool that automates the generation and execution of test cases (i.e., prompts) for testing the safety of LLMs. First, we introduce a novel black-box coverage criterion to generate balanced and diverse unsafe test inputs across a diverse set of safety categories as well as linguistic writing characteristics (i.e., different style and persuasive writing techniques). Second, we propose an LLM-based approach that leverages Retrieval Augmented Generation (RAG), few-shot prompting strategies and web browsing to generate up-to-date test inputs. Lastly, similar to current LLM test automation techniques, we leverage LLMs as test oracles to distinguish between safe and unsafe test outputs, allowing a fully automated testing approach. We conduct an extensive evaluation on well-known LLMs, revealing the following key findings: i) GPT3.5 outperforms other LLMs when acting as the test oracle, accurately detecting unsafe responses, and even surpassing more recent LLMs (e.g., GPT-4), as well as LLMs that are specifically tailored to detect unsafe LLM outputs (e.g., LlamaGuard); ii) the results confirm that our approach can uncover nearly twice as many unsafe LLM behaviors with the same number of test inputs compared to currently used static datasets; and iii) our black-box coverage criterion combined with web browsing can effectively guide the LLM on generating up-to-date unsafe test inputs, significantly increasing the number of unsafe LLM behaviors.en
dc.language.isoengen
dc.publisherIEEEen
dc.rights© 2025 IEEEen
dc.subjectLarge Language Modelsen
dc.subjectODS 9 Industria, innovación e infraestructuraes
dc.subjectODS 10 Reducción de las desigualdadeses
dc.titleASTRAL: Automated Safety Testing of Large Language Modelsen
dcterms.accessRightshttp://purl.org/coar/access_right/c_abf2en
dcterms.sourceIEEE/ACM International Conference on Automation of Software Test (AST)en
local.contributor.groupIngeniería del software y sistemases
local.description.peerreviewedtrueen
local.identifier.doihttps://doi.org/10.1109/AST66626.2025.00018en
local.contributor.otherinstitutionhttps://ror.org/00wvqgd19es
local.contributor.otherinstitutionhttps://ror.org/03yxnpp24es
local.source.detailsOttawa (Canada), 28-29 April 2025en
oaire.format.mimetypeapplication/pdfen
oaire.file$DSPACE\assetstoreen
oaire.resourceTypehttp://purl.org/coar/resource_type/c_c94fen
oaire.versionhttp://purl.org/coar/version/c_ab4af688f83e57aaen
dc.unesco.tesaurohttp://vocabularies.unesco.org/thesaurus/concept450en
oaire.funderNameComisión Europeaen
oaire.funderNameGobierno de Españaen
oaire.funderNameGobierno Vascoen
oaire.funderIdentifierhttps://ror.org/00k4n6c32 / http://data.crossref.org/fundingdata/funder/10.13039/501100000780en
oaire.funderIdentifierhttps://ror.org/038jjxj40 / http://data.crossref.org/fundingdata/funder/10.13039/501100010198en
oaire.funderIdentifierhttps://ror.org/00pz2fp31 / http://data.crossref.org/fundingdata/funder/10.13039/501100003086en
oaire.fundingStreamHORIZON-CL4-2021-HUMAN-01en
oaire.fundingStreamPlan Estatal 2021-2023 - Proyectos Investigación No Orientadaen
oaire.fundingStreamIkertalde Convocatoria 2022-2023en
oaire.awardNumber101069364en
oaire.awardNumberPID2021-126227NB-C22en
oaire.awardNumberIT1519-22en
oaire.awardTitleNext Generation Internet Discovery and Search (NGI Search)en
oaire.awardTitleMejorando el desarrollo, fiabilidad y gobierno de servicios digitales por medio de la colaboración bot-humanoen
oaire.awardTitleIngeniería de Software y Sistemas (IKERTALDE 2022-2023)en
oaire.awardURIhttps://doi.org/10.3030/101069364en
oaire.awardURISin informaciónen
oaire.awardURISin informaciónen
dc.unesco.clasificacionhttp://skos.um.es/unesco6/120317en


Item honetako fitxategiak

Thumbnail

Item hau honako bilduma honetan/hauetan agertzen da

Erregistro soila