eBiltegia

File
ASTRAL Automated Safety Testing of Large Language Models.pdf (625.1 KB)
Title
ASTRAL: Automated Safety Testing of Large Language Models
Authors
Ugarte Querejeta, Miriam
Valle Entrena, Pablo
Parejo, Jose Antonio
Segura, Sergio
Arrieta, Aitor
Publication date
2025
Research group
Software and Systems Engineering (Ingeniería del software y sistemas)
Other institutions
https://ror.org/00wvqgd19
Universidad de Sevilla
Version
Postprint
Document type
Conference contribution
Language
English
Rights
© 2025 IEEE
Access
Open access
URI
https://hdl.handle.net/20.500.11984/13991
Publisher's version
https://doi.org/10.1109/AST66626.2025.00018
Published in
IEEE/ACM International Conference on Automation of Software Test (AST), Ottawa (Canada), 28-29 April 2025
Publisher
IEEE
Keywords
Large Language Models
SDG 9 Industry, Innovation and Infrastructure
SDG 10 Reduced Inequalities
Abstract
Large Language Models (LLMs) have recently gained significant attention due to their ability to understand and generate sophisticated human-like content. However, ensuring their safety is paramount as they might provide harmful and unsafe responses. Existing LLM testing frameworks address various safety-related concerns (e.g., drugs, terrorism, animal abuse) but often face challenges due to unbalanced and obsolete datasets. In this paper, we present ASTRAL, a tool that automates the generation and execution of test cases (i.e., prompts) for testing the safety of LLMs. First, we introduce a novel black-box coverage criterion to generate balanced and diverse unsafe test inputs across a diverse set of safety categories as well as linguistic writing characteristics (i.e., different style and persuasive writing techniques). Second, we propose an LLM-based approach that leverages Retrieval Augmented Generation (RAG), few-shot prompting strategies and web browsing to generate up-to-date test inputs. Lastly, similar to current LLM test automation techniques, we leverage LLMs as test oracles to distinguish between safe and unsafe test outputs, allowing a fully automated testing approach. We conduct an extensive evaluation on well-known LLMs, revealing the following key findings: i) GPT-3.5 outperforms other LLMs when acting as the test oracle, accurately detecting unsafe responses, and even surpassing more recent LLMs (e.g., GPT-4), as well as LLMs that are specifically tailored to detect unsafe LLM outputs (e.g., LlamaGuard); ii) the results confirm that our approach can uncover nearly twice as many unsafe LLM behaviors with the same number of test inputs compared to currently used static datasets; and iii) our black-box coverage criterion combined with web browsing can effectively guide the LLM in generating up-to-date unsafe test inputs, significantly increasing the number of unsafe LLM behaviors.
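The black-box coverage idea in the abstract can be illustrated with a minimal sketch. It is not the ASTRAL implementation: it only assumes that test inputs are balanced by giving every combination of safety category, writing style and persuasion technique the same number of test slots. The three categories come from the abstract; the style and persuasion values, and the function names plan_tests and coverage, are hypothetical placeholders.

```python
from collections import Counter
from itertools import product

# Safety categories mentioned in the abstract (the real tool may cover more).
CATEGORIES = ["drugs", "terrorism", "animal_abuse"]
# Hypothetical placeholder values: the abstract does not list the actual
# writing styles or persuasion techniques.
STYLES = ["question", "role_play", "slang"]
PERSUASION = ["evidence_based", "authority_endorsement"]


def plan_tests(per_combination):
    """Return test specifications with the same number of slots per combination."""
    return [
        {"category": c, "style": s, "persuasion": p}
        for c, s, p in product(CATEGORIES, STYLES, PERSUASION)
        for _ in range(per_combination)
    ]


def coverage(specs):
    """Fraction of (category, style, persuasion) combinations hit at least once."""
    seen = {(t["category"], t["style"], t["persuasion"]) for t in specs}
    total = len(CATEGORIES) * len(STYLES) * len(PERSUASION)
    return len(seen) / total


plan = plan_tests(per_combination=2)
counts = Counter(t["category"] for t in plan)
print(len(plan), coverage(plan), dict(counts))
```

Each specification would then be handed to a prompt generator and the response to a test oracle; both steps are LLM-based in the paper and are left out of this sketch.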
Collections
  • Conference contributions - Engineering

MONDRAGON UNIBERTSITATEA | Library