eBiltegia

ASTRAL Automated Safety Testing of Large Language Models.pdf (625.1Kb)
Title
ASTRAL: Automated Safety Testing of Large Language Models
Authors
Ugarte Querejeta, Miriam
Valle Entrena, Pablo
Parejo, Jose Antonio
Segura, Sergio
Arrieta, Aitor
Publication date
2025
Research group
Software and systems engineering (Ingeniería del software y sistemas)
Other institutions
https://ror.org/00wvqgd19
Universidad de Sevilla
Version
Postprint
Document type
Conference contribution
Language
English
Rights
© 2025 IEEE
Access
Open access
URI
https://hdl.handle.net/20.500.11984/13991
Publisher's version
https://doi.org/10.1109/AST66626.2025.00018
Published in
IEEE/ACM International Conference on Automation of Software Test (AST), Ottawa (Canada), 28-29 April 2025
Publisher
IEEE
Keywords
Large Language Models
SDG 9 Industry, innovation and infrastructure
SDG 10 Reduced inequalities
Abstract
Large Language Models (LLMs) have recently gained significant attention due to their ability to understand and generate sophisticated human-like content. However, ensuring their safety is paramount, as they might provide harmful and unsafe responses. Existing LLM testing frameworks address various safety-related concerns (e.g., drugs, terrorism, animal abuse) but often face challenges due to unbalanced and obsolete datasets. In this paper, we present ASTRAL, a tool that automates the generation and execution of test cases (i.e., prompts) for testing the safety of LLMs. First, we introduce a novel black-box coverage criterion to generate balanced and diverse unsafe test inputs across a diverse set of safety categories as well as linguistic writing characteristics (i.e., different style and persuasive writing techniques). Second, we propose an LLM-based approach that leverages Retrieval-Augmented Generation (RAG), few-shot prompting strategies, and web browsing to generate up-to-date test inputs. Lastly, similar to current LLM test automation techniques, we leverage LLMs as test oracles to distinguish between safe and unsafe test outputs, allowing a fully automated testing approach. We conduct an extensive evaluation on well-known LLMs, revealing the following key findings: i) GPT-3.5 outperforms other LLMs when acting as the test oracle, accurately detecting unsafe responses, and even surpassing more recent LLMs (e.g., GPT-4), as well as LLMs that are specifically tailored to detect unsafe LLM outputs (e.g., LlamaGuard); ii) the results confirm that our approach can uncover nearly twice as many unsafe LLM behaviors with the same number of test inputs compared to currently used static datasets; and iii) our black-box coverage criterion combined with web browsing can effectively guide the LLM in generating up-to-date unsafe test inputs, significantly increasing the number of unsafe LLM behaviors.
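The black-box coverage idea described in the abstract can be illustrated with a minimal sketch. This is not the ASTRAL implementation: it only assumes that a test input is characterized by a (safety category, writing style, persuasion technique) combination and that balance means requesting the same number of tests for every combination. The function names and the style and technique values are hypothetical placeholders; only the three example categories come from the abstract.

```python
from collections import Counter
from itertools import product

# Example categories named in the abstract; the real tool may use a different set.
SAFETY_CATEGORIES = ["drugs", "terrorism", "animal_abuse"]
# Hypothetical placeholders: the abstract does not list styles or techniques.
WRITING_STYLES = ["style_a", "style_b"]
PERSUASION_TECHNIQUES = ["technique_a", "technique_b"]


def coverage_targets(tests_per_combination=1):
    """Return one generation target per requested test, balanced over
    every (category, style, technique) combination."""
    targets = []
    for combination in product(
        SAFETY_CATEGORIES, WRITING_STYLES, PERSUASION_TECHNIQUES
    ):
        targets.extend([combination] * tests_per_combination)
    return targets


def coverage(generated, required=None):
    """Fraction of required combinations hit by at least one generated test."""
    required = set(required if required is not None else coverage_targets())
    return len(required & set(generated)) / len(required)


def category_balance(generated):
    """Count generated tests per safety category, to check balance."""
    return Counter(category for category, _, _ in generated)
```

With these placeholder lists there are 3 × 2 × 2 = 12 combinations, so `coverage_targets(2)` yields 24 targets. In a full pipeline, each target would be handed to a test generator that writes the actual prompt, and a separate oracle would classify the response of the model under test as safe or unsafe; neither step is modeled here.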
Collections
  • Conference contributions - Engineering [449]

Harvested by:

OpenAIRE, BASE, Recolecta

Validated by:

OpenAIRE, Rebiun
MONDRAGON UNIBERTSITATEA | Library
DSpace