Practical approaches towards IoT dataset generation for security experiments

Sáez de Cámara Garcia, Xabier; Flores Barroso, Jose Luis; Arellano Bartolomé, Cristóbal; Urbieta Artetxe, Aitor; Garitano, Iñaki; Zurutuza, Urko


Practical_approaches_towards_IoT_dataset_generation_for_security_experiments__MU_IKERLAN_preprint.pdf (2.758Mb)



Research group

Análisis de datos y ciberseguridad (Data Analysis and Cybersecurity)

Other institutions

Ikerlan

Version

Preprint

Document type

Book chapter

Embargo end date

2145-12-31

Language

English

Rights

Access

Embargoed access

Publisher

Elsevier

Keywords

Botnet
Emulation
Internet of Things
Machine learning
Network security
Testbed
SDG 4 Quality Education
SDG 9 Industry, Innovation and Infrastructure

Subject (UNESCO Thesaurus)

Data protection

Abstract

The cybersecurity field has been steadily adopting rapid advances in artificial intelligence (AI) and machine learning (ML) techniques for various purposes, such as threat detection and response, with promising results. Obtaining high-quality data for model training is fundamental to creating robust solutions; however, the scarcity of IoT security datasets remains a limiting factor in developing ML-based security systems for IoT scenarios. Broadly, there are two methods for generating datasets: using physical IoT hardware on operational networks and employing virtualization-based systems. The former provides accurate and representative data but can be costly, time-consuming, difficult to adapt, and potentially risky. On the other hand, the latter offers a safer, more flexible, and cost-effective approach for various research purposes, despite not replicating exact hardware conditions. This chapter will delve into the practical process of dataset generation from the point of view of these two approaches. First, regarding the virtualized approach, we will leverage the recently published Gotham testbed, a reproducible, flexible, and extendable security testbed based on emulated nodes that mixes containerization and virtual machine technologies. This testbed can be used to generate various datasets of network traces, including activities from real malware emulated in the platform or real attack activities from the internet interacting with the testbed. Then, based on the VARIoT project, we will explore the platform and methodology to create datasets of IoT traffic under realistic conditions, including both legitimate and malicious traces, using a laboratory set of physical IoT hardware devices.