Novel automated interactive reinforcement learning framework with a constraint-based supervisor for procedural tasks

Elguea Aguinaco, Iñigo; Aguirre, Aitor; Izagirre, Unai; Inziarte Hidalgo, Ibai; Bogh, Simon; Arana-Arexolaleiba, Nestor

dc.rights.license	Attribution-NonCommercial-NoDerivatives 4.0 International	*
dc.contributor.author	Elguea Aguinaco, Iñigo
dc.contributor.author	Aguirre, Aitor
dc.contributor.author	Izagirre, Unai
dc.contributor.author	Inziarte Hidalgo, Ibai
dc.contributor.author	Bogh, Simon
dc.contributor.author	Arana-Arexolaleiba, Nestor
dc.date.accessioned	2025-02-13T09:36:03Z
dc.date.available	2025-02-13T09:36:03Z
dc.date.issued	2025
dc.identifier.issn	1872-7409	en
dc.identifier.other	https://katalogoa.mondragon.edu/janium-bin/janium_login_opac.pl?find&ficha_no=179940	en
dc.identifier.uri	https://hdl.handle.net/20.500.11984/6898
dc.description.abstract	Learning to perform procedural motion or manipulation tasks in unstructured or uncertain environments poses significant challenges for intelligent agents. Although reinforcement learning algorithms have demonstrated positive results on simple tasks, the hard-to-engineer reward functions and the impractical amount of trial-and-error iterations these agents require in long-experience streams still present challenges for deployment in industrially relevant environments. In this regard, interactive reinforcement learning has emerged as a promising approach to mitigate these limitations, whereby a human supervisor provides evaluative or corrective feedback to the learning agent during training. However, the requirement of a human-in-the-loop approach throughout the learning process can be impractical for tasks that span several hours. This study aims to overcome this limitation by automating the learning process and substituting human feedback with an artificial supervisor grounded in constraint-based modeling techniques. In contrast to the logical constraints commonly used for conventional reinforcement learning, constraint-based modeling techniques offer enhanced adaptability in terms of conceptualizing and modeling the human knowledge of a task. This modeling capability allows an automated supervisor to acquire a closer approximation to human reasoning by dividing complex tasks into more manageable components and identifying the associated subtask and contextual cues in which the agent is involved. The supervisor then adjusts the evaluative and corrective feedback to suit the specific subtask under consideration. The framework was assessed using three actor-critic agents in a human–robot interaction environment, demonstrating a sample efficiency improvement of 50% and success rates of	es
dc.language.iso	eng	en
dc.publisher	Elsevier	en
dc.rights	© 2024 The Authors	en
dc.rights.uri	http://creativecommons.org/licenses/by-nc-nd/4.0/	*
dc.subject	Automated supervisor	en
dc.subject	Contact-rich manipulation	en
dc.subject	Industrial manipulators	en
dc.subject	Interactive reinforcement learning	en
dc.subject	Sample efficiency	en
dc.subject	Procedural tasks	en
dc.title	Novel automated interactive reinforcement learning framework with a constraint-based supervisor for procedural tasks	en
dc.type	http://purl.org/coar/resource_type/c_6501
dcterms.accessRights	http://purl.org/coar/access_right/c_abf2	en
dcterms.source	Knowledge-Based Systems	en
local.contributor.group	Análisis de datos y ciberseguridad	es
local.contributor.group	Robótica y automatización	es
local.description.peerreviewed	true	en
local.identifier.doi	https://doi.org/10.1016/j.knosys.2024.112870	en
local.rights.publicationfee	APC	en
local.rights.publicationfeeamount	3000	en
local.contributor.otherinstitution	Electrotecnica Alavesa S.L.	es
local.contributor.otherinstitution	Montajes Mantenimiento y Automatismos Eléctricos Navarra	es
local.contributor.otherinstitution	https://ror.org/04m5j1k67	es
local.source.details	Vol. 309, Art. no. 112870, 30 January 2025	en
oaire.format.mimetype	application/pdf	en
oaire.file	$DSPACE\assetstore	en
oaire.resourceType	http://purl.org/coar/resource_type/c_6501	en
oaire.version	http://purl.org/coar/version/c_970fb48d4fbd8a85	en
dc.unesco.tesauro	http://vocabularies.unesco.org/thesaurus/concept3401	en
oaire.funderName	Gobierno Vasco	en
oaire.funderIdentifier	https://ror.org/00pz2fp31 / http://data.crossref.org/fundingdata/funder/10.13039/501100003086	en
oaire.fundingStream	Ikertalde Convocatoria 2022-2025	en
oaire.awardNumber	IT1676-22	en
oaire.awardTitle	Grupo de sistemas inteligentes para sistemas industriales (IKERTALDE 2022-2025)	en
oaire.awardURI	No information	en
dc.unesco.clasificacion	http://skos.um.es/unesco6/331101	en