Simple record

dc.contributor.author: Cernuda, Carlos
dc.contributor.author: Reguera-Bakhache, Daniel
dc.contributor.author: Aguirre, Aitor
dc.contributor.author: Iturbe Urretxa, Mikel
dc.contributor.author: Garitano, Iñaki
dc.contributor.author: Zurutuza, Urko
dc.date.accessioned: 2025-07-10T09:41:39Z
dc.date.available: 2025-07-10T09:41:39Z
dc.date.issued: 2021
dc.identifier: https://caepia20-21.uma.es/inicio_files/caepia20-21-actas.pdf [en]
dc.identifier.isbn: 978-84-09-30514-8 [en]
dc.identifier.other: https://katalogoa.mondragon.edu/janium-bin/janium_login_opac.pl?find&ficha_no=164804 [en]
dc.identifier.uri: https://hdl.handle.net/20.500.11984/13905
dc.description.abstract: A common problem in classification tasks is class imbalance, which occurs when one or more classes are heavily underrepresented compared to the rest; those minority classes are usually the ones of interest. A natural solution is to correct the imbalance with sampling methods, of which the Synthetic Minority Oversampling TEchnique (SMOTE) is the most widely used. Like all other oversampling techniques, it relies on distances/similarities in order to focus on the neighborhoods of minority samples when generating synthetic samples, so it is meant for purely numerical data. Nevertheless, it is very common to collect categorical data or to discretize numeric attributes as a preprocessing step, in which case one is limited to random sampling approaches to correct the imbalance. Some approaches have been proposed to deal with mixed-type or purely categorical data, but they ignore part of the information in the samples or end up being almost random. We propose GSMOTE, a generalization of the SMOTE method that is suitable for any data type. To determine neighborhoods, the distance between samples is obtained by transforming Gower’s General Similarity Coefficient into a novel General Distance Coefficient, in which the component that measures similarity between categories of categorical variables is replaced by a recently presented similarity measure, the Variable Entropy measure, inspired by Shannon’s Entropy. GSMOTE has been tested on six public imbalanced datasets with different characteristics and imbalance levels. [en]
dc.language.iso: eng [en]
dc.publisher: CAEPIA [en]
dc.rights: © Los autores, 2021 [en]
dc.subject: Imbalanced Learning [en]
dc.subject: Oversampling Techniques [en]
dc.title: Generalized SMOTE: A universal generation oversampling technique for all data types in imbalanced learning [en]
dcterms.accessRights: http://purl.org/coar/access_right/c_abf2 [en]
dcterms.source: Conference of the Spanish Association for Artificial Intelligence (CAEPIA) [en]
local.contributor.group: Análisis de datos y ciberseguridad [es]
local.description.peerreviewed: true [en]
local.description.publicationfirstpage: 108 [en]
local.description.publicationlastpage: 113 [en]
local.source.details: 19. Málaga, 2021 [en]
oaire.format.mimetype: application/pdf [en]
oaire.file: $DSPACE\assetstore [en]
oaire.resourceType: http://purl.org/coar/resource_type/c_c94f [en]
oaire.version: http://purl.org/coar/version/c_ab4af688f83e57aa [en]
oaire.funderName: Gobierno Vasco [en]
oaire.funderName: Gobierno de España [en]
oaire.funderIdentifier: https://ror.org/00pz2fp31 / http://data.crossref.org/fundingdata/funder/10.13039/501100003086 [en]
oaire.funderIdentifier: https://ror.org/038jjxj40 / http://data.crossref.org/fundingdata/funder/10.13039/501100010198 [en]
oaire.fundingStream: Elkartek 2021 [en]
oaire.fundingStream: Programa Estatal de Investigación, Desarrollo e Innovación orientada a los retos de la sociedad en el marco del Plan Estatal de Investigación Científica y Técnica y de Innovación 2013-2016, convocatoria del 2017 [en]
oaire.awardNumber: KK-2021-00091 [en]
oaire.awardNumber: TIN2017-84658-C2-2-R [en]
oaire.awardTitle: REal tiME control and embeddeD securitY (REMEDY) [en]
oaire.awardTitle: Integración de Conocimiento Semántico para el Filtrado de Spam basado en Contenido (SKI4SPAM) [en]
oaire.awardURI: No information [en]
oaire.awardURI: No information [en]
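
The abstract describes neighborhoods being found with a distance derived from Gower's General Similarity Coefficient. The following is only a minimal sketch of that general idea, not the authors' implementation. It makes three assumptions that are not in this record: the distance is taken as one minus the similarity, categorical features are compared by simple matching (the paper replaces this part with the Variable Entropy measure, whose formula is not given here), and the function name `gower_distance` and its `ranges` parameter are hypothetical.

```python
from typing import Optional, Sequence, Union

Value = Union[float, str]


def gower_distance(
    a: Sequence[Value],
    b: Sequence[Value],
    ranges: Sequence[Optional[float]],
) -> float:
    """Gower-style distance between two mixed-type samples.

    ranges[k] is the observed range (max - min) of numeric feature k,
    or None when feature k is categorical.
    """
    total_similarity = 0.0
    for x, y, feature_range in zip(a, b, ranges):
        if feature_range is None:
            # Categorical feature: simple matching. ASSUMPTION: the paper
            # uses the Variable Entropy measure here instead.
            similarity = 1.0 if x == y else 0.0
        elif feature_range > 0:
            # Numeric feature: one minus the range-normalised difference.
            similarity = 1.0 - abs(x - y) / feature_range
        else:
            # Constant numeric feature: all samples are identical on it.
            similarity = 1.0
        total_similarity += similarity

    mean_similarity = total_similarity / len(ranges)
    # ASSUMPTION: distance = 1 - similarity; the paper's exact
    # similarity-to-distance transformation is not stated in this record.
    return 1.0 - mean_similarity


# Two samples with one numeric feature (range 4.0) and one categorical one.
d = gower_distance((1.0, "red"), (3.0, "blue"), ranges=(4.0, None))
```

A distance of this kind can stand in for the Euclidean distance in the nearest-neighbor step of a SMOTE-like procedure, which is what allows categorical and mixed-type samples to have meaningful neighborhoods.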

