eBiltegia

Title
Generalized SMOTE: A universal generation oversampling technique for all data types in imbalanced learning
Author
Cernuda, Carlos
Reguera-Bakhache, Daniel
Aguirre, Aitor
Iturbe Urretxa, Mikel
Garitano, Iñaki
Zurutuza, Urko
Research Group
Data Analysis and Cybersecurity
Version
Postprint
Rights
© The authors, 2021
Access
Open access
URI
https://hdl.handle.net/20.500.11984/13905
Identifier
https://caepia20-21.uma.es/inicio_files/caepia20-21-actas.pdf
Published at
Conference of the Spanish Association for Artificial Intelligence (CAEPIA), 19th edition. Málaga, 2021
Publisher
CAEPIA
Keywords
Imbalanced Learning
Oversampling Techniques
Abstract
A common problem in classification tasks is class imbalance, which occurs when one or more classes are heavily underrepresented compared to the rest; these minority classes are usually the ones of interest. A natural solution is to correct the imbalance with sampling methods, of which the Synthetic Minority Oversampling TEchnique (SMOTE) is the most widely used. Like all other oversampling techniques, it relies on distances or similarities to focus on the neighborhoods of minority samples when generating synthetic samples, so it is meant for purely numerical data. Nevertheless, it is very common to collect categorical data or to discretize numeric attributes as a preprocessing step, which leaves only random sampling approaches available to correct the imbalance. Some approaches have been proposed to deal with mixed-type or purely categorical data, but they ignore part of the information in the samples or end up being almost random. We propose GSMOTE, a generalization of the SMOTE method that is suitable for any data type. To determine the neighborhoods, the distance between samples is obtained by transforming Gower's General Similarity Coefficient into a novel General Distance Coefficient, in which the component that measures similarity between categories of categorical variables is replaced by a recently presented similarity measure called the Variable Entropy measure, inspired by Shannon's entropy. GSMOTE has been tested on six public imbalanced datasets with different characteristics and imbalance levels.
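To make the neighborhood idea in the abstract concrete, the following is a minimal sketch of a Gower-style distance on mixed numeric and categorical samples, followed by a nearest-neighbor search among minority samples. It is not the paper's General Distance Coefficient: the categorical part uses plain matching (1 if equal, 0 otherwise) rather than the Variable Entropy measure, and the similarity-to-distance transformation is assumed to be `1 - S`, which may differ from the one the authors use. The function names, the `ranges` argument, and the toy data are all hypothetical.

```python
def gower_similarity(a, b, ranges):
    """Classic Gower similarity between two mixed-type samples.

    ranges[k] is the observed range of feature k when it is numeric,
    or None when feature k is categorical (an assumption of this sketch).
    """
    total = 0.0
    for x, y, r in zip(a, b, ranges):
        if r is None:
            # Categorical feature: simple matching, NOT the Variable Entropy measure.
            total += 1.0 if x == y else 0.0
        elif r == 0:
            # Constant numeric feature: every pair is maximally similar.
            total += 1.0
        else:
            # Numeric feature: range-normalized absolute difference.
            total += 1.0 - abs(x - y) / r
    return total / len(ranges)


def gower_distance(a, b, ranges):
    """Assumed transformation from similarity to distance: d = 1 - S."""
    return 1.0 - gower_similarity(a, b, ranges)


def nearest_minority_neighbors(index, minority, ranges, k):
    """Indices of the k minority samples closest to minority[index]."""
    target = minority[index]
    others = [i for i in range(len(minority)) if i != index]
    others.sort(key=lambda i: gower_distance(target, minority[i], ranges))
    return others[:k]


# Toy minority class with one numeric and one categorical feature.
minority = [
    (1.0, "red"),
    (2.0, "red"),
    (9.0, "blue"),
    (1.5, "blue"),
]
ranges = [8.0, None]  # numeric range is 9.0 - 1.0; second feature is categorical

neighbors = nearest_minority_neighbors(0, minority, ranges, k=2)
print(neighbors)
```

Because the distance is defined for every feature type, the same neighbor search works on purely numeric, purely categorical, or mixed data, which is the property the abstract attributes to GSMOTE. How synthetic samples are then generated from those neighbors is not described in the abstract and is left out here.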
Funder
Basque Government
Government of Spain
Program
Elkartek 2021
State Programme for Research, Development and Innovation Oriented to the Challenges of Society, within the framework of the State Plan for Scientific and Technical Research and Innovation 2013-2016, 2017 call
Number
KK-2021-00091
TIN2017-84658-C2-2-R
Award URI
No information
No information
Project
REal tiME control and embeddeD securitY (REMEDY)
Integration of Semantic Knowledge for Content-Based Spam Filtering (SKI4SPAM)
Collections
  • Conferences - Engineering [431]
