eBiltegia

File
SDRS_A new lossless dimensionality reduction for text corpora.docx (725.6Kb)
Title
SDRS: A new lossless dimensionality reduction for text corpora
Author
Velez de Mendizabal, Iñaki
Ezpeleta, Enaitz
Zurutuza, Urko
Author (from another institution)
Basto-Fernandes, Vitor
Méndez, José R.
Research group
Análisis de datos y ciberseguridad
Other institutions
Instituto Universitário de Lisboa (Iscte)
Universidade de Vigo
Instituto de Investigación Sanitaria Galicia Sur (IISGS)
Version
Postprint
Rights
© 2020 Elsevier Ltd.
Access
Embargoed access
URI
https://hdl.handle.net/20.500.11984/1693
Publisher's version
https://doi.org/10.1016/j.ipm.2020.102249
Published in
Information Processing & Management, Vol. 57, No. 4, article no. 102249
Publisher
Elsevier Ltd.
Keywords
Spam filtering
Token-based representation
Synset-based representation
Semantic-based feature reduction
Multi-objective evolutionary algorithms
Abstract
In recent years, most content-based spam filters have been implemented using Machine Learning (ML) approaches by means of token-based representations of textual contents. After introducing multiple performance enhancements, the impact has been virtually irrelevant. Recent studies have introduced synset-based content representations as a reliable way to improve classification, as well as different ways to take advantage of semantic information to address problems such as dimensionality reduction. These preliminary solutions present some limitations and enforce simplifications that must be gradually redefined in order to obtain significant improvements in spam content filtering. This study addresses the problem of feature reduction by introducing a new semantic-based proposal (SDRS) that avoids losing knowledge (lossless). Synset-features can be semantically grouped by taking advantage of taxonomic relations (mainly hypernyms) provided by the BabelNet ontological dictionary (e.g. “Viagra” and “Cialis” can be summarized into the single feature “anti-impotence drug”, “drug” or “chemical substance”, depending on whether the generalization spans 1, 2 or 3 levels). In order to decide how many levels should be used to generalize each synset of a dataset, our proposal takes advantage of Multi-Objective Evolutionary Algorithms (MOEA) and, in particular, of the Non-dominated Sorting Genetic Algorithm (NSGA-II). We have compared the performance achieved by a Naïve Bayes classifier, using both token-based and synset-based dataset representations, with and without executing dimensional reductions. As a result, our lossless semantic reduction strategy was able to find optimal semantic-based feature grouping strategies for the input texts, leading to a better performance of Naïve Bayes classifiers.
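The hypernym-based grouping described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `HYPERNYM` dictionary is a hypothetical toy taxonomy standing in for BabelNet, the function names `generalize` and `reduce_features` are invented for illustration, and summing the counts of merged features is an assumption. The per-synset generalization levels, which the study selects with NSGA-II, are simply fixed by hand here.

```python
from collections import Counter

# Hypothetical toy taxonomy: child -> direct hypernym (parent).
# In the study these relations come from BabelNet; this dict is an assumption.
HYPERNYM = {
    "viagra": "anti-impotence drug",
    "cialis": "anti-impotence drug",
    "anti-impotence drug": "drug",
    "drug": "chemical substance",
}


def generalize(synset, levels):
    """Climb `levels` hypernym steps, stopping early if no parent is known."""
    for _ in range(levels):
        parent = HYPERNYM.get(synset)
        if parent is None:
            break
        synset = parent
    return synset


def reduce_features(counts, levels_per_synset):
    """Merge synset counts whose generalizations coincide (counts are summed)."""
    reduced = Counter()
    for synset, count in counts.items():
        # Synsets without an assigned level are left ungeneralized (level 0).
        target = generalize(synset, levels_per_synset.get(synset, 0))
        reduced[target] += count
    return dict(reduced)


# One document as synset counts; levels chosen by hand (NSGA-II in the study).
doc = {"viagra": 2, "cialis": 1, "meeting": 3}
levels = {"viagra": 1, "cialis": 1}
print(reduce_features(doc, levels))
# prints {'anti-impotence drug': 3, 'meeting': 3}
```

With one level of generalization, the two drug names collapse into a single feature, so the document goes from three features to two. Raising the levels to 2 or 3 would map both names to "drug" or "chemical substance" instead.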
Sponsorship
Gobierno de España
Project ID
GE/Programa Estatal de Investigacion, Desarrollo e Innovación orientada a los retos de la sociedad en el marco del Plan Estatal de Investigación Científica y Técnica y de Innovación 2013-2016, convocatoria del 2017/TIN2017-84658-C2-2-R/Integración de Conocimiento Semántico para el Filtrado de Spam basado en Contenido/SKI4SPAM
Collections
  • Artikuluak - Ingeniaritza (Articles - Engineering) [743]
