New approaches for content-based analysis towards online social network spam detection

Ezpeleta, Enaitz

dc.contributor.advisor	Zurutuza Ortega, Urko
dc.contributor.advisor	Gómez Hidalgo, José María
dc.contributor.author	Ezpeleta, Enaitz
dc.date.accessioned	2019-05-22T07:23:21Z
dc.date.available	2019-05-22T07:23:21Z
dc.date.issued	2016
dc.date.submitted	2016-09-30
dc.identifier.other	https://katalogoa.mondragon.edu/janium-bin/janium_login_opac.pl?find&ficha_no=126069	en
dc.identifier.uri	https://hdl.handle.net/20.500.11984/1212
dc.description.abstract	Unsolicited email campaigns remain as one of the biggest threats affecting millions of users per day. Although spam filtering techniques are capable of detecting significant percentage of the spam messages, the problem is far from being solved, specially due to the total amount of spam traffic that flows over the Internet, and new potential attack vectors used by malicious users. The deeply entrenched use of Online Social Networks (OSNs), where millions of users share unconsciously any kind of personal data, offers a very attractive channel to attackers. Those sites provide two main interesting areas for malicious activities: exploitation of the huge amount of information stored in the profiles of the users, and the possibility of targeting user addresses and user spaces through their personal profiles, groups, pages... Consequently, new type of targeted attacks are being detected in those communication means. Being selling products, creating social alarm, creating public awareness campaigns, generating traffic with viral contents, fooling users with suspicious attachments, etc. the main purpose of spam messages, those type of communications have a specific writing style that spam filtering can take advantage of. The main objectives of this thesis are: (i) to demonstrate that it is possible to develop new targeted attacks exploiting personalized spam campaigns using OSN information, and (ii) to design and validate novel spam detection methods that help detecting the intentionality of the messages, using natural language processing techniques, in order to classify them as spam or legitimate. Additionally, those methods must be effective also dealing with the spam that is appearing in OSNs. To achieve the first objective a system to design and send personalized spam campaigns is proposed. We extract automatically users’ public information from a well known social site. We analyze it and design different templates taking into account the preferences of the users. After that, different experiments are carried out sending typical and personalized spam. The results show that the click-through rate is considerably improved with this new strategy. In the second part of the thesis we propose three novel spam filtering methods. Those methods aim to detect non-evident illegitimate intent in order to add valid information that is used by spam classifiers. To detect the intentionality of the texts, we hypothesize that sentiment analysis and personality recognition techniques could provide new means to differentiate spam text from legitimate one. Taking into account this assumption, we present three different methods: the first one uses sentiment analysis to extract the polarity feature of each analyzed text, thus we analyze the optimistic or pessimistic attitude of spam messages compared to legitimate texts. The second one uses personality recognition techniques to add personality dimensions (Extroversion/Introversion, Thinking/Feeling, Judging/ Perceiving and Sensing/iNtuition) to the spam filtering process; and the last one is a combination of the two previously mentioned techniques. Once the methods are described, we experimentally validate the proposed approaches in three different types of spam: email spam, SMS spam and spam from a popular OSN.	en
dc.description.abstract	Hartzailearen baimenik gabe bidalitako mezuak (spam) egunean milioika erabiltzaileri eragiten dien mehatxua dira. Nahiz eta spam detekzio tresnek gero eta emaitza hobeagoak lortu, arazoa konpontzetik oso urruti dago oraindik, batez ere spam kopuruari eta erasotzaileen estrategia berriei esker. Hori gutxi ez eta azken urteetan sare sozialek izan duten erabiltzaile gorakadaren ondorioz, non milioika erabiltzailek beraien datu pribatuak publiko egiten dituzten, gune hauek oso leku erakargarriak bilakatu dira erasotzaileentzat. Batez ere bi arlo interesgarri eskaintzen dituzte webgune hauek: profiletan pilatutako informazio guztiaren ustiapena, eta erabiltzaileekin harreman zuzena izateko erraztasuna (profil bidez, talde bidez, orrialde bidez...). Ondorioz, gero eta ekintza ilegal gehiago atzematen ari dira webgune hauetan. Spam mezuen helburu nagusienak zerbait saldu, alarma soziala sortu, sentsibilizazio kanpainak martxan jarri, etab. izaki, mezu mota hauek eduki ohi duten idazketa mezua berauen detekziorako erabilia izan daiteke. Lan honen helburu nagusiak ondorengoak dira: alde batetik, sare sozialetako informazio publikoa erabiliz egungo detekzio sistemak saihestuko dituen spam pertsonalizatua garatzea posible dela erakustea; eta bestetik hizkuntza naturalaren prozesamendurako teknikak erabiliz, testuen intentzionalitatea atzeman eta spam-a detektatzeko metodologia berriak garatzea. Gainera, sistema horiek sare sozialetako spam mezuekin lan egiteko gaitasuna ere izan beharko dute. Lehen helburu hori lortzekolan honetan spam pertsonalizatua diseinatu eta bidaltzeko sistema bat aurkeztu da. Era automatikoan erabiltzaileen informazio publikoa ateratzen dugu sare sozial ospetsu batetik, ondoren informazio hori aztertu eta txantiloi ezberdinak garatzen ditugu erabiltzaileen iritziak kontuan hartuaz. Behin hori egindakoan, hainbat esperimentu burutzen ditugu spam normala eta pertsonalizatua bidaliz, bien arteko emaitzen ezberdintasuna alderatzeko. Tesiaren bigarren zatian hiru spam atzemate metodologia berri aurkezten ditugu. Berauen helburua tribialak ez den intentzio komertziala atzeman ta hori baliatuz spam mezuak sailkatzean datza. Intentzionalitate hori lortze aldera, analisi sentimentala eta pertsonalitate detekzio teknikak erabiltzen ditugu. Modu honetan, hiru sistema ezberdin aurkezten dira hemen: lehenengoa analisi sentimentala soilik erabiliz, bigarrena lan honetarako pertsonalitate detekzio teknikek eskaintzen dutena aztertzen duena, eta azkenik, bien arteko konbinazioa. Tresna hauek erabiliz, balidazio esperimentala burutzen da proposatutako sistemak eraginkorrak diren edo ez aztertzeko, hiru mota ezberdinetako spam-arekin lan eginez: email spam-a, SMS spam-a eta sare sozial ospetsu bateko spam-a.	eu
dc.format.extent	135	en
dc.language.iso	eng	en
dc.publisher	Mondragon Unibertsitatea. Goi Eskola Politeknikoa	en
dc.rights	© Enaitz Ezpeleta Gallastegi	en
dc.subject	Informática	es
dc.subject	Inteligencia artificial	es
dc.subject	ODS 9 Industria, innovación e infraestructura	es
dc.title	New approaches for content-based analysis towards online social network spam detection	en
dcterms.accessRights	http://purl.org/coar/access_right/c_abf2	en
local.contributor.group	Análisis de datos y ciberseguridad	es
local.description.degree	Programa de Doctorado en Ingeniería	es
local.description.responsability	Presidencia: Manel Medina Llinas (Universitat Politecnica de Catalunya); Vocalía: Igor Santos Grueiro (Universidad de Deusto); Vocalía: Per Magnus Almgren (Chalmers University of Technology); Vocalía: José Ramón Méndez Reboredo (Universidad de Vigo); Secretaría: Iñaki Garitano Garitano (Mondragon Unibertsitatea)	es
local.identifier.doi	https://doi.org/10.48764/k2am-0h44
oaire.format.mimetype	application/pdf
oaire.file	$DSPACE\assetstore
oaire.resourceType	http://purl.org/coar/resource_type/c_db06	en

Ficheros en el ítem

Nombre:: Ezpeleta Gallastegi Enaitz_tes ...
Tamaño:: 822.0Kb
Formato:: PDF

Ver/Abrir

Este ítem aparece en la(s) siguiente(s) colección(es)

Tesis - Ingeniería [204]

Registro sencillo