EVENT DETECTION
5th BDE Hang-out “Big Data in Secure Societies”, 13/12/2017
George Giannakopoulos and
Nikiforos Pittaras,
NCSR "Demokritos"
Pilot Architecture
18-Dec-17 www.big-data-europe.eu
Event Detection Workflow
Components: News & Twitter Crawler, …, Event Detector, Lookup Service
ED Workflow: News Crawler
• Runs periodically
• Stores parsed content and metadata in Cassandra
• RSS feeds:
  o Crawler complies with privacy regulations
  o RSS feed list defaults to Reuters generic categories
• Direct links to published articles:
  o Best-effort parsing
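The RSS path of the crawler can be pictured with a minimal sketch. It only shows how feed items might be turned into records of content and metadata; the function name `parse_rss` and the record fields are assumptions for illustration, not the pilot's code, and storage in Cassandra is left out.

```python
import xml.etree.ElementTree as ET


def parse_rss(xml_text):
    """Parse an RSS 2.0 document into a list of item records.

    Hypothetical helper: the field names below are illustrative only.
    """
    root = ET.fromstring(xml_text)
    records = []
    for item in root.iter("item"):
        records.append({
            # findtext returns the default when the child element is missing
            "title": item.findtext("title", default="").strip(),
            "link": item.findtext("link", default="").strip(),
            "published": item.findtext("pubDate", default="").strip(),
        })
    return records
```

Each record would then be written to the store by whatever persistence layer the crawler uses.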
ED Workflow: Twitter Crawler
• Runs periodically
• Stores parsed content and metadata in Cassandra
• Multiple operation modes:
  o Query specified Twitter accounts
  o Monitor all Twitter posts in a specified language
  o Keyword-based search
  o Parse individual specified posts
ED Workflow: Cassandra
• Scalable, distributed NoSQL database
• I/O scenarios:
  1. News & tweet storage:
     o Individual items (news articles or tweets) from the crawlers
  2. Event storage:
     o Event objects & metadata, as identified by the Event Detector
  3. Frontend queries:
     o Queries from Sextant about the stored news items and events
ED Workflow: Event Detector
• Runs periodically
• Distributed execution based on Apache Spark
• Two algorithm steps:
  1. Discovers related news items and clusters them into events
  2. Augments the produced events with useful metadata: date, locations, images and specified named entities
• Detector algorithm based on
ED Workflow: ED Algorithm
1) Identify events:
  o Gather all unique article pairs
  o Compute the similarity of the two members of each pair using graph representation methods
    - If similarity > threshold → related pair
  o Form clusters based on related pairs
    - If cluster has support > threshold → event
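The identification step can be sketched as follows. This is a minimal illustration under stated assumptions, not the detector's implementation: a plain word-set (Jaccard) overlap stands in for the graph-based similarity, clusters are taken to be connected components of the related-pair graph, and support is taken to be the number of articles in a cluster. The names `detect_events`, `sim_threshold` and `support_threshold` are hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Word-set overlap; a stand-in for the graph-based similarity."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    if not wa or not wb:
        return 0.0
    return len(wa & wb) / len(wa | wb)


def detect_events(articles, sim_threshold=0.3, support_threshold=1):
    """articles: dict mapping article id -> text. Returns a list of id sets."""
    parent = {key: key for key in articles}

    def find(x):
        # Union-find root lookup with path halving
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Gather all unique pairs and link the related ones
    for a, b in combinations(articles, 2):
        if jaccard(articles[a], articles[b]) > sim_threshold:
            parent[find(a)] = find(b)

    # Form clusters (assumed here: connected components of related pairs)
    clusters = {}
    for key in articles:
        clusters.setdefault(find(key), set()).add(key)

    # Keep clusters whose support (assumed: member count) exceeds the threshold
    return [c for c in clusters.values() if len(c) > support_threshold]
```

With n articles the pair loop examines n(n-1)/2 pairs, which is why the pairwise comparison is the natural part to distribute.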
ED Workflow: ED Algorithm
2) Enrich events:
  o Assign individual social media items to events
    - Convert to the graph-based representation, then apply similarity-based classification
    - If similarity > threshold → attach to event
  o Augment events with external metadata extractable from their member articles and tweets:
    - Location names and geocoordinates (GADM)
    - Named entities (famous people)
    - Photographs (Flickr)
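The assignment of social media items to events can be sketched in the same spirit. Again this is only an illustration under assumptions: word overlap replaces the graph-based representation, a tweet's similarity to an event is taken as its best similarity to any member article, and `assign_tweets` and its parameters are hypothetical names.

```python
def word_overlap(a, b):
    """Jaccard overlap of word sets; a stand-in for graph-based similarity."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    if not wa or not wb:
        return 0.0
    return len(wa & wb) / len(wa | wb)


def assign_tweets(events, tweets, threshold=0.2):
    """events: event id -> list of member article texts.
    tweets: tweet id -> text.
    Returns event id -> list of attached tweet ids."""
    attached = {event_id: [] for event_id in events}
    for tweet_id, text in tweets.items():
        best_event, best_score = None, 0.0
        for event_id, member_texts in events.items():
            # Assumed scoring: best match against any member article
            score = max((word_overlap(text, m) for m in member_texts), default=0.0)
            if score > best_score:
                best_event, best_score = event_id, score
        # Attach only when the similarity exceeds the threshold
        if best_event is not None and best_score > threshold:
            attached[best_event].append(tweet_id)
    return attached
```

Tweets that match no event above the threshold are simply left unattached.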
ED Workflow: Location Extraction
• Based on Apache Lucene for fuzzy queries
• Based on the GADM dataset
  o More than 180,000 location names & geometries
• Input: clean text
• Output: location name(s) with their corresponding geocoordinates
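The input/output contract above can be illustrated with a small sketch. It does not use Lucene or GADM: Python's standard `difflib` stands in for fuzzy querying, and the three-entry `GAZETTEER` with approximate coordinates is a made-up stand-in for the dataset. All names here are hypothetical.

```python
import difflib

# Tiny illustrative gazetteer: name -> approximate (latitude, longitude)
GAZETTEER = {
    "athens": (37.98, 23.73),
    "berlin": (52.52, 13.40),
    "paris": (48.86, 2.35),
}


def extract_locations(text, cutoff=0.8):
    """Return gazetteer names fuzzily matched in the text, with coordinates."""
    found = {}
    for token in text.lower().split():
        token = token.strip(".,;:!?()\"'")
        # get_close_matches tolerates small spelling differences
        matches = difflib.get_close_matches(token, list(GAZETTEER), n=1, cutoff=cutoff)
        if matches:
            found[matches[0]] = GAZETTEER[matches[0]]
    return found
```

A misspelt "Athenss" still resolves to the "athens" entry, which is the behaviour fuzzy querying is meant to provide. This sketch returns a single point per name rather than a full geometry.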
ED Workflow: Entity extraction
Incorporation of semantic metadata extraction
• Augment events by extracting generic named entities
  o Grounded to a unique entity URI
  o Highly extensible: entity metadata easily queryable from additional RESTful APIs, if needed
• APIs & thesauri by the Semantic Web Company
ED Workflow: Entity extraction
• Example: famous people thesaurus
Text (https://en.wikipedia.org/wiki/The_Godfather#Cast)
→ Extractor API → Entities:
    http://bde.poolparty.biz/People/20
    http://bde.poolparty.biz/People/446473
    http://bde.poolparty.biz/People/688722
    ....
→ Metadata API → Entity metadata:
    name: Marlon Brando
    uri: http://bde.poolparty.biz/People/688722
    grounding: http://dbpedia.org/resource/Marlon_Brando
    broaders: http://bde.poolparty.biz/People/2
    properties: http://www.w3.org/1999/02/22-rdf-syntax-ns#type
    ...
ED Workflow: Detector Scaling
Study on event detection performance scaling
• Distributed execution in Apache Spark
• Further experiments on two datasets from two different domains
  o News articles (Reuters-21578)
  o Biomedical scientific publications (BioASQ)
• Up to 10K articles in total (~5 million pairs)
• Technical report draft available upon request
ED Workflow: Detector Scaling
• Preliminary results on Reuters-21578
• Parallel vs distributed execution time (lower is better)
• Substantial speedup at large enough workloads (> 8K articles)
ED Workflow: Image extraction
• Enrichment of extracted locations with photographs
  o Considers a radial area around the centroid of the geocoordinates of a location geometry
  o Queries the Flickr API for user-uploaded public photographs within that area
  o Filters results to a temporal window relevant to the date of the event in question
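The three steps above can be sketched as the construction of a search request. This is an illustration under assumptions, not the pilot's code: the centroid is approximated as the mean of the geometry's vertices, the radius and window sizes are arbitrary defaults, and `flickr_search_params` is a hypothetical helper. The parameter names follow Flickr's `flickr.photos.search` method; the API key and the HTTP call itself are omitted.

```python
from datetime import date, timedelta


def centroid(points):
    """Mean of (lat, lon) vertices; a simplification of a true area centroid."""
    lats = [p[0] for p in points]
    lons = [p[1] for p in points]
    return sum(lats) / len(lats), sum(lons) / len(lons)


def flickr_search_params(geometry, event_date, radius_km=5.0, window_days=3):
    """Build query parameters for a radial, time-windowed photo search."""
    lat, lon = centroid(geometry)
    window = timedelta(days=window_days)
    return {
        "method": "flickr.photos.search",
        "lat": lat,
        "lon": lon,
        "radius": radius_km,  # radial area around the centroid
        # Temporal window around the event date (assumed symmetric)
        "min_taken_date": (event_date - window).isoformat(),
        "max_taken_date": (event_date + window).isoformat(),
    }
```

For a square geometry with corners at (0, 0) and (2, 2) and an event on 13 December 2017, the request is centred on (1.0, 1.0) and covers 10 to 16 December.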
ED Workflow: Connectivity
Workflow inter-connections
• Automatic triggering of the CD workflow
  o Event support calculated during detection
  o Triggers if support is greater than a specified threshold
• Twitter Crawler source injection
  o Targeted consumption of specified posts
• Asynchronous non-blocking operations
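The threshold-based, non-blocking triggering can be pictured with a short `asyncio` sketch. Everything here is assumed for illustration: `trigger_cd_workflow` is a hypothetical placeholder for whatever call starts the CD workflow, and the threshold value is arbitrary.

```python
import asyncio


async def trigger_cd_workflow(event_id):
    """Hypothetical placeholder for the request that starts the CD workflow."""
    await asyncio.sleep(0)  # yields control, standing in for network I/O
    return f"CD triggered for {event_id}"


async def process_events(event_support, support_threshold=3):
    """event_support: event id -> support computed during detection."""
    tasks = [
        asyncio.create_task(trigger_cd_workflow(event_id))
        for event_id, support in event_support.items()
        if support > support_threshold  # trigger only above the threshold
    ]
    # Triggers run concurrently; results come back in task order
    return await asyncio.gather(*tasks)
```

Running `asyncio.run(process_events({"e1": 5, "e2": 2}))` triggers the workflow for `e1` only, since `e2` falls below the threshold.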
Thank you!
Questions?
Links
• Strabon: http://strabon.di.uoa.gr
• GeoTriples: https://github.com/LinkedEOData/GeoTriples
• Event Detection: https://github.com/big-data-europe/docker-event-detection

SC7 Webinar 5 13/12/2017 NCSR "Demokritos" Presentation "Event Detection"
