Building an Open Source, Real-Time,
Billion Object Spatio-Temporal Search
Platform
2016 International Workshop on Cloud Computing and Big Data
Benjamin Lewis, David Strohschein, Paolo Corti, David Smiley
Center for Geographic Analysis, Harvard University
Background
● Big data is everywhere: sensors (weather, pollution…), mobile devices,
social platform activities, software logs, etc.
● Data are generally streaming, so they are temporal
● Most of those data are spatial as well
● Traditional RDBMS, desktop statistics and visualization packages have
difficulty handling big data
● Current solutions involve “massive parallel software running on a large
number of servers”
Use case
● We work in a research university so we need to provide big data to students and
researchers
● Our goal is to lower barriers to interactive data exploration
● Some systems support visualization of large spatio-temporal datasets but don’t handle
search well
● Many search applications (most search engines) handle text but do not support the
geographic dimension.
● There is a great need for a tool that lets users interactively search large collections
and visualize them geographically. Supporting these increasingly common datasets
requires a new kind of map server and client.
● The project is funded by the Sloan Foundation in partnership with the Dataverse team at
Harvard IQSS
Solution
● A general solution. Prototype
with geotagged tweets (tweets
containing GPS coordinates
from originating device)
● Platform adaptable to other
big spatio-temporal data streams
(weather and pollution
sensors, GeoRSS feeds, etc.)
● Integrate the new platform
within Harvard WorldMap
and Dataverse systems
Objective
● Create a missing piece of geo-infrastructure and make it
available
● Demonstrate possibility of addressing scalability limitations
with non-exotic software and hardware
● Make setting up platforms for big spatio-temporal
visualization as easy as setting up a standard GIS stack
Streaming big data
Geotagged tweets
● Geotagged tweets: tweets containing GPS coordinates from originating
device
● Currently about 2% of tweets are geotagged, about 8 million per day
● The CGA has been harvesting geo-tweets since October 2012 using the
Twitter API
● The Billion Object Platform (BOP) will provide a client and an API to browse and
search the latest 1 billion geotagged tweets (a range of about 3 months)
● Command line tools to extract older geotagged tweets from archives
The BOP (Billion Object Platform)
● General purpose, open source platform to support exploration of large collections
of spatio-temporal entities
● Built on top of a search engine
● Supports exploration, visualization, extraction via a RESTful API
● Queryable by time, space, text
● Responsive
● Supports spatial heatmaps to represent the distribution of results (spatial faceting:
results per cell in a grid)
● Supports temporal histograms (temporal faceting: results per date-time range)
● Supports word clouds as a way to browse results by topic
● Supports downloads of subsets for registered users (up to 10,000 features)
● Sentiment stamping
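To make the two faceting ideas above concrete, here is a minimal sketch in plain Python. It is not the BOP implementation (the platform delegates this work to the search engine); the function names, the grid size and the sample points are all assumptions chosen for illustration.

```python
from collections import Counter
from datetime import datetime, timezone


def heatmap_counts(points, min_lon, min_lat, max_lon, max_lat, cols, rows):
    """Spatial faceting: count (lon, lat) points per cell of a regular grid.

    Returns a Counter keyed by (col, row), with (0, 0) at the lower-left corner.
    """
    cell_w = (max_lon - min_lon) / cols
    cell_h = (max_lat - min_lat) / rows
    counts = Counter()
    for lon, lat in points:
        if not (min_lon <= lon <= max_lon and min_lat <= lat <= max_lat):
            continue  # ignore points outside the requested extent
        # Clamp so that points lying exactly on the upper edge land in the last cell.
        col = min(int((lon - min_lon) / cell_w), cols - 1)
        row = min(int((lat - min_lat) / cell_h), rows - 1)
        counts[(col, row)] += 1
    return counts


def time_histogram(timestamps):
    """Temporal faceting: count timezone-aware timestamps per UTC calendar day."""
    return Counter(ts.astimezone(timezone.utc).date().isoformat() for ts in timestamps)


# Hypothetical sample records: (longitude, latitude, creation time).
SAMPLE = [
    (-71.1, 42.37, datetime(2016, 7, 1, 10, 0, tzinfo=timezone.utc)),
    (-71.0, 42.36, datetime(2016, 7, 1, 23, 30, tzinfo=timezone.utc)),
    (2.35, 48.85, datetime(2016, 7, 2, 8, 15, tzinfo=timezone.utc)),
    (139.7, 35.68, datetime(2016, 7, 2, 9, 0, tzinfo=timezone.utc)),
]

heat = heatmap_counts([(lon, lat) for lon, lat, _ in SAMPLE], -180, -90, 180, 90, cols=4, rows=2)
hist = time_histogram([ts for _, _, ts in SAMPLE])
```

With a 4 x 2 world grid, the two nearby points fall in the same cell, so the client only needs to draw one count per cell rather than every individual record.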
Solution Stack
● Apache Lucene: an indexing and search library
● Apache Solr: a search web server platform built on top of
Lucene
● Apache Kafka: a message broker written in Scala to provide
a platform for handling real-time data streams
● Apache ZooKeeper: enables highly reliable distributed
coordination
● Swagger: a framework for building APIs
● scikit-learn library: Machine Learning in Python
● OpenLayers: a JavaScript mapping client
● AngularJS: a JavaScript framework
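As a rough illustration of how a client might ask Solr for facets in a single request, the sketch below builds a query string using Solr's standard range-facet and heatmap-facet parameters. The field names `coord` and `created_at`, the function name, and the idea that the BOP issues exactly this request are assumptions, not details taken from the slides. The rectangle filter assumes a Solr spatial field type that accepts range queries of this form.

```python
from urllib.parse import urlencode


def build_solr_params(text, bbox, start, end, gap="+1DAY"):
    """Build a Solr query string that filters by text, space and time.

    bbox is (min_lon, min_lat, max_lon, max_lat); start and end are ISO 8601
    UTC timestamps. Field names are hypothetical.
    """
    min_lon, min_lat, max_lon, max_lat = bbox
    params = [
        ("q", text or "*:*"),
        ("rows", "0"),  # only facet counts are wanted, not documents
        ("wt", "json"),
        # Filter queries: a rectangle on the spatial field and a time range.
        ("fq", f"coord:[{min_lat},{min_lon} TO {max_lat},{max_lon}]"),
        ("fq", f"created_at:[{start} TO {end}]"),
        ("facet", "true"),
        # Temporal faceting: counts per time bucket.
        ("facet.range", "created_at"),
        ("facet.range.start", start),
        ("facet.range.end", end),
        ("facet.range.gap", gap),
        # Spatial faceting: counts per grid cell over the same rectangle.
        ("facet.heatmap", "coord"),
        ("facet.heatmap.geom", f'["{min_lon} {min_lat}" TO "{max_lon} {max_lat}"]'),
    ]
    return urlencode(params)


example = build_solr_params(
    "flood",
    (-72.0, 41.0, -70.0, 43.0),
    "2016-07-01T00:00:00Z",
    "2016-07-08T00:00:00Z",
)
print(example)
```

The resulting string would be appended to a collection's select endpoint. Setting `rows` to 0 keeps the response small when only the histogram and heatmap counts are needed.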
Search engine features
● Faceted searches (category, space and time)
● Stemming: ability to detect words derived from a common root
● Synonym detection and controlled vocabularies such as thesauri and taxonomies
● Weighted results
● Wildcard and fuzzy search: provide results for a given term and its common
variations
● Boolean queries: search results using terms and boolean operators such as AND,
OR, NOT…
● Hit highlighting: shows the matched terms in context within each result, so the user
can see why it matched
● Stop words: words filtered out during the processing of text
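Stop words, stemming and boolean AND matching can be sketched together in a few lines. This is a toy analyzer for illustration only: the stop-word list and suffix rules are assumptions and far cruder than the language-aware analyzers that Lucene and Solr provide.

```python
import re

# Hypothetical, deliberately tiny stop-word list and suffix rules.
STOP_WORDS = {"a", "an", "and", "the", "is", "are", "in", "of", "to"}
SUFFIXES = ("ing", "ed", "es", "s")


def stem(token):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def analyze(text):
    """Lowercase, tokenize, drop stop words, then stem each remaining token."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [stem(t) for t in tokens if t not in STOP_WORDS]


def matches_all(query, document):
    """Boolean AND: every analyzed query term must occur in the document."""
    return set(analyze(query)) <= set(analyze(document))


doc = "The storms are flooding the streets"
terms = analyze(doc)
```

Because the query and the document pass through the same analysis, a search for "flooded street" matches the document above even though neither word appears in it in that exact form.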
Client to enable data exploration and extraction
API for streaming geotagged tweets
Sentiment Analysis
● Sentiment analysis is a field of study that uses natural language processing tools to
identify the opinions people express in a text
● Social media such as Twitter provide a constant source of textual data, much of it
expressing an opinion, which can be analyzed with sentiment analysis tools
● Using the scikit-learn library (Machine Learning in Python), we stamp each tweet
with a positive or negative sentiment
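A minimal sketch of positive/negative stamping with scikit-learn follows. The slides do not say which features or classifier are used, so the bag-of-words plus naive Bayes pipeline, the function names and the six training sentences here are all assumptions. A real model would need a much larger labelled corpus.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical, tiny labelled training set.
TRAIN_TEXTS = [
    "I love this sunny day",
    "great game tonight, so happy",
    "what a wonderful concert",
    "I hate this traffic",
    "terrible service, so sad",
    "what an awful delay",
]
TRAIN_LABELS = ["positive", "positive", "positive", "negative", "negative", "negative"]


def train_sentiment_model(texts, labels):
    """Fit a bag-of-words + multinomial naive Bayes classifier."""
    model = make_pipeline(CountVectorizer(), MultinomialNB())
    model.fit(texts, labels)
    return model


def stamp(model, tweets):
    """Attach a predicted sentiment label to each tweet text."""
    predictions = model.predict(tweets)
    return [{"text": t, "sentiment": str(s)} for t, s in zip(tweets, predictions)]


model = train_sentiment_model(TRAIN_TEXTS, TRAIN_LABELS)
stamped = stamp(model, ["love this wonderful day", "awful traffic, so sad"])
```

Storing the label alongside each record at ingestion time means sentiment can later be used like any other field, for example as a filter or a facet.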
HHypermap
Similar approach to BOP
(Solr/Lucene): provides a
searchable registry of map
service layers from OGC
and Esri public endpoints

Más contenido relacionado

La actualidad más candente

Location based services for Nokia X and Nokia Asha using Geo2tag
Location based services for Nokia X and Nokia Asha using Geo2tagLocation based services for Nokia X and Nokia Asha using Geo2tag
Location based services for Nokia X and Nokia Asha using Geo2tagMicrosoft Mobile Developer
 
Using python to analyze spatial data
Using python to analyze spatial dataUsing python to analyze spatial data
Using python to analyze spatial dataKudos S.A.S
 
CKANへの空間情報機能拡張実装の試み
CKANへの空間情報機能拡張実装の試みCKANへの空間情報機能拡張実装の試み
CKANへの空間情報機能拡張実装の試みYoichi Kayama
 
Working with OpenStreetMap using Apache Spark and Geotrellis
Working with OpenStreetMap using Apache Spark and GeotrellisWorking with OpenStreetMap using Apache Spark and Geotrellis
Working with OpenStreetMap using Apache Spark and GeotrellisRob Emanuele
 
GeoMesa LocationTech DC
GeoMesa LocationTech DCGeoMesa LocationTech DC
GeoMesa LocationTech DCCCRinc
 
Building a Spatial Database in PostgreSQL
Building a Spatial Database in PostgreSQLBuilding a Spatial Database in PostgreSQL
Building a Spatial Database in PostgreSQLKudos S.A.S
 

La actualidad más candente (6)

Location based services for Nokia X and Nokia Asha using Geo2tag
Location based services for Nokia X and Nokia Asha using Geo2tagLocation based services for Nokia X and Nokia Asha using Geo2tag
Location based services for Nokia X and Nokia Asha using Geo2tag
 
Using python to analyze spatial data
Using python to analyze spatial dataUsing python to analyze spatial data
Using python to analyze spatial data
 
CKANへの空間情報機能拡張実装の試み
CKANへの空間情報機能拡張実装の試みCKANへの空間情報機能拡張実装の試み
CKANへの空間情報機能拡張実装の試み
 
Working with OpenStreetMap using Apache Spark and Geotrellis
Working with OpenStreetMap using Apache Spark and GeotrellisWorking with OpenStreetMap using Apache Spark and Geotrellis
Working with OpenStreetMap using Apache Spark and Geotrellis
 
GeoMesa LocationTech DC
GeoMesa LocationTech DCGeoMesa LocationTech DC
GeoMesa LocationTech DC
 
Building a Spatial Database in PostgreSQL
Building a Spatial Database in PostgreSQLBuilding a Spatial Database in PostgreSQL
Building a Spatial Database in PostgreSQL
 

Destacado

2016 New Lighting Technology Ivan Tchakarov
2016 New Lighting Technology Ivan Tchakarov2016 New Lighting Technology Ivan Tchakarov
2016 New Lighting Technology Ivan TchakarovIvan Tchakarov
 
Idiomatic Gradle Plugin Writing
Idiomatic Gradle Plugin WritingIdiomatic Gradle Plugin Writing
Idiomatic Gradle Plugin WritingSchalk Cronjé
 
Clivaje y elecciones de 1851 - CHILE
Clivaje y elecciones de 1851 - CHILEClivaje y elecciones de 1851 - CHILE
Clivaje y elecciones de 1851 - CHILETavita Vargas
 
Pritam Naik Resume
Pritam Naik ResumePritam Naik Resume
Pritam Naik Resumepritam naik
 
Trabajo práctico ayudantía 2011
Trabajo práctico ayudantía 2011Trabajo práctico ayudantía 2011
Trabajo práctico ayudantía 2011Tavita Vargas
 
ZOO_DIGITAL_300414 HR
ZOO_DIGITAL_300414 HRZOO_DIGITAL_300414 HR
ZOO_DIGITAL_300414 HRLars Clausen
 
Your application ever up-to-date? Go continuous delivery
Your application ever up-to-date? Go continuous deliveryYour application ever up-to-date? Go continuous delivery
Your application ever up-to-date? Go continuous deliveryDavide Benvegnù
 
DocDoc's Guide To Digital Marketing
DocDoc's Guide To Digital MarketingDocDoc's Guide To Digital Marketing
DocDoc's Guide To Digital MarketingJon Samsel
 
Gradle in 45min - JBCN2-16 version
Gradle in 45min - JBCN2-16 versionGradle in 45min - JBCN2-16 version
Gradle in 45min - JBCN2-16 versionSchalk Cronjé
 
Voxxed Belgrade 2016
Voxxed Belgrade 2016Voxxed Belgrade 2016
Voxxed Belgrade 2016Karina Popova
 
Кастомная разработка в области E-Commerce
Кастомная разработка в области E-CommerceКастомная разработка в области E-Commerce
Кастомная разработка в области E-CommerceDZ Systems
 

Destacado (14)

2016 New Lighting Technology Ivan Tchakarov
2016 New Lighting Technology Ivan Tchakarov2016 New Lighting Technology Ivan Tchakarov
2016 New Lighting Technology Ivan Tchakarov
 
Las plantas
Las plantasLas plantas
Las plantas
 
Idiomatic Gradle Plugin Writing
Idiomatic Gradle Plugin WritingIdiomatic Gradle Plugin Writing
Idiomatic Gradle Plugin Writing
 
Clivaje y elecciones de 1851 - CHILE
Clivaje y elecciones de 1851 - CHILEClivaje y elecciones de 1851 - CHILE
Clivaje y elecciones de 1851 - CHILE
 
Pritam Naik Resume
Pritam Naik ResumePritam Naik Resume
Pritam Naik Resume
 
Trabajo práctico ayudantía 2011
Trabajo práctico ayudantía 2011Trabajo práctico ayudantía 2011
Trabajo práctico ayudantía 2011
 
ZOO_DIGITAL_300414 HR
ZOO_DIGITAL_300414 HRZOO_DIGITAL_300414 HR
ZOO_DIGITAL_300414 HR
 
Your application ever up-to-date? Go continuous delivery
Your application ever up-to-date? Go continuous deliveryYour application ever up-to-date? Go continuous delivery
Your application ever up-to-date? Go continuous delivery
 
Nuevas Tecnologias
Nuevas TecnologiasNuevas Tecnologias
Nuevas Tecnologias
 
DocDoc's Guide To Digital Marketing
DocDoc's Guide To Digital MarketingDocDoc's Guide To Digital Marketing
DocDoc's Guide To Digital Marketing
 
Gradle in 45min - JBCN2-16 version
Gradle in 45min - JBCN2-16 versionGradle in 45min - JBCN2-16 version
Gradle in 45min - JBCN2-16 version
 
Voxxed Belgrade 2016
Voxxed Belgrade 2016Voxxed Belgrade 2016
Voxxed Belgrade 2016
 
Java Docs
Java DocsJava Docs
Java Docs
 
Кастомная разработка в области E-Commerce
Кастомная разработка в области E-CommerceКастомная разработка в области E-Commerce
Кастомная разработка в области E-Commerce
 

Similar a Building an Open Source, Real-Time, Billion Object Spatio-Temporal Search Platform

Library Information Retrieval (IR) System of University of Cyprus (UCY)
Library Information Retrieval (IR) System of University of Cyprus (UCY)Library Information Retrieval (IR) System of University of Cyprus (UCY)
Library Information Retrieval (IR) System of University of Cyprus (UCY)ijcsitcejournal
 
Research software susainability
Research software susainabilityResearch software susainability
Research software susainabilityDaniel S. Katz
 
Experiences In Building Globus Genomics Using Galaxy, Globus Online and AWS
Experiences In Building Globus Genomics Using Galaxy, Globus Online and AWSExperiences In Building Globus Genomics Using Galaxy, Globus Online and AWS
Experiences In Building Globus Genomics Using Galaxy, Globus Online and AWSEd Dodds
 
Integrating Apache Phoenix with Distributed Query Engines
Integrating Apache Phoenix with Distributed Query EnginesIntegrating Apache Phoenix with Distributed Query Engines
Integrating Apache Phoenix with Distributed Query EnginesDataWorks Summit
 
Research Object Composer: A Tool for Publishing Complex Data Objects in the C...
Research Object Composer: A Tool for Publishing Complex Data Objects in the C...Research Object Composer: A Tool for Publishing Complex Data Objects in the C...
Research Object Composer: A Tool for Publishing Complex Data Objects in the C...Anita de Waard
 
What do we want computers to do for us?
What do we want computers to do for us? What do we want computers to do for us?
What do we want computers to do for us? Andrea Volpini
 
Fast App development with SwellRT
Fast App development  with SwellRTFast App development  with SwellRT
Fast App development with SwellRTSamer Hassan
 
Deprecating the state machine: building conversational AI with the Rasa stack...
Deprecating the state machine: building conversational AI with the Rasa stack...Deprecating the state machine: building conversational AI with the Rasa stack...
Deprecating the state machine: building conversational AI with the Rasa stack...PyData
 
Deprecating the state machine: building conversational AI with the Rasa stack
Deprecating the state machine: building conversational AI with the Rasa stackDeprecating the state machine: building conversational AI with the Rasa stack
Deprecating the state machine: building conversational AI with the Rasa stackJustina Petraitytė
 
Values & Vision - Cloud Sandboxes for BIG Earth Sciences
Values & Vision - Cloud Sandboxes for BIG Earth SciencesValues & Vision - Cloud Sandboxes for BIG Earth Sciences
Values & Vision - Cloud Sandboxes for BIG Earth Sciencesterradue
 
Free remote sensing and GIS data
Free remote sensing and GIS dataFree remote sensing and GIS data
Free remote sensing and GIS dataNopphawanTamkuan
 
Big Data Technologies.pdf
Big Data Technologies.pdfBig Data Technologies.pdf
Big Data Technologies.pdfRAHULRAHU8
 
WORLDMAP: A SPATIAL INFRASTRUCTURE TO SUPPORT TEACHING AND RESEARCH (BROWN BA...
WORLDMAP: A SPATIAL INFRASTRUCTURE TO SUPPORT TEACHING AND RESEARCH (BROWN BA...WORLDMAP: A SPATIAL INFRASTRUCTURE TO SUPPORT TEACHING AND RESEARCH (BROWN BA...
WORLDMAP: A SPATIAL INFRASTRUCTURE TO SUPPORT TEACHING AND RESEARCH (BROWN BA...Micah Altman
 
Validation of services, data and metadata
Validation of services, data and metadataValidation of services, data and metadata
Validation of services, data and metadataLuis Bermudez
 
X api chinese cop monthly meeting feb.2016
X api chinese cop monthly meeting   feb.2016X api chinese cop monthly meeting   feb.2016
X api chinese cop monthly meeting feb.2016Jessie Chuang
 
ESA-SAPS: Science Archives Publication System
ESA-SAPS: Science Archives Publication SystemESA-SAPS: Science Archives Publication System
ESA-SAPS: Science Archives Publication SystemPlanetek Italia Srl
 
Application of Library Management Software: NewGenLib
Application of Library Management Software: NewGenLibApplication of Library Management Software: NewGenLib
Application of Library Management Software: NewGenLibDavid Nzoputa Ofili
 

Similar a Building an Open Source, Real-Time, Billion Object Spatio-Temporal Search Platform (20)

Library Information Retrieval (IR) System of University of Cyprus (UCY)
Library Information Retrieval (IR) System of University of Cyprus (UCY)Library Information Retrieval (IR) System of University of Cyprus (UCY)
Library Information Retrieval (IR) System of University of Cyprus (UCY)
 
UCIAD overview
UCIAD overviewUCIAD overview
UCIAD overview
 
Research software susainability
Research software susainabilityResearch software susainability
Research software susainability
 
Viswanth_chadalawada_ft_resume
Viswanth_chadalawada_ft_resumeViswanth_chadalawada_ft_resume
Viswanth_chadalawada_ft_resume
 
Experiences In Building Globus Genomics Using Galaxy, Globus Online and AWS
Experiences In Building Globus Genomics Using Galaxy, Globus Online and AWSExperiences In Building Globus Genomics Using Galaxy, Globus Online and AWS
Experiences In Building Globus Genomics Using Galaxy, Globus Online and AWS
 
Integrating Apache Phoenix with Distributed Query Engines
Integrating Apache Phoenix with Distributed Query EnginesIntegrating Apache Phoenix with Distributed Query Engines
Integrating Apache Phoenix with Distributed Query Engines
 
Research Object Composer: A Tool for Publishing Complex Data Objects in the C...
Research Object Composer: A Tool for Publishing Complex Data Objects in the C...Research Object Composer: A Tool for Publishing Complex Data Objects in the C...
Research Object Composer: A Tool for Publishing Complex Data Objects in the C...
 
What do we want computers to do for us?
What do we want computers to do for us? What do we want computers to do for us?
What do we want computers to do for us?
 
Fast App development with SwellRT
Fast App development  with SwellRTFast App development  with SwellRT
Fast App development with SwellRT
 
Deprecating the state machine: building conversational AI with the Rasa stack...
Deprecating the state machine: building conversational AI with the Rasa stack...Deprecating the state machine: building conversational AI with the Rasa stack...
Deprecating the state machine: building conversational AI with the Rasa stack...
 
Deprecating the state machine: building conversational AI with the Rasa stack
Deprecating the state machine: building conversational AI with the Rasa stackDeprecating the state machine: building conversational AI with the Rasa stack
Deprecating the state machine: building conversational AI with the Rasa stack
 
Values & Vision - Cloud Sandboxes for BIG Earth Sciences
Values & Vision - Cloud Sandboxes for BIG Earth SciencesValues & Vision - Cloud Sandboxes for BIG Earth Sciences
Values & Vision - Cloud Sandboxes for BIG Earth Sciences
 
Free remote sensing and GIS data
Free remote sensing and GIS dataFree remote sensing and GIS data
Free remote sensing and GIS data
 
Big Data Technologies.pdf
Big Data Technologies.pdfBig Data Technologies.pdf
Big Data Technologies.pdf
 
WORLDMAP: A SPATIAL INFRASTRUCTURE TO SUPPORT TEACHING AND RESEARCH (BROWN BA...
WORLDMAP: A SPATIAL INFRASTRUCTURE TO SUPPORT TEACHING AND RESEARCH (BROWN BA...WORLDMAP: A SPATIAL INFRASTRUCTURE TO SUPPORT TEACHING AND RESEARCH (BROWN BA...
WORLDMAP: A SPATIAL INFRASTRUCTURE TO SUPPORT TEACHING AND RESEARCH (BROWN BA...
 
Maruti gollapudi cv
Maruti gollapudi cvMaruti gollapudi cv
Maruti gollapudi cv
 
Validation of services, data and metadata
Validation of services, data and metadataValidation of services, data and metadata
Validation of services, data and metadata
 
X api chinese cop monthly meeting feb.2016
X api chinese cop monthly meeting   feb.2016X api chinese cop monthly meeting   feb.2016
X api chinese cop monthly meeting feb.2016
 
ESA-SAPS: Science Archives Publication System
ESA-SAPS: Science Archives Publication SystemESA-SAPS: Science Archives Publication System
ESA-SAPS: Science Archives Publication System
 
Application of Library Management Software: NewGenLib
Application of Library Management Software: NewGenLibApplication of Library Management Software: NewGenLib
Application of Library Management Software: NewGenLib
 

Más de Paolo Corti

State of GeoNode 2019
State of GeoNode 2019State of GeoNode 2019
State of GeoNode 2019Paolo Corti
 
Harvard Hypermap: An Open Source Framework for Making the World’s Geospatial ...
Harvard Hypermap: An Open Source Framework for Making the World’s Geospatial ...Harvard Hypermap: An Open Source Framework for Making the World’s Geospatial ...
Harvard Hypermap: An Open Source Framework for Making the World’s Geospatial ...Paolo Corti
 
Making Temporal Search Central in a Spatial Data Infrastructure
Making Temporal Search Central in a Spatial Data InfrastructureMaking Temporal Search Central in a Spatial Data Infrastructure
Making Temporal Search Central in a Spatial Data InfrastructurePaolo Corti
 
Maintaining spatial data infrastructures (SDIs) using distributed task queues
Maintaining spatial data infrastructures (SDIs) using distributed task queuesMaintaining spatial data infrastructures (SDIs) using distributed task queues
Maintaining spatial data infrastructures (SDIs) using distributed task queuesPaolo Corti
 
Status of WorldMap, 2016
Status of WorldMap, 2016Status of WorldMap, 2016
Status of WorldMap, 2016Paolo Corti
 
GeoNode per il Supporto alle Emergenze Umanitarie
GeoNode per il Supporto alle Emergenze UmanitarieGeoNode per il Supporto alle Emergenze Umanitarie
GeoNode per il Supporto alle Emergenze UmanitariePaolo Corti
 
GeoNode intro and demo
GeoNode intro and demoGeoNode intro and demo
GeoNode intro and demoPaolo Corti
 
GeoNode for Humanitarian Crisis and Risk Reduction
GeoNode for Humanitarian Crisis and Risk ReductionGeoNode for Humanitarian Crisis and Risk Reduction
GeoNode for Humanitarian Crisis and Risk ReductionPaolo Corti
 
L'utilizzo di software fee and open source nello European Forest Fire Informa...
L'utilizzo di software fee and open source nello European Forest Fire Informa...L'utilizzo di software fee and open source nello European Forest Fire Informa...
L'utilizzo di software fee and open source nello European Forest Fire Informa...Paolo Corti
 
Fire news management in the context of the European Forest Fire Information S...
Fire news management in the context of the European Forest Fire Information S...Fire news management in the context of the European Forest Fire Information S...
Fire news management in the context of the European Forest Fire Information S...Paolo Corti
 
Developing Geospatial software with Python, Part 1
Developing Geospatial software with Python, Part 1Developing Geospatial software with Python, Part 1
Developing Geospatial software with Python, Part 1Paolo Corti
 

Más de Paolo Corti (12)

State of GeoNode 2019
State of GeoNode 2019State of GeoNode 2019
State of GeoNode 2019
 
Harvard Hypermap: An Open Source Framework for Making the World’s Geospatial ...
Harvard Hypermap: An Open Source Framework for Making the World’s Geospatial ...Harvard Hypermap: An Open Source Framework for Making the World’s Geospatial ...
Harvard Hypermap: An Open Source Framework for Making the World’s Geospatial ...
 
Making Temporal Search Central in a Spatial Data Infrastructure
Making Temporal Search Central in a Spatial Data InfrastructureMaking Temporal Search Central in a Spatial Data Infrastructure
Making Temporal Search Central in a Spatial Data Infrastructure
 
Maintaining spatial data infrastructures (SDIs) using distributed task queues
Maintaining spatial data infrastructures (SDIs) using distributed task queuesMaintaining spatial data infrastructures (SDIs) using distributed task queues
Maintaining spatial data infrastructures (SDIs) using distributed task queues
 
Status of WorldMap, 2016
Status of WorldMap, 2016Status of WorldMap, 2016
Status of WorldMap, 2016
 
GeoNode per il Supporto alle Emergenze Umanitarie
GeoNode per il Supporto alle Emergenze UmanitarieGeoNode per il Supporto alle Emergenze Umanitarie
GeoNode per il Supporto alle Emergenze Umanitarie
 
GeoNode intro and demo
GeoNode intro and demoGeoNode intro and demo
GeoNode intro and demo
 
GeoNode for Humanitarian Crisis and Risk Reduction
GeoNode for Humanitarian Crisis and Risk ReductionGeoNode for Humanitarian Crisis and Risk Reduction
GeoNode for Humanitarian Crisis and Risk Reduction
 
Geonode 2.0
Geonode 2.0Geonode 2.0
Geonode 2.0
 
L'utilizzo di software fee and open source nello European Forest Fire Informa...
L'utilizzo di software fee and open source nello European Forest Fire Informa...L'utilizzo di software fee and open source nello European Forest Fire Informa...
L'utilizzo di software fee and open source nello European Forest Fire Informa...
 
Fire news management in the context of the European Forest Fire Information S...
Fire news management in the context of the European Forest Fire Information S...Fire news management in the context of the European Forest Fire Information S...
Fire news management in the context of the European Forest Fire Information S...
 
Developing Geospatial software with Python, Part 1
Developing Geospatial software with Python, Part 1Developing Geospatial software with Python, Part 1
Developing Geospatial software with Python, Part 1
 

Último

OpenChain Education Work Group Monthly Meeting - 2024-04-10 - Full Recording
OpenChain Education Work Group Monthly Meeting - 2024-04-10 - Full RecordingOpenChain Education Work Group Monthly Meeting - 2024-04-10 - Full Recording
OpenChain Education Work Group Monthly Meeting - 2024-04-10 - Full RecordingShane Coughlan
 
UI5ers live - Custom Controls wrapping 3rd-party libs.pptx
UI5ers live - Custom Controls wrapping 3rd-party libs.pptxUI5ers live - Custom Controls wrapping 3rd-party libs.pptx
UI5ers live - Custom Controls wrapping 3rd-party libs.pptxAndreas Kunz
 
What’s New in VictoriaMetrics: Q1 2024 Updates
What’s New in VictoriaMetrics: Q1 2024 UpdatesWhat’s New in VictoriaMetrics: Q1 2024 Updates
What’s New in VictoriaMetrics: Q1 2024 UpdatesVictoriaMetrics
 
SAM Training Session - How to use EXCEL ?
SAM Training Session - How to use EXCEL ?SAM Training Session - How to use EXCEL ?
SAM Training Session - How to use EXCEL ?Alexandre Beguel
 
Large Language Models for Test Case Evolution and Repair
Large Language Models for Test Case Evolution and RepairLarge Language Models for Test Case Evolution and Repair
Large Language Models for Test Case Evolution and RepairLionel Briand
 
Post Quantum Cryptography – The Impact on Identity
Post Quantum Cryptography – The Impact on IdentityPost Quantum Cryptography – The Impact on Identity
Post Quantum Cryptography – The Impact on Identityteam-WIBU
 
Patterns for automating API delivery. API conference
Patterns for automating API delivery. API conferencePatterns for automating API delivery. API conference
Patterns for automating API delivery. API conferencessuser9e7c64
 
Amazon Bedrock in Action - presentation of the Bedrock's capabilities
Amazon Bedrock in Action - presentation of the Bedrock's capabilitiesAmazon Bedrock in Action - presentation of the Bedrock's capabilities
Amazon Bedrock in Action - presentation of the Bedrock's capabilitiesKrzysztofKkol1
 
Enhancing Supply Chain Visibility with Cargo Cloud Solutions.pdf
Enhancing Supply Chain Visibility with Cargo Cloud Solutions.pdfEnhancing Supply Chain Visibility with Cargo Cloud Solutions.pdf
Enhancing Supply Chain Visibility with Cargo Cloud Solutions.pdfRTS corp
 
Strategies for using alternative queries to mitigate zero results
Strategies for using alternative queries to mitigate zero resultsStrategies for using alternative queries to mitigate zero results
Strategies for using alternative queries to mitigate zero resultsJean Silva
 
Comparing Linux OS Image Update Models - EOSS 2024.pdf
Comparing Linux OS Image Update Models - EOSS 2024.pdfComparing Linux OS Image Update Models - EOSS 2024.pdf
Comparing Linux OS Image Update Models - EOSS 2024.pdfDrew Moseley
 
Tech Tuesday Slides - Introduction to Project Management with OnePlan's Work ...
Tech Tuesday Slides - Introduction to Project Management with OnePlan's Work ...Tech Tuesday Slides - Introduction to Project Management with OnePlan's Work ...
Tech Tuesday Slides - Introduction to Project Management with OnePlan's Work ...OnePlan Solutions
 
Simplifying Microservices & Apps - The art of effortless development - Meetup...
Simplifying Microservices & Apps - The art of effortless development - Meetup...Simplifying Microservices & Apps - The art of effortless development - Meetup...
Simplifying Microservices & Apps - The art of effortless development - Meetup...Rob Geurden
 
Keeping your build tool updated in a multi repository world
Keeping your build tool updated in a multi repository worldKeeping your build tool updated in a multi repository world
Keeping your build tool updated in a multi repository worldRoberto Pérez Alcolea
 
Introduction to Firebase Workshop Slides
Introduction to Firebase Workshop SlidesIntroduction to Firebase Workshop Slides
Introduction to Firebase Workshop Slidesvaideheekore1
 
Revolutionizing the Digital Transformation Office - Leveraging OnePlan’s AI a...
Revolutionizing the Digital Transformation Office - Leveraging OnePlan’s AI a...Revolutionizing the Digital Transformation Office - Leveraging OnePlan’s AI a...
Revolutionizing the Digital Transformation Office - Leveraging OnePlan’s AI a...OnePlan Solutions
 
Osi security architecture in network.pptx
Osi security architecture in network.pptxOsi security architecture in network.pptx
Osi security architecture in network.pptxVinzoCenzo
 
2024-04-09 - From Complexity to Clarity - AWS Summit AMS.pdf
2024-04-09 - From Complexity to Clarity - AWS Summit AMS.pdf2024-04-09 - From Complexity to Clarity - AWS Summit AMS.pdf
2024-04-09 - From Complexity to Clarity - AWS Summit AMS.pdfAndrey Devyatkin
 
Machine Learning Software Engineering Patterns and Their Engineering
Machine Learning Software Engineering Patterns and Their EngineeringMachine Learning Software Engineering Patterns and Their Engineering
Machine Learning Software Engineering Patterns and Their EngineeringHironori Washizaki
 
VictoriaMetrics Q1 Meet Up '24 - Community & News Update
VictoriaMetrics Q1 Meet Up '24 - Community & News UpdateVictoriaMetrics Q1 Meet Up '24 - Community & News Update
VictoriaMetrics Q1 Meet Up '24 - Community & News UpdateVictoriaMetrics
 

Último (20)

OpenChain Education Work Group Monthly Meeting - 2024-04-10 - Full Recording
OpenChain Education Work Group Monthly Meeting - 2024-04-10 - Full RecordingOpenChain Education Work Group Monthly Meeting - 2024-04-10 - Full Recording
OpenChain Education Work Group Monthly Meeting - 2024-04-10 - Full Recording
 
UI5ers live - Custom Controls wrapping 3rd-party libs.pptx
UI5ers live - Custom Controls wrapping 3rd-party libs.pptxUI5ers live - Custom Controls wrapping 3rd-party libs.pptx
UI5ers live - Custom Controls wrapping 3rd-party libs.pptx
 
What’s New in VictoriaMetrics: Q1 2024 Updates
What’s New in VictoriaMetrics: Q1 2024 UpdatesWhat’s New in VictoriaMetrics: Q1 2024 Updates
What’s New in VictoriaMetrics: Q1 2024 Updates
 
SAM Training Session - How to use EXCEL ?
SAM Training Session - How to use EXCEL ?SAM Training Session - How to use EXCEL ?
SAM Training Session - How to use EXCEL ?
 
Large Language Models for Test Case Evolution and Repair
Large Language Models for Test Case Evolution and RepairLarge Language Models for Test Case Evolution and Repair
Large Language Models for Test Case Evolution and Repair
 

Building an Open Source, Real-Time, Billion Object Spatio-Temporal Search Platform

  • 1. Building an Open Source, Real-Time, Billion Object Spatio-Temporal Search Platform. 2016 International Workshop on Cloud Computing and Big Data. Benjamin Lewis, David Strohschein, Paolo Corti, David Smiley. Center for Geographic Analysis, Harvard University
  • 2. Background
    ● Big data is everywhere: sensors (weather, pollution…), mobile devices, social platform activities, software logs, etc.
    ● Data are generally streaming, so they are temporal
    ● Most of those data are spatial as well
    ● Traditional RDBMS, desktop statistics and visualization packages have difficulty handling big data
    ● Current solutions involve “massive parallel software running on a large number of servers”
  • 3. Use case
    ● We work in a research university, so we need to provide big data to students and researchers
    ● Our goal is to lower barriers to interactive data exploration
    ● Some systems support visualization of large spatio-temporal datasets but don’t handle search well
    ● Many search applications (most search engines) handle text but do not support the geographic dimension
    ● There is a great need for a tool that lets users interactively search large collections and visualize them geographically. To support such increasingly common datasets, a new kind of map server and client is needed.
    ● Project funded by the Sloan Foundation in partnership with the Dataverse team at Harvard IQSS
  • 4. Solution
    ● A general solution. Prototype with geotagged tweets (tweets containing GPS coordinates from originating device)
    ● Platform adaptable to other big data spatial time streams (weather and pollution sensors, geoRSS feeds etc...)
    ● Integrate the new platform within Harvard WorldMap and Dataverse systems
  • 5. Objective
    ● Create a missing piece of geo-infrastructure and make it available
    ● Demonstrate possibility of addressing scalability limitations with non-exotic software and hardware
    ● Make setting up platforms for big spatio-temporal visualization as easy as setting up a standard GIS stack
  • 7. Geotagged tweets
    ● Geotagged tweets: tweets containing GPS coordinates from originating device
    ● Currently about 2% of tweets are geotagged, about 8 million per day
    ● The CGA has been harvesting geo-tweets since October 2012 using the Twitter API
    ● The Billion Object Platform (BOP) will provide a client and API to browse and search the latest 1 billion geotagged tweets (about 3 months range)
    ● Command line tools to extract older geotagged tweets from archives
  • 8. The BOP (Billion Object Platform)
    ● General purpose, open source platform to support exploration of large collections of spatio-temporal entities
    ● Built on top of a search engine
    ● Supports exploration, visualization, extraction via a RESTful API
    ● Queryable by time, space, text
    ● Responsive
    ● Spatial heatmap to represent the distribution of results (spatial faceting: results per cell in a grid)
    ● Supports temporal histograms (temporal faceting: results per date time range)
    ● Supports word clouds as a mechanism to enhance results browsing by topic
    ● Supports downloads of subsets for registered users (up to 10,000 features)
    ● Sentiment stamping
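The spatial and temporal faceting described in slide 8 amounts to counting results per grid cell and per time bucket. The following is a minimal pure-Python sketch of that idea, not the platform's implementation (the deck says the BOP delegates this to the search engine). The function names, the 10-degree cell size, the one-day bucket, and the sample records are all assumptions made for illustration.

```python
from collections import Counter
from datetime import datetime


def heatmap_counts(points, cell_deg=10.0):
    """Spatial faceting sketch: count (lon, lat) points per grid cell.

    Cells are keyed by (row, col), counted from the south-west corner
    of the world extent (-180, -90).
    """
    n_cols = int(360 / cell_deg)
    n_rows = int(180 / cell_deg)
    counts = Counter()
    for lon, lat in points:
        # Clamp so that lon == 180 or lat == 90 falls into the last cell.
        col = min(int((lon + 180.0) // cell_deg), n_cols - 1)
        row = min(int((lat + 90.0) // cell_deg), n_rows - 1)
        counts[(row, col)] += 1
    return counts


def daily_histogram(timestamps):
    """Temporal faceting sketch: count records per calendar day."""
    return Counter(ts.date().isoformat() for ts in timestamps)


# Hypothetical sample records: (longitude, latitude, timestamp).
records = [
    (-71.1, 42.4, datetime(2016, 5, 1, 9, 30)),
    (-71.0, 42.3, datetime(2016, 5, 1, 17, 5)),
    (2.35, 48.85, datetime(2016, 5, 2, 8, 0)),
]

heatmap = heatmap_counts([(lon, lat) for lon, lat, _ in records])
histogram = daily_histogram([ts for _, _, ts in records])
```

A client would render `heatmap` as a colored grid over the map and `histogram` as a bar chart under the time slider; only the counts travel over the wire, which is what keeps the interface responsive on large result sets.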
  • 9. Solution Stack
    ● Apache Lucene: an indexing and search library
    ● Apache Solr: a search web server platform built on top of Lucene
    ● Apache Kafka: a message broker written in Scala to provide a platform for handling real-time data streams
    ● Apache ZooKeeper: enables highly reliable distributed coordination
    ● Swagger: a framework for building APIs
    ● scikit-learn library: Machine Learning in Python
    ● OpenLayers: a JavaScript mapping client
    ● AngularJS: a JavaScript framework
  • 10. Search engine features
    ● Faceted searches (category, space and time)
    ● Stemming: ability to detect words derived from a common root
    ● Synonym detection and controlled vocabularies such as thesauri and taxonomies
    ● Weighted results
    ● Wildcard and fuzzy search: provide results for a given term and its common variations
    ● Boolean queries: search results using terms and boolean operators such as AND, OR, NOT…
    ● Hit highlighting: marks the matched terms in each returned result
    ● Stop words: words filtered out during the processing of text
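Three of the features in slide 10 (stop words, stemming, and boolean queries) can be illustrated with a toy inverted index. This is a sketch for intuition only: the stop-word list, the suffix-stripping rule, and the sample documents are assumptions, and a real engine such as Lucene uses proper analyzers and stemmers rather than the naive rule shown here.

```python
import re
from collections import defaultdict

# Illustrative stop-word list (an assumption, not Lucene's default list).
STOP_WORDS = {"a", "an", "and", "the", "is", "in", "of", "to"}


def naive_stem(token):
    """Toy stemmer: strip one common suffix if at least 3 letters remain."""
    for suffix in ("ing", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def analyze(text):
    """Lowercase, tokenize, drop stop words, then stem each token."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [naive_stem(t) for t in tokens if t not in STOP_WORDS]


def build_index(docs):
    """Inverted index: analyzed term -> set of document ids."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in analyze(text):
            index[term].add(doc_id)
    return index


def search_and(index, query):
    """Boolean AND: ids of documents containing every analyzed query term."""
    terms = analyze(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


# Hypothetical documents standing in for tweet text.
docs = {
    1: "Flooding in the streets of Boston",
    2: "The flood is receding",
    3: "Sunny streets in Boston",
}
index = build_index(docs)
```

Because queries pass through the same `analyze` step as documents, `search_and(index, "floods")` matches both "Flooding" and "flood", which is the practical effect of stemming at search time.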
  • 11. Client to enable data exploration and extraction
  • 12. API to streaming geotagged tweets
  • 13. Sentiment Analysis
    ● Sentiment analysis is a field of study that uses natural language processing tools to identify the opinions people express in a text
    ● Social media such as Twitter provide a constant source of textual data, much of it opinionated, which can be analyzed using sentiment analysis tools
    ● Using the scikit-learn library (Machine Learning in Python), we stamp each tweet with a positive or negative sentiment
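Slide 13 does not say which scikit-learn model the project uses. As one possible shape for positive/negative stamping, the sketch below assumes a bag-of-words `CountVectorizer` feeding a `MultinomialNB` classifier. The four training phrases are made up for illustration and are not the project's training data; `stamp_sentiment` is a hypothetical helper name.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Made-up training examples for illustration only (not the project's data).
train_texts = ["love this", "great happy day", "hate this", "terrible sad day"]
train_labels = ["positive", "positive", "negative", "negative"]

# Assumed model choice: word counts + multinomial naive Bayes.
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(train_texts, train_labels)


def stamp_sentiment(text):
    """Return the predicted label ('positive' or 'negative') for one text."""
    return model.predict([text])[0]
```

In a streaming setup the model would be trained once on a labeled corpus and then applied to each incoming tweet before indexing, so the sentiment label becomes just another searchable field.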
  • 14. HHypermap
    ● Similar approach to BOP (Solr/Lucene): provides a searchable registry of map service layers from OGC and Esri public endpoints