ElasticSearch on AWS
                      Real Estate portal Case Study (Spitogatos.gr)


                                                    AWSUG GR meetup #7
                                                      27 September 2012




                                                    Andreas Chatzakis
                                     co-founder / IT Director – Spitogatos.gr

                                                          @achatzakis on twitter
http://geekandpoke.typepad.com/geekandpoke/2010/09/instant-search.html
#about_us
Helping you find a property

Finding a property in Greece is complex and lacks transparency.
We make life easier for househunters via:
     Powerful search functionality
          Web & Mobile
          Location & Criteria
     Quality content
          Listings (we love photos)
          Articles
     mySpitogatos
          Email alerts
          Save your search
          Favorite listings & notes
          Contact the realtors


Realtors love us too!

Professionals need help in these turbulent times.
We add value in multiple ways:
     Cost effective promotion & high quality leads
          Highly targeted channel
          Leads already filtered ("we've seen the photos!")
     Technology services for realtors
          Turnkey web site solution
          Listing synchronization web service
     B2B via the Spitogatos Network (SpiN): a business
      network / collaboration tool for realtors
     Channel for foreign buyers via the English version




#background
To Search is to Find

Search is central to what we do
   Users searching for property come with a huge variety of structured criteria
        Athens Center, residential - flat or studio, for sale, 100-150k €, 85-120 sq meter,
         with a garage
        Athens Center & N.Kosmos, residential - flat, for sale, 75-100k €, 70-100 sq meter,
         2+ bedrooms, only show listings with photos
        Piraeus centre or Mikrolimano, commercial – store, for rent, 500-750 € per
         month, only listings with recently reduced price
        Monetize: # of Listings grouped by paying member + above criteria
        iPhone app → listings within a geo-rectangle + above criteria
        As a result, caching is rarely our friend!
   We used to think Lucene/Solr, ElasticSearch, CloudSearch etc. were only useful
    for text search and added no value for structured search
   Kept insisting on optimizing MySQL (multi-column indices etc.)
    while throwing replicas at the problem
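To show why this kind of search is structured rather than textual, the sketch below turns the second example above into an ElasticSearch request body. It is a minimal sketch under stated assumptions: all field names and area identifiers are hypothetical, and it uses the `bool` query's `filter` clause from ElasticSearch 2.x and later (releases of 2012 wrapped filters in a `filtered` query instead). Every combination of criteria produces a different request, which is why caching whole result pages rarely pays off.

```python
import json


def build_listing_query(areas, category, subtypes, transaction,
                        price_range, size_range,
                        min_bedrooms=None, with_photos_only=False):
    """Translate structured search criteria into an ElasticSearch query body.

    Field names (area_ids, category, subtype, ...) are hypothetical. Every
    criterion is expressed as a filter because none of them should affect
    relevance scoring.
    """
    low_price, high_price = price_range
    low_size, high_size = size_range
    filters = [
        {"terms": {"area_ids": areas}},           # match any of the selected areas
        {"term": {"category": category}},
        {"terms": {"subtype": subtypes}},         # e.g. flat or studio
        {"term": {"transaction": transaction}},   # sale or rent
        {"range": {"price": {"gte": low_price, "lte": high_price}}},
        {"range": {"size_sqm": {"gte": low_size, "lte": high_size}}},
    ]
    if min_bedrooms is not None:
        filters.append({"range": {"bedrooms": {"gte": min_bedrooms}}})
    if with_photos_only:
        filters.append({"term": {"has_photos": True}})
    return {"query": {"bool": {"filter": filters}}}


# Second example from the slide: Athens Center & N.Kosmos, flat, for sale,
# 75-100k EUR, 70-100 sq m, 2+ bedrooms, only listings with photos.
query = build_listing_query(
    areas=["athens-center", "n-kosmos"], category="residential",
    subtypes=["flat"], transaction="sale",
    price_range=(75_000, 100_000), size_range=(70, 100),
    min_bedrooms=2, with_photos_only=True,
)
print(json.dumps(query, indent=2))
```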
Why ElasticSearch

Selected ElasticSearch after a (very) brief review* of the alternatives:
   AWS's own Cloudsearch:
        Zero management service: nice!
        Not available on eu-west-1
        Currently lacks ElasticSearch features (e.g. geospatial search, non-English analyzers)
   Sphinx
        Easy MySQL integration
        How do you scale it?*
   Solr
        Industry standard
        Seems to be perceived as somewhat harder to scale/operate*?
   ElasticSearch:
        Piece of cake to setup on AWS (stay tuned!)
        Super distributed, scales & is easy on IT ops (more on that later!)
                                                      * Disclaimer: We did not go through a
                                                       detailed product selection process!
#elasticsearch
ElasticSearch basics

A distributed, RESTful Search engine built on top of Lucene
   Schema-free
        JSON documents
        Analyzers
        Boost levels
   Easy & flexible Search
        Lucene query string or JSON based search query DSL
        Facets & Highlighting
        Spatial search
        Custom scripts
   Multi Tenancy
        Store & search across multiple indices
        Each with its own settings
        Use-case: Logs – recent in memory, old on disk
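The logs use-case relies on searching across several indices at once. A minimal sketch, assuming one index per day named `logs-YYYY.MM.DD` (a hypothetical naming scheme): ElasticSearch accepts a comma-separated list of index names in the request path (e.g. `/logs-2012.09.27,logs-2012.09.26/_search`), so choosing what to search is plain string building, while each of those indices keeps its own settings.

```python
from datetime import date, timedelta


def daily_index(prefix: str, day: date) -> str:
    """Name of the per-day index (hypothetical naming scheme)."""
    return f"{prefix}-{day:%Y.%m.%d}"


def recent_indices(prefix: str, today: date, days: int) -> str:
    """Comma-separated names of the last `days` daily indices, newest first.

    The result can be used as the index part of a search URL, so a single
    request spans several indices.
    """
    return ",".join(daily_index(prefix, today - timedelta(days=i)) for i in range(days))


print(recent_indices("logs", date(2012, 9, 27), 3))
```

Searching only the recent indices keeps queries on the "hot" data, while older indices can be configured differently (e.g. kept on disk rather than in memory).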

Scaling ElasticSearch

Designed from the ground up to be Scalable & Highly Available
   Distributed
        Indices automatically broken into shards
        Replicas for read performance & availability
        Multiple cluster nodes, each hosting 1+ shards/replicas
        peer2peer, each node can delegate operations to other nodes
        Add/remove nodes at will
              Rebalancing & routing automagically behind the scenes
   Discovery
        Multicast or unicast (declarative)
   Gateway
        Allows recovery in case all nodes go down
        Local or shared storage
        Async replication in case of shared storage

A scale-up example

Assume a cluster with 4 shards and 1 replica configuration
   1 node example – Status Yellow
   2 nodes example – Status Green
   3 nodes example
   [Diagram legend: Primary shard, Replica shard, Master node, Regular node]

Master node maintains cluster state and reassigns shards when nodes join or leave the cluster.
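The Yellow/Green states above follow from one allocation rule: a replica is never placed on the same node as its primary. The simulation below is a simplified sketch (greedy least-loaded placement, not ElasticSearch's actual balancing algorithm) of the 1-, 2- and 3-node cases for 4 shards with 1 replica. Status is red when a primary is unassigned, yellow when only replicas are unassigned, and green otherwise.

```python
def allocate(num_shards: int, num_replicas: int, nodes: list[str]):
    """Greedy placement: each shard copy goes to the least-loaded node that does
    not already hold a copy of the same shard (simplified, not ES's balancer)."""
    placement = {node: [] for node in nodes}
    unassigned = []
    for shard in range(num_shards):
        for copy in range(1 + num_replicas):
            kind = "primary" if copy == 0 else "replica"
            candidates = [n for n in nodes
                          if all(s != shard for s, _ in placement[n])]
            if not candidates:
                unassigned.append((shard, kind))  # no node may host this copy
                continue
            target = min(candidates, key=lambda n: len(placement[n]))
            placement[target].append((shard, kind))
    return placement, unassigned


def cluster_status(unassigned) -> str:
    """red: a primary is missing; yellow: only replicas are missing; else green."""
    if any(kind == "primary" for _, kind in unassigned):
        return "red"
    return "yellow" if unassigned else "green"


for node_count in (1, 2, 3):
    nodes = [f"node{i}" for i in range(1, node_count + 1)]
    placement, unassigned = allocate(num_shards=4, num_replicas=1, nodes=nodes)
    print(node_count, cluster_status(unassigned),
          {node: len(copies) for node, copies in placement.items()})
```

With one node the four replicas have nowhere to go, so the cluster stays yellow; adding a second node lets every replica be placed, turning it green.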
ElasticSearch on AWS

2 modules make deployment on AWS a breeze
   EC2 discovery
        Filter by security group, AZ, tags
              Requires an IAM user with the following EC2 privileges:
               DescribeAvailabilityZones, DescribeInstances, DescribeRegions,
               DescribeSecurityGroups, DescribeTags
       Very useful in autoscaling setups with ephemeral servers
   S3 gateway
        Long-term, reliable, asynchronous persistence of cluster state and indices
        Allows deployment without EBS volumes
        Still, local gateway with EBS volumes performs better (less network used,
         faster recovery)
        Won't protect from accidental deletion of index (deletion will propagate to
         shared storage)
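The IAM privileges listed above can be captured in a policy document. A minimal sketch that builds a read-only policy containing exactly those EC2 actions; `"Resource": "*"` is an assumption based on these Describe calls not being scoped to individual resources. The S3 gateway needs its own S3 permissions on the gateway bucket, which this sketch does not cover.

```python
import json

# EC2 actions the discovery module needs, as listed on the slide.
EC2_DISCOVERY_ACTIONS = [
    "DescribeAvailabilityZones",
    "DescribeInstances",
    "DescribeRegions",
    "DescribeSecurityGroups",
    "DescribeTags",
]


def discovery_policy() -> dict:
    """Read-only IAM policy with exactly the EC2 calls used for node discovery."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [f"ec2:{action}" for action in EC2_DISCOVERY_ACTIONS],
                "Resource": "*",  # assumption: these Describe calls are not resource-scoped
            }
        ],
    }


print(json.dumps(discovery_policy(), indent=2))
```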


#implementation
Indexation

Indexation of Spitogatos.gr ads
   DB is still the “source of truth”
        We propagate DELETEs synchronously, INSERTs & UPDATEs asynchronously
              KISS: a cron job (re)indexes listings that were never indexed or were least recently indexed
              ORM marks new/modified listings as never-indexed (so they go first)
   Location: Multivalue field instead of nested set model in the DB
        e.g. this property is in Greece, Attica, Piraeus, Port of Piraeus
        Property will be included in results when I search for any of the above.
   Flat schema
        Searchable listing owner fields are included in the document (vs a JOIN in our DB)
        Changes to other tables might lead to large # of listings requiring reindexation
         (e.g. real estate agent becomes a paying member)
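Two of the ideas above, the "never-indexed first" cron batch and the multivalue location field, fit in a few lines. A minimal sketch with hypothetical names (`Listing`, `PARENT_AREA`, `next_batch`, `to_document`): the area hierarchy is a plain child-to-parent map standing in for the nested set model in the DB, and every level of the hierarchy is copied into the document's `location` field so that a single term filter on any level matches.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

# Hypothetical child -> parent map standing in for the nested set model in the DB.
PARENT_AREA = {
    "Port of Piraeus": "Piraeus",
    "Piraeus": "Attica",
    "Attica": "Greece",
}


@dataclass
class Listing:
    id: int
    area: str                              # most specific area of the property
    owner_is_paying: bool                  # owner field, denormalized into the document
    indexed_at: Optional[datetime] = None  # None means "never indexed" (set by the ORM)


def location_levels(area: str) -> list[str]:
    """The area itself plus all of its ancestors, most specific first."""
    levels = [area]
    while levels[-1] in PARENT_AREA:
        levels.append(PARENT_AREA[levels[-1]])
    return levels


def to_document(listing: Listing) -> dict:
    """Flat document: no joins needed at search time."""
    return {
        "id": listing.id,
        "location": location_levels(listing.area),  # multivalue field
        "owner_is_paying": listing.owner_is_paying,
    }


def next_batch(listings: list[Listing], batch_size: int) -> list[Listing]:
    """Never-indexed listings first, then the least recently indexed ones."""
    return sorted(
        listings,
        key=lambda item: (item.indexed_at is not None, item.indexed_at or datetime.min),
    )[:batch_size]


listings = [
    Listing(1, "Port of Piraeus", True, datetime(2012, 9, 20)),
    Listing(2, "Attica", False),  # new listing: never indexed
    Listing(3, "Piraeus", False, datetime(2012, 9, 1)),
]
for listing in next_batch(listings, 2):
    print(to_document(listing))
```

The trade-off noted on the slide shows up directly: if `owner_is_paying` changes, every listing of that owner must be re-queued, e.g. by resetting its `indexed_at`.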




Index Integrity

Making sure our index is consistent with the DB
   Scrutineer ( https://github.com/Aconex/scrutineer )
        Compares DB and ElasticSearch index for mismatches
             exists in ES but not on DB (or vice versa)
             ES version not up to date
        Relies on the “_version” field, which our ORM increments on every change
        When indexing we explicitly set the version type to “external”
        Had to “hack” it, as it doesn't work with the EC2 discovery module
           http://labs.spitogatos.gr/?p=45
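Scrutineer's comparison can be pictured as a merge of two streams of `(id, version)` pairs, one from the DB and one from the index, both sorted by id. The sketch below is a minimal illustration of that idea, not Scrutineer's code. It also models the `external` version type, under which ElasticSearch accepts a write only when the supplied version is higher than the stored one, so a stale asynchronous update cannot overwrite a newer document.

```python
def accepts_external_version(stored, incoming):
    """version_type=external: the write succeeds only if the incoming version is higher."""
    return stored is None or incoming > stored


def compare(db_rows, es_rows):
    """Yield (problem, id) for every mismatch between two id-sorted streams."""
    db_iter, es_iter = iter(db_rows), iter(es_rows)
    db, es = next(db_iter, None), next(es_iter, None)
    while db is not None or es is not None:
        if es is None or (db is not None and db[0] < es[0]):
            yield ("missing_in_es", db[0])
            db = next(db_iter, None)
        elif db is None or es[0] < db[0]:
            yield ("missing_in_db", es[0])
            es = next(es_iter, None)
        else:  # same id on both sides: compare versions
            if db[1] != es[1]:
                yield ("version_mismatch", db[0])
            db, es = next(db_iter, None), next(es_iter, None)


db_rows = [(1, 3), (2, 1), (4, 2)]  # (listing id, version) from the DB
es_rows = [(1, 3), (2, 0), (3, 5)]  # (document id, _version) from the index
print(list(compare(db_rows, es_rows)))
# [('version_mismatch', 2), ('missing_in_db', 3), ('missing_in_es', 4)]
```

Because both sides are consumed in id order, memory use stays constant regardless of how many listings are compared.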




Search – Shards & Routing

How does ElasticSearch decide in which shard to store a doc?
   By default this is done based on hash of document id
   Can be overridden both when indexing and when searching (routing parameter)
   We route documents by area id (the shard is chosen from the hash of the area id)
       - Most users search for listings within a specific area
       - We hit only a single shard for a large percentage of the searches.




           [Diagrams: No routing specified vs. Routing by specific areaId]
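The routing rule can be sketched as `shard = hash(routing) % number_of_primary_shards`, where the routing value defaults to the document id. In the sketch below `zlib.crc32` stands in for ElasticSearch's real hash function (current releases use Murmur3, and the exact formula has gained extra terms over time), and the listings and area ids are hypothetical. A search that specifies no routing value must fan out to every shard; one that passes the area id as its routing value touches a single shard.

```python
import zlib

NUM_PRIMARY_SHARDS = 4  # fixed when the index is created


def shard_for(routing_value: str, num_shards: int = NUM_PRIMARY_SHARDS) -> int:
    """shard = hash(routing value) % number of primary shards.

    zlib.crc32 is only a stand-in for ElasticSearch's own hash function.
    """
    return zlib.crc32(routing_value.encode("utf-8")) % num_shards


# Hypothetical listings: (document id, area id)
listings = [(f"listing-{i}", "athens-center" if i % 2 else "piraeus") for i in range(1, 21)]

# Default routing (by document id): one area's listings can land on many shards,
# so a search for that area has to query all of them.
by_doc_id = {shard_for(doc_id) for doc_id, area in listings if area == "athens-center"}
# Routing by area id: all listings of an area share one shard.
by_area = {shard_for(area) for doc_id, area in listings if area == "athens-center"}

print("shards holding athens-center listings, routed by id:  ", sorted(by_doc_id))
print("shards holding athens-center listings, routed by area:", sorted(by_area))
```

The flip side is that a very large area concentrates its listings on one shard, and searches spanning several areas still touch several shards.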
Search – Flat Schema, Facets & Scoring

We rely a lot on ElasticSearch's Flat Schema, Facets & Scoring
   No joins due to flat schema => fast!
   Multivalue fields => fast filtering for listings in areas of various hierarchy levels
   Facets functionality returns list of paying agents with # listings matching criteria
   Old slow ranking algorithm replaced by elasticSearch scoring functionality
        used to go through our DB and refresh score
             ad age is part of the equation
        Now ES computes this dynamically on every search
        We use custom scoring
        We can modify scoring algorithm and see changes instantly
             no need to recalculate scores for all listings
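Computing the score at search time is what makes the ranking instantly tunable: the ad's age is derived from a stored date on every query, so changing a parameter changes the ranking without rewriting any document. The formula below is purely hypothetical (an exponential age decay with a photo boost), not Spitogatos' actual algorithm; in ElasticSearch such logic would live in a script inside a custom score query (`custom_score` in releases of that era, `function_score` in later ones).

```python
def listing_score(relevance: float, age_days: float, has_photos: bool,
                  half_life_days: float = 30.0, photo_boost: float = 1.2) -> float:
    """Hypothetical scoring: relevance, halved every `half_life_days`, boosted for photos."""
    age_decay = 0.5 ** (age_days / half_life_days)
    return relevance * age_decay * (photo_boost if has_photos else 1.0)


def rank(listings, **params):
    """Order listing ids by score; params tweak the formula at query time."""
    return [
        listing_id
        for listing_id, *_ in sorted(
            listings,
            key=lambda item: listing_score(item[1], item[2], item[3], **params),
            reverse=True,
        )
    ]


# (id, text relevance, age in days, has photos): hypothetical data
listings = [("a", 1.0, 10, False), ("b", 1.0, 40, True)]
print(rank(listings))                      # short half-life favours the newer ad
print(rank(listings, half_life_days=365))  # long half-life lets the photos win
```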




Monitoring

Sematext SPM offers a (currently free) ES monitoring solution
   Cluster Health       Search rate & latency      Disk
   Index Stats          Cache                      Network
   Shard Stats          CPU & RAM                  JVM & GC
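Cluster health, the first metric in the table, is also available directly from the `_cluster/health` API. A minimal alerting sketch, assuming the response has already been fetched and parsed into a dict: `status`, `number_of_nodes` and `unassigned_shards` are fields of that response, while the expected node count and the alert messages are hypothetical.

```python
def health_alerts(health: dict, expected_nodes: int) -> list[str]:
    """Turn a parsed GET /_cluster/health response into alert messages."""
    alerts = []
    if health["status"] == "red":
        alerts.append("red: at least one primary shard is unassigned")
    elif health["status"] == "yellow":
        alerts.append("yellow: all primaries assigned, some replicas are not")
    if health["number_of_nodes"] < expected_nodes:
        alerts.append(f"only {health['number_of_nodes']} of {expected_nodes} nodes joined")
    if health.get("unassigned_shards", 0) > 0:
        alerts.append(f"{health['unassigned_shards']} unassigned shards")
    return alerts


# Hypothetical response: the 1-node case from the scale-up example.
sample = {"status": "yellow", "number_of_nodes": 1, "unassigned_shards": 4}
for alert in health_alerts(sample, expected_nodes=2):
    print(alert)
```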




Tooling

ElasticSearch-Head is a GUI for browsing/interacting with a cluster




Backups

 We take periodic copies from the Gateway
    Because the Gateway is no cure for accidental deletions or bugs
    s3cmd syncs S3 gateway contents to a local folder
         Expect some errors here as files get deleted/modified
    Disables snapshots to gateway
    Syncs again (no errors this time and much faster)
    Reenables snapshots to gateway
    Zips local folder contents, splits into smaller files & uploads to secondary S3 bucket




Get the script here: http://labs.spitogatos.gr/?p=17
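The ordering of the backup steps matters: snapshots must be re-enabled even if the second sync fails. A minimal sketch of that control flow, with each step injected as a callable (hypothetical names); the concrete commands, such as the s3cmd invocation and the ElasticSearch setting used to pause gateway snapshots, are deliberately left out because they depend on the version and setup, and the linked script remains the reference.

```python
from typing import Callable


def run_backup(sync: Callable[[], None],
               set_snapshots_enabled: Callable[[bool], None],
               archive_and_upload: Callable[[], None]) -> None:
    """Run the backup steps in order; every step is a caller-supplied callable."""
    try:
        sync()  # first pass: errors expected while gateway files change underneath
    except Exception as exc:
        print(f"first sync reported errors (expected): {exc}")
    set_snapshots_enabled(False)
    try:
        sync()  # second pass: gateway is quiet, so it should be clean and fast
    finally:
        set_snapshots_enabled(True)  # always re-enable, even if the second sync fails
    archive_and_upload()  # zip, split and upload to the secondary bucket


# Dry run that only prints what each step would do.
run_backup(
    sync=lambda: print("sync S3 gateway -> local folder"),
    set_snapshots_enabled=lambda on: print("gateway snapshots", "on" if on else "off"),
    archive_and_upload=lambda: print("zip, split, upload to secondary bucket"),
)
```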


Learnings

Issues & lessons learned:
   Faceted search can return wrong (smaller) counts when the index spans multiple shards
        Due to the way per-shard results are sorted and merged
        Increase the facet size depending on the cardinality of the faceted field
   We use Elastica – a PHP client for ElasticSearch - https://github.com/ruflin/Elastica
        Lacking Document Routing and Version Type support
        Our own Jerry Manolarakis submitted a pull request to add setRouting, setVersionType
   Filters vs queries (Query DSL)
        Filters can perform an order of magnitude better than plain queries, since no scoring is
         performed and they are cached automatically.
   Do it! Your DB will thank you




[Charts: CPU Utilization; Response time pattern]
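The faceting issue in the learnings above comes from each shard returning only its own top terms, which are then merged. The simulation below is a simplified model of that behaviour with hypothetical agents and counts: with a small facet size, a term that leads overall but never leads on a single shard disappears, and counts can come out smaller than the truth. Asking each shard for more terms fixes the missing leader in this example, though the undercount remains.

```python
from collections import Counter


def merged_facet(shards: list[Counter], size: int) -> list[tuple[str, int]]:
    """Each shard contributes only its own top `size` terms; the merge keeps the top `size`."""
    total = Counter()
    for shard_counts in shards:
        total.update(dict(shard_counts.most_common(size)))
    return total.most_common(size)


# Hypothetical listing counts per agent on two shards.
shard_a = Counter({"agent1": 10, "agent2": 9, "agent3": 8})
shard_b = Counter({"agent4": 10, "agent5": 9, "agent3": 8, "agent1": 1})
true_counts = shard_a + shard_b  # agent3: 16, agent1: 11, ...

print(merged_facet([shard_a, shard_b], size=2))  # agent3, the real leader, is missing
print(merged_facet([shard_a, shard_b], size=3))  # agent3 found; agent1 shows 10, not 11
```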
Read more
    Useful resources:

   https://speakerdeck.com/u/jmikola/p/symfony-live-london-elasticsearch
   http://blog.sematext.com/2010/05/03/elastic-search-distributed-lucene/
   http://www.slideshare.net/elasticsearch/elasticsearch-at-berlinbuzzwords-2010
   http://www.slideshare.net/kucrafal/scaling-massive-elastic-search-clusters-rafa-ku-sematext


    Need help integrating ElasticSearch to your app?




    http://bacterials.net/


                                                     Follow us on twitter: @spitogatosLabs
                                                 Check out our blog: http://labs.spitogatos.gr

#questions

20141021 AWS Cloud Taekwon - Startup Best Practices on AWS20141021 AWS Cloud Taekwon - Startup Best Practices on AWS
20141021 AWS Cloud Taekwon - Startup Best Practices on AWS
 
Elastic search apache_solr
Elastic search apache_solrElastic search apache_solr
Elastic search apache_solr
 
Zenko: Enabling Data Control in a Multi-cloud World
Zenko: Enabling Data Control in a Multi-cloud WorldZenko: Enabling Data Control in a Multi-cloud World
Zenko: Enabling Data Control in a Multi-cloud World
 
ElasticSearch: Distributed Multitenant NoSQL Datastore and Search Engine
ElasticSearch: Distributed Multitenant NoSQL Datastore and Search EngineElasticSearch: Distributed Multitenant NoSQL Datastore and Search Engine
ElasticSearch: Distributed Multitenant NoSQL Datastore and Search Engine
 
Building a Resilient, Scalable, Storage System with OpenStack
Building a Resilient, Scalable, Storage System with OpenStackBuilding a Resilient, Scalable, Storage System with OpenStack
Building a Resilient, Scalable, Storage System with OpenStack
 
OpenStack Architected Like AWS (and GCP)
OpenStack Architected Like AWS (and GCP)OpenStack Architected Like AWS (and GCP)
OpenStack Architected Like AWS (and GCP)
 

Recently uploaded

Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionDilum Bandara
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxBkGupta21
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersRaghuram Pandurangan
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxLoriGlavin3
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESmohitsingh558521
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxLoriGlavin3
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxLoriGlavin3
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxLoriGlavin3
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 

Recently uploaded (20)

Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An Introduction
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptx
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information Developers
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 

ElasticSearch on AWS - Real Estate portal case study (Spitogatos.gr)

  • 1. ElasticSearch on AWS
    Real Estate portal Case Study (Spitogatos.gr)
    AWSUG GR meetup #7, 27 September 2012
    Andreas Chatzakis, co-founder / IT Director, Spitogatos.gr (@achatzakis on Twitter)
  • 4. Helping you find a property
    Finding a property in Greece is complex and lacks transparency. We make life easier for househunters via:
    - Powerful search functionality
      - Web & mobile
      - Location & criteria
    - Quality content
      - Listings (we love photos)
      - Articles
    - mySpitogatos
      - Email alerts
      - Save your search
      - Favorite listings & notes
      - Contact the realtors
  • 5. Realtors love us too!
    Professionals need help in these turbulent times. We add value in multiple ways:
    - Cost-effective promotion & high-quality leads
      - (Very) targeted channel
      - Leads already filtered ("we've seen the photos!")
    - Technology services for realtors
      - Turnkey web site solution
      - Listing synchronization web service
    - B2B via the Spitogatos Network (SpiN), a business network / collaboration tool for realtors
    - Channel for foreign buyers via the English version
  • 7. To Search is to Find
    Search is central to what we do.
    - Users searching for property come with structured criteria of huge variety, e.g.:
      - Athens Center, residential - flat or studio, for sale, 100-150k €, 85-120 sq meters, with a garage
      - Athens Center & N.Kosmos, residential - flat, for sale, 75-100k €, 70-100 sq meters, 2+ bedrooms, only show listings with photos
      - Piraeus center or Mikrolimano, commercial - store, for rent, 500-750 € per month, only listings with a recently reduced price
    - Monetization: number of listings grouped by paying member, plus the above criteria
    - iPhone app: listings within a geo-rectangle, plus the above criteria
    - As a result, caching is rarely our friend!
    - We used to think Lucene/Solr, ElasticSearch, CloudSearch etc. were only useful for full-text search and added no value for structured search.
    - We kept insisting on optimizing MySQL (multi-column indices etc.) while throwing replicas at the problem.
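The following is a minimal sketch of one of the structured searches above, expressed as an ElasticSearch request body. It uses the filtered query and filter syntax of the 0.19/0.20 releases that were current when this talk was given. The field names (`area_ids`, `category`, `transaction`, `price`, `size_sqm`, `bedrooms`, `has_photos`) and the area ids are hypothetical and do not reflect Spitogatos.gr's actual mapping.

```python
import json


def build_search_body(criteria):
    """Turn structured househunter criteria into a filtered query (ES 0.19/0.20 syntax).

    All field names are hypothetical. Every criterion becomes a filter, so no
    relevance scoring is spent on the structured part of the search.
    """
    filters = [
        {"terms": {"area_ids": criteria["areas"]}},
        {"terms": {"category": criteria["categories"]}},
        {"term": {"transaction": criteria["transaction"]}},
    ]
    # (low, high) tuples become range filters; a None bound is left open.
    for field in ("price", "size_sqm", "bedrooms"):
        bounds = criteria.get(field)
        if bounds is None:
            continue
        low, high = bounds
        rng = {}
        if low is not None:
            rng["gte"] = low
        if high is not None:
            rng["lte"] = high
        filters.append({"range": {field: rng}})
    if criteria.get("with_photos"):
        filters.append({"term": {"has_photos": True}})
    return {
        "query": {
            "filtered": {
                "query": {"match_all": {}},
                "filter": {"bool": {"must": filters}},
            }
        }
    }


# "Athens Center & N.Kosmos, flat, for sale, 75-100k, 70-100 sq m, 2+ bedrooms, with photos"
example = {
    "areas": [101, 102],  # hypothetical ids for Athens Center and N.Kosmos
    "categories": ["flat"],
    "transaction": "sale",
    "price": (75000, 100000),
    "size_sqm": (70, 100),
    "bedrooms": (2, None),  # "2+" has no upper bound
    "with_photos": True,
}
print(json.dumps(build_search_body(example), indent=2))
```

Since each user brings a different combination of criteria, every request body like this is unique. This is why caching whole result pages rarely helps.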
  • 8. Why ElasticSearch
    We selected ElasticSearch after (very) brief research* on alternatives:
    - AWS's own CloudSearch
      - Zero-management service: nice!
      - Not available in eu-west-1
      - Currently lacks ES functionality (e.g. geospatial search, non-English analyzers)
    - Sphinx
      - Easy MySQL integration
      - How do you scale it?*
    - Solr
      - Industry standard
      - Seems to be perceived as somewhat harder to scale/operate*?
    - ElasticSearch
      - Piece of cake to set up on AWS (stay tuned!)
      - Super distributed, scales & is easy on IT ops (more on that later!)
    * Disclaimer: We did not go through a detailed product selection process!
  • 10. ElasticSearch basics
    A distributed, RESTful search engine built on top of Lucene.
    - Schema-free
      - JSON documents
      - Analyzers
      - Boost levels
    - Easy & flexible search
      - Lucene query string or JSON-based search Query DSL
      - Facets & highlighting
      - Spatial search
      - Custom scripts
    - Multi-tenancy
      - Store & search across multiple indices
      - Each with its own settings
      - Use case: logs, with recent ones in memory and old ones on disk
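This small sketch illustrates the multi-tenancy point using the logs use case. It assumes a hypothetical one-index-per-day naming scheme. A single search can span several indices because ElasticSearch accepts a comma-separated list of index names in the URL path.

```python
from datetime import date, timedelta


def daily_index_names(prefix, end, days):
    """Names of one-index-per-day log indices, newest first (naming scheme is hypothetical)."""
    return [f"{prefix}-{(end - timedelta(days=i)).strftime('%Y.%m.%d')}" for i in range(days)]


def multi_index_search_path(indices):
    """Search path covering several indices at once: /idx1,idx2,.../_search."""
    return "/" + ",".join(indices) + "/_search"


recent = daily_index_names("logs", date(2012, 9, 27), 3)
print(multi_index_search_path(recent))
# /logs-2012.09.27,logs-2012.09.26,logs-2012.09.25/_search
```

Each daily index can carry its own settings. As a result, the few recent indices and the many old ones can be tuned independently while still being searched together.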
  • 11. Scaling ElasticSearch
    Designed from the ground up to be scalable & highly available.
    - Distributed
      - Indices are automatically broken into shards
      - Replicas for read performance & availability
      - Multiple cluster nodes, each hosting 1+ shards/replicas
      - Peer-to-peer: each node can delegate operations to other nodes
      - Add and remove nodes at will
      - Rebalancing & routing happen automagically behind the scenes
    - Discovery
      - Multicast or unicast (declarative)
    - Gateway
      - Allows recovery in case all nodes go down
      - Local or shared storage
      - Async replication in case of shared storage
  • 12. A scale-up example
    Assume a cluster configured with 4 shards and 1 replica.
    - 1 node: status yellow
    - 2 nodes: status green
    - 3 nodes
    [Figure legend: primary shard, replica shard, master node, regular node]
    The master node maintains the cluster state and reassigns shards when nodes join or leave the cluster.
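This toy model of the scale-up example above is not ElasticSearch's actual allocator, which also weighs balance and other constraints. It places 4 primary shards with 1 replica each on N nodes, with the rule that a node never holds two copies of the same shard. It reproduces the slide's statuses. With a single node, the replicas have nowhere to go (yellow). With two or more nodes, every copy is assigned (green). "Red", meaning some primary is unassigned, is included for completeness.

```python
def allocate(num_shards, num_replicas, num_nodes):
    """Toy allocator: least-loaded eligible node wins; no node holds two copies of one shard.

    Returns ({node: [copy labels]}, missing_primaries, missing_replicas).
    """
    nodes = {n: [] for n in range(num_nodes)}
    missing_primaries = missing_replicas = 0
    for shard in range(num_shards):
        holders = set()  # nodes already holding a copy of this shard
        for copy in range(num_replicas + 1):
            is_primary = copy == 0
            candidates = [n for n in nodes if n not in holders]
            if not candidates:
                if is_primary:
                    missing_primaries += 1
                else:
                    missing_replicas += 1
                continue
            # Least-loaded eligible node; ties go to the lowest node id.
            target = min(candidates, key=lambda n: len(nodes[n]))
            nodes[target].append(("P" if is_primary else "R") + str(shard))
            holders.add(target)
    return nodes, missing_primaries, missing_replicas


def cluster_status(num_shards, num_replicas, num_nodes):
    _, missing_p, missing_r = allocate(num_shards, num_replicas, num_nodes)
    if missing_p:
        return "red"
    return "yellow" if missing_r else "green"


for n in (1, 2, 3):
    layout, _, _ = allocate(4, 1, n)
    print(n, cluster_status(4, 1, n), layout)
```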
  • 13. ElasticSearch on AWS
    Two modules make deployment on AWS a breeze:
    - EC2 discovery
      - Filter by security group, availability zone, tags
      - Requires an IAM user with certain EC2 privileges: DescribeAvailabilityZones, DescribeInstances, DescribeRegions, DescribeSecurityGroups, DescribeTags
      - Very useful in autoscaling setups with ephemeral servers
    - S3 gateway
      - Long-term, reliable, async persistence of cluster state and indices
      - Allows deployment without EBS volumes
      - Still, a local gateway with EBS volumes performs better (less network traffic, faster recovery)
      - Won't protect against accidental deletion of an index (the deletion propagates to the shared storage)
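This sketch builds the IAM policy for the EC2 discovery user from the five privileges listed on the slide, using the standard IAM policy document format. The slide does not detail the S3 gateway's own bucket permissions, so they are not included here.

```python
import json

# The five EC2 privileges from the slide, in IAM "service:Action" form.
EC2_DISCOVERY_ACTIONS = [
    "ec2:DescribeAvailabilityZones",
    "ec2:DescribeInstances",
    "ec2:DescribeRegions",
    "ec2:DescribeSecurityGroups",
    "ec2:DescribeTags",
]


def discovery_policy(actions=EC2_DISCOVERY_ACTIONS):
    """IAM policy document granting only the read-only EC2 calls that discovery needs."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": list(actions),
                # EC2 Describe* actions don't support resource-level permissions.
                "Resource": "*",
            }
        ],
    }


policy_json = json.dumps(discovery_policy(), indent=2)
print(policy_json)
```

The policy grants only Describe* calls. This means the credentials stored on every (possibly ephemeral) node can list instances but cannot modify anything.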
  • 15. Indexation
    Indexation of Spitogatos.gr ads:
    - The DB is still the “source of truth”
    - We propagate DELETEs synchronously, INSERTs & UPDATEs asynchronously
    - KISS: a cron job (re)indexes never-indexed or least-recently indexed listings
      - The ORM marks new/modified listings as never-indexed (so they go first)
    - Location: a multivalue field instead of the nested set model in the DB
      - e.g. this property is in Greece, Attica, Piraeus, Port of Piraeus
      - The property will be included in the results when I search for any of the above
    - Flat schema
      - Searchable listing-owner fields are included in the document (vs. a JOIN in our DB)
      - Changes to other tables might lead to a large number of listings requiring reindexation (e.g. a real estate agent becomes a paying member)
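This is a minimal sketch of two ideas from this slide. The first is the cron job's queue order: never-indexed listings go first, then the least recently indexed ones. The second is the flat document, where the whole location path is stored in one multivalue field and the owner's fields are copied in. The `Listing` class, its fields, and the document field names are hypothetical stand-ins for the real DB rows and mapping.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class Listing:
    # Hypothetical in-memory stand-in for a row of the listings table.
    id: int
    area_path: List[str]  # root first, e.g. derived from the DB's nested set
    price: int
    agent_name: str
    agent_is_paying: bool
    last_indexed: Optional[datetime] = None  # None = never indexed (what the ORM sets on change)


def next_batch(listings, size):
    """Next listings for the cron job: never-indexed first, then least recently indexed."""
    return sorted(
        listings,
        key=lambda item: (item.last_indexed is not None, item.last_indexed or datetime.min),
    )[:size]


def to_document(listing):
    """Flat document: whole location path in one multivalue field, owner fields copied in."""
    return {
        "price": listing.price,
        "location": list(listing.area_path),
        "agent_name": listing.agent_name,
        "agent_is_paying": listing.agent_is_paying,
    }


listings = [
    Listing(1, ["Greece", "Attica", "Athens", "Athens Center"], 120000, "Agent A", True,
            last_indexed=datetime(2012, 9, 26, 10, 0)),
    Listing(2, ["Greece", "Attica", "Piraeus", "Port of Piraeus"], 95000, "Agent B", False),
    Listing(3, ["Greece", "Attica", "Piraeus", "Mikrolimano"], 600, "Agent A", True,
            last_indexed=datetime(2012, 9, 20, 8, 30)),
]
batch = next_batch(listings, 2)
print([item.id for item in batch])  # [2, 3]
print(to_document(listings[1]))
```

Listing 2 matches a search for "Greece", "Attica", "Piraeus" or "Port of Piraeus", because a term filter on a multivalue field matches any of its values. The trade-off is the one noted above: when an agent becomes a paying member, every document copied from that agent's row must be reindexed.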
  • 16. Index Integrity
    Making sure our index is consistent with the DB:
    - Scrutineer ( https://github.com/Aconex/scrutineer )
      - Compares the DB and the ElasticSearch index for mismatches
        - exists in ES but not in the DB (or vice versa)
        - ES version not up to date
      - Relies on the “_version” field, which our ORM increments onChange
      - When indexing, we explicitly set versioning to “external”
      - Had to “hack” it, as it doesn't work with the EC2 discovery module: http://labs.spitogatos.gr/?p=45
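This is not Scrutineer's code, just a minimal sketch of the same kind of check. It assumes both the DB and the index can be streamed as (id, version) pairs sorted by id. Walking the two sorted streams in step finds all three kinds of mismatch in a single pass, without loading either side into memory.

```python
def compare(db_rows, index_rows):
    """Both inputs: iterables of (id, version) sorted by id. Yields (kind, id) per mismatch."""
    db_iter, ix_iter = iter(db_rows), iter(index_rows)
    db, ix = next(db_iter, None), next(ix_iter, None)
    while db is not None or ix is not None:
        if ix is None or (db is not None and db[0] < ix[0]):
            # Id present in the DB but absent from the index.
            yield ("missing_in_index", db[0])
            db = next(db_iter, None)
        elif db is None or ix[0] < db[0]:
            # Id present in the index but deleted from the DB.
            yield ("missing_in_db", ix[0])
            ix = next(ix_iter, None)
        else:
            # Same id on both sides: the index copy must carry the DB's version.
            if db[1] != ix[1]:
                yield ("version_mismatch", db[0])
            db, ix = next(db_iter, None), next(ix_iter, None)


db_rows = [(1, 3), (2, 1), (4, 2)]
index_rows = [(1, 3), (2, 0), (3, 1)]
print(list(compare(db_rows, index_rows)))
# [('version_mismatch', 2), ('missing_in_db', 3), ('missing_in_index', 4)]
```

For the version comparison to be meaningful, the index must store the DB's own counter. Indexing with the `version=<n>&version_type=external` URL parameters does this: ElasticSearch keeps the supplied number instead of its internal one and rejects writes that do not carry a higher version.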
  • 17. Search – Shards & Routing
    How does ElasticSearch decide in which shard to store a document?
    - By default, based on a hash of the document id
    - This can be overridden both while indexing and while searching (routing parameter)
    - We shard based on a hash of the area id
      - Most users search for listings within a specific area
      - We hit only a single shard for a large percentage of searches
    [Figure: no routing vs. routing by a specified areaId]
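ElasticSearch picks a shard as hash(routing) % number_of_primary_shards, where the routing value defaults to the document id. In the sketch below, CRC32 stands in for the hash; ElasticSearch uses its own hash function, not CRC32. The 4-shard index and the listing and area ids are hypothetical. The sketch shows why routing by area id lets an area-restricted search touch one shard instead of all of them.

```python
import zlib

NUM_SHARDS = 4  # hypothetical index with 4 primary shards


def shard_for(routing_value, num_shards=NUM_SHARDS):
    # Stand-in hash; ElasticSearch's real hash function differs.
    return zlib.crc32(str(routing_value).encode("utf-8")) % num_shards


def shards_to_search(num_shards=NUM_SHARDS, routing=None):
    """Without routing a search fans out to every shard; with routing, to a single one."""
    if routing is None:
        return set(range(num_shards))
    return {shard_for(routing, num_shards)}


listings = [(1, 42), (2, 42), (3, 7), (4, 42)]  # (listing_id, area_id)
by_id = {lid: shard_for(lid) for lid, _ in listings}        # default: routed by doc id
by_area = {lid: shard_for(area) for lid, area in listings}  # custom: routed by area id
print(by_id, by_area)
print(shards_to_search(), shards_to_search(routing=42))
```

In the REST API, the same value is passed as the `routing` URL parameter, both when indexing a listing and when searching. One possible trade-off of this design is that a very popular area puts all of its listings on one shard, so shard sizes can become uneven.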
  • 18. Search – Flat Schema, Facets & Scoring
    We rely heavily on ElasticSearch's flat schema, facets & scoring:
    - No joins thanks to the flat schema => fast!
    - Multivalue fields => fast filtering for listings in areas at various hierarchy levels
    - The facets functionality returns the list of paying agents with the number of listings matching the criteria
    - Our old, slow ranking algorithm was replaced by ElasticSearch's scoring functionality
      - It used to go through our DB and refresh scores
      - Ad age is part of the equation
      - Now ES computes this dynamically on every search
    - We use custom scoring
      - We can modify the scoring algorithm and see the changes instantly
      - No need to recalculate scores for all listings
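The slides do not give the actual ranking formula. The sketch below uses a hypothetical age decay, in which a listing's relevance halves every `half_life_days`, and applies it in plain Python rather than inside an ElasticSearch custom-score script. Its purpose is to show why computing the score at query time pays off: changing one parameter re-ranks the results immediately, with no stored scores to recalculate.

```python
import math


def age_decayed_score(text_score, age_days, half_life_days=30.0):
    """Hypothetical ranking: relevance halves every half_life_days of ad age."""
    return text_score * math.pow(0.5, age_days / half_life_days)


def rank(listings, half_life_days=30.0):
    """Order listings by decayed score, best first (computed per search, nothing stored)."""
    return sorted(
        listings,
        key=lambda l: age_decayed_score(l["score"], l["age_days"], half_life_days),
        reverse=True,
    )


listings = [
    {"id": "a", "score": 1.0, "age_days": 0},
    {"id": "b", "score": 1.5, "age_days": 60},
    {"id": "c", "score": 1.2, "age_days": 10},
]
print([l["id"] for l in rank(listings)])                      # ['a', 'c', 'b']
print([l["id"] for l in rank(listings, half_life_days=365)])  # ['b', 'c', 'a']
```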
  • 19. Monitoring
    Sematext SPM offers a (currently free) ES monitoring solution covering:
    - Cluster health
    - Search rate & latency
    - Disk
    - Index stats
    - Cache
    - Network
    - Shard stats
    - CPU & RAM
    - JVM & GC
  • 20. Tooling
    ElasticSearch-Head is a GUI for browsing / interacting with a cluster.
  • 21. Backups
    We take periodic copies from the gateway, because the gateway is no cure for accidental deletions or bugs:
    - s3cmd syncs the S3 gateway contents to a local folder
      - Expect some errors here, as files get deleted/modified during the copy
    - Disable snapshots to the gateway
    - Sync again (no errors this time, and much faster)
    - Re-enable snapshots to the gateway
    - Zip the local folder contents, split them into smaller files & upload them to a secondary S3 bucket
    Get the script here: http://labs.spitogatos.gr/?p=17
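This Python sketch covers only the last step of the backup: zipping the local copy of the gateway and splitting it into smaller parts for upload. It is not the linked script. The s3cmd sync, the snapshot toggling, and the upload itself are left out. The directory layout and the 1 KiB part size are illustrative; a real backup would use parts of many megabytes.

```python
import io
import os
import tempfile
import zipfile


def zip_directory(path):
    """Zip everything under path into an in-memory archive (names stored relative to path)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        for root, _dirs, files in os.walk(path):
            for name in files:
                full = os.path.join(root, name)
                zf.write(full, os.path.relpath(full, path))
    return buf.getvalue()


def split_bytes(data, part_size):
    """Split into part_size chunks (the last may be shorter), ready to upload one by one."""
    if part_size <= 0:
        raise ValueError("part_size must be positive")
    return [data[i:i + part_size] for i in range(0, len(data), part_size)]


# Demo on a throwaway folder standing in for the locally synced gateway contents.
with tempfile.TemporaryDirectory() as snapshot_dir:
    os.makedirs(os.path.join(snapshot_dir, "listings", "0"))
    with open(os.path.join(snapshot_dir, "listings", "0", "segments"), "wb") as fh:
        fh.write(os.urandom(5000))
    archive = zip_directory(snapshot_dir)
    parts = split_bytes(archive, 1024)
    # Restoring is the reverse: concatenate the parts in order and unzip.
    restored = zipfile.ZipFile(io.BytesIO(b"".join(parts)))
    print(len(parts), restored.namelist())
```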
  • 22. Learnings
    Issues & lessons learned:
    - Faceted search can return wrong (smaller) counts on multiple shards
      - Due to the way sorting/merging is done
      - Increase the facet size field depending on the cardinality of the faceted field
    - We use Elastica, a PHP client for ElasticSearch: https://github.com/ruflin/Elastica
      - Lacking document routing and version type support
      - Our own Jerry Manolarakis is on a pull request to add setRouting and setVersionType
    - Filters vs. queries (Query DSL)
      - Filters perform an order of magnitude better than plain queries, since no scoring is performed and they are automatically cached
      - Do it! Your DB will thank you
    [Figures: CPU utilization; response time pattern]
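This toy model shows the facet issue above. Each shard returns only its own top `size` terms, and the coordinating node sums those partial lists. A term that falls below the cut on one shard loses that shard's count entirely. The agent ids, the two-shard split, and the `FACET_REQUEST` field name are hypothetical. `FACET_REQUEST` only illustrates the 0.x-era terms facet syntax and is not used by the simulation.

```python
from collections import Counter

# Illustrative 0.x-era terms facet request on a hypothetical agent_id field.
FACET_REQUEST = {"facets": {"paying_agents": {"terms": {"field": "agent_id", "size": 50}}}}


def distributed_terms_facet(shards, size):
    """Toy model: each shard returns its top `size` terms; the partial counts are
    summed and the overall top `size` is kept."""
    merged = Counter()
    for shard_docs in shards:
        merged.update(dict(Counter(shard_docs).most_common(size)))
    return merged.most_common(size)


# Agent ids of the listings matching a search, as stored on two shards.
shards = [
    ["a"] * 6 + ["b"] * 5 + ["c"] * 1,
    ["b"] * 7 + ["c"] * 5 + ["a"] * 1,
]
exact = Counter(agent for shard in shards for agent in shard).most_common()
print(exact)                               # [('b', 12), ('a', 7), ('c', 6)]
print(distributed_terms_facet(shards, 1))  # [('b', 7)]
print(distributed_terms_facet(shards, 2))  # [('b', 12), ('a', 6)]
print(distributed_terms_facet(shards, 3))  # [('b', 12), ('a', 7), ('c', 6)]
```

Once `size` reaches the cardinality of the faceted field (3 agents here), every shard reports every term and the counts are exact. This is why the slide suggests sizing the facet to the field's cardinality.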
  • 23. Read more
    Useful resources:
    - https://speakerdeck.com/u/jmikola/p/symfony-live-london-elasticsearch
    - http://blog.sematext.com/2010/05/03/elastic-search-distributed-lucene/
    - http://www.slideshare.net/elasticsearch/elasticsearch-at-berlinbuzzwords-2010
    - http://www.slideshare.net/kucrafal/scaling-massive-elastic-search-clusters-rafa-ku-sematext
    Need help integrating ElasticSearch into your app? http://bacterials.net/
    Follow us on Twitter: @spitogatosLabs
    Check out our blog: http://labs.spitogatos.gr