Big Data
Elasticsearch Practical
Content
▪ Setup
▪ Introduction
▪ Basics
▪ Search in Depth
▪ Human Language
▪ Aggregations
Setup
1. Go to https://github.com/tomvdbulck/elasticsearchworkshop
2. Make sure the following items have been installed on your machine:
o Java 7 or higher
o Git (if you like a pretty interface to deal with git, try SourceTree)
o Maven
3. Install VirtualBox https://www.virtualbox.org/wiki/Downloads
4. Install Vagrant https://www.vagrantup.com/downloads.html
5. Clone the repository into your workspace
6. Open a command prompt, go to the elasticsearchworkshop folder and run
Introduction
▪ Distributed RESTful search and analytics
▪ Distributed
- Built to scale horizontally
- Based on Apache Lucene
- High Availability (automatic failover and data replication)
▪ RESTful
- RESTful API using JSON over HTTP
▪ Full text search
▪ Document Oriented and Schema free
Introduction
Elasticsearch => Relational DB
Index => Database
Type => Table
Document => Row
Field => Column
Mapping => Schema
Shard => Partition
Introduction
Index
Like a database in a relational database system
It has a mapping which defines multiple types
Logical namespace which maps to 1 or more primary shards
Type
Like a table; has a list of fields that documents of that type can contain
Document
JSON document
Like a row
Is stored in an index, has a type and an id.
Introduction
Field
A document contains a list of fields, key/value pairs
Each field has a field ‘type’ which indicates type of data
Mapping
Is like a schema definition
Each index has a mapping which defines each type within the index
Can be defined explicitly or generated automatically when a document is indexed.
Introduction: Cluster, Nodes
Cluster
Consists of one or more nodes sharing the same cluster name.
Each cluster has 1 master node which is elected automatically
Node
A running instance of Elasticsearch
At startup, a node automatically searches for a cluster with the same cluster name
Introduction: Shards
▪ Shard
Single Lucene instance
Low-level worker unit
Elasticsearch distributes shards among nodes automatically
▪ Primary Shard
Each document is stored in a single primary shard
It is first indexed on the primary shard (by default 5 primary shards per index)
Then on all replicas of that primary shard (by default 1 replica per shard)
▪ Replica Shard
Each primary can have 0 or more replicas
Has 2 functions
- high availability (failover) - can be promoted to primary
- increase performance - can handle get and search requests
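How a document ends up in exactly one primary shard can be sketched roughly as follows. The shard is derived from a hash of the routing value (by default the document id) modulo the number of primary shards. The class name `ShardRoutingSketch` is hypothetical, and `String.hashCode()` is only a stand-in for the hash function Elasticsearch uses internally. The sketch assumes Java 8 or later.

```java
// Illustrative sketch only: String.hashCode() stands in for the real hash function.
class ShardRoutingSketch {

    // Pick a primary shard for a routing value (by default the document id).
    static int shardFor(String routing, int numberOfPrimaryShards) {
        // floorMod keeps the result in [0, numberOfPrimaryShards) even for negative hashes.
        return Math.floorMod(routing.hashCode(), numberOfPrimaryShards);
    }

    public static void main(String[] args) {
        int primaries = 5; // the default mentioned on the slide
        for (String id : new String[] {"1", "2", "42", "abc"}) {
            System.out.println("document " + id + " -> primary shard " + shardFor(id, primaries));
        }
    }
}
```

Because the result depends on the number of primary shards, changing that number would send existing ids to different shards, which is one way to see why it is fixed when the index is created.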
Introduction: Filter vs Query
Although we refer to “the query DSL”, there are actually 2 DSLs: the filter DSL and the query DSL
▪ Filter DSL
A filter asks a yes/no question of every document and is used for fields that contain exact values
Is the created date in the range 2013 - 2014?
Does the status field contain the term published?
Is the lat_lon field within 10km of a specified point?
▪ Query DSL
Similar to a filter but also asks the question, “how well does this document match?”
Best matching the words “full text search”
Containing the word run, but maybe also matching runs, running, jog, or sprint
Containing the words quick, brown, and fox: the closer together they are, the more relevant the document
Introduction: Filter vs Query
Differences
▪ A filter is quicker, as a query must also calculate the relevance score
▪ The goal of a filter is to reduce the number of documents which need to be examined by a query
▪ When to use: a query for full text search or any time you need a relevance score.
Filters for everything else.
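The difference between the two can be illustrated with a small in-memory sketch. It is not the Elasticsearch API: `Doc`, `statusIs`, `score` and `search` are hypothetical names, and the score is a toy word count rather than a real relevance score. It only shows the shape of the idea, a yes/no filter that shrinks the candidate set followed by a scoring step. It assumes Java 8 or later.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;
import java.util.function.Predicate;

class FilterVsQuerySketch {

    // Hypothetical document with one exact-value field and one full-text field.
    static final class Doc {
        final String status;
        final String body;
        Doc(String status, String body) { this.status = status; this.body = body; }
    }

    // Filter: a yes/no question on an exact value, no score involved.
    static Predicate<Doc> statusIs(String term) {
        return doc -> doc.status.equals(term);
    }

    // Query: "how well does this document match?" Here simply the number of
    // occurrences of the word, a toy stand-in for a real relevance score.
    static int score(Doc doc, String word) {
        int hits = 0;
        for (String token : doc.body.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (token.equals(word)) hits++;
        }
        return hits;
    }

    // Filter first to shrink the candidate set, then score and sort what is left.
    static List<Doc> search(List<Doc> docs, Predicate<Doc> filter, String word) {
        List<Doc> result = new ArrayList<>();
        for (Doc doc : docs) {
            if (filter.test(doc) && score(doc, word) > 0) result.add(doc);
        }
        result.sort(Comparator.comparingInt((Doc d) -> score(d, word)).reversed());
        return result;
    }

    public static void main(String[] args) {
        List<Doc> docs = Arrays.asList(
            new Doc("published", "The quick brown fox"),
            new Doc("draft", "A fox is a fox"),
            new Doc("published", "Fox hunting: the fox and the other fox"));
        // The draft is filtered out; the remaining documents are ordered by score.
        for (Doc d : search(docs, statusIs("published"), "fox")) {
            System.out.println(score(d, "fox") + "  " + d.body);
        }
    }
}
```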
Basics
▪ Connecting to Elasticsearch
▪ Inserting data
▪ Searching data
▪ Updating data
▪ Deleting Data
▪ Parent - Child
Basics: Connecting to Elasticsearch
▪ Node Client and Transport Client
- Node Client: acts as a node which joins the cluster (same as the data nodes); all nodes are aware of each other
▪ Better query performance
▪ Bigger memory footprint and slower startup
▪ Less secure (application tied to the cluster)
- Transport Client: connects to the cluster remotely each time instead of joining it
▪ No Lucene dependencies in your project (unless you use Spring Boot ;-)
▪ Starts up faster
▪ Application decoupled from the cluster
▪ Less efficient at accessing the index and executing queries
Basics: Connecting to Elasticsearch
▪ Node Client (if we used this one, we would all form 1 big cluster)
▪ Transport Client (we use this one in the exercises)
Basics: Inserting Data
Basics: Searching Data
▪ Get API
- Retrieve document based on its id
▪ Search API
- Returns a single page of results
Basics: Updating Data
Basics: Deleting Data
▪ Delete a document
▪ Delete an index
- To perform operations on an index, use the admin client => client.admin()
Basics: Exercises
▪ Time for Exercises
- Begin with exercises in package: be.ordina.wes.exercises.basics
▪ Some hints
- Go to http://localhost:9200/_plugin/marvel
- Choose “sense” in the upper right corner under “Dashboards”
▪ Sense:
- You can see how an index has been created
- You can analyze what the index will do with your search query
Search in Depth
▪ Filters
- very important as they are very fast
▪ do not calculate relevance
▪ are easily cached
▪ Multi-Field Search
Search in Depth: Filters
▪ Range Filter
There is also a range query, but note that a query is slower than a filter
Search in Depth: Filters
▪ Term Filter
- Filters on a term (not analyzed)
▪ so you must pass the exact term as it exists in the index
▪ no automatic conversion between lowercase and uppercase
▪ the result is automatically cached
- Some filters are cached automatically; where that is the case, the caching can be overridden
Search in Depth: Multi-Field Search
▪ Fields can be boosted
- in the workshop example the subject field is boosted by a factor of 3
Search in Depth: Exercises
▪ Time for Exercises
- Begin with exercises in package:
be.ordina.wes.exercises.advanced_search
Human Language
▪ Use default Analyzers
▪ Inserting stop words
▪ Synonyms
▪ Normalizing
Human Language: Default Analyzers
▪ Elasticsearch ships with a collection of analyzers for the most common languages
▪ They have 4 functions
- Tokenize text into individual words
The quick brown foxes → [The, quick, brown, foxes]
- Lowercase tokens
The → the
- Remove common stopwords
[The, quick, brown, foxes] → [quick, brown, foxes]
- Stem tokens to their root form
foxes → fox
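The four steps can be mimicked in plain Java to make the pipeline concrete. This is a sketch, not how the built-in analyzers are implemented: `AnalyzerSketch` is a hypothetical class, the stopword set is only a small subset of the English list, and the stemmer is deliberately naive (it just strips a plural suffix).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

class AnalyzerSketch {

    // A small subset of the default English stopwords.
    static final Set<String> STOPWORDS =
        new HashSet<>(Arrays.asList("a", "an", "and", "the", "of", "to"));

    // Deliberately naive stemmer: strips a plural "es" or "s" suffix.
    // Real analyzers use proper stemming algorithms.
    static String stem(String token) {
        if (token.length() > 3 && token.endsWith("es")) return token.substring(0, token.length() - 2);
        if (token.length() > 3 && token.endsWith("s")) return token.substring(0, token.length() - 1);
        return token;
    }

    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String raw : text.split("\\W+")) {           // 1. tokenize
            if (raw.isEmpty()) continue;
            String token = raw.toLowerCase(Locale.ROOT);  // 2. lowercase
            if (STOPWORDS.contains(token)) continue;      // 3. remove stopwords
            tokens.add(stem(token));                      // 4. stem
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(analyze("The quick brown foxes")); // prints [quick, brown, fox]
    }
}
```

The order of the steps matters: the stopword check runs on the lowercased token, otherwise "The" would not match "the".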
Human Language: Default Analyzers
▪ Can also apply transformations specific to a language to make words
more searchable
▪ The english analyzer removes the possessive 's
John's → john
▪ The french analyzer removes elisions and diacritics
l'église → eglis
▪ The german analyzer normalizes terms
äußerst → ausserst
Human Language: Default Analyzers
Human Language: Inserting Stop Words
▪ Words which are common to a language but add little to no value for
a search
- default english stopwords
a, an, and, are, as, at, be, but, by, for, if, in, into, is, it,
no, not, of, on, or, such, that, the, their, then, there, these,
they, this, to, was, will, with
▪ Pros
- Performance (disk space is no longer an argument)
▪ Cons
- Reduces our ability to perform certain searches
▪ distinguishing happy from ‘not happy’
▪ searching for the band ‘The The’
▪ finding Shakespeare’s quotation ‘To be, or not to be’
▪ using the country code for Norway, ‘no’
Human Language: Inserting Stop Words
▪ Default stopwords for a language can be used via the _lang_ notation (for example, _english_)
Human Language: Synonyms
▪ Broaden the scope, not narrow it
▪ No document matches “English queen”, but documents containing
“British monarch” would still be considered a good match
▪ Using the synonym token filter at both index and search time is redundant.
- At index time, a word is replaced by its synonyms
- At search time, a query for “English” would be converted to “english” or “british”
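A minimal sketch of synonym expansion, to show why applying it on one side is enough. The rules and the class name `SynonymSketch` are hypothetical; a real index would configure synonyms in its analysis settings rather than in application code. It assumes Java 8 or later.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class SynonymSketch {

    // Hypothetical synonym rules, for illustration only.
    static final Map<String, List<String>> SYNONYMS = new HashMap<>();
    static {
        SYNONYMS.put("english", Arrays.asList("english", "british"));
        SYNONYMS.put("queen", Arrays.asList("queen", "monarch"));
    }

    // Replace each token by its synonyms; tokens without a rule pass through.
    static List<String> expand(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (String token : tokens) {
            out.addAll(SYNONYMS.getOrDefault(token, Arrays.asList(token)));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(expand(Arrays.asList("english", "queen")));
        // prints [english, british, queen, monarch]
    }
}
```

If the query side is expanded like this, a document indexed with only "british" and "monarch" already matches, so expanding the index side as well adds nothing.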
Human Language: Synonyms
Human Language: Normalizing
▪ Removes ‘insignificant’ differences between otherwise identical words
- uppercase vs lowercase
- é to e
▪ Default filters
- lowercase
- asciifolding
- remove diacritics (like ^)
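The effect of lowercasing plus diacritic removal can be approximated with the JDK alone. This is only a rough analogue: the `asciifolding` token filter handles many more characters than the combining marks removed here, and `NormalizeSketch` is a hypothetical name.

```java
import java.text.Normalizer;
import java.util.Locale;

class NormalizeSketch {

    // Lowercase, then decompose accented characters (NFD) and drop the combining marks.
    static String normalize(String token) {
        String lower = token.toLowerCase(Locale.ROOT);
        String decomposed = Normalizer.normalize(lower, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}", "");
    }

    public static void main(String[] args) {
        // The leading E-acute is written as a Unicode escape to avoid source-encoding issues.
        System.out.println(normalize("\u00c9glise")); // prints eglise
    }
}
```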
Human Language: Normalizing
▪ Retaining meaning
- When you normalize, you lose meaning (spanish example)
▪ For that reason it is best to index twice
- 1 time - normalized
- 1 time the original form
(this is also a good practice and will generate better results with a
multi-match query)
Human Language: Normalizing
▪ Not important for the exercises, but pay attention to the order of the filters, as they are applied sequentially.
Languages: Exercises
▪ Time for Exercises
- Begin with exercises in package: be.ordina.wes.exercises.language
Aggregations
▪ Not like search - now we zoom out to get an overview of the data
▪ Allows us to ask sophisticated questions of our data
▪ Uses the same data structures => almost as fast as search
▪ Operates alongside search - so you can do both search and analyze
simultaneously
Aggregations
▪ Buckets
- collection of documents matching criteria
- can be nested
▪ Metrics
- statistics calculated on the documents in a bucket
▪ Translation in rough SQL terms:
Aggregations
Aggregations
We add a new aggs level to hold the metric.
We then give the metric a name: avg_price.
And finally, we define it as an avg metric over the price field.
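The bucket and metric idea can be sketched with Java streams. This is an in-memory analogy, not the aggregation API: the `Sale` type and the choice of `color` as the bucketing field are assumptions for illustration, while `avg_price` and the `price` field come from the slide. In rough SQL terms, the grouping plays the role of `GROUP BY` and the average plays the role of `AVG(price)`. It assumes Java 8 or later.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

class AggregationSketch {

    // Hypothetical document: a sale with a color and a price.
    static final class Sale {
        final String color;
        final double price;
        Sale(String color, double price) { this.color = color; this.price = price; }
    }

    // Bucket by color, then compute the avg_price metric inside each bucket.
    static Map<String, Double> avgPriceByColor(List<Sale> sales) {
        return sales.stream().collect(
            Collectors.groupingBy((Sale s) -> s.color, TreeMap::new,
                                  Collectors.averagingDouble((Sale s) -> s.price)));
    }

    public static void main(String[] args) {
        List<Sale> sales = Arrays.asList(
            new Sale("red", 10000), new Sale("red", 20000),
            new Sale("blue", 15000), new Sale("green", 30000));
        System.out.println(avgPriceByColor(sales));
        // prints {blue=15000.0, green=30000.0, red=15000.0}
    }
}
```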
Aggregations: Exercises
▪ Time for Exercises
- Begin with exercises in package: be.ordina.wes.exercises.aggregations
Questions or Suggestions?

The Top App Development Trends Shaping the Industry in 2024-25 .pdfayushiqss
 

Último (20)

Right Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsRight Money Management App For Your Financial Goals
Right Money Management App For Your Financial Goals
 
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
 
%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa
 
TECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providerTECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service provider
 
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...
 
SHRMPro HRMS Software Solutions Presentation
SHRMPro HRMS Software Solutions PresentationSHRMPro HRMS Software Solutions Presentation
SHRMPro HRMS Software Solutions Presentation
 
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
 
AI & Machine Learning Presentation Template
AI & Machine Learning Presentation TemplateAI & Machine Learning Presentation Template
AI & Machine Learning Presentation Template
 
Introducing Microsoft’s new Enterprise Work Management (EWM) Solution
Introducing Microsoft’s new Enterprise Work Management (EWM) SolutionIntroducing Microsoft’s new Enterprise Work Management (EWM) Solution
Introducing Microsoft’s new Enterprise Work Management (EWM) Solution
 
Chinsurah Escorts ☎️8617697112 Starting From 5K to 15K High Profile Escorts ...
Chinsurah Escorts ☎️8617697112  Starting From 5K to 15K High Profile Escorts ...Chinsurah Escorts ☎️8617697112  Starting From 5K to 15K High Profile Escorts ...
Chinsurah Escorts ☎️8617697112 Starting From 5K to 15K High Profile Escorts ...
 
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
 
%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein
%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein
%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein
 
%in Harare+277-882-255-28 abortion pills for sale in Harare
%in Harare+277-882-255-28 abortion pills for sale in Harare%in Harare+277-882-255-28 abortion pills for sale in Harare
%in Harare+277-882-255-28 abortion pills for sale in Harare
 
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
 
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
 
Payment Gateway Testing Simplified_ A Step-by-Step Guide for Beginners.pdf
Payment Gateway Testing Simplified_ A Step-by-Step Guide for Beginners.pdfPayment Gateway Testing Simplified_ A Step-by-Step Guide for Beginners.pdf
Payment Gateway Testing Simplified_ A Step-by-Step Guide for Beginners.pdf
 
Generic or specific? Making sensible software design decisions
Generic or specific? Making sensible software design decisionsGeneric or specific? Making sensible software design decisions
Generic or specific? Making sensible software design decisions
 
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...
 
%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa
 
The Top App Development Trends Shaping the Industry in 2024-25 .pdf
The Top App Development Trends Shaping the Industry in 2024-25 .pdfThe Top App Development Trends Shaping the Industry in 2024-25 .pdf
The Top App Development Trends Shaping the Industry in 2024-25 .pdf
 

Big data elasticsearch practical

  • 3. Content ▪ Setup ▪ Introduction ▪ Basics ▪ Search in Depth ▪ Human Language ▪ Aggregations
  • 4. Setup 1. Go to https://github.com/tomvdbulck/elasticsearchworkshop 2. Make sure the following items have been installed on your machine: o Java 7 or higher o Git (if you like a pretty interface to deal with git, try SourceTree) o Maven 3. Install VirtualBox https://www.virtualbox.org/wiki/Downloads 4. Install Vagrant https://www.vagrantup.com/downloads.html 5. Clone the repository into your workspace 6. Open a command prompt, go to the elasticsearchworkshop folder and run
  • 5. Introduction ▪ Distributed RESTful search and analytics ▪ Distributed - Built to scale horizontally - Based on Apache Lucene - High Availability (automatic failover and data replication) ▪ RESTful - RESTful API using JSON over HTTP ▪ Full text search ▪ Document Oriented and Schema free
  • 6. Introduction ElasticSearch => Relational DB Index => Database Type => Table Document => Row Field => Column Mapping => Schema Shard => Partition
  • 7. Introduction Index Like a database in a relational database It has a mapping which defines multiple types Logical namespace which maps to 1 or more primary shards Type Like a table, has a list of fields which can be attributed to documents of that type Document JSON document Like a row Is stored in an index, has a type and an id.
  • 8. Introduction Field A document contains a list of fields, key/value pairs Each field has a field ‘type’ which indicates type of data Mapping Is like a schema definition Each index has a mapping which defines each type within the index Can be defined explicitly or generated automatically when a document is indexed.
  • 9. Introduction: Cluster, Nodes Cluster Consists of one or more nodes sharing the same cluster name. Each cluster has 1 master node which is elected automatically Node Running instance of Elasticsearch At startup it will automatically search for a cluster with the same cluster name
  • 10. Introduction: Shards ▪ Shard Single Lucene instance Low-level worker unit Elasticsearch distributes shards among nodes automatically ▪ Primary Shard Each document is stored in a single primary shard 1st indexed on primary shard (by default 5 shards per index) Then on all replicas of the primary shard (by default 1 replica per shard) ▪ Replica Shard Each primary can have 0 or more replicas Has 2 functions - high availability (failover) - can be promoted to primary - increase performance - can handle get and search requests
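How a document ends up in "a single primary shard" can be sketched as a hash of its routing value (by default the document id) taken modulo the number of primary shards. This is an illustration only: `zlib.crc32` stands in for Elasticsearch's internal hash function, and `pick_shard` is a hypothetical helper name, not part of any client API.

```python
import zlib

NUMBER_OF_PRIMARY_SHARDS = 5  # the default mentioned on the slide


def pick_shard(routing: str, primaries: int = NUMBER_OF_PRIMARY_SHARDS) -> int:
    """Map a routing value (by default the document id) to a primary shard.

    zlib.crc32 is only a stand-in for Elasticsearch's internal hash.
    """
    return zlib.crc32(routing.encode("utf-8")) % primaries


for doc_id in ("1", "2", "42"):
    print(doc_id, "->", "shard", pick_shard(doc_id))
```

Because the result depends on the number of primaries, the same id would map to a different shard if that number changed, which is one way to see why the primary shard count is chosen when the index is created.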
  • 11. Introduction: Filter vs Query Although we refer to the query DSL, there are actually 2 DSLs: the filter DSL and the query DSL ▪ Filter DSL A filter asks a yes/no question of every document and is used for fields that contain exact values: Is the created date in the range 2013 - 2014? Does the status field contain the term published? Is the lat_lon field within 10km of a specified point? ▪ Query DSL Similar to a filter but also asks the question, “how well does this document match?” Typical uses: Best matching the words “full text search” Containing the word run, but maybe also matching runs, running, jog, or sprint Containing the words quick, brown, and fox; the closer together they are, the more relevant the document
  • 12. Introduction: Filter vs Query Differences ▪ A filter is quicker, as a query must calculate the relevance score ▪ The goal of a filter is to reduce the number of documents which need to be examined by a query ▪ When to use: a query for full text search or anytime you need a relevance score. Filters for everything else.
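The division of labour described above (the filter narrows the candidate set, the query scores what is left) can be sketched as a request body. This assumes the 1.x-era `filtered` query that matches the tooling used in this workshop; it was deprecated in Elasticsearch 2.0 and later removed in favour of a `bool` query with a `filter` clause. The field names `body` and `created` and the helper `filtered_search` are made up for the illustration.

```python
import json


def filtered_search(text: str, year_from: int, year_to: int) -> dict:
    """Build a 1.x-style `filtered` query: the filter narrows, the query scores."""
    return {
        "query": {
            "filtered": {
                # Scored part: how well does the document match the text?
                "query": {"match": {"body": text}},
                # Yes/no part: is the created date inside the range?
                "filter": {
                    "range": {
                        "created": {
                            "gte": f"{year_from}-01-01",
                            "lt": f"{year_to + 1}-01-01",
                        }
                    }
                },
            }
        }
    }


print(json.dumps(filtered_search("quick brown fox", 2013, 2014), indent=2))
```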
  • 13. Basics ▪ Connection to ElasticSearch ▪ Inserting data ▪ Searching data ▪ Updating data ▪ Deleting Data ▪ Parent - Child
  • 14. Basics: Connecting to Elasticsearch ▪ Node Client and Transport Client - Node Client: acts as a node which joins the cluster (same as the data nodes) - all nodes are aware of each other ▪ Better query performance ▪ Bigger memory footprint and slower startup ▪ Less secure (application tied to the cluster) - Transport Client: connects to the cluster every time ▪ No Lucene dependencies in your project (unless you use Spring Boot ;-) ▪ Starts up faster ▪ Application decoupled from the cluster ▪ Less efficient at accessing the index and executing queries
  • 15. Basics: Connecting to Elasticsearch ▪ Node Client (if we would use this - we would all form 1 big cluster) ▪ Transport Client (we use this one in the exercises)
  • 17. Basics: Searching Data ▪ Get API - Retrieve document based on its id ▪ Search API - Returns a single page of results
  • 19. Basics: Deleting Data ▪ Delete a document ▪ Delete an index - For performing operations on index, use admin client => client.admin()
  • 20. Basics: Exercises ▪ Time for Exercises - Begin with exercises in package: be.ordina.wes.exercises.basics ▪ Some hints - Go to http://localhost:9200/_plugin/marvel - Choose “sense” in the upper right corner under “Dashboards” ▪ Sense: - You can see how an index has been created - You can analyze -> what will the index do with your search query
  • 21. Search in Depth ▪ Filters - very important as they are very fast ▪do not calculate relevance ▪are easily cached ▪ Multi-Field Search
  • 22. Search in Depth: Filters ▪ Range Filter - range queries also exist, but note that a query is slower than a filter
  • 23. Search in Depth: Filters ▪ Term Filter - Filters on a term (not analyzed) ▪ so you must pass the exact term as it exists in the index ▪ no automatic conversion between lowercase and uppercase ▪ The result is automatically cached - Some filters are automatically cached; if so, this can be overridden
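Why the exact indexed form matters for a term filter can be shown with a toy model. This is not Elasticsearch code: `analyze` is a deliberately simplistic stand-in for an analyzer, and `term_filter` is a hypothetical function that mimics the "no analysis at filter time" behaviour.

```python
def analyze(text: str) -> set:
    """Toy stand-in for an analyzer: split on whitespace and lowercase."""
    return {token.lower() for token in text.split()}


def term_filter(indexed_terms: set, term: str) -> bool:
    """A term filter compares the given term as-is; no analysis is applied."""
    return term in indexed_terms


indexed = analyze("Published Draft")      # the terms that end up in the index
print(term_filter(indexed, "published"))  # True: matches the indexed form
print(term_filter(indexed, "Published"))  # False: original casing is not indexed
```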
  • 24. Search in Depth: Multi-Field Search ▪ fields can be boosted - in the example below subject field is boosted by a factor of 3
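The example from the slide did not survive in this transcript, so here is a minimal sketch of what a boosted multi-field search body could look like, using the `multi_match` query and its `field^boost` notation. The `subject` field and the factor 3 come from the slide; the `message` field, the search text and the helper `multi_field_query` are assumptions made for the illustration.

```python
import json


def multi_field_query(text: str, boosts: dict) -> dict:
    """Build a multi_match query; `field^N` weights matches on that field by N."""
    fields = [
        name if boost == 1 else f"{name}^{boost}"
        for name, boost in boosts.items()
    ]
    return {"query": {"multi_match": {"query": text, "fields": fields}}}


body = multi_field_query("elasticsearch workshop", {"subject": 3, "message": 1})
print(json.dumps(body, indent=2))
```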
  • 25. Search in Depth: Exercises ▪ Time for Exercises - Begin with exercises in package: be.ordina.wes.exercises.advanced_search
  • 26. Human Language ▪ Use default Analyzers ▪ Inserting stop words ▪ Synonyms ▪ Normalizing
  • 27. Human Language: Default Analyzers ▪ Ships with a collection of analyzers for most common languages ▪ Have 4 functions - Tokenize text in individual words The quick brown foxes → [The, quick, brown, foxes] - Lowercase tokens The → the - Remove common stopwords [The, quick, brown, foxes] → [quick, brown, foxes] - Stem tokens to their root form foxes → fox
  • 28. Human Language: Default Analyzers ▪ Can also apply transformations specific to a language to make words more searchable ▪ The english analyzer removes the possessive ’s John's → john ▪ The french analyzer removes elisions and diacritics l'église → eglis ▪ The german analyzer normalizes terms äußerst → ausserst
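The four functions listed for the default analyzers can be mimicked in a few lines to make the order of operations concrete. This is a toy pipeline, not what Elasticsearch runs: the stopword set is a small subset of the English list, and `naive_stem` is a crude suffix stripper standing in for a real stemmer.

```python
import re

STOPWORDS = {"a", "an", "and", "the", "of", "to"}  # small subset for the demo


def naive_stem(token: str) -> str:
    """Very rough suffix stripping; real analyzers use proper stemmers."""
    for suffix in ("es", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token


def analyze(text: str) -> list:
    tokens = re.findall(r"\w+", text)                   # 1. tokenize
    tokens = [t.lower() for t in tokens]                # 2. lowercase
    tokens = [t for t in tokens if t not in STOPWORDS]  # 3. remove stopwords
    return [naive_stem(t) for t in tokens]              # 4. stem


print(analyze("The quick brown foxes"))  # ['quick', 'brown', 'fox']
```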
  • 30. Human Language: Inserting Stop Words ▪ Words which are common to a language but add little to no value for a search - default english stopwords a, an, and, are, as, at, be, but, by, for, if, in, into, is, it, no, not, of, on, or, such, that, the, their, then, there, these, they, this, to, was, will, with ▪ Pros - Performance (disk space is no longer an argument) ▪ Cons - Reduce our ability to perform certain searches ▪distinguish happy from ‘not happy’ ▪search for the band ‘The The’ ▪finding Shakespeare’s quotation ‘To be, or not to be’ ▪Using the country code for Norway ‘No’
  • 31. Human Language: Inserting Stop Words ▪ The default stopwords for a language can be used via the _lang_ notation, for example _english_
  • 32. Human Language: Synonyms ▪ Broaden the scope, not narrow it ▪ No document matches “English queen”, but documents containing “British monarch” would still be considered a good match ▪ Using the synonym token filter at both index and search time is redundant. - At index time a word is replaced by the synonyms - At search time a query would be converted from “English” to “english” or “british”
  • 34. Human Language: Normalizing ▪ Removes ‘insignificant’ differences between otherwise identical words - uppercase vs lowercase - é to e ▪ Default filters - lowercase - asciifolding - remove diacritics (like ^)
  • 35. Human Language: Normalizing ▪ Retaining meaning - When you normalize, you lose meaning (spanish example) ▪ For that reason it is best to index twice - 1 time - normalized - 1 time the original form (this is also a good practice and will generate better results with a multi-match query)
  • 36. Human Language: Normalizing ▪ For the exercises not important - but pay attention to the sequence of the filters as they are applied sequentially.
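The normalizing steps, and the advice to keep the original form next to the normalized one, can be sketched with the standard library. This only approximates the lowercase and asciifolding filters (it drops combining marks after Unicode decomposition and does not cover every character the real filter handles); `fold`, `index_both` and the `folded` key are hypothetical names.

```python
import unicodedata


def fold(text: str) -> str:
    """Apply two steps in sequence: lowercase, then strip diacritics."""
    decomposed = unicodedata.normalize("NFKD", text.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def index_both(value: str) -> dict:
    """Keep the original form next to the normalized one ("index twice")."""
    return {"original": value, "folded": fold(value)}


print(fold("Église"))        # eglise
print(index_both("ÉCOLE"))   # {'original': 'ÉCOLE', 'folded': 'ecole'}
```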
  • 37. Languages: Exercises ▪ Time for Exercises - Begin with exercises in package: be.ordina.wes.exercises.language
  • 38. Aggregations ▪ Not like search - now we zoom out to get an overview of the data ▪ Allows us to ask sophisticated questions of our data ▪ Uses the same data structures => almost as fast as search ▪ Operates alongside search - so you can search and analyze simultaneously
  • 39. Aggregations ▪ Buckets - collection of documents matching criteria - can be nested ▪ Metrics - statistics calculated on the documents in a bucket ▪ translation in rough sql terms:
  • 41. Aggregations We add a new aggs level to hold the metric. We then give the metric a name: avg_price. And finally, we define it as an avg metric over the price field.
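The nesting described above, a bucket with a metric inside it, can be sketched as a request body followed by an in-memory equivalent that groups and averages (roughly what a GROUP BY with AVG does in SQL). The metric name `avg_price` and the `price` field come from the slide; the `colors` bucket, the `color` field, the sample documents and the helper `avg_price_per_bucket` are assumptions made for the illustration.

```python
import json
from collections import defaultdict

request_body = {
    "aggs": {
        "colors": {                       # bucket name (assumed)
            "terms": {"field": "color"},  # one bucket per distinct color (assumed field)
            "aggs": {                     # nested level that holds the metric
                "avg_price": {"avg": {"field": "price"}}
            },
        }
    }
}
print(json.dumps(request_body, indent=2))


def avg_price_per_bucket(docs, bucket_field="color", metric_field="price"):
    """In-memory equivalent: group documents into buckets, then average a field."""
    buckets = defaultdict(list)
    for doc in docs:
        buckets[doc[bucket_field]].append(doc[metric_field])
    return {key: sum(values) / len(values) for key, values in buckets.items()}


docs = [
    {"color": "red", "price": 10000},
    {"color": "red", "price": 20000},
    {"color": "blue", "price": 15000},
]
print(avg_price_per_bucket(docs))  # {'red': 15000.0, 'blue': 15000.0}
```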
  • 42. Aggregations: Exercises ▪ Time for Exercises - Begin with exercises in package: be.ordina.wes.exercises.aggregations