ELK - What's new and showcases

ELASTICSEARCH & CO.
What’s new?
tech talk @ ferret
Andrii Gakhov

ELK
• Kibana 4.0.1: open source data visualization platform that allows you to interact with your data through stunning, powerful graphics.
• Elasticsearch 1.4.4: distributed, open source search and analytics engine, designed for horizontal scalability, reliability, and easy management.
• Logstash 1.4.2: flexible, open source data collection, parsing, and enrichment pipeline.
• Shield 1.0.1: brings enterprise-grade security to Elasticsearch, protecting the entire ELK stack with encrypted communications, authentication, role-based access control and auditing.
• Marvel: comprehensive tool that provides you with complete transparency into the status of your Elasticsearch deployment.

SHIELD
Security as a Plugin
Security features for Elasticsearch are implemented in a
plugin that you install on each node in your cluster.

ARCHITECTURE NOTES
• The plugin intercepts inbound API calls in order to
enforce authentication and authorization.
• The plugin provides encryption using Secure Sockets
Layer/Transport Layer Security (SSL/TLS) for the
network traffic to and from the Elasticsearch node.
• The plugin uses the API interception layer that
enables authentication and authorization to provide
audit logging capability.

MAIN FEATURES
• User Authentication
Shield defines a known set of users (a realm) in order to authenticate users that make
requests. The supported realms are esusers and LDAP.
• Authorization
Shield’s data model for action authorization includes: Secured Resource, Privilege,
Permissions, Role, Users
• Node Authentication and Channel Encryption
Shield uses SSL/TLS to wrap the usual node communication over port 9300. When SSL/TLS
is enabled, the nodes validate each other’s certificates, establishing trust between the
nodes.
• IP Filtering
Shield provides IP-based access control for Elasticsearch nodes, which lets you restrict
which other servers, identified by IP address, can connect to Elasticsearch nodes and make
requests.
• Auditing 
The audit functionality in a secure Elasticsearch cluster logs particular events and
activity on that cluster. The events logged include authentication attempts, including
granted and denied access.
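
Because the plugin intercepts every inbound API call, a client has to send credentials with each request. A minimal sketch, assuming HTTP basic authentication against an esusers realm; the host, user name, and password are hypothetical placeholders, and the request is only built, not sent.

```python
import base64
import urllib.request

def build_authenticated_request(url, username, password):
    """Build (but do not send) a GET request carrying HTTP basic auth credentials."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    request = urllib.request.Request(url)
    request.add_header("Authorization", f"Basic {token}")
    return request

# Hypothetical node address and credentials, for illustration only.
req = build_authenticated_request(
    "https://localhost:9200/_cluster/health", "es_admin", "changeme"
)
print(req.get_header("Authorization"))
```

Passing `req` to `urllib.request.urlopen` would send it; with SSL/TLS enabled on the node, the client also needs to trust the node's certificate.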

KIBANA
Kibana 4 provides dozens of new features that enable you
to compose questions, get answers, and solve problems like
never before.

WHAT’S NEW?
• New interface built with D3, with a drag-and-drop dashboard builder
• New visualization types: Area Chart, Data Table, Markdown Text Widget, Pie Chart,
Raw Document Widget, Single Metric Widget, Tile Map, Vertical Bar Chart
• Advanced aggregation-based analytics capabilities: Unique counts
(cardinality), Non-date histograms, Ranges, Significant terms, Percentiles etc.
• Expressions-based scripted fields enable you to perform ad-hoc analysis by
performing computations on the fly
• Search result highlighting
• Ability to save searches and visualizations
• Faster dashboard loading due to a reduction in the number of HTTP calls
needed to load the page
• SSL encryption for client requests as well as requests to and from
Elasticsearch

ELASTICSEARCH: WHAT’S NEW SINCE 1.2.0
• Upgraded to Lucene 4.10.1 release
• New aggregations: percentile_ranks, top_hits, cardinality,
scripted_metric, …
• Added sum of the doc counts of other buckets in terms aggs
• Added support for bounding box aggregation on geo_shape/geo_point
data types
• Parent/child optimization
• Added support for scripted upserts
• Fielddata and cache optimisation
• Removed deprecated gateway functionality
• …

PERCENTILE RANKS AGGREGATION
A multi-value metrics aggregation that calculates one or more percentile
ranks over numeric values extracted from the aggregated documents.
{
  "aggs": {
    "load_time_outlier": {
      "percentile_ranks": {
        "field": "load_time",
        "values": [15, 30]
      }
    }
  }
}
Response:
{
  "aggregations": {
    "load_time_outlier": {
      "values": {
        "15": 92,
        "30": 100
      }
    }
  }
}
The example above shows that 92% of pages were loaded within 15 seconds, and
100% within 30 seconds.
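
To make the meaning of a percentile rank concrete, here is a small exact computation over an in-memory list. It is only a sketch: the load times are made-up sample data, and Elasticsearch may compute the ranks approximately rather than by scanning every value like this.

```python
def percentile_rank(values, threshold):
    """Return the percentage of values that are less than or equal to threshold."""
    if not values:
        raise ValueError("values must not be empty")
    return 100.0 * sum(1 for v in values if v <= threshold) / len(values)

# Hypothetical page load times in seconds.
load_times = [2, 4, 5, 7, 8, 9, 11, 12, 14, 22]
ranks = {t: percentile_rank(load_times, t) for t in (15, 30)}
print(ranks)  # {15: 90.0, 30: 100.0}
```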

TOP HITS AGGREGATION
A top_hits metric aggregator keeps track of the most relevant documents
being aggregated. This aggregator is intended to be used as a sub-aggregator,
so that the top matching documents can be aggregated per bucket.
{
  "aggs": {
    "top_logs": {
      "top_hits": {
        "sort": [
          {
            "created_at": {
              "order": "desc"
            }
          }
        ],
        "_source": {
          "include": [
            "path"
          ]
        }
      }
    }
  }
}
Response:
{
  "aggregations": {
    "top_logs": {
      "hits": {
        "total": 180,
        "hits": [
          {
            "_index": "logs",
            "_type": "log",
            "_id": "an893d30mlss",
            "_source": {
              "path": "/home/user/"
            },
            "sort": [ 1422388801000 ]
          …
}

CARDINALITY AGGREGATION
A single-value metrics aggregation that calculates an approximate count of
distinct values. It is based on the HyperLogLog++ algorithm, which counts
based on the hashes of the values with some interesting properties:
• configurable precision, which decides on how to trade memory for accuracy,
• excellent accuracy on low-cardinality sets,
• fixed memory usage: no matter if there are tens or billions of unique values,
memory usage only depends on the configured precision.
{
  "aggs": {
    "tags_count": {
      "cardinality": {
        "field": "tags",
        "precision_threshold": 100
      }
    }
  }
}
Response:
{
  "aggregations": {
    "tags_count": {
      "value": 120002
    }
  }
}
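
The fixed-memory property can be illustrated with a toy estimator. This is a sketch of plain HyperLogLog with a linear-counting correction for small sets, not the HyperLogLog++ variant Elasticsearch uses; the function name, the SHA-1 hashing, and the register count (2**12) are illustrative assumptions.

```python
import hashlib
import math

def hll_estimate(values, p=12):
    """Estimate the number of distinct values using 2**p registers (p >= 7)."""
    m = 1 << p
    registers = [0] * m
    for value in values:
        digest = hashlib.sha1(str(value).encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")        # 64-bit hash
        index = h >> (64 - p)                        # first p bits pick a register
        rest = h & ((1 << (64 - p)) - 1)             # remaining 64 - p bits
        rank = (64 - p) - rest.bit_length() + 1      # position of the leftmost 1-bit
        registers[index] = max(registers[index], rank)
    alpha = 0.7213 / (1 + 1.079 / m)                 # bias constant for m >= 128
    raw = alpha * m * m / sum(2.0 ** -r for r in registers)
    zeros = registers.count(0)
    if raw <= 2.5 * m and zeros:
        return m * math.log(m / zeros)               # small-range correction
    return raw

# 100,000 values, 50,000 of them distinct; memory stays at 4,096 registers.
tags = [f"tag-{i % 50000}" for i in range(100000)]
estimate = hll_estimate(tags)
print(round(estimate))
```

The register array has the same size whether the input holds a hundred or a billion distinct values; raising `p` trades more memory for a lower expected error.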

SCRIPTED METRIC AGGREGATION
A metric aggregation that executes using scripts to provide a metric output.
{
  "aggs": {
    "profit": {
      "scripted_metric": {
        "init_script": "_agg['transactions'] = []",
        "map_script": "if (doc['type'].value == 'sale') { _agg.transactions.add(doc['amount'].value) } else { _agg.transactions.add(-1 * doc['amount'].value) }",
        "combine_script": "profit = 0; for (t in _agg.transactions) { profit += t }; return profit",
        "reduce_script": "profit = 0; for (a in _aggs) { profit += a }; return profit"
      }
    }
  }
}
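
The four scripts run at different stages, which is easier to see in a local emulation. The sketch below mimics the init/map/combine/reduce flow over in-memory lists standing in for shards; the function name and the sample documents are hypothetical.

```python
def scripted_metric_profit(shards):
    """Emulate init/map/combine/reduce over a list of shards (lists of documents)."""
    shard_results = []
    for docs in shards:
        agg = {"transactions": []}                       # init_script: once per shard
        for doc in docs:                                 # map_script: once per document
            sign = 1 if doc["type"] == "sale" else -1
            agg["transactions"].append(sign * doc["amount"])
        shard_results.append(sum(agg["transactions"]))   # combine_script: once per shard
    return sum(shard_results)                            # reduce_script: on the coordinating node

# Hypothetical documents spread over two shards.
shards = [
    [{"type": "sale", "amount": 80}, {"type": "cost", "amount": 10}],
    [{"type": "sale", "amount": 130}, {"type": "cost", "amount": 30}],
]
profit = scripted_metric_profit(shards)
print(profit)  # 170
```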

PROBLEM I
{
  "location": {
    "type": "geo_point"
  },
  "tags": {
    "type": "string",
    "index": "not_analyzed"
  },
  "text": {
    "index": "not_analyzed"
  }
}
Find the most popular tags per location (e.g. grouping by
geohash cells of roughly 10 km x 10 km)

SOLUTION
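use geohash_grid and terms aggregations (geohash_grid precision is the
geohash length, from 1 to 12; length 5 gives cells of roughly 4.9 km x 4.9 km)
{
  "aggs": {
    "hotspots": {
      "geohash_grid": {
        "field": "location",
        "precision": 5
      },
      "aggs": {
        "top_tags": {
          "terms": {
            "field": "tags"
          }
          …
}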

RESPONSE EXAMPLE
"hotspots": {
  "buckets": [
    {
      "key": "dr5rs",
      "doc_count": 2,
      "top_tags": {
        "buckets": [
          {
            "key": "#NY",
            "doc_count": 20001
          },
          {
            "key": "#Obama"
          },
          …
        ]
      }
    },
    …
  ]
  …
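
What the two aggregations do together can be sketched locally: encode each location as a geohash cell, then count tags per cell. The encoder below is the standard geohash algorithm; the function names and sample documents are hypothetical, and this in-memory version only illustrates the grouping, not how Elasticsearch executes it.

```python
from collections import Counter, defaultdict

BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"

def geohash_encode(lat, lon, precision):
    """Encode a latitude/longitude pair as a geohash of the given length."""
    lat_range, lon_range = [-90.0, 90.0], [-180.0, 180.0]
    bits = []
    use_lon = True                       # bits alternate, starting with longitude
    while len(bits) < precision * 5:
        rng, value = (lon_range, lon) if use_lon else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2
        if value >= mid:
            bits.append(1)
            rng[0] = mid
        else:
            bits.append(0)
            rng[1] = mid
        use_lon = not use_lon
    chars = []
    for i in range(0, len(bits), 5):     # every 5 bits map to one base-32 character
        index = int("".join(map(str, bits[i:i + 5])), 2)
        chars.append(BASE32[index])
    return "".join(chars)

def top_tags_per_cell(docs, precision=5, size=10):
    """Group documents by geohash cell and return the most common tags per cell."""
    cells = defaultdict(Counter)
    for doc in docs:
        lat, lon = doc["location"]
        cells[geohash_encode(lat, lon, precision)].update(doc["tags"])
    return {cell: counts.most_common(size) for cell, counts in cells.items()}

# Hypothetical documents: two close together, one far away.
docs = [
    {"location": (57.64911, 10.40744), "tags": ["#beach", "#sun"]},
    {"location": (57.64920, 10.40750), "tags": ["#beach"]},
    {"location": (40.7128, -74.0060), "tags": ["#NY"]},
]
hotspots = top_tags_per_cell(docs)
print(hotspots["u4pru"])  # [('#beach', 2), ('#sun', 1)]
```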

PROBLEM II
{
  "event": {
    "index": "not_analyzed"
  },
  "rating": {
    "type": "float"
  }
}
Find the total number of records and the average rating
for the events with the most rating records

SOLUTION
{
  "aggs": {
    "top_events": {
      "terms": {
        "field": "event"
      },
      "aggs": {
        "avg_rating": {
          "avg": {
            "field": "rating"
          }
          …
}
use terms and avg aggregations

RESPONSE EXAMPLE
"top_events": {
  "buckets": [
    {
      "key": "Venus Berlin",
      "doc_count": 36665,
      "avg_rating": {
        "value": 9.991
      }
    },
    {
      "key": "ITB Berlin",
      "avg_rating": {
        "value": 8.46
      }
    }
    …
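
The same bucket-then-metric logic can be sketched in memory: group records by event, order the groups by record count, and average the ratings within each group. The function name and sample records are hypothetical.

```python
from collections import defaultdict

def top_events_with_avg_rating(records, size=10):
    """Return buckets ordered by record count, each with its average rating."""
    ratings = defaultdict(list)
    for record in records:
        ratings[record["event"]].append(record["rating"])
    buckets = [
        {"key": event, "doc_count": len(vals), "avg_rating": sum(vals) / len(vals)}
        for event, vals in ratings.items()
    ]
    buckets.sort(key=lambda b: b["doc_count"], reverse=True)
    return buckets[:size]

# Hypothetical rating records.
records = [
    {"event": "Event A", "rating": 8.0},
    {"event": "Event B", "rating": 7.0},
    {"event": "Event A", "rating": 9.0},
    {"event": "Event A", "rating": 10.0},
    {"event": "Event B", "rating": 8.0},
]
top_events = top_events_with_avg_rating(records)
print(top_events[0])  # {'key': 'Event A', 'doc_count': 3, 'avg_rating': 9.0}
```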

PROBLEM III
{
  "tags": {},
  "keywords": {
    "type": "nested",
    "properties": {
      "lemma": {}
    }
  }
}
Find the top tags for the most popular keyword lemmas

SOLUTION
{
  "aggs": {
    "kw": {
      "nested": { "path": "keywords" },
      "aggs": {
        "top_lemmas": {
          "terms": { "field": "keywords.lemma" },
          "aggs": {
            "kw_to_tags": {
              "reverse_nested": {},
              "aggs": {
                "top_tags_per_lemma": {
                  "terms": { "field": "tags" }
                }
                …
}
use nested aggregation together with terms and reverse_nested
aggregations

RESPONSE EXAMPLE
"kw": {
  "doc_count": 6829872,
  "top_lemmas": {
    "buckets": [
      {
        "key": "BMW",
        "kw_to_tags": {
          "top_tags_per_lemma": {
            "buckets": [
              {
                "key": "auto"
              },
              {
                "key": "car"
              }
            ]
            …
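
The step into the nested keywords and back out to the parent documents can be sketched in memory. The function name and sample documents below are hypothetical; the comments mark which loop plays the role of which aggregation.

```python
from collections import Counter, defaultdict

def top_tags_per_lemma(docs, size=10):
    """For the most frequent keyword lemmas, count the tags of their parent documents."""
    lemma_counts = Counter()        # terms on keywords.lemma: counts nested objects
    parents = defaultdict(set)      # lemma -> indices of parent documents
    for i, doc in enumerate(docs):
        for keyword in doc["keywords"]:          # nested: step into each keyword object
            lemma_counts[keyword["lemma"]] += 1
            parents[keyword["lemma"]].add(i)     # reverse_nested: back to the parent
    result = []
    for lemma, count in lemma_counts.most_common(size):
        tag_counts = Counter(
            tag for i in sorted(parents[lemma]) for tag in docs[i]["tags"]
        )
        result.append(
            {"key": lemma, "doc_count": count, "top_tags": tag_counts.most_common(size)}
        )
    return result

# Hypothetical documents with top-level tags and nested keywords.
docs = [
    {"tags": ["auto", "car"], "keywords": [{"lemma": "BMW"}, {"lemma": "engine"}]},
    {"tags": ["auto"], "keywords": [{"lemma": "BMW"}]},
    {"tags": ["travel"], "keywords": [{"lemma": "train"}]},
]
lemmas = top_tags_per_lemma(docs)
print(lemmas[0])
```

Keeping parent indices in a set means a document that mentions the same lemma twice still contributes its tags only once, which is the effect of stepping back to the parent level before counting tags.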

PROBLEM IV
{
  "tags": {},
  "text": {},
  "created_at": {
    "type": "date"
  }
}
Find the latest tweets for the most popular tags

SOLUTION
use terms and top_hits aggregations
{
  "aggs": {
    "top_tags": {
      "terms": {
        "field": "tags"
      },
      "aggs": {
        "top_tweets": {
          "top_hits": {
            "sort": [
              {
                "created_at": {
                  "order": "desc"
                }
              }
            ]
          }
          …
}

RESPONSE EXAMPLE
"top_tags": {
  "buckets": [
    {
      "key": "#TheDress",
      "top_tweets": {
        "hits": {
          "total": 30000,
          "hits": [
            {
              "_index": "tweets",
              "_type": "tweet",
              "_id": "579024639982202880",
              "_source": {
                "tags": [ "#TheDress", "#TheSims4" ],
                "text": "just put #TheDress in #TheSims4!",
                "created_at": "2015-03-20T20:00:01"
              },
              "sort": [ 1426881601000 ]
              …
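
A local sketch of the same idea: bucket tweets by tag, keep the most frequent tags, and within each bucket keep only the newest tweets. The function name, parameters, and sample tweets are hypothetical; timestamps are ISO 8601 strings in one format, so string order matches time order.

```python
from collections import defaultdict

def latest_tweets_per_tag(tweets, tag_count=10, hits_per_tag=3):
    """Return the most popular tags, each with its most recent tweets."""
    by_tag = defaultdict(list)
    for tweet in tweets:
        for tag in tweet["tags"]:
            by_tag[tag].append(tweet)
    # terms: order buckets by number of tweets, largest first
    popular = sorted(by_tag.items(), key=lambda kv: len(kv[1]), reverse=True)
    return [
        {
            "key": tag,
            "doc_count": len(items),
            # top_hits: newest tweets first, truncated to hits_per_tag
            "top_tweets": sorted(
                items, key=lambda t: t["created_at"], reverse=True
            )[:hits_per_tag],
        }
        for tag, items in popular[:tag_count]
    ]

# Hypothetical tweets.
tweets = [
    {"tags": ["#TheDress"], "text": "white and gold", "created_at": "2015-03-20T19:00:00"},
    {"tags": ["#TheDress", "#TheSims4"], "text": "just put #TheDress in #TheSims4!",
     "created_at": "2015-03-20T20:00:01"},
    {"tags": ["#TheSims4"], "text": "new build", "created_at": "2015-03-19T08:30:00"},
    {"tags": ["#TheDress"], "text": "blue and black", "created_at": "2015-03-20T18:00:00"},
]
top_tags = latest_tweets_per_tag(tweets, hits_per_tag=1)
print(top_tags[0]["key"], top_tags[0]["top_tweets"][0]["text"])
```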

PROBLEM V
{
  "topics": {},
  "title": {
    "type": "string"
  },
  "created_at": {
    "type": "date"
  }
}
Find news items that contain "Obama" in the title, plus the top topics
across all news regardless of the title

SOLUTION
use query_string, global and terms aggregations
{
  "query": {
    "query_string": {
      "default_field": "title",
      "query": "Obama"
    }
  },
  "aggs": {
    "all_news": {
      "global": {},
      "aggs": {
        "top_topics": {
          "terms": {
            "field": "topics"
          }
          …
}

RESPONSE EXAMPLE
"hits": {
  "total": 23,
  "max_score": 2.9730792,
  "hits": [
    {
      "_index": "news",
      "_type": "record",
      "_id": 6785,
      "_score": 2.9730792,
      "_source": …
    },
    …
  ]
},
"all_news": {
  "top_topics": {
    "buckets": [
      {
        "key": "Politics"
      }
      …
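
The point of the global aggregation is that the hits and the topic counts come from different document sets. A minimal in-memory sketch, with a hypothetical function name and made-up news items; the title match here is a naive whole-word comparison, far simpler than query_string.

```python
from collections import Counter

def search_with_global_topics(news, term, size=10):
    """Filter hits by title term, but count topics over all news items."""
    needle = term.lower()
    hits = [n for n in news if needle in n["title"].lower().split()]
    # global: the topic counts ignore the query and use every document
    topic_counts = Counter(topic for n in news for topic in n["topics"])
    return {"hits": hits, "all_news": {"top_topics": topic_counts.most_common(size)}}

# Hypothetical news items.
news = [
    {"title": "Obama visits Berlin", "topics": ["Politics"]},
    {"title": "Markets rally again", "topics": ["Economy"]},
    {"title": "Election results announced", "topics": ["Politics"]},
]
result = search_with_global_topics(news, "Obama")
print(len(result["hits"]), result["all_news"]["top_topics"])
```

Only one item matches the title query, yet "Politics" is counted twice because the topic counts cover every item.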
