Pop-up Loft
Log Analytics with AWS
Rajeev Srinivasan, Strategic Solutions Architect, srrajeev@amazon.com
Austin Clark, Technical Account Manager, auclark@amazon.com
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Your pager goes off at 3:00 am.
The servers are melting!
You VPN in and quickly check the logs
Uh oh
If you had Amazon Elasticsearch Service
THE EXPLOSION OF MACHINE-GENERATED DATA
• Transition from IT to DevOps
• Increase in IoT and mobile devices
• Cloud-based architectures
Machine-generated data is growing 10x faster than business data
Source: insideBigData - The Exponential Growth of Data, February 16, 2018
AMAZON ELASTICSEARCH SERVICE
Amazon Elasticsearch Service is a fully managed service that makes it easy to deploy, manage, and scale Elasticsearch and Kibana
BENEFITS OF AMAZON ELASTICSEARCH SERVICE
• Supports Open-Source APIs and Tools: Drop-in replacement with no need to learn new APIs or skills
• Easy to Use: Deploy a production-ready Elasticsearch cluster in minutes
• Scalable: Resize your cluster with a few clicks or a single API call
• Secure: Deploy into your VPC and restrict access using security groups and IAM policies
• Highly Available: Replicate across Availability Zones, with monitoring and automated self-healing
• Tightly Integrated with Other AWS Services: Seamless data ingestion, security, auditing and orchestration
ELASTICSEARCH LEADING USE CASES
• Application Monitoring & Root-cause Analysis: Provides developers with a high-performance, self-service operational monitoring and analytics platform
• Security Information and Event Management (SIEM): Enables security practitioners to centralize and analyze events from across the entire organization
• IoT & Mobile: Gives developers and line-of-business users real-time, location-aware insights into their device fleets
• Business & Clickstream Analytics: Provides business users with a real-time view of the performance of their web content and e-commerce platforms
Application Monitoring & Root-cause Analysis
CASE STUDY: EXPEDIA
PROBLEM
• Logs, lots and lots of logs. How do you monitor logs cost-effectively?
• Required a centralized logging infrastructure
• Did not have the staff to manage infrastructure
SOLUTION
• Streaming AWS CloudTrail logs, application logs, and Docker startup logs to Elasticsearch
• Created a centralized logging service for all team members
• Using Kibana for visualizations and for Elasticsearch queries
BENEFITS
• Quick insights: Able to identify and troubleshoot issues in real time
• Secure: Integrated with AWS IAM
• Scalable: Cluster sizes can grow to accommodate additional log sources
Business and Clickstream Analytics
CASE STUDY: FINANCIAL TIMES
PROBLEM
• What stories do our readers care about? What’s hot?
• Required a custom clickstream analytics solution
• Needed a solution that delivers analytics in real time
• Did not have a team to manage analytics infrastructure
SOLUTION
• Streaming user data to Amazon ES for analysis. Created their own custom dashboards for editors and journalists: Lantern.
• Lantern "shines a light" on reader activity for the editors and journalists at the FT
• Critical tool for making editorial decisions. Daily editorial meetings start by looking at the Lantern dashboard
BENEFITS
• Reliability: Lantern is used throughout the day by journalists and editors. The FT relies on Amazon to manage their systems for maximum uptime.
• Cost savings: Able to easily tune their cluster to meet their needs with minimal management overhead
AMAZON ELASTICSEARCH SERVICE CUSTOMERS
Software & Internet, Financial Services, Education Technology, BioTech and Pharma,
Media and Entertainment, Social Media, Telecommunications, Travel & Transportation,
Real Estate, Logistics & Operations, Publishing, Other
Elasticsearch is a distributed solution
You run clusters of instances
Data pattern
Amazon ES cluster: one index per day
• Indexes: logs_01.21.2018, logs_01.22.2018, logs_01.23.2018, logs_01.24.2018, logs_01.25.2018, logs_01.26.2018, logs_01.27.2018
• Each index has multiple shards (Shard 1, Shard 2, Shard 3)
• Each shard contains a set of documents
• Each document contains a set of fields and values (host, request, verb, status, timestamp, etc.)
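The one-index-per-day naming shown above can be derived from a date. This is a minimal sketch: the function name `daily_index_name` and the `prefix` parameter are illustrative assumptions, and only the `logs_MM.DD.YYYY` pattern comes from the slide.

```python
from datetime import date


def daily_index_name(day: date, prefix: str = "logs") -> str:
    """Return the index name for one day, following the slide's pattern."""
    # Month.day.year ordering mirrors the index names listed on the slide.
    return f"{prefix}_{day:%m.%d.%Y}"


print(daily_index_name(date(2018, 1, 21)))  # logs_01.21.2018
```

A year-first ordering would sort more naturally, but the sketch keeps the ordering used in the deck.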
Elasticsearch documents: structured JSON
Source log line:
199.72.81.55 - - [01/Jul/1995:00:00:01 -0400] "GET /history/apollo/ HTTP/1.0" 200 6245
The same entry as a JSON document:
{
  "verb": "GET",
  "ident": "-",
  "bytes": 6245,
  "@timestamp": "1995-07-01T00:00:01",
  "request": "GET /history/apollo/ HTTP/1.0",
  "host": "199.72.81.55",
  "authuser": "-",
  "@timestamp_utc": "1995-07-01T04:00:01+00:00",
  "timezone": "-0400",
  "response": 200
}
• Documents contain fields: name/value pairs
• Fields can nest
• Value types include text, numerics, dates, and geo objects
• Field values can be single values or arrays
• Documents should arrive at Elasticsearch as JSON
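The mapping from the raw log line to the JSON document above can be sketched in Python. The regular expression and the `parse_clf` helper are assumptions made for illustration; the deck does not say which tool produced the document. The field names are taken from the slide.

```python
import json
import re
from datetime import datetime, timezone

# Common Log Format: host ident authuser [timestamp] "request" status bytes
CLF_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<authuser>\S+) '
    r'\[(?P<ts>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<response>\d{3}) (?P<bytes>\d+|-)'
)


def parse_clf(line: str) -> dict:
    """Turn one Common Log Format line into a document shaped like the slide's."""
    match = CLF_PATTERN.match(line)
    if match is None:
        raise ValueError(f"not a Common Log Format line: {line!r}")
    fields = match.groupdict()
    # The timestamp carries its own UTC offset, e.g. "-0400".
    local = datetime.strptime(fields["ts"], "%d/%b/%Y:%H:%M:%S %z")
    return {
        "verb": fields["request"].split(" ", 1)[0],
        "ident": fields["ident"],
        "bytes": 0 if fields["bytes"] == "-" else int(fields["bytes"]),
        "@timestamp": local.replace(tzinfo=None).isoformat(),
        "request": fields["request"],
        "host": fields["host"],
        "authuser": fields["authuser"],
        "@timestamp_utc": local.astimezone(timezone.utc).isoformat(),
        "timezone": local.strftime("%z"),
        "response": int(fields["response"]),
    }


line = '199.72.81.55 - - [01/Jul/1995:00:00:01 -0400] "GET /history/apollo/ HTTP/1.0" 200 6245'
print(json.dumps(parse_clf(line), indent=2))
```

Numeric fields such as `bytes` and `response` are converted to integers so that Elasticsearch can aggregate on them.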
Documents are the core entity
ID
Field: value
Field: value
Field: value
Field: value
Amazon ES stores documents in an index
Amazon Elasticsearch Service domain
ID
Field: value
Field: value
Field: value
Field: value
_bulk API
Index / Type
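Documents reach an index through the `_bulk` API as newline-delimited JSON: an action line followed by the document itself. The sketch below only builds the request body. The helper name `bulk_body` and the type name `log` are illustrative assumptions, and the `_type` field reflects the Index / Type model shown on the slide (later Elasticsearch versions removed types).

```python
import json


def bulk_body(index: str, doc_type: str, docs: list) -> str:
    """Build a newline-delimited _bulk request body that indexes each document."""
    lines = []
    for doc in docs:
        # Action line: names the index and type that receive the document.
        lines.append(json.dumps({"index": {"_index": index, "_type": doc_type}}))
        # Source line: the document itself, serialized on a single line.
        lines.append(json.dumps(doc))
    # The _bulk API expects the body to end with a newline.
    return "\n".join(lines) + "\n"


body = bulk_body("logs_01.21.2018", "log", [{"verb": "GET", "response": 200}])
print(body, end="")
```

The body would be sent as a POST to the domain's `/_bulk` endpoint with the `application/x-ndjson` content type.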
Indexes have primary shards
• Shards are instances of Apache Lucene
• Shards are the units of storage and compute in an Amazon ES domain
• The primary shard count is fixed at index creation
• Each shard’s set of documents is distinct
• Lucene creates an index for each field in each document
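One way to see why the primary shard count is fixed: Elasticsearch places each document by hashing its routing value (the document ID by default) and taking the result modulo the number of primary shards. The sketch below illustrates that idea only. It uses CRC32 as a stand-in hash, which is an assumption and not the hash function Elasticsearch actually uses, and `pick_shard` is a hypothetical helper.

```python
import zlib


def pick_shard(doc_id: str, primary_shards: int) -> int:
    """Map a document ID to a shard number in [0, primary_shards)."""
    # Stand-in hash for illustration; Elasticsearch uses its own hash function.
    return zlib.crc32(doc_id.encode("utf-8")) % primary_shards


ids = [f"doc-{i}" for i in range(10)]
with_three = [pick_shard(i, 3) for i in ids]
with_four = [pick_shard(i, 4) for i in ids]
moved = sum(a != b for a, b in zip(with_three, with_four))
print(f"{moved} of {len(ids)} documents would land on a different shard")
```

Because the shard number depends on the shard count, changing the count would send lookups for existing documents to the wrong shard, so the count is set once when the index is created.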
Indexes have replica shards
• Replica shards are distributed copies of their primaries
• The replica shard count is dynamic
• Replica shards provide data redundancy and parallelism
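The difference between the fixed primary count and the dynamic replica count shows up in the index settings. The sketch below builds the two request bodies involved. The helper names and the example counts are assumptions; `number_of_shards` and `number_of_replicas` are the standard Elasticsearch index settings.

```python
import json


def create_index_body(primary_shards: int, replicas: int) -> dict:
    """Settings sent when the index is created (PUT /<index>)."""
    return {
        "settings": {
            "index": {
                "number_of_shards": primary_shards,  # fixed once the index exists
                "number_of_replicas": replicas,      # can be changed later
            }
        }
    }


def update_replicas_body(replicas: int) -> dict:
    """Settings sent to an existing index (PUT /<index>/_settings)."""
    return {"index": {"number_of_replicas": replicas}}


print(json.dumps(create_index_body(3, 1)))
print(json.dumps(update_replicas_body(2)))
```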
Shards are distributed across instances
• Primary and replica shards are distributed to different instances
• Distribution is initially by shard count
• Available storage also limits new shard allocation
• Instances that hold shards are called data nodes
Shards (Lucene) store indexes on disk
[Diagram labels: Amazon Elasticsearch Service domain, document (ID, field: value pairs), _bulk API, index/type, shards, instances, storage]
Service architecture
[Diagram labels: AWS SDK, AWS CLI, AWS CloudFormation; Amazon Elasticsearch Service domain with Elasticsearch data nodes, Elasticsearch master nodes, Elastic Load Balancing, IAM, Amazon CloudWatch, AWS CloudTrail]
How you shard is how you scale
How many shards?
• Best practice: shards should be < 50 GB
• Divide index size by ~40 GB to get the initial shard count
• Active shards per instance ~= vCPUs
• Always use at least 1 replica for production!
Example: a 2 TB corpus will need 50 primary shards
2,000 GB / 40 GB per shard = 50 shards
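The shard-count rule of thumb above can be written as a small calculation. This is a sketch of the slide's arithmetic, not an official sizing tool; the function name and the rounding-up behavior for sizes that do not divide evenly are assumptions.

```python
import math


def initial_primary_shards(index_size_gb: float, target_shard_gb: float = 40.0) -> int:
    """Divide the index size by the target shard size, rounding up."""
    # Rounding up keeps every shard at or below the target size.
    return max(1, math.ceil(index_size_gb / target_shard_gb))


shards = initial_primary_shards(2000)
print(shards)          # 50
print(2000 / shards)   # 40.0 GB per shard, under the 50 GB guideline
```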
Which instance type?
Instance   Max Storage*   Workload
T2         3.5 TB         You want to do dev/QA
M3, M4     150 TB         Your data and queries are “average”
R3, R4     150 TB         You have higher request volumes, larger documents, or are using aggregations heavily
C4         150 TB         You need to support high concurrency
I2, I3     1.5 PB         You have XL storage needs
How many instances?
• The index size will be about the same as the corpus of source documents
• Double this if you are deploying an index replica
• Size based on storage requirements
• Either local storage or up to 1.5 TB of Amazon Elastic Block Store (EBS) per instance
Example: a 2 TB corpus will need 4 instances
Assuming one replica and using EBS, the index data totals about 4 TB
Given 1.5 TB of storage per instance, 4 instances give 6 TB of storage
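The storage arithmetic behind the example can be sketched as follows. The function names are illustrative assumptions. Note that dividing storage alone gives a floor of 3 instances for this corpus; the slide's choice of 4 instances leaves 2 TB of room beyond the 4 TB of index data.

```python
import math


def storage_needed_gb(corpus_gb: float, replicas: int = 1) -> float:
    """Index data is roughly the corpus size, multiplied by the number of copies."""
    return corpus_gb * (1 + replicas)


def min_instances(corpus_gb: float, replicas: int = 1, per_instance_gb: float = 1500.0) -> int:
    """Smallest instance count whose combined storage holds all index data."""
    return math.ceil(storage_needed_gb(corpus_gb, replicas) / per_instance_gb)


def headroom_gb(instances: int, corpus_gb: float, replicas: int = 1,
                per_instance_gb: float = 1500.0) -> float:
    """Storage left over after the index data is placed."""
    return instances * per_instance_gb - storage_needed_gb(corpus_gb, replicas)


print(storage_needed_gb(2000))   # 4000 GB of index data with one replica
print(min_instances(2000))       # 3, the bare minimum by storage alone
print(headroom_gb(4, 2000))      # 2000.0 GB spare with the slide's 4 instances
```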
Load data into Amazon ES
The components of ingestion
Data source → Collect → Transform → Buffer → Deliver
AWS-Centric solutions
• Data producers
• Buffer/Transform/Deliver: Amazon Kinesis Firehose, Amazon CloudWatch Logs, Logstash, AWS IoT
• Amazon ES cluster: Elasticsearch data nodes, Elasticsearch master nodes
• Analytics UI: Kibana
AWS Lambda architectures
• Files → Amazon S3 → S3 events → AWS Lambda function → Amazon Elasticsearch Service
• Amazon DynamoDB table → DynamoDB streams → AWS Lambda function → Amazon Elasticsearch Service
• Data producers → Amazon Kinesis → AWS Lambda function → Amazon Elasticsearch Service
Kinesis Firehose delivery architecture
• For public access domains
• Easily transform data
• Serverless with built-in batching, index rollover, error handling
[Diagram labels: data source, source records, Firehose delivery stream, data transformation function, transformed records, Amazon Elasticsearch Service, S3 bucket, delivery failure, transformation failure]
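The data transformation function in this architecture is an AWS Lambda function that receives base64-encoded records and returns each one marked `Ok`, `Dropped`, or `ProcessingFailed`. The sketch below follows that record contract. The enrichment it performs (adding an `ingested_by` field) is a hypothetical example and does not come from the deck.

```python
import base64
import json


def lambda_handler(event, context):
    """Decode each Firehose record, enrich it, and hand it back re-encoded."""
    output = []
    for record in event["records"]:
        try:
            payload = json.loads(base64.b64decode(record["data"]))
            # Hypothetical enrichment, for illustration only.
            payload["ingested_by"] = "firehose-transform"
            data = base64.b64encode(json.dumps(payload).encode("utf-8")).decode("ascii")
            output.append({"recordId": record["recordId"], "result": "Ok", "data": data})
        except (ValueError, TypeError):
            # Undecodable or non-object payloads are reported as failures,
            # returned unchanged so they can be inspected later.
            output.append({
                "recordId": record["recordId"],
                "result": "ProcessingFailed",
                "data": record["data"],
            })
    return {"records": output}
```

Every incoming `recordId` appears exactly once in the response, which is how the delivery stream matches results back to source records.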
Secure your data
• Public – IAM
• Private – endpoint
• Encryption
Use IAM for public endpoints
{
  "Version": "2012-10-17",
  "Statement": [ {
    "Effect": ...,
    "Principal": ...,
    "Action": [...],
    "Resource": ...,
    "Condition": ...
  } ]
}
• Effect: Allow or Deny
• Principal: AWS account ID
• Action: HTTP verbs, service actions
• Resource: Amazon ES domain/index
• Condition: IP address
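A filled-in version of the policy skeleton might look like the sketch below, which builds the document in Python. The account ID, domain ARN, CIDR range, and the helper name `es_access_policy` are all hypothetical placeholders; the slide leaves these values elided. `es:ESHttpGet` and `es:ESHttpPost` are the service's per-HTTP-verb actions.

```python
import json


def es_access_policy(account_id: str, domain_arn: str, allowed_cidrs: list) -> dict:
    """Build an access policy allowing reads and writes from given IP ranges."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": f"arn:aws:iam::{account_id}:root"},
                "Action": ["es:ESHttpGet", "es:ESHttpPost"],
                # "/*" extends the grant to the indexes inside the domain.
                "Resource": f"{domain_arn}/*",
                "Condition": {"IpAddress": {"aws:SourceIp": allowed_cidrs}},
            }
        ],
    }


policy = es_access_policy(
    "123456789012",                                         # placeholder account
    "arn:aws:es:us-west-2:123456789012:domain/logs-demo",   # placeholder domain
    ["192.0.2.0/24"],                                       # documentation-only range
)
print(json.dumps(policy, indent=2))
```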
Use security groups for private endpoints
• Private networking between your VPC and Amazon Elasticsearch Service
• Traffic does not traverse the public internet
• Use security groups and IAM for access control and authentication
[Diagram labels: Amazon Elasticsearch Service data and master nodes in Availability Zone A and Availability Zone B, each in a VPC subnet with a security group, with IAM]
Encrypt your data
• Use your own KMS keys
• Encrypted data at rest on Amazon ES instances
• Encrypted automatic snapshots
[Diagram labels: Elasticsearch data nodes, Elasticsearch master nodes, AWS KMS]
Make your domain more stable
Use dedicated masters
Amazon ES cluster
• Dedicated master nodes: cluster state
• Data nodes: queries and updates
Master node recommendations
Number of data nodes   Master node instance type
< 10                   m3.medium+
< 20                   m4.large+
<= 50                  c4.xlarge+
50-100                 c4.2xlarge+
• In production, always use an odd number of masters, >= 3
• Use fewer and smaller instances for masters than for data nodes
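The odd-number recommendation follows from majority voting: a master can only be elected when more than half of the master-eligible nodes agree. A short sketch of that arithmetic, with hypothetical helper names:

```python
def quorum(master_nodes: int) -> int:
    """Smallest number of masters that forms a strict majority."""
    return master_nodes // 2 + 1


def tolerated_failures(master_nodes: int) -> int:
    """How many masters can be lost while a majority still remains."""
    return master_nodes - quorum(master_nodes)


for n in (1, 2, 3, 4, 5):
    print(n, quorum(n), tolerated_failures(n))
```

Three masters tolerate one failure, and a fourth adds cost without tolerating a second failure, which is why odd counts of at least three are the usual choice.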
Use zone awareness
Amazon ES cluster
Availability Zone 1 Availability Zone 2
CloudWatch Alarms
Name                                  Statistic   Threshold                   Periods
ClusterStatus.red                     Maximum     >= 1                        1
ClusterIndexWritesBlocked             Maximum     >= 1                        1
CPUUtilization/MasterCPUUtilization   Average     >= 80%                      3
JVMMemoryPressure/Master...           Maximum     >= 80%                      3
FreeStorageSpace                      Minimum     <= (25% of avail space)     1
AutomatedSnapshotFailure              Maximum     >= 1                        1
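The alarm table can be turned into parameter sets for the CloudWatch `PutMetricAlarm` operation. The sketch below only builds the parameters and makes no AWS calls. Several things are assumptions and not taken from the slide: the 60-second period, the alarm naming scheme, the domain name and account ID, and the premise that `FreeStorageSpace` is reported in megabytes. Only metric names spelled out in full on the slide are included.

```python
COMPARISONS = {
    ">=": "GreaterThanOrEqualToThreshold",
    "<=": "LessThanOrEqualToThreshold",
}

# (metric name, statistic, comparison, threshold, evaluation periods)
ALARMS = [
    ("ClusterStatus.red", "Maximum", ">=", 1, 1),
    ("ClusterIndexWritesBlocked", "Maximum", ">=", 1, 1),
    ("CPUUtilization", "Average", ">=", 80, 3),
    ("MasterCPUUtilization", "Average", ">=", 80, 3),
    ("JVMMemoryPressure", "Maximum", ">=", 80, 3),
    ("AutomatedSnapshotFailure", "Maximum", ">=", 1, 1),
]


def alarm_params(domain, account_id, metric, statistic, comparison,
                 threshold, periods, period_seconds=60):
    """Build keyword arguments for one PutMetricAlarm request."""
    return {
        "AlarmName": f"{domain}-{metric}",  # naming scheme is an assumption
        "Namespace": "AWS/ES",
        "MetricName": metric,
        "Dimensions": [
            {"Name": "DomainName", "Value": domain},
            {"Name": "ClientId", "Value": account_id},
        ],
        "Statistic": statistic,
        "Period": period_seconds,           # assumed; the slide gives no period length
        "EvaluationPeriods": periods,
        "Threshold": float(threshold),
        "ComparisonOperator": COMPARISONS[comparison],
    }


def domain_alarms(domain, account_id, total_storage_mb):
    """All alarms from the table, with free storage set to 25% of the total."""
    alarms = [alarm_params(domain, account_id, *row) for row in ALARMS]
    alarms.append(alarm_params(domain, account_id, "FreeStorageSpace",
                               "Minimum", "<=", 0.25 * total_storage_mb, 1))
    return alarms


for params in domain_alarms("logs-demo", "123456789012", 1_500_000):
    print(params["AlarmName"], params["ComparisonOperator"], params["Threshold"])
```

Each dictionary could be passed to a CloudWatch client's `put_metric_alarm` call, together with alarm actions such as a notification topic.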
Elasticsearch analysis is delivered by Aggregations
Think buckets and metrics
Buckets – a collection of documents meeting some criterion
Metrics – calculations on the content of buckets
Bucket: time
Metric: count
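The bucket-and-metric idea maps directly onto an aggregation request. In the sketch below, a `date_histogram` creates the time buckets, each bucket's `doc_count` is the count metric, and a `sum` sub-aggregation adds a second metric. The aggregation names, the one-hour interval, and the use of the `bytes` field are assumptions. The `interval` parameter matches Elasticsearch versions from the time of this deck; later versions replace it with `fixed_interval` or `calendar_interval`.

```python
def traffic_over_time_query(interval: str = "1h") -> dict:
    """Search body that buckets documents by time and sums bytes per bucket."""
    return {
        "size": 0,  # return aggregation results only, no individual hits
        "aggs": {
            "per_interval": {
                "date_histogram": {"field": "@timestamp", "interval": interval},
                "aggs": {"total_bytes": {"sum": {"field": "bytes"}}},
            }
        },
    }


def summarize(response: dict) -> list:
    """Flatten a search response into (bucket start, count, total bytes) rows."""
    buckets = response["aggregations"]["per_interval"]["buckets"]
    return [
        (b["key_as_string"], b["doc_count"], b["total_bytes"]["value"])
        for b in buckets
    ]
```

The query body would be sent to an index's `_search` endpoint, and `summarize` reads the buckets from the response.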
Dig in to your data in real time
Run Elasticsearch in the AWS Cloud with Amazon Elasticsearch Service
Deploy, scale, ingest, secure, monitor, and analyze
Start sending your log data today!
Pop-up Loft
Hands-on Lab: Log Analytics with AWS
https://github.com/wrbaldwin/da-week/
Pop-up Loft
aws.amazon.com/activate
Everything and Anything You Need to
Get Started with Analytics on AWS


Log Analytics with AWS: Data Analytics Week at the SF Loft

  • 1. Pop-up Loft Log Analytics with AWS Rajeev Srinivasan Strategic Solutions Architect srrajeev@amazon.com Pop-up Loft © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved Austin Clark Technical Account Manager auclark@amazon.com
  • 2. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Your pager goes off at 3:00am. The servers are melting!
  • 3. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. You VPN in and quick check the logs Uh oh
  • 4. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. If you had Amazon Elasticsearch Service
  • 5. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Transition from IT to DevOps Increase in IoT and Mobile Devices Cloud-based architectures Machine-generated data is growing 10x faster than business data Source: insideBigData - The Exponential Growth of Data, February 16, 2018 THE EXPLOSION OF MACHINE-GENERATED DATA
  • 6. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Elasticsearch Service is a fully managed service that makes it easy to deploy, manage, and scale Elasticsearch and Kibana A M A Z O N E L A S T I C S E A R C H S E R V I C E +
  • 7. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. B E N E F I T S O F A MA ZO N E LA ST I C SE A R C H SE R VI C E Supports Open-Source APIs and Tools Drop-in replacement with no need to learn new APIs or skills Easy to Use Deploy a production-ready Elasticsearch cluster in minutes Scalable Resize your cluster with a few clicks or a single API call Secure Deploy into your VPC and restrict access using security groups and IAM policies Highly Available Replicate across Availability Zones, with monitoring and automated self-healing Tightly Integrated with Other AWS Services Seamless data ingestion, security, auditing and orchestration
  • 8. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. ELASTICSEARCH LEADING USE CASES Application Monitoring & Root-cause Analysis Security Information and Event Management (SIEM) IoT & Mobile Business & Clickstream Analytics Provides developers with a high performance, self-service operational monitoring and analytics platform Enables security practitioners to centralize and analyze events from across the entire organization Gives developers and lines of business users real-time location-aware insights into their device fleets Provides business users with a real-time view of the performance of their web content and e-commerce platforms
  • 9. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Application Monitoring & Root-cause Analysis C A S E S T U D Y : E XPE D I A Logs, lots and lots of logs. How to cost effectively monitor logs? Require centralized logging infrastructure Did not have the man power to manage infrastructure P R O B L E M Quick insights: Able to identify and troubleshoot issues in real-time Secure: Integrated w/ AWS IAM Scalable: Cluster sizes are able to grow to accommodate additional log sources B E N E F I T S Streaming AWS CloudTrail logs, application logs, and Docker startup logs to Elasticsearch Created centralized logging service for all team members Using Kibana for visualizations and for Elasticsearch queries S O L U T I O N © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
  • 10. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Business and Clickstream Analytics C A S E S T U D Y : FI NA NC I A L T I ME S What stories do our readers care about? What’s hot? Required a custom clickstream analytics solution Need a solution that delivers analytics in real-time Did not have a team to manage analytics infrastructure P R O B L E M B E N E F I T S Streaming user data to Amazon ES for analysis. Created their own custom dashboards for editors and journalists – Lantern. Lantern - ”shines a light” on reader activity for the editors and journalists at the FT Critical tool for making editorial decisions. Daily editorial meetings start by looking at Lantern dashboard S O L U T I O N © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Reliability : Lantern is used throughout the day by journalists and editors. Relying on Amazon to manage their systems for maximum uptime. Cost savings: Able to easily tune their cluster to meet their needs with minimal management overhead
  • 11. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Software & Internet Financial ServicesEducation Technology BioTech and Pharma Media and Entertainment Social Media Telecommunications Travel & Transportation Real Estate Logistics & Operations Publishing Other A MA ZO N E LA ST I C SE A R C H SE R VI C E C UST O ME R S
  • 12. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Elasticsearch is a distributed solution You run clusters of instances
  • 13. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Data pattern Amazon ES cluster logs_01.21.2018 logs_01.22.2018 logs_01.23.2018 logs_01.24.2018 logs_01.25.2018 logs_01.26.2018 logs_01.27.2018 Shard 1 Shard 2 Shard 3 host request verb status timestamp etc. Each index has multiple shards Each shard contains a set of documents Each document contains a set of fields and values One index per day
  • 14. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Elasticsearch documents: structured JSON { "verb": "GET", "ident": "-", "bytes": 6245, "@timestamp": "1995-07-01T00:00:01", "request": "GET /history/apollo/ HTTP/1.0", "host": "199.72.81.55", "authuser": "-", "@timestamp_utc": "1995-07-01T04:00:01+00:00", "timezone": "-0400", "response": 200 } • Documents contain fields – name/value pairs • Fields can nest • Value types include text, numerics, dates, and geo objects • Field values can be single or array • When you send documents to Elasticsearch they should arrive as JSON 199.72.81.55 - - [01/Jul/1995:00:00:01 -0400] "GET /history/apollo/ HTTP/1.0" 200 6245
  • 15. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Documents are the core entity ID Field: value Field: value Field: value Field: value
  • 16. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon ES stores documents in an index Amazon Elasticsearch Service domain ID Field: value Field: value Field: value Field: value _bulk API Index / Type
  • 17. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Indexes have primary shards Amazon Elasticsearch Service domain ID Field: value Field: value Field: value Field: value _bulk API Index / Type Shards • Shards are instances of Apache Lucene • Shards are the units of storage and compute in an Amazon ES domain • The primary shard count is fixed at index creation • Each shard’s set of documents is distinct • Lucene creates an index for each field in each document
  • 18. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Indexes have replica shards Amazon Elasticsearch Service domain ID Field: value Field: value Field: value Field: value _bulk API Index / Type Shards • Replica shards are distributed copies of their primaries • The replica shard count is dynamic • Replica shards provide data redundancy and parallelism
  • 19. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Shards are distributed across instances Amazon Elasticsearch Service domain ID Field: value Field: value Field: value Field: value _bulk API Index / Type Shards Instances • Primary and replica are distributed to different instances • Distribution is initially by shard count • Storage available limits new shard allocation as well • Instances with shards are called data nodes
  • 20. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Shards (Lucene) store indexes on disk Amazon Elasticsearch Service domain ID Field: value Field: value Field: value Field: value _bulk API Index / Type Shards Instances Storage
  • 21. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Service architecture AWS SDK AWS CLI AWS CloudFormation Elasticsearch data nodes Elasticsearch master nodes Elastic Load Balancing IAM Amazon CloudWatch AWS CloudTrail Amazon Elasticsearch Service domain
  • 22. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. How you shard is how you scale
  • 23. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. How many shards? • Best practice: shards should be < 50GB • Divide index size by ~40GB to get initial shard count • Active shards per instance ~= vCPUs • Always use at least 1 replica for production! Example: 2 TB corpus will need 50 primary shards 2,000 GB / 40GB per shard = 50 shards
  • 24. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Which instance type? Instance Max Storage* Workload T2 3.5TB You want to do dev/QA M3, M4 150TB Your data and queries are “average” R3, R4 150TB You have higher request volumes, larger documents, or are using aggregations heavily C4 150TB You need to support high concurrency I2, I3 1.5 PB You have XL storage needs
  • 25. How many instances?
      • The index size will be about the same as the corpus of source documents
      • Double this if you are deploying an index replica
      • Size based on storage requirements
      • Either local storage or up to 1.5 TB of Amazon Elastic Block Store (EBS) per instance
      Example: a 2 TB corpus will need 4 instances
      Assuming a replica and using EBS
      Given 1.5 TB of storage per instance, this gives 6 TB of storage
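The sizing steps on this slide can be sketched as a small calculation. The function name and the `overhead` parameter are assumptions added for illustration. The slide itself gives only the doubling for a replica and the 1.5 TB per-instance limit, and the 1.45 overhead factor used below is an assumed headroom value, not a figure from the deck.

```python
import math


def data_instance_count(corpus_gb, replicas=1, storage_per_instance_gb=1500, overhead=1.0):
    """Instances needed to hold the index plus replicas, with optional headroom."""
    required_gb = corpus_gb * (1 + replicas) * overhead
    return math.ceil(required_gb / storage_per_instance_gb)


# Bare minimum for 2 TB plus one replica (4 TB total) at 1.5 TB per instance.
print(data_instance_count(2000))                 # 3
# With an assumed 45% headroom for indexing overhead and free space.
print(data_instance_count(2000, overhead=1.45))  # 4
```

Without headroom the arithmetic gives 3 instances (4.5 TB). The slide's 4 instances (6 TB) leave spare capacity, which is what the assumed overhead factor models here.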
  • 26. Load data into Amazon ES
  • 27. The components of ingestion
      [Diagram labels: Data source; Collect; Transform; Buffer; Deliver]
  • 28. AWS-centric solutions
      • Data producers
      • Buffer / transform / deliver: Amazon Kinesis Firehose, Amazon CloudWatch Logs, Logstash, AWS IoT
      • Amazon ES cluster: Elasticsearch data nodes, Elasticsearch master nodes
      • Analytics UI: Kibana
  • 29. AWS Lambda architectures
      [Diagram labels: Files; Amazon S3; S3 events; AWS Lambda function; Amazon DynamoDB table; DynamoDB streams; AWS Lambda function; Data producers; Amazon Kinesis; AWS Lambda function; Amazon Elasticsearch Service]
  • 30. Kinesis Firehose delivery architecture
      • For public access domains
      • Easily transform data
      • Serverless with built-in batching, index rollover, error handling
      [Diagram labels: data source; source records; Firehose delivery stream; data transformation function; transformed records; Amazon Elasticsearch Service; S3 bucket (source records, transformation failure, delivery failure)]
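The "data transformation function" in this diagram is typically an AWS Lambda function. The following is a minimal sketch of such a handler, assuming each incoming record carries JSON. The added `source` field is a hypothetical enrichment, not something the deck specifies. Firehose passes records with a `recordId` and base64-encoded `data`, and expects each record back with a `result` of `Ok`, `Dropped`, or `ProcessingFailed`.

```python
import base64
import json


def handler(event, context):
    """Firehose transformation sketch: tag each JSON record, flag unparseable ones."""
    output = []
    for record in event["records"]:
        try:
            payload = json.loads(base64.b64decode(record["data"]))
            payload["source"] = "firehose"  # hypothetical enrichment for illustration
            data = base64.b64encode(json.dumps(payload).encode("utf-8")).decode("utf-8")
            output.append({"recordId": record["recordId"], "result": "Ok", "data": data})
        except ValueError:
            # Bad base64 or bad JSON: hand the record back unchanged and marked as failed.
            output.append(
                {"recordId": record["recordId"], "result": "ProcessingFailed", "data": record["data"]}
            )
    return {"records": output}
```

Records marked `ProcessingFailed` correspond to the "transformation failure" path in the diagram.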
  • 31. Secure your data
      • Public: IAM
      • Private: endpoint
      • Encryption
  • 32. Use IAM for public endpoints
      { "Version": "2012-10-17",
        "Statement": [ {
          "Effect": ...,
          "Principal": ...,
          "Action": [...],
          "Resource": ...,
          "Condition": ...
        } ] }
      • Effect: Allow or Deny
      • Principal: AWS account ID
      • Action
          • HTTP verbs
          • Service actions
      • Resource: Amazon ES domain/index
      • Condition: IP address
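To make the policy skeleton on this slide concrete, here is a minimal sketch that fills in each element with hypothetical values. The account ID, region, domain name, and CIDR range are placeholders for illustration and do not come from the deck. `es:ESHttpGet` and `es:ESHttpPost` are two of the HTTP-verb actions the slide refers to.

```python
import json


def domain_access_policy(account_id, region, domain, allowed_cidr):
    """Build a resource-based access policy for an Amazon ES domain (illustrative values only)."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": f"arn:aws:iam::{account_id}:root"},
                "Action": ["es:ESHttpGet", "es:ESHttpPost"],
                "Resource": f"arn:aws:es:{region}:{account_id}:domain/{domain}/*",
                "Condition": {"IpAddress": {"aws:SourceIp": [allowed_cidr]}},
            }
        ],
    }


# Hypothetical account, region, domain, and documentation-range CIDR.
policy = domain_access_policy("123456789012", "us-east-1", "logs", "203.0.113.0/24")
print(json.dumps(policy, indent=2))
```

Because the principal and the condition sit in the same statement, a request must satisfy both to be allowed.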
  • 33. Use security groups for private endpoints
      • Private networking between your VPC and Amazon Elasticsearch Service
      • Traffic does not traverse the public internet
      • Use IAM and security groups for authentication and access control
      [Diagram labels: Amazon Elasticsearch Service; Availability Zone A and Availability Zone B, each with a VPC subnet, a security group, data and master nodes, and IAM]
  • 34. Encrypt your data
      • Use your own KMS keys
      • Encrypted data at rest on Amazon ES instances
      • Encrypted automatic snapshots
      [Diagram labels: Elasticsearch data nodes; Elasticsearch master nodes; AWS KMS]
  • 35. Make your domain more stable
  • 36. Use dedicated masters
      • Dedicated master nodes: cluster state
      • Data nodes: queries and updates
      [Diagram label: Amazon ES cluster]
  • 37. Master node recommendations
      Number of data nodes   Master node instance type
      < 10                   m3.medium+
      < 20                   m4.large+
      <= 50                  c4.xlarge+
      50-100                 c4.2xlarge+
      • In production, always use an odd number of masters, >= 3
      • Use fewer and smaller instances for masters than for data nodes
  • 38. Use zone awareness
      [Diagram labels: Amazon ES cluster; Availability Zone 1; Availability Zone 2]
  • 39. CloudWatch Alarms
      Name                                  Metric    Threshold                   Periods
      ClusterStatus.red                     Maximum   >= 1                        1
      ClusterIndexWritesBlocked             Maximum   >= 1                        1
      CPUUtilization/MasterCPUUtilization   Average   >= 80%                      3
      JVMMemoryPressure/Master...           Maximum   >= 80%                      3
      FreeStorageSpace                      Minimum   <= (25% of avail space)     1
      AutomatedSnapshotFailure              Maximum   >= 1                        1
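The alarm table can be turned into alarm definitions programmatically. The sketch below builds plain dictionaries whose keys follow the parameter names of the CloudWatch `PutMetricAlarm` API, so each one could be passed to a client call such as boto3's `put_metric_alarm(**definition)`. The function name, the 60-second period, the alarm naming scheme, and the example domain values are assumptions. The master-node variants from the table are left out for brevity, and the storage total must be given in the same unit the metric reports.

```python
def alarm_definitions(domain_name, account_id, total_storage, period_seconds=60):
    """Build CloudWatch alarm parameters for the data-node rows of the slide's table."""
    dimensions = [
        {"Name": "DomainName", "Value": domain_name},
        {"Name": "ClientId", "Value": account_id},
    ]
    ge = "GreaterThanOrEqualToThreshold"
    le = "LessThanOrEqualToThreshold"
    # (metric name, statistic, comparison, threshold, evaluation periods)
    rows = [
        ("ClusterStatus.red", "Maximum", ge, 1, 1),
        ("ClusterIndexWritesBlocked", "Maximum", ge, 1, 1),
        ("CPUUtilization", "Average", ge, 80, 3),
        ("JVMMemoryPressure", "Maximum", ge, 80, 3),
        ("FreeStorageSpace", "Minimum", le, 0.25 * total_storage, 1),
        ("AutomatedSnapshotFailure", "Maximum", ge, 1, 1),
    ]
    return [
        {
            "AlarmName": f"{domain_name}-{metric}",  # hypothetical naming scheme
            "Namespace": "AWS/ES",
            "MetricName": metric,
            "Dimensions": dimensions,
            "Statistic": statistic,
            "ComparisonOperator": comparison,
            "Threshold": threshold,
            "EvaluationPeriods": periods,
            "Period": period_seconds,  # assumed; the slide does not give a period length
        }
        for metric, statistic, comparison, threshold, periods in rows
    ]


# Hypothetical domain with 1,500,000 units of total storage.
alarms = alarm_definitions("logs", "123456789012", 1_500_000)
for alarm in alarms:
    print(alarm["AlarmName"], alarm["ComparisonOperator"], alarm["Threshold"])
```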
  • 40. Elasticsearch analysis is delivered by Aggregations
  • 41. Think buckets and metrics
      • Buckets: a collection of documents meeting some criterion
      • Metrics: calculations on the content of buckets
      Example: Bucket: time, Metric: count
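The bucket/metric idea can be illustrated outside Elasticsearch. The sketch below uses the slide's own pairing (bucket: time, metric: count) by grouping timestamps into hourly buckets and counting the documents in each. It is a plain-Python analogy to what a `date_histogram` aggregation computes server-side, not Elasticsearch code, and the function name, timestamp format, and sample values are assumptions.

```python
from collections import Counter
from datetime import datetime


def count_per_hour(timestamps):
    """Bucket ISO-style timestamps by hour (the bucket) and count entries (the metric)."""
    buckets = Counter()
    for ts in timestamps:
        parsed = datetime.strptime(ts, "%Y-%m-%dT%H:%M:%S")
        # Truncate to the start of the hour to get the bucket key.
        hour = parsed.replace(minute=0, second=0)
        buckets[hour.isoformat()] += 1
    return dict(sorted(buckets.items()))


# Hypothetical log timestamps.
counts = count_per_hour(["2018-01-01T03:00:05", "2018-01-01T03:59:59", "2018-01-01T04:10:00"])
print(counts)
```

Swapping the count for an average or maximum of some field changes the metric while the buckets stay the same.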
  • 42. Dig in to your data in real time
  • 43. Run Elasticsearch in the AWS Cloud with Amazon Elasticsearch Service
      Deploy, scale, ingest, secure, monitor, and analyze
      Start sending your log data today!
  • 44. Pop-up Loft Hands-on Lab: Log Analytics with AWS https://github.com/wrbaldwin/da-week/
  • 45. Pop-up Loft aws.amazon.com/activate Everything and Anything You Need to Get Started with Analytics on AWS