AWS Community Day | Midwest 2018
Track 2
Elastic.co's ELK Stack - Platform Agnostic Immutable Infrastructure & Analysis through Configuration - Dan Morgan, Chicago burbs
Platform Agnostic ELK Stack - Normalize, Centralize, Analyze and Visualize Data
1. Elastic.co's ELK Stack - Platform Agnostic Immutable Infrastructure & Analysis through Configuration
DAN MORGAN - SOLUTION ARCHITECT – PAYFORMANCE SOLUTIONS
2. About Payformance
• Payformance Solutions’ mission is to establish a smooth “handshake” between insurance companies and doctors that delivers efficiency, predictability and consistency in value-based care programs
  • Doctors make more money by keeping you healthy
• TrustHub assists insurance companies and doctors with the design, measurement, and collaboration of value-based reimbursement programs as an independent, transparent, and neutral third party
  • What are they spending their time, money and resources on?
3. About me
• 20 years of experience in the industry, mostly in e-commerce & security
• Background as a front-end developer (F.E.D.) and full-stack developer
• Solutions Architect / Engineering Manager
• 8th startup
• linkedin.com/in/morganize
• twitter.com/morgan_graphics
4. About this talk
• The ELK Stack (Elasticsearch, Logstash & Kibana) is a NoSQL search engine and datastore used to normalize, centralize, analyze and visualize ALL THE THINGS in your environment via configuration and very little programming. This talk focuses on a platform-agnostic version of Elasticsearch compared with the AWS Elasticsearch Service, covers the pros and cons of both, and walks through a practical approach to setting up a proof-of-concept (P.O.C.) cluster that visualizes the security log on a CentOS bastion host.
5. About this talk
This talk would have taken 50+ minutes to present
6. About this talk now
• Payformance uses the E.L.K. stack with Beats to normalize, centralize, analyze, and visualize ALL THE THINGS in our environment via configuration for:
  • HIPAA compliance auditing
  • Environment monitoring
  • Above and beyond CloudTrail
  • Instance monitoring
  • Security monitoring
  • Application monitoring
  • Process monitoring
7. No deep dives
• No sharding strategies
• No grok examples
• No schema design
• No DML walkthrough
8. Why this matters to Payformance
• We’re a startup
  • Small team of talented people, limited resources
  • Cash-strapped
  • Open source vs. paid when it makes sense
  • D.I.Y. most things with off-the-shelf technologies
• Most of our AWS work is done in batch
  • Using CloudFormation (CF) templates, we spin up and provision an AWS environment when we need it and shut it down when we’re done
• Metrics about all aspects of the environment, and about the people accessing it, are dictated by HIPAA compliance regulations
9. Why you should consider E.L.K.
• You may have many, or even just a few, disparate systems that you need to get information from (monitoring, logging, auditing, status, etc.)
  • Some of them may need augmented data, e.g. GeoIP data
• But I already have …
  • Datadog, New Relic, Nagios, Icinga, Cisco, Tableau, Splunk, Sumo Logic, a SQL database, or any number of other tools designed to monitor, analyze, and display any aspect of an app, system or network
  • A custom-built solution ☹
• No worries, that is great; keep using those tools
  • Except the custom-built solution (why do that to yourself?)
10. Why consider the E.L.K. Stack? (cont.)
• Works well with a 3-node setup
  • Scales as needed: just add another node
• The E.L.K. Stack is great when you need a single pane of glass
  • One interface to rule them all via Kibana
• Someone else is working on the code
• Logstash eliminates the need for separate processing pipelines to augment/transform/clean data
• Elasticsearch is a search engine
• 95% of setup is done with configuration alone
• You enjoy a challenge
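Why three nodes? One common explanation, sketched here under the assumption of the pre-7.0 Zen discovery rules, is quorum: Elasticsearch 6.x documentation recommended setting `discovery.zen.minimum_master_nodes` to (master-eligible nodes / 2) + 1 to avoid a split-brain cluster. The helper names below are hypothetical; the snippet only does the arithmetic.

```python
def master_quorum(master_eligible_nodes: int) -> int:
    """Quorum of master-eligible nodes: (N // 2) + 1, the pre-7.0 minimum_master_nodes rule."""
    if master_eligible_nodes < 1:
        raise ValueError("need at least one master-eligible node")
    return master_eligible_nodes // 2 + 1


def tolerable_failures(master_eligible_nodes: int) -> int:
    """How many master-eligible nodes can be lost while a quorum can still form."""
    return master_eligible_nodes - master_quorum(master_eligible_nodes)


# Print quorum size and fault tolerance for small clusters.
for n in (1, 2, 3, 4, 5):
    print(f"{n} node(s): quorum={master_quorum(n)}, survives {tolerable_failures(n)} failure(s)")
```

A 2-node cluster needs both nodes to stay up, so three is the smallest cluster that survives losing one master-eligible node. Elasticsearch 7.0 and later manage the voting configuration automatically and ignore this setting, but the arithmetic behind the "3-node" rule of thumb still holds.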
11. What the E.L.K. stack does well
• It’s free
• The E.L.K. stack is a robust pipeline that lets you get insights very quickly into whatever you’re working on without having to do everything from scratch (hours vs. weeks or months)
• Elasticsearch is a search engine first
  • Uses an inverted index for full-text searches
  • Significant advantages over a traditional ANSI SQL database, such as relevance scoring, range queries, fuzzy queries and geo point/shape queries
• Logstash = data normalization through configuration = no coding
  • n inputs to n outputs
• Kibana is worth its code in gold
  • Being able to build a customized dashboard in a few hours rather than weeks is fantastic
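To illustrate what an inverted index is, here is a toy Python sketch: it maps each term to the set of document IDs that contain it, so a full-text query becomes a few set lookups and intersections instead of a scan of every row. The tokenizer is a deliberately simplified stand-in for Elasticsearch's analyzers, the function names and sample log lines are made up for illustration, and real Lucene indexes also store positions and scoring statistics.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    # Simplified analyzer (assumption): lowercase, split on non-alphanumerics.
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(docs: dict[int, str]) -> dict[str, set[int]]:
    # Map each term to the set of document IDs that contain it.
    index: dict[str, set[int]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return dict(index)


def search_all_terms(index: dict[str, set[int]], query: str) -> set[int]:
    # AND semantics: documents containing every query term.
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


# Hypothetical sample log lines.
docs = {
    1: "Failed password for root from 10.0.0.5",
    2: "Accepted publickey for ec2-user from 10.0.0.7",
    3: "Failed password for invalid user admin",
}
index = build_inverted_index(docs)
print(sorted(search_all_terms(index, "failed password")))  # [1, 3]
```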
12. What the E.L.K. stack doesn’t do well
• It’s only free if you don’t value your time
• It’s a search engine moonlighting as a datastore
  • Many things that are defaults in normal datastores, e.g. security, are expensive paid add-ons (X-Pack)
• Steep learning curve
• Schema maintenance is painful
  • Reindexing is a painful and expensive process
• Better at reading than writing (a newly written document can take up to 1 second, the default refresh interval, to become searchable)
• ETL processes are not automated out of the box, resulting in a lot of custom workflows to manage things
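The write delay comes from near-real-time search: newly indexed documents become searchable only after a refresh, which Elasticsearch runs every `index.refresh_interval` (1s by default). The toy Python model below illustrates that behavior only; the class name is hypothetical and this is not how Elasticsearch is implemented internally.

```python
class NearRealTimeIndex:
    """Toy model (hypothetical): writes are buffered and become searchable only after refresh()."""

    def __init__(self) -> None:
        self._buffer: list[str] = []      # indexed, not yet visible to search
        self._searchable: list[str] = []  # visible to search

    def index(self, doc: str) -> None:
        # A write lands in the in-memory buffer.
        self._buffer.append(doc)

    def refresh(self) -> None:
        # Elasticsearch does this periodically (every refresh_interval).
        self._searchable.extend(self._buffer)
        self._buffer.clear()

    def search(self, term: str) -> list[str]:
        # Only refreshed documents are visible.
        return [d for d in self._searchable if term in d]


idx = NearRealTimeIndex()
idx.index("sshd: session opened")
print(idx.search("sshd"))  # []  (written but not yet refreshed)
idx.refresh()
print(idx.search("sshd"))  # ['sshd: session opened']
```

When a write must be searchable immediately, the index API accepts a `refresh` parameter (`true` or `wait_for`), at the cost of extra indexing overhead.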
14. Options (cont.)
• AWS Elasticsearch Service
  • Pros (this is an AWS conference after all):
    • Handles some of the PITA qualities of Elasticsearch
    • Kibana is automatically integrated with the service
    • AWS manages security through network segmentation (VPC), IAM, roles, etc.
    • Automates node management easily
    • Integrates with a ton of AWS services: S3, EBS, snapshots, etc.
  • Cons:
    • No automation for the PITA qualities of Elasticsearch
      • Sharding, index rollup, ETL, reindexing, etc. (this is not exclusive to AWS)
    • Logstash isn’t part of the stack out of the box
    • Other AWS services require manual intervention
15. Options (cont.)
• Elastic Cloud
  • Pros:
    • Multi-platform, multi-environment: public, private, hybrid, bare metal
      • AWS, GCE, Azure, etc.
    • X-Pack
      • Security, alerting
    • Managed by the people who built it
    • Centralized management of your complete environment
      • Single pane of glass
    • Swiftype integration: GUI for search tuning
  • Cons:
    • It’s no longer free, and it can be quite expensive depending on your use case
16. Options (cont.)
• D.I.Y.
  • Pros:
    • Build exactly what you need, when you need it
    • Customize the dashboards the way you want
    • Once you figure it out, configuration becomes immutable infrastructure, infrastructure as code, automation
  • Cons:
    • Figuring it out can take some time
    • Steep learning curve
    • As mentioned before, some of the PITA qualities of Elasticsearch require expensive add-ons or custom workflows
17. Use case:
• Monitoring an AWS bastion host for activity (HIPAA)
  • Access, security, files, commands, auditing, status
• Install
  • Filebeat
    • Monitor what files are being created, copied, deleted, manipulated
    • Read log files of X applications, processes, services
  • Metricbeat
    • Monitor things like CPU, memory, network throughput
  • Auditbeat
    • File integrity
  • Heartbeat
    • Instance status, e.g. ping
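Auditbeat's `file_integrity` module reports files that are created, updated, or deleted, along with content hashes. The polling sketch below only illustrates the underlying idea in Python (hash a baseline, re-hash, diff); Auditbeat itself is configured in YAML and reacts to OS file-change notifications rather than polling, and the function names and sample files here are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path


def snapshot(root: Path) -> dict[str, str]:
    """Map each file path under root (relative) to the SHA-256 of its contents."""
    hashes: dict[str, str] = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            hashes[str(path.relative_to(root))] = hashlib.sha256(path.read_bytes()).hexdigest()
    return hashes


def diff(baseline: dict[str, str], current: dict[str, str]) -> dict[str, list[str]]:
    """Classify changes between two snapshots."""
    return {
        "created": sorted(current.keys() - baseline.keys()),
        "deleted": sorted(baseline.keys() - current.keys()),
        "updated": sorted(p for p in baseline.keys() & current.keys() if baseline[p] != current[p]),
    }


# Demo on a throwaway directory standing in for monitored paths.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "sshd_config").write_text("PermitRootLogin no\n")
    (root / "motd").write_text("Authorized use only\n")
    before = snapshot(root)

    (root / "sshd_config").write_text("PermitRootLogin yes\n")  # tampered
    (root / "motd").unlink()                                    # removed
    (root / "new_key").write_text("example key\n")              # added
    changes = diff(before, snapshot(root))
    print(changes)
    # {'created': ['new_key'], 'deleted': ['motd'], 'updated': ['sshd_config']}
```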