Meshify is an IoT platform focused on wireless sensor technology for industrial/insurance IoT. This talk will provide an overview of how Meshify is using Scylla. It will also explain why and how, when everything else in Meshify’s platform is moving to a managed cloud service or a container-based microservice, the Scylla nodes are the only pet “seamonsters” in Meshify’s platform.
Scylla Summit 2018: Meshify - A Case Study, or Petshop Seamonsters
1. Meshify: A Case Study
or Petshop Seamonsters
Sam Kenkel
DevOps Lead, Meshify
2. Presenter bio: Sam Kenkel
DevOps lead at Meshify.
I’ve been building, maintaining, and troubleshooting high-performance
databases my entire career.
Made the transition from bare metal to VMs, from monoliths
to microservices, from on-prem to the cloud.
Now I run DevOps for Meshify, where I keep a highly
performant, scalable, and resilient platform working, as well
as the tooling that aids Meshify’s developers.
4. Meshify
Who are we?
▪ Founded in 2010
▪ Purchased by Hartford Steam Boiler in 2016.
▪ We provide the “platform” for HSB and its parent company
Munich Re to develop IoT products for insurance.
7. What we do: With Scylla
▪ We ingest data from sensors
▪ That data is processed,
augmented, and stored by golang
microservices
▪ The data is compared to user
defined alarms
▪ We send alerts (SMS, Email,
Webhook)
▪ We allow API calls for access to
the time-series data.
▪ Frontend allows for visualization
of historical data
▪ All time-series data is stored in
Scylla, including:
• Values of every sensor
• Every time a sensor enters or
exits an alarm
• Every time our platform sends
a notification
9. Microservices 101
As many parts of our application as possible are stateless:
▪ Containers can be relaunched
• Important when the container or the host dies (which they always do).
• Can be scaled out when more sensors are on our platform.
▪ The idea of “Pets” vs “Cattle” (you want servers to be cattle)
▪ But what about state?
10. State inside containers:
Have containers attach to a persistent disk (EBS, for example)
on launch.
▪ This means the container can preserve state.
• The application must still be failure tolerant (handling a failure
mid-write / mid-operation)
11. State in Managed services (i.e. Serverless):
Another common option is to store state in a managed cloud
database: let AWS / GCP / Azure handle patching, updates,
backups, etc.
▪ Sometimes called “NoOps”
13. Vendor Neutrality:
We only use cloud services that have drop-in replacements
(open-source or from other cloud vendors)
▪ AWS RDS MySQL:
• You can go back to a server on AWS, Azure or GCP
• Or you can migrate to CloudSQL (GCP) or Azure Database (Azure)
▪ DynamoDB: there are great options for migrating to DynamoDB.
Options for migrating away?
14. There is no cloud;
Only someone else’s server.
15. Data Locality, Database Performance
Using an AMI means I can answer with confidence: What region is
my data in? What performance tuning has been applied?
▪ Managed Services are still on a server somewhere. Location
might have legal implications (GDPR)
▪ Managed Services are making tradeoffs for performance tuning
vs cost.
▪ Scylla’s performance comes from more direct access
to/knowledge of the hardware.
17. AWS AMI importance
Scylla’s Pre-tuned AWS AMI allows for rapid, consistent nodes
▪ Time to Deploy a new node: <5 minutes.
▪ No variance from misconfiguration.
We get the consistency of containers, along with the
performance benefits of tuned EC2 instances:
NVMe Scylla performance with container-level operational overhead.
18. In Practice (Node Failure)
A node dies.
▪ An Alarm goes off.
▪ We launch a new node using the AMI (~5 minutes for node
launch, ~5 minutes to add the node to the cluster and start
streaming data to it).
This is manual so we can investigate why the node failed
(and whether our current DR plans are sufficient afterwards). It
could be automated.
19. In Practice (Scale Out/Scale in)
There’s always performance tuning.
▪ Add a larger/smaller node. (Human time: 5 minutes)
▪ Time to migrate data to the node: ~2 hours
▪ Remove old Node.
We keep this manual because we want to be in the loop when our
AWS spend changes (for now), but this could be automated.
20. In Practice (Disaster Recovery)
Worst case (we lose all of our nodes, or the AWS region):
▪ Launch three nodes using the AMI.
▪ Restore our schema, then use sstableloader to load prior
history.
▪ Time for restore of platform functionality: ~10 minutes from
discovery of issue.
21. Thank You
Any questions?
Interested in Joining us?
Careers.Meshify.com
Please stay in touch
Sam@meshify.com
@skenkel_atx