SlideShare a Scribd company logo
1 of 51
© 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Randall Hunt, Technical Evangelist AWS
Ashley Sole, Software Engineer, Skyscanner
Deep Dive: AWS X-Ray
London 2017
Who am I and why am I here?
• Software Engineer at AWS, previously at SpaceX and
NASA (and MongoDB where I learned to code)
• Follow me on twitter @jrhunt for demo and
serverless hot takes
• Email me: randhunt@amazon.com
• Thanks to all the people at AWS
who helped build these slides!
Agenda
• Overview
• Concepts
• API
• Use cases
• Getting started
• Q & A
Challenges
Deploying and managing service-oriented applications is more work
compared to monolithic applications
Services such as AWS Lambda, Amazon EC2 Container Service, AWS
Elastic Beanstalk, AWS CloudFormation, etc. make it easier to deploy
and manage applications consisting of hundreds of services
Still hard to debug application issues in production applications due to:
• Cross-service interactions
• Varying log formats across services
• Collecting, aggregating, and collating logs from services
How does X-Ray help?
Solution
AWS X-Ray makes it easy to:
• Identify performance bottlenecks and errors
• Pinpoint issues to specific service(s) in your application
• Identify impact of issues on users of the application
• Visualize the service call graph of your application
X-Ray service
X-Ray SDK
Available for Java, .NET, and Node.js
Adds filters to automatically captures metadata for calls to:
• AWS services using the AWS SDK
• Non-AWS services over HTTP and HTTPS
• Databases (MySQL, PostgreSQL, and Amazon DynamoDB)
• Queues (Amazon SQS)
Enables you to get started quickly without having to manually
instrument your application code to log metadata about requests
X-Ray daemon
Receives data from the SDK over UDP and acts as a local buffer. Data
is flushed to the backend every second or when the local buffer fills.
Available for Amazon Linux AMI, RHEL, Ubuntu, OS X, and Windows
Concepts
X-Ray Concepts
Trace End-to-end data related a single request across services
Segments Portions of the trace that correspond to a single service
Sub-segments Remote call or local compute sections within a service
Annotations Business data that can be used to filter traces
Metadata Business data that can be added to the trace but not used
for filtering traces
Errors Normalized error message and stack trace
Sampling configuration
{
"rules": {
"move": {
"id": 1,
"service_name": "*",
"http_method": "*",
"url_path": "/api/move/*",
"fixed_target": 0,
"rate": 0.05
},
"base": {
"id": 2,
"service_name": "*",
"http_method": "*",
"url_path": "*",
"fixed_target": 1,
"rate": 0.1
}
}
}
This example defines two rules.
The first rule applies a five-percent sampling rate
with no minimum number of requests to trace to
requests with paths under /api/move
The second overrides the default sampling rule
with a rule that traces the first request each
second and 10 percent of additional requests.
APIs
X-Ray API
X-Ray provides a set of APIs to enable you to send, filter, and retrieve
trace data
You can send trace data directly to the service without having to use
our SDKs (i.e. you can write your own SDKs for languages not
currently supported)
Raw trace data is available using batch get APIs
You can build your own data analysis applications on top of the data
collected by X-Ray
X-Ray API
PutTraceSegments Uploads segment documents to AWS X-Ray
BatchGetTraces Retrieves a list of traces specified by ID
GetServiceGraph Retrieves a document that describes services in your
application and their connections
GetTraceSummaries Retrieves IDs and metadata for traces available for a
specified time frame using an optional filter
Segment document
Minimal example
{
"name" : "example.com",
"id" : "70de5b6f19ff9a0a",
"start_time" : 1.478293361271E9,
"trace_id" : "1-581cf771-a006649127e371903a2de979",
"end_time" : 1.478293361449E9
}
Example showing an in-progress segment
{
"name" : "example.com",
"id" : "70de5b6f19ff9a0b",
"start_time" : 1.478293361271E9,
"trace_id" : "1-581cf771-a006649127e371903a2de979",
“in_progress”: true
}
Traced Request
X-Amzn-Trace-Id: Root=1-5759e988-bd862e3fe1be46a994272793;Sampled=1
Version number
Hex of epoch timestamp
Optional: Parent segment ID
96 bit globally unique id for
trace (24 hex digits)
Ashley Sole
Software Engineer - Skyscanner
Software Engineer and technical leader with over
10 years experience.
Worked in various industries in business from
Defence to Gaming.
I believe in a blameless culture and that the most
important thing to delivering good quality products
is a strong happy team.
X-Ray @ Skyscanner
Skyscanner
We have a goal to ​be the most used and the most trusted online travel
brand in the world.
Skyscanner
Approaching 100 million monthly unique visitors
Getting towards 1,000 microservices in AWS
1000’s of deploys every day in multiple AWS regions
This complexity means we need a way to understand our system
100% Availability
Consumers everywhere expect
Skyscanner to be always
available and reliable
In order to increase the
availability of our services in
production we need to be able to
monitor the communication
happening between them to
anticipate any failures and
diagnose the root cause of them
if/when they happen.
Tracing the path of Requests
Request tracing gives us the ability to track
a single request end-to-end: from a user
searching in our website to returning a list
of flights and everything in between.
What have we built?
We’ve orchestrated http clients to add X-Ray tracing to outgoing
requests
JerseyClientRegistry
to provide Clients that are
orchestrated with tracing.
Wrappers for request and
request-promise-native
What have we built?
We promote
functions for
service
developers to
easily wrap
particular
sections of
code in tracing
What have we built?
We’ve added new
annotations to simply add
tracing to specific functions
What have we built?
Deep linking into X-Ray from log messages
How Does X-Ray Help Us?
Use Case 1 – Understanding your dependencies
When your system is simple, it’s easy to understand your dependencies
Service
Service
Service Service
Use Case 1 – Understanding your dependencies
But microservice architecture has
made understanding your
dependencies very hard!
What happens when a service 7 hops
away from your own goes down?
Service
Service
Service Service
Service
Service
Service
Service
Service
Service
Service
Service
Service
Service
Service
Service
Use Case 1 – Understanding your dependencies
Tracing helps to visualize the reality
of your production system
Going from what you think your
system looks like…
Use Case 1 – Understanding your dependencies
…to understanding what
your system actually looks
like!
Who knew your service was
connected to that legacy
service that was about to be
decommissioned
Use Case 1 – Understanding your dependencies
X-Ray helps us understand the complexity of our distributed system
It allows us to visualise the system as a whole and dive in and out of what we
need
Use Case 2 – Troubleshooting slow performance
You get an alert that something has decreased in performance
How would you typically debug this on a live complex distributed system?
Look at the logs?
Let’s try using X-Ray
The following scenario is a genuine bug, found using X-Ray
Use Case 2 – Troubleshooting slow performance
Look into a trace of the service that is performing slow
What does this tell us?
Use Case 2 – Troubleshooting slow performance
Calls to umsGenerateUrl are taking 41.0ms – slower than we would like!
Use Case 2 – Troubleshooting slow performance
What’s actually being called?
oc-registry is making a call to umsGenerateUrl
umsGenerateUrl calls sharedCacheFetch
sharedCacheFetch makes two calls…
Use Case 2 – Troubleshooting slow performance
The first of these two calls is a call to get a key from the cache
X-Ray tells us this takes 2.0ms and gives us the request details
Use Case 2 – Troubleshooting slow performance
The second call is a remote call to retrieve the requested value from an API
X-Ray tells us this takes 37.0 ms and returns with a 200 response
Use Case 2 – Troubleshooting slow performance
What’s actually happening here?
We’ve got a cache miss, so are making a http GET request
Use Case 2 – Troubleshooting slow performance
We seem to be getting an awful lot of cache misses…
Going back to our original trace – can we spot a problem?
Use Case 2 – Troubleshooting slow performance
We’re doing a cache “get”, but no cache “set”!
Use Case 2 – Troubleshooting slow performance
Fix the bug! Deploy! Check X-Ray again
Use Case 2 – Troubleshooting slow performance
We can see that we’ve added a call to do an asynchronous “set” to add our
value to the cache in the event of a cache miss.
Use Case 2 – Troubleshooting slow performance
So that subsequent calls get many more cache hits!
And our umsGenerateUrl function now returns in 4.0ms instead of 41ms!
Summary
X-Ray gives us deep insight into our services to investigate when things
go wrong
Analysis of the traces allows us to understand detailed information about
how services are interacting with each other
We can track down bugs by more ways than trawling the logs
Briefly back to Randall
X-Ray Pricing
Free during the preview. After that:
Free tier
• The first 100,000 traces recorded per month are free
• The first 1,000,000 traces retrieved or scanned per month are free
Additional charges
• Beyond the free tier, traces recorded cost $5.00 per million per month
• Beyond the free tier, traces retrieved or scanned cost $0.50 per million per
month
Currently AWS Supported SDKs and Frameworks
• Java
• Tomcat, Spring Boot, Servlet filters
• Node.js (express filters)
• C# (.NET, .NET core)
Currently (Natively) Supported Services
• ELB
• Lambda
• API Gateway
• EC2
• Elastic Beanstalk
Thank You!
@jrhunt

More Related Content

What's hot

Tooling Up for Efficiency: DIY Solutions @ Netflix - ABD319 - re:Invent 2017
Tooling Up for Efficiency: DIY Solutions @ Netflix - ABD319 - re:Invent 2017Tooling Up for Efficiency: DIY Solutions @ Netflix - ABD319 - re:Invent 2017
Tooling Up for Efficiency: DIY Solutions @ Netflix - ABD319 - re:Invent 2017Amazon Web Services
 
DAT202_Getting started with Amazon Aurora
DAT202_Getting started with Amazon AuroraDAT202_Getting started with Amazon Aurora
DAT202_Getting started with Amazon AuroraAmazon Web Services
 
DAT332_How Verizon is Adopting Amazon Aurora PostgreSQL for Enterprise Workloads
DAT332_How Verizon is Adopting Amazon Aurora PostgreSQL for Enterprise WorkloadsDAT332_How Verizon is Adopting Amazon Aurora PostgreSQL for Enterprise Workloads
DAT332_How Verizon is Adopting Amazon Aurora PostgreSQL for Enterprise WorkloadsAmazon Web Services
 
Deep Learning Using Caffe2 on AWS - MCL313 - re:Invent 2017
Deep Learning Using Caffe2 on AWS - MCL313 - re:Invent 2017Deep Learning Using Caffe2 on AWS - MCL313 - re:Invent 2017
Deep Learning Using Caffe2 on AWS - MCL313 - re:Invent 2017Amazon Web Services
 
Big Data, Analytics and Machine Learning on AWS Lambda - SRV402 - re:Invent 2017
Big Data, Analytics and Machine Learning on AWS Lambda - SRV402 - re:Invent 2017Big Data, Analytics and Machine Learning on AWS Lambda - SRV402 - re:Invent 2017
Big Data, Analytics and Machine Learning on AWS Lambda - SRV402 - re:Invent 2017Amazon Web Services
 
GPSTEC313_GPS Real-Time Data Processing with AWS Lambda Quickly, at Scale, an...
GPSTEC313_GPS Real-Time Data Processing with AWS Lambda Quickly, at Scale, an...GPSTEC313_GPS Real-Time Data Processing with AWS Lambda Quickly, at Scale, an...
GPSTEC313_GPS Real-Time Data Processing with AWS Lambda Quickly, at Scale, an...Amazon Web Services
 
AWS Database and Analytics State of the Union
AWS Database and Analytics State of the UnionAWS Database and Analytics State of the Union
AWS Database and Analytics State of the UnionAmazon Web Services
 
GPSWKS407-Strategies for Migrating Microsoft SQL Databases to AWS
GPSWKS407-Strategies for Migrating Microsoft SQL Databases to AWSGPSWKS407-Strategies for Migrating Microsoft SQL Databases to AWS
GPSWKS407-Strategies for Migrating Microsoft SQL Databases to AWSAmazon Web Services
 
Introducing Amazon EC2 P3 Instance - Featuring the Most Powerful GPU for Mach...
Introducing Amazon EC2 P3 Instance - Featuring the Most Powerful GPU for Mach...Introducing Amazon EC2 P3 Instance - Featuring the Most Powerful GPU for Mach...
Introducing Amazon EC2 P3 Instance - Featuring the Most Powerful GPU for Mach...Amazon Web Services
 
ABD201-Big Data Architectural Patterns and Best Practices on AWS
ABD201-Big Data Architectural Patterns and Best Practices on AWSABD201-Big Data Architectural Patterns and Best Practices on AWS
ABD201-Big Data Architectural Patterns and Best Practices on AWSAmazon Web Services
 
Unlocking New Todays - Artificial Intelligence and Data Platforms on AWS
Unlocking New Todays - Artificial Intelligence and Data Platforms on AWSUnlocking New Todays - Artificial Intelligence and Data Platforms on AWS
Unlocking New Todays - Artificial Intelligence and Data Platforms on AWSAmazon Web Services
 
A Look Under the Hood – How Amazon.com Uses AWS Services for Analytics at Mas...
A Look Under the Hood – How Amazon.com Uses AWS Services for Analytics at Mas...A Look Under the Hood – How Amazon.com Uses AWS Services for Analytics at Mas...
A Look Under the Hood – How Amazon.com Uses AWS Services for Analytics at Mas...Amazon Web Services
 
클라우드 기반 데이터 분석 및 인공 지능을 위한 비지니스 혁신 - 윤석찬 (AWS 테크에반젤리스트)
클라우드 기반 데이터 분석 및 인공 지능을 위한 비지니스 혁신 - 윤석찬 (AWS 테크에반젤리스트)클라우드 기반 데이터 분석 및 인공 지능을 위한 비지니스 혁신 - 윤석찬 (AWS 테크에반젤리스트)
클라우드 기반 데이터 분석 및 인공 지능을 위한 비지니스 혁신 - 윤석찬 (AWS 테크에반젤리스트)Amazon Web Services Korea
 
Auto Scaling Prime Time: Target Tracking Hits the Bullseye at Netflix - CMP31...
Auto Scaling Prime Time: Target Tracking Hits the Bullseye at Netflix - CMP31...Auto Scaling Prime Time: Target Tracking Hits the Bullseye at Netflix - CMP31...
Auto Scaling Prime Time: Target Tracking Hits the Bullseye at Netflix - CMP31...Amazon Web Services
 
Optimising Cost and Efficiency on AWS
Optimising Cost and Efficiency on AWSOptimising Cost and Efficiency on AWS
Optimising Cost and Efficiency on AWSAmazon Web Services
 
MCL309_Deep Learning on a Raspberry Pi
MCL309_Deep Learning on a Raspberry PiMCL309_Deep Learning on a Raspberry Pi
MCL309_Deep Learning on a Raspberry PiAmazon Web Services
 
Migrating Microsoft Workloads to AWS
Migrating Microsoft Workloads to AWSMigrating Microsoft Workloads to AWS
Migrating Microsoft Workloads to AWSAmazon Web Services
 
Building Serverless Websites with Lambda@Edge - CTD309 - re:Invent 2017
Building Serverless Websites with Lambda@Edge - CTD309 - re:Invent 2017Building Serverless Websites with Lambda@Edge - CTD309 - re:Invent 2017
Building Serverless Websites with Lambda@Edge - CTD309 - re:Invent 2017Amazon Web Services
 
CMP207_High Performance Computing on AWS
CMP207_High Performance Computing on AWSCMP207_High Performance Computing on AWS
CMP207_High Performance Computing on AWSAmazon Web Services
 
MCL314_Unlocking Media Workflows Using Amazon Rekognition
MCL314_Unlocking Media Workflows Using Amazon RekognitionMCL314_Unlocking Media Workflows Using Amazon Rekognition
MCL314_Unlocking Media Workflows Using Amazon RekognitionAmazon Web Services
 

What's hot (20)

Tooling Up for Efficiency: DIY Solutions @ Netflix - ABD319 - re:Invent 2017
Tooling Up for Efficiency: DIY Solutions @ Netflix - ABD319 - re:Invent 2017Tooling Up for Efficiency: DIY Solutions @ Netflix - ABD319 - re:Invent 2017
Tooling Up for Efficiency: DIY Solutions @ Netflix - ABD319 - re:Invent 2017
 
DAT202_Getting started with Amazon Aurora
DAT202_Getting started with Amazon AuroraDAT202_Getting started with Amazon Aurora
DAT202_Getting started with Amazon Aurora
 
DAT332_How Verizon is Adopting Amazon Aurora PostgreSQL for Enterprise Workloads
DAT332_How Verizon is Adopting Amazon Aurora PostgreSQL for Enterprise WorkloadsDAT332_How Verizon is Adopting Amazon Aurora PostgreSQL for Enterprise Workloads
DAT332_How Verizon is Adopting Amazon Aurora PostgreSQL for Enterprise Workloads
 
Deep Learning Using Caffe2 on AWS - MCL313 - re:Invent 2017
Deep Learning Using Caffe2 on AWS - MCL313 - re:Invent 2017Deep Learning Using Caffe2 on AWS - MCL313 - re:Invent 2017
Deep Learning Using Caffe2 on AWS - MCL313 - re:Invent 2017
 
Big Data, Analytics and Machine Learning on AWS Lambda - SRV402 - re:Invent 2017
Big Data, Analytics and Machine Learning on AWS Lambda - SRV402 - re:Invent 2017Big Data, Analytics and Machine Learning on AWS Lambda - SRV402 - re:Invent 2017
Big Data, Analytics and Machine Learning on AWS Lambda - SRV402 - re:Invent 2017
 
GPSTEC313_GPS Real-Time Data Processing with AWS Lambda Quickly, at Scale, an...
GPSTEC313_GPS Real-Time Data Processing with AWS Lambda Quickly, at Scale, an...GPSTEC313_GPS Real-Time Data Processing with AWS Lambda Quickly, at Scale, an...
GPSTEC313_GPS Real-Time Data Processing with AWS Lambda Quickly, at Scale, an...
 
AWS Database and Analytics State of the Union
AWS Database and Analytics State of the UnionAWS Database and Analytics State of the Union
AWS Database and Analytics State of the Union
 
GPSWKS407-Strategies for Migrating Microsoft SQL Databases to AWS
GPSWKS407-Strategies for Migrating Microsoft SQL Databases to AWSGPSWKS407-Strategies for Migrating Microsoft SQL Databases to AWS
GPSWKS407-Strategies for Migrating Microsoft SQL Databases to AWS
 
Introducing Amazon EC2 P3 Instance - Featuring the Most Powerful GPU for Mach...
Introducing Amazon EC2 P3 Instance - Featuring the Most Powerful GPU for Mach...Introducing Amazon EC2 P3 Instance - Featuring the Most Powerful GPU for Mach...
Introducing Amazon EC2 P3 Instance - Featuring the Most Powerful GPU for Mach...
 
ABD201-Big Data Architectural Patterns and Best Practices on AWS
ABD201-Big Data Architectural Patterns and Best Practices on AWSABD201-Big Data Architectural Patterns and Best Practices on AWS
ABD201-Big Data Architectural Patterns and Best Practices on AWS
 
Unlocking New Todays - Artificial Intelligence and Data Platforms on AWS
Unlocking New Todays - Artificial Intelligence and Data Platforms on AWSUnlocking New Todays - Artificial Intelligence and Data Platforms on AWS
Unlocking New Todays - Artificial Intelligence and Data Platforms on AWS
 
A Look Under the Hood – How Amazon.com Uses AWS Services for Analytics at Mas...
A Look Under the Hood – How Amazon.com Uses AWS Services for Analytics at Mas...A Look Under the Hood – How Amazon.com Uses AWS Services for Analytics at Mas...
A Look Under the Hood – How Amazon.com Uses AWS Services for Analytics at Mas...
 
클라우드 기반 데이터 분석 및 인공 지능을 위한 비지니스 혁신 - 윤석찬 (AWS 테크에반젤리스트)
클라우드 기반 데이터 분석 및 인공 지능을 위한 비지니스 혁신 - 윤석찬 (AWS 테크에반젤리스트)클라우드 기반 데이터 분석 및 인공 지능을 위한 비지니스 혁신 - 윤석찬 (AWS 테크에반젤리스트)
클라우드 기반 데이터 분석 및 인공 지능을 위한 비지니스 혁신 - 윤석찬 (AWS 테크에반젤리스트)
 
Auto Scaling Prime Time: Target Tracking Hits the Bullseye at Netflix - CMP31...
Auto Scaling Prime Time: Target Tracking Hits the Bullseye at Netflix - CMP31...Auto Scaling Prime Time: Target Tracking Hits the Bullseye at Netflix - CMP31...
Auto Scaling Prime Time: Target Tracking Hits the Bullseye at Netflix - CMP31...
 
Optimising Cost and Efficiency on AWS
Optimising Cost and Efficiency on AWSOptimising Cost and Efficiency on AWS
Optimising Cost and Efficiency on AWS
 
MCL309_Deep Learning on a Raspberry Pi
MCL309_Deep Learning on a Raspberry PiMCL309_Deep Learning on a Raspberry Pi
MCL309_Deep Learning on a Raspberry Pi
 
Migrating Microsoft Workloads to AWS
Migrating Microsoft Workloads to AWSMigrating Microsoft Workloads to AWS
Migrating Microsoft Workloads to AWS
 
Building Serverless Websites with Lambda@Edge - CTD309 - re:Invent 2017
Building Serverless Websites with Lambda@Edge - CTD309 - re:Invent 2017Building Serverless Websites with Lambda@Edge - CTD309 - re:Invent 2017
Building Serverless Websites with Lambda@Edge - CTD309 - re:Invent 2017
 
CMP207_High Performance Computing on AWS
CMP207_High Performance Computing on AWSCMP207_High Performance Computing on AWS
CMP207_High Performance Computing on AWS
 
MCL314_Unlocking Media Workflows Using Amazon Rekognition
MCL314_Unlocking Media Workflows Using Amazon RekognitionMCL314_Unlocking Media Workflows Using Amazon Rekognition
MCL314_Unlocking Media Workflows Using Amazon Rekognition
 

Similar to Deep Dive: AWS X-Ray London Summit 2017

Operations: Production Readiness Review – How to stop bad things from Happening
Operations: Production Readiness Review – How to stop bad things from HappeningOperations: Production Readiness Review – How to stop bad things from Happening
Operations: Production Readiness Review – How to stop bad things from HappeningAmazon Web Services
 
Gluecon Monitoring Microservices and Containers: A Challenge
Gluecon Monitoring Microservices and Containers: A ChallengeGluecon Monitoring Microservices and Containers: A Challenge
Gluecon Monitoring Microservices and Containers: A ChallengeAdrian Cockcroft
 
Starting Your DevOps Journey – Practical Tips for Ops
Starting Your DevOps Journey – Practical Tips for OpsStarting Your DevOps Journey – Practical Tips for Ops
Starting Your DevOps Journey – Practical Tips for OpsDynatrace
 
Keys to Successfully Monitoring and Optimizing Innovative and Sophisticated C...
Keys to Successfully Monitoring and Optimizing Innovative and Sophisticated C...Keys to Successfully Monitoring and Optimizing Innovative and Sophisticated C...
Keys to Successfully Monitoring and Optimizing Innovative and Sophisticated C...Amazon Web Services
 
AWS re:Invent 2016: Amazon CloudFront Flash Talks: Best Practices on Configur...
AWS re:Invent 2016: Amazon CloudFront Flash Talks: Best Practices on Configur...AWS re:Invent 2016: Amazon CloudFront Flash Talks: Best Practices on Configur...
AWS re:Invent 2016: Amazon CloudFront Flash Talks: Best Practices on Configur...Amazon Web Services
 
Operations: Production Readiness
Operations: Production ReadinessOperations: Production Readiness
Operations: Production ReadinessAmazon Web Services
 
Start Up Austin 2017: Production Preview - How to Stop Bad Things From Happening
Start Up Austin 2017: Production Preview - How to Stop Bad Things From HappeningStart Up Austin 2017: Production Preview - How to Stop Bad Things From Happening
Start Up Austin 2017: Production Preview - How to Stop Bad Things From HappeningAmazon Web Services
 
Web Speed And Scalability
Web Speed And ScalabilityWeb Speed And Scalability
Web Speed And ScalabilityJason Ragsdale
 
Digital Transformation | AWS Webinar
Digital Transformation | AWS WebinarDigital Transformation | AWS Webinar
Digital Transformation | AWS WebinarAmazon Web Services
 
5 Years Of Building SaaS On AWS
5 Years Of Building SaaS On AWS5 Years Of Building SaaS On AWS
5 Years Of Building SaaS On AWSChristian Beedgen
 
More Nines for Your Dimes: Improving Availability and Lowering Costs using Au...
More Nines for Your Dimes: Improving Availability and Lowering Costs using Au...More Nines for Your Dimes: Improving Availability and Lowering Costs using Au...
More Nines for Your Dimes: Improving Availability and Lowering Costs using Au...Amazon Web Services
 
PeopleSoft and The Cloud
PeopleSoft and The CloudPeopleSoft and The Cloud
PeopleSoft and The CloudDuncan Davies
 
Cloud Native & Service Mesh
Cloud Native & Service MeshCloud Native & Service Mesh
Cloud Native & Service MeshRoi Ezra
 
SHOWDOWN: Threat Stack vs. Red Hat AuditD
SHOWDOWN: Threat Stack vs. Red Hat AuditDSHOWDOWN: Threat Stack vs. Red Hat AuditD
SHOWDOWN: Threat Stack vs. Red Hat AuditDThreat Stack
 
AWS Public Sector Symposium 2014 Canberra | Putting the "Crowd" to work in th...
AWS Public Sector Symposium 2014 Canberra | Putting the "Crowd" to work in th...AWS Public Sector Symposium 2014 Canberra | Putting the "Crowd" to work in th...
AWS Public Sector Symposium 2014 Canberra | Putting the "Crowd" to work in th...Amazon Web Services
 
More Nines for Your Dimes: Improving Availability and Lowering Costs using Au...
More Nines for Your Dimes: Improving Availability and Lowering Costs using Au...More Nines for Your Dimes: Improving Availability and Lowering Costs using Au...
More Nines for Your Dimes: Improving Availability and Lowering Costs using Au...Amazon Web Services
 
Evolution of Microservices - Craft Conference
Evolution of Microservices - Craft ConferenceEvolution of Microservices - Craft Conference
Evolution of Microservices - Craft ConferenceAdrian Cockcroft
 
Anurag Gupta's talk on DevOps at AWS. Nov 17 at the Palo Alto AWS Big Data Me...
Anurag Gupta's talk on DevOps at AWS. Nov 17 at the Palo Alto AWS Big Data Me...Anurag Gupta's talk on DevOps at AWS. Nov 17 at the Palo Alto AWS Big Data Me...
Anurag Gupta's talk on DevOps at AWS. Nov 17 at the Palo Alto AWS Big Data Me...stevemcpherson
 
Prometheus - Open Source Forum Japan
Prometheus  - Open Source Forum JapanPrometheus  - Open Source Forum Japan
Prometheus - Open Source Forum JapanBrian Brazil
 
AWS webinar what is cloud computing 13 09 11
AWS webinar what is cloud computing 13 09 11AWS webinar what is cloud computing 13 09 11
AWS webinar what is cloud computing 13 09 11Amazon Web Services
 

Similar to Deep Dive: AWS X-Ray London Summit 2017 (20)

Operations: Production Readiness Review – How to stop bad things from Happening
Operations: Production Readiness Review – How to stop bad things from HappeningOperations: Production Readiness Review – How to stop bad things from Happening
Operations: Production Readiness Review – How to stop bad things from Happening
 
Gluecon Monitoring Microservices and Containers: A Challenge
Gluecon Monitoring Microservices and Containers: A ChallengeGluecon Monitoring Microservices and Containers: A Challenge
Gluecon Monitoring Microservices and Containers: A Challenge
 
Starting Your DevOps Journey – Practical Tips for Ops
Starting Your DevOps Journey – Practical Tips for OpsStarting Your DevOps Journey – Practical Tips for Ops
Starting Your DevOps Journey – Practical Tips for Ops
 
Keys to Successfully Monitoring and Optimizing Innovative and Sophisticated C...
Keys to Successfully Monitoring and Optimizing Innovative and Sophisticated C...Keys to Successfully Monitoring and Optimizing Innovative and Sophisticated C...
Keys to Successfully Monitoring and Optimizing Innovative and Sophisticated C...
 
AWS re:Invent 2016: Amazon CloudFront Flash Talks: Best Practices on Configur...
AWS re:Invent 2016: Amazon CloudFront Flash Talks: Best Practices on Configur...AWS re:Invent 2016: Amazon CloudFront Flash Talks: Best Practices on Configur...
AWS re:Invent 2016: Amazon CloudFront Flash Talks: Best Practices on Configur...
 
Operations: Production Readiness
Operations: Production ReadinessOperations: Production Readiness
Operations: Production Readiness
 
Start Up Austin 2017: Production Preview - How to Stop Bad Things From Happening
Start Up Austin 2017: Production Preview - How to Stop Bad Things From HappeningStart Up Austin 2017: Production Preview - How to Stop Bad Things From Happening
Start Up Austin 2017: Production Preview - How to Stop Bad Things From Happening
 
Web Speed And Scalability
Web Speed And ScalabilityWeb Speed And Scalability
Web Speed And Scalability
 
Digital Transformation | AWS Webinar
Digital Transformation | AWS WebinarDigital Transformation | AWS Webinar
Digital Transformation | AWS Webinar
 
5 Years Of Building SaaS On AWS
5 Years Of Building SaaS On AWS5 Years Of Building SaaS On AWS
5 Years Of Building SaaS On AWS
 
More Nines for Your Dimes: Improving Availability and Lowering Costs using Au...
More Nines for Your Dimes: Improving Availability and Lowering Costs using Au...More Nines for Your Dimes: Improving Availability and Lowering Costs using Au...
More Nines for Your Dimes: Improving Availability and Lowering Costs using Au...
 
PeopleSoft and The Cloud
PeopleSoft and The CloudPeopleSoft and The Cloud
PeopleSoft and The Cloud
 
Cloud Native & Service Mesh
Cloud Native & Service MeshCloud Native & Service Mesh
Cloud Native & Service Mesh
 
SHOWDOWN: Threat Stack vs. Red Hat AuditD
SHOWDOWN: Threat Stack vs. Red Hat AuditDSHOWDOWN: Threat Stack vs. Red Hat AuditD
SHOWDOWN: Threat Stack vs. Red Hat AuditD
 
AWS Public Sector Symposium 2014 Canberra | Putting the "Crowd" to work in th...
AWS Public Sector Symposium 2014 Canberra | Putting the "Crowd" to work in th...AWS Public Sector Symposium 2014 Canberra | Putting the "Crowd" to work in th...
AWS Public Sector Symposium 2014 Canberra | Putting the "Crowd" to work in th...
 
More Nines for Your Dimes: Improving Availability and Lowering Costs using Au...
More Nines for Your Dimes: Improving Availability and Lowering Costs using Au...More Nines for Your Dimes: Improving Availability and Lowering Costs using Au...
More Nines for Your Dimes: Improving Availability and Lowering Costs using Au...
 
Evolution of Microservices - Craft Conference
Evolution of Microservices - Craft ConferenceEvolution of Microservices - Craft Conference
Evolution of Microservices - Craft Conference
 
Anurag Gupta's talk on DevOps at AWS. Nov 17 at the Palo Alto AWS Big Data Me...
Anurag Gupta's talk on DevOps at AWS. Nov 17 at the Palo Alto AWS Big Data Me...Anurag Gupta's talk on DevOps at AWS. Nov 17 at the Palo Alto AWS Big Data Me...
Anurag Gupta's talk on DevOps at AWS. Nov 17 at the Palo Alto AWS Big Data Me...
 
Prometheus - Open Source Forum Japan
Prometheus  - Open Source Forum JapanPrometheus  - Open Source Forum Japan
Prometheus - Open Source Forum Japan
 
AWS webinar what is cloud computing 13 09 11
AWS webinar what is cloud computing 13 09 11AWS webinar what is cloud computing 13 09 11
AWS webinar what is cloud computing 13 09 11
 

More from Randall Hunt

TIAD - Is Automation Worth My Time?
TIAD - Is Automation Worth My Time?TIAD - Is Automation Worth My Time?
TIAD - Is Automation Worth My Time?Randall Hunt
 
A Century Of Weather Data - Midwest.io
A Century Of Weather Data - Midwest.ioA Century Of Weather Data - Midwest.io
A Century Of Weather Data - Midwest.ioRandall Hunt
 
MongoDB at LAHacks :)
MongoDB at LAHacks :)MongoDB at LAHacks :)
MongoDB at LAHacks :)Randall Hunt
 
Schema Design in MongoDB - TriMug Meetup North Carolina
Schema Design in MongoDB - TriMug Meetup North CarolinaSchema Design in MongoDB - TriMug Meetup North Carolina
Schema Design in MongoDB - TriMug Meetup North CarolinaRandall Hunt
 
Replication MongoDB Days 2013
Replication MongoDB Days 2013Replication MongoDB Days 2013
Replication MongoDB Days 2013Randall Hunt
 
Sharding in MongoDB Days 2013
Sharding in MongoDB Days 2013Sharding in MongoDB Days 2013
Sharding in MongoDB Days 2013Randall Hunt
 
Replication and replica sets
Replication and replica setsReplication and replica sets
Replication and replica setsRandall Hunt
 

More from Randall Hunt (8)

TIAD - Is Automation Worth My Time?
TIAD - Is Automation Worth My Time?TIAD - Is Automation Worth My Time?
TIAD - Is Automation Worth My Time?
 
Git
GitGit
Git
 
A Century Of Weather Data - Midwest.io
A Century Of Weather Data - Midwest.ioA Century Of Weather Data - Midwest.io
A Century Of Weather Data - Midwest.io
 
MongoDB at LAHacks :)
MongoDB at LAHacks :)MongoDB at LAHacks :)
MongoDB at LAHacks :)
 
Schema Design in MongoDB - TriMug Meetup North Carolina
Schema Design in MongoDB - TriMug Meetup North CarolinaSchema Design in MongoDB - TriMug Meetup North Carolina
Schema Design in MongoDB - TriMug Meetup North Carolina
 
Replication MongoDB Days 2013
Replication MongoDB Days 2013Replication MongoDB Days 2013
Replication MongoDB Days 2013
 
Sharding in MongoDB Days 2013
Sharding in MongoDB Days 2013Sharding in MongoDB Days 2013
Sharding in MongoDB Days 2013
 
Replication and replica sets
Replication and replica setsReplication and replica sets
Replication and replica sets
 

Recently uploaded

CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...shyamraj55
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024Scott Keck-Warren
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...gurkirankumar98700
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersThousandEyes
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 

Recently uploaded (20)

CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 

Deep Dive: AWS X-Ray London Summit 2017

  • 1. © 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Randall Hunt, Technical Evangelist AWS Ashley Sole, Software Engineer, Skyscanner Deep Dive: AWS X-Ray London 2017
  • 2. Who am I and why am I here? • Software Engineer at AWS, previously at SpaceX and NASA (and MongoDB where I learned to code) • Follow me on twitter @jrhunt for demo and serverless hot takes • Email me: randhunt@amazon.com • Thanks to all the people at AWS who helped build these slides!
  • 3. Agenda • Overview • Concepts • API • Use cases • Getting started • Q & A
  • 4. Challenges Deploying and managing service-oriented applications is more work compared to monolithic applications Services such as AWS Lambda, Amazon EC2 Container Service, AWS Elastic Beanstalk, AWS CloudFormation, etc. make it easier to deploy and manage applications consisting of hundreds of services Still hard to debug application issues in production applications due to: • Cross-service interactions • Varying log formats across services • Collecting, aggregating, and collating logs from services
  • 6. Solution AWS X-Ray makes it easy to: • Identify performance bottlenecks and errors • Pinpoint issues to specific service(s) in your application • Identify impact of issues on users of the application • Visualize the service call graph of your application
  • 8. X-Ray SDK Available for Java, .NET, and Node.js Adds filters to automatically captures metadata for calls to: • AWS services using the AWS SDK • Non-AWS services over HTTP and HTTPS • Databases (MySQL, PostgreSQL, and Amazon DynamoDB) • Queues (Amazon SQS) Enables you to get started quickly without having to manually instrument your application code to log metadata about requests
  • 9. X-Ray daemon Receives data from the SDK over UDP and acts as a local buffer. Data is flushed to the backend every second or when the local buffer fills. Available for Amazon Linux AMI, RHEL, Ubuntu, OS X, and Windows
  • 11. X-Ray Concepts Trace End-to-end data related a single request across services Segments Portions of the trace that correspond to a single service Sub-segments Remote call or local compute sections within a service Annotations Business data that can be used to filter traces Metadata Business data that can be added to the trace but not used for filtering traces Errors Normalized error message and stack trace
  • 12. Sampling configuration { "rules": { "move": { "id": 1, "service_name": "*", "http_method": "*", "url_path": "/api/move/*", "fixed_target": 0, "rate": 0.05 }, "base": { "id": 2, "service_name": "*", "http_method": "*", "url_path": "*", "fixed_target": 1, "rate": 0.1 } } } This example defines two rules. The first rule applies a five-percent sampling rate with no minimum number of requests to trace to requests with paths under /api/move The second overrides the default sampling rule with a rule that traces the first request each second and 10 percent of additional requests.
  • 13. APIs
  • 14. X-Ray API X-Ray provides a set of APIs to enable you to send, filter, and retrieve trace data You can send trace data directly to the service without having to use our SDKs (i.e. you can write your own SDKs for languages not currently supported) Raw trace data is available using batch get APIs You can build your own data analysis applications on top of the data collected by X-Ray
  • 15. X-Ray API PutTraceSegments Uploads segment documents to AWS X-Ray BatchGetTraces Retrieves a list of traces specified by ID GetServiceGraph Retrieves a document that describes services in your application and their connections GetTraceSummaries Retrieves IDs and metadata for traces available for a specified time frame using an optional filter
  • 16. Segment document Minimal example { "name" : "example.com", "id" : "70de5b6f19ff9a0a", "start_time" : 1.478293361271E9, "trace_id" : "1-581cf771-a006649127e371903a2de979", "end_time" : 1.478293361449E9 } Example showing an in-progress segment { "name" : "example.com", "id" : "70de5b6f19ff9a0b", "start_time" : 1.478293361271E9, "trace_id" : "1-581cf771-a006649127e371903a2de979", “in_progress”: true }
  • 17. Traced Request X-Amzn-Trace-Id: Root=1-5759e988-bd862e3fe1be46a994272793;Sampled=1 Version number Hex of epoch timestamp Optional: Parent segment ID 96 bit globally unique id for trace (24 hex digits)
  • 18. Ashley Sole Software Engineer - Skyscanner Software Engineer and technical leader with over 10 years experience. Worked in various industries in business from Defence to Gaming. I believe in a blameless culture and that the most important thing to delivering good quality products is a strong happy team.
  • 20. Skyscanner We have a goal to ​be the most used and the most trusted online travel brand in the world.
  • 21. Skyscanner Approaching 100 million monthly unique visitors Getting towards 1,000 microservices in AWS 1000’s of deploys every day in multiple AWS regions This complexity means we need a way to understand our system
  • 22. 100% Availability Consumers everywhere expect Skyscanner to be always available and reliable In order to increase the availability of our services in production we need to be able to monitor the communication happening between them to anticipate any failures and diagnose the root cause of them if/when they happen.
  • 23. Tracing the path of Requests Request tracing gives us the ability to track a single request end-to-end: from a user searching in our website to returning a list of flights and everything in between.
  • 24. What have we built? We’ve orchestrated http clients to add X-Ray tracing to outgoing requests JerseyClientRegistry to provide Clients that are orchestrated with tracing. Wrappers for request and request-promise-native
  • 25. What have we built? We promote functions for service developers to easily wrap particular sections of code in tracing
  • 26. What have we built? We’ve added new annotations to simply add tracing to specific functions
  • 27. What have we built? Deep linking into X-Ray from log messages
  • 28. How Does X-Ray Help Us?
  • 29. Use Case 1 – Understanding your dependencies When your system is simple, it’s easy to understand your dependencies Service Service Service Service
  • 30. Use Case 1 – Understanding your dependencies But microservice architecture has made understanding your dependencies very hard! What happens when a service 7 hops away from your own goes down? Service Service Service Service Service Service Service Service Service Service Service Service Service Service Service Service
  • 31. Use Case 1 – Understanding your dependencies Tracing helps to visualize the reality of your production system Going from what you think your system looks like…
  • 32. Use Case 1 – Understanding your dependencies …to understanding what your system actually looks like! Who knew your service was connected to that legacy service that was about to be decommissioned
  • 33. Use Case 1 – Understanding your dependencies X-Ray helps us understand the complexity of our distributed system It allows us to visualise the system as a whole and dive in and out of what we need
  • 34. Use Case 2 – Troubleshooting slow performance You get an alert that something has decreased in performance How would you typically debug this on a live complex distributed system? Look at the logs? Let’s try using X-Ray The following scenario is a genuine bug, found using X-Ray
  • 35. Use Case 2 – Troubleshooting slow performance Look into a trace of the service that is performing slow What does this tell us?
  • 36. Use Case 2 – Troubleshooting slow performance Calls to umsGenerateUrl are taking 41.0ms – slower than we would like!
  • 37. Use Case 2 – Troubleshooting slow performance What’s actually being called? oc-registry is making a call to umsGenerateUrl umsGenerateUrl calls sharedCacheFetch sharedCacheFetch makes two calls…
  • 38. Use Case 2 – Troubleshooting slow performance The first of these two calls is a call to get a key from the cache X-Ray tells us this takes 2.0ms and gives us the request details
  • 39. Use Case 2 – Troubleshooting slow performance The second call is a remote call to retrieve the requested value from an API X-Ray tells us this takes 37.0 ms and returns with a 200 response
  • 40. Use Case 2 – Troubleshooting slow performance What’s actually happening here? We’ve got a cache miss, so are making a http GET request
  • 41. Use Case 2 – Troubleshooting slow performance We seem to be getting an awful lot of cache misses… Going back to our original trace – can we spot a problem?
  • 42. Use Case 2 – Troubleshooting slow performance We’re doing a cache “get”, but no cache “set”!
  • 43. Use Case 2 – Troubleshooting slow performance Fix the bug! Deploy! Check X-Ray again
  • 44. Use Case 2 – Troubleshooting slow performance We can see that we’ve added a call to do an asynchronous “set” to add our value to the cache in the event of a cache miss.
  • 45. Use Case 2 – Troubleshooting slow performance So that subsequent calls get many more cache hits! And our umsGenerateUrl function now returns in 4.0ms instead of 41ms!
  • 46. Summary X-Ray gives us deep insight into our services to investigate when things go wrong Analysis of the traces allows us to understand detailed information about how services are interacting with each other We can track down bugs by more ways than trawling the logs
  • 47. Briefly back to Randall
  • 48. X-Ray Pricing Free during the preview. After that: Free tier • The first 100,000 traces recorded per month are free • The first 1,000,000 traces retrieved or scanned per month are free Additional charges • Beyond the free tier, traces recorded cost $5.00 per million per month • Beyond the free tier, traces retrieved or scanned cost $0.50 per million per month
  • 49. Currently AWS Supported SDKs and Frameworks • Java • Tomcat, Spring Boot, Servlet filters • Node.js (express filters) • C# (.NET, .NET core)
  • 50. Currently (Natively) Supported Services • ELB • Lambda • API Gateway • EC2 • Elastic Beanstalk

Editor's Notes

  1. Two main components of the x-ray system: The daemon and the SDK