SlideShare a Scribd company logo
1 of 32
Download to read offline
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Petabyte-scale Migrations to Amazon
S3: How Photobox Moved Europe’s
Largest Photo Library to AWS
S T G 3 9 3
Chris Astall
Chief Architect
Photobox Group
Giorgio Bonfiglio
Sr. TAM
AWS
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Agenda
Photobox’s journey to AWS
History of a large photo library
Moving to Amazon Simple Storage Service (Amazon S3)
Challenges in large-scale migrations
Re-designing the storage and serving architecture
What’s next?
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Group history
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Photobox’s journey to AWS
Started moving in 2012
the web platform was 100% on AWS by 2015
a lift and shift evolved in a cloud native containerized infrastructure
Photo storage wasn’t immediately migrated
no significant reasons to move
not expected to be cost effective
a migration would have been an hard technical challenge
on-premise equipment was still fairly new and supported
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Growth in photos
During 2017 uploads peaked at 150 millions a month and 1.2 millions in an hour
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Photobox’s photo library
5.7 billion individual photos / 9 Petabytes
• each of them stored in three different resolutions:
• full-size ~ 1.6 MB
• scaled photo < 100 KB
• thumbnail < 10 KB
Growing at a rate of 1.2 B/year
… expected to keep increasing
Peaking at 120k GET/s and 26k PUT/s
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Access patterns
Only 48% of photos uploaded are printed
Of those:
• 41% are printed on the day they are uploaded
• 83% within 30 days
• … and 95% within 180 days
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
The on-premise architecture
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Why moving?
Cloud first strategy
Photobox had already migrated the website into AWS and improved the operating model
Cost
About to face large capital investments for hardware expansion and replacement
Cloud storage cost kept dropping consistently
Ecosystem
AWS services can be leveraged to enhance the upload and serving workflow
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Why moving? (continued)
Improved security
access control in AWS is more flexible and granular than on-premise
Improved resilience and flexibility
any given photo was stored in a single datacenter
uploads were “sharded” between the two datacenters (AMS and PAR)
storage arrays took months to scale up
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
How to move?
Early dilemmas: network vs AWS Snowball Edge
with the existing 4 Gbps internet uplink the migration would have taken ¾ of a year
upgrading the production network was slow and expensive
There was no need to move everything
scaled photos and thumbnails can be rebuilt directly into AWS
… or on the fly
Snowball Edge proved to be a good fit
82 TB of useable storage per device
~110 of them needed to carry out the migration
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Integrity checking
After the first use photos are hardly ever accessed again
impossible to spot corrupted objects
the migration strategy/workflow must immediately fix any inconsistency
Hashing comes to the rescue!
in our use case the ETag of objects always matched their MD5 hash (careful: This is not always
true)
easy to query it from Amazon S3 Inventory and compare to what has been computed at the
very beginning of the process
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
The migration workflow
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Challenges … at scale
Network congestion
the on-premise network had never been tested at 100%
traffic flows during a migration are “atypical”
Handling an huge amount of small objects
the overhead of every single operation becomes significant
both the data extraction and loading process had to be optimized with this in mind
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Challenges … at scale (continued)
Tracking migration status
needed to follow the status of every single object
… this results in a massive database and heavy data flow
Logistics in the physical world
in EU UPS doesn’t deliver or collect on weekends
next day delivery … doesn’t always happen next day
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Storage tiering
Photos are heavily accessed in the first 90 days
When they are needed access time should be in ms
A mix of Amazon S3 and Amazon S3 Infrequent Access helps achieving
the performance and cost targets
Why not Amazon Glacier?
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Storage architecture
How to choose the right file name?
we had to consider how AWS partitions data in Amazon S3 and our R/W throughput
we used an MD5 hash of the photo name to generate a random prefix
What is the right bucket design?
group by prefix
group by bucket name
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
The serving architecture
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Evolution of the serving architecture
Serverless as a natural evolution
Low-management overhead
Can scale instantly
Multi-region is the next big step
Keeping the live user experience consistent in case of a single region outage
Amazon S3 One Zone-Infrequent Access empowers this model
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Evolution of the user journey
Amazon S3 is only the foundation
Now that the photo assets are in AWS they can be retrieved/used right away
A world of possibilities
Amazon Rekognition can be leveraged to radically change Photobox’s customer journey
Magic Book
Using AI to curating customer photos to enhance their experience
Thank you!
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Chris Astall
chris.astall@photobox.com
Giorgio Bonfiglio
bonfigg@amazon.com
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.

More Related Content

What's hot

Optimizing Costs as You Scale on AWS (ENT302) - AWS re:Invent 2018
Optimizing Costs as You Scale on AWS (ENT302) - AWS re:Invent 2018Optimizing Costs as You Scale on AWS (ENT302) - AWS re:Invent 2018
Optimizing Costs as You Scale on AWS (ENT302) - AWS re:Invent 2018Amazon Web Services
 
Enterprise Governance and Security Build Your AWS Landing Zone (SEC315) - AWS...
Enterprise Governance and Security Build Your AWS Landing Zone (SEC315) - AWS...Enterprise Governance and Security Build Your AWS Landing Zone (SEC315) - AWS...
Enterprise Governance and Security Build Your AWS Landing Zone (SEC315) - AWS...Amazon Web Services
 
A Chronicle of Airbnb Architecture Evolution (ARC407) - AWS re:Invent 2018
A Chronicle of Airbnb Architecture Evolution (ARC407) - AWS re:Invent 2018A Chronicle of Airbnb Architecture Evolution (ARC407) - AWS re:Invent 2018
A Chronicle of Airbnb Architecture Evolution (ARC407) - AWS re:Invent 2018Amazon Web Services
 
Continuous Integration Best Practices (DEV319-R1) - AWS re:Invent 2018
Continuous Integration Best Practices (DEV319-R1) - AWS re:Invent 2018Continuous Integration Best Practices (DEV319-R1) - AWS re:Invent 2018
Continuous Integration Best Practices (DEV319-R1) - AWS re:Invent 2018Amazon Web Services
 
Operational Excellence for Identity & Access Management (SEC334) - AWS re:Inv...
Operational Excellence for Identity & Access Management (SEC334) - AWS re:Inv...Operational Excellence for Identity & Access Management (SEC334) - AWS re:Inv...
Operational Excellence for Identity & Access Management (SEC334) - AWS re:Inv...Amazon Web Services
 
AWS Landing Zone Deep Dive (ENT350-R2) - AWS re:Invent 2018
AWS Landing Zone Deep Dive (ENT350-R2) - AWS re:Invent 2018AWS Landing Zone Deep Dive (ENT350-R2) - AWS re:Invent 2018
AWS Landing Zone Deep Dive (ENT350-R2) - AWS re:Invent 2018Amazon Web Services
 
Operating at Scale - Preparing for the Journey
Operating at Scale - Preparing for the JourneyOperating at Scale - Preparing for the Journey
Operating at Scale - Preparing for the JourneyAmazon Web Services
 
VMware Cloud on AWS – Technical Deep Dive.pdf
VMware Cloud on AWS – Technical Deep Dive.pdfVMware Cloud on AWS – Technical Deep Dive.pdf
VMware Cloud on AWS – Technical Deep Dive.pdfAmazon Web Services
 
Optimize Performance and Reduce Risk Using AWS Support Tools (ENT316-R1) - AW...
Optimize Performance and Reduce Risk Using AWS Support Tools (ENT316-R1) - AW...Optimize Performance and Reduce Risk Using AWS Support Tools (ENT316-R1) - AW...
Optimize Performance and Reduce Risk Using AWS Support Tools (ENT316-R1) - AW...Amazon Web Services
 
Control for Your Cloud Environment Using AWS Management Tools (ENT226-R1) - A...
Control for Your Cloud Environment Using AWS Management Tools (ENT226-R1) - A...Control for Your Cloud Environment Using AWS Management Tools (ENT226-R1) - A...
Control for Your Cloud Environment Using AWS Management Tools (ENT226-R1) - A...Amazon Web Services
 
Operationalizing Microsoft Workloads (WIN320) - AWS re:Invent 2018
Operationalizing Microsoft Workloads (WIN320) - AWS re:Invent 2018Operationalizing Microsoft Workloads (WIN320) - AWS re:Invent 2018
Operationalizing Microsoft Workloads (WIN320) - AWS re:Invent 2018Amazon Web Services
 
Executive Security Simulation Workshop (WPS206) - AWS re:Invent 2018
Executive Security Simulation Workshop (WPS206) - AWS re:Invent 2018Executive Security Simulation Workshop (WPS206) - AWS re:Invent 2018
Executive Security Simulation Workshop (WPS206) - AWS re:Invent 2018Amazon Web Services
 
Best Practices for Centrally Monitoring Resource Configuration & Compliance (...
Best Practices for Centrally Monitoring Resource Configuration & Compliance (...Best Practices for Centrally Monitoring Resource Configuration & Compliance (...
Best Practices for Centrally Monitoring Resource Configuration & Compliance (...Amazon Web Services
 
How Amazon WorkSpaces Powers the Hands-On Labs (BAP317) - AWS re:Invent 2018
How Amazon WorkSpaces Powers the Hands-On Labs (BAP317) - AWS re:Invent 2018How Amazon WorkSpaces Powers the Hands-On Labs (BAP317) - AWS re:Invent 2018
How Amazon WorkSpaces Powers the Hands-On Labs (BAP317) - AWS re:Invent 2018Amazon Web Services
 
Compliance and Security Mitigation Techniques
Compliance and Security Mitigation TechniquesCompliance and Security Mitigation Techniques
Compliance and Security Mitigation TechniquesAmazon Web Services
 
Automated Solution for Deploying AWS Landing Zone (GPSWS407) - AWS re:Invent ...
Automated Solution for Deploying AWS Landing Zone (GPSWS407) - AWS re:Invent ...Automated Solution for Deploying AWS Landing Zone (GPSWS407) - AWS re:Invent ...
Automated Solution for Deploying AWS Landing Zone (GPSWS407) - AWS re:Invent ...Amazon Web Services
 
Hitchhiker's Guide to Cloud Ops
Hitchhiker's Guide to Cloud Ops Hitchhiker's Guide to Cloud Ops
Hitchhiker's Guide to Cloud Ops Amazon Web Services
 

What's hot (20)

Optimizing Costs as You Scale on AWS (ENT302) - AWS re:Invent 2018
Optimizing Costs as You Scale on AWS (ENT302) - AWS re:Invent 2018Optimizing Costs as You Scale on AWS (ENT302) - AWS re:Invent 2018
Optimizing Costs as You Scale on AWS (ENT302) - AWS re:Invent 2018
 
Building a Monitoring Plan.pdf
Building a Monitoring Plan.pdfBuilding a Monitoring Plan.pdf
Building a Monitoring Plan.pdf
 
Enterprise Governance and Security Build Your AWS Landing Zone (SEC315) - AWS...
Enterprise Governance and Security Build Your AWS Landing Zone (SEC315) - AWS...Enterprise Governance and Security Build Your AWS Landing Zone (SEC315) - AWS...
Enterprise Governance and Security Build Your AWS Landing Zone (SEC315) - AWS...
 
A Chronicle of Airbnb Architecture Evolution (ARC407) - AWS re:Invent 2018
A Chronicle of Airbnb Architecture Evolution (ARC407) - AWS re:Invent 2018A Chronicle of Airbnb Architecture Evolution (ARC407) - AWS re:Invent 2018
A Chronicle of Airbnb Architecture Evolution (ARC407) - AWS re:Invent 2018
 
Continuous Integration Best Practices (DEV319-R1) - AWS re:Invent 2018
Continuous Integration Best Practices (DEV319-R1) - AWS re:Invent 2018Continuous Integration Best Practices (DEV319-R1) - AWS re:Invent 2018
Continuous Integration Best Practices (DEV319-R1) - AWS re:Invent 2018
 
Operational Excellence for Identity & Access Management (SEC334) - AWS re:Inv...
Operational Excellence for Identity & Access Management (SEC334) - AWS re:Inv...Operational Excellence for Identity & Access Management (SEC334) - AWS re:Inv...
Operational Excellence for Identity & Access Management (SEC334) - AWS re:Inv...
 
AWS Landing Zone Deep Dive (ENT350-R2) - AWS re:Invent 2018
AWS Landing Zone Deep Dive (ENT350-R2) - AWS re:Invent 2018AWS Landing Zone Deep Dive (ENT350-R2) - AWS re:Invent 2018
AWS Landing Zone Deep Dive (ENT350-R2) - AWS re:Invent 2018
 
Operating at Scale - Preparing for the Journey
Operating at Scale - Preparing for the JourneyOperating at Scale - Preparing for the Journey
Operating at Scale - Preparing for the Journey
 
Container Scheduling
Container SchedulingContainer Scheduling
Container Scheduling
 
VMware Cloud on AWS – Technical Deep Dive.pdf
VMware Cloud on AWS – Technical Deep Dive.pdfVMware Cloud on AWS – Technical Deep Dive.pdf
VMware Cloud on AWS – Technical Deep Dive.pdf
 
Optimize Performance and Reduce Risk Using AWS Support Tools (ENT316-R1) - AW...
Optimize Performance and Reduce Risk Using AWS Support Tools (ENT316-R1) - AW...Optimize Performance and Reduce Risk Using AWS Support Tools (ENT316-R1) - AW...
Optimize Performance and Reduce Risk Using AWS Support Tools (ENT316-R1) - AW...
 
Control for Your Cloud Environment Using AWS Management Tools (ENT226-R1) - A...
Control for Your Cloud Environment Using AWS Management Tools (ENT226-R1) - A...Control for Your Cloud Environment Using AWS Management Tools (ENT226-R1) - A...
Control for Your Cloud Environment Using AWS Management Tools (ENT226-R1) - A...
 
Operationalizing Microsoft Workloads (WIN320) - AWS re:Invent 2018
Operationalizing Microsoft Workloads (WIN320) - AWS re:Invent 2018Operationalizing Microsoft Workloads (WIN320) - AWS re:Invent 2018
Operationalizing Microsoft Workloads (WIN320) - AWS re:Invent 2018
 
Design with Ops in Mind.pdf
Design with Ops in Mind.pdfDesign with Ops in Mind.pdf
Design with Ops in Mind.pdf
 
Executive Security Simulation Workshop (WPS206) - AWS re:Invent 2018
Executive Security Simulation Workshop (WPS206) - AWS re:Invent 2018Executive Security Simulation Workshop (WPS206) - AWS re:Invent 2018
Executive Security Simulation Workshop (WPS206) - AWS re:Invent 2018
 
Best Practices for Centrally Monitoring Resource Configuration & Compliance (...
Best Practices for Centrally Monitoring Resource Configuration & Compliance (...Best Practices for Centrally Monitoring Resource Configuration & Compliance (...
Best Practices for Centrally Monitoring Resource Configuration & Compliance (...
 
How Amazon WorkSpaces Powers the Hands-On Labs (BAP317) - AWS re:Invent 2018
How Amazon WorkSpaces Powers the Hands-On Labs (BAP317) - AWS re:Invent 2018How Amazon WorkSpaces Powers the Hands-On Labs (BAP317) - AWS re:Invent 2018
How Amazon WorkSpaces Powers the Hands-On Labs (BAP317) - AWS re:Invent 2018
 
Compliance and Security Mitigation Techniques
Compliance and Security Mitigation TechniquesCompliance and Security Mitigation Techniques
Compliance and Security Mitigation Techniques
 
Automated Solution for Deploying AWS Landing Zone (GPSWS407) - AWS re:Invent ...
Automated Solution for Deploying AWS Landing Zone (GPSWS407) - AWS re:Invent ...Automated Solution for Deploying AWS Landing Zone (GPSWS407) - AWS re:Invent ...
Automated Solution for Deploying AWS Landing Zone (GPSWS407) - AWS re:Invent ...
 
Hitchhiker's Guide to Cloud Ops
Hitchhiker's Guide to Cloud Ops Hitchhiker's Guide to Cloud Ops
Hitchhiker's Guide to Cloud Ops
 

Similar to Petabyte-Scale Migration to Amazon S3 Building Photobox's Data Lake (STG393) - AWS reInvent 2018.pdf

Under the Hood: How Amazon Uses AWS Services for Analytics at a Massive Scale...
Under the Hood: How Amazon Uses AWS Services for Analytics at a Massive Scale...Under the Hood: How Amazon Uses AWS Services for Analytics at a Massive Scale...
Under the Hood: How Amazon Uses AWS Services for Analytics at a Massive Scale...Amazon Web Services
 
Build Your Own Log Analytics Solutions on AWS (ANT323-R) - AWS re:Invent 2018
Build Your Own Log Analytics Solutions on AWS (ANT323-R) - AWS re:Invent 2018Build Your Own Log Analytics Solutions on AWS (ANT323-R) - AWS re:Invent 2018
Build Your Own Log Analytics Solutions on AWS (ANT323-R) - AWS re:Invent 2018Amazon Web Services
 
How Fannie Mae Processes over a Quarter Million Loans per Day with Amazon S3 ...
How Fannie Mae Processes over a Quarter Million Loans per Day with Amazon S3 ...How Fannie Mae Processes over a Quarter Million Loans per Day with Amazon S3 ...
How Fannie Mae Processes over a Quarter Million Loans per Day with Amazon S3 ...Amazon Web Services
 
Using Tableau and AWS for Fearless Reporting at UMD
Using Tableau and AWS for Fearless Reporting at UMDUsing Tableau and AWS for Fearless Reporting at UMD
Using Tableau and AWS for Fearless Reporting at UMDAmazon Web Services
 
Using AMS to get FSI Regulated Workloads on the Cloud, Fast - AWS Summit Sydn...
Using AMS to get FSI Regulated Workloads on the Cloud, Fast - AWS Summit Sydn...Using AMS to get FSI Regulated Workloads on the Cloud, Fast - AWS Summit Sydn...
Using AMS to get FSI Regulated Workloads on the Cloud, Fast - AWS Summit Sydn...Amazon Web Services
 
Operational Excellence with Containerized Workloads Using AWS Fargate (CON320...
Operational Excellence with Containerized Workloads Using AWS Fargate (CON320...Operational Excellence with Containerized Workloads Using AWS Fargate (CON320...
Operational Excellence with Containerized Workloads Using AWS Fargate (CON320...Amazon Web Services
 
Scaling from zero to millions of users
Scaling from zero to millions of usersScaling from zero to millions of users
Scaling from zero to millions of usersAmazon Web Services
 
Hybrid Cloud Customer Use Cases on AWS
Hybrid Cloud Customer Use Cases on AWSHybrid Cloud Customer Use Cases on AWS
Hybrid Cloud Customer Use Cases on AWSTom Laszewski
 
M&E Leadership Session: The State of the Industry, What's New from AWS for M&...
M&E Leadership Session: The State of the Industry, What's New from AWS for M&...M&E Leadership Session: The State of the Industry, What's New from AWS for M&...
M&E Leadership Session: The State of the Industry, What's New from AWS for M&...Amazon Web Services
 
Lessons Learned from a Large-Scale Legacy Migration with Sysco (STG311) - AWS...
Lessons Learned from a Large-Scale Legacy Migration with Sysco (STG311) - AWS...Lessons Learned from a Large-Scale Legacy Migration with Sysco (STG311) - AWS...
Lessons Learned from a Large-Scale Legacy Migration with Sysco (STG311) - AWS...Amazon Web Services
 
Building a Hybrid Architecture: Enterprise Backup & Recovery (ENT212-S) - AWS...
Building a Hybrid Architecture: Enterprise Backup & Recovery (ENT212-S) - AWS...Building a Hybrid Architecture: Enterprise Backup & Recovery (ENT212-S) - AWS...
Building a Hybrid Architecture: Enterprise Backup & Recovery (ENT212-S) - AWS...Amazon Web Services
 
Introducing AWS DataSync - Simplify, automate, and accelerate online data tra...
Introducing AWS DataSync - Simplify, automate, and accelerate online data tra...Introducing AWS DataSync - Simplify, automate, and accelerate online data tra...
Introducing AWS DataSync - Simplify, automate, and accelerate online data tra...Amazon Web Services
 
Cloud-Based Media Archival, Media Asset Management, & Supply Chain Workflows ...
Cloud-Based Media Archival, Media Asset Management, & Supply Chain Workflows ...Cloud-Based Media Archival, Media Asset Management, & Supply Chain Workflows ...
Cloud-Based Media Archival, Media Asset Management, & Supply Chain Workflows ...Amazon Web Services
 
How UCSD Simplified Data Protection with Rubrik and AWS (STG207-S) - AWS re:I...
How UCSD Simplified Data Protection with Rubrik and AWS (STG207-S) - AWS re:I...How UCSD Simplified Data Protection with Rubrik and AWS (STG207-S) - AWS re:I...
How UCSD Simplified Data Protection with Rubrik and AWS (STG207-S) - AWS re:I...Amazon Web Services
 
ServerlessConf 2018 Keynote - Debunking Serverless Myths
ServerlessConf 2018 Keynote - Debunking Serverless MythsServerlessConf 2018 Keynote - Debunking Serverless Myths
ServerlessConf 2018 Keynote - Debunking Serverless MythsTim Wagner
 
How to Do it Right - Your First 90 Days - AWS Summit Sydney 2018
How to Do it Right - Your First 90 Days - AWS Summit Sydney 2018How to Do it Right - Your First 90 Days - AWS Summit Sydney 2018
How to Do it Right - Your First 90 Days - AWS Summit Sydney 2018Amazon Web Services
 
How a Biotech Firm Streamlined Data Protection on AWS
 How a Biotech Firm Streamlined Data Protection on AWS How a Biotech Firm Streamlined Data Protection on AWS
How a Biotech Firm Streamlined Data Protection on AWSAmazon Web Services
 
Connecting the dots - How Amazon Neptune and Graph Databases can transform yo...
Connecting the dots - How Amazon Neptune and Graph Databases can transform yo...Connecting the dots - How Amazon Neptune and Graph Databases can transform yo...
Connecting the dots - How Amazon Neptune and Graph Databases can transform yo...Amazon Web Services
 
Inventory, Track, and Respond to AWS Asset Changes within Seconds at Scale (S...
Inventory, Track, and Respond to AWS Asset Changes within Seconds at Scale (S...Inventory, Track, and Respond to AWS Asset Changes within Seconds at Scale (S...
Inventory, Track, and Respond to AWS Asset Changes within Seconds at Scale (S...Amazon Web Services
 

Similar to Petabyte-Scale Migration to Amazon S3 Building Photobox's Data Lake (STG393) - AWS reInvent 2018.pdf (20)

Under the Hood: How Amazon Uses AWS Services for Analytics at a Massive Scale...
Under the Hood: How Amazon Uses AWS Services for Analytics at a Massive Scale...Under the Hood: How Amazon Uses AWS Services for Analytics at a Massive Scale...
Under the Hood: How Amazon Uses AWS Services for Analytics at a Massive Scale...
 
Build Your Own Log Analytics Solutions on AWS (ANT323-R) - AWS re:Invent 2018
Build Your Own Log Analytics Solutions on AWS (ANT323-R) - AWS re:Invent 2018Build Your Own Log Analytics Solutions on AWS (ANT323-R) - AWS re:Invent 2018
Build Your Own Log Analytics Solutions on AWS (ANT323-R) - AWS re:Invent 2018
 
How Fannie Mae Processes over a Quarter Million Loans per Day with Amazon S3 ...
How Fannie Mae Processes over a Quarter Million Loans per Day with Amazon S3 ...How Fannie Mae Processes over a Quarter Million Loans per Day with Amazon S3 ...
How Fannie Mae Processes over a Quarter Million Loans per Day with Amazon S3 ...
 
Using Tableau and AWS for Fearless Reporting at UMD
Using Tableau and AWS for Fearless Reporting at UMDUsing Tableau and AWS for Fearless Reporting at UMD
Using Tableau and AWS for Fearless Reporting at UMD
 
Using AMS to get FSI Regulated Workloads on the Cloud, Fast - AWS Summit Sydn...
Using AMS to get FSI Regulated Workloads on the Cloud, Fast - AWS Summit Sydn...Using AMS to get FSI Regulated Workloads on the Cloud, Fast - AWS Summit Sydn...
Using AMS to get FSI Regulated Workloads on the Cloud, Fast - AWS Summit Sydn...
 
Operational Excellence with Containerized Workloads Using AWS Fargate (CON320...
Operational Excellence with Containerized Workloads Using AWS Fargate (CON320...Operational Excellence with Containerized Workloads Using AWS Fargate (CON320...
Operational Excellence with Containerized Workloads Using AWS Fargate (CON320...
 
Scaling from zero to millions of users
Scaling from zero to millions of usersScaling from zero to millions of users
Scaling from zero to millions of users
 
Hybrid Cloud Customer Use Cases on AWS
Hybrid Cloud Customer Use Cases on AWSHybrid Cloud Customer Use Cases on AWS
Hybrid Cloud Customer Use Cases on AWS
 
M&E Leadership Session: The State of the Industry, What's New from AWS for M&...
M&E Leadership Session: The State of the Industry, What's New from AWS for M&...M&E Leadership Session: The State of the Industry, What's New from AWS for M&...
M&E Leadership Session: The State of the Industry, What's New from AWS for M&...
 
Lessons Learned from a Large-Scale Legacy Migration with Sysco (STG311) - AWS...
Lessons Learned from a Large-Scale Legacy Migration with Sysco (STG311) - AWS...Lessons Learned from a Large-Scale Legacy Migration with Sysco (STG311) - AWS...
Lessons Learned from a Large-Scale Legacy Migration with Sysco (STG311) - AWS...
 
Building a Hybrid Architecture: Enterprise Backup & Recovery (ENT212-S) - AWS...
Building a Hybrid Architecture: Enterprise Backup & Recovery (ENT212-S) - AWS...Building a Hybrid Architecture: Enterprise Backup & Recovery (ENT212-S) - AWS...
Building a Hybrid Architecture: Enterprise Backup & Recovery (ENT212-S) - AWS...
 
Introducing AWS DataSync - Simplify, automate, and accelerate online data tra...
Introducing AWS DataSync - Simplify, automate, and accelerate online data tra...Introducing AWS DataSync - Simplify, automate, and accelerate online data tra...
Introducing AWS DataSync - Simplify, automate, and accelerate online data tra...
 
Cloud-Based Media Archival, Media Asset Management, & Supply Chain Workflows ...
Cloud-Based Media Archival, Media Asset Management, & Supply Chain Workflows ...Cloud-Based Media Archival, Media Asset Management, & Supply Chain Workflows ...
Cloud-Based Media Archival, Media Asset Management, & Supply Chain Workflows ...
 
How UCSD Simplified Data Protection with Rubrik and AWS (STG207-S) - AWS re:I...
How UCSD Simplified Data Protection with Rubrik and AWS (STG207-S) - AWS re:I...How UCSD Simplified Data Protection with Rubrik and AWS (STG207-S) - AWS re:I...
How UCSD Simplified Data Protection with Rubrik and AWS (STG207-S) - AWS re:I...
 
ServerlessConf 2018 Keynote - Debunking Serverless Myths
ServerlessConf 2018 Keynote - Debunking Serverless MythsServerlessConf 2018 Keynote - Debunking Serverless Myths
ServerlessConf 2018 Keynote - Debunking Serverless Myths
 
How to Do it Right - Your First 90 Days - AWS Summit Sydney 2018
How to Do it Right - Your First 90 Days - AWS Summit Sydney 2018How to Do it Right - Your First 90 Days - AWS Summit Sydney 2018
How to Do it Right - Your First 90 Days - AWS Summit Sydney 2018
 
How a Biotech Firm Streamlined Data Protection on AWS
 How a Biotech Firm Streamlined Data Protection on AWS How a Biotech Firm Streamlined Data Protection on AWS
How a Biotech Firm Streamlined Data Protection on AWS
 
Connecting the dots - How Amazon Neptune and Graph Databases can transform yo...
Connecting the dots - How Amazon Neptune and Graph Databases can transform yo...Connecting the dots - How Amazon Neptune and Graph Databases can transform yo...
Connecting the dots - How Amazon Neptune and Graph Databases can transform yo...
 
Inventory, Track, and Respond to AWS Asset Changes within Seconds at Scale (S...
Inventory, Track, and Respond to AWS Asset Changes within Seconds at Scale (S...Inventory, Track, and Respond to AWS Asset Changes within Seconds at Scale (S...
Inventory, Track, and Respond to AWS Asset Changes within Seconds at Scale (S...
 
BI & Analytics
BI & AnalyticsBI & Analytics
BI & Analytics
 

More from Amazon Web Services

Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Amazon Web Services
 
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Amazon Web Services
 
Esegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateEsegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateAmazon Web Services
 
Costruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSCostruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSAmazon Web Services
 
Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Amazon Web Services
 
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Amazon Web Services
 
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...Amazon Web Services
 
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsMicrosoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsAmazon Web Services
 
Database Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareDatabase Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareAmazon Web Services
 
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSCrea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSAmazon Web Services
 
API moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAPI moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAmazon Web Services
 
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareDatabase Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareAmazon Web Services
 
Tools for building your MVP on AWS
Tools for building your MVP on AWSTools for building your MVP on AWS
Tools for building your MVP on AWSAmazon Web Services
 
How to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckHow to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckAmazon Web Services
 
Building a web application without servers
Building a web application without serversBuilding a web application without servers
Building a web application without serversAmazon Web Services
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...Amazon Web Services
 
Introduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceIntroduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceAmazon Web Services
 

More from Amazon Web Services (20)

Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
 
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
 
Esegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateEsegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS Fargate
 
Costruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSCostruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWS
 
Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot
 
Open banking as a service
Open banking as a serviceOpen banking as a service
Open banking as a service
 
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
 
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
 
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsMicrosoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
 
Computer Vision con AWS
Computer Vision con AWSComputer Vision con AWS
Computer Vision con AWS
 
Database Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareDatabase Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatare
 
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSCrea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
 
API moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAPI moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e web
 
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareDatabase Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
 
Tools for building your MVP on AWS
Tools for building your MVP on AWSTools for building your MVP on AWS
Tools for building your MVP on AWS
 
How to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckHow to Build a Winning Pitch Deck
How to Build a Winning Pitch Deck
 
Building a web application without servers
Building a web application without serversBuilding a web application without servers
Building a web application without servers
 
Fundraising Essentials
Fundraising EssentialsFundraising Essentials
Fundraising Essentials
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
 
Introduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceIntroduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container Service
 

Petabyte-Scale Migration to Amazon S3 Building Photobox's Data Lake (STG393) - AWS reInvent 2018.pdf

  • 1.
  • 2. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Petabyte-scale Migrations to Amazon S3: How Photobox Moved Europe’s Largest Photo Library to AWS S T G 3 9 3 Chris Astall Chief Architect Photobox Group Giorgio Bonfiglio Sr. TAM AWS
  • 3. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Agenda Photobox’s journey to AWS History of a large photo library Moving to Amazon Simple Storage Service (Amazon S3) Challenges in large-scale migrations Re-designing the storage and serving architecture What’s next?
  • 4. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
  • 5. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Group history
  • 6. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Photobox’s journey to AWS Started moving in 2012 the web platform was 100% on AWS by 2015 a lift and shift evolved in a cloud native containerized infrastructure Photo storage wasn’t immediately migrated no significant reasons to move not expected to be cost effective a migration would have been an hard technical challenge on-premise equipment was still fairly new and supported
  • 7. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
  • 8. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Growth in photos During 2017 uploads peaked at 150 millions a month and 1.2 millions in an hour
  • 9. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Photobox’s photo library 5.7 billion individual photos / 9 Petabytes • each of them stored in three different resolutions: • full-size ~ 1.6 MB • scaled photo < 100 KB • thumbnail < 10 KB Growing at a rate of 1.2 B/year … expected to keep increasing Peaking at 120k GET/s and 26k PUT/s
  • 10. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Access patterns Only 48% of photos uploaded are printed Of those: • 41% are printed on the day they are uploaded • 83% within 30 days • … and 95% within 180 days
  • 11. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
  • 12. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. The on-premise architecture
  • 13. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
  • 14. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Why moving? Cloud first strategy Photobox had already migrated the website into AWS and improved the operating model Cost About to face large capital investments for hardware expansion and replacement Cloud storage cost kept dropping consistently Ecosystem AWS services can be leveraged to enhance the upload and serving workflow
  • 15. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Why moving? (continued) Improved security access control in AWS is more flexible and granular than on-premise Improved resilience and flexibility any given photo was stored in a single datacenter uploads were “sharded” between the two datacenters (AMS and PAR) storage arrays took months to scale up
  • 16. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. How to move? Early dilemmas: network vs AWS Snowball Edge with the existing 4 Gbps internet uplink the migration would have taken ¾ of a year upgrading the production network was slow and expensive There was no need to move everything scaled photos and thumbnails can be rebuilt directly into AWS … or on the fly Snowball Edge proved to be a good fit 82 TB of useable storage per device ~110 of them needed to carry out the migration
  • 17. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Integrity checking After the first use photos are hardly ever accessed again impossible to spot corrupted objects the migration strategy/workflow must immediately fix any inconsistency Hashing comes to the rescue! in our use case the ETag of objects always matched their MD5 hash (careful: This is not always true) easy to query it from Amazon S3 Inventory and compare to what has been computed at the very beginning of the process
  • 18. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
  • 19. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. The migration workflow
  • 20. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
  • 21. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Challenges … at scale Network congestion the on-premise network had never been tested at 100% traffic flows during a migration are “atypical” Handling an huge amount of small objects the overhead of every single operation becomes significant both the data extraction and loading process had to be optimized with this in mind
  • 22. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Challenges … at scale (continued) Tracking migration status needed to follow the status of every single object … this results in a massive database and heavy data flow Logistics in the physical world in EU UPS doesn’t deliver or collect on weekends next day delivery … doesn’t always happen next day
  • 23. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
  • 24. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Storage tiering Photos are heavily accessed in the first 90 days When they are needed access time should be in ms A mix of Amazon S3 and Amazon S3 Infrequent Access helps achieving the performance and cost targets Why not Amazon Glacier?
  • 25. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Storage architecture How to choose the right file name? we had to consider how AWS partitions data in Amazon S3 and our R/W throughput we used an MD5 hash of the photo name to generate a random prefix What is the right bucket design? group by prefix group by bucket name
  • 26. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
  • 27. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. The serving architecture
  • 28. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
  • 29. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Evolution of the serving architecture Serverless as a natural evolution Low-management overhead Can scale instantly Multi-region is the next big step Keeping the live user experience consistent in case of a single region outage Amazon S3 One Zone-Infrequent Access empowers this model
  • 30. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Evolution of the user journey Amazon S3 is only the foundation Now that the photo assets are in AWS they can be retrieved/used right away A world of possibilities Amazon Rekognition can be leveraged to radically change Photobox’s customer journey Magic Book Using AI to curating customer photos to enhance their experience
  • 31. Thank you! © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Chris Astall chris.astall@photobox.com Giorgio Bonfiglio bonfigg@amazon.com
  • 32. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.