In this session we discuss the numerous ways to ingest data into AWS, including options such as physical media import and Direct Connect. We also cover policy-based Hierarchical Storage Management (HSM) in the cloud, total cost of ownership, the importance of storage durability, and the virtually unlimited scalability of Amazon S3. Finally, Alan Schaaf, founder of the photo-sharing sensation Imgur, speaks about Imgur's migration to AWS.
23. HSM with AWS versus Traditional Approach to HSM
[Diagram. With AWS: SAN in the corporate data center, with Amazon S3 and Amazon Glacier in the AWS Cloud. Traditional approach: SAN, tier 2 disk storage, storage backup, and offsite tape, all in the corporate data center.]
24. Compliance with AWS versus Traditional Approach
[Diagram. With AWS: O/S image, Amazon S3, and Amazon Glacier in the AWS Cloud. Traditional approach: O/S image, disk storage, backup, and off-site tape in the corporate data center.]
31. [Diagram: video workflow with stages numbered 1 to 6. Labels: HTTP multipart; fasp; Parallel Transcoding, 14 instances: 3 min; Herndon, VA.]
1. Video broadcast capture
2. High-speed upload Direct-to-S3
3. Scale out parallel transcode
4. Deliver back to S3
5. High-speed download from S3 to UFC
6. Insert into CMS for streaming to mobile devices
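The diagram for this slide mentions HTTP multipart transfer. As a rough illustration of what a multipart upload has to plan before any bytes move, the sketch below splits a file into numbered byte ranges that could be sent in parallel. The function name `part_ranges` and the 64 MiB default part size are assumptions made for this example, not details from the talk; the 5 MiB minimum part size and the 10,000-part ceiling are S3's published multipart limits.

```python
import math

MIN_PART_SIZE = 5 * 1024 * 1024   # S3 requires >= 5 MiB for every part but the last
MAX_PARTS = 10_000                # S3 allows at most 10,000 parts per upload


def part_ranges(total_size, part_size=64 * 1024 * 1024):
    """Return (part_number, start, end_exclusive) tuples covering the file.

    Hypothetical helper: part numbers start at 1, as S3 expects.
    """
    if total_size <= 0:
        raise ValueError("total_size must be positive")
    if part_size < MIN_PART_SIZE:
        raise ValueError("part_size is below the S3 minimum")
    count = math.ceil(total_size / part_size)
    if count > MAX_PARTS:
        raise ValueError("too many parts; choose a larger part_size")
    return [
        (n + 1, n * part_size, min((n + 1) * part_size, total_size))
        for n in range(count)
    ]


if __name__ == "__main__":
    # A 150 MiB capture file splits into three parts with the default size.
    for number, start, end in part_ranges(150 * 1024 * 1024):
        print(number, start, end)
```

Each range can then be handed to a separate worker, which is what makes the upload step scale with available bandwidth.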
35. What is Imgur?
• A simple image sharer
• Has the most viral images on the Internet
• Anyone can upload as many images as they want – without an account
• 2,000,000 images uploaded per day
• That’s 23 images per second
• Can be embedded and shared on any site
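The per-second figure follows directly from the daily upload count. A quick check of the arithmetic, where the helper name `per_second` is just an illustrative choice:

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def per_second(per_day):
    """Convert a daily count into an average per-second rate."""
    return per_day / SECONDS_PER_DAY


if __name__ == "__main__":
    # 2,000,000 / 86,400 is roughly 23.1, which rounds to the slide's figure.
    print(round(per_second(2_000_000)))  # prints 23
```

This is an average; the talk does not give peak rates, which would be higher.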
36. The greatest image site. Full of all the wonders
and magic of the interwebs. Be forewarned, time has
been known to quicken in this realm.
“I spent half a day on Imgur, and it was the greatest 6 hours
of my life.”
- Urban Dictionary
38. • Started as a side project while at Ohio University
• Redditors needed a place to host their images
• Organically grew into a business
• Alan was the only developer for 3 years
• Moved to San Francisco
• Now a team of 7
• (600 million pageviews per engineer)
39. Every month, there are:
• 2.9 billion page views
• 38 billion image views (images loaded)
• 54 million unique visitors
• 4.7 petabytes of bandwidth used
• 600 million objects stored in S3
• 62 million images uploaded
• 11 minutes average visit duration
• 11 pages per visit
• 46th biggest site in the US (according to alexa.com)
* All data as of Nov 2012
40. • Pageviews are growing 15% every month.
• How are we able to support this kind of growth?
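To get a feel for what 15% monthly growth implies, the sketch below compounds it over a year and estimates the doubling time. It assumes the rate stays constant, which the talk does not claim, so treat the output as a back-of-the-envelope projection. The function names are illustrative.

```python
import math


def growth_factor(monthly_rate, months):
    """Compound a constant monthly growth rate over a number of months."""
    return (1 + monthly_rate) ** months


def doubling_time_months(monthly_rate):
    """Months needed to double at a constant monthly growth rate."""
    return math.log(2) / math.log(1 + monthly_rate)


if __name__ == "__main__":
    factor = growth_factor(0.15, 12)
    print(f"12-month growth factor: {factor:.2f}x")
    print(f"doubling time: {doubling_time_months(0.15):.1f} months")
    # Applied to the 2.9 billion monthly page views quoted earlier (assumption:
    # the rate holds for a full year).
    print(f"projected monthly page views: {2.9e9 * factor:.2e}")
```

At this rate traffic roughly doubles every five months, which is why a fixed server fleet is hard to plan around.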
41. User makes a request for an image
(Don’t do this!)
46. • Site traffic is increasing more than ever. How many more servers do we need?
• Hardware failures
• Tweaking every little thing is really hard and easy to get wrong, but necessary
• There’s only one man doing all this; how can we make his life easier, while scaling the site at the same time?
47. • Autoscaling is awesome
• Automated DB backups are awesome
• Security features are awesome
• Much easier to manage in the long run
• Because everything’s managed, you require fewer admins to look over everything all the time
• AWS has managed solutions for all the core services your website needs (server, database, cache, backups, security, etc.)
48. • Lots of new stuff to learn and set up
• Possible downtime during migration
• Very time consuming at first because you’re reconfiguring your entire stack
49. • AWS has a lot of services; find out which ones can work for you and how
• Use the price calculator:
http://calculator.s3.amazonaws.com/calc5.html
• Read the docs: http://aws.amazon.com
• Set up a test environment
• Install the AWS SDK
• Call AWS if you have questions.
• (You don’t need AWS Support to call in)
• Start coding!
50. • How do you get all your data to S3?
• Duplicate writes: 1 to native, 1 to S3.
• Upload all your data to S3 in parallel. We had 12 background processes running around the clock, each uploading a different subset of data to S3 – it took 2 weeks to finish.
• No need to store more than one copy
• Turn on versioning for even more protection
• Very similar process for Amazon RDS
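The migration pattern on this slide, duplicate writes for new data plus a parallel backfill of existing data, can be sketched as follows. This is a minimal illustration under stated assumptions, not Imgur's actual code: `upload`, `write_native`, and `write_s3` are hypothetical callables supplied by the caller (in practice they would wrap an S3 client and the existing storage layer), and the sketch uses threads in one process where the talk describes 12 separate background processes.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(keys, workers):
    """Split keys into `workers` disjoint subsets, round-robin."""
    return [keys[i::workers] for i in range(workers)]


def backfill(keys, upload, workers=12):
    """Upload every key once, with one worker per disjoint subset.

    `upload` is a hypothetical callable taking a single key.
    Returns the number of keys processed.
    """
    def run(subset):
        for key in subset:
            upload(key)
        return len(subset)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(run, partition(keys, workers)))


def dual_write(key, data, write_native, write_s3):
    """Write new data to both stores while the backfill is still running."""
    write_native(key, data)
    write_s3(key, data)


if __name__ == "__main__":
    uploaded = []
    total = backfill([f"img{i}" for i in range(100)], uploaded.append)
    print(total, len(uploaded))
```

Because the subsets are disjoint, no object is uploaded twice by the backfill, and the duplicate writes cover anything created after the backfill starts.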
51. • There’s no web interface
• Have to do everything from command line
• Confusing terminology
• Hard to verify that it’s working as intended
• But in the end, it’s amazing
• If you’re not using it, you’d better have a really good reason why
52. • EC2:
• Maximum performance with RAID0 Elastic Block Store
• RAID0 EBS requires a pretty significant amount of maintenance overhead
• Have to come up with your own backup plan
• RDS:
• Will provide very good performance out of the box (but not maximum)
• Management console is fantastic
• Easy to upgrade instances
• High availability and read-only slaves are a click away
• Managed service, which makes it more expensive
• If you enjoy tuning every last little bit for maximum performance, then you can consider EC2 + EBS RAID 0
• Still on the fence? Go with RDS
53. • There’s no access to the underlying file system
• Migrating requires a dump and an import of your data, which is extremely time consuming for large databases
• No access to the logs when things break
• We were able to do it live – without taking the site down – but with lots of headaches
56. • Wed (1:00 p.m.–1:50 p.m.) MED203: Scalable Media Processing with AWS
• Wed (1:00 p.m.–1:50 p.m.) STG201: Understanding AWS Storage Options
• Wed (2:05 p.m.–2:55 p.m.) MED202: Netflix’s Transcoding Transformation
• Wed (3:25 p.m.–4:15 p.m.) MED303: Addressing Security in Media Workflow
• Thu (10:30 a.m.–11:20 a.m.) STG205: Amazon S3: Reduce costs, save time, and better protect your data
• Thu (11:35 a.m.–12:25 p.m.) STG203: Cloud Storage War Stories: From the front lines of some of the biggest battles
• Thu (4:05 p.m.–4:55 p.m.) STG302: Archive in the Cloud with Amazon Glacier
http://aws.amazon.com/s3/
http://aws.amazon.com/glacier/faqs/
http://aws.amazon.com/digital-media/
57. We are sincerely eager to hear your feedback on this presentation and on re:Invent. Please fill out an evaluation form when you have a chance.
Tweet #reinvent