Media Processing in the Cloud

Deepak Singh
Principal Product Manager
media today
lots of content
professional content




Image: chapter.one
Image: Tonz
everything in
  between
streaming
devices
Image: pennstatelive
Image: NASA
higher resolutions
3D
so what?
lots and lots and lots
and lots and lots and
lots and lots and lots
   and lots of data
lots of compute
how can the
cloud help?
let us orchestrate
   a processing
    application
I want to process data, for example encode movies, …

[Diagram: Job Queue, EC2 Instance (Data Processing)]
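The queue-and-worker idea on this slide can be sketched in a few lines. This is only an illustration under stated assumptions: Python's in-memory `queue.Queue` stands in for the job queue, and `encode` is a hypothetical placeholder for the real processing step. Neither name comes from the talk.

```python
import queue


def encode(job):
    """Hypothetical stand-in for real work such as transcoding a movie."""
    return "encoded:" + job


def run_worker(jobs):
    """Drain the queue one job at a time and return the results."""
    results = []
    while True:
        try:
            job = jobs.get_nowait()  # stand-in for receiving a message from the job queue
        except queue.Empty:
            break  # no work left, so the worker can stop
        results.append(encode(job))
        jobs.task_done()  # analogous to deleting the message once it is processed
    return results


# Assumed example input: two job names placed on the queue.
jobs = queue.Queue()
for title in ["movie-a", "movie-b"]:
    jobs.put(title)

results = run_worker(jobs)
print(results)
```

The worker stops as soon as the queue is empty, which is the property the later slides rely on when capacity is scaled down.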
… and store the results in S3.

[Diagram: Job Queue, EC2 Instance (Data Processing), S3]
And I want to be notified on completion

[Diagram: Job Queue, EC2 Instance (Data Processing), S3, Result e-mail]
I like using SNS, because …

[Diagram: Job Queue, EC2 Instance (Data Processing), S3, Topic, Result e-mail]
I like using SNS, because …
I can integrate other systems via SQS, or HTTP(S) web-hooks

[Diagram: Job Queue, EC2 Instance (Data Processing), S3, Topic, Result e-mail, SQS, HTTP]
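The fan-out that makes a topic attractive here can be sketched as a tiny in-memory publish/subscribe class. This is not the SNS API: `Topic`, `record`, and the endpoints are hypothetical names chosen for illustration, and delivery is just a function call.

```python
class Topic:
    """Minimal in-memory stand-in for a notification topic."""

    def __init__(self, name):
        self.name = name
        self._subscriptions = []

    def subscribe(self, protocol, endpoint, deliver):
        # Register one subscriber: a protocol label, an endpoint, and a delivery callable.
        self._subscriptions.append((protocol, endpoint, deliver))

    def publish(self, message):
        # Deliver the same message to every subscriber and report how many received it.
        for protocol, endpoint, deliver in self._subscriptions:
            deliver(protocol, endpoint, message)
        return len(self._subscriptions)


delivered = []


def record(protocol, endpoint, message):
    """Hypothetical delivery function that only records what would be sent."""
    delivered.append((protocol, endpoint, message))


topic = Topic("job-results")
topic.subscribe("email", "ops@example.com", record)
topic.subscribe("sqs", "downstream-queue", record)
topic.subscribe("https", "https://example.com/hook", record)

count = topic.publish("movie-a encoded")
print(count, delivered)
```

The worker publishes once and does not need to know who is listening, so adding an SQS or HTTP(S) consumer later does not change the worker.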
I should be done ... right?

[Diagram: Job Queue, EC2 Instance (Data Processing), S3, Topic, Result e-mail]
Not quite – I’m also cost conscious …
Don’t want to pay anything when there’s no work

[Diagram: Autoscaling group with an empty Job Queue and no instances (Charge $0); Topic, Result e-mail, S3 (Data Processing)]
Need a controller …

[Diagram: Capacity Control (on-demand launch) above the Autoscaling group; Job Queue, Charge $0, Topic, Result e-mail, S3 (Data Processing)]
Measure … Detect work …

[Diagram: Capacity Control (on-demand launch) with a CW Metric (#msgs) and a CW Alarm; Autoscaling group with Job Queue, Charge $0; Topic, Result e-mail, S3 (Data Processing)]
When there’s a message queued …

[Diagram: Capacity Control (on-demand launch) with CW Metric (#msgs), CW Alarm, and an Autoscaling Policy "add instance if #msg > 0"; Autoscaling group with Job Queue, Charge $0; Topic, Result e-mail, S3 (Data Processing)]
When there’s a message queued …
start an EC2 instance

[Diagram: Capacity Control (on-demand launch) with CW Metric (#msgs), CW Alarm, and an Autoscaling Policy "add instance if #msg > 0"; Autoscaling group with Job Queue and one EC2 Instance; Topic, Result e-mail, S3 (Data Processing)]
... or 5000 EC2 instances

[Diagram: Capacity Control (on-demand launch) with CW Metric (#msgs), CW Alarm, and an Autoscaling Policy "keep adding instances if #msg > 0"; Autoscaling group with Job Queue and 5000 EC2 Instances; Topic, Result e-mail, S3 (Data Processing)]
When the work is done …

[Diagram: Capacity Control (on-demand launch) with CW Metric (#msgs), CW Alarm, and an Autoscaling Policy "add instance if #msg > 0"; Autoscaling group with Job Queue and one EC2 Instance; Topic, Result e-mail, S3 (Data Processing)]
When the work is done …
Terminate the EC2 instance

[Diagram: Capacity Control (on-demand launch) with CW Metric (#msgs), CW Alarm, and an Autoscaling Policy "add instance if #msg > 0"; Autoscaling group with Job Queue and the instance marked TERM'D; Topic, Result e-mail, S3 (Data Processing)]
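The scale-out and scale-to-zero behaviour walked through on these slides can be sketched as a small decision function. This is a simplified model, not the Auto Scaling API: `desired_capacity` and `simulate` are hypothetical names, each list entry is an assumed reading of the "#msgs" metric, and scaling straight to zero on an empty queue is an assumption made for illustration.

```python
def desired_capacity(queued_messages, current_instances, max_instances=5000):
    """Return the instance count to aim for after one alarm evaluation."""
    if queued_messages > 0:
        # "add instance if #msg > 0", capped at the group's maximum size
        return min(current_instances + 1, max_instances)
    # Queue is empty: terminate everything so nothing is billed while idle.
    return 0


def simulate(queue_depths, max_instances=5000):
    """Apply the policy to a series of queue-depth readings, starting from zero instances."""
    capacity = 0
    history = []
    for depth in queue_depths:
        capacity = desired_capacity(depth, capacity, max_instances)
        history.append(capacity)
    return history


# Assumed readings: work arrives, is worked off, then the queue empties.
history = simulate([3, 3, 2, 0])
print(history)
```

A real setup would typically use separate scale-out and scale-in policies with cooldowns. The sketch only shows the shape of the rule: grow while messages are queued, drop to nothing when they are not.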
For full production use, add IAM Users, Permissions, Policies, Operational Alarm, etc.

[Diagram: Capacity Control (on-demand launch) with CW Metric (#msgs), CW Alarm, and an Autoscaling Policy "add instance if #msg > 0"; Autoscaling group with Job Queue and EC2 Instance; Topic, Result e-mail, S3 (Data Processing); IAM Users, IAM Permissions, and IAM Policies attached to the components]
On Demand SQS Worker Example

[Diagram: Capacity Control (on-demand launch) with CW Metric (#msgs), CW Alarm, and an Autoscaling Policy "add instance if #msg > 0"; Autoscaling group with Job Queue and EC2 Instance; Topic, Result e-mail, S3 (Data Processing)]
No glue-scripting (or UI code) required …
We let you focus on the business problem!
Nested templates: enable custom abstractions

[Diagram: four copies of the same Job Queue and S3 stack]
One substack per e.g. Video format to encode

[Diagram: Parent template/stack, parameterized by e.g. format/resolution. A Topic ("Job: Render X") and four Job Queue substacks (Encode for low-res, …; Encode for high-res, …; two unlabeled), each with S3]
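A parent template with one nested stack per format could be generated along these lines. This is a sketch under assumptions: the child template URL and its `Format` parameter are hypothetical, and only the `AWS::CloudFormation::Stack` resource type with its `TemplateURL` and `Parameters` properties is taken from CloudFormation itself.

```python
import json


def parent_template(formats, child_template_url):
    """Build a parent template with one nested stack resource per format."""
    resources = {}
    for fmt in formats:
        # Logical IDs must be alphanumeric, so "low-res" becomes "EncodeLowRes".
        logical_id = "Encode" + "".join(part.capitalize() for part in fmt.split("-"))
        resources[logical_id] = {
            "Type": "AWS::CloudFormation::Stack",
            "Properties": {
                "TemplateURL": child_template_url,  # hypothetical child template location
                "Parameters": {"Format": fmt},      # hypothetical child parameter name
            },
        }
    return {"AWSTemplateFormatVersion": "2010-09-09", "Resources": resources}


template = parent_template(
    ["low-res", "high-res"],
    "https://example.com/encode-worker.template",
)
print(json.dumps(template, indent=2))
```

Every substack reuses the same child template and differs only in its parameters, which is the custom abstraction the slide refers to.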
4
1. Infrastructure
ec2-run-instances
on demand
  global
 secure
programmable
elastic
instance types
t1.micro




  standard (m1)
high memory (m2)
  high CPU (c1)
high performance
cluster computing
MPI
bandwidth intensive
Cluster Compute
    Instance
2*Intel Xeon 5570
   23 GB RAM
   1.7 TB disk
      HVM
10 Gigabit Ethernet
Placement
  Group
full-bisection




            Placement
              group
linpack
Cores   7040
Rmax    41.82
Rpeak   82.51
231
451
WIEN2K Parallel Performance

H size 56,000 (25GB)
Runtime (16x8 processors)
  Local (Infiniband)  3h:48
  Cloud (10Gbps)      1h:30 ($40)

1200 atom unit cell; SCALAPACK+MPI diagonalization, matrix size 50k-100k

Credit: K. Jorissen, F. D. Villa, and J. J. Rehr (U. Washington)
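The two runtimes quoted on this slide can be compared with a little arithmetic. The helper name `to_minutes` is hypothetical; the input strings are the slide's own figures.

```python
def to_minutes(runtime):
    """Convert a runtime written like '3h:48' into minutes."""
    hours, minutes = runtime.split("h:")
    return int(hours) * 60 + int(minutes)


local_minutes = to_minutes("3h:48")  # Local (Infiniband)
cloud_minutes = to_minutes("1h:30")  # Cloud (10Gbps)

# Ratio of local runtime to cloud runtime for this one benchmark run.
speedup = local_minutes / cloud_minutes
print(local_minutes, cloud_minutes, round(speedup, 2))
```

The ratio describes this single reported run and says nothing about other workloads.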
HPC is evolving
2*Intel Xeon 5570
   22 GB RAM
   1.7 TB disk
      HVM
2*NVidia M2050
optimizing costs
on-demand
reserved
spot
2. Orchestration
AWS CloudFormation
bootstrap
Cloud Init
chef/puppet
familiar tools
Oracle Grid Engine
LSF
Condor
combining worlds
MIT StarCluster
$ starcluster start mycluster
$ starcluster listclusters
http://www.bioteam.net/2011/03/dude-you-got-some-chef-in-my-starcluster/
30,472 cores
$1279/hr
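Dividing the two figures on this slide gives a rough price per core-hour. This is plain arithmetic on the quoted numbers and assumes the hourly price covers all of the cores.

```python
cores = 30472            # cores quoted on the slide
dollars_per_hour = 1279  # hourly price quoted on the slide

# Rough price of one core for one hour, in dollars.
cost_per_core_hour = dollars_per_hour / cores
print(round(cost_per_core_hour, 3))
```

That comes to roughly four cents per core-hour.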
Big Data
Amazon
Elastic MapReduce
[Diagram: S3 (input data)]
[Diagram: S3 (input data); Code and Elastic MapReduce]
[Diagram: adds a Name node]
[Diagram: adds an Elastic cluster]
[Diagram: adds HDFS]
[Diagram: adds Queries + BI via JDBC, Pig, Hive]
[Diagram: adds Output to S3 + SimpleDB]
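The processing model behind this pipeline can be illustrated with a toy map, shuffle, and reduce over in-memory records. This is a single-process sketch, not Elastic MapReduce or Hadoop code, and the records (media format, size in MB) are made-up example data.

```python
from collections import defaultdict


def map_phase(records):
    """Emit one (key, value) pair per input record."""
    for media_format, size_mb in records:
        yield media_format, size_mb


def shuffle(pairs):
    """Group values by key, as the framework would do between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Combine each key's values into a single total."""
    return {key: sum(values) for key, values in groups.items()}


# Made-up input: (format, size in MB) for three files.
records = [("mp4", 700), ("mkv", 1200), ("mp4", 300)]
totals = reduce_phase(shuffle(map_phase(records)))
print(totals)
```

On a real cluster the map and reduce steps run in parallel across nodes, with input and output held in S3 or HDFS as in the diagrams.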
3. Applications
the layer of
innovation
NASA JPL
Netflix needed to transcode 17,000 titles (80TB of data) to support the launch of Sony PS3. They provisioned 1200 Amazon EC2 instances and completed the transcoding process in just days.




Source: Adrian Cockcroft (Netflix)
http://vimeo.com/judpratt
“Our tests have shown more than 90
percent scaling efficiency on
clusters of up to 128 GPUs each”
4. People
constraints
everywhere
Hardware:         CPU, storage, memory
Data management:  Collections, datasets, provenance
Software:         parallelization, optimization
Availability:     Backup, redundant, replicated
Cost:             Small
where should we
  optimize?
Image: Pieter Musterd
removing constraints
undifferentiated
  heavy lifting
focus on value
faster
more
rendering
compositing
transcoding
creating art
Image: Chris Dagdigian
4
1. Infrastructure
2. Orchestration
3. Applications
4. People
creating
content
deesingh@amazon.com
Twitter: @mndoci
http://slideshare.net/mndoci
http://mndoci.github.com

Inspiration and ideas from Matt Wood & Larry Lessig

Credit: Oberazzi under a CC-BY-NC-SA license
