SlideShare una empresa de Scribd logo
1 de 124
SVC201 - Automate Your Big Data Workflows
Jinesh Varia, Technology Evangelist
@jinman

November 14, 2013

© 2013 Amazon.com, Inc. and its affiliates. All rights reserved. May not be copied, modified, or distributed in whole or in part without the express consent of Amazon.com, Inc.
Automating Big Data Workflows
Automating Compute

Automating Data

Worker
Activity
Decider

Data Node

Worker

Amazon SWF

AWS Data Pipeline
Amazon SWF – Your Distributed State Machine in the Cloud

Amazon SWF

Worker
Starters

Activity
Worker

AWS Management
Console

History

Activity
Worker

Decider

SWF helps you scale
your business logic
- Data/science Architect, Manager
Tim James
Vijay Ramesh - Data/science Engineer
the world's largest petition platform
At Change.org in the last year

• 120M+ signatures — 15% on victories
• 4000 declared victories
This works.
How?
60-90% signatures at Change.org
driven by email
This works.
*

This works.

* up to a point!
Manual Targeting doesn‟t
Manual Targeting doesn‟t scale

cognitively.
Manual Targeting doesn‟t scale

in personnel.
Manual Targeting doesn‟t scale

into mass
customization.
Manual Targeting doesn‟t scale

culturally or
internationally.
Manual Targeting doesn‟t scale

with data size
and load.
So what did we do?
We used big-compute machine learning
to automatically target our mass emails
across each week‟s set of campaigns.
We started from here...
And finished here...
First: Incrementally extract (and verify)
MySQL data to Amazon S3
Best Practice:
Incrementally extract with high watermarking.
(not wall-clock intervals)
Best Practice:
Verify data continuity after extract.
We used Cascading/Amazon EMR + Amazon SNS.
Transform extracted data on S3 into “Feature Matrix”
using Cascading/Hadoop on Amazon Elastic MapReduce
100-instance EMR cluster
A Feature Matrix is just a text file.

Sparse vector file line format, one line per user.

<user_id>[ <feature_id>:<feature_value>]...
Example:

123

12:0.237

18:1

101:0.578
So

how

do we do

big-compute
Machine Learning?
Enter Amazon

• Simple Workflow Service SWF
• Elastic Compute Cloud
EC2
SWF and EC2 allowed us to decouple:

•
•
•

Control (and error) flow
Task business logic
Compute resource provisioning
SWF provides a distributed application model
SWF provides a distributed application model
Decider processes make discrete workflow
decisions
Independent task lists (queues) are processed by
task list-affined worker processes
(thus coupling task types to provisioned resource types)
SWF provides a distributed application model
Allows deciders and workers to be implemented
in any language.

We used Ruby
with ML calculations done by Python, R, or C.
SWF provides a distributed application model
Rich web interface via
the AWS Management Console.
Flexible API for control and monitoring.
Resource Provisioning with EC2

Our EC2 instances each provide service
via Simple Workflow Service
for a single Feature Matrix file.
Simplifying Assumption:

Full feature matrix file fits on disk
of a m1.medium EC2 instance
(although we compute it with 100-instance EMR cluster)
Best Practice:

Treat compute resources as

hotel rooms, not mansions.
Worker EC2 Instance bootstrap from base
Amazon Machine Image (AMI)
EC2 instance tags provide highly visible, searchable
configuration.
Update local git repo to configured software version.
EC2 instance tags
Best Practice:
Log bootstrap steps to S3
mapping essential config tags to EC2 instance names and log files
Amazon SWF and EC2
allowed us to build a
common reliable scaffold
for R&D and production
Machine Learning
systems.
Provisioning in R&D for Training
• Used 100 small EC2 instances to explore the
Support Vector Machine (SVM) algorithm
to repeatedly brute-force search a 1000-combination
parameter space
• Used a 32-core on-premises box
to explore a Random Forest implementation in
multithreaded Python
Provisioning in Production
Start n m3.2xlarge EC2 instances on-demand
for each campaign in the sample group
• Train with single SWF worker using multiple cores
(python multithreaded Random Forest)
• Predict with 8 SWF workers — 1 per core, 4 cores per instance
Provisioning in Production
Best Practice:

Use Amazon SWF
to decouple and defer crucial
provisioning and
application design decisions
until you’re getting results.
Forward scale

So from here,
how can we expect this system to scale?
Forward scale
for 10x users

• Run more EMR instances
to build Feature Matrix
• Run more SWF predict workers
per campaign
Forward scale
for 10x campaigns

• already automatically start a SWF worker
group per campaign
• ―user generated campaigns‖ require no
campaigner time and are targeted
automatically
Forward scale
for 2x+ campaigners

• system eliminates mass email targeting
contention, so team can scale
Win for our Campaigners... and Users.
Our user base can now be automatically segmented
across a wide pool of campaigns, even internationally.

30%+ conversion boost over manual targeting.
Do you build systems like these?
Do you want to?

We‟d love to talk.
(And yes, we‟re hiring.)
UNSILO
Dr. Francisco Roque, Co-Founder and CTO
A collaborative search platform
that helps you see patterns
across Science & Innovation
Mission

UNSILO breaks down silos and makes it easy and fast for
you to find relevant knowledge written in domain-specific
terminologies
Unsilo
Describe

Discover

Analyze & Share
New way of
searching
Big Data Challenges
4.5 million USPTO granted patents
12 million scientific articles
Heterogeneous processing pipeline
(multiple steps, variable times)
A small test

1000 documents
20 minutes/doc average
A bigger test

100k documents
3.8 years?
A bigger test

100k documents
8x8 cores
~21 days
4.5 million patents?
12 million articles?
Focus on the goal
Amazon SWF to the rescue
•
•
•
•
•

Scaling
Concurrency
Reliability
Flexibility to experiment
Easily adaptable
SWF makes it very easy to
separate algorithmic logic
and workflow logic
Easy to get started:
First document batch
running in just 2 weeks
AWS services
Adding content
Job Loading
• Content loaded by
traversing S3 buckets
• Reprocessing by
traversing tables on
DynamoDB

DynamoDB
Decision Workers
• Crawls Workflow History
for Decision Tasks
• Schedules new Activity
Tasks

DynamoDB
Activity Workers
• Read/write to S3
• Status in DynamoDB
• SWF task inputs passed
between workflow steps
• Specialized workers

DynamoDB
Best practice
Use DynamoDB for content status
Index on different columns (local indexes)
More efficient content status queries
Give me all the items that completed step X

Elastic service!
Key to scalability
File organization on S3 for scalability
– 50 req/s naïve approach
– >1500 req/seq
logs/2013-11-14T23:01:34/...
logs/2013-11-14T23:01:23/...
logs/2013-11-14T23:01:15/..."

43:10:32T41-11-3102/logs/...
32:10:32T41-11-3102/logs/...
51:10:32T41-11-3102/logs/..."

http://aws.typepad.com/aws/2012/03/amazon-s3-performance-tips-tricks-seattle-hiring-event.html
http://goo.gl/JnaQZV
Gearing

Ratio?
Monitoring

Give me all the
workers/instances that have
not responded in the past hour
Amazon SWF components

DynamoDB
Throttling and eventual consistency

Failed?
Try Again
Development environment
Huge benefits

100k Documents
21 days

< 1 hour

4.5 Million USPTO

~30 hours
Huge benefits

Focus on our goal, faster time to market
Using Spot instances, 1/10 cost
Key SWF Takeaways
Flexibility
– Room for experimentation
Worker

Transparency

Decider

Worker

– Easy to adapt

Growing with the system
– Not constrained by the framework

Amazon SWF
UNSILO
Sign up to be invited for the Public Beta

www.unsilo.com
Automating Big Data Workflows
Automating Compute

Automating Data

Worker
Activity
Decider

Data Node

Worker

Amazon SWF

AWS Data Pipeline
AWS Data Pipeline

Data
Data Stores

Your ETL in the Cloud

Data

Compute Resources

Data Stores
S3

EMR

S3

S3

S3

EMR

Redshift

DynamoDB

EMR

EMR

DynamoDB

DynamoDB

S3

EC2

RDS

S3

Hive/Pig

Redshift

Intra-region ETL
Inter-region ETL

AWS Data Pipeline Patterns (ActivityWorkers)

Cloud-On-Prem
ETL
Fred Benenson, Data Engineer
A new way to fund creative projects:

All-or-nothing fundraising.
5.1 million people have
backed a project
51,000+
successful projects
44% of projects hit their
goal
$872 million pledged
78% of projects raise under $10,000

51 projects raised more
than $1 million
Project case study: Oculus Rift
Data @
• We have many different data sources

• Some relational data, like MySQL on Amazon RDS
• Other unstructured data like JSON stored in a

third-party service like Mixpanel
• What if we want to JOIN between them in Amazon

Redshift?
Case study: Find the users that have Page View A but
not User Action B
• Page View A is instrumented in Mixpanel, a third-party
service whose API we have access:
{ “Page View A”, { user_uid : 1231567, ... } }

• But User Action B is just the existence of a timestamp
in a MySQL row:
6975, User Action B, 1231567, 2012-08-31 21:55:46
6976, User Action B, 9123811, NULL
6977, User Action B, 2913811, NULL
Redshift to the Rescue!
SELECT
users.id,
COUNT(DISTINCT
CASE
WHEN user_actions.timestamp IS NOT NULL
THEN user_actions.id
ELSE NULL
END) as event_b_count
FROM users
INNER JOIN mixpanel_events
ON mixpanel_events.user_uid = users.uid AND mixpanel_events.event
= 'Page View A'
LEFT JOIN user_actions ON user_actions.user_id = users.id
GROUP BY users.id
How we do automate the
data flow to keep it fresh
daily?
But how do we get the data to Redshift?
This is where AWS
Data Pipeline comes in.
Pipeline 1: RDS to Redshift - Step 1
AWS

First, we run sqoop on
Elastic MapReduce to
extract MySQL tables into
CSVs.
Pipeline 1: RDS to Redshift - Step 2

Then we run another Elastic
MapReduce streaming job
to convert NULLs into
empty strings for Redshift.
Pipeline 1: RDS to Redshift - Transfer to S3

• 150 - 200 gigabytes
• New DB every day, drop old
tables
• Using AWS Data Pipeline‟s 1day „now‟ schedule
Pipeline 1: RDS to Redshift Again

Run a similar pipeline
job in parallel for our
other database.
Pipeline 2: Mixpanel to Redshift - Step 1

Spin up an EC2 instance
to download the day‟s
data from Mixpanel.
Pipeline 2: Mixpanel to Redshift - Step 2

Use Elastic MapReduce to
transform Mixpanel‟s
unstructured JSON into CSVs.
Pipeline 2: Mixpanel to Redshift - Transfer to S3

• 9-10 gb per day
• Incremental data
• 2.2+ billion events
• Backfilled a year in 7 days
AWS Data Pipeline
Best Practices
• JSON / CLI tools are crucial
• Build scripts to generate JSON
• ShellCommandActivity is powerful
• Really invest time to understand
scheduling
• Use S3 as the “transport” layer
AWS Data Pipeline Takeaways for Kickstarter

15 years ago: $1 million or more
5 years ago: Open source + staff & infrastructure
Now: ~$80 a month on AWS
“It just works”
Automating Big Data Workflows
Automating Compute

Automating Data

Worker
Activity
Decider

Data Node

Worker

Amazon SWF

AWS Data Pipeline
Automating Big Data Workflows
Automating Compute

Automating Data

Worker
Activity
Decider

Data Node

Worker

Amazon SWF

AWS Data Pipeline
Big Thank You to
Customer Speakers!
Jinesh Varia
@jinman
More Sessions on SWF and AWS Data Pipeline

SVC101 - 7 Use Cases in 7 Minutes Each : The Power of Workflows and
Automation (Next Up in this room)

BDT207 - Orchestrating Big Data Integration and Analytics Data Flows
with AWS Data Pipeline (Next Up in Sao Paulo 3406)
Please give us your feedback on this
presentation

SVC201
As a thank you, we will select prize
winners daily for completed surveys!

Más contenido relacionado

La actualidad más candente

Using AWS Batch and AWS Step Functions to Design and Run High-Throughput Work...
Using AWS Batch and AWS Step Functions to Design and Run High-Throughput Work...Using AWS Batch and AWS Step Functions to Design and Run High-Throughput Work...
Using AWS Batch and AWS Step Functions to Design and Run High-Throughput Work...Amazon Web Services
 
Announcing Lambda @ the Edge - December 2016 Monthly Webinar Series
Announcing Lambda @ the Edge - December 2016 Monthly Webinar SeriesAnnouncing Lambda @ the Edge - December 2016 Monthly Webinar Series
Announcing Lambda @ the Edge - December 2016 Monthly Webinar SeriesAmazon Web Services
 
Continuous Delivery to Amazon ECS
Continuous Delivery to Amazon ECS Continuous Delivery to Amazon ECS
Continuous Delivery to Amazon ECS Amazon Web Services
 
Enterprise DevOps at Scale with AWS | AWS Public Sector Summit 2016
Enterprise DevOps at Scale with AWS | AWS Public Sector Summit 2016Enterprise DevOps at Scale with AWS | AWS Public Sector Summit 2016
Enterprise DevOps at Scale with AWS | AWS Public Sector Summit 2016Amazon Web Services
 
SRV402 Deep Dive on Amazon EC2 Instances, Featuring Performance Optimization ...
SRV402 Deep Dive on Amazon EC2 Instances, Featuring Performance Optimization ...SRV402 Deep Dive on Amazon EC2 Instances, Featuring Performance Optimization ...
SRV402 Deep Dive on Amazon EC2 Instances, Featuring Performance Optimization ...Amazon Web Services
 
Building Serverless Web Applications
Building Serverless Web Applications Building Serverless Web Applications
Building Serverless Web Applications Amazon Web Services
 
AWS Architecting Cloud Apps - Best Practices and Design Patterns By Jinesh Varia
AWS Architecting Cloud Apps - Best Practices and Design Patterns By Jinesh VariaAWS Architecting Cloud Apps - Best Practices and Design Patterns By Jinesh Varia
AWS Architecting Cloud Apps - Best Practices and Design Patterns By Jinesh VariaAmazon Web Services
 
SMC305 Building CI/CD Pipelines for Serverless Applications
SMC305 Building CI/CD Pipelines for Serverless ApplicationsSMC305 Building CI/CD Pipelines for Serverless Applications
SMC305 Building CI/CD Pipelines for Serverless ApplicationsAmazon Web Services
 
AWS re:Invent 2016: Chalice: A Serverless Microframework for Python (DEV308)
AWS re:Invent 2016: Chalice: A Serverless Microframework for Python (DEV308)AWS re:Invent 2016: Chalice: A Serverless Microframework for Python (DEV308)
AWS re:Invent 2016: Chalice: A Serverless Microframework for Python (DEV308)Amazon Web Services
 
Introduction to Amazon EC2 Spot
Introduction to Amazon EC2 Spot Introduction to Amazon EC2 Spot
Introduction to Amazon EC2 Spot Amazon Web Services
 
Announcing AWS Step Functions - December 2016 Monthly Webinar Series
Announcing AWS Step Functions - December 2016 Monthly Webinar SeriesAnnouncing AWS Step Functions - December 2016 Monthly Webinar Series
Announcing AWS Step Functions - December 2016 Monthly Webinar SeriesAmazon Web Services
 
ENT309 Scaling Up to Your First 10 Million Users
ENT309 Scaling Up to Your First 10 Million UsersENT309 Scaling Up to Your First 10 Million Users
ENT309 Scaling Up to Your First 10 Million UsersAmazon Web Services
 
Building Serverless Web Applications - DevDay Austin 2017
Building Serverless Web Applications - DevDay Austin 2017Building Serverless Web Applications - DevDay Austin 2017
Building Serverless Web Applications - DevDay Austin 2017Amazon Web Services
 
Building Analytic Apps for SaaS: “Analytics as a Service”
Building Analytic Apps for SaaS: “Analytics as a Service”Building Analytic Apps for SaaS: “Analytics as a Service”
Building Analytic Apps for SaaS: “Analytics as a Service”Amazon Web Services
 
Distributed Serverless Stack Tracing and Monitoring
Distributed Serverless Stack Tracing and MonitoringDistributed Serverless Stack Tracing and Monitoring
Distributed Serverless Stack Tracing and MonitoringAmazon Web Services
 
SRV409 Deep Dive on Microservices and Docker
SRV409 Deep Dive on Microservices and DockerSRV409 Deep Dive on Microservices and Docker
SRV409 Deep Dive on Microservices and DockerAmazon Web Services
 
NEW LAUNCH! Building Distributed Applications with AWS Step Functions
NEW LAUNCH! Building Distributed Applications with AWS Step FunctionsNEW LAUNCH! Building Distributed Applications with AWS Step Functions
NEW LAUNCH! Building Distributed Applications with AWS Step FunctionsAmazon Web Services
 

La actualidad más candente (20)

Using AWS Batch and AWS Step Functions to Design and Run High-Throughput Work...
Using AWS Batch and AWS Step Functions to Design and Run High-Throughput Work...Using AWS Batch and AWS Step Functions to Design and Run High-Throughput Work...
Using AWS Batch and AWS Step Functions to Design and Run High-Throughput Work...
 
Announcing Lambda @ the Edge - December 2016 Monthly Webinar Series
Announcing Lambda @ the Edge - December 2016 Monthly Webinar SeriesAnnouncing Lambda @ the Edge - December 2016 Monthly Webinar Series
Announcing Lambda @ the Edge - December 2016 Monthly Webinar Series
 
Introduction to AWS Batch
Introduction to AWS BatchIntroduction to AWS Batch
Introduction to AWS Batch
 
Continuous Delivery to Amazon ECS
Continuous Delivery to Amazon ECS Continuous Delivery to Amazon ECS
Continuous Delivery to Amazon ECS
 
Enterprise DevOps at Scale with AWS | AWS Public Sector Summit 2016
Enterprise DevOps at Scale with AWS | AWS Public Sector Summit 2016Enterprise DevOps at Scale with AWS | AWS Public Sector Summit 2016
Enterprise DevOps at Scale with AWS | AWS Public Sector Summit 2016
 
Serverless Apps with AWS Step Functions
Serverless Apps with AWS Step FunctionsServerless Apps with AWS Step Functions
Serverless Apps with AWS Step Functions
 
SRV402 Deep Dive on Amazon EC2 Instances, Featuring Performance Optimization ...
SRV402 Deep Dive on Amazon EC2 Instances, Featuring Performance Optimization ...SRV402 Deep Dive on Amazon EC2 Instances, Featuring Performance Optimization ...
SRV402 Deep Dive on Amazon EC2 Instances, Featuring Performance Optimization ...
 
Introduction to AWS Batch
Introduction to AWS BatchIntroduction to AWS Batch
Introduction to AWS Batch
 
Building Serverless Web Applications
Building Serverless Web Applications Building Serverless Web Applications
Building Serverless Web Applications
 
AWS Architecting Cloud Apps - Best Practices and Design Patterns By Jinesh Varia
AWS Architecting Cloud Apps - Best Practices and Design Patterns By Jinesh VariaAWS Architecting Cloud Apps - Best Practices and Design Patterns By Jinesh Varia
AWS Architecting Cloud Apps - Best Practices and Design Patterns By Jinesh Varia
 
SMC305 Building CI/CD Pipelines for Serverless Applications
SMC305 Building CI/CD Pipelines for Serverless ApplicationsSMC305 Building CI/CD Pipelines for Serverless Applications
SMC305 Building CI/CD Pipelines for Serverless Applications
 
AWS re:Invent 2016: Chalice: A Serverless Microframework for Python (DEV308)
AWS re:Invent 2016: Chalice: A Serverless Microframework for Python (DEV308)AWS re:Invent 2016: Chalice: A Serverless Microframework for Python (DEV308)
AWS re:Invent 2016: Chalice: A Serverless Microframework for Python (DEV308)
 
Introduction to Amazon EC2 Spot
Introduction to Amazon EC2 Spot Introduction to Amazon EC2 Spot
Introduction to Amazon EC2 Spot
 
Announcing AWS Step Functions - December 2016 Monthly Webinar Series
Announcing AWS Step Functions - December 2016 Monthly Webinar SeriesAnnouncing AWS Step Functions - December 2016 Monthly Webinar Series
Announcing AWS Step Functions - December 2016 Monthly Webinar Series
 
ENT309 Scaling Up to Your First 10 Million Users
ENT309 Scaling Up to Your First 10 Million UsersENT309 Scaling Up to Your First 10 Million Users
ENT309 Scaling Up to Your First 10 Million Users
 
Building Serverless Web Applications - DevDay Austin 2017
Building Serverless Web Applications - DevDay Austin 2017Building Serverless Web Applications - DevDay Austin 2017
Building Serverless Web Applications - DevDay Austin 2017
 
Building Analytic Apps for SaaS: “Analytics as a Service”
Building Analytic Apps for SaaS: “Analytics as a Service”Building Analytic Apps for SaaS: “Analytics as a Service”
Building Analytic Apps for SaaS: “Analytics as a Service”
 
Distributed Serverless Stack Tracing and Monitoring
Distributed Serverless Stack Tracing and MonitoringDistributed Serverless Stack Tracing and Monitoring
Distributed Serverless Stack Tracing and Monitoring
 
SRV409 Deep Dive on Microservices and Docker
SRV409 Deep Dive on Microservices and DockerSRV409 Deep Dive on Microservices and Docker
SRV409 Deep Dive on Microservices and Docker
 
NEW LAUNCH! Building Distributed Applications with AWS Step Functions
NEW LAUNCH! Building Distributed Applications with AWS Step FunctionsNEW LAUNCH! Building Distributed Applications with AWS Step Functions
NEW LAUNCH! Building Distributed Applications with AWS Step Functions
 

Similar a Automate Your Big Data Workflows (SVC201) | AWS re:Invent 2013

AWS Cloud Kata 2013 | Singapore - Getting to Scale on AWS
AWS Cloud Kata 2013 | Singapore - Getting to Scale on AWSAWS Cloud Kata 2013 | Singapore - Getting to Scale on AWS
AWS Cloud Kata 2013 | Singapore - Getting to Scale on AWSAmazon Web Services
 
Why Scale Matters and How the Cloud is Really Different (at scale)
Why Scale Matters and How the Cloud is Really Different (at scale)Why Scale Matters and How the Cloud is Really Different (at scale)
Why Scale Matters and How the Cloud is Really Different (at scale)Amazon Web Services
 
10 Pro Tips for Scaling Your Startup from 0-10M Users
10 Pro Tips for Scaling Your Startup from 0-10M Users10 Pro Tips for Scaling Your Startup from 0-10M Users
10 Pro Tips for Scaling Your Startup from 0-10M UsersAmazon Web Services
 
10 Pro Tips for scaling your startup from 0-10M users
10 Pro Tips for scaling your startup from 0-10M users10 Pro Tips for scaling your startup from 0-10M users
10 Pro Tips for scaling your startup from 0-10M usersAmazon Web Services
 
AWS Summit Stockholm 2014 – B4 – Business intelligence on AWS
AWS Summit Stockholm 2014 – B4 – Business intelligence on AWSAWS Summit Stockholm 2014 – B4 – Business intelligence on AWS
AWS Summit Stockholm 2014 – B4 – Business intelligence on AWSAmazon Web Services
 
Getting started with Amazon Kinesis
Getting started with Amazon KinesisGetting started with Amazon Kinesis
Getting started with Amazon KinesisAmazon Web Services
 
Getting started with amazon kinesis
Getting started with amazon kinesisGetting started with amazon kinesis
Getting started with amazon kinesisJampp
 
Aws re invent 2018 recap
Aws re invent 2018 recapAws re invent 2018 recap
Aws re invent 2018 recapCloudHesive
 
AWS 201 - A Walk through the AWS Cloud: App Hosting on AWS - Games, Apps and ...
AWS 201 - A Walk through the AWS Cloud: App Hosting on AWS - Games, Apps and ...AWS 201 - A Walk through the AWS Cloud: App Hosting on AWS - Games, Apps and ...
AWS 201 - A Walk through the AWS Cloud: App Hosting on AWS - Games, Apps and ...Amazon Web Services
 
AWS Cloud Kata | Manila - Getting to Scale on AWS
AWS Cloud Kata | Manila - Getting to Scale on AWSAWS Cloud Kata | Manila - Getting to Scale on AWS
AWS Cloud Kata | Manila - Getting to Scale on AWSAmazon Web Services
 
Cloud First: New Architecture for New Infrastructure
Cloud First: New Architecture for New InfrastructureCloud First: New Architecture for New Infrastructure
Cloud First: New Architecture for New InfrastructureAmazon Web Services
 
AWS Enterprise Summit - 엔터프라이즈에서의 AWS 클라우드 활용 - Markku Lepisto
AWS Enterprise Summit - 엔터프라이즈에서의 AWS 클라우드 활용 - Markku LepistoAWS Enterprise Summit - 엔터프라이즈에서의 AWS 클라우드 활용 - Markku Lepisto
AWS Enterprise Summit - 엔터프라이즈에서의 AWS 클라우드 활용 - Markku LepistoAmazon Web Services Korea
 
"Fast Start to Building on AWS", Igor Ivaniuk
"Fast Start to Building on AWS", Igor Ivaniuk"Fast Start to Building on AWS", Igor Ivaniuk
"Fast Start to Building on AWS", Igor IvaniukFwdays
 
Big Data & Analytics - Use Cases in Mobile, E-commerce, Media and more
Big Data & Analytics - Use Cases in Mobile, E-commerce, Media and moreBig Data & Analytics - Use Cases in Mobile, E-commerce, Media and more
Big Data & Analytics - Use Cases in Mobile, E-commerce, Media and moreAmazon Web Services
 
AWS Summit Singapore - Managing a Database Migration Project | Best Practices
AWS Summit Singapore - Managing a Database Migration Project | Best PracticesAWS Summit Singapore - Managing a Database Migration Project | Best Practices
AWS Summit Singapore - Managing a Database Migration Project | Best PracticesAmazon Web Services
 
AWS re:Invent 2016: Event Handling at Scale: Designing an Auditable Ingestion...
AWS re:Invent 2016: Event Handling at Scale: Designing an Auditable Ingestion...AWS re:Invent 2016: Event Handling at Scale: Designing an Auditable Ingestion...
AWS re:Invent 2016: Event Handling at Scale: Designing an Auditable Ingestion...Amazon Web Services
 
Dix conseils pour supporter la croissance de votre Startup de 0 à 10 millions...
Dix conseils pour supporter la croissance de votre Startup de 0 à 10 millions...Dix conseils pour supporter la croissance de votre Startup de 0 à 10 millions...
Dix conseils pour supporter la croissance de votre Startup de 0 à 10 millions...Amazon Web Services
 
Cloud Roundtable | Amazon Web Services: Key = Iteration
Cloud Roundtable | Amazon Web Services: Key = IterationCloud Roundtable | Amazon Web Services: Key = Iteration
Cloud Roundtable | Amazon Web Services: Key = IterationCodemotion
 

Similar a Automate Your Big Data Workflows (SVC201) | AWS re:Invent 2013 (20)

AWS Cloud Kata 2013 | Singapore - Getting to Scale on AWS
AWS Cloud Kata 2013 | Singapore - Getting to Scale on AWSAWS Cloud Kata 2013 | Singapore - Getting to Scale on AWS
AWS Cloud Kata 2013 | Singapore - Getting to Scale on AWS
 
Why Scale Matters and How the Cloud is Really Different (at scale)
Why Scale Matters and How the Cloud is Really Different (at scale)Why Scale Matters and How the Cloud is Really Different (at scale)
Why Scale Matters and How the Cloud is Really Different (at scale)
 
10 Pro Tips for Scaling Your Startup from 0-10M Users
10 Pro Tips for Scaling Your Startup from 0-10M Users10 Pro Tips for Scaling Your Startup from 0-10M Users
10 Pro Tips for Scaling Your Startup from 0-10M Users
 
Serverless: State Of the Union
Serverless: State Of the UnionServerless: State Of the Union
Serverless: State Of the Union
 
Serverless - State Of the Union
Serverless - State Of the UnionServerless - State Of the Union
Serverless - State Of the Union
 
10 Pro Tips for scaling your startup from 0-10M users
10 Pro Tips for scaling your startup from 0-10M users10 Pro Tips for scaling your startup from 0-10M users
10 Pro Tips for scaling your startup from 0-10M users
 
AWS Summit Stockholm 2014 – B4 – Business intelligence on AWS
AWS Summit Stockholm 2014 – B4 – Business intelligence on AWSAWS Summit Stockholm 2014 – B4 – Business intelligence on AWS
AWS Summit Stockholm 2014 – B4 – Business intelligence on AWS
 
Getting started with Amazon Kinesis
Getting started with Amazon KinesisGetting started with Amazon Kinesis
Getting started with Amazon Kinesis
 
Getting started with amazon kinesis
Getting started with amazon kinesisGetting started with amazon kinesis
Getting started with amazon kinesis
 
Aws re invent 2018 recap
Aws re invent 2018 recapAws re invent 2018 recap
Aws re invent 2018 recap
 
AWS 201 - A Walk through the AWS Cloud: App Hosting on AWS - Games, Apps and ...
AWS 201 - A Walk through the AWS Cloud: App Hosting on AWS - Games, Apps and ...AWS 201 - A Walk through the AWS Cloud: App Hosting on AWS - Games, Apps and ...
AWS 201 - A Walk through the AWS Cloud: App Hosting on AWS - Games, Apps and ...
 
AWS Cloud Kata | Manila - Getting to Scale on AWS
AWS Cloud Kata | Manila - Getting to Scale on AWSAWS Cloud Kata | Manila - Getting to Scale on AWS
AWS Cloud Kata | Manila - Getting to Scale on AWS
 
Cloud First: New Architecture for New Infrastructure
Cloud First: New Architecture for New InfrastructureCloud First: New Architecture for New Infrastructure
Cloud First: New Architecture for New Infrastructure
 
AWS Enterprise Summit - 엔터프라이즈에서의 AWS 클라우드 활용 - Markku Lepisto
AWS Enterprise Summit - 엔터프라이즈에서의 AWS 클라우드 활용 - Markku LepistoAWS Enterprise Summit - 엔터프라이즈에서의 AWS 클라우드 활용 - Markku Lepisto
AWS Enterprise Summit - 엔터프라이즈에서의 AWS 클라우드 활용 - Markku Lepisto
 
"Fast Start to Building on AWS", Igor Ivaniuk
"Fast Start to Building on AWS", Igor Ivaniuk"Fast Start to Building on AWS", Igor Ivaniuk
"Fast Start to Building on AWS", Igor Ivaniuk
 
Big Data & Analytics - Use Cases in Mobile, E-commerce, Media and more
Big Data & Analytics - Use Cases in Mobile, E-commerce, Media and moreBig Data & Analytics - Use Cases in Mobile, E-commerce, Media and more
Big Data & Analytics - Use Cases in Mobile, E-commerce, Media and more
 
AWS Summit Singapore - Managing a Database Migration Project | Best Practices
AWS Summit Singapore - Managing a Database Migration Project | Best PracticesAWS Summit Singapore - Managing a Database Migration Project | Best Practices
AWS Summit Singapore - Managing a Database Migration Project | Best Practices
 
AWS re:Invent 2016: Event Handling at Scale: Designing an Auditable Ingestion...
AWS re:Invent 2016: Event Handling at Scale: Designing an Auditable Ingestion...AWS re:Invent 2016: Event Handling at Scale: Designing an Auditable Ingestion...
AWS re:Invent 2016: Event Handling at Scale: Designing an Auditable Ingestion...
 
Dix conseils pour supporter la croissance de votre Startup de 0 à 10 millions...
Dix conseils pour supporter la croissance de votre Startup de 0 à 10 millions...Dix conseils pour supporter la croissance de votre Startup de 0 à 10 millions...
Dix conseils pour supporter la croissance de votre Startup de 0 à 10 millions...
 
Cloud Roundtable | Amazon Web Services: Key = Iteration
Cloud Roundtable | Amazon Web Services: Key = IterationCloud Roundtable | Amazon Web Services: Key = Iteration
Cloud Roundtable | Amazon Web Services: Key = Iteration
 

Más de Amazon Web Services

Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Amazon Web Services
 
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Amazon Web Services
 
Esegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateEsegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateAmazon Web Services
 
Costruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSCostruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSAmazon Web Services
 
Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Amazon Web Services
 
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Amazon Web Services
 
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...Amazon Web Services
 
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsMicrosoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsAmazon Web Services
 
Database Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareDatabase Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareAmazon Web Services
 
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSCrea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSAmazon Web Services
 
API moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAPI moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAmazon Web Services
 
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareDatabase Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareAmazon Web Services
 
Tools for building your MVP on AWS
Tools for building your MVP on AWSTools for building your MVP on AWS
Tools for building your MVP on AWSAmazon Web Services
 
How to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckHow to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckAmazon Web Services
 
Building a web application without servers
Building a web application without serversBuilding a web application without servers
Building a web application without serversAmazon Web Services
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...Amazon Web Services
 
Introduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceIntroduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceAmazon Web Services
 

Más de Amazon Web Services (20)

Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
 
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
 
Esegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateEsegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS Fargate
 
Costruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSCostruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWS
 
Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot
 
Open banking as a service
Open banking as a serviceOpen banking as a service
Open banking as a service
 
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
 
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
 
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsMicrosoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
 
Computer Vision con AWS
Computer Vision con AWSComputer Vision con AWS
Computer Vision con AWS
 
Database Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareDatabase Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatare
 
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSCrea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
 
API moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAPI moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e web
 
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareDatabase Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
 
Tools for building your MVP on AWS
Tools for building your MVP on AWSTools for building your MVP on AWS
Tools for building your MVP on AWS
 
How to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckHow to Build a Winning Pitch Deck
How to Build a Winning Pitch Deck
 
Building a web application without servers
Building a web application without serversBuilding a web application without servers
Building a web application without servers
 
Fundraising Essentials
Fundraising EssentialsFundraising Essentials
Fundraising Essentials
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
 
Introduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceIntroduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container Service
 

Último

Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piececharlottematthew16
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr LapshynFwdays
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 

Último (20)

Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 

Automate Your Big Data Workflows (SVC201) | AWS re:Invent 2013

Notas del editor

  1. Welcome to Application Services Track SVC 201 Automating. When it comes to cloud, its all about automation. Automating data-driven workflows is one of most important tenets of any cloud-native application. I am super excited because there while they are data engineers, I call them data alchemists because they are the people who turn data into gold with help from AWS.
  2. There are number of different ways on can automated data-driven workflows on AWS. I am going to discuss two aspects in this talk – how you can automate compute using SWF and Automate data using AWS Data Pipeline.
  3. SimpleWorklfow Service is one of the most powerful building block services in AWS umbrella of products. Its an orchestration service that has the power to scale your business logic, Maintains distributed app state, tracks workflow and visbility executions, ensures consistency into execution history, Tasks, Timers and Signals and There are really only 3 things to know when you are designing a SWF-based application 1) worker starters 2) Activity Workers 3) decidersWorkflow starters kicksoff the workflow . A decider is an implementation of a workflow&apos;s coordination logic.
  4. - Recent startup based in Denmark
  5. - Silos between domain-specific knowledge- Find information across different disciplines
  6. Concepts are not keywords, but patterns to be recognized across documentsPattern matching poses challenges to the way we handle our dataFull NLP, more computational power and more time required
  7. Fuel our search engineGoal to achieve full coverage in all IP and Science
  8. Linear pipeline glueing scripts togetherTesting scalability
  9. How do we handle this large amount of data?
  10. And more importantly how do we get there fast?Not dealing with additional infrastructureCut corners and focus on core competences
  11. - Started talking with Mario
  12. But more importantlyWork independently
  13. - Brief overview of the system
  14. One EC2 instance fetching data, transforming it into out internal format, and uploading it to S3One bucket per data source, uspto, medline, etc.
  15. Creates a new decision event in the Event HistoryWorkflow startedNew content from S3Re-processing content from DynamoSubset of content from file or DB
  16. Natural approachDifferent computing ratiosDebugging/fixing on the flyMinimal AMIsEC2 user data scripts
  17. Rather than using the Event History for thisSWF history is more expensiveElastic, ramp up provisioning when we need
  18. Internal partitioning of S3
  19. Start up slowlyDecide on a gearing ratioRamp upI/O bound task problems
  20. DynamoDB with local indexes to keep track of workers and instances so that we can make various custom queries for monitoring and managing the instances.SWF Activity task is rescheduled
  21. Account for eventual consistency&quot;back off and try again&quot;-logicCloud issuesThrottling errorsAWS support to raise limits
  22. Debugging from your dev environmentInspect intermediate resultsLocal and remote Workers/decidersAutomated integration testsDynamoDB Local
  23. 150k docs/ hour8500 EC2 cores1500 m3 xlarge
  24. We are iterating on our algorithms quite fastDelivering value to the user
  25. Thanks to Christopher Wright &amp; Erik Kastner, who without them, we wouldn’t be using Data Pipeline
  26. Sqoop requires auto-increment IDs, and can’t handle tables named “public”