Big Data & Analytics
Randall Barnes - Bill Moritz - Kevin Dillon
Today’s Session Objectives
• Describe core concepts, common objectives and lessons learned
• Present specific platforms and products available in AWS
• Provide live, hands-on deployment experience
Big Data applications are defined as having data volume, variety, or velocity characteristics that render traditional tools and processes impractical
Great potential…
• Keep pace with the accelerating information explosion
• New insights and analytics to improve business decisions
• Create new applications requiring massive real-time data processing
…and at times challenging
• Unpredictable resource demand
• Job orchestration and management complexities
• Geo-distribution of data sources
Why Big Data solutions have worked well in the cloud
• Reduced cost per workload, saving money and creating opportunities
• Extremely flexible: the ability to answer analytics questions that don't yet exist
[Diagram: Collect, Store, Analyze; services shown: Import/Export, Direct Connect, Kinesis, S3, Glacier, DynamoDB, Redshift, EMR, EC2, Machine Learning]
Three types of data-driven development
• Retrospective analysis and reporting: Amazon Redshift, Amazon EMR
• Here-and-now real-time processing and dashboards: Amazon Kinesis, Amazon EC2, AWS Lambda
• Predictions to enable smart applications: Amazon Machine Learning, Amazon EMR
Core Principles for Successful Implementations
Elastic resource capacity
• Data storage, I/O, and compute resources scale on demand
• Dynamically support multiple ephemeral environments such as dev, test, and QA validation
• No up-front capital expenses; pay only for what you use
Streamlined management of platforms and solutions
• Raw infrastructure resources
• Application stacks
Well-supported ecosystem of tools and applications
• Data integration tools
• Analytics and reporting applications
• Resource and job orchestration
What is changing…
Diverse and non-traditional workloads
• Using big data strategies, tools and products to solve problems that have not traditionally been viewed as big data
Leverage managed solutions to reduce complexity and staffing constraints
• AWS-managed platforms
• Third-party frameworks
Making The Cloud Work For Your Enterprise
Most common implementation challenges:
• Managing distributed data sets
• Application platform migration – limited resources
• ETL integration, especially leveraging existing IP and business logic
TCO Mistakes
Overprovisioning
• High-I/O storage used for inactive data sets
• Non-linear cost increases for certain instance types
Static resources
• Low overall utilization
• Not leveraging Spot Instance pricing
Not leveraging Reserved Instance (RI) pricing strategies
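The TCO mistakes above come down to simple arithmetic: an always-on instance that is busy only part of the time costs far more per useful hour than its hourly rate suggests. The sketch below compares the effective cost per utilized hour for a static cluster, an elastic cluster, a Reserved baseline, and Spot batch work. All hourly rates are hypothetical placeholders, not real AWS prices; substitute current figures for your region and instance type.

```python
# A minimal TCO sketch. All rates are HYPOTHETICAL placeholders, not AWS prices.

HOURS_PER_MONTH = 730  # common approximation: 8,760 hours / 12 months


def monthly_cost(hourly_rate: float, hours_running: float) -> float:
    """Cost of running one instance for the given number of hours."""
    return hourly_rate * hours_running


def cost_per_useful_hour(hourly_rate: float, hours_running: float,
                         utilization: float) -> float:
    """Effective price paid for each hour of real work.

    `utilization` is the fraction (0, 1] of running hours spent doing useful work.
    """
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    useful_hours = hours_running * utilization
    return monthly_cost(hourly_rate, hours_running) / useful_hours


if __name__ == "__main__":
    on_demand = 1.00  # hypothetical $/hour
    reserved = 0.60   # hypothetical effective $/hour with an RI commitment
    spot = 0.30       # hypothetical $/hour; Spot capacity can be interrupted

    # Static cluster left running 24x7 but busy only 25% of the time.
    print("static on-demand :", cost_per_useful_hour(on_demand, HOURS_PER_MONTH, 0.25))
    # Same amount of work on an elastic cluster that runs only while busy.
    print("elastic on-demand:", cost_per_useful_hour(on_demand, HOURS_PER_MONTH * 0.25, 1.0))
    # Steady, fully used 24x7 baseline covered by a reservation.
    print("reserved baseline:", cost_per_useful_hour(reserved, HOURS_PER_MONTH, 1.0))
    # Interruption-tolerant batch work on Spot.
    print("spot batch       :", cost_per_useful_hour(spot, HOURS_PER_MONTH * 0.25, 1.0))
```

With these placeholder rates the static cluster pays 4x the hourly rate per useful hour, which is the "low overall utilization" mistake in numeric form.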
Example 1: MPP Data Warehouse
[Diagram: Bulk Transfer; Data Stage and Archive; Data Warehouse; Reporting and Analytics]
Example 1: Elastic Data Warehouse
[Diagram: S3, Glacier, Redshift, Import/Export Service, Direct Connect]
Data Collection, Ingestion and Consumption
Import/Export Service
• Ship storage devices directly to Amazon
• Transfer to EBS or S3
• Up to 4TB per device
Direct Connect
• Higher bandwidth, more consistent performance
• 1Gbps and 10Gbps ports (network providers may offer a slice of this capacity)
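Choosing between shipping devices and sending data over Direct Connect starts with a back-of-the-envelope estimate. The sketch below computes how many days a bulk load takes at a given link speed and how many devices a shipment needs at the slide's 4TB-per-device figure. The 80% link efficiency default is an assumption, not a measured value; real throughput depends on protocol overhead and the rest of the network path.

```python
import math

BITS_PER_BYTE = 8
TB = 10**12            # decimal terabyte, in bytes
SECONDS_PER_DAY = 86_400


def transfer_days(terabytes: float, link_gbps: float, efficiency: float = 0.8) -> float:
    """Days needed to push `terabytes` through a link of `link_gbps`.

    `efficiency` is an ASSUMED fraction of line rate actually achieved.
    """
    bits = terabytes * TB * BITS_PER_BYTE
    bits_per_second = link_gbps * 1e9 * efficiency
    return bits / bits_per_second / SECONDS_PER_DAY


def devices_needed(terabytes: float, device_capacity_tb: float = 4.0) -> int:
    """Storage devices to ship, at up to 4TB each (per the slide)."""
    return math.ceil(terabytes / device_capacity_tb)


if __name__ == "__main__":
    for gbps in (1, 10):
        print(f"100 TB over {gbps} Gbps: {transfer_days(100, gbps):.1f} days")
    print("devices for 100 TB:", devices_needed(100))
```

Under the 80% efficiency assumption, 100TB takes about 11.6 days on a 1Gbps port and about 1.2 days on a 10Gbps port, versus 25 devices for a shipment.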
Amazon Simple Storage Service (S3)
Object storage with virtually unlimited capacity
• Store files (objects) in containers (buckets)
• Redundant copies for high durability and reliability
• Accessible over the internet directly via REST requests or through the AWS SDKs
• Multiple strategies to secure contents
• Set permissions and access policies, and optionally require MFA
• Encryption: server-side (simplest) or client-side
• Optional audit logging records access requests made through the API
• Built-in tools for managing versioning and object lifecycle, and for hosting static websites
• Low pay-as-you-go pricing: a function of the amount stored (~$0.03/GB/month) plus metered I/O requests
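S3's lifecycle tooling pairs naturally with the Glacier archive tier in Example 1. The sketch below builds a lifecycle configuration that moves staged objects to Glacier and later expires them. The dictionary follows the shape that boto3's `put_bucket_lifecycle_configuration` accepts; the prefix, rule ID, bucket name, and day counts are hypothetical. The code only builds the document; applying it needs boto3 and AWS credentials, so that call is left commented out.

```python
def archive_rule(prefix: str, glacier_after_days: int, expire_after_days: int) -> dict:
    """Build one S3 lifecycle rule: move objects under `prefix` to Glacier,
    then delete them. Day counts are illustrative, not recommendations."""
    if expire_after_days <= glacier_after_days:
        raise ValueError("expiration must come after the Glacier transition")
    return {
        "ID": f"archive-{prefix.strip('/')}",
        "Filter": {"Prefix": prefix},
        "Status": "Enabled",
        "Transitions": [{"Days": glacier_after_days, "StorageClass": "GLACIER"}],
        "Expiration": {"Days": expire_after_days},
    }


# Hypothetical staging prefix: archive after 30 days, delete after a year.
lifecycle = {"Rules": [archive_rule("staging/", 30, 365)]}

# Applying it (requires boto3 and credentials; bucket name is hypothetical):
# import boto3
# boto3.client("s3").put_bucket_lifecycle_configuration(
#     Bucket="example-data-stage", LifecycleConfiguration=lifecycle)
```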
Amazon Redshift
Scalable, fully managed data warehouse
• High-performance, massively parallel, columnar storage architecture that scales easily
• Mainstream SQL query syntax (based on PostgreSQL) for rapid platform adoption
• Flexible node types and RI options to align capacity with the workload and control costs
• Integrated with other AWS Big Data platforms (S3, EMR, DynamoDB, Data Pipeline)
• Streamlined administrative tasks (snapshot/restore, adding or removing nodes)
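Columnar storage helps analytics because an aggregate over one column only needs to read that column. The toy below stores the same rows in a row layout and a column layout and counts how many values each one touches to compute a `SUM`. It is a conceptual illustration only; Redshift's actual storage (compressed blocks, zone maps, data spread across nodes) is far more involved, and the table and column names are hypothetical.

```python
from typing import Dict, List, Tuple

Row = Dict[str, float]


def to_columns(rows: List[Row]) -> Dict[str, List[float]]:
    """Pivot row-oriented records into one list per column."""
    columns: Dict[str, List[float]] = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


def sum_row_store(rows: List[Row], column: str) -> Tuple[float, int]:
    """Row layout: every field of every row is read to reach one column."""
    total, values_read = 0.0, 0
    for row in rows:
        values_read += len(row)  # the whole row is read
        total += row[column]
    return total, values_read


def sum_column_store(columns: Dict[str, List[float]], column: str) -> Tuple[float, int]:
    """Column layout: only the requested column is read."""
    values = columns[column]
    return float(sum(values)), len(values)


# Hypothetical 4-column orders table with 1,000 rows.
rows = [{"order_id": i, "amount": 10.0 * i, "qty": i % 5, "region": i % 3}
        for i in range(1, 1001)]

if __name__ == "__main__":
    print("row store   :", sum_row_store(rows, "amount"))
    print("column store:", sum_column_store(to_columns(rows), "amount"))
```

Both layouts return the same total, but the row layout reads four values per row while the column layout reads one, and the gap grows with table width.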
Recap: Elastic Data Warehouse
[Diagram: S3, Glacier, Redshift, Import/Export Service, Direct Connect]
Example 2: Real-Time Data Streaming and NoSQL
[Diagram: Data Warehouse; Application Tier; Backend Apps; Real-Time Processing; NoSQL]
Example 2: Real-Time Data Streaming and NoSQL
[Diagram: Data Warehouse; Application Tier; Backend Apps; DynamoDB; Kinesis]
Amazon Kinesis
Scalable real-time processing of diverse data
• Fully managed service
• Real-time ingestion and transformation of log and application data
• Real-time reporting and analytics
• Data ordering, deterministic routing, and replay (up to 24 hours)
• Records: partition key, sequence number, data blob (payload)
• Shards: units of incremental throughput capacity
• Use SDK APIs for PUT/GET operations
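Deterministic routing works because Kinesis hashes each record's partition key with MD5 into a 128-bit integer and delivers the record to the shard whose hash-key range contains that value. The sketch below reproduces that mapping under one stated assumption: the hash-key space is split evenly across shards, as for a freshly created stream. After resharding, real ranges can be uneven, so production code should look up each shard's actual range instead of computing it.

```python
import hashlib

HASH_SPACE = 2**128  # MD5 produces a 128-bit value


def hash_key(partition_key: str) -> int:
    """Map a partition key to a 128-bit integer via MD5."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big")


def shard_for(partition_key: str, shard_count: int) -> int:
    """Index of the shard whose hash-key range contains this key.

    ASSUMPTION: ranges are split evenly across `shard_count` shards.
    """
    if shard_count < 1:
        raise ValueError("need at least one shard")
    return hash_key(partition_key) * shard_count // HASH_SPACE


if __name__ == "__main__":
    # Records sharing a partition key always land on the same shard,
    # which is what preserves per-key ordering.
    for key in ("sensor-1", "sensor-2", "sensor-1"):
        print(key, "-> shard", shard_for(key, 4))

# A real producer would send records with boto3, e.g.
# boto3.client("kinesis").put_record(StreamName=..., Data=..., PartitionKey=key)
```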
Amazon DynamoDB
• Seamless, virtually unlimited scalability, managed automatically
• Ability to define specific resource allocation limits
• Easy administration and a well-supported development model
• Integration with other core Amazon data services
• GET/PUT operations with a user-defined primary key
• Tables contain items (primary key + attributes) of up to 400KB each
• Data types: scalar, set (collections), key-value, and document
• Secondary indexes (global and local)
• Provisioned read- and write-throughput, SSD storage
Challenge: proprietary API via AWS SDKs (e.g. Java, .NET)
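Part of what makes DynamoDB's API proprietary is that every attribute travels as a typed value such as `{"S": ...}` or `{"N": ...}`. The sketch below converts plain Python values into that low-level format to show how the scalar, set, and document types line up. It is a simplified stand-in for boto3's own `TypeSerializer` (in `boto3.dynamodb.types`), which real code should prefer; this version skips binary types and, like boto3, rejects floats in favor of `Decimal`.

```python
from decimal import Decimal
from typing import Any, Dict


def to_attribute_value(value: Any) -> Dict[str, Any]:
    """Convert a Python value to DynamoDB's low-level typed attribute shape."""
    if value is None:
        return {"NULL": True}
    if isinstance(value, bool):  # check bool before int: bool subclasses int
        return {"BOOL": value}
    if isinstance(value, (int, Decimal)):
        return {"N": str(value)}  # numbers are sent as strings
    if isinstance(value, str):
        return {"S": value}
    if isinstance(value, (set, frozenset)):
        if not value:
            raise ValueError("DynamoDB sets cannot be empty")
        if all(isinstance(v, str) for v in value):
            return {"SS": sorted(value)}
        if all(isinstance(v, (int, Decimal)) and not isinstance(v, bool) for v in value):
            return {"NS": sorted(str(v) for v in value)}
        raise TypeError("sets must contain only strings or only numbers")
    if isinstance(value, (list, tuple)):
        return {"L": [to_attribute_value(v) for v in value]}
    if isinstance(value, dict):
        return {"M": {k: to_attribute_value(v) for k, v in value.items()}}
    raise TypeError(f"unsupported type: {type(value).__name__}")


def to_item(record: Dict[str, Any]) -> Dict[str, Dict[str, Any]]:
    """Serialize a whole item (primary key + attributes)."""
    return {name: to_attribute_value(v) for name, v in record.items()}


# The result is what a low-level client call such as
# boto3.client("dynamodb").put_item(TableName=..., Item=to_item(record)) expects.
```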
Recap: Real-Time Data Streaming and NoSQL
[Diagram: Data Warehouse; Application Tier; Backend Apps; DynamoDB; Kinesis]
Example 3: Hadoop Workloads
[Diagram: Data Warehouse; Application and Data Stage Tiers; Analytics; Hadoop Processing]
Example 3: Hadoop Workloads
[Diagram: Data Warehouse; Application and Data Stage Tiers; Analytics; EMR]
Amazon Elastic MapReduce (EMR)
• Semi-managed service (access to the underlying OS)
• Apache Hadoop framework
• Robust, streamlined management of MapReduce jobs
• Simple API for popular extensions such as Hive, Pig, and Spark
• Spot Instance pricing available
• HDFS or S3 storage
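To make the MapReduce model behind EMR concrete, here is a single-process word count with explicit map, shuffle, and reduce phases. It runs locally and only illustrates the data flow; on EMR, Hadoop distributes the same phases across the cluster's nodes, and tools such as Hive, Pig, or Spark let you express the job at a higher level.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Mapper: emit (word, 1) for every word in a line."""
    for word in line.lower().split():
        yield word, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by key (Hadoop does this between phases)."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reducer: combine all values for one key."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle, and reduce in sequence on one machine."""
    pairs = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle_phase(pairs)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data on AWS", "Big data on EMR"]))
```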
Analytics and Reporting
• Broad Vendor Integration
• Reports, Dashboards, BI
Analytics and Reporting
Fast and easy to create
• Reports
• Dashboards
• Near-real-time analytics decisions
Recap: Hadoop Workloads
[Diagram: Data Warehouse; Application and Data Stage Tiers; Analytics; EMR]
Example 4: Machine Learning
Machine learning is the technology that
automatically finds patterns in your data and
uses them to make predictions for new data
points as they become available
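That definition (learn patterns from existing data, then score new data points) can be shown with a tiny logistic regression classifier, the kind of binary model behind a question like "is this order fraudulent?". This is a from-scratch teaching sketch trained by batch gradient descent, not a description of how Amazon ML works internally; the two features and the training data are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    """Logistic function mapping any real score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    """Probability that x belongs to the positive class (e.g. "fraudulent")."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def train(X: List[Sequence[float]], y: List[int],
          lr: float = 0.5, epochs: int = 2000) -> Tuple[List[float], float]:
    """Fit weights and bias by batch gradient descent on the log loss."""
    n, n_features = len(X), len(X[0])
    w, b = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for xi, yi in zip(X, y):
            error = predict_proba(w, b, xi) - yi  # log-loss gradient w.r.t. the score
            for j in range(n_features):
                grad_w[j] += error * xi[j]
            grad_b += error
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


# Hypothetical, pre-scaled features: [order amount, account risk score]; label 1 = fraud.
X = [[0.1, 0.2], [0.2, 0.1], [0.15, 0.3], [0.9, 0.8], [0.8, 0.95], [0.85, 0.7]]
y = [0, 0, 0, 1, 1, 1]
w, b = train(X, y)

if __name__ == "__main__":
    # Score a new, unseen order with the learned pattern.
    print(f"P(fraud) for new order: {predict_proba(w, b, [0.88, 0.9]):.3f}")
```

The training loop is the "finds patterns" step; `predict_proba` on an unseen point is the "makes predictions for new data" step.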
Your Data + Machine Learning = Smart Applications
Amazon ML
• Easy-to-use, managed machine learning service built for developers
• Robust, powerful machine learning technology based on Amazon’s internal systems
• Create models using your data already stored in the AWS cloud (S3 files, Redshift queries, MySQL queries on Amazon RDS)
• Deploy models to production in seconds
Smart applications by example
• Based on what you know about an order: Is this order fraudulent?
• Based on what you know about the user: Will they use your product?
• Based on what you know about a news article: What other articles are interesting?
And a few more examples…
• Fraud detection: detecting fraudulent transactions, filtering spam emails, flagging suspicious reviews, …
• Personalization: recommending content, predictive content loading, improving user experience, …
• Targeted marketing: matching customers and offers, choosing marketing campaigns, cross-selling and up-selling, …
• Content classification: categorizing documents, matching hiring managers and resumes, …
• Churn prediction: finding customers who are likely to stop using the service, free-tier upgrade targeting, …
• Customer support: predictive routing of customer emails, social media listening, …
Securing Data in the Cloud
• Secure your AWS console root account
• Use complex passwords and rotate them regularly
• Secure data stage locations
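One concrete way to secure a data stage location in S3 is a bucket policy that rejects any request not made over TLS. The sketch builds such a policy as a Python dict using the `aws:SecureTransport` condition key; the bucket name is hypothetical. Applying the JSON with S3's `put_bucket_policy` requires boto3 and credentials, so that call is left commented out.

```python
import json


def deny_insecure_transport_policy(bucket: str) -> dict:
    """Bucket policy denying every S3 action on `bucket` unless made over HTTPS."""
    bucket_arn = f"arn:aws:s3:::{bucket}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyInsecureTransport",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:*",
                # Cover both the bucket itself and every object in it.
                "Resource": [bucket_arn, f"{bucket_arn}/*"],
                "Condition": {"Bool": {"aws:SecureTransport": "false"}},
            }
        ],
    }


# Hypothetical bucket name for the staging area.
policy_json = json.dumps(deny_insecure_transport_policy("example-data-stage"), indent=2)

# Applying it (requires boto3 and credentials):
# import boto3
# boto3.client("s3").put_bucket_policy(Bucket="example-data-stage", Policy=policy_json)
```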
Thank You. Questions?
Contact Us
Locations and Contact Info
Randall Barnes
Principal Architect, 2nd Watch
rbarnes@2ndwatch.com
Bill Moritz
Sr Cloud Engineer, 2nd Watch
bmoritz@2ndwatch.com
2nd Watch, Inc.
1-888-317-7920
info@2ndwatch.com
www.2ndwatch.com
SEATTLE
NEW YORK
VIRGINIA
ATLANTA
PHILADELPHIA
HOUSTON
LIBERTY LAKE
LOS ANGELES
CHICAGO

Más contenido relacionado

La actualidad más candente

La actualidad más candente (20)

Cost Optimising Your Architecture Practical Design Steps for Developer Saving...
Cost Optimising Your Architecture Practical Design Steps for Developer Saving...Cost Optimising Your Architecture Practical Design Steps for Developer Saving...
Cost Optimising Your Architecture Practical Design Steps for Developer Saving...
 
2nd Watch CTO - Kris Blisner
2nd Watch CTO - Kris Blisner2nd Watch CTO - Kris Blisner
2nd Watch CTO - Kris Blisner
 
Deep Dive On Object Storage: Amazon S3 and Amazon Glacier - AWS PS Summit Can...
Deep Dive On Object Storage: Amazon S3 and Amazon Glacier - AWS PS Summit Can...Deep Dive On Object Storage: Amazon S3 and Amazon Glacier - AWS PS Summit Can...
Deep Dive On Object Storage: Amazon S3 and Amazon Glacier - AWS PS Summit Can...
 
AWS Summit Berlin 2013 - Optimizing your AWS applications and usage to reduce...
AWS Summit Berlin 2013 - Optimizing your AWS applications and usage to reduce...AWS Summit Berlin 2013 - Optimizing your AWS applications and usage to reduce...
AWS Summit Berlin 2013 - Optimizing your AWS applications and usage to reduce...
 
Evolution of Geospatial Workloads on AWS - AWS PS Summit Canberra
Evolution of Geospatial Workloads on AWS - AWS PS Summit Canberra Evolution of Geospatial Workloads on AWS - AWS PS Summit Canberra
Evolution of Geospatial Workloads on AWS - AWS PS Summit Canberra
 
AWS Summit Berlin 2013 - Keynote Steve Schmidt
AWS Summit Berlin 2013 - Keynote Steve SchmidtAWS Summit Berlin 2013 - Keynote Steve Schmidt
AWS Summit Berlin 2013 - Keynote Steve Schmidt
 
Aws Summit Berlin 2013 - Understanding database options on AWS
Aws Summit Berlin 2013 - Understanding database options on AWSAws Summit Berlin 2013 - Understanding database options on AWS
Aws Summit Berlin 2013 - Understanding database options on AWS
 
The Changing Landscape of Development with AWS Cloud - AWS PS Summit Canberra...
The Changing Landscape of Development with AWS Cloud - AWS PS Summit Canberra...The Changing Landscape of Development with AWS Cloud - AWS PS Summit Canberra...
The Changing Landscape of Development with AWS Cloud - AWS PS Summit Canberra...
 
Optimizing Data Management Using AWS Storage and Data Migration Products | AW...
Optimizing Data Management Using AWS Storage and Data Migration Products | AW...Optimizing Data Management Using AWS Storage and Data Migration Products | AW...
Optimizing Data Management Using AWS Storage and Data Migration Products | AW...
 
AWS Summit Berlin 2013 - Euroforum - Moving an Entire Physical Data Center in...
AWS Summit Berlin 2013 - Euroforum - Moving an Entire Physical Data Center in...AWS Summit Berlin 2013 - Euroforum - Moving an Entire Physical Data Center in...
AWS Summit Berlin 2013 - Euroforum - Moving an Entire Physical Data Center in...
 
FinOps - AWS Cost and Operational Efficiency - Pop-up Loft Tel Aviv
FinOps - AWS Cost and Operational Efficiency - Pop-up Loft Tel AvivFinOps - AWS Cost and Operational Efficiency - Pop-up Loft Tel Aviv
FinOps - AWS Cost and Operational Efficiency - Pop-up Loft Tel Aviv
 
Building Big Data Applications on AWS
Building Big Data Applications on AWSBuilding Big Data Applications on AWS
Building Big Data Applications on AWS
 
AWS Summit Berlin 2013 - Big Data Analytics
AWS Summit Berlin 2013 - Big Data AnalyticsAWS Summit Berlin 2013 - Big Data Analytics
AWS Summit Berlin 2013 - Big Data Analytics
 
AWS re:Invent 2016: FINRA in the Cloud: the Big Data Enterprise (ENT313)
AWS re:Invent 2016: FINRA in the Cloud: the Big Data Enterprise (ENT313)AWS re:Invent 2016: FINRA in the Cloud: the Big Data Enterprise (ENT313)
AWS re:Invent 2016: FINRA in the Cloud: the Big Data Enterprise (ENT313)
 
AWS re:Invent 2016: Dollars and Sense: Technical Tips for Continual Cost Opti...
AWS re:Invent 2016: Dollars and Sense: Technical Tips for Continual Cost Opti...AWS re:Invent 2016: Dollars and Sense: Technical Tips for Continual Cost Opti...
AWS re:Invent 2016: Dollars and Sense: Technical Tips for Continual Cost Opti...
 
Demystifying Storage on AWS | AWS Public Sector Summit 2017
Demystifying Storage on AWS | AWS Public Sector Summit 2017Demystifying Storage on AWS | AWS Public Sector Summit 2017
Demystifying Storage on AWS | AWS Public Sector Summit 2017
 
How Can I Plan for Security, Risk, & Compliance Before Migrating to AWS? | A...
 How Can I Plan for Security, Risk, & Compliance Before Migrating to AWS? | A... How Can I Plan for Security, Risk, & Compliance Before Migrating to AWS? | A...
How Can I Plan for Security, Risk, & Compliance Before Migrating to AWS? | A...
 
AWS March 2016 Webinar Series - Managed Database Services on Amazon Web Services
AWS March 2016 Webinar Series - Managed Database Services on Amazon Web ServicesAWS March 2016 Webinar Series - Managed Database Services on Amazon Web Services
AWS March 2016 Webinar Series - Managed Database Services on Amazon Web Services
 
Wild Rides Takes off - The Dawn of a New Unicorn
Wild Rides Takes off - The Dawn of a New UnicornWild Rides Takes off - The Dawn of a New Unicorn
Wild Rides Takes off - The Dawn of a New Unicorn
 
Cloud Backup & Recovery Options with AWS Partner Solutions - June 2017 AWS On...
Cloud Backup & Recovery Options with AWS Partner Solutions - June 2017 AWS On...Cloud Backup & Recovery Options with AWS Partner Solutions - June 2017 AWS On...
Cloud Backup & Recovery Options with AWS Partner Solutions - June 2017 AWS On...
 

Destacado

Josè alberto vàzquez_pèrez_eje1_actividad4.doc
Josè alberto vàzquez_pèrez_eje1_actividad4.docJosè alberto vàzquez_pèrez_eje1_actividad4.doc
Josè alberto vàzquez_pèrez_eje1_actividad4.doc
licbetovaz
 
Introduce Yourself
Introduce YourselfIntroduce Yourself
Introduce Yourself
HZAHRANI
 
Comunicado n°3 toma utfsm(1)
Comunicado n°3 toma utfsm(1)Comunicado n°3 toma utfsm(1)
Comunicado n°3 toma utfsm(1)
Toma Utfsm
 
SAIOSH OHSPr. Mebership Certificate
SAIOSH OHSPr. Mebership CertificateSAIOSH OHSPr. Mebership Certificate
SAIOSH OHSPr. Mebership Certificate
Andy C. Zimalirana
 
Como se produce el café en colombia
Como se produce el café en colombiaComo se produce el café en colombia
Como se produce el café en colombia
efrainmoraa
 
Debarquement en normandie._._
Debarquement en normandie._._Debarquement en normandie._._
Debarquement en normandie._._
George Martin
 

Destacado (18)

Josè alberto vàzquez_pèrez_eje1_actividad4.doc
Josè alberto vàzquez_pèrez_eje1_actividad4.docJosè alberto vàzquez_pèrez_eje1_actividad4.doc
Josè alberto vàzquez_pèrez_eje1_actividad4.doc
 
Plantilla tpº13
Plantilla tpº13Plantilla tpº13
Plantilla tpº13
 
Presentación Final
Presentación FinalPresentación Final
Presentación Final
 
Introduce Yourself
Introduce YourselfIntroduce Yourself
Introduce Yourself
 
Obras de Misericórdia | O que são? Quais são? E seus efeitos?
Obras de Misericórdia | O que são? Quais são? E seus efeitos?Obras de Misericórdia | O que são? Quais são? E seus efeitos?
Obras de Misericórdia | O que são? Quais são? E seus efeitos?
 
Partnership opportunities
Partnership opportunitiesPartnership opportunities
Partnership opportunities
 
Apresentação cetonas framboesa
Apresentação cetonas framboesaApresentação cetonas framboesa
Apresentação cetonas framboesa
 
Comunicado n°3 toma utfsm(1)
Comunicado n°3 toma utfsm(1)Comunicado n°3 toma utfsm(1)
Comunicado n°3 toma utfsm(1)
 
Tshirts
TshirtsTshirts
Tshirts
 
Vision
VisionVision
Vision
 
SAIOSH OHSPr. Mebership Certificate
SAIOSH OHSPr. Mebership CertificateSAIOSH OHSPr. Mebership Certificate
SAIOSH OHSPr. Mebership Certificate
 
Como se produce el café en colombia
Como se produce el café en colombiaComo se produce el café en colombia
Como se produce el café en colombia
 
Cuestionario
CuestionarioCuestionario
Cuestionario
 
E3JK
E3JKE3JK
E3JK
 
Being an ambassador: Parkinson's UK volunteer induction Module 3, Task 3
Being an ambassador: Parkinson's UK volunteer induction Module 3, Task 3 Being an ambassador: Parkinson's UK volunteer induction Module 3, Task 3
Being an ambassador: Parkinson's UK volunteer induction Module 3, Task 3
 
EC 512: Monte Sinaí
EC 512: Monte SinaíEC 512: Monte Sinaí
EC 512: Monte Sinaí
 
Análisis de Regresión Múltiple
Análisis de Regresión MúltipleAnálisis de Regresión Múltiple
Análisis de Regresión Múltiple
 
Debarquement en normandie._._
Debarquement en normandie._._Debarquement en normandie._._
Debarquement en normandie._._
 

Similar a Big data and Analytics on AWS

Similar a Big data and Analytics on AWS (20)

Fast Track to Your Data Lake on AWS
Fast Track to Your Data Lake on AWSFast Track to Your Data Lake on AWS
Fast Track to Your Data Lake on AWS
 
AWS Webcast - Informatica - Big Data Solutions Showcase
AWS Webcast - Informatica - Big Data Solutions ShowcaseAWS Webcast - Informatica - Big Data Solutions Showcase
AWS Webcast - Informatica - Big Data Solutions Showcase
 
IARE_BDBA_ PPT_0.pptx
IARE_BDBA_ PPT_0.pptxIARE_BDBA_ PPT_0.pptx
IARE_BDBA_ PPT_0.pptx
 
Big data.ppt
Big data.pptBig data.ppt
Big data.ppt
 
Lecture1
Lecture1Lecture1
Lecture1
 
Data warehousing in the era of Big Data: Deep Dive into Amazon Redshift
Data warehousing in the era of Big Data: Deep Dive into Amazon RedshiftData warehousing in the era of Big Data: Deep Dive into Amazon Redshift
Data warehousing in the era of Big Data: Deep Dive into Amazon Redshift
 
Modern Data Architectures for Business Outcomes
Modern Data Architectures for Business OutcomesModern Data Architectures for Business Outcomes
Modern Data Architectures for Business Outcomes
 
AWS Big Data Solution Days
AWS Big Data Solution DaysAWS Big Data Solution Days
AWS Big Data Solution Days
 
What's new in AWS?
What's new in AWS?What's new in AWS?
What's new in AWS?
 
FSI201 FINRA’s Managed Data Lake – Next Gen Analytics in the Cloud
FSI201 FINRA’s Managed Data Lake – Next Gen Analytics in the CloudFSI201 FINRA’s Managed Data Lake – Next Gen Analytics in the Cloud
FSI201 FINRA’s Managed Data Lake – Next Gen Analytics in the Cloud
 
Database and Analytics on the AWS Cloud
Database and Analytics on the AWS CloudDatabase and Analytics on the AWS Cloud
Database and Analytics on the AWS Cloud
 
Enterprise Data World 2018 - Building Cloud Self-Service Analytical Solution
Enterprise Data World 2018 - Building Cloud Self-Service Analytical SolutionEnterprise Data World 2018 - Building Cloud Self-Service Analytical Solution
Enterprise Data World 2018 - Building Cloud Self-Service Analytical Solution
 
Lecture1 BIG DATA and Types of data in details
Lecture1 BIG DATA and Types of data in detailsLecture1 BIG DATA and Types of data in details
Lecture1 BIG DATA and Types of data in details
 
AWS Webcast - Managing Big Data in the AWS Cloud_20140924
AWS Webcast - Managing Big Data in the AWS Cloud_20140924AWS Webcast - Managing Big Data in the AWS Cloud_20140924
AWS Webcast - Managing Big Data in the AWS Cloud_20140924
 
AWS Big Data Platform
AWS Big Data PlatformAWS Big Data Platform
AWS Big Data Platform
 
Modern Data Architectures for Business Outcomes
Modern Data Architectures for Business OutcomesModern Data Architectures for Business Outcomes
Modern Data Architectures for Business Outcomes
 
Big Data & Analytics - Innovating at the Speed of Light
Big Data & Analytics - Innovating at the Speed of LightBig Data & Analytics - Innovating at the Speed of Light
Big Data & Analytics - Innovating at the Speed of Light
 
The AWS Big Data Platform – Overview
The AWS Big Data Platform – OverviewThe AWS Big Data Platform – Overview
The AWS Big Data Platform – Overview
 
An overview of modern scalable web development
An overview of modern scalable web developmentAn overview of modern scalable web development
An overview of modern scalable web development
 
Welcome & AWS Big Data Solution Overview
Welcome & AWS Big Data Solution OverviewWelcome & AWS Big Data Solution Overview
Welcome & AWS Big Data Solution Overview
 

Más de 2nd Watch

Más de 2nd Watch (14)

Managing Multi-Cloud and On-Premises with Microsoft Azure
Managing Multi-Cloud and On-Premises with Microsoft AzureManaging Multi-Cloud and On-Premises with Microsoft Azure
Managing Multi-Cloud and On-Premises with Microsoft Azure
 
Containers, from Production to Development
Containers, from Production to DevelopmentContainers, from Production to Development
Containers, from Production to Development
 
Containers, From Development to Production
Containers, From Development to ProductionContainers, From Development to Production
Containers, From Development to Production
 
Getting Started with VMware Cloud on AWS
Getting Started with VMware Cloud on AWSGetting Started with VMware Cloud on AWS
Getting Started with VMware Cloud on AWS
 
Operating Windows on AWS Using SSM
Operating Windows on AWS Using SSMOperating Windows on AWS Using SSM
Operating Windows on AWS Using SSM
 
Cloud Optimization: Filling in the Gaps
Cloud Optimization: Filling in the GapsCloud Optimization: Filling in the Gaps
Cloud Optimization: Filling in the Gaps
 
Automated Security & Continuous Compliance on Microsoft Azure
Automated Security & Continuous Compliance on Microsoft AzureAutomated Security & Continuous Compliance on Microsoft Azure
Automated Security & Continuous Compliance on Microsoft Azure
 
Migrating Your Windows Datacenter to AWS
Migrating Your Windows Datacenter to AWSMigrating Your Windows Datacenter to AWS
Migrating Your Windows Datacenter to AWS
 
Single Realm Multi-Cloud Security Management with Palo Alto Networks
Single Realm Multi-Cloud Security Management with Palo Alto NetworksSingle Realm Multi-Cloud Security Management with Palo Alto Networks
Single Realm Multi-Cloud Security Management with Palo Alto Networks
 
Drive Thru DevOps, Moving Forward Securely
Drive Thru DevOps, Moving Forward SecurelyDrive Thru DevOps, Moving Forward Securely
Drive Thru DevOps, Moving Forward Securely
 
Secure Clouds are Happy Clouds
Secure Clouds are Happy CloudsSecure Clouds are Happy Clouds
Secure Clouds are Happy Clouds
 
Money Pitfalls and Failed Expectations: Optimizing Essentials for the Cloud
Money Pitfalls and Failed Expectations: Optimizing Essentials for the CloudMoney Pitfalls and Failed Expectations: Optimizing Essentials for the Cloud
Money Pitfalls and Failed Expectations: Optimizing Essentials for the Cloud
 
Aws Architecture Fundamentals
Aws Architecture FundamentalsAws Architecture Fundamentals
Aws Architecture Fundamentals
 
Aws Architecture Fundamentals
Aws Architecture FundamentalsAws Architecture Fundamentals
Aws Architecture Fundamentals
 

Último

Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
Joaquim Jorge
 

Último (20)

presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesHTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation Strategies
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 

Big data and Analytics on AWS

  • 1. Big Data & Analytics Randall Barnes - Bill Moritz - Kevin Dillon
  • 2. Today’s Session Objectives • Describe core concepts, common objectives and lessons learned • Present specific platforms and products available in AWS • Provide live, hands-on deployment experience
  • 3. Big Data applications are defined as having data volume or variety or velocity characteristics that render traditional tools/processes impractical Great potential… • Keep pace with the accelerating information explosion • New insights and analytics to improve business decisions • Create new applications requiring massive real-time data processing …and at times challenging • Unpredictable resource demand • Job orchestration and management complexities • Geo-distribution of data sources
  • 4. Reduce costs per workload, saving money and creating opportunities Extremely Flexible - ability to provide answers to analytics questions that don't yet exist Why Big Data solutions have worked well in the cloud
  • 6. Three types of data-driven development Retrospective analysis and reporting Predictions to enable smart applications Amazon Machine Learning Amazon EMR Here-and-now real-time processing and dashboards Amazon Kinesis Amazon EC2 AWS Lambda Amazon Redshift Amazon EMR
  • 7. Core Principles for Successful Implementations Elastic resource capacity • Data Storage, I/O, Computing resources scale on demand • Dynamically support multiple ephemeral environments such as dev, test and QA validations • No up-front capital expenses; pay only for what you use Streamlined management of platforms and solutions • Raw infrastructure resources • Application stacks Well-supported ecosystem of tools and applications • Data integration tools • Analytics and reporting applications • Resource and job orchestration
  • 8. What is changing… Diverse and non-traditional workloads • Using big data strategies, tools and products to solve problems that have not traditionally been viewed as big data. Leverage managed solutions to reduce complexity and staff constraints • AWS-managed platforms • 3rd-Party frameworks Making The Cloud Work For Your Enterprise
  • 9. Most common implementation challenges: • Managing distributed data sets • Application platform migration – limited resources • ETL integration, especially leveraging existing IP and business logic Making The Cloud Work For Your Enterprise
  • 10. TCO Mistakes Overprovisioning • High I/O storage space for non-active data sets • Non-linear cost increase for certain instance types Static resources • Low overall utilization • Not leveraging spot instance pricing Not leveraging Reserved Instance (RI) price strategies
  • 11. Example 1: MPP Data Warehouse Bulk Transfer Reporting and Analytics Data Stage and Archive Data Warehouse
  • 12. Example 1: Elastic Data Warehouse S3 Glacier Redshift Import/Export Service Direct Connect
  • 13. Data Collection, Ingestion and Consumption • Ship storage devices directly to Amazon • Transfer to EBS or S3 • Up to 4TB per device • Higher bandwidth, more consistent performance • 1Gbps and 10Gbps ports [network providers may offer slice] Direct Connect Import/Export Service
  • 14. Amazon Simple Storage Service (S3) Object storage container with virtually unlimited capacity • Store files (objects) in containers (buckets) • Redundant copies for high durability and reliability • Available on the internet via REST requests directly or through SDK • Multiple strategies to secure contents • Set permissions, access policies and optionally require MFA • Encryption: Server (simplified) or Client-side • Audit logging (optional) will record all access requests via api • Built-in tools for managing versioning, object lifecycle and creating static websites • Low pay-as-you-go pricing a function of storage amount (~$.03/GB/Month) plus metering of I/O requests
  • 15. Amazon Redshift • High performance, massively parallel columnar storage architecture providing streamlined scalability • Mainstream SQL query syntax (PostgreSQL) allowing for rapid platform adoption • Flexible node type and RI options allowing for workload alignment and cost efficiency • Integrated with other AWS Big Data Platforms (S3, EMR, DynamoDB, Data Pipeline) • Streamlined administrative tasks (snapshot/restore, Node increase/decrease) Scalable, fully-managed Data Warehouse
  • 16. Recap: Elastic Data Warehouse (diagram: S3, Glacier, Redshift, Import/Export Service, Direct Connect)
  • 17. Example 2: Real-Time Data Streaming and NoSQL (diagram: Data Warehouse, Application Tier, Backend Apps, Real-Time Processing, NoSQL)
  • 18. Example 2: Real-Time Data Streaming and NoSQL (diagram: Data Warehouse, Application Tier, Backend Apps, DynamoDB, Kinesis)
  • 19. Amazon Kinesis: Scalable real-time processing of diverse data • Fully managed service • Real-time ingestion and transformation of log and application data • Real-time reporting and analytics • Data ordering, deterministic routing and replay (up to 24 hours) • Records: partition key, sequence number, data blob (payload) • Shards: units of incremental throughput capacity • Use SDK APIs for PUT/GET operations
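The "deterministic routing" bullet can be made concrete. Kinesis hashes each record's partition key with MD5 into a 128-bit integer, and each shard owns a contiguous range of that hash key space. The sketch below assumes evenly split ranges (roughly how a newly created stream divides the space), reads the digest as a big-endian integer, and uses per-shard write limits of 1 MB/s and 1,000 records/s; check those limits against current documentation before sizing a real stream. The helper names are hypothetical, not SDK calls.

```python
import hashlib
import math
from typing import List, Tuple

HASH_KEY_SPACE = 2 ** 128          # MD5 produces a 128-bit value

# Per-shard write limits (assumed here; verify against current documentation)
SHARD_WRITE_MB_PER_SEC = 1.0
SHARD_WRITE_RECORDS_PER_SEC = 1000


def shard_ranges(num_shards: int) -> List[Tuple[int, int]]:
    """Split the hash key space into contiguous, roughly equal (start, end) ranges."""
    size = HASH_KEY_SPACE // num_shards
    ranges = []
    for i in range(num_shards):
        start = i * size
        # The last shard absorbs any remainder so the whole space is covered.
        end = HASH_KEY_SPACE - 1 if i == num_shards - 1 else (i + 1) * size - 1
        ranges.append((start, end))
    return ranges


def shard_for(partition_key: str, ranges: List[Tuple[int, int]]) -> int:
    """Route a record: MD5 of the partition key, read as a big-endian integer."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    hash_key = int.from_bytes(digest, "big")
    for shard_index, (start, end) in enumerate(ranges):
        if start <= hash_key <= end:
            return shard_index
    raise ValueError("hash key not covered by any shard range")


def shards_needed(mb_per_sec: float, records_per_sec: int) -> int:
    """Smallest shard count that satisfies both assumed write limits."""
    return max(1,
               math.ceil(mb_per_sec / SHARD_WRITE_MB_PER_SEC),
               math.ceil(records_per_sec / SHARD_WRITE_RECORDS_PER_SEC))


if __name__ == "__main__":
    ranges = shard_ranges(4)
    for key in ("sensor-1", "sensor-2", "sensor-1"):
        print(key, "-> shard", shard_for(key, ranges))
```

Because the mapping depends only on the key, every record with the same partition key lands on the same shard, which is what gives per-key ordering.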
  • 20. Amazon DynamoDB • Seamless, virtually unlimited scalability, managed automatically • Ability to define specific resource allocation limits • Easy administration and a well-supported development model • Integration with other core Amazon data services • GET/PUT operations with a user-defined primary key • Tables contain items (primary key + attributes) of up to 400KB • Data types: scalar, set (collections), key-value, document • Secondary indexes (global and local) • Provisioned read and write throughput, SSD storage. Challenge: proprietary API via AWS SDKs (e.g. Java, .NET)
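Provisioned throughput is expressed in capacity units, so sizing a table is mostly arithmetic. This sketch assumes the standard definitions: one read capacity unit covers one strongly consistent read per second of an item up to 4KB (eventually consistent reads cost half), and one write capacity unit covers one write per second of an item up to 1KB, with item sizes rounded up. The function names are hypothetical.

```python
import math

MAX_ITEM_KB = 400   # item size limit quoted on the slide
READ_UNIT_KB = 4    # one RCU: one strongly consistent read/s of up to 4KB
WRITE_UNIT_KB = 1   # one WCU: one write/s of up to 1KB


def _check_size(item_kb: float) -> None:
    if item_kb <= 0 or item_kb > MAX_ITEM_KB:
        raise ValueError(f"item size must be in (0, {MAX_ITEM_KB}] KB")


def read_capacity_units(item_kb: float, reads_per_sec: int,
                        strongly_consistent: bool = True) -> int:
    """Read units to provision for a steady read rate of same-sized items."""
    _check_size(item_kb)
    units = math.ceil(item_kb / READ_UNIT_KB) * reads_per_sec
    # Eventually consistent reads consume half as much; round up to whole units.
    return units if strongly_consistent else math.ceil(units / 2)


def write_capacity_units(item_kb: float, writes_per_sec: int) -> int:
    """Write units to provision for a steady write rate of same-sized items."""
    _check_size(item_kb)
    return math.ceil(item_kb / WRITE_UNIT_KB) * writes_per_sec


if __name__ == "__main__":
    # 6KB items, 10 reads/s and 10 writes/s
    print(read_capacity_units(6, 10), read_capacity_units(6, 10, False),
          write_capacity_units(6, 10))
```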
  • 21. Recap: Real-Time Data Streaming and NoSQL (diagram: Data Warehouse, Application Tier, Backend Apps, DynamoDB, Kinesis)
  • 22. Example 3: Hadoop Workloads (diagram: Data Warehouse, Application and Data Stage Tiers, Analytics, Hadoop Processing)
  • 23. Example 3: Hadoop Workloads (diagram: Data Warehouse, Application and Data Stage Tiers, Analytics, EMR)
  • 24. Amazon Elastic MapReduce (EMR) • Semi-managed service (access to the underlying OS) • Apache Hadoop framework • Robust, streamlined management of MapReduce jobs • Simple API for popular extensions, e.g. Hive, Pig, Spark • Spot Instance pricing available • HDFS or S3 storage
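MapReduce itself is easy to simulate in a single process. The word-count sketch below runs a map phase that emits (word, 1) pairs, a shuffle that groups pairs by key, and a reduce phase that sums each group. On EMR, Hadoop distributes these same phases across the cluster; this local version only illustrates the data flow and is not Hadoop or EMR code.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Mapper: emit (word, 1) for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reducer: combine all values for one key."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    pairs = (pair for line in lines for pair in map_phase(line))
    # Reducers see keys in sorted order, as after Hadoop's sort step.
    return dict(reduce_phase(key, values) for key, values in sorted(shuffle(pairs).items()))


if __name__ == "__main__":
    print(word_count(["Big data on EMR", "big data with Hadoop"]))
```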
  • 25. Analytics and Reporting • Broad Vendor Integration • Reports, Dashboards, BI
  • 26. Analytics and Reporting: Fast and easy to create • Reports • Dashboards • Near-real-time analytics decisions
  • 27. Recap: Hadoop Workloads (diagram: Data Warehouse, Application and Data Stage Tiers, Analytics, EMR)
  • 28. Example 4: Machine Learning. Machine learning is the technology that automatically finds patterns in your data and uses them to make predictions for new data points as they become available. Your Data + Machine Learning = Smart Applications
  • 29. Amazon ML: Easy-to-use, managed machine learning service built for developers • Robust, powerful machine learning technology based on Amazon’s internal systems • Create models using data already stored in the AWS cloud (S3 files, Redshift queries, MySQL RDS queries) • Deploy models to production in seconds
  • 30. Smart applications by example Based on what you know about an order: Is this order fraudulent? Based on what you know about the user: Will they use your product? Based on what you know about a news article: What other articles are interesting?
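Questions like "Is this order fraudulent?" are binary classification problems. The sketch below trains a logistic regression model with batch gradient descent on a hypothetical, hand-made set of orders with two features: order amount in thousands of dollars and a shipping/billing address mismatch flag. It illustrates learning patterns from labeled data and applying them to new data points; it is not Amazon ML's implementation, and a real model would need far more data and evaluation on held-out examples.

```python
import math
from typing import List, Sequence, Tuple

Model = Tuple[List[float], float]

# Hypothetical toy orders: (amount in $1,000s, address mismatch flag) -> is_fraud
ORDERS: List[Tuple[Tuple[float, float], int]] = [
    ((0.1, 0.0), 0), ((0.2, 0.0), 0), ((0.3, 0.0), 0),
    ((0.8, 1.0), 1), ((0.9, 1.0), 1), ((1.0, 1.0), 1),
]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train(data, lr: float = 1.0, epochs: int = 5000) -> Model:
    """Batch gradient descent on the average logistic (log) loss."""
    n_features = len(data[0][0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in data:
            # Gradient of the log loss w.r.t. the score is (prediction - label).
            error = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            for j in range(n_features):
                grad_w[j] += error * x[j]
            grad_b += error
        w = [wi - lr * g / len(data) for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / len(data)
    return w, b


def predict(model: Model, x: Sequence[float], threshold: float = 0.5) -> int:
    """Return 1 (flag as fraud) when the predicted probability reaches threshold."""
    w, b = model
    return int(sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) >= threshold)


if __name__ == "__main__":
    model = train(ORDERS)
    print(predict(model, (0.95, 1.0)), predict(model, (0.15, 0.0)))
```

The threshold is a design choice: lowering it catches more fraud at the cost of more false alarms.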
  • 31. And a few more examples… • Fraud detection: detecting fraudulent transactions, filtering spam emails, flagging suspicious reviews, … • Personalization: recommending content, predictive content loading, improving user experience, … • Targeted marketing: matching customers and offers, choosing marketing campaigns, cross-selling and up-selling, … • Content classification: categorizing documents, matching hiring managers and resumes, … • Churn prediction: finding customers who are likely to stop using the service, free-tier upgrade targeting, … • Customer support: predictive routing of customer emails, social media listening, …
  • 32. Securing Data in the Cloud • Secure your AWS console root account • Use complex passwords and rotate regularly • Secure data stage locations
  • 34. Contact Us. Contact Info: Randall Barnes, Principal Architect, 2nd Watch, rbarnes@2ndwatch.com • Bill Moritz, Sr Cloud Engineer, 2nd Watch, bmoritz@2ndwatch.com • 2nd Watch, Inc., 1-888-317-7920, info@2ndwatch.com, www.2ndwatch.com. Locations: SEATTLE, NEW YORK, VIRGINIA, ATLANTA, PHILADELPHIA, HOUSTON, LIBERTY LAKE, LOS ANGELES, CHICAGO