SlideShare una empresa de Scribd logo
1 de 38
Descargar para leer sin conexión
© 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Specialist Solutions Architect, Data and Analytics, EMEA
November 17th, 2017
Full Stack Analytics on AWS
Ian Robinson
Forces and Trends Prompting the Move to Cloud
Cost Optimization
Licenses
Hardware
Data center and operations
Dark Data
Prematurely discarding data
Agility
Experimentation (data & tools)
Democratised Access to Data
Time-to-first-results
Terminate failed experiments early
From BI to Data Science
In-house data science
From back office to product
Storage is the Gravity for Cloud Applications
Store all your data, for ever, at every stage of its lifecycle
Apply it using the right tool for the job
Storage is Job #1
Object Storage is Foundational
Standard
Active data Archive dataInfrequently accessed data
Standard - Infrequent
Access
Amazon Glacier
Create
Delete
Events and Lifecycle Management
S3 as the Data Lake Fabric
• Unlimited number of objects
and volume
• 99.99% availability
• 99.999999999% durability
• Versioning
• Tiered storage via lifecycle
policies
• SSL, client/server-side
encryption at rest
• Low cost (just over
$2700/month for 100TB)
• Natively supported by big
data frameworks (Spark, Hive,
Presto, etc)
• Decouples storage and
compute
• Run transient compute
clusters (with Amazon EC2
Spot Instances)
• Multiple, heterogeneous
clusters can use same data
Database	Migration
Service
Automated Data Ingestion
Stream Events to S3 Using Kinesis Firehose
Write Database Changes to S3 with DMS
<schema_name>/<table_name>/LOAD001.csv
<schema_name>/<table_name>/LOAD002.csv
<schema_name>/<table_name>/<time-stamp>.csv
Full Load
Change Data Capture
Scalable (secure, versioned, durable) storage +
Immutable data at every stage of its lifecycle +
Versioned schema and metadata
=
Data discovery, lineage
Storage + Catalog
AWS Glue
• Data Catalog Discover
and store metadata
• Job Authoring Auto-
generated ETL code
• Job Execution
Serverless scheduling
and execution
Hive metastore-compatible, highly-
available metadata repository:
• Classification for identifying and
parsing files
• Versioning of table metadata as
schemas evolve
• Table definitions – usable by
Redshift, Athena, Glue, EMR
Populate using Hive DDL, bulk import, or
automatically through crawlers.
Glue Data Catalog
semi-structured
per-file schema
semi-structured
unified schema
identify file type
and parse files
enumerate
S3 objects
file 1
file 2
file N
…
int
array
intchar
struct
char int
array
struct
char
bool int
int
arrayint
char
char int
custom classifiers
app log parser
metrics parser
…
system classifiers
JSON parser
CSV parser
Apache log parser
…
bool
Crawlers: Automatic Schema Inference
AWS Lambda
AWS Lambda
Metadata Index
(Amazon DynamoDB)
Search Index
(Amazon Elasticsearch)
ObjectCreated
ObjectDeleted PutItem
Update Stream
Update Index
Extract Search Fields
Indexing and Searching Using Metadata
Amazon S3
Security is Job #0
Data Access & Authorisation
Give your users easy and secure access
Storage & Catalog
Secure, cost-effective storage in Amazon S3.
Robust metadata in AWS Catalog
Protect and Secure
Use entitlements to ensure data is secure and users’ identities are verified
Identity and Access Management
• Manage users,	groups,	and	roles
• Identity	federation with	Open	ID
• Temporary	credentials	with	Amazon	Security	Token	
Service	(Amazon	STS)
• Stored	policy	templates
• Powerful	policy	language	
• Amazon	S3	bucket	policies
IAM
Amazon
S3
Amazon
ElastiCache
Amazon
DynamoDB
Amazon
EMR
Amazon
Kinesis
Amazon
Athena
Service API Access
Security at the Data Level
Third Party Ecosystem Security Tools
Amazon
S3
AWS
CloudTrail
http://amzn.to/2tSimHj
Amazon
Athena
Access Logging
API Logging
Access Log
Analytics
IAM
Amazon
EMR
http://amzn.to/2si6RqS
Storage Level Support for Access Logging and Audit
Encryption Options
AWS	Server-Side	encryption
• AWS	managed	key	infrastructure
AWS	Key	Management	Service
• Automated	key	rotation	&	auditing
• Integration	with	other	AWS	services
AWS	CloudHSM
• Dedicated	Tenancy	SafeNet Luna	SA	HSM	Device
• Common	Criteria	EAL4+,	NIST	FIPS	140-2
Serverless Processing and
Analytics
• Python code generated
by AWS Glue
• Connect a notebook or
IDE to AWS Glue
• Existing code brought
into AWS Glue
Managed ETL with AWS GLue
• Schedule-based
• Event-based
• On demand
Job Execution with AWS Glue
Amazon Kinesis Analytics
• Interact with streaming data in real time using SQL
• Build fully managed and elastic stream processing
applications that process data for real-time visualizations
and alarms
SELECT STREAM author,
count(author) OVER ONE_MINUTE
FROM Tweets
WINDOW ONE_MINUTE AS
(PARTITION BY author
RANGE INTERVAL '1' MINUTE PRECEDING)
WHERE text LIKE ‘%#BigDataSpain%';
Amazon Kinesis Analytics – Simple SQL Interface
Amazon Athena – Analyze Data in S3
• Interactive queries
• ANSI SQL
• No infrastructure or administration
• Zero spin up time
• Query data in its raw format
• AVRO, Text, CSV, JSON, weblogs, AWS service logs
• Convert to an optimized form like ORC or Parquet for the
best performance and lowest cost
• No loading of data, no ETL required
• Stream data from directly from Amazon S3, take advantage
of Amazon S3 durability and availability
Simple query editor
with syntax highlighting
and autocomplete
Data Catalog
Query History, Saved Queries, and
Catalog Management
QuickSight allows you to connect to data from a wide variety of AWS, third-party, and on-premises sources including Amazon Athena
Amazon RDS
Amazon S3
Amazon Redshift
Amazon Athena
Using Amazon Athena with Amazon QuickSight
Building Smarter Applications
Add Machine Learning Capabilities
Amazon Machine Learning Service
Batch and online predictions
Train using data in S3, RDS and
Redshift
Amazon EMR
Comprehensive machine learning
libraries (eg Spark MLlib, Anaconda)
Provision analytics clusters in minutes,
autoscale with data volume or query
demand
Amazon AI Services
Amazon Polly – Lifelike Text-to-Speech
47 voices, 24 languages
Low-latency, real time
Amazon Rekognition – Image Analysis
Object and scene detection
Facial analysis
Amazon Lex – Conversational Engine
Speech and text recognition
Enterprise connectors
Demographic Data
Facial Landmarks
Sentiment Expressed
Image Quality
Facial Analysis with Rekognition
Brightness: 25.84
Sharpness: 160
General Attributes
Up to ~40k CUDA cores
Pre-configured CUDA drivers
Jupyter notebook with Python2,
Python3, Anaconda
CloudFormation Template
AWS Marketplace – one-click deploy
AWS Deep Learning AMI
Kinesis Firehose
Athena
Query Service Glue
Machine Learning
Predictive analytics
Data Access & Authorisation
Give your users easy and secure access
Data Ingestion
Get your data into S3
quickly and securely
Processing & Analytics
Use of predictive and prescriptive
analytics to gain better understanding
Protect and Secure
Use entitlements to ensure data is secure and users’ identities are verified
Amazon AI
Storage & Catalog
Secure, cost-effective storage in Amazon S3.
Robust metadata in AWS Catalog
Thank You
Full Stack Analytics on AWS

Más contenido relacionado

Más de Big Data Spain

State of the art time-series analysis with deep learning by Javier Ordóñez at...
State of the art time-series analysis with deep learning by Javier Ordóñez at...State of the art time-series analysis with deep learning by Javier Ordóñez at...
State of the art time-series analysis with deep learning by Javier Ordóñez at...Big Data Spain
 
Trading at market speed with the latest Kafka features by Iñigo González at B...
Trading at market speed with the latest Kafka features by Iñigo González at B...Trading at market speed with the latest Kafka features by Iñigo González at B...
Trading at market speed with the latest Kafka features by Iñigo González at B...Big Data Spain
 
Unified Stream Processing at Scale with Apache Samza by Jake Maes at Big Data...
Unified Stream Processing at Scale with Apache Samza by Jake Maes at Big Data...Unified Stream Processing at Scale with Apache Samza by Jake Maes at Big Data...
Unified Stream Processing at Scale with Apache Samza by Jake Maes at Big Data...Big Data Spain
 
The Analytic Platform behind IBM’s Watson Data Platform by Luciano Resende a...
 The Analytic Platform behind IBM’s Watson Data Platform by Luciano Resende a... The Analytic Platform behind IBM’s Watson Data Platform by Luciano Resende a...
The Analytic Platform behind IBM’s Watson Data Platform by Luciano Resende a...Big Data Spain
 
Artificial Intelligence and Data-centric businesses by Óscar Méndez at Big Da...
Artificial Intelligence and Data-centric businesses by Óscar Méndez at Big Da...Artificial Intelligence and Data-centric businesses by Óscar Méndez at Big Da...
Artificial Intelligence and Data-centric businesses by Óscar Méndez at Big Da...Big Data Spain
 
Why big data didn’t end causal inference by Totte Harinen at Big Data Spain 2017
Why big data didn’t end causal inference by Totte Harinen at Big Data Spain 2017Why big data didn’t end causal inference by Totte Harinen at Big Data Spain 2017
Why big data didn’t end causal inference by Totte Harinen at Big Data Spain 2017Big Data Spain
 
Meme Index. Analyzing fads and sensations on the Internet by Miguel Romero at...
Meme Index. Analyzing fads and sensations on the Internet by Miguel Romero at...Meme Index. Analyzing fads and sensations on the Internet by Miguel Romero at...
Meme Index. Analyzing fads and sensations on the Internet by Miguel Romero at...Big Data Spain
 
Vehicle Big Data that Drives Smart City Advancement by Mike Branch at Big Dat...
Vehicle Big Data that Drives Smart City Advancement by Mike Branch at Big Dat...Vehicle Big Data that Drives Smart City Advancement by Mike Branch at Big Dat...
Vehicle Big Data that Drives Smart City Advancement by Mike Branch at Big Dat...Big Data Spain
 
End of the Myth: Ultra-Scalable Transactional Management by Ricardo Jiménez-P...
End of the Myth: Ultra-Scalable Transactional Management by Ricardo Jiménez-P...End of the Myth: Ultra-Scalable Transactional Management by Ricardo Jiménez-P...
End of the Myth: Ultra-Scalable Transactional Management by Ricardo Jiménez-P...Big Data Spain
 
Attacking Machine Learning used in AntiVirus with Reinforcement by Rubén Mart...
Attacking Machine Learning used in AntiVirus with Reinforcement by Rubén Mart...Attacking Machine Learning used in AntiVirus with Reinforcement by Rubén Mart...
Attacking Machine Learning used in AntiVirus with Reinforcement by Rubén Mart...Big Data Spain
 
More people, less banking: Blockchain by Salvador Casquero at Big Data Spain ...
More people, less banking: Blockchain by Salvador Casquero at Big Data Spain ...More people, less banking: Blockchain by Salvador Casquero at Big Data Spain ...
More people, less banking: Blockchain by Salvador Casquero at Big Data Spain ...Big Data Spain
 
Make the elephant fly, once again by Sourygna Luangsay at Big Data Spain 2017
Make the elephant fly, once again by Sourygna Luangsay at Big Data Spain 2017Make the elephant fly, once again by Sourygna Luangsay at Big Data Spain 2017
Make the elephant fly, once again by Sourygna Luangsay at Big Data Spain 2017Big Data Spain
 
Feature selection for Big Data: advances and challenges by Verónica Bolón-Can...
Feature selection for Big Data: advances and challenges by Verónica Bolón-Can...Feature selection for Big Data: advances and challenges by Verónica Bolón-Can...
Feature selection for Big Data: advances and challenges by Verónica Bolón-Can...Big Data Spain
 
Deep reinforcement learning : Starcraft learning environment by Gema Parreño ...
Deep reinforcement learning : Starcraft learning environment by Gema Parreño ...Deep reinforcement learning : Starcraft learning environment by Gema Parreño ...
Deep reinforcement learning : Starcraft learning environment by Gema Parreño ...Big Data Spain
 
End-to-End “Exactly Once” with Heron & Pulsar by Ivan Kelly at Big Data Spain...
End-to-End “Exactly Once” with Heron & Pulsar by Ivan Kelly at Big Data Spain...End-to-End “Exactly Once” with Heron & Pulsar by Ivan Kelly at Big Data Spain...
End-to-End “Exactly Once” with Heron & Pulsar by Ivan Kelly at Big Data Spain...Big Data Spain
 
A Deep Learning use case for water end use detection by Roberto Díaz and José...
A Deep Learning use case for water end use detection by Roberto Díaz and José...A Deep Learning use case for water end use detection by Roberto Díaz and José...
A Deep Learning use case for water end use detection by Roberto Díaz and José...Big Data Spain
 
Tuning Java Driver for Apache Cassandra by Nenad Bozic at Big Data Spain 2017
Tuning Java Driver for Apache Cassandra by Nenad Bozic at Big Data Spain 2017Tuning Java Driver for Apache Cassandra by Nenad Bozic at Big Data Spain 2017
Tuning Java Driver for Apache Cassandra by Nenad Bozic at Big Data Spain 2017Big Data Spain
 
Spark Streaming + Kafka 0.10: an integration story by Joan Viladrosa Riera at...
Spark Streaming + Kafka 0.10: an integration story by Joan Viladrosa Riera at...Spark Streaming + Kafka 0.10: an integration story by Joan Viladrosa Riera at...
Spark Streaming + Kafka 0.10: an integration story by Joan Viladrosa Riera at...Big Data Spain
 
Keeping your Enterprise’s Big Data Secure by Owen O’Malley at Big Data Spain ...
Keeping your Enterprise’s Big Data Secure by Owen O’Malley at Big Data Spain ...Keeping your Enterprise’s Big Data Secure by Owen O’Malley at Big Data Spain ...
Keeping your Enterprise’s Big Data Secure by Owen O’Malley at Big Data Spain ...Big Data Spain
 
Bayesian inference and big data: are we there yet? by Jose Luis Hidalgo at Bi...
Bayesian inference and big data: are we there yet? by Jose Luis Hidalgo at Bi...Bayesian inference and big data: are we there yet? by Jose Luis Hidalgo at Bi...
Bayesian inference and big data: are we there yet? by Jose Luis Hidalgo at Bi...Big Data Spain
 

Más de Big Data Spain (20)

State of the art time-series analysis with deep learning by Javier Ordóñez at...
State of the art time-series analysis with deep learning by Javier Ordóñez at...State of the art time-series analysis with deep learning by Javier Ordóñez at...
State of the art time-series analysis with deep learning by Javier Ordóñez at...
 
Trading at market speed with the latest Kafka features by Iñigo González at B...
Trading at market speed with the latest Kafka features by Iñigo González at B...Trading at market speed with the latest Kafka features by Iñigo González at B...
Trading at market speed with the latest Kafka features by Iñigo González at B...
 
Unified Stream Processing at Scale with Apache Samza by Jake Maes at Big Data...
Unified Stream Processing at Scale with Apache Samza by Jake Maes at Big Data...Unified Stream Processing at Scale with Apache Samza by Jake Maes at Big Data...
Unified Stream Processing at Scale with Apache Samza by Jake Maes at Big Data...
 
The Analytic Platform behind IBM’s Watson Data Platform by Luciano Resende a...
 The Analytic Platform behind IBM’s Watson Data Platform by Luciano Resende a... The Analytic Platform behind IBM’s Watson Data Platform by Luciano Resende a...
The Analytic Platform behind IBM’s Watson Data Platform by Luciano Resende a...
 
Artificial Intelligence and Data-centric businesses by Óscar Méndez at Big Da...
Artificial Intelligence and Data-centric businesses by Óscar Méndez at Big Da...Artificial Intelligence and Data-centric businesses by Óscar Méndez at Big Da...
Artificial Intelligence and Data-centric businesses by Óscar Méndez at Big Da...
 
Why big data didn’t end causal inference by Totte Harinen at Big Data Spain 2017
Why big data didn’t end causal inference by Totte Harinen at Big Data Spain 2017Why big data didn’t end causal inference by Totte Harinen at Big Data Spain 2017
Why big data didn’t end causal inference by Totte Harinen at Big Data Spain 2017
 
Meme Index. Analyzing fads and sensations on the Internet by Miguel Romero at...
Meme Index. Analyzing fads and sensations on the Internet by Miguel Romero at...Meme Index. Analyzing fads and sensations on the Internet by Miguel Romero at...
Meme Index. Analyzing fads and sensations on the Internet by Miguel Romero at...
 
Vehicle Big Data that Drives Smart City Advancement by Mike Branch at Big Dat...
Vehicle Big Data that Drives Smart City Advancement by Mike Branch at Big Dat...Vehicle Big Data that Drives Smart City Advancement by Mike Branch at Big Dat...
Vehicle Big Data that Drives Smart City Advancement by Mike Branch at Big Dat...
 
End of the Myth: Ultra-Scalable Transactional Management by Ricardo Jiménez-P...
End of the Myth: Ultra-Scalable Transactional Management by Ricardo Jiménez-P...End of the Myth: Ultra-Scalable Transactional Management by Ricardo Jiménez-P...
End of the Myth: Ultra-Scalable Transactional Management by Ricardo Jiménez-P...
 
Attacking Machine Learning used in AntiVirus with Reinforcement by Rubén Mart...
Attacking Machine Learning used in AntiVirus with Reinforcement by Rubén Mart...Attacking Machine Learning used in AntiVirus with Reinforcement by Rubén Mart...
Attacking Machine Learning used in AntiVirus with Reinforcement by Rubén Mart...
 
More people, less banking: Blockchain by Salvador Casquero at Big Data Spain ...
More people, less banking: Blockchain by Salvador Casquero at Big Data Spain ...More people, less banking: Blockchain by Salvador Casquero at Big Data Spain ...
More people, less banking: Blockchain by Salvador Casquero at Big Data Spain ...
 
Make the elephant fly, once again by Sourygna Luangsay at Big Data Spain 2017
Make the elephant fly, once again by Sourygna Luangsay at Big Data Spain 2017Make the elephant fly, once again by Sourygna Luangsay at Big Data Spain 2017
Make the elephant fly, once again by Sourygna Luangsay at Big Data Spain 2017
 
Feature selection for Big Data: advances and challenges by Verónica Bolón-Can...
Feature selection for Big Data: advances and challenges by Verónica Bolón-Can...Feature selection for Big Data: advances and challenges by Verónica Bolón-Can...
Feature selection for Big Data: advances and challenges by Verónica Bolón-Can...
 
Deep reinforcement learning : Starcraft learning environment by Gema Parreño ...
Deep reinforcement learning : Starcraft learning environment by Gema Parreño ...Deep reinforcement learning : Starcraft learning environment by Gema Parreño ...
Deep reinforcement learning : Starcraft learning environment by Gema Parreño ...
 
End-to-End “Exactly Once” with Heron & Pulsar by Ivan Kelly at Big Data Spain...
End-to-End “Exactly Once” with Heron & Pulsar by Ivan Kelly at Big Data Spain...End-to-End “Exactly Once” with Heron & Pulsar by Ivan Kelly at Big Data Spain...
End-to-End “Exactly Once” with Heron & Pulsar by Ivan Kelly at Big Data Spain...
 
A Deep Learning use case for water end use detection by Roberto Díaz and José...
A Deep Learning use case for water end use detection by Roberto Díaz and José...A Deep Learning use case for water end use detection by Roberto Díaz and José...
A Deep Learning use case for water end use detection by Roberto Díaz and José...
 
Tuning Java Driver for Apache Cassandra by Nenad Bozic at Big Data Spain 2017
Tuning Java Driver for Apache Cassandra by Nenad Bozic at Big Data Spain 2017Tuning Java Driver for Apache Cassandra by Nenad Bozic at Big Data Spain 2017
Tuning Java Driver for Apache Cassandra by Nenad Bozic at Big Data Spain 2017
 
Spark Streaming + Kafka 0.10: an integration story by Joan Viladrosa Riera at...
Spark Streaming + Kafka 0.10: an integration story by Joan Viladrosa Riera at...Spark Streaming + Kafka 0.10: an integration story by Joan Viladrosa Riera at...
Spark Streaming + Kafka 0.10: an integration story by Joan Viladrosa Riera at...
 
Keeping your Enterprise’s Big Data Secure by Owen O’Malley at Big Data Spain ...
Keeping your Enterprise’s Big Data Secure by Owen O’Malley at Big Data Spain ...Keeping your Enterprise’s Big Data Secure by Owen O’Malley at Big Data Spain ...
Keeping your Enterprise’s Big Data Secure by Owen O’Malley at Big Data Spain ...
 
Bayesian inference and big data: are we there yet? by Jose Luis Hidalgo at Bi...
Bayesian inference and big data: are we there yet? by Jose Luis Hidalgo at Bi...Bayesian inference and big data: are we there yet? by Jose Luis Hidalgo at Bi...
Bayesian inference and big data: are we there yet? by Jose Luis Hidalgo at Bi...
 

Último

"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...Zilliz
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDropbox
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherRemote DBA Services
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MIND CTI
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...apidays
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...DianaGray10
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businesspanagenda
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsNanddeep Nachan
 
Cyberprint. Dark Pink Apt Group [EN].pdf
Cyberprint. Dark Pink Apt Group [EN].pdfCyberprint. Dark Pink Apt Group [EN].pdf
Cyberprint. Dark Pink Apt Group [EN].pdfOverkill Security
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobeapidays
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyKhushali Kathiriya
 
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Zilliz
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesrafiqahmad00786416
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native ApplicationsWSO2
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century educationjfdjdjcjdnsjd
 

Último (20)

"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectors
 
Cyberprint. Dark Pink Apt Group [EN].pdf
Cyberprint. Dark Pink Apt Group [EN].pdfCyberprint. Dark Pink Apt Group [EN].pdf
Cyberprint. Dark Pink Apt Group [EN].pdf
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 

Full Stack Analytics on Amazon Web Services by Ian Robinson at Big Data Spain 2017

  • 1.
  • 2. © 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Specialist Solutions Architect, Data and Analytics, EMEA November 17th, 2017 Full Stack Analytics on AWS Ian Robinson
  • 3. Forces and Trends Prompting the Move to Cloud Cost Optimization Licenses Hardware Data center and operations Dark Data Prematurely discarding data Agility Experimentation (data & tools) Democratised Access to Data Time-to-first-results Terminate failed experiments early From BI to Data Science In-house data science From back office to product
  • 4. Storage is the Gravity for Cloud Applications Store all your data, for ever, at every stage of its lifecycle Apply it using the right tool for the job
  • 6. Object Storage is Foundational
  • 7. Standard Active data Archive dataInfrequently accessed data Standard - Infrequent Access Amazon Glacier Create Delete Events and Lifecycle Management
  • 8. S3 as the Data Lake Fabric • Unlimited number of objects and volume • 99.99% availability • 99.999999999% durability • Versioning • Tiered storage via lifecycle policies • SSL, client/server-side encryption at rest • Low cost (just over $2700/month for 100TB) • Natively supported by big data frameworks (Spark, Hive, Presto, etc) • Decouples storage and compute • Run transient compute clusters (with Amazon EC2 Spot Instances) • Multiple, heterogeneous clusters can use same data
  • 10. Stream Events to S3 Using Kinesis Firehose
  • 11. Write Database Changes to S3 with DMS <schema_name>/<table_name>/LOAD001.csv <schema_name>/<table_name>/LOAD002.csv <schema_name>/<table_name>/<time-stamp>.csv Full Load Change Data Capture
  • 12. Scalable (secure, versioned, durable) storage + Immutable data at every stage of its lifecycle + Versioned schema and metadata = Data discovery, lineage Storage + Catalog
  • 13. AWS Glue • Data Catalog Discover and store metadata • Job Authoring Auto- generated ETL code • Job Execution Serverless scheduling and execution
  • 14. Hive metastore-compatible, highly- available metadata repository: • Classification for identifying and parsing files • Versioning of table metadata as schemas evolve • Table definitions – usable by Redshift, Athena, Glue, EMR Populate using Hive DDL, bulk import, or automatically through crawlers. Glue Data Catalog
  • 15. semi-structured per-file schema semi-structured unified schema identify file type and parse files enumerate S3 objects file 1 file 2 file N … int array intchar struct char int array struct char bool int int arrayint char char int custom classifiers app log parser metrics parser … system classifiers JSON parser CSV parser Apache log parser … bool Crawlers: Automatic Schema Inference
  • 16. AWS Lambda AWS Lambda Metadata Index (Amazon DynamoDB) Search Index (Amazon Elasticsearch) ObjectCreated ObjectDeleted PutItem Update Stream Update Index Extract Search Fields Indexing and Searching Using Metadata Amazon S3
  • 18. Data Access & Authorisation Give your users easy and secure access Storage & Catalog Secure, cost-effective storage in Amazon S3. Robust metadata in AWS Catalog Protect and Secure Use entitlements to ensure data is secure and users’ identities are verified
  • 19. Identity and Access Management • Manage users, groups, and roles • Identity federation with Open ID • Temporary credentials with Amazon Security Token Service (Amazon STS) • Stored policy templates • Powerful policy language • Amazon S3 bucket policies
  • 21. Third Party Ecosystem Security Tools Amazon S3 AWS CloudTrail http://amzn.to/2tSimHj Amazon Athena Access Logging API Logging Access Log Analytics IAM Amazon EMR http://amzn.to/2si6RqS Storage Level Support for Access Logging and Audit
  • 22. Encryption Options AWS Server-Side encryption • AWS managed key infrastructure AWS Key Management Service • Automated key rotation & auditing • Integration with other AWS services AWS CloudHSM • Dedicated Tenancy SafeNet Luna SA HSM Device • Common Criteria EAL4+, NIST FIPS 140-2
  • 24. • Python code generated by AWS Glue • Connect a notebook or IDE to AWS Glue • Existing code brought into AWS Glue Managed ETL with AWS GLue
  • 25. • Schedule-based • Event-based • On demand Job Execution with AWS Glue
  • 26. Amazon Kinesis Analytics • Interact with streaming data in real time using SQL • Build fully managed and elastic stream processing applications that process data for real-time visualizations and alarms
  • 27. SELECT STREAM author, count(author) OVER ONE_MINUTE FROM Tweets WINDOW ONE_MINUTE AS (PARTITION BY author RANGE INTERVAL '1' MINUTE PRECEDING) WHERE text LIKE ‘%#BigDataSpain%'; Amazon Kinesis Analytics – Simple SQL Interface
  • 28. Amazon Athena – Analyze Data in S3 • Interactive queries • ANSI SQL • No infrastructure or administration • Zero spin up time • Query data in its raw format • AVRO, Text, CSV, JSON, weblogs, AWS service logs • Convert to an optimized form like ORC or Parquet for the best performance and lowest cost • No loading of data, no ETL required • Stream data from directly from Amazon S3, take advantage of Amazon S3 durability and availability
  • 29. Simple query editor with syntax highlighting and autocomplete Data Catalog Query History, Saved Queries, and Catalog Management
  • 30. QuickSight allows you to connect to data from a wide variety of AWS, third-party, and on-premises sources including Amazon Athena Amazon RDS Amazon S3 Amazon Redshift Amazon Athena Using Amazon Athena with Amazon QuickSight
  • 31.
  • 33. Add Machine Learning Capabilities Amazon Machine Learning Service Batch and online predictions Train using data in S3, RDS and Redshift Amazon EMR Comprehensive machine learning libraries (eg Spark MLlib, Anaconda) Provision analytics clusters in minutes, autoscale with data volume or query demand
  • 34. Amazon AI Services Amazon Polly – Lifelike Text-to-Speech 47 voices, 24 languages Low-latency, real time Amazon Rekognition – Image Analysis Object and scene detection Facial analysis Amazon Lex – Conversational Engine Speech and text recognition Enterprise connectors
  • 35. Demographic Data Facial Landmarks Sentiment Expressed Image Quality Facial Analysis with Rekognition Brightness: 25.84 Sharpness: 160 General Attributes
  • 36. Up to ~40k CUDA cores Pre-configured CUDA drivers Jupyter notebook with Python2, Python3, Anaconda CloudFormation Template AWS Marketplace – one-click deploy AWS Deep Learning AMI
  • 37. Kinesis Firehose Athena Query Service Glue Machine Learning Predictive analytics Data Access & Authorisation Give your users easy and secure access Data Ingestion Get your data into S3 quickly and securely Processing & Analytics Use of predictive and prescriptive analytics to gain better understanding Protect and Secure Use entitlements to ensure data is secure and users’ identities are verified Amazon AI Storage & Catalog Secure, cost-effective storage in Amazon S3. Robust metadata in AWS Catalog
  • 38. Thank You Full Stack Analytics on AWS