Welcome to Today’s
DBTA Roundtable Discussion
Moderator
Stephen Faig
Manager
Unisphere Research and DBTA
Real-Time Analytics with Hadoop
Speakers
Dale Kim
Director of Industry Solutions
MapR
Paige Roberts
Hadoop & Analytics Evangelist
Actian
© 2015 MapR Technologies
Examples of Real-Time
Images licensed under https://creativecommons.org/licenses/by/2.0/
Time image courtesy of Daniel Oldfield: https://www.flickr.com/photos/democlez/4424898002/
Air bag image courtesy of Mike Babcock: https://www.flickr.com/photos/mikebabcock/3098836311/
• Tied to clock time
• Guaranteed response time
For real-time analytics, let’s use: “no built-in delays”
So what is real-time analytics with Hadoop?
Requirements for Real-Time Analytics with Hadoop
• Real-Time Data
• Real-Time Applications
• Real-Time Queries
Real-Time Data
Definition: Provide immediate access to live Hadoop data for analysis
Requirements:
• Analysis uses live, real-time data, not batch-copied data
• Business can identify insights immediately (often through an automated process)
• Critical for use cases such as ad targeting, personalization, and network security analysis
• System avoids the complexity of a separate stream-processing or messaging system for recent data
Real-Time Data in Hadoop
For real-time:
• Log files should be written directly into the cluster or synced across remote data centers
• Operational applications should run in the same cluster, or in a separate cluster with real-time table replication
• Immediate action should be taken
– E.g., the difference between fraud detection and fraud prevention
– The difference between an on-demand ad bid and a missed opportunity
Existing challenges:
• Log files must be batch uploaded periodically (e.g., every 30 minutes)
– Due to HDFS limitations (no random read/write, file-close semantics, no direct NFS access)
• Operational applications run on a separate cluster/stack
– Data must be batch uploaded
• With batch uploads, the window to respond is missed
– Fraud, cyber attacks, matches, anomalies, etc.
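To make the cost of periodic batch uploads concrete, here is a minimal sketch of how long an event waits before it becomes visible in the cluster. The model is an assumption for illustration only: batch windows close at every multiple of the interval, and each closed batch then takes a fixed upload time. The function name is hypothetical.

```python
import math


def batch_visibility_delay(event_time_s: float, batch_interval_s: float,
                           upload_s: float = 0.0) -> float:
    """Seconds from an event's arrival until it is visible in the cluster.

    Hypothetical model: batch windows close at every multiple of
    batch_interval_s, and each closed batch takes upload_s seconds to upload.
    """
    if batch_interval_s <= 0:
        raise ValueError("batch_interval_s must be positive")
    # The event first waits for its batch window to close...
    window_close = math.ceil(event_time_s / batch_interval_s) * batch_interval_s
    # ...and then for the batch upload itself.
    return (window_close - event_time_s) + upload_s
```

With a 30-minute (1,800 s) interval, an event arriving one second into a window waits 1,799 s before its upload even starts, while an event written directly into the cluster has no such built-in delay.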
Real-Time Applications
Definition: Run operational applications in the cluster
Requirements:
• Address use cases beyond batch and interactive analysis
– E.g., end-to-end real-time marketing and security applications directly on Hadoop
• Eliminate separate Hadoop and NoSQL clusters/technology stacks for apps
Real-Time Applications in Hadoop
For real-time:
• Minimize the impact of disruptive "housekeeping" tasks to enable consistent, real-time operations
– E.g., compactions, Java garbage collection, "region splits"
• Process live, operational data in Hadoop to avoid the delays of batch copies
Existing challenges:
• Other in-Hadoop databases suffer disruptions that inhibit real-time operation
– E.g., compactions can significantly slow down the system
– Garbage collection leads to unpredictable system delays
– Region splits are required to spread load, but they hurt responsiveness and performance
• Other in-Hadoop databases require separate clusters
Real-Time Querying
Definition: Query any data as soon as it lands in the cluster (self-service)
Requirements:
• Analysts can explore data immediately, without waiting days or weeks for IT to prepare it
• IT is not burdened with repeated schema management and ETL requests
Real-Time Querying in Hadoop
For real-time:
• Minimize the time needed to get started on data exploration
• Leverage query engines that can query data in place
– Eliminate IT dependencies for schema preparation
Existing challenges:
• New data that lands in the cluster necessarily requires IT-built schemas
• Data exploration and analysis are contingent on the IT backlog
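"Query data in place" relies on schema-on-read: the structure is discovered when the data is read, not declared beforehand. The sketch below illustrates the idea on raw JSON lines whose fields evolve from record to record. It is a toy illustration under that assumption, not how any particular query engine is implemented, and the function names are hypothetical.

```python
import json
from typing import Dict, Iterable, List, Sequence, Set, Tuple


def discover_schema(lines: Iterable[str]) -> Dict[str, Set[str]]:
    """Infer field names and observed value types from raw JSON lines at read time."""
    schema: Dict[str, Set[str]] = {}
    for line in lines:
        record = json.loads(line)
        for key, value in record.items():
            schema.setdefault(key, set()).add(type(value).__name__)
    return schema


def select(lines: Iterable[str], fields: Sequence[str]) -> List[Tuple]:
    """Project fields from raw JSON lines; a field missing from a record yields None."""
    return [tuple(json.loads(line).get(f) for f in fields) for line in lines]


raw = [
    '{"user": "a", "ms": 120}',
    '{"user": "b", "ms": 95, "device": "ios"}',  # a new field appears; nothing breaks
]
```

A new field such as `device` shows up in the discovered schema and becomes queryable immediately, without a schema change request to IT.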
So How Are These Implemented?
Case Study: Global Financial Services Firm
Analytics + Operational Applications on one platform
[Diagram: the MapR Distribution for Hadoop, split into real-time operational applications and analytics; labels include online transactions, fraud detection, personalized offers, clickstream analysis, fraud model, recommendations table, fraud investigation tool, fraud investigator, and interactive marketer]
REAL-TIME DATA
Faster/Secure NFS Access
MapR data access options:
1. HDFS API – apps written for Hadoop
2. Standard read/write NFS (POSIX) – existing file system-based apps, no code changes
3. MapR POSIX Client – advanced read/write NFS requirements, includes:
   1. Compression
   2. Parallelism
   3. Authentication
   4. Encryption
[Diagram: client nodes reach the MapR cluster through the HDFS API (hadoop-core-*.jar) for Hadoop applications (e.g. "hadoop fs -put"), through the NFS client included in the OS and redundant NFS gateways (for high availability) for file-based apps/utils (e.g. cp, emacs), and through the MapR POSIX Client for native applications]
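Option 2 above means an existing program can write into the cluster with nothing but ordinary file I/O. The sketch below appends JSON events to a daily log file. The mount point `/mapr/my.cluster.com/logs` and the `CLUSTER_LOG_DIR` variable are assumptions for illustration; nothing in the code is MapR-specific, which is the point.

```python
import json
import os
import time
from pathlib import Path

# Hypothetical NFS mount point for the cluster; set CLUSTER_LOG_DIR to override.
LOG_DIR = Path(os.environ.get("CLUSTER_LOG_DIR", "/mapr/my.cluster.com/logs"))


def append_event(event: dict, log_dir: Path = LOG_DIR) -> Path:
    """Append one JSON event to a daily log file using ordinary POSIX file I/O.

    No Hadoop client library is used: assuming log_dir sits on an
    NFS-mounted cluster volume, the write lands directly in the cluster.
    """
    log_dir.mkdir(parents=True, exist_ok=True)
    path = log_dir / time.strftime("events-%Y%m%d.jsonl")
    with path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(event) + "\n")
        f.flush()
        os.fsync(f.fileno())  # ask the OS to push the data to the file system now
    return path
```

Because this is plain append-mode file access, the same code runs unchanged against a local directory, which makes it easy to test before pointing it at the mounted cluster.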
MapR-DB and "Other NoSQL" Throughput on YCSB
Throughput performance in operations/second/node (higher is better)
YCSB benchmark (threads*)          MapR-DB 4.x    Other NoSQL     MapR-DB increase
Load (10, 100)                     27,097         14,753          1.8x
Read (75, 150)                     4,402          1,902           2.3x
50% read / 50% update (75, 100)    8,684          2,012           4.3x
95% read / 5% update (75, 100)     3,776          1,127           3.4x
Scan (32, 32)                      478            Client hangs    N/A
*Numbers in parentheses represent threads per client used in test runs for MapR-DB and other NoSQL, respectively.
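The "increase" column is simply MapR-DB throughput divided by the other NoSQL store's throughput. The short sketch below recomputes it from the YCSB figures on this slide as a sanity check. The Scan row is left out because the other client hung, so there is no ratio to compute.

```python
# Throughput in operations/second/node, copied from the YCSB results on this slide.
# The Scan row is omitted because the other NoSQL client hung.
YCSB_RESULTS = {
    "Load": (27_097, 14_753),
    "Read": (4_402, 1_902),
    "50% read / 50% update": (8_684, 2_012),
    "95% read / 5% update": (3_776, 1_127),
}


def speedups(results: dict) -> dict:
    """Ratio of MapR-DB to other-NoSQL throughput per workload, rounded to one decimal."""
    return {name: round(mapr_db / other, 1) for name, (mapr_db, other) in results.items()}


for workload, ratio in speedups(YCSB_RESULTS).items():
    print(f"{workload}: {ratio}x")
```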
REAL-TIME APPLICATIONS
YCSB Mixed (50% Read / 50% Put): Read Latency Comparison
[Chart: read latency of MapR-DB vs. HBase on another Hadoop distribution; lower is better]
MapR-DB Table Replication
Multi-master (aka active/active) replication
[Diagram: end users performing active reads/writes against multiple replicated clusters]
• Faster data access – minimize network latency on global data with local clusters
• Reduced risk of data loss – real-time, bi-directional replication for synchronized data across active clusters
• Application failover – upon any cluster failure, applications continue via redirection to another cluster
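One way an application might implement the "redirection to another cluster" described above is a client-side preference list: try the nearest cluster first and fall back to the next on a connection failure. This is a hypothetical sketch, not the MapR API. The cluster names borrow the locations from the deck, and `op` stands in for whatever read or write the application performs.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


class AllClustersDown(Exception):
    """Raised when no cluster in the preference list could serve the request."""


def with_failover(clusters: Sequence[str], op: Callable[[str], T]) -> T:
    """Run op against each cluster in preference order, failing over on ConnectionError."""
    errors: List[Tuple[str, Exception]] = []
    for cluster in clusters:
        try:
            return op(cluster)
        except ConnectionError as exc:
            # Remember why this cluster failed, then try the next one.
            errors.append((cluster, exc))
    raise AllClustersDown(errors)
```

Failover only makes sense here because replication is multi-master: any surviving cluster can accept both reads and writes, so the application does not need a special recovery mode.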
MapR-DB Real-Time Analytics
Active clusters close to the end users, with real-time analytics at a central cluster
[Diagram: end users perform active reads/writes against MapR-DB clusters in London, New York, and Singapore, which connect to a central MapR-DB/Hadoop cluster running Hadoop analytics]
Operational and analytical workloads combined in a single deployment:
• Operationally efficient, consolidated MapR cluster
• Database operations + Hadoop analytics
REAL-TIME QUERIES
One SQL Interface for All Data Formats
• Unstructured data will account for more than 80% of the data collected by organizations
• ANSI SQL queries on rapidly evolving schemas
[Chart: total data stored, 1990–2020, split into structured and unstructured data]
[Diagram: SQL options for analytics, ranging from existing SQL engines (IT-driven BI) to Apache Drill (self-service BI and self-service data exploration)]
Agility by Reducing Distance to Data
Short analytic life cycles with no upfront schema creation and management
FROM: Traditional Approach
Hadoop Data → Data Preparation (Schema Design → Transformation → Data Movement) → Users
New Business Questions
Total Time to Value: Weeks to Months
TO: New Approach
Hadoop Data → Users
New Business Questions
Total Time to Value: Minutes
Drill enables the "As It Happens" business with instant SQL analytics on complex data
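In the "new approach" an analyst points SQL straight at the raw file. The sketch below packages such a query for Apache Drill's REST query endpoint (`/query.json`, which takes a JSON body with `queryType` and `query`). The Drillbit address and the file path in the query are assumptions. The query reads a JSON file in place through the `dfs` storage plugin without declaring any table first.

```python
import json
import urllib.request

# Assumed Drillbit address; Drill's web server listens on port 8047 by default.
DRILL_URL = "http://localhost:8047/query.json"


def build_drill_request(sql: str, url: str = DRILL_URL) -> urllib.request.Request:
    """Wrap a SQL statement in the JSON body expected by Drill's REST query endpoint."""
    body = json.dumps({"queryType": "SQL", "query": sql}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_drill_query(sql: str, url: str = DRILL_URL, timeout: float = 30.0) -> dict:
    """Send the query to a running Drillbit and return the decoded JSON response."""
    with urllib.request.urlopen(build_drill_request(sql, url), timeout=timeout) as resp:
        return json.load(resp)


# Query a raw JSON file where it sits; no table or schema is declared first.
# The file path is hypothetical.
CLICKS_BY_USER = (
    "SELECT t.user_id, COUNT(*) AS clicks "
    "FROM dfs.`/data/clickstream/2015-04-01.json` t "
    "GROUP BY t.user_id"
)
```

Against a running Drillbit, `run_drill_query(CLICKS_BY_USER)` returns the result set as JSON. Request construction is kept separate from sending so that it can be checked without a cluster.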
Summary
Batch Bottlenecks
1. Data streaming – real-time, but…
2. Further analysis is limited by batch loads into HDFS
3. Most databases must run in a separate cluster, leading to batch copies
4. Append-only HDFS leads to heavy I/O for database defragmentation ("compactions")
5. Data exploration requires IT intervention
Removing the Batch Limitations
1. Data streaming – real-time as before, and now…
2. Further analysis is possible with real-time loading
3. MapR-DB runs in Hadoop
4. With a full read/write file system, defragmentation delays are eliminated
5. Data exploration is performed in a self-service manner
[Diagram: the five improvements grouped under Real-Time Data, Real-Time Applications, and Real-Time Querying]
And Don’t Forget…
• Real-time analytics doesn’t help you if the other key pieces aren’t in place
• Include security
– Interoperability with any authentication mechanism
– Fine-grained access controls
– Auditing capabilities beyond simple log files
• Also include enterprise-grade reliability
– An automated high availability configuration
– Incremental mirroring/replication for disaster recovery
– Consistent snapshots
• Talk to us about what else you should consider
Q&A
@mapr maprtech
dalekim@mapr.com
Engage with us!
MapR
maprtech
mapr-technologies
Confidential © 2014 Actian Corporation
Real-Time Analytics
Paige Roberts
April 2015
Hadoop & Analytics Evangelist
Actian Hadoop & Analytics Center of Excellence
Agenda
About Actian
Advantages of Data-Driven Business
What Do I Mean By Real-Time?
Real-Time Challenge: ATM Fraud
How Actian Does It
A Few Words About Actian
• $140M revenues + profitable
• 10,000+ customers
• Global presence: 8 worldwide offices, 24x7 multinational support model
“Fast becoming a big data powerhouse to challenge the market.” – Forrester
“Actian is now very powerfully positioned in the big data and analytics markets.” – Bloor
Companies Using Big Data Strategically Outperform… At the Expense of Those That Don’t
[Chart: revenue and EBITDA growth of big data companies vs. other companies among grocers, online retailers, big box retailers, casinos, credit cards, and insurance]
• Predictive
• Real-time
• All Data
• New Insights
• Accuracy
Note: Percentage, 10-year CAGR. Source: McKinsey report on big data.
What Does Real-Time Mean to Us?
Human comfortable interactivity
Streaming data processing
Sub-second response
Real-Time Analytics – Many Applications
• Solar power company: new customer targeting, using smart meter data
• Sportswear company: brand loyalty, using wearable data
• Bank: ATM fraud, using router data
Large US Bank Needs Help
• Multi-billion dollar American bank / financial holding company
• Provides deposit, credit, trust, and investment services to a broad range of clients
• Operates nearly 1,500 retail branches and more than 2,000 ATMs
[Chart: number of times faster than Impala]
Fraud Kept This Bank’s Execs Up at Night
What is the Worst Gotcha About ATM Fraud?
In the majority of cases, banks are required to reimburse customer losses.
https://www.tycois.com/insights-and-opinions/articles/atm-skimming-costs-banks
In spite of that, 67% of U.S. adults would switch to another institution after experiencing ATM fraud or a data breach.
http://www.harrisinteractive.com/NewsRoom/HarrisPolls/tabid/447/ctl/ReadCustom%20Default/mid/1508/ArticleId/1515/Default.aspx
This is What You Call a Delayed Reaction
Time to Call in the Elephant
Actian Vortex: The Elephant’s Best Friend
[Diagram: layered Actian Vortex platform]
• Source data: databases/marts, warehouses, cloud/SaaS applications, structured & unstructured data, enterprise applications
• Data platform (Actian Vortex):
– Elastic Data Preparation: DataFlow
– SQL Analytics: Vector in Hadoop
– Graph Analytics: SPARQLverse
– Machine Learning & Predictive Analytics: DataFlow
– Library of Analytic Blueprints
• Analytic apps: Financial Services, Health Care, Other Verticals
• Application development: Application Development and Tools (SQL; Java, C/C++, Python)
• Infrastructure: Deployment Options (@Customer)
• Actian Management Console
Powered by KNIME
Actian Vortex:
High Performance Analytics at Scale in Hadoop
Powered by KNIME
Stopping Fraud in Real Time
https://www.youtube.com/watch?v=u1QoHCpOUOU
Actian Vector in Hadoop: Built for Speed
1. Vector Processing – Single Instruction, Multiple Data (SIMD)
2. 2nd Gen Column Store – limit I/O; efficient real-time updates
3. Smarter Compression – maximize throughput; vectorized decompression
4. Exploiting Chip Cache – process data on chip, not in RAM
5. Multi-core Parallelism – maximize system resource utilization…
6. Storage Indexes – quickly identify candidate data blocks; minimize I/O
[Chart: time/cycles to process vs. data processed for disk, RAM, and on-chip cache; labels include 2-20, 150-250, and millions of cycles, and 40-400MB, 2-3GB, and 10GB]
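The "Storage Indexes" point (quickly identify candidate data blocks, minimize I/O) can be illustrated with a generic min/max block summary, often called a zone map. This is a minimal sketch of that general technique under the assumption of an integer column split into fixed-size blocks. It is not Actian's implementation, and all names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class BlockSummary:
    start: int  # index of the first row in the block
    end: int    # one past the last row in the block
    lo: int     # minimum value stored in the block
    hi: int     # maximum value stored in the block


def build_index(column: Sequence[int], block_size: int) -> List[BlockSummary]:
    """Record the min and max of each fixed-size block of a column."""
    index = []
    for start in range(0, len(column), block_size):
        block = column[start:start + block_size]
        index.append(BlockSummary(start, start + len(block), min(block), max(block)))
    return index


def candidate_blocks(index: List[BlockSummary], lo: int, hi: int) -> List[BlockSummary]:
    """Blocks whose [min, max] range overlaps the predicate lo <= x <= hi."""
    return [b for b in index if b.hi >= lo and b.lo <= hi]


def range_scan(column: Sequence[int], index: List[BlockSummary], lo: int, hi: int) -> List[int]:
    """Row ids matching lo <= x <= hi, reading only the candidate blocks."""
    rows: List[int] = []
    for b in candidate_blocks(index, lo, hi):
        rows.extend(i for i in range(b.start, b.end) if lo <= column[i] <= hi)
    return rows


column = [1, 2, 3, 4, 10, 11, 12, 13, 5, 6, 7, 8]
index = build_index(column, block_size=4)
```

A predicate such as `10 <= x <= 12` touches only the one block whose range overlaps it. The pruning pays off most when data is roughly sorted or clustered, so that block ranges stay narrow.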
How Fast?
How Fast?
What to Look For in SQL in Hadoop
Open – No vendor lock-in
• Collaborative architecture
• Open access to Actian data storage formats
• Support for other formats
• Hadoop distribution and ecosystem application support
Fast – Results you need when you need them
• Fastest data prep and ingestion
• Fastest analytic engines
• Unbridled processing power on data nodes in a Hadoop cluster
Enterprise-Grade – Proven technology advantages
• Full SQL support
• Extreme scalability
• Full security
• High Availability & Disaster Recovery
Free Actian Vortex Express Edition
www.actian.com
facebook.com/actiancorp
@actiancorp
Thank You
Download Actian Vortex Express
Free Forever
http://bigdata.actian.com/sql-in-hadoop
Question and Answer Session
(please submit questions)
Q & A
Dale Kim
Director of Industry Solutions
MapR
Paige Roberts
Hadoop & Analytics Evangelist
Actian
Please use the same URL you used to view today’s live event to view the archived event. We will also send you a follow-up email with that URL once the archive is posted!
Thank you for participating in
today’s roundtable web event
Just by attending this event the winner of the
$100 AmEx Gift Card is…….

Más contenido relacionado

La actualidad más candente

Jim Dowling – Interactive Flink analytics with HopsWorks and Zeppelin
Jim Dowling – Interactive Flink analytics with HopsWorks and ZeppelinJim Dowling – Interactive Flink analytics with HopsWorks and Zeppelin
Jim Dowling – Interactive Flink analytics with HopsWorks and Zeppelin
Flink Forward
 
Saving the elephant—now, not later
Saving the elephant—now, not laterSaving the elephant—now, not later
Saving the elephant—now, not later
DataWorks Summit
 

La actualidad más candente (20)

Analysis of Major Trends in Big Data Analytics
Analysis of Major Trends in Big Data AnalyticsAnalysis of Major Trends in Big Data Analytics
Analysis of Major Trends in Big Data Analytics
 
The Next Generation of Data Processing and Open Source
The Next Generation of Data Processing and Open SourceThe Next Generation of Data Processing and Open Source
The Next Generation of Data Processing and Open Source
 
Jim Dowling – Interactive Flink analytics with HopsWorks and Zeppelin
Jim Dowling – Interactive Flink analytics with HopsWorks and ZeppelinJim Dowling – Interactive Flink analytics with HopsWorks and Zeppelin
Jim Dowling – Interactive Flink analytics with HopsWorks and Zeppelin
 
Real Time Machine Learning Visualization with Spark
Real Time Machine Learning Visualization with SparkReal Time Machine Learning Visualization with Spark
Real Time Machine Learning Visualization with Spark
 
Apache Eagle: Secure Hadoop in Real Time
Apache Eagle: Secure Hadoop in Real TimeApache Eagle: Secure Hadoop in Real Time
Apache Eagle: Secure Hadoop in Real Time
 
What's new in SQL on Hadoop and Beyond
What's new in SQL on Hadoop and BeyondWhat's new in SQL on Hadoop and Beyond
What's new in SQL on Hadoop and Beyond
 
Apache Spark 2.4 Bridges the Gap Between Big Data and Deep Learning
Apache Spark 2.4 Bridges the Gap Between Big Data and Deep LearningApache Spark 2.4 Bridges the Gap Between Big Data and Deep Learning
Apache Spark 2.4 Bridges the Gap Between Big Data and Deep Learning
 
Sharing metadata across the data lake and streams
Sharing metadata across the data lake and streamsSharing metadata across the data lake and streams
Sharing metadata across the data lake and streams
 
Testistanbul 2016 - Keynote: "Performance Testing of Big Data" by Roland Leusden
Testistanbul 2016 - Keynote: "Performance Testing of Big Data" by Roland LeusdenTestistanbul 2016 - Keynote: "Performance Testing of Big Data" by Roland Leusden
Testistanbul 2016 - Keynote: "Performance Testing of Big Data" by Roland Leusden
 
Apache Flink: Real-World Use Cases for Streaming Analytics
Apache Flink: Real-World Use Cases for Streaming AnalyticsApache Flink: Real-World Use Cases for Streaming Analytics
Apache Flink: Real-World Use Cases for Streaming Analytics
 
Big Data Ready Enterprise
Big Data Ready Enterprise Big Data Ready Enterprise
Big Data Ready Enterprise
 
Testistanbul 2016 - Keynote: "Enterprise Challenges of Test Data" by Rex Black
Testistanbul 2016 - Keynote: "Enterprise Challenges of Test Data" by Rex BlackTestistanbul 2016 - Keynote: "Enterprise Challenges of Test Data" by Rex Black
Testistanbul 2016 - Keynote: "Enterprise Challenges of Test Data" by Rex Black
 
Saving the elephant—now, not later
Saving the elephant—now, not laterSaving the elephant—now, not later
Saving the elephant—now, not later
 
Performance Testing of Big Data Applications - Impetus Webcast
Performance Testing of Big Data Applications - Impetus WebcastPerformance Testing of Big Data Applications - Impetus Webcast
Performance Testing of Big Data Applications - Impetus Webcast
 
Data Gloveboxes: A Philosophy of Data Science Data Security
Data Gloveboxes: A Philosophy of Data Science Data SecurityData Gloveboxes: A Philosophy of Data Science Data Security
Data Gloveboxes: A Philosophy of Data Science Data Security
 
In Flux Limiting for a multi-tenant logging service
In Flux Limiting for a multi-tenant logging serviceIn Flux Limiting for a multi-tenant logging service
In Flux Limiting for a multi-tenant logging service
 
Design Patterns For Real Time Streaming Data Analytics
Design Patterns For Real Time Streaming Data AnalyticsDesign Patterns For Real Time Streaming Data Analytics
Design Patterns For Real Time Streaming Data Analytics
 
Boost Performance with Scala – Learn From Those Who’ve Done It!
Boost Performance with Scala – Learn From Those Who’ve Done It! Boost Performance with Scala – Learn From Those Who’ve Done It!
Boost Performance with Scala – Learn From Those Who’ve Done It!
 
Scale-Out Resource Management at Microsoft using Apache YARN
Scale-Out Resource Management at Microsoft using Apache YARNScale-Out Resource Management at Microsoft using Apache YARN
Scale-Out Resource Management at Microsoft using Apache YARN
 
Enterprise Grade Streaming under 2ms on Hadoop
Enterprise Grade Streaming under 2ms on HadoopEnterprise Grade Streaming under 2ms on Hadoop
Enterprise Grade Streaming under 2ms on Hadoop
 

Destacado

Visualising the tabular model for power view upload
Visualising the tabular model for power view uploadVisualising the tabular model for power view upload
Visualising the tabular model for power view upload
Jen Stirrup
 

Destacado (20)

Data science with Windows Azure - A Brief Introduction
Data science with Windows Azure - A Brief IntroductionData science with Windows Azure - A Brief Introduction
Data science with Windows Azure - A Brief Introduction
 
Spark with Azure HDInsight - Tampa Bay Data Science - Adnan Masood, PhD
Spark with Azure HDInsight  - Tampa Bay Data Science - Adnan Masood, PhDSpark with Azure HDInsight  - Tampa Bay Data Science - Adnan Masood, PhD
Spark with Azure HDInsight - Tampa Bay Data Science - Adnan Masood, PhD
 
Business Intelligence Barista: What DataViz Tool to Use, and When?
Business Intelligence Barista: What DataViz Tool to Use, and When?Business Intelligence Barista: What DataViz Tool to Use, and When?
Business Intelligence Barista: What DataViz Tool to Use, and When?
 
Business Intelligence Barista: What DataViz Tool to Use, and When?
Business Intelligence Barista: What DataViz Tool to Use, and When?Business Intelligence Barista: What DataViz Tool to Use, and When?
Business Intelligence Barista: What DataViz Tool to Use, and When?
 
Restructuring Technical Debt - A Software and System Quality Approach
Restructuring Technical Debt - A Software and System Quality ApproachRestructuring Technical Debt - A Software and System Quality Approach
Restructuring Technical Debt - A Software and System Quality Approach
 
Cloud computing by Bhavesh
Cloud computing by BhaveshCloud computing by Bhavesh
Cloud computing by Bhavesh
 
Visualising the tabular model for power view upload
Visualising the tabular model for power view uploadVisualising the tabular model for power view upload
Visualising the tabular model for power view upload
 
Digital Pragmatism with Business Intelligence, Big Data and Data Visualisation
Digital Pragmatism with Business Intelligence, Big Data and Data VisualisationDigital Pragmatism with Business Intelligence, Big Data and Data Visualisation
Digital Pragmatism with Business Intelligence, Big Data and Data Visualisation
 
Cloud Computing Architecture Primer
Cloud Computing Architecture PrimerCloud Computing Architecture Primer
Cloud Computing Architecture Primer
 
System Quality Attributes for Software Architecture
System Quality Attributes for Software ArchitectureSystem Quality Attributes for Software Architecture
System Quality Attributes for Software Architecture
 
Windows Azure HDInsight Service
Windows Azure HDInsight ServiceWindows Azure HDInsight Service
Windows Azure HDInsight Service
 
How Universities Use Big Data to Transform Education
How Universities Use Big Data to Transform EducationHow Universities Use Big Data to Transform Education
How Universities Use Big Data to Transform Education
 
Intorducing Big Data and Microsoft Azure
Intorducing Big Data and Microsoft AzureIntorducing Big Data and Microsoft Azure
Intorducing Big Data and Microsoft Azure
 
Hive - 1455: Cloud Storage
Hive - 1455: Cloud StorageHive - 1455: Cloud Storage
Hive - 1455: Cloud Storage
 
How to Use Apache Zeppelin with HWX HDB
How to Use Apache Zeppelin with HWX HDBHow to Use Apache Zeppelin with HWX HDB
How to Use Apache Zeppelin with HWX HDB
 
Building the Data Lake with Azure Data Factory and Data Lake Analytics
Building the Data Lake with Azure Data Factory and Data Lake AnalyticsBuilding the Data Lake with Azure Data Factory and Data Lake Analytics
Building the Data Lake with Azure Data Factory and Data Lake Analytics
 
Dynamic Column Masking and Row-Level Filtering in HDP
Dynamic Column Masking and Row-Level Filtering in HDPDynamic Column Masking and Row-Level Filtering in HDP
Dynamic Column Masking and Row-Level Filtering in HDP
 
Microsoft cloud big data strategy
Microsoft cloud big data strategyMicrosoft cloud big data strategy
Microsoft cloud big data strategy
 
Big Data Analytics in the Cloud with Microsoft Azure
Big Data Analytics in the Cloud with Microsoft AzureBig Data Analytics in the Cloud with Microsoft Azure
Big Data Analytics in the Cloud with Microsoft Azure
 
Big Data in Azure
Big Data in AzureBig Data in Azure
Big Data in Azure
 

Similar a Realtime analytics with_hadoop

Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ...
Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ...Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ...
Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ...
Precisely
 
Pivotal deep dive_on_pivotal_hd_world_class_hdfs_platform
Pivotal deep dive_on_pivotal_hd_world_class_hdfs_platformPivotal deep dive_on_pivotal_hd_world_class_hdfs_platform
Pivotal deep dive_on_pivotal_hd_world_class_hdfs_platform
EMC
 

Similar a Realtime analytics with_hadoop (20)

Hadoop and NoSQL joining forces by Dale Kim of MapR
Hadoop and NoSQL joining forces by Dale Kim of MapRHadoop and NoSQL joining forces by Dale Kim of MapR
Hadoop and NoSQL joining forces by Dale Kim of MapR
 
Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ...
Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ...Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ...
Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ...
 
MapR-DB – The First In-Hadoop Document Database
MapR-DB – The First In-Hadoop Document DatabaseMapR-DB – The First In-Hadoop Document Database
MapR-DB – The First In-Hadoop Document Database
 
Hive Performance Dataworks Summit Melbourne February 2019
Hive Performance Dataworks Summit Melbourne February 2019Hive Performance Dataworks Summit Melbourne February 2019
Hive Performance Dataworks Summit Melbourne February 2019
 
Simplifying Real-Time Architectures for IoT with Apache Kudu
Simplifying Real-Time Architectures for IoT with Apache KuduSimplifying Real-Time Architectures for IoT with Apache Kudu
Simplifying Real-Time Architectures for IoT with Apache Kudu
 
Oct 2011 CHADNUG Presentation on Hadoop
Oct 2011 CHADNUG Presentation on HadoopOct 2011 CHADNUG Presentation on Hadoop
Oct 2011 CHADNUG Presentation on Hadoop
 
DAMA & Denodo Webinar: Modernizing Data Architecture Using Data Virtualization
DAMA & Denodo Webinar: Modernizing Data Architecture Using Data Virtualization DAMA & Denodo Webinar: Modernizing Data Architecture Using Data Virtualization
DAMA & Denodo Webinar: Modernizing Data Architecture Using Data Virtualization
 
Vmware Serengeti - Based on Infochimps Ironfan
Vmware Serengeti - Based on Infochimps IronfanVmware Serengeti - Based on Infochimps Ironfan
Vmware Serengeti - Based on Infochimps Ironfan
 
Pivotal deep dive_on_pivotal_hd_world_class_hdfs_platform
Pivotal deep dive_on_pivotal_hd_world_class_hdfs_platformPivotal deep dive_on_pivotal_hd_world_class_hdfs_platform
Pivotal deep dive_on_pivotal_hd_world_class_hdfs_platform
 
Whither the Hadoop Developer Experience, June Hadoop Meetup, Nitin Motgi
Whither the Hadoop Developer Experience, June Hadoop Meetup, Nitin MotgiWhither the Hadoop Developer Experience, June Hadoop Meetup, Nitin Motgi
Whither the Hadoop Developer Experience, June Hadoop Meetup, Nitin Motgi
 
Self-Service BI for big data applications using Apache Drill (Big Data Amster...
Self-Service BI for big data applications using Apache Drill (Big Data Amster...Self-Service BI for big data applications using Apache Drill (Big Data Amster...
Self-Service BI for big data applications using Apache Drill (Big Data Amster...
 
Self-Service BI for big data applications using Apache Drill (Big Data Amster...
Self-Service BI for big data applications using Apache Drill (Big Data Amster...Self-Service BI for big data applications using Apache Drill (Big Data Amster...
Self-Service BI for big data applications using Apache Drill (Big Data Amster...
 
Modernizing Global Shared Data Analytics Platform and our Alluxio Journey
Modernizing Global Shared Data Analytics Platform and our Alluxio JourneyModernizing Global Shared Data Analytics Platform and our Alluxio Journey
Modernizing Global Shared Data Analytics Platform and our Alluxio Journey
 
How the Development Bank of Singapore solves on-prem compute capacity challen...
How the Development Bank of Singapore solves on-prem compute capacity challen...How the Development Bank of Singapore solves on-prem compute capacity challen...
How the Development Bank of Singapore solves on-prem compute capacity challen...
 
Justin Sheppard & Ankur Gupta from Sears Holdings Corporation - Single point ...
Justin Sheppard & Ankur Gupta from Sears Holdings Corporation - Single point ...Justin Sheppard & Ankur Gupta from Sears Holdings Corporation - Single point ...
Justin Sheppard & Ankur Gupta from Sears Holdings Corporation - Single point ...
 
Hadoop and SQL: Delivery Analytics Across the Organization
Hadoop and SQL:  Delivery Analytics Across the OrganizationHadoop and SQL:  Delivery Analytics Across the Organization
Hadoop and SQL: Delivery Analytics Across the Organization
 
Hadoop project design and a usecase
Hadoop project design and  a usecaseHadoop project design and  a usecase
Hadoop project design and a usecase
 
Logical Data Lakes: From Single Purpose to Multipurpose Data Lakes (APAC)
Logical Data Lakes: From Single Purpose to Multipurpose Data Lakes (APAC)Logical Data Lakes: From Single Purpose to Multipurpose Data Lakes (APAC)
Logical Data Lakes: From Single Purpose to Multipurpose Data Lakes (APAC)
 
Artur Borycki - Beyond Lambda - how to get from logical to physical - code.ta...
Artur Borycki - Beyond Lambda - how to get from logical to physical - code.ta...Artur Borycki - Beyond Lambda - how to get from logical to physical - code.ta...
Artur Borycki - Beyond Lambda - how to get from logical to physical - code.ta...
 
Cloud-Native Data: What data questions to ask when building cloud-native apps
Cloud-Native Data: What data questions to ask when building cloud-native appsCloud-Native Data: What data questions to ask when building cloud-native apps
Cloud-Native Data: What data questions to ask when building cloud-native apps
 

Más de Edgar Alejandro Villegas

Más de Edgar Alejandro Villegas (20)

What's New in Predictive Analytics IBM SPSS - Apr 2016
What's New in Predictive Analytics IBM SPSS - Apr 2016What's New in Predictive Analytics IBM SPSS - Apr 2016
What's New in Predictive Analytics IBM SPSS - Apr 2016
 
Oracle big data discovery 994294
Oracle big data discovery   994294Oracle big data discovery   994294
Oracle big data discovery 994294
 
Actian Ingres10.2 Datasheet
Actian Ingres10.2 DatasheetActian Ingres10.2 Datasheet
Real-Time Analytics with Hadoop

  • 1. Welcome to Today’s DBTA Roundtable Discussion
  • 4. Speakers Dale Kim Director of Industry Solutions MapR Paige Roberts Hadoop & Analytics Evangelist Actian
  • 5. © 2015 MapR Technologies
  • 6. Examples of Real-Time: “Tied to clock time” and “Guaranteed response time.” For real-time analytics, let’s use “no built-in delays.” So what is real-time analytics with Hadoop?
    Images licensed under https://creativecommons.org/licenses/by/2.0/
    Time image courtesy of Daniel Oldfield: https://www.flickr.com/photos/democlez/4424898002/
    Air bag image courtesy of Mike Babcock: https://www.flickr.com/photos/mikebabcock/3098836311/
  • 7. Requirements for Real-Time Analytics with Hadoop: real-time data, real-time applications, real-time queries
  • 8. Real-Time Data
    Definition: Provide immediate access to live Hadoop data for analysis.
    Requirements:
    • Analysis uses live, real-time data, not batch-copied data
    • Business can identify insights immediately (often through an automated process)
    • Critical for use cases such as ad targeting, personalization, and network security analysis
    • System avoids the complexity of a separate stream processing or messaging system for recent data
  • 9. Real-Time Data in Hadoop
    For real-time:
    • Log files should be written directly into the cluster or synced across remote data centers
    • Operational applications should run in the same cluster, or in a separate cluster with real-time table replication
    • Immediate action should be taken
      – E.g., the difference between fraud detection and fraud prevention
      – The difference between winning an on-demand ad bid and missing the opportunity
    Existing challenges:
    • Log files must be batch uploaded periodically (e.g., every 30 minutes)
      – Due to HDFS limitations (no random read/write, file-close semantics, no direct NFS access)
    • Operational applications run on a separate cluster/stack
      – Data must be batch uploaded
    • With batch uploads, the window to respond is missed
      – Fraud, cyber attacks, matches, anomalies, etc.
  • 10. Real-Time Applications
    Definition: Run operational applications in the cluster.
    Requirements:
    • Address use cases beyond batch and interactive analysis
      – E.g., end-to-end real-time marketing and security applications directly on Hadoop
    • Eliminate separate Hadoop and NoSQL clusters/technology stacks for apps
  • 11. Real-Time Applications in Hadoop
    For real-time:
    • Minimize the impact of disruptive “housekeeping” tasks to enable consistent, real-time operations
      – E.g., compactions, Java garbage collection, “region splits”
    • Process live, operational data in Hadoop to avoid the delays of batch copies
    Existing challenges:
    • Other in-Hadoop databases suffer disruptions that inhibit real-time operation
      – E.g., compactions can significantly slow down the system
      – Garbage collection leads to unpredictable system delays
      – Region splits are required to spread load, but they hurt responsiveness and performance
    • Other in-Hadoop databases require separate clusters
  • 12. Real-Time Querying
    Definition: Query any data as soon as it lands in the cluster (self-service).
    Requirements:
    • Analysts can explore data immediately, with no waiting days or weeks for IT to prepare the data
    • IT is not burdened with repeated schema management and ETL requests
  • 13. Real-Time Querying in Hadoop
    For real-time:
    • Minimize the time it takes to get started on data exploration
    • Leverage query engines that can query data in place
      – Eliminate IT dependencies for schema preparation
    Existing challenges:
    • New data that lands in the cluster requires IT-built schemas before it can be queried
    • Data exploration and analysis are contingent on the IT backlog
  • 14. So How Are These Implemented?
  • 15. © 2014 MapR Technologies. Case Study: Global Financial Services Firm. Analytics + operational applications on one platform (MapR Distribution for Hadoop)
    • Real-time operational applications: online transactions, fraud detection, personalized offers
    • Analytics: clickstream analysis, fraud investigation tool
    • Shared models and tables: fraud model, recommendations table
    • Users: fraud investigator, interactive marketer
  • 16. Real-Time Data, Real-Time Applications, Real-Time Queries
  • 17. Faster/Secure NFS Access
    MapR data access options:
    1. HDFS API (hadoop-core-*.jar): apps written for Hadoop (e.g., “hadoop fs -put”)
    2. Standard read/write NFS (POSIX): existing file system-based apps and utilities (e.g., cp, emacs) use the NFS client included in the OS, with no code changes
    3. MapR POSIX Client: for native applications with advanced read/write NFS requirements; includes compression, parallelism, authentication, and encryption
    Redundant NFS gateways between the client node(s) and the MapR cluster provide high availability.
  • 18. YCSB Benchmark: MapR-DB and “Other NoSQL” Throughput
    Throughput in operations/second/node (higher is better). Numbers in parentheses are the threads per client used in the test runs for MapR-DB and the other NoSQL database, respectively.
    • Load (10, 100): MapR-DB 4.x 27,097; other NoSQL 14,753; MapR-DB increase 1.8x
    • Read (75, 150): MapR-DB 4.x 4,402; other NoSQL 1,902; MapR-DB increase 2.3x
    • 50% read / 50% update (75, 100): MapR-DB 4.x 8,684; other NoSQL 2,012; MapR-DB increase 4.3x
    • 95% read / 5% update (75, 100): MapR-DB 4.x 3,776; other NoSQL 1,127; MapR-DB increase 3.4x
    • Scan (32, 32): MapR-DB 4.x 478; other NoSQL client hangs; increase N/A
  • 19. Real-Time Data, Real-Time Applications, Real-Time Queries
  • 20. YCSB Mixed (50% Read / 50% Put): Read Latency Compared, MapR-DB vs. HBase on Another Hadoop Distribution (lower is better)
  • 21. MapR-DB Table Replication
    Multi-master (aka active/active) replication; end users read and write against active clusters.
    • Faster data access: minimize network latency on global data with local clusters
    • Reduced risk of data loss: real-time, bi-directional replication keeps data synchronized across active clusters
    • Application failover: upon any cluster failure, applications continue via redirection to another cluster
  • 22. MapR-DB Real-Time Analytics
    Active clusters close to the end users, with real-time analytics at a central cluster.
    • Active read/write MapR-DB clusters in London, New York, and Singapore serve end users
    • A central MapR-DB/Hadoop cluster runs Hadoop analytics
    • Operational and analytical workloads are combined in a single deployment: an operationally efficient, consolidated MapR cluster runs both database operations and Hadoop analytics
  • 23. Real-Time Data, Real-Time Applications, Real-Time Queries
  • 24. One SQL Interface for All Data Formats
    Unstructured data will account for more than 80% of the data collected by organizations.
    [Figure: total data stored, 1990 to 2020, structured vs. unstructured data. SQL options for analytics: existing SQL engines (IT-driven BI) and Apache Drill (self-service BI and self-service data exploration; ANSI SQL queries on rapidly evolving schemas)]
  • 25. Agility by Reducing Distance to Data
    Short analytic life cycles with no upfront schema creation and management.
    • From (traditional approach): Hadoop data → data preparation (schema design, transformation, data movement) → users; each new business question repeats the cycle. Total time to value: weeks to months
    • To (new approach): Hadoop data → users, who ask new business questions directly. Total time to value: minutes
    Drill enables the “As It Happens” business with instant SQL analytics on complex data.
  • 26. Summary
  • 27. Batch Bottlenecks
    1. Data streaming is real-time, but…
    2. Further analysis is limited by batch loads into HDFS
    3. Most databases must run in a separate cluster, leading to batch copies
    4. Append-only HDFS leads to heavy I/O for database defragmentation (“compactions”)
    5. Data exploration requires IT intervention
  • 28. Removing the Batch Limitations
    1. Data streaming is real-time as before, and now…
    2. Further analysis is possible with real-time loading
    3. MapR-DB runs in Hadoop
    4. With a full read/write file system, defragmentation delays are eliminated
    5. Data exploration is performed in a self-service manner
    (Real-time data, real-time applications, real-time querying)
  • 29. And Don’t Forget…
    • Real-time analytics doesn’t help you if the other key pieces aren’t in place
    • Include security
      – Interoperability with any authentication mechanism
      – Fine-grained access controls
      – Auditing capabilities beyond simple log files
    • Also include enterprise-grade reliability
      – An automated high-availability configuration
      – Incremental mirroring/replication for disaster recovery
      – Consistent snapshots
    • Talk to us about what else you should consider
  • 30. Q&A. Engage with us! @mapr, maprtech, dalekim@mapr.com, MapR, maprtech, mapr-technologies
  • 31. Confidential © 2014 Actian Corporation. Real-Time Analytics. Paige Roberts, Hadoop & Analytics Evangelist, Actian Hadoop & Analytics Center of Excellence. April 2015
  • 32. Agenda: About Actian; Advantages of Data-Driven Business; What Do I Mean By Real-Time?; Real-Time Challenge: ATM Fraud; How Actian Does It
  • 33. A Few Words About Actian
    • $140M revenues, profitable
    • 10,000+ customers
    • Global presence: 8 worldwide offices, 7x24 multinational support model
    • “Fast becoming a big data powerhouse to challenge the market.” (Forrester)
    • “Actian is now very powerfully positioned in the big data and analytics markets.” (Bloor)
  • 34. Companies Using Big Data Strategically Outperform… At the Expense of Those That Don’t
    • Predictive; real-time; all data; new insights; accuracy
    [Figure: 10-year CAGR (%) of revenue and EBITDA, big data companies vs. other companies, among grocers, online retailers, big box retailers, casinos, credit cards, and insurance. Source: McKinsey report on big data]
  • 35. What Does Real-Time Mean to Us? Human-comfortable interactivity; streaming data processing; sub-second response
  • 36. Real-Time Analytics: Many Applications
    • Solar power company: new customer targeting (smart meter data)
    • Sportswear company: brand loyalty (wearable data)
    • Bank: ATM fraud (router data)
  • 37. Large US Bank Needs Help
    • Multi-billion dollar American bank / financial holding company
    • Provides deposit, credit, trust, and investment services to a broad range of clients
    • Operates nearly 1,500 retail branches and more than 2,000 ATMs
  • 38. Fraud Kept This Bank’s Execs Up at Night
    [Chart label: number of times faster than Impala]
  • 39. What Is the Worst Gotcha About ATM Fraud?
    In the majority of cases, banks are required to reimburse customer losses. In spite of that, 67% of U.S. adults would switch to another institution after experiencing ATM fraud or a data breach.
    Sources:
    https://www.tycois.com/insights-and-opinions/articles/atm-skimming-costs-banks
    http://www.harrisinteractive.com/NewsRoom/HarrisPolls/tabid/447/ctl/ReadCustom%20Default/mid/1508/ArticleId/1515/Default.aspx
  • 40. This Is What You Call a Delayed Reaction
  • 41. Time to Call in the Elephant
  • 42. Actian Vortex: The Elephant’s Best Friend
    • Data platform: Actian Vortex, with the Actian Management Console
      – Elastic data preparation: DataFlow
      – SQL analytics: Vector in Hadoop
      – Graph analytics: SPARQLverse
      – Machine learning & predictive analytics: DataFlow
      – Library of analytic blueprints
    • Analytic apps: financial services, health care, other verticals
    • Application development and tools: SQL, Java, C/C++, Python
    • Source data: databases/marts, warehouses, cloud/SaaS applications, structured & unstructured data, enterprise applications
    • Deployment options: @Customer
    Powered by KNIME.
  • 43. Actian Vortex: High-Performance Analytics at Scale in Hadoop, Powered by KNIME
  • 44. Stopping Fraud in Real Time (video: https://www.youtube.com/watch?v=u1QoHCpOUOU )
  • 45. Actian Vector in Hadoop: Built for Speed
    1. Vector processing: single instruction, multiple data (SIMD)
    2. 2nd-gen column store: limit I/O; efficient real-time updates
    3. Smarter compression: maximize throughput; vectorized decompression
    4. Exploiting chip cache: process data on chip, not in RAM
    5. Multi-core parallelism: maximize system resource utilization
    6. Storage indexes: quickly identify candidate data blocks; minimize I/O
    [Figure: data processed vs. time/cycles to process, for disk, RAM, and chip]
  • 46. How Fast?
  • 47. How Fast?
  • 48. What to Look For in SQL in Hadoop
    • Open (no vendor lock-in): collaborative architecture; open access to Actian data storage formats; support for other formats; Hadoop distribution and ecosystem application support
    • Fast (results you need when you need them): fastest data prep and ingestion; fastest analytic engines; unbridled processing power on data nodes in a Hadoop cluster
    • Enterprise-grade (proven technology advantages): full SQL support; extreme scalability; full security; high availability & disaster recovery
  • 49. Free Actian Vortex Express Edition
  • 50. Thank You. Download Actian Vortex Express, free forever: http://bigdata.actian.com/sql-in-hadoop
    www.actian.com, facebook.com/actiancorp, @actiancorp
  • 51. Question and Answer Session (please submit questions)
  • 52. Q & A Dale Kim Director of Industry Solutions MapR Paige Roberts Hadoop & Analytics Evangelist Actian
  • 53. Please use the same URL you used to view today’s live event for the archive event, plus we will be sending you a follow-up email with that URL once the archive is posted!
  • 54. Thank you for participating in today’s roundtable web event. Just by attending this event… the winner of the $100 AmEx Gift Card is…
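Slide 9 says log files are often batch uploaded periodically (e.g., every 30 minutes), so the window to respond is missed. A small sketch makes that cost concrete, assuming one event arrives per minute and each batch lands exactly at the end of its 30-minute interval: it computes how long each event waits before it can be analyzed. The function name `batch_staleness` and the event schedule are illustrative assumptions, not from the deck.

```python
def batch_staleness(event_times, interval):
    """Seconds each event waits until the next batch upload lands.

    Simplifying assumption: a batch lands at every multiple of `interval`
    and carries only events that arrived strictly before it, so an event
    that arrives exactly on a boundary waits a full interval.
    """
    delays = []
    for t in event_times:
        next_upload = (t // interval + 1) * interval
        delays.append(next_upload - t)
    return delays

# Hypothetical schedule: one event per minute for an hour, uploads every 30 minutes.
interval = 30 * 60
events = [minute * 60 for minute in range(60)]
delays = batch_staleness(events, interval)

# Worst-case and average wait, in minutes.
print(max(delays) / 60, sum(delays) / len(delays) / 60)  # prints: 30.0 15.5
```

Under these assumptions the worst case is the full 30 minutes and the average is 15.5 minutes; writing events directly into the cluster, as the slide recommends, removes that built-in delay.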
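Slides 12, 13, and 25 describe querying data in place as soon as it lands, without IT-built schemas; the deck attributes this to Apache Drill. The sketch below is not Drill and does not use its API. It is a plain-Python illustration of the schema-on-read idea, assuming newline-delimited JSON records: the field list is discovered from the data at query time, so a field that first appears in newly landed records can be queried immediately. The record contents and the function names `discover_schema` and `query` are hypothetical.

```python
import json

def discover_schema(lines):
    """Map each field name seen in the records to the set of value types observed."""
    schema = {}
    for line in lines:
        record = json.loads(line)
        for key, value in record.items():
            schema.setdefault(key, set()).add(type(value).__name__)
    return schema

def query(lines, predicate, fields):
    """Filter and project raw JSON lines with no predefined schema.

    A field missing from a record comes back as None instead of failing the query.
    """
    for line in lines:
        record = json.loads(line)
        if predicate(record):
            yield {f: record.get(f) for f in fields}

# Hypothetical clickstream records; the third one introduces a new "campaign" field.
raw = [
    '{"user": "a", "page": "/home", "ms": 120}',
    '{"user": "b", "page": "/cart", "ms": 340}',
    '{"user": "a", "page": "/cart", "ms": 95, "campaign": "spring"}',
]
print(discover_schema(raw))
print(list(query(raw, lambda r: r.get("page") == "/cart", ["user", "campaign"])))
```

The trade-off is that type and field checks move from load time to query time, which is why the query returns None for records that predate the new field rather than rejecting them.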
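Slide 17 notes that, with standard read/write NFS access, existing file-based applications can write into the cluster with no code changes. The point is that such an application uses only ordinary POSIX file calls. In this sketch a temporary directory stands in for the NFS mount point of the cluster file system; the `append_log` helper is hypothetical, and nothing here calls a MapR or Hadoop API.

```python
import os
import tempfile

def append_log(directory, filename, lines):
    """Append log lines using plain file I/O and flush them to storage.

    Nothing here is Hadoop-specific: if `directory` were an NFS mount of the
    cluster file system (an assumption for illustration), the same code would
    write straight into the cluster instead of waiting for a batch upload.
    """
    path = os.path.join(directory, filename)
    with open(path, "a", encoding="utf-8") as f:
        for line in lines:
            f.write(line + "\n")
        f.flush()
        os.fsync(f.fileno())  # ask the OS to commit the data to stable storage
    return path

# A temporary directory stands in for the hypothetical NFS mount point.
with tempfile.TemporaryDirectory() as mount:
    log_path = append_log(mount, "web.log", ["GET /home 200", "GET /cart 200"])
    append_log(mount, "web.log", ["POST /checkout 302"])  # a second append to the same file
    with open(log_path, encoding="utf-8") as f:
        contents = f.read().splitlines()

print(contents)
```

Appending to an existing file is exactly the operation the slide 9 challenges attribute to HDFS limitations (file-close semantics, no direct NFS), which is why the example appends twice.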
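As a quick check on the slide 18 YCSB table, the "increase" column is the ratio of the two throughput columns rounded to one decimal place. The sketch recomputes it from the published numbers; the scan row is left out because the other database's client hung, so no ratio exists. The dictionary layout and the `speedup` helper are illustrative.

```python
# Throughput in operations/second/node, copied from the slide 18 table:
# workload -> (MapR-DB 4.x, other NoSQL)
results = {
    "Load": (27_097, 14_753),
    "Read": (4_402, 1_902),
    "50% read / 50% update": (8_684, 2_012),
    "95% read / 5% update": (3_776, 1_127),
}

def speedup(mapr_db, other):
    """Ratio of MapR-DB throughput to 'other NoSQL' throughput, one decimal place."""
    return round(mapr_db / other, 1)

for workload, (mapr_db, other) in results.items():
    print(f"{workload}: {speedup(mapr_db, other)}x")
```

The recomputed ratios (1.8x, 2.3x, 4.3x, 3.4x) match the slide; note that the thread counts in parentheses differ between the two systems, so these are throughput ratios under the tested configurations, not per-thread comparisons.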
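Slide 21 lists application failover as a benefit of multi-master replication: upon any cluster failure, applications continue via redirection to another cluster. A minimal client-side sketch of that redirection follows, assuming every active cluster holds a replicated copy of the table and that clusters are tried nearest-first. The cluster names, the `ClusterDown` exception, and `fake_read` are hypothetical stand-ins for a real client library, and replication lag is not modeled.

```python
class ClusterDown(Exception):
    """Raised by the hypothetical client when a cluster cannot be reached."""

def read_with_failover(clusters, read_fn, key):
    """Try each active cluster in preference order (e.g., nearest first).

    With multi-master replication every cluster holds the same table, so any
    reachable cluster can serve the read; return the first success.
    """
    errors = []
    for cluster in clusters:
        try:
            return cluster, read_fn(cluster, key)
        except ClusterDown as exc:
            errors.append(f"{cluster}: {exc}")
    raise ClusterDown("all clusters failed: " + "; ".join(errors))

# Hypothetical setup: the London cluster is down, so a London user is redirected.
replicated_table = {"acct-42": {"balance": 100}}
down = {"london"}

def fake_read(cluster, key):
    """Simulated read against one cluster's replica of the table."""
    if cluster in down:
        raise ClusterDown("unreachable")
    return replicated_table[key]

print(read_with_failover(["london", "new-york", "singapore"], fake_read, "acct-42"))
```

Ordering the list by proximity gives the "faster data access" benefit in the normal case, while the loop provides the failover path when the preferred cluster is unavailable.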
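Slides 37 to 44 describe stopping ATM fraud in real time rather than discovering it after a delayed reaction. The deck does not spell out the detection logic, so the sketch below swaps in a common, simple heuristic purely as an illustration: flag a withdrawal when the same card was used so recently, and so far away, that the implied travel speed is implausible. The 900 km/h threshold, the `Withdrawal` record, and the coordinates are assumptions; this is not the bank's or Actian's model. The check runs per event as it arrives, which is what allows prevention rather than after-the-fact detection.

```python
import math
from dataclasses import dataclass

@dataclass
class Withdrawal:
    card: str
    time_s: float  # seconds since an arbitrary epoch
    lat: float
    lon: float

def km_between(a, b):
    """Great-circle distance in kilometres (haversine formula)."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(a.lat), math.radians(b.lat)
    dp = p2 - p1
    dl = math.radians(b.lon - a.lon)
    h = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(min(1.0, math.sqrt(h)))

class ImpossibleTravelDetector:
    """Flags a withdrawal if the card's previous one was too far away to reach in time.

    Assumes events for a given card arrive in time order.
    """

    def __init__(self, max_kmh=900.0):
        self.max_kmh = max_kmh
        self.last_seen = {}  # card -> most recent Withdrawal

    def check(self, w):
        prev = self.last_seen.get(w.card)
        self.last_seen[w.card] = w
        if prev is None:
            return False
        hours = max((w.time_s - prev.time_s) / 3600.0, 1e-9)  # avoid division by zero
        return km_between(prev, w) / hours > self.max_kmh

# Hypothetical stream: card c1 is used in New York, then in Los Angeles 10 minutes later.
events = [
    Withdrawal("c1", 0, 40.71, -74.01),
    Withdrawal("c1", 600, 34.05, -118.24),
    Withdrawal("c2", 0, 40.71, -74.01),
    Withdrawal("c2", 7200, 40.75, -73.99),
]
detector = ImpossibleTravelDetector()
flags = [detector.check(e) for e in events]
print(flags)
```

A production system would combine many such signals in a trained model; the single rule is only meant to show the per-event, stateful shape of a streaming check.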
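Slide 45 lists storage indexes that quickly identify candidate data blocks to minimize I/O. A generic way to do this is a min/max index per block (often called a zone map): a range query skips any block whose recorded minimum and maximum cannot overlap the predicate. The sketch shows that general technique in pure Python; it is not Actian Vector's implementation, and the block size and data are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Block:
    values: list
    lo: float  # smallest value in the block
    hi: float  # largest value in the block

def build_blocks(column, block_size):
    """Split one column into fixed-size blocks and record each block's min/max."""
    blocks = []
    for i in range(0, len(column), block_size):
        chunk = column[i:i + block_size]
        blocks.append(Block(chunk, min(chunk), max(chunk)))
    return blocks

def scan_range(blocks, low, high):
    """Return values in [low, high] and the number of blocks actually read.

    A block whose [lo, hi] range cannot overlap [low, high] is skipped without
    touching its values, which is where the I/O saving comes from.
    """
    hits, blocks_read = [], 0
    for b in blocks:
        if b.hi < low or b.lo > high:
            continue
        blocks_read += 1
        hits.extend(v for v in b.values if low <= v <= high)
    return hits, blocks_read

# Hypothetical column loaded in sorted order (e.g., timestamps), 100 values per block.
column = list(range(1000))
blocks = build_blocks(column, 100)
hits, blocks_read = scan_range(blocks, 250, 260)
print(len(hits), blocks_read, len(blocks))
```

Skipping works best when values are clustered, as with data loaded in time order; on randomly ordered data most blocks' min/max ranges overlap the predicate and little is skipped.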