Monday, June 10, 13
Goto Night CPH, June 6th 2013
How to integrate Hadoop with your NoSQL database?
Tugdual “Tug” Grall
Technical Evangelist
About Me
• Tugdual “Tug” Grall
  - Couchbase
    • Technical Evangelist
  - eXo
    • CTO
  - Oracle
    • Developer/Product Manager
    • Mainly Java/SOA
  - Developer in consulting firms
• Web
  • @tgrall
  • http://blog.grallandco.com
  • tgrall
  • NantesJUG co-founder
• Pet Project:
  • http://www.resultri.com
Big Data
[Chart: Trillions of Gigabytes (Zettabytes) for the years 2000, 2006 and 2011; vertical axis from 0 to 2.00]
Source: IDC 2011 Digital Universe Study (http://www.emc.com/collateral/demos/microsites/emc-digital-universe-2011/index.htm)
High Data Variety and Velocity
• Structured Data
• Unstructured and Semi-Structured Data: Text, Log Files, Click Streams, Blogs, Tweets, Audio, Video, etc.
More Flexible Data Model Required
$30B Database Market Being Disrupted
[Chart: Relational Technology compared with NoSQL Technology and Other; labels: 2013, 2027, 95%, <50%?]
All new database growth will be NoSQL
Operational vs. Analytic Databases
• Analytic Databases: get insights from data
  - Cloudera, Hortonworks
• Real-time, Interactive Databases: fast access to data
  - NoSQL: Couchbase, Mongo
• Lack of flexibility/rigid schemas: 49%
• Inability to scale out data: 35%
• Performance challenges: 29%
• Cost: 16%
• All of these: 12%
• Other: 11%
Source: Couchbase Survey, December 2011, n = 1351.
Hadoop
What is Hadoop?
• Highly scalable
• Unstructured data
• Open source
• Big Data Operating System
• Changing the World One Petabyte at a Time
What is Hadoop?
• Simplest unit of compute and storage
  [Diagram: CPU, Disks, Application, Data]
• And when it grows?
  [Diagram: Application, Data]
• And when it grows more?
• NoSQL to the rescue
  [Diagram: Application, Data]
• Hadoop is a different paradigm
  [Diagram: Application, Data]
Hadoop and NoSQL
Ad and offer targeting
[Diagram with steps numbered 1 to 3; arrows labeled: events; profiles, campaigns; profiles, real time campaign statistics]
40 milliseconds to respond with the decision.
Moving Parts
[Diagram: Ad Targeting Platform, Logs, Couchbase Server Cluster, Hadoop Cluster; data flows labeled sqoop import, sqoop export and flume flow]
Content & Recommendation Targeting
[Diagram with steps numbered 1 to 3: Content Oriented Site, Legacy Relational Database; arrows labeled: events; user profiles; make recommendations]
Moving Parts
[Diagram: Content Driven Web Site, Logs, Original RDBMS, Couchbase Server Cluster, Hadoop Cluster; data flows labeled sqoop import, sqoop export and flume flow]
To keep up with changing needs for richer, more targeted content that is delivered to larger and larger audiences very quickly, the data behind content-driven sites is shifting to Couchbase.
Hadoop excels at complex analytics, which may involve multiple processing steps that incorporate a number of different data sources.
What is Sqoop?
Sqoop is a tool designed to transfer data between Hadoop and relational databases.
You can use Sqoop to import data from a relational database management system (RDBMS) such as MySQL or Oracle into the Hadoop Distributed File System (HDFS), transform the data in Hadoop MapReduce, and then export the data back into an RDBMS.
sqoop.apache.org
What is Sqoop?
• Traditional ETL
  [Diagram: Application, Data, T, Data]
• A different paradigm
  [Diagram: Data, Application, Data]
• A very scalable different paradigm
  [Diagram: Data and Application repeated side by side]
• Where did the Transform go?
  [Diagram: Application, Data, with groups of T spread across the nodes]
What is Sqoop?
• Sqoop “SQL-Hadoop”
  - Default connection is via JDBC
• Lots of custom connectors
  - Couchbase, VoltDB, Vertica
  - Teradata, Netezza
  - Oracle, MySQL, Postgres
Sqoop: Import
sqoop import --connect jdbc:mysql://rdbms1.demo.com/CRM \
  --table customers
Sqoop: Export
sqoop export --connect jdbc:mysql://rdbms1.demo.com/ANALYTICS \
  --table sales \
  --export-dir /user/hive/warehouse/zip_profits \
  --input-fields-terminated-by '\0001'
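The import and export calls above share the same shape, so a script can assemble them from a few parameters. The following is only a sketch: `sqoop_command` is a hypothetical helper, it builds the argument list without running anything, and it uses only the flags that appear on the slides.

```python
import shlex


def sqoop_command(action, connect, table, **options):
    """Build the argument list for a `sqoop import` or `sqoop export` call.

    Hypothetical helper for illustration; it does not execute Sqoop.
    """
    if action not in ("import", "export"):
        raise ValueError("action must be 'import' or 'export'")
    argv = ["sqoop", action, "--connect", connect, "--table", table]
    for name, value in options.items():
        # Keyword export_dir becomes the flag --export-dir.
        argv += ["--" + name.replace("_", "-"), str(value)]
    return argv


import_cmd = sqoop_command(
    "import", "jdbc:mysql://rdbms1.demo.com/CRM", "customers"
)
export_cmd = sqoop_command(
    "export",
    "jdbc:mysql://rdbms1.demo.com/ANALYTICS",
    "sales",
    export_dir="/user/hive/warehouse/zip_profits",
)

# shlex.join quotes arguments where needed for a POSIX shell.
print(shlex.join(import_cmd))
print(shlex.join(export_cmd))
```

Keeping the command as a list rather than a string means it could be passed to `subprocess.run` without shell quoting problems, should the script later run Sqoop itself.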
Sqoop: Import
sqoop import --connect http://localhost:8091/pools \
  --table DUMP
Sqoop: Import
[Diagram: Sqoop Client, Metadata, Launches, MapReduce Job with three Map tasks, each paired with HDFS]
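The diagram shows one job fanned out over several Map tasks. One way to picture how the work might be divided is to cut a numeric key range into one contiguous chunk per task. This is a minimal sketch of that idea only: `split_ranges` and the sample numbers are hypothetical, and Sqoop's own split logic is not reproduced here.

```python
def split_ranges(min_id, max_id, num_mappers):
    """Cut the inclusive range [min_id, max_id] into contiguous chunks,
    one per map task (hypothetical illustration, not Sqoop's code)."""
    if num_mappers < 1:
        raise ValueError("num_mappers must be at least 1")
    if max_id < min_id:
        raise ValueError("max_id must not be smaller than min_id")
    total = max_id - min_id + 1
    base, extra = divmod(total, num_mappers)
    ranges = []
    start = min_id
    for i in range(num_mappers):
        # The first `extra` chunks take one additional row.
        size = base + (1 if i < extra else 0)
        if size == 0:
            break  # more mappers than rows: stop handing out work
        ranges.append((start, start + size - 1))
        start += size
    return ranges


# Ten rows shared between three map tasks.
splits = split_ranges(1, 10, 3)
for low, high in splits:
    print(f"map task reads ids {low}..{high}")
```

Each task can then read its own slice and write its own file, which is why adding Map tasks scales the transfer.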
Sqoop: Export
sqoop export --connect http://localhost:8091/pools \
  --table DUMP \
  --export-dir /user/hive/profiles/recommendation \
  --username social
Sqoop: Export
[Diagram: Sqoop Client, Metadata, Launches, MapReduce Job with three Map tasks, each paired with HDFS]
Demonstration
Couchbase
Couchbase Server Core Principles
• Easy Scalability: grow cluster without application changes, without downtime, with a single click
• Consistent High Performance: consistent sub-millisecond read and write response times with consistent high throughput
• Always On 24x365: no downtime for software upgrades, hardware maintenance, etc.
• Flexible Data Model: JSON document model with no fixed schema
Couchbase Handles Real World Scale
Couchbase Server 2.0
Data Manager
• Query API (8092), Query Engine
• Moxi (11211, Memcapable 1.0)
• Memcached, Couchbase EP Engine (11210, Memcapable 2.0)
• storage interface, New Persistence Layer
Cluster Manager (Erlang/OTP)
• REST management API/Web UI (HTTP, 8091)
• Erlang port mapper (4369)
• Distributed Erlang (21100 - 21199)
• on each node: Heartbeat, Process monitor, Configuration manager
• one per cluster: Global singleton supervisor, Rebalance orchestrator, Node health monitor, vBucket state and replication manager
The Classic Order Entry Structure
http://martinfowler.com/bliki/AggregateOrientedDatabase.html
Relational databases were not designed with clusters in mind, which is why people have cast around for an alternative. Storing aggregates as fundamental units makes a lot of sense for running on a cluster.
Aggregate by Comparison
o::1001
{
  uid: "ji22jd",
  customer: "Ann",
  line_items: [
    { sku: 0321293533, quan: 3, unit_price: 48.0 },
    { sku: 0321601912, quan: 1, unit_price: 39.0 },
    { sku: 0131495054, quan: 1, unit_price: 51.0 }
  ],
  payment: {
    type: "Amex",
    expiry: "04/2001",
    last5: 12345
  }
}
• Easy to distribute data
• Makes sense to application programmers
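To see why an aggregate "makes sense to application programmers", here is the same order as a plain Python dictionary that is serialized to JSON and read back. It is a sketch only: storing the SKUs as strings (to keep their leading zeros) and the `order_total` helper are assumptions made for illustration, and no database client is involved.

```python
import json

# The order aggregate from the slide, stored under the key "o::1001".
# Assumption: SKUs are kept as strings so the leading zeros survive.
order_key = "o::1001"
order = {
    "uid": "ji22jd",
    "customer": "Ann",
    "line_items": [
        {"sku": "0321293533", "quan": 3, "unit_price": 48.0},
        {"sku": "0321601912", "quan": 1, "unit_price": 39.0},
        {"sku": "0131495054", "quan": 1, "unit_price": 51.0},
    ],
    "payment": {"type": "Amex", "expiry": "04/2001", "last5": 12345},
}


def order_total(doc):
    """Sum quantity times unit price over all line items (hypothetical helper)."""
    return sum(item["quan"] * item["unit_price"] for item in doc["line_items"])


# One document travels as one JSON value: no joins are needed to rebuild it.
serialized = json.dumps(order)
restored = json.loads(serialized)
total = order_total(restored)
print(order_key, total)
```

Everything the application needs for one order arrives in a single read of a single key, which is also what makes the document easy to place on any one server.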
Basic Operations
COUCHBASE SERVER CLUSTER
• Docs distributed evenly across servers
• Each server stores both active and replica docs
  - Only one server active at a time
• Client library provides app with simple interface to database
• Cluster map provides map to which server doc is on
  - App never needs to know
• App reads, writes, updates docs
• Multiple app servers can access same document at same time
User Configured Replica Count = 1
[Diagram: APP SERVER 1 and APP SERVER 2, each with COUCHBASE Client Library and CLUSTER MAP, send READ/WRITE/UPDATE to SERVER 1, SERVER 2 and SERVER 3; each server holds ACTIVE and REPLICA docs]
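The cluster map idea can be sketched in a few lines: hash the document key to a partition, then look up which server holds the active copy and which holds the replica. Everything specific here is an assumption for illustration (the partition count of 1024, the CRC32 hash, the round-robin layout and all function names); the real client library's hashing and map format are not shown on the slides.

```python
import zlib

NUM_VBUCKETS = 1024  # assumption for this sketch


def vbucket_for(key):
    """Map a document key to a partition number (simplified hashing)."""
    return zlib.crc32(key.encode("utf-8")) % NUM_VBUCKETS


def build_cluster_map(servers):
    """Assign each partition an (active, replica) server pair, round-robin.

    With a replica count of 1, the replica lives on the next server.
    """
    n = len(servers)
    if n < 2:
        raise ValueError("need at least two servers for one replica")
    return [(servers[i % n], servers[(i + 1) % n]) for i in range(NUM_VBUCKETS)]


def locate(key, cluster_map):
    """Return the (active, replica) servers for a key."""
    return cluster_map[vbucket_for(key)]


servers = ["server1", "server2", "server3"]
cluster_map = build_cluster_map(servers)
active, replica = locate("o::1001", cluster_map)
print(f"o::1001 -> active on {active}, replica on {replica}")
```

Because the lookup happens inside the client library, the application only passes a key; this is the sense in which the "app never needs to know" where a document lives.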
Indexing
COUCHBASE SERVER CLUSTER
• Indexing work is distributed amongst nodes
• Large data set possible
• Parallelize the effort
• Each node has index for data stored on it
• Queries combine the results from required nodes
[Diagram: APP SERVER 1 and APP SERVER 2, each with COUCHBASE Client Library and CLUSTER MAP; a Query goes to SERVER 1, SERVER 2 and SERVER 3, each holding ACTIVE and REPLICA docs]
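The last two bullets describe a scatter-gather pattern: every node indexes only its own documents, and a query merges the partial answers. A minimal in-memory sketch of that pattern follows; the `Node` class, the `customer` index field and the sample documents are all hypothetical and stand in for the real index machinery.

```python
import heapq


class Node:
    """One server holding its own documents and a local sorted index."""

    def __init__(self, name):
        self.name = name
        self.docs = {}
        self.index = []  # sorted list of (customer, doc_id) entries

    def store(self, doc_id, doc):
        self.docs[doc_id] = doc
        self.index.append((doc["customer"], doc_id))
        self.index.sort()  # keep the local index ordered

    def query(self, customer):
        """Answer from the local index only."""
        return [entry for entry in self.index if entry[0] == customer]


def scatter_gather(nodes, customer):
    """Ask every node, then merge the already sorted partial results."""
    partials = [node.query(customer) for node in nodes]
    return list(heapq.merge(*partials))


nodes = [Node("server1"), Node("server2"), Node("server3")]
nodes[0].store("o::1001", {"customer": "Ann"})
nodes[1].store("o::1002", {"customer": "Bob"})
nodes[2].store("o::1003", {"customer": "Ann"})

results = scatter_gather(nodes, "Ann")
print(results)
```

Each node does indexing work proportional to its own share of the data, so the index grows with the cluster rather than with any single machine.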
Demonstration
Map Reduce ...
• Deal with “Big Data”
• “More” is better than “Faster”
• Batch Oriented
• Usually used to “extract/transform” data
• Fully distributed
  - Map, Shuffle, Reduce
≠
• Distributed
• Executed where the document is
• Deal with “indexing” data
• As fast as possible
• Use to query the data in the Database
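The three phases named above (Map, Shuffle, Reduce) can be shown on a single machine with plain Python. This is a toy sketch, not Hadoop's or Couchbase's API: the function names, the log lines and the URL-counting job are made up for illustration.

```python
from collections import defaultdict


def map_phase(records, mapper):
    """Apply the mapper to every record and emit (key, value) pairs."""
    for record in records:
        yield from mapper(record)


def shuffle_phase(pairs):
    """Group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups, reducer):
    """Collapse each group of values into one result per key."""
    return {key: reducer(key, values) for key, values in groups.items()}


def url_mapper(line):
    # A log line looks like "GET /home"; emit the URL with a count of 1.
    yield line.split()[1], 1


def count_reducer(key, values):
    return sum(values)


log_lines = ["GET /home", "GET /offers", "GET /home"]
hits = reduce_phase(shuffle_phase(map_phase(log_lines, url_mapper)), count_reducer)
print(hits)
```

In a real cluster the map calls run on many machines and the shuffle moves data across the network, which is where the batch-oriented character of the first list comes from.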
Conclusion
• Big Data and Big Users working together
• Use Hadoop to store “everything”
  - Batch oriented
  - Complex data processing
    • MapReduce
• Expose a subset of the dataset to your application
  - Real time analytics
  - Low latency
  - Simple data interactions and queries
Q&A
We’re Hiring! couchbase.com/careers
@tgrall
tug@couchbase.com
Monday, June 10, 13
Goto	
  Night	
  CPH,	
  June	
  6th	
  2013
Q&A
Monday, June 10, 13

Más contenido relacionado

La actualidad más candente

Bringing back the excitement to data analysis
Bringing back the excitement to data analysisBringing back the excitement to data analysis
Bringing back the excitement to data analysisData Science London
 
C* Summit 2013: Real-Time Big Data with Storm, Cassandra, and In-Memory Compu...
C* Summit 2013: Real-Time Big Data with Storm, Cassandra, and In-Memory Compu...C* Summit 2013: Real-Time Big Data with Storm, Cassandra, and In-Memory Compu...
C* Summit 2013: Real-Time Big Data with Storm, Cassandra, and In-Memory Compu...DataStax Academy
 
Full Stack Monitoring with Prometheus and Grafana
Full Stack Monitoring with Prometheus and GrafanaFull Stack Monitoring with Prometheus and Grafana
Full Stack Monitoring with Prometheus and GrafanaJazz Yao-Tsung Wang
 
Designing analytics for big data
Designing analytics for big dataDesigning analytics for big data
Designing analytics for big dataJ Singh
 
BDW Chicago 2016 - Jim Scott, Director, Enterprise Strategy & Architecture - ...
BDW Chicago 2016 - Jim Scott, Director, Enterprise Strategy & Architecture - ...BDW Chicago 2016 - Jim Scott, Director, Enterprise Strategy & Architecture - ...
BDW Chicago 2016 - Jim Scott, Director, Enterprise Strategy & Architecture - ...Big Data Week
 
PyData Barcelona Keynote
PyData Barcelona KeynotePyData Barcelona Keynote
PyData Barcelona KeynoteTravis Oliphant
 
Amebaサービスのログ解析基盤
Amebaサービスのログ解析基盤Amebaサービスのログ解析基盤
Amebaサービスのログ解析基盤Toshihiro Suzuki
 
Hadoop Application Architectures tutorial at Big DataService 2015
Hadoop Application Architectures tutorial at Big DataService 2015Hadoop Application Architectures tutorial at Big DataService 2015
Hadoop Application Architectures tutorial at Big DataService 2015hadooparchbook
 
Bigdata Nedir? Hadoop Nedir? MapReduce Nedir? Big Data.
Bigdata Nedir? Hadoop Nedir? MapReduce Nedir? Big Data.Bigdata Nedir? Hadoop Nedir? MapReduce Nedir? Big Data.
Bigdata Nedir? Hadoop Nedir? MapReduce Nedir? Big Data.Zekeriya Besiroglu
 
제3회 사내기술세미나-hadoop(배포용)-dh kim-2014-10-1
제3회 사내기술세미나-hadoop(배포용)-dh kim-2014-10-1제3회 사내기술세미나-hadoop(배포용)-dh kim-2014-10-1
제3회 사내기술세미나-hadoop(배포용)-dh kim-2014-10-1Donghan Kim
 
Hadoop Application Architectures tutorial - Strata London
Hadoop Application Architectures tutorial - Strata LondonHadoop Application Architectures tutorial - Strata London
Hadoop Application Architectures tutorial - Strata Londonhadooparchbook
 
A Non-Standard use Case of Hadoop: High Scale Image Processing and Analytics
A Non-Standard use Case of Hadoop: High Scale Image Processing and AnalyticsA Non-Standard use Case of Hadoop: High Scale Image Processing and Analytics
A Non-Standard use Case of Hadoop: High Scale Image Processing and AnalyticsDataWorks Summit
 
Scaling out Tensorflow-as-a-Service on Spark and Commodity GPUs
Scaling out Tensorflow-as-a-Service on Spark and Commodity GPUsScaling out Tensorflow-as-a-Service on Spark and Commodity GPUs
Scaling out Tensorflow-as-a-Service on Spark and Commodity GPUsJim Dowling
 
Architectural Patterns for Streaming Applications
Architectural Patterns for Streaming ApplicationsArchitectural Patterns for Streaming Applications
Architectural Patterns for Streaming Applicationshadooparchbook
 
FDW-based Sharding Update and Future
FDW-based Sharding Update and FutureFDW-based Sharding Update and Future
FDW-based Sharding Update and FutureMasahiko Sawada
 
Application Architectures with Hadoop - Big Data TechCon SF 2014
Application Architectures with Hadoop - Big Data TechCon SF 2014Application Architectures with Hadoop - Big Data TechCon SF 2014
Application Architectures with Hadoop - Big Data TechCon SF 2014hadooparchbook
 

La actualidad más candente (20)

Bringing back the excitement to data analysis
Bringing back the excitement to data analysisBringing back the excitement to data analysis
Bringing back the excitement to data analysis
 
C* Summit 2013: Real-Time Big Data with Storm, Cassandra, and In-Memory Compu...
C* Summit 2013: Real-Time Big Data with Storm, Cassandra, and In-Memory Compu...C* Summit 2013: Real-Time Big Data with Storm, Cassandra, and In-Memory Compu...
C* Summit 2013: Real-Time Big Data with Storm, Cassandra, and In-Memory Compu...
 
HDP2 and YARN operations point
HDP2 and YARN operations pointHDP2 and YARN operations point
HDP2 and YARN operations point
 
Full Stack Monitoring with Prometheus and Grafana
Full Stack Monitoring with Prometheus and GrafanaFull Stack Monitoring with Prometheus and Grafana
Full Stack Monitoring with Prometheus and Grafana
 
Designing analytics for big data
Designing analytics for big dataDesigning analytics for big data
Designing analytics for big data
 
BDW Chicago 2016 - Jim Scott, Director, Enterprise Strategy & Architecture - ...
BDW Chicago 2016 - Jim Scott, Director, Enterprise Strategy & Architecture - ...BDW Chicago 2016 - Jim Scott, Director, Enterprise Strategy & Architecture - ...
BDW Chicago 2016 - Jim Scott, Director, Enterprise Strategy & Architecture - ...
 
Hadoop Jungle
Hadoop JungleHadoop Jungle
Hadoop Jungle
 
PyData Barcelona Keynote
PyData Barcelona KeynotePyData Barcelona Keynote
PyData Barcelona Keynote
 
Amebaサービスのログ解析基盤
Amebaサービスのログ解析基盤Amebaサービスのログ解析基盤
Amebaサービスのログ解析基盤
 
Hadoop Application Architectures tutorial at Big DataService 2015
Hadoop Application Architectures tutorial at Big DataService 2015Hadoop Application Architectures tutorial at Big DataService 2015
Hadoop Application Architectures tutorial at Big DataService 2015
 
Bigdata Nedir? Hadoop Nedir? MapReduce Nedir? Big Data.
Bigdata Nedir? Hadoop Nedir? MapReduce Nedir? Big Data.Bigdata Nedir? Hadoop Nedir? MapReduce Nedir? Big Data.
Bigdata Nedir? Hadoop Nedir? MapReduce Nedir? Big Data.
 
Benchmarking
BenchmarkingBenchmarking
Benchmarking
 
제3회 사내기술세미나-hadoop(배포용)-dh kim-2014-10-1
제3회 사내기술세미나-hadoop(배포용)-dh kim-2014-10-1제3회 사내기술세미나-hadoop(배포용)-dh kim-2014-10-1
제3회 사내기술세미나-hadoop(배포용)-dh kim-2014-10-1
 
Hadoop Application Architectures tutorial - Strata London
Hadoop Application Architectures tutorial - Strata LondonHadoop Application Architectures tutorial - Strata London
Hadoop Application Architectures tutorial - Strata London
 
A Non-Standard use Case of Hadoop: High Scale Image Processing and Analytics
A Non-Standard use Case of Hadoop: High Scale Image Processing and AnalyticsA Non-Standard use Case of Hadoop: High Scale Image Processing and Analytics
A Non-Standard use Case of Hadoop: High Scale Image Processing and Analytics
 
Scaling out Tensorflow-as-a-Service on Spark and Commodity GPUs
Scaling out Tensorflow-as-a-Service on Spark and Commodity GPUsScaling out Tensorflow-as-a-Service on Spark and Commodity GPUs
Scaling out Tensorflow-as-a-Service on Spark and Commodity GPUs
 
Architectural Patterns for Streaming Applications
Architectural Patterns for Streaming ApplicationsArchitectural Patterns for Streaming Applications
Architectural Patterns for Streaming Applications
 
Introduction to HCFS
Introduction to HCFSIntroduction to HCFS
Introduction to HCFS
 
FDW-based Sharding Update and Future
FDW-based Sharding Update and FutureFDW-based Sharding Update and Future
FDW-based Sharding Update and Future
 
Application Architectures with Hadoop - Big Data TechCon SF 2014
Application Architectures with Hadoop - Big Data TechCon SF 2014Application Architectures with Hadoop - Big Data TechCon SF 2014
Application Architectures with Hadoop - Big Data TechCon SF 2014
 

Similar a HADOOP AND NOSQL INTEGRATION

Proud to be polyglot!
Proud to be polyglot!Proud to be polyglot!
Proud to be polyglot!NLJUG
 
Introduction to NoSQL with Couchbase
Introduction to NoSQL with CouchbaseIntroduction to NoSQL with Couchbase
Introduction to NoSQL with CouchbaseTugdual Grall
 
Introduction to the Hadoop Ecosystem (IT-Stammtisch Darmstadt Edition)
Introduction to the Hadoop Ecosystem (IT-Stammtisch Darmstadt Edition)Introduction to the Hadoop Ecosystem (IT-Stammtisch Darmstadt Edition)
Introduction to the Hadoop Ecosystem (IT-Stammtisch Darmstadt Edition)Uwe Printz
 
使用 Elasticsearch 及 Kibana 進行巨量資料搜尋及視覺化-曾書庭
使用 Elasticsearch 及 Kibana 進行巨量資料搜尋及視覺化-曾書庭使用 Elasticsearch 及 Kibana 進行巨量資料搜尋及視覺化-曾書庭
使用 Elasticsearch 及 Kibana 進行巨量資料搜尋及視覺化-曾書庭台灣資料科學年會
 
Pivotal - Advanced Analytics for Telecommunications
Pivotal - Advanced Analytics for Telecommunications Pivotal - Advanced Analytics for Telecommunications
Pivotal - Advanced Analytics for Telecommunications Hortonworks
 
Accelerating workloads and bursting data with Google Dataproc & Alluxio
Accelerating workloads and bursting data with Google Dataproc & AlluxioAccelerating workloads and bursting data with Google Dataproc & Alluxio
Accelerating workloads and bursting data with Google Dataproc & AlluxioBig Data Aplications Meetup
 
Big Data Integration Webinar: Getting Started With Hadoop Big Data
Big Data Integration Webinar: Getting Started With Hadoop Big DataBig Data Integration Webinar: Getting Started With Hadoop Big Data
Big Data Integration Webinar: Getting Started With Hadoop Big DataPentaho
 
Hadoop workshop
Hadoop workshopHadoop workshop
Hadoop workshopFang Mac
 
VoxxedDays Bucharest 2017 - Powering interactive data analysis with Google Bi...
VoxxedDays Bucharest 2017 - Powering interactive data analysis with Google Bi...VoxxedDays Bucharest 2017 - Powering interactive data analysis with Google Bi...
VoxxedDays Bucharest 2017 - Powering interactive data analysis with Google Bi...Márton Kodok
 
Cassandra Meetup: Real-time Analytics using Cassandra, Spark and Shark at Ooyala
Cassandra Meetup: Real-time Analytics using Cassandra, Spark and Shark at OoyalaCassandra Meetup: Real-time Analytics using Cassandra, Spark and Shark at Ooyala
Cassandra Meetup: Real-time Analytics using Cassandra, Spark and Shark at OoyalaDataStax Academy
 
Monitoring Big Data Systems Done "The Simple Way" - Codemotion Milan 2017 - D...
Monitoring Big Data Systems Done "The Simple Way" - Codemotion Milan 2017 - D...Monitoring Big Data Systems Done "The Simple Way" - Codemotion Milan 2017 - D...
Monitoring Big Data Systems Done "The Simple Way" - Codemotion Milan 2017 - D...Demi Ben-Ari
 
Demi Ben-Ari - Monitoring Big Data Systems Done "The Simple Way" - Codemotion...
Demi Ben-Ari - Monitoring Big Data Systems Done "The Simple Way" - Codemotion...Demi Ben-Ari - Monitoring Big Data Systems Done "The Simple Way" - Codemotion...
Demi Ben-Ari - Monitoring Big Data Systems Done "The Simple Way" - Codemotion...Codemotion
 
Monitoring Big Data Systems Done "The Simple Way" - Codemotion Berlin 2017
Monitoring Big Data Systems Done "The Simple Way" - Codemotion Berlin 2017Monitoring Big Data Systems Done "The Simple Way" - Codemotion Berlin 2017
Monitoring Big Data Systems Done "The Simple Way" - Codemotion Berlin 2017Demi Ben-Ari
 
Con8862 no sql, json and time series data
Con8862   no sql, json and time series dataCon8862   no sql, json and time series data
Con8862 no sql, json and time series dataAnuj Sahni
 
Open Data Science Conference Big Data Infrastructure – Introduction to Hadoop...
Open Data Science Conference Big Data Infrastructure – Introduction to Hadoop...Open Data Science Conference Big Data Infrastructure – Introduction to Hadoop...
Open Data Science Conference Big Data Infrastructure – Introduction to Hadoop...DataKitchen
 
Apache AGE and the synergy effect in the combination of Postgres and NoSQL
 Apache AGE and the synergy effect in the combination of Postgres and NoSQL Apache AGE and the synergy effect in the combination of Postgres and NoSQL
Apache AGE and the synergy effect in the combination of Postgres and NoSQLEDB
 
Introduction to Big Data and AI for Business Analytics and Prediction
Introduction to Big Data and AI for Business Analytics and PredictionIntroduction to Big Data and AI for Business Analytics and Prediction
Introduction to Big Data and AI for Business Analytics and PredictionJongwook Woo
 

Similar a HADOOP AND NOSQL INTEGRATION (20)

Proud to be polyglot!
Proud to be polyglot!Proud to be polyglot!
Proud to be polyglot!
 
Introduction to NoSQL with Couchbase
Introduction to NoSQL with CouchbaseIntroduction to NoSQL with Couchbase
Introduction to NoSQL with Couchbase
 
Introduction to the Hadoop Ecosystem (IT-Stammtisch Darmstadt Edition)
Introduction to the Hadoop Ecosystem (IT-Stammtisch Darmstadt Edition)Introduction to the Hadoop Ecosystem (IT-Stammtisch Darmstadt Edition)
Introduction to the Hadoop Ecosystem (IT-Stammtisch Darmstadt Edition)
 
Treasure Data Cloud Strategy
Treasure Data Cloud StrategyTreasure Data Cloud Strategy
Treasure Data Cloud Strategy
 
使用 Elasticsearch 及 Kibana 進行巨量資料搜尋及視覺化-曾書庭
使用 Elasticsearch 及 Kibana 進行巨量資料搜尋及視覺化-曾書庭使用 Elasticsearch 及 Kibana 進行巨量資料搜尋及視覺化-曾書庭
使用 Elasticsearch 及 Kibana 進行巨量資料搜尋及視覺化-曾書庭
 
Pivotal - Advanced Analytics for Telecommunications
Pivotal - Advanced Analytics for Telecommunications Pivotal - Advanced Analytics for Telecommunications
Pivotal - Advanced Analytics for Telecommunications
 
Accelerating workloads and bursting data with Google Dataproc & Alluxio
Accelerating workloads and bursting data with Google Dataproc & AlluxioAccelerating workloads and bursting data with Google Dataproc & Alluxio
Accelerating workloads and bursting data with Google Dataproc & Alluxio
 
Big Data Integration Webinar: Getting Started With Hadoop Big Data
Big Data Integration Webinar: Getting Started With Hadoop Big DataBig Data Integration Webinar: Getting Started With Hadoop Big Data
Big Data Integration Webinar: Getting Started With Hadoop Big Data
 
Hadoop workshop
Hadoop workshopHadoop workshop
Hadoop workshop
 
VoxxedDays Bucharest 2017 - Powering interactive data analysis with Google Bi...
VoxxedDays Bucharest 2017 - Powering interactive data analysis with Google Bi...VoxxedDays Bucharest 2017 - Powering interactive data analysis with Google Bi...
VoxxedDays Bucharest 2017 - Powering interactive data analysis with Google Bi...
 
Cassandra Meetup: Real-time Analytics using Cassandra, Spark and Shark at Ooyala
Cassandra Meetup: Real-time Analytics using Cassandra, Spark and Shark at OoyalaCassandra Meetup: Real-time Analytics using Cassandra, Spark and Shark at Ooyala
Cassandra Meetup: Real-time Analytics using Cassandra, Spark and Shark at Ooyala
 
Monitoring Big Data Systems Done "The Simple Way" - Codemotion Milan 2017 - D...
Monitoring Big Data Systems Done "The Simple Way" - Codemotion Milan 2017 - D...Monitoring Big Data Systems Done "The Simple Way" - Codemotion Milan 2017 - D...
Monitoring Big Data Systems Done "The Simple Way" - Codemotion Milan 2017 - D...
 
Demi Ben-Ari - Monitoring Big Data Systems Done "The Simple Way" - Codemotion...
Demi Ben-Ari - Monitoring Big Data Systems Done "The Simple Way" - Codemotion...Demi Ben-Ari - Monitoring Big Data Systems Done "The Simple Way" - Codemotion...
Demi Ben-Ari - Monitoring Big Data Systems Done "The Simple Way" - Codemotion...
 
Monitoring Big Data Systems Done "The Simple Way" - Codemotion Berlin 2017
Monitoring Big Data Systems Done "The Simple Way" - Codemotion Berlin 2017Monitoring Big Data Systems Done "The Simple Way" - Codemotion Berlin 2017
Monitoring Big Data Systems Done "The Simple Way" - Codemotion Berlin 2017
 
Geode Meetup Apachecon
Geode Meetup ApacheconGeode Meetup Apachecon
Geode Meetup Apachecon
 
Big Data and OSS at IBM
Big Data and OSS at IBMBig Data and OSS at IBM
Big Data and OSS at IBM
 
Con8862 no sql, json and time series data
Con8862   no sql, json and time series dataCon8862   no sql, json and time series data
Con8862 no sql, json and time series data
 
Open Data Science Conference Big Data Infrastructure – Introduction to Hadoop...
Open Data Science Conference Big Data Infrastructure – Introduction to Hadoop...Open Data Science Conference Big Data Infrastructure – Introduction to Hadoop...
Open Data Science Conference Big Data Infrastructure – Introduction to Hadoop...
 
Apache AGE and the synergy effect in the combination of Postgres and NoSQL
 Apache AGE and the synergy effect in the combination of Postgres and NoSQL Apache AGE and the synergy effect in the combination of Postgres and NoSQL
Apache AGE and the synergy effect in the combination of Postgres and NoSQL
 
Introduction to Big Data and AI for Business Analytics and Prediction
Introduction to Big Data and AI for Business Analytics and PredictionIntroduction to Big Data and AI for Business Analytics and Prediction
Introduction to Big Data and AI for Business Analytics and Prediction
 

Más de Tugdual Grall

Introduction to Streaming with Apache Flink
Introduction to Streaming with Apache FlinkIntroduction to Streaming with Apache Flink
HADOOP AND NOSQL INTEGRATION

  • 2. Goto Night CPH, June 6th 2013. How to integrate Hadoop with your NoSQL database? Tugdual “Tug” Grall, Technical Evangelist. Monday, June 10, 13
  • 3. About Me • Tugdual “Tug” Grall • Couchbase: Technical Evangelist • eXo: CTO • Oracle: Developer/Product Manager, mainly Java/SOA • Developer in consulting firms • Web: @tgrall, http://blog.grallandco.com, tgrall • NantesJUG co-founder • Pet project: http://www.resultri.com
  • 4. Big Data: High Data Variety and Velocity. More Flexible Data Model Required. [Chart: trillions of gigabytes (zettabytes), axis 0 to 2.00, for 2000, 2006, and 2011; series: Structured Data, and Unstructured and Semi-Structured Data (text, log files, click streams, blogs, tweets, audio, video, etc.).] Source: IDC 2011 Digital Universe Study (http://www.emc.com/collateral/demos/microsites/emc-digital-universe-2011/index.htm)
  • 5. $30B Database Market Being Disrupted. All new database growth will be NoSQL. [Chart labels: 2013, 2027, 95%, <50%?; segments: Relational Technology, NoSQL Technology, Other.]
  • 6. Operational vs. Analytic Databases • Analytic Databases: get insights from data (Cloudera, Hortonworks) • Real-time, Interactive Databases: fast access to data (NoSQL: Couchbase, Mongo)
  • 7. Lack of flexibility/rigid schemas: 49% • Inability to scale out data: 35% • Performance challenges: 29% • Cost: 16% • All of these: 12% • Other: 11%. Source: Couchbase Survey, December 2011, n = 1351.
  • 8. Hadoop
  • 9. What is Hadoop? • Highly scalable • Unstructured data • Open source • Big Data Operating System • Changing the World One Petabyte at a Time
  • 10. What is Hadoop? • Simplest unit of compute and storage [Diagram labels: CPU, Disks, Application, Data]
  • 11. What is Hadoop? • And when it grows? [Diagram labels: Application, Data]
  • 12. What is Hadoop? • And when it grows more?
  • 13. What is Hadoop? • NoSQL to the rescue [Diagram labels: Application, Data]
  • 14. What is Hadoop? • Hadoop is a different paradigm [Diagram labels: Application, Data]
  • 16. Hadoop and NoSQL
  • 17. Ad and offer targeting • events • profiles, campaigns • profiles, real-time campaign statistics • 40 milliseconds to respond with the decision
  • 18. Moving Parts [Diagram: Ad Targeting Platform, Logs, Couchbase Server Cluster, and Hadoop Cluster, connected by flume flow, sqoop import, and sqoop export.]
  • 19. Content & Recommendation Targeting • events • user profiles • make recommendations [Diagram labels: Content Oriented Site, Legacy Relational Database]
  • 20. Moving Parts. In order to keep up with changing needs for richer, more targeted content that is delivered to larger and larger audiences very quickly, data behind content-driven sites is shifting to Couchbase. Hadoop excels at complex analytics, which may involve multiple processing steps that incorporate a number of different data sources. [Diagram: Content Driven Web Site, Logs, Original RDBMS, Couchbase Server Cluster, and Hadoop Cluster, connected by flume flow, sqoop import (twice), and sqoop export.]
  • 21. What is Sqoop? “Sqoop is a tool designed to transfer data between Hadoop and relational databases. You can use Sqoop to import data from a relational database management system (RDBMS) such as MySQL or Oracle into the Hadoop Distributed File System (HDFS), transform the data in Hadoop MapReduce, and then export the data back into an RDBMS.” (sqoop.apache.org)
  • 22. What is Sqoop? • Traditional ETL [Diagram labels: Data, Application, T (transform), Data]
  • 23. What is Sqoop? • A different paradigm [Diagram labels: Data, Application, Data]
  • 24. What is Sqoop? • A very scalable different paradigm [Diagram: one Data source feeding several Application/Data pairs]
  • 25. What is Sqoop? • Where did the Transform go? [Diagram: Application and Data, with many T (transform) steps spread across the nodes]
  • 26. What is Sqoop? • Sqoop “SQL-Hadoop”: default connection is via JDBC • Lots of custom connectors: Couchbase, VoltDB, Vertica; Teradata, Netezza; Oracle, MySQL, Postgres
  • 27. Sqoop: Import  sqoop import --connect jdbc:mysql://rdbms1.demo.com/CRM --table customers
  • 28. Sqoop: Export  sqoop export --connect jdbc:mysql://rdbms1.demo.com/ANALYTICS --table sales --export-dir /user/hive/warehouse/zip_profits --input-fields-terminated-by '\0001'
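The export on slide 28 reads files under a Hive warehouse directory, and Hive's default text format separates fields with the Ctrl-A character (octal 001), which is what the `--input-fields-terminated-by` option names. As a minimal sketch of what one such record looks like to a reader, the Python below splits a line on that delimiter. The sample row and the column names `zip` and `profit` are hypothetical, chosen only to echo the `zip_profits` directory name.

```python
# Hive's default field delimiter is Ctrl-A (octal 001, "\x01" in Python).
HIVE_FIELD_DELIMITER = "\x01"


def parse_hive_row(line: str, columns: list[str]) -> dict[str, str]:
    """Split one Hive text-format line into a column -> value mapping."""
    values = line.rstrip("\n").split(HIVE_FIELD_DELIMITER)
    if len(values) != len(columns):
        raise ValueError(f"expected {len(columns)} fields, got {len(values)}")
    return dict(zip(columns, values))


# Hypothetical record: the values and column names are assumptions.
sample = "94105\x011234.50\n"
row = parse_hive_row(sample, ["zip", "profit"])
print(row)
```

If the delimiter passed to Sqoop does not match the one used when the files were written, each line arrives as a single field and the export cannot map it to the table's columns.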
  • 29. Sqoop: Import  sqoop import --connect http://localhost:8091/pools --table DUMP
  • 30. Sqoop: Import [Diagram: the Sqoop client uses metadata and launches a MapReduce job made of parallel Map tasks, each paired with HDFS.]
  • 31. Sqoop: Export  sqoop export --connect http://localhost:8091/pools --table DUMP --export-dir /user/hive/profiles/recommendation --username social
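When an export like the one on slide 31 is launched from a scheduler or script, it can help to assemble the command as an argument list instead of a single string, so that paths and names are never re-parsed by a shell. The sketch below is only an illustration: the helper name `sqoop_export_args` is hypothetical, it uses only the four options that appear on the slide, and it prints the command without running it.

```python
import shlex


def sqoop_export_args(connect: str, table: str, export_dir: str, username: str) -> list[str]:
    """Build the argv for `sqoop export` using only the options shown on the slide."""
    return [
        "sqoop", "export",
        "--connect", connect,
        "--table", table,
        "--export-dir", export_dir,
        "--username", username,
    ]


args = sqoop_export_args(
    connect="http://localhost:8091/pools",
    table="DUMP",
    export_dir="/user/hive/profiles/recommendation",
    username="social",
)

# Print the command for review. Actually running it would need a working
# Hadoop and Sqoop installation, e.g. subprocess.run(args, check=True).
command_line = shlex.join(args)
print(command_line)
```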
  • 32. Sqoop: Export [Diagram: the Sqoop client uses metadata and launches a MapReduce job made of parallel Map tasks, each paired with HDFS.]
  • 33. Demonstration
  • 34. Couchbase
  • 35. Couchbase Server Core Principles • Easy Scalability: grow cluster without application changes, without downtime, with a single click • Consistent High Performance: consistent sub-millisecond read and write response times with consistent high throughput • Always On 24x365: no downtime for software upgrades, hardware maintenance, etc. • Flexible Data Model: JSON document model with no fixed schema
  • 36. Couchbase Handles Real World Scale
  • 37. Couchbase Server 2.0 • Cluster Manager (Erlang/OTP; the diagram marks components as either "on each node" or "one per cluster"): heartbeat, process monitor, global singleton supervisor, configuration manager, rebalance orchestrator, node health monitor, vBucket state and replication manager • Cluster Manager ports: REST management API/Web UI over HTTP on 8091, Erlang port mapper on 4369, distributed Erlang on 21100-21199 • Data Manager: Memcached with the Couchbase EP Engine (storage interface) on 11210 (Memcapable 2.0), Moxi on 11211 (Memcapable 1.0), New Persistence Layer, Query Engine with Query API on 8092
  • 39. The Classic Order Entry Structure. “Relational databases were not designed with clusters in mind, which is why people have cast around for an alternative. Storing aggregates as fundamental units makes a lot of sense for running on a cluster.” http://martinfowler.com/bliki/AggregateOrientedDatabase.html
  • 40. Aggregate by Comparison  o::1001 { uid: "ji22jd", customer: "Ann", line_items: [ { sku: 0321293533, quan: 3, unit_price: 48.0 }, { sku: 0321601912, quan: 1, unit_price: 39.0 }, { sku: 0131495054, quan: 1, unit_price: 51.0 } ], payment: { type: "Amex", expiry: "04/2001", last5: 12345 } } • Easy to distribute data • Makes sense to application programmers
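The order aggregate on slide 40 can be written as a single JSON value stored under the key `o::1001`. The following sketch shows that document in Python and computes the order total from it. It makes two assumptions that are not on the slide: the SKUs are stored as strings (a JSON number cannot keep a leading zero), and the helper `order_total` is a hypothetical name. No database client is involved; the document is only serialized and parsed.

```python
import json

# The whole order, including line items and payment, is one document.
order = {
    "uid": "ji22jd",
    "customer": "Ann",
    "line_items": [
        {"sku": "0321293533", "quan": 3, "unit_price": 48.0},
        {"sku": "0321601912", "quan": 1, "unit_price": 39.0},
        {"sku": "0131495054", "quan": 1, "unit_price": 51.0},
    ],
    "payment": {"type": "Amex", "expiry": "04/2001", "last5": 12345},
}


def order_total(doc: dict) -> float:
    """Sum quantity * unit price over the embedded line items."""
    return sum(item["quan"] * item["unit_price"] for item in doc["line_items"])


key = "o::1001"
value = json.dumps(order)               # what would be stored under the key
total = order_total(json.loads(value))  # one read yields everything needed
print(key, total)
```

Because the line items travel with the order, computing the total needs one lookup by key and no joins, which is the property that makes the aggregate easy to distribute.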
  • 41. Basic Operations (user-configured replica count = 1) • Docs distributed evenly across servers • Each server stores both active and replica docs; only one server is active for a given doc at a time • Client library provides app with simple interface to database • Cluster map provides map to which server a doc is on; app never needs to know • App reads, writes, updates docs • Multiple app servers can access same document at same time [Diagram: Couchbase Server cluster of three servers, each holding active and replica docs; two app servers, each with the Couchbase client library and a cluster map, issue READ/WRITE/UPDATE.]
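The cluster map idea on slide 41 can be sketched as a two-step lookup: hash the document key to a partition (a vBucket), then look up which server is currently active for that partition. The code below is a simplified illustration under stated assumptions: the partition count of 1024, the plain CRC32-modulo hash, the round-robin assignment, and the server names are all chosen for the example, and the real client libraries differ in detail.

```python
import zlib

NUM_VBUCKETS = 1024  # assumed partition count for this sketch


def vbucket_for(key: str) -> int:
    """Hash a document key to a partition number (simplified hashing)."""
    return zlib.crc32(key.encode("utf-8")) % NUM_VBUCKETS


def build_cluster_map(servers: list[str]) -> list[str]:
    """Assign each partition an active server, round-robin for illustration."""
    return [servers[vb % len(servers)] for vb in range(NUM_VBUCKETS)]


def server_for(key: str, cluster_map: list[str]) -> str:
    """Resolve a key to the server that is active for its partition."""
    return cluster_map[vbucket_for(key)]


servers = ["server1", "server2", "server3"]  # hypothetical node names
cluster_map = build_cluster_map(servers)
target = server_for("o::1001", cluster_map)
print(target)
```

The application only ever calls something like `server_for`; when nodes are added or removed, the map is rebuilt and the same call routes to the new owner without application changes.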
  • 42. Indexing and Query • Indexing work is distributed amongst nodes • Large data set possible • Parallelize the effort • Each node has index for data stored on it • Queries combine the results from required nodes [Diagram: Couchbase Server cluster of three servers, each holding active and replica docs; two app servers, each with the Couchbase client library and a cluster map, send a query.]
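Slide 42 describes a scatter-gather pattern: each node indexes only the documents it stores, and a query merges the per-node results. The sketch below models that in plain Python with sorted lists standing in for indexes. The document contents, the `age` field, and the function names are hypothetical, and this is not the Couchbase view engine itself.

```python
import heapq


def build_local_index(docs: dict[str, dict], field: str) -> list[tuple]:
    """Index one node's documents as sorted (field value, doc id) pairs."""
    return sorted((doc[field], doc_id) for doc_id, doc in docs.items() if field in doc)


def query_range(node_indexes: list[list[tuple]], low, high) -> list[tuple]:
    """Merge the sorted per-node indexes and keep keys within [low, high]."""
    merged = heapq.merge(*node_indexes)
    return [(key, doc_id) for key, doc_id in merged if low <= key <= high]


# Hypothetical documents spread over three nodes.
nodes = [
    {"doc5": {"age": 31}, "doc2": {"age": 25}},
    {"doc4": {"age": 40}, "doc7": {"age": 19}},
    {"doc1": {"age": 28}, "doc8": {"age": 35}},
]
indexes = [build_local_index(docs, "age") for docs in nodes]
result = query_range(indexes, 25, 35)
print(result)
```

Each node builds its index independently, so indexing work scales with the number of nodes, while the merge step is what lets the caller see one ordered result.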
  • 43. Demonstration
  • 44. Map Reduce ≠ Map Reduce • Hadoop: deal with “Big Data”; “more” is better than “faster”; batch oriented; usually used to “extract/transform” data; fully distributed (Map, Shuffle, Reduce) • Couchbase: distributed; executed where the document is; deal with “indexing” data; as fast as possible; used to query the data in the database
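The Map, Shuffle, Reduce sequence named on slide 44 can be shown in miniature with a single-process sketch. It counts events per campaign; the event records and the `campaign` field are hypothetical, and a real Hadoop job would run each phase in parallel across the cluster instead of in one Python process.

```python
from collections import defaultdict


def map_phase(records):
    """Map: emit one (key, 1) pair per input record."""
    for record in records:
        yield record["campaign"], 1


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: collapse each key's values into a single result."""
    return {key: sum(values) for key, values in groups.items()}


# Hypothetical click events, as might be collected from log files.
events = [
    {"campaign": "spring"},
    {"campaign": "summer"},
    {"campaign": "spring"},
]
counts = reduce_phase(shuffle_phase(map_phase(events)))
print(counts)
```

In the batch setting, the shuffle moves data between machines, which is why throughput matters more than latency there; an index maintained where the document lives skips that movement and answers queries quickly instead.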
  • 45. Conclusion • Big Data and Big Users working together • Use Hadoop to store “everything”: batch oriented; complex data processing (MapReduce) • Expose a subset of the dataset to your application: real-time analytics; low latency; simple data interactions and queries
  • 46. Q&A • We’re Hiring! couchbase.com/careers • @tgrall • tug@couchbase.com
  • 47. Q&A