SlideShare una empresa de Scribd logo
1 de 78
Descargar para leer sin conexión
Big Data beyond Hadoop
How to integrate ALL your Data

Kai Wähner
Principal Consultant at Talend
@KaiWaehner
www.kai-waehner.de
Kai Wähner
Main Tasks
Requirements Engineering
Enterprise Architecture Management
Business Process Management
Architecture and Development of Applications
Service-oriented Architecture
Integration of Legacy Applications
Cloud Computing
Big Data

Consulting
Developing
Coaching
Speaking
Writing
© Talend 2013

Contact
Email: kontakt@kai-waehner.de
Blog: www.kai-waehner.de/blog
Twitter: @KaiWaehner
Social Networks: Xing, LinkedIn

“Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner
Key messages

You have to care about big data to be competitive in the future!
You have to integrate different sources to get most value out of it!
Big data integration is no (longer) rocket science!
© Talend 2013

“Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner
Agenda
• Big data paradigm shift
• Challenges for integrating big data
• Choosing Hadoop for big data integration
• Integration with an open source framework
• Integration with an open source suite
• Custom big data connectors

© Talend 2013

“Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner
Agenda
• Big data paradigm shift
• Challenges for integrating big data
• Choosing Hadoop for big data integration
• Integration with an open source framework
• Integration with an open source suite
• Custom big data connectors

© Talend 2013

“Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner
Why should you care about big data?

“If you can't measure it,
you can't manage it.”
William Edwards Deming
(1900 –1993)
American statistician, professor,
author, lecturer and consultant

© Talend 2013

“Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner
Why should you care about big data?

„Silence the HiPPOs“ (highest-paid person‘s opinion)
 Being able to interpret unimaginable large data
stream, the gut feeling is no longer justified!


© Talend 2013

“Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner
What is big data? The Vs of big data

Volume
(terabytes,
petabytes)

Velocity
(realtime or nearrealtime)

Variety
(social networks,
blog posts, logs,
sensors, etc.)

© Talend 2013

“Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner

Value
Big data tasks to solve - before analysis
Big Data Integration
– Land data in a Big Data cluster
– Implement or generate parallel processes

Big Data Manipulation
– Simplify manipulation, such as sort and filter
– Computational expensive functions

Big Data Quality & Governance
– Identify linkages and duplicates, validate big data
– Match connector, execute basic quality features

Big Data Project Management
– Place frameworks around big data projects
– Common Repository, scheduling, monitoring
© Talend 2013

“Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner
Use case: Clickstream Analysis

“The advantage of their new system is that they can now look at their data
[from their log processing system] in anyway they want:
➜ Nightly MapReduce jobs collect statistics about their mail system such as
spam counts by domain, bytes transferred and number of logins.
➜ When they wanted to find out which part of the world their customers
logged in from, a quick [ad hoc] MapReduce job was created and they had
the answer within a few hours. Not really possible in your typical ETL
system.”
http://highscalability.com/how-rackspace-now-uses-mapreduce-and-hadoop-query-terabytes-data
© Talend 2013

“Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner
Use case: Risk management

Deduce
Customer
Defections

http://hkotadia.com/archives/5021

© Talend 2013

“Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner
Use case: Flexible pricing

➜

➜
➜

➜

With revenue of almost USD 30 billion and a network of
800 locations, Macy's is considered the largest store operator in the
USA
Daily price check analysis of its 10,000 articles in less than two hours
Whenever a neighboring competitor anywhere between New York
and Los Angeles goes for aggressive price reductions, Macy's follows
its example
If there is no market competitor, the prices remain unchanged

http://www.t-systems.com/about-t-systems/examples-of-successes-companies-analyze-big-data-in-record-time-l-t-systems/1029702

© Talend 2013

“Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner
Storage: Compliance

Global Parcel Service
A lot of data must be stored „forever“
➜ Numbers increase exponentially
➜ Goal: As cheap as possible
➜ Problem: (Fast) queries must still be possible
➜ Solution: Commodity servers and „Hadoop querying“
➜

http://archive.org/stream/BigDataImPraxiseinsatz-SzenarienBeispieleEffekte/Big_Data_BITKOM-Leitfaden_Sept.2012#page/n0/mode/2up

© Talend 2013

“Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner
Agenda
• Big data paradigm shift
• Challenges for integrating big data
• Choosing Hadoop for big data integration
• Integration with an open source framework
• Integration with an open source suite
• Custom big data connectors

© Talend 2013

“Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner
Limited big data experts

This is your
company
Big Data Geek
© Talend 2013

“Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner
Data quality

Big Data + Poor Data Quality = Big Problems

© Talend 2013

“Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner
Big data tool selection (business perspective)
Wanna buy a big data solution for your industry?
➜ Maybe a competitor has a big data solution which
adds business value?
➜ The competitor will never publish it (rat-race)!
➜

© Talend 2013

“Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner
Big data tool selection (technical perspective)

Looking for ‚your‘ required big data product?
Support your data from scratch?
Good luck! 

© Talend 2013

“Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner
How to solve these big data challenges?

© Talend 2013

“Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner
What is your Big Data process?

Step 1

Step 2

Step 3

1) Do not begin with the data, think about business opportunities
2) Choose the right data (combine different data sources)
3) Use easy tooling
http://hbr.org/2012/10/making-advanced-analytics-work-for-you

© Talend 2013

“Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner
Agenda
• Big data paradigm shift
• Challenges for integrating big data
• Choosing Hadoop for big data integration
• Integration with an open source framework
• Integration with an open source suite
• Custom big data connectors

© Talend 2013

“Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner
Technology perspective

How to process big data?
© Talend 2013

“Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner
How to process big data?

The critical flaw in parallel ETL tools is the fact that the data is almost never local to the processing
nodes. This means that every time a large job is run, the data has to first be read from the source,
split N ways and then delivered to the individual nodes. Worse, if the partition key of the source
doesn’t match the partition key of the target, data has to be constantly exchanged among the
nodes. In essence, parallel ETL treats the network as if it were a physical I/O subsystem. The
network, which is always the slowest part of the process, becomes the weakest link in the
performance chain.

http://blog.syncsort.com/2012/08/parallel-etl-tools-are-dead

© Talend 2013

“Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner
How to process big data?

Slides: http://www.slideshare.net/pavlobaron/100-big-data-0-hadoop-0-java
Video: http://www.infoq.com/presentations/Big-Data-Hadoop-Java

© Talend 2013

“Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner
How to process big data?

The defacto standard for big data processing

© Talend 2013

“Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner
How to process big data?

“A big part of [the
company’s strategy]
includes wiring SQL Server
2012 (formerly known by
the codename “Denali”) to
the Hadoop distributed
computing platform, and
bringing Hadoop to
Windows Server and Azure”

Even Microsoft (the .NET house) relies on Hadoop since 2011
© Talend 2013

“Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner
What is Hadoop?

Apache Hadoop, an open-source software library, is a
framework that allows for the distributed processing of
large data sets across clusters of commodity hardware
using simple programming models. It is designed to scale
up from single servers to thousands of machines, each
offering local computation and storage.
© Talend 2013

“Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner
Map (Shuffle) Reduce
Simple example
• Input: (very large) text files with lists of strings, such as:
„318, 0043012650999991949032412004...0500001N9+01111+99999999999...“
• We are interested just in some content: year and temperate (marked in red)
• The Map Reduce function has to compute the maximum temperature for every year

Example from the book “Hadoop: The Definitive Guide, 3rd Edition”

© Talend 2013

“Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner
How to process big data?

© Talend 2013

“Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner
Agenda
• Big data paradigm shift
• Challenges for integrating big data
• Choosing Hadoop for big data integration
• Integration with an open source framework
• Integration with an open source suite
• Custom big data connectors

© Talend 2013

“Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner
Alternatives for systems integration

Integration Suite

Enterprise
Service Bus

Integration
Framework

Low

Connectivity
Routing
Transformation
© Talend 2013

High

+

INTEGRATION
Tooling
Monitoring
Support

+

BUSINESS PROCESS MGT.
BIG DATA / MDM
REGISTRY / REPOSITORY
RULES ENGINE
„YOU NAME IT“

“Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner

Complexity
of Integration
Alternatives for systems integration

Integration
Framework

Enterprise
Service Bus

Low

© Talend 2013

Integration Suite

High

“Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner

Complexity
of Integration
More details about integration frameworks...

http://www.kai-waehner.de/blog/2012/12/20/showdown-integration-frameworkspring-integration-apache-camel-vs-enterprise-service-bus-esb/
© Talend 2013

“Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner
Enterprise Integration Patterns (EIP)

Apache Camel
Implements the EIPs
© Talend 2013

“Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner
Enterprise Integration Patterns (EIP)

© Talend 2013

“Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner
Enterprise Integration Patterns (EIP)

© Talend 2013

“Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner
Architecture

http://java.dzone.com/articles/apache-camel-integration

© Talend 2013

“Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner
Choose your required connectors
TCP

SQL

SMTP

Netty
RMI

FTP

Lucene

JMX

MQ

Atom
LDAP

Quartz

XSLT

Akka
Many many more

© Talend 2013

IRC

Log

HTTP

File

EJB

AMQP

RSS
AWS-S3

Jetty
JDBC

Bean-Validation

JMS

CXF

“Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner

Custom Components
Choose your favorite DSL

XML

(not production-ready yet)

© Talend 2013

“Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner
Deploy it wherever you need

Standalone

Application Server

Web Container
Spring Container
OSGi
Cloud
© Talend 2013

“Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner
Enterprise-ready

• Open Source
• Scalability
• Error Handling
• Transaction
• Monitoring
• Tooling
• Commercial Support
© Talend 2013

“Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner
Example: Camel integration route

© Talend 2013

“Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner
Hadoop Integration with Apache Camel

© Talend 2013

“Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner
camel-hdfs connector
// Producer
from("ftp://user@myServer?password=secret")
.to(“hdfs:///myDirectory/myFile.txt?append=true");

// Consumer
from(“hdfs:///myDirectory/myBigDataAnalysis.csv")
.to(“file:target/reports/report.csv");

© Talend 2013

“Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner
camel-hbase connector

© Talend 2013

“Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner
Real World Use Case: Storage: Compliance

Global Parcel Service
A lot of data must be stored „forever“
➜ Numbers increase exponentially
➜ Goal: As cheap as possible
➜ Problem: (Fast) queries must still be possible
➜ Solution: Commodity servers and „Hadoop querying“
➜

http://archive.org/stream/BigDataImPraxiseinsatz-SzenarienBeispieleEffekte/Big_Data_BITKOM-Leitfaden_Sept.2012#page/n0/mode/2up

© Talend 2013

“Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner
Real world use case: Storage: Compliance
Orders
(Server 1)

Payments
(Server 2)

ETL

Log Files
(Server 3)

Log Files
(Server 100)
© Talend 2013

Storage
“Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner

Query
Live demo

Apache Camel in action...
© Talend 2013

“Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner
camel-pig? camel-hive? camel-hcatalog?
Not available yet (current Camel version: 2.12)  Workarounds:
•

•
•

Use Pig / Hive-Query scripts (via camel-exec connector or any
scripting language)
Build your own connector (more details later ...)
Use Hive-Hbase-Integration and store data in HBase ( „ugly“)

© Talend 2013

“Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner
camel-avro connector
Camel is not just about connectors ...
Camel supports a pluggable DataFormat to allow messages to be marshalled to and
unmarshalled from binary or text formats, e.g. CSV, JSON, SOAP, EDI, ZIP, or ...

... Avro, a data serialization system used in Apache Hadoop. camel-avro connector
provides a dataformat for Avro, which allows serialization and deserialization of
messages using Apache Avro's binary data format. Moreover, it provides support for
Apache Avro's RPC, by providing producers and consumers endpoint for using Avro
over Netty or HTTP.
© Talend 2013

“Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner
Agenda
• Big data paradigm shift
• Challenges for integrating big data
• Choosing Hadoop for big data integration
• Integration with an open source framework
• Integration with an open source suite
• Custom big data connectors

© Talend 2013

“Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner
Alternatives for systems integration

Integration Suite

Enterprise
Service Bus

Integration
Framework

Low

Connectivity
Routing
Transformation
© Talend 2013

High

+

INTEGRATION
Tooling
Monitoring
Support

+

BUSINESS PROCESS MGT.
BIG DATA / MDM
REGISTRY / REPOSITORY
RULES ENGINE
„YOU NAME IT“

“Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner

Complexity
of Integration
Alternatives for systems integration

Integration
Framework

Enterprise
Service Bus

Low

© Talend 2013

Integration Suite

High

“Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner

Complexity
of Integration
More details about ESBs and suites...

http://www.kai-waehner.de/blog/2013/01/23/spoilt-for-choicehow-to-choose-the-right-enterprise-service-bus-esb/
© Talend 2013

“Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner
Hadoop Integration with Talend Open Studio

© Talend 2013

“Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner
Vision: Democratize big data
Talend Open Studio for Big Data
•

•

…an open source
ecosystem
© Talend 2013

Generates Hadoop code and run transforms
inside Hadoop

•

Native support for HDFS, Pig, Hbase, Hcatalog,
Sqoop and Hive

•

Pig

Improves efficiency of big data job design with
graphic interface

100% open source under an Apache License

•

Standards based

“Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner
Vision: Democratize big data
Talend Platform for Big Data
•

Builds on Talend Open Studio for Big Data

•

Adds data quality, advanced scalability and
management functions
•
•

© Talend 2013

Data cleansing

•
•

Data quality and profiling

•

…an open source
ecosystem

Shared Repository and remote deployment

•

Pig

MapReduce massively parallel data
processing

Reporting and dashboards

Commercial support, warranty/IP indemnity
under a subscription license

“Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner
Talend Open Studio for Big Data

© Talend 2013

“Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner
Real world Use case: Clickstream Analysis

“The advantage of their new system is that they can now look at their data
[from their log processing system] in anyway they want:
➜ Nightly MapReduce jobs collect statistics about their mail system such as
spam counts by domain, bytes transferred and number of logins.
➜ When they wanted to find out which part of the world their customers
logged in from, a quick [ad hoc] MapReduce job was created and they had
the answer within a few hours. Not really possible in your typical ETL
system.”
http://highscalability.com/how-rackspace-now-uses-mapreduce-and-hadoop-query-terabytes-data
© Talend 2013

“Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner
Real world use case: Clickstream Analysis

Log Files
(Server 1)

ETL

Log Files
(Server 2)

Log Files
(Server 100)
© Talend 2013

“Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner
Potential Uses of Clickstream Data
One of the original uses of Hadoop at Yahoo was to store and process their massive volume of
clickstream data. Now enterprises of all types can use Hadoop to refine and analyze
clickstream data. They can then answer business questions such as:
•
•

•

What is the most efficient path for a site visitor to research a product, and then buy it?
What products do visitors tend to buy together, and what are they most likely to buy in
the future?
Where should I spend resources on fixing or enhancing the user experience on my
website?

Goal: Data visualization can help you optimize your website and
convert more visits into sales and revenue.

Source: for Clickstream Example: „Hortonworks Hadoop Tutorials - Real Life Use Cases”
http://hortonworks.com/blog/hadoop-tutorials-real-life-use-cases-in-the-sandbox
© Talend 2013

“Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner
Example: A semi-structured log file

© Talend 2013

“Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner
Example: ETL Job

„... using Talend’s HDFS and Hive Components”
© Talend 2013

“Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner
Example: ETL Job

„... using Talend’s Map Reduce Components*”
* Not available in open source version of Talend Studio
© Talend 2013

“Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner
Live demo

„Talend Open Studio for Big Data“ in action...
© Talend 2013

“Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner
Example: Analysis with Microsoft Excel

We can see that the largest number of page hits in Florida were for
clothing, followed by shoes.
Source: for Clickstream Example: „Hortonworks Hadoop Tutorials - Real Life Use Cases”
http://hortonworks.com/blog/hadoop-tutorials-real-life-use-cases-in-the-sandbox
© Talend 2013

“Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner
Example: Analysis with Microsoft Excel

The chart shows that the majority of men shopping for clothing on our
website are between the ages of 22 and 30. With this information, we can
optimize our content for this market segment.
Source: for Clickstream Example: „Hortonworks Hadoop Tutorials - Real Life Use Cases”
http://hortonworks.com/blog/hadoop-tutorials-real-life-use-cases-in-the-sandbox
© Talend 2013

“Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner
Example: Analysis with Tableau

Spoilt for Choice  Use your preferred BI or Analysis tool!
© Talend 2013

“Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner
Agenda
• Big data paradigm shift
• Challenges for integrating big data
• Choosing Hadoop for big data integration
• Integration with an open source framework
• Integration with an open source suite
• Custom big data connectors

© Talend 2013

“Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner
Custom connectors

Easy to realize for all
integration alternatives *
•

Integration Framework

•

Enterprise Service Bus

•

Integration Suite

* At least for open source solutions

© Talend 2013

“Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner
Custom connectors

You might need a ...
•

... Hive connector for Camel

•

... Impala connector for Talend

•

... custom connector for your
internal data format

© Talend 2013

“Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner
Live demo (Example: Apache Camel)

Custom connectors in action...
© Talend 2013

“Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner
Alternative for custom connectors

• SOAP
• REST
© Talend 2013

“Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner
Code example: REST API for Salesforce object store
// Salesforce Query (SOQL) via REST API

from("direct:salesforceViaHttpLIST")
.setHeader("X-PrettyPrint", 1)
.setHeader("Authorization", accessToken)
.setHeader(Exchange.CONTENT_TYPE, "application/json")

.to("https://na14.salesforce.com/services/data/v20.0/query?q=SELECT+name+from
+Article__c")

// Salesforce CREATE via REST API
from("direct:salesforceViaHttpCREATE")
.setHeader("X-PrettyPrint", 1)
.setHeader("Authorization", accessToken)
.setHeader(Exchange.CONTENT_TYPE, "application/json“)
.to("https://na14.salesforce.com/services/data/v20.0/sobjects/Article__c")
© Talend 2013

“Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner
Did you get the key message?

© Talend 2013

“Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner
Key messages

You have to care about big data to be competitive in the future!
You have to integrate different sources to get most value out of it!
Big data integration is no (longer) rocket science!
© Talend 2013

“Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner
Did you get the key message?

© Talend 2013

“Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner
Thanks for the attention!
Follow @KaiWaehner

Contact Kai Wähner at Xing / LinkedIn
Download Talend‘s open source software
Mail kwaehner@talend.com
Visit www.kai-waehner.de

Más contenido relacionado

La actualidad más candente

The Rise Of Event Streaming – Why Apache Kafka Changes Everything
The Rise Of Event Streaming – Why Apache Kafka Changes EverythingThe Rise Of Event Streaming – Why Apache Kafka Changes Everything
The Rise Of Event Streaming – Why Apache Kafka Changes EverythingKai Wähner
 
DataOps on Streaming Data: From Kafka to InfluxDB via Kubernetes Native Flows...
DataOps on Streaming Data: From Kafka to InfluxDB via Kubernetes Native Flows...DataOps on Streaming Data: From Kafka to InfluxDB via Kubernetes Native Flows...
DataOps on Streaming Data: From Kafka to InfluxDB via Kubernetes Native Flows...InfluxData
 
Apache Kafka in the Public Sector (Government, National Security, Citizen Ser...
Apache Kafka in the Public Sector (Government, National Security, Citizen Ser...Apache Kafka in the Public Sector (Government, National Security, Citizen Ser...
Apache Kafka in the Public Sector (Government, National Security, Citizen Ser...Kai Wähner
 
Kappa vs Lambda Architectures and Technology Comparison
Kappa vs Lambda Architectures and Technology ComparisonKappa vs Lambda Architectures and Technology Comparison
Kappa vs Lambda Architectures and Technology ComparisonKai Wähner
 
Apache Kafka in the Healthcare Industry
Apache Kafka in the Healthcare IndustryApache Kafka in the Healthcare Industry
Apache Kafka in the Healthcare IndustryKai Wähner
 
Kafka and Machine Learning in Banking and Insurance Industry
Kafka and Machine Learning in Banking and Insurance IndustryKafka and Machine Learning in Banking and Insurance Industry
Kafka and Machine Learning in Banking and Insurance IndustryKai Wähner
 
Industry-ready NLP Service Framework Based on Kafka (Bernhard Waltl and Georg...
Industry-ready NLP Service Framework Based on Kafka (Bernhard Waltl and Georg...Industry-ready NLP Service Framework Based on Kafka (Bernhard Waltl and Georg...
Industry-ready NLP Service Framework Based on Kafka (Bernhard Waltl and Georg...confluent
 
Intro to Big Data and Apache Hadoop by Dr. Amr Awadallah at CLOUD WEEKEND '13...
Intro to Big Data and Apache Hadoop by Dr. Amr Awadallah at CLOUD WEEKEND '13...Intro to Big Data and Apache Hadoop by Dr. Amr Awadallah at CLOUD WEEKEND '13...
Intro to Big Data and Apache Hadoop by Dr. Amr Awadallah at CLOUD WEEKEND '13...TheInevitableCloud
 
The Streaming Assessment – An Introduction
The Streaming Assessment – An IntroductionThe Streaming Assessment – An Introduction
The Streaming Assessment – An Introductionconfluent
 
IIoT / Industry 4.0 with Apache Kafka, Connect, KSQL, Apache PLC4X
IIoT / Industry 4.0 with Apache Kafka, Connect, KSQL, Apache PLC4X IIoT / Industry 4.0 with Apache Kafka, Connect, KSQL, Apache PLC4X
IIoT / Industry 4.0 with Apache Kafka, Connect, KSQL, Apache PLC4X Kai Wähner
 
Machine Learning with Apache Kafka in Pharma and Life Sciences
Machine Learning with Apache Kafka in Pharma and Life SciencesMachine Learning with Apache Kafka in Pharma and Life Sciences
Machine Learning with Apache Kafka in Pharma and Life SciencesKai Wähner
 
Kafka Summit SF 2017 - Real time Streaming Platform
Kafka Summit SF 2017 - Real time Streaming Platform Kafka Summit SF 2017 - Real time Streaming Platform
Kafka Summit SF 2017 - Real time Streaming Platform confluent
 
[INFOGRAPHIC] Event-driven Business: How to Handle the Flow of Event Data
[INFOGRAPHIC] Event-driven Business: How to Handle the Flow of Event Data[INFOGRAPHIC] Event-driven Business: How to Handle the Flow of Event Data
[INFOGRAPHIC] Event-driven Business: How to Handle the Flow of Event Dataconfluent
 
Hybrid & Global Kafka Architecture
Hybrid & Global Kafka ArchitectureHybrid & Global Kafka Architecture
Hybrid & Global Kafka Architectureconfluent
 
Apache Kafka and MQTT - Overview, Comparison, Use Cases, Architectures
Apache Kafka and MQTT - Overview, Comparison, Use Cases, ArchitecturesApache Kafka and MQTT - Overview, Comparison, Use Cases, Architectures
Apache Kafka and MQTT - Overview, Comparison, Use Cases, ArchitecturesKai Wähner
 
Apache Kafka in Financial Services - Use Cases and Architectures
Apache Kafka in Financial Services - Use Cases and ArchitecturesApache Kafka in Financial Services - Use Cases and Architectures
Apache Kafka in Financial Services - Use Cases and ArchitecturesKai Wähner
 
Event-Streaming verstehen in unter 10 Min
Event-Streaming verstehen in unter 10 MinEvent-Streaming verstehen in unter 10 Min
Event-Streaming verstehen in unter 10 Minconfluent
 
Event Mesh Presentation at Gartner AADI Mumbai
Event Mesh Presentation at Gartner AADI MumbaiEvent Mesh Presentation at Gartner AADI Mumbai
Event Mesh Presentation at Gartner AADI MumbaiSolace
 
Financial Event Sourcing at Enterprise Scale
Financial Event Sourcing at Enterprise ScaleFinancial Event Sourcing at Enterprise Scale
Financial Event Sourcing at Enterprise Scaleconfluent
 
Driving Business Transformation with Real-Time Analytics Using Apache Kafka a...
Driving Business Transformation with Real-Time Analytics Using Apache Kafka a...Driving Business Transformation with Real-Time Analytics Using Apache Kafka a...
Driving Business Transformation with Real-Time Analytics Using Apache Kafka a...confluent
 

La actualidad más candente (20)

The Rise Of Event Streaming – Why Apache Kafka Changes Everything
The Rise Of Event Streaming – Why Apache Kafka Changes EverythingThe Rise Of Event Streaming – Why Apache Kafka Changes Everything
The Rise Of Event Streaming – Why Apache Kafka Changes Everything
 
DataOps on Streaming Data: From Kafka to InfluxDB via Kubernetes Native Flows...
DataOps on Streaming Data: From Kafka to InfluxDB via Kubernetes Native Flows...DataOps on Streaming Data: From Kafka to InfluxDB via Kubernetes Native Flows...
DataOps on Streaming Data: From Kafka to InfluxDB via Kubernetes Native Flows...
 
Apache Kafka in the Public Sector (Government, National Security, Citizen Ser...
Apache Kafka in the Public Sector (Government, National Security, Citizen Ser...Apache Kafka in the Public Sector (Government, National Security, Citizen Ser...
Apache Kafka in the Public Sector (Government, National Security, Citizen Ser...
 
Kappa vs Lambda Architectures and Technology Comparison
Kappa vs Lambda Architectures and Technology ComparisonKappa vs Lambda Architectures and Technology Comparison
Kappa vs Lambda Architectures and Technology Comparison
 
Apache Kafka in the Healthcare Industry
Apache Kafka in the Healthcare IndustryApache Kafka in the Healthcare Industry
Apache Kafka in the Healthcare Industry
 
Kafka and Machine Learning in Banking and Insurance Industry
Kafka and Machine Learning in Banking and Insurance IndustryKafka and Machine Learning in Banking and Insurance Industry
Kafka and Machine Learning in Banking and Insurance Industry
 
Industry-ready NLP Service Framework Based on Kafka (Bernhard Waltl and Georg...
Industry-ready NLP Service Framework Based on Kafka (Bernhard Waltl and Georg...Industry-ready NLP Service Framework Based on Kafka (Bernhard Waltl and Georg...
Industry-ready NLP Service Framework Based on Kafka (Bernhard Waltl and Georg...
 
Intro to Big Data and Apache Hadoop by Dr. Amr Awadallah at CLOUD WEEKEND '13...
Intro to Big Data and Apache Hadoop by Dr. Amr Awadallah at CLOUD WEEKEND '13...Intro to Big Data and Apache Hadoop by Dr. Amr Awadallah at CLOUD WEEKEND '13...
Intro to Big Data and Apache Hadoop by Dr. Amr Awadallah at CLOUD WEEKEND '13...
 
The Streaming Assessment – An Introduction
The Streaming Assessment – An IntroductionThe Streaming Assessment – An Introduction
The Streaming Assessment – An Introduction
 
IIoT / Industry 4.0 with Apache Kafka, Connect, KSQL, Apache PLC4X
IIoT / Industry 4.0 with Apache Kafka, Connect, KSQL, Apache PLC4X IIoT / Industry 4.0 with Apache Kafka, Connect, KSQL, Apache PLC4X
IIoT / Industry 4.0 with Apache Kafka, Connect, KSQL, Apache PLC4X
 
Machine Learning with Apache Kafka in Pharma and Life Sciences
Machine Learning with Apache Kafka in Pharma and Life SciencesMachine Learning with Apache Kafka in Pharma and Life Sciences
Machine Learning with Apache Kafka in Pharma and Life Sciences
 
Kafka Summit SF 2017 - Real time Streaming Platform
Kafka Summit SF 2017 - Real time Streaming Platform Kafka Summit SF 2017 - Real time Streaming Platform
Kafka Summit SF 2017 - Real time Streaming Platform
 
[INFOGRAPHIC] Event-driven Business: How to Handle the Flow of Event Data
[INFOGRAPHIC] Event-driven Business: How to Handle the Flow of Event Data[INFOGRAPHIC] Event-driven Business: How to Handle the Flow of Event Data
[INFOGRAPHIC] Event-driven Business: How to Handle the Flow of Event Data
 
Hybrid & Global Kafka Architecture
Hybrid & Global Kafka ArchitectureHybrid & Global Kafka Architecture
Hybrid & Global Kafka Architecture
 
Apache Kafka and MQTT - Overview, Comparison, Use Cases, Architectures
Apache Kafka and MQTT - Overview, Comparison, Use Cases, ArchitecturesApache Kafka and MQTT - Overview, Comparison, Use Cases, Architectures
Apache Kafka and MQTT - Overview, Comparison, Use Cases, Architectures
 
Apache Kafka in Financial Services - Use Cases and Architectures
Apache Kafka in Financial Services - Use Cases and ArchitecturesApache Kafka in Financial Services - Use Cases and Architectures
Apache Kafka in Financial Services - Use Cases and Architectures
 
Event-Streaming verstehen in unter 10 Min
Event-Streaming verstehen in unter 10 MinEvent-Streaming verstehen in unter 10 Min
Event-Streaming verstehen in unter 10 Min
 
Event Mesh Presentation at Gartner AADI Mumbai
Event Mesh Presentation at Gartner AADI MumbaiEvent Mesh Presentation at Gartner AADI Mumbai
Event Mesh Presentation at Gartner AADI Mumbai
 
Financial Event Sourcing at Enterprise Scale
Financial Event Sourcing at Enterprise ScaleFinancial Event Sourcing at Enterprise Scale
Financial Event Sourcing at Enterprise Scale
 
Driving Business Transformation with Real-Time Analytics Using Apache Kafka a...
Driving Business Transformation with Real-Time Analytics Using Apache Kafka a...Driving Business Transformation with Real-Time Analytics Using Apache Kafka a...
Driving Business Transformation with Real-Time Analytics Using Apache Kafka a...
 

Destacado

TALEND ETL Introducción
TALEND ETL IntroducciónTALEND ETL Introducción
TALEND ETL IntroducciónSoftware
 
Simplifying Big Data ETL with Talend
Simplifying Big Data ETL with TalendSimplifying Big Data ETL with Talend
Simplifying Big Data ETL with TalendEdureka!
 
Présentation Talend Open Studio
Présentation Talend Open StudioPrésentation Talend Open Studio
Présentation Talend Open Studiohoracio lassey
 
Open Source ETL using Talend Open Studio
Open Source ETL using Talend Open StudioOpen Source ETL using Talend Open Studio
Open Source ETL using Talend Open Studiosantosluis87
 
Talend, Leading Open Source DataIntegration plateform. Cedric Carbone
Talend, Leading Open Source DataIntegration plateform. Cedric CarboneTalend, Leading Open Source DataIntegration plateform. Cedric Carbone
Talend, Leading Open Source DataIntegration plateform. Cedric CarboneCedric CARBONE
 
Structured Approach to Solution Architecture
Structured Approach to Solution ArchitectureStructured Approach to Solution Architecture
Structured Approach to Solution ArchitectureAlan McSweeney
 

Destacado (7)

TALEND ETL Introducción
TALEND ETL IntroducciónTALEND ETL Introducción
TALEND ETL Introducción
 
Talend
TalendTalend
Talend
 
Simplifying Big Data ETL with Talend
Simplifying Big Data ETL with TalendSimplifying Big Data ETL with Talend
Simplifying Big Data ETL with Talend
 
Présentation Talend Open Studio
Présentation Talend Open StudioPrésentation Talend Open Studio
Présentation Talend Open Studio
 
Open Source ETL using Talend Open Studio
Open Source ETL using Talend Open StudioOpen Source ETL using Talend Open Studio
Open Source ETL using Talend Open Studio
 
Talend, Leading Open Source DataIntegration plateform. Cedric Carbone
Talend, Leading Open Source DataIntegration plateform. Cedric CarboneTalend, Leading Open Source DataIntegration plateform. Cedric Carbone
Talend, Leading Open Source DataIntegration plateform. Cedric Carbone
 
Structured Approach to Solution Architecture
Structured Approach to Solution ArchitectureStructured Approach to Solution Architecture
Structured Approach to Solution Architecture
 

Similar a JAZOON'13 - Kai Waehner - Hadoop Integration

"Big Data beyond Apache Hadoop - How to Integrate ALL your Data" - JavaOne 2013
"Big Data beyond Apache Hadoop - How to Integrate ALL your Data" - JavaOne 2013"Big Data beyond Apache Hadoop - How to Integrate ALL your Data" - JavaOne 2013
"Big Data beyond Apache Hadoop - How to Integrate ALL your Data" - JavaOne 2013Kai Wähner
 
Big Data beyond Apache Hadoop - How to integrate ALL your Data
Big Data beyond Apache Hadoop - How to integrate ALL your DataBig Data beyond Apache Hadoop - How to integrate ALL your Data
Big Data beyond Apache Hadoop - How to integrate ALL your DataKai Wähner
 
Big Data and Enterprise Data - Oracle -1663869
Big Data and Enterprise Data - Oracle -1663869Big Data and Enterprise Data - Oracle -1663869
Big Data and Enterprise Data - Oracle -1663869Edgar Alejandro Villegas
 
Next-Generation BPM - How to create intelligent Business Processes thanks to ...
Next-Generation BPM - How to create intelligent Business Processes thanks to ...Next-Generation BPM - How to create intelligent Business Processes thanks to ...
Next-Generation BPM - How to create intelligent Business Processes thanks to ...Kai Wähner
 
Manipulating data with Talend. Learn how?
Manipulating data with Talend. Learn how?Manipulating data with Talend. Learn how?
Manipulating data with Talend. Learn how?Edureka!
 
Manipulating Data with Talend.
Manipulating Data with Talend.Manipulating Data with Talend.
Manipulating Data with Talend.Edureka!
 
Hadoop and the Data Warehouse: When to Use Which
Hadoop and the Data Warehouse: When to Use Which Hadoop and the Data Warehouse: When to Use Which
Hadoop and the Data Warehouse: When to Use Which DataWorks Summit
 
2015 02 12 talend hortonworks webinar challenges to hadoop adoption
2015 02 12 talend hortonworks webinar challenges to hadoop adoption2015 02 12 talend hortonworks webinar challenges to hadoop adoption
2015 02 12 talend hortonworks webinar challenges to hadoop adoptionHortonworks
 
Lecture 5 - Big Data and Hadoop Intro.ppt
Lecture 5 - Big Data and Hadoop Intro.pptLecture 5 - Big Data and Hadoop Intro.ppt
Lecture 5 - Big Data and Hadoop Intro.pptalmaraniabwmalk
 
Big data introduction, Hadoop in details
Big data introduction, Hadoop in detailsBig data introduction, Hadoop in details
Big data introduction, Hadoop in detailsMahmoud Yassin
 
John Glendenning - Real time data driven services in the Cloud
John Glendenning - Real time data driven services in the CloudJohn Glendenning - Real time data driven services in the Cloud
John Glendenning - Real time data driven services in the CloudWeAreEsynergy
 
Big data4businessusers
Big data4businessusersBig data4businessusers
Big data4businessusersBob Hardaway
 
Cloud as a Data Platform
Cloud as a Data PlatformCloud as a Data Platform
Cloud as a Data PlatformAndrei Savu
 
Exploring the Wider World of Big Data
Exploring the Wider World of Big DataExploring the Wider World of Big Data
Exploring the Wider World of Big DataNetApp
 
C-BAG Big Data Meetup Chennai Oct.29-2014 Hortonworks and Concurrent on Casca...
C-BAG Big Data Meetup Chennai Oct.29-2014 Hortonworks and Concurrent on Casca...C-BAG Big Data Meetup Chennai Oct.29-2014 Hortonworks and Concurrent on Casca...
C-BAG Big Data Meetup Chennai Oct.29-2014 Hortonworks and Concurrent on Casca...Hortonworks
 
WP_Impetus_2016_Guide_to_Modernize_Your_Enterprise_Data_Warehouse_JRoberts
WP_Impetus_2016_Guide_to_Modernize_Your_Enterprise_Data_Warehouse_JRobertsWP_Impetus_2016_Guide_to_Modernize_Your_Enterprise_Data_Warehouse_JRoberts
WP_Impetus_2016_Guide_to_Modernize_Your_Enterprise_Data_Warehouse_JRobertsJane Roberts
 
BIG Data & Hadoop Applications in Social Media
BIG Data & Hadoop Applications in Social MediaBIG Data & Hadoop Applications in Social Media
BIG Data & Hadoop Applications in Social MediaSkillspeed
 
How to create intelligent Business Processes thanks to Big Data (BPM, Apache ...
How to create intelligent Business Processes thanks to Big Data (BPM, Apache ...How to create intelligent Business Processes thanks to Big Data (BPM, Apache ...
How to create intelligent Business Processes thanks to Big Data (BPM, Apache ...Kai Wähner
 
Big data an elephant business opportunities
Big data an elephant   business opportunitiesBig data an elephant   business opportunities
Big data an elephant business opportunitiesBigdata Meetup Kochi
 

Similar a JAZOON'13 - Kai Waehner - Hadoop Integration (20)

"Big Data beyond Apache Hadoop - How to Integrate ALL your Data" - JavaOne 2013
"Big Data beyond Apache Hadoop - How to Integrate ALL your Data" - JavaOne 2013"Big Data beyond Apache Hadoop - How to Integrate ALL your Data" - JavaOne 2013
"Big Data beyond Apache Hadoop - How to Integrate ALL your Data" - JavaOne 2013
 
Big Data beyond Apache Hadoop - How to integrate ALL your Data
Big Data beyond Apache Hadoop - How to integrate ALL your DataBig Data beyond Apache Hadoop - How to integrate ALL your Data
Big Data beyond Apache Hadoop - How to integrate ALL your Data
 
Big Data and Enterprise Data - Oracle -1663869
Big Data and Enterprise Data - Oracle -1663869Big Data and Enterprise Data - Oracle -1663869
Big Data and Enterprise Data - Oracle -1663869
 
Next-Generation BPM - How to create intelligent Business Processes thanks to ...
Next-Generation BPM - How to create intelligent Business Processes thanks to ...Next-Generation BPM - How to create intelligent Business Processes thanks to ...
Next-Generation BPM - How to create intelligent Business Processes thanks to ...
 
Manipulating data with Talend. Learn how?
Manipulating data with Talend. Learn how?Manipulating data with Talend. Learn how?
Manipulating data with Talend. Learn how?
 
Manipulating Data with Talend.
Manipulating Data with Talend.Manipulating Data with Talend.
Manipulating Data with Talend.
 
Hadoop and the Data Warehouse: When to Use Which
Hadoop and the Data Warehouse: When to Use Which Hadoop and the Data Warehouse: When to Use Which
Hadoop and the Data Warehouse: When to Use Which
 
2015 02 12 talend hortonworks webinar challenges to hadoop adoption
2015 02 12 talend hortonworks webinar challenges to hadoop adoption2015 02 12 talend hortonworks webinar challenges to hadoop adoption
2015 02 12 talend hortonworks webinar challenges to hadoop adoption
 
Lecture 5 - Big Data and Hadoop Intro.ppt
Lecture 5 - Big Data and Hadoop Intro.pptLecture 5 - Big Data and Hadoop Intro.ppt
Lecture 5 - Big Data and Hadoop Intro.ppt
 
Big data introduction, Hadoop in details
Big data introduction, Hadoop in detailsBig data introduction, Hadoop in details
Big data introduction, Hadoop in details
 
John Glendenning - Real time data driven services in the Cloud
John Glendenning - Real time data driven services in the CloudJohn Glendenning - Real time data driven services in the Cloud
John Glendenning - Real time data driven services in the Cloud
 
Big data4businessusers
Big data4businessusersBig data4businessusers
Big data4businessusers
 
Cloud as a Data Platform
Cloud as a Data PlatformCloud as a Data Platform
Cloud as a Data Platform
 
Big Data
Big DataBig Data
Big Data
 
Exploring the Wider World of Big Data
Exploring the Wider World of Big DataExploring the Wider World of Big Data
Exploring the Wider World of Big Data
 
C-BAG Big Data Meetup Chennai Oct.29-2014 Hortonworks and Concurrent on Casca...
C-BAG Big Data Meetup Chennai Oct.29-2014 Hortonworks and Concurrent on Casca...C-BAG Big Data Meetup Chennai Oct.29-2014 Hortonworks and Concurrent on Casca...
C-BAG Big Data Meetup Chennai Oct.29-2014 Hortonworks and Concurrent on Casca...
 
WP_Impetus_2016_Guide_to_Modernize_Your_Enterprise_Data_Warehouse_JRoberts
WP_Impetus_2016_Guide_to_Modernize_Your_Enterprise_Data_Warehouse_JRobertsWP_Impetus_2016_Guide_to_Modernize_Your_Enterprise_Data_Warehouse_JRoberts
WP_Impetus_2016_Guide_to_Modernize_Your_Enterprise_Data_Warehouse_JRoberts
 
BIG Data & Hadoop Applications in Social Media
BIG Data & Hadoop Applications in Social MediaBIG Data & Hadoop Applications in Social Media
BIG Data & Hadoop Applications in Social Media
 
How to create intelligent Business Processes thanks to Big Data (BPM, Apache ...
How to create intelligent Business Processes thanks to Big Data (BPM, Apache ...How to create intelligent Business Processes thanks to Big Data (BPM, Apache ...
How to create intelligent Business Processes thanks to Big Data (BPM, Apache ...
 
Big data an elephant business opportunities
Big data an elephant   business opportunitiesBig data an elephant   business opportunities
Big data an elephant business opportunities
 

Más de jazoon13

JAZOON'13 - Joe Justice - Test First Saves The World
JAZOON'13 - Joe Justice - Test First Saves The WorldJAZOON'13 - Joe Justice - Test First Saves The World
JAZOON'13 - Joe Justice - Test First Saves The Worldjazoon13
 
JAZOON'13 - Ulrika Park- User Story telling The Lost Art of User Stories
JAZOON'13 - Ulrika Park- User Story telling The Lost Art of User StoriesJAZOON'13 - Ulrika Park- User Story telling The Lost Art of User Stories
JAZOON'13 - Ulrika Park- User Story telling The Lost Art of User Storiesjazoon13
 
JAZOON'13 - Bartosz Majsak - Git Workshop - Kung Fu
JAZOON'13 - Bartosz Majsak - Git Workshop - Kung FuJAZOON'13 - Bartosz Majsak - Git Workshop - Kung Fu
JAZOON'13 - Bartosz Majsak - Git Workshop - Kung Fujazoon13
 
JAZOON'13 - Thomas Hug & Bartosz Majsak - Git Workshop -Essentials
JAZOON'13 - Thomas Hug & Bartosz Majsak - Git Workshop -EssentialsJAZOON'13 - Thomas Hug & Bartosz Majsak - Git Workshop -Essentials
JAZOON'13 - Thomas Hug & Bartosz Majsak - Git Workshop -Essentialsjazoon13
 
JAZOON'13 - Oliver Zeigermann - Typed JavaScript with TypeScript
JAZOON'13 - Oliver Zeigermann - Typed JavaScript with TypeScriptJAZOON'13 - Oliver Zeigermann - Typed JavaScript with TypeScript
JAZOON'13 - Oliver Zeigermann - Typed JavaScript with TypeScriptjazoon13
 
JAZOON'13 - Andres Almiray - Spock: boldly go where no test has gone before
JAZOON'13 - Andres Almiray - Spock: boldly go where no test has gone beforeJAZOON'13 - Andres Almiray - Spock: boldly go where no test has gone before
JAZOON'13 - Andres Almiray - Spock: boldly go where no test has gone beforejazoon13
 
JAZOON'13 - Andres Almiray - Rocket Propelled Java
JAZOON'13 - Andres Almiray - Rocket Propelled JavaJAZOON'13 - Andres Almiray - Rocket Propelled Java
JAZOON'13 - Andres Almiray - Rocket Propelled Javajazoon13
 
JAZOON'13 - Sven Peters - How to do Kick-Ass Software Development
JAZOON'13 - Sven Peters - How to do Kick-Ass Software DevelopmentJAZOON'13 - Sven Peters - How to do Kick-Ass Software Development
JAZOON'13 - Sven Peters - How to do Kick-Ass Software Developmentjazoon13
 
JAZOON'13 - Nikita Salnikov-Tarnovski - Multiplatform Java application develo...
JAZOON'13 - Nikita Salnikov-Tarnovski - Multiplatform Java application develo...JAZOON'13 - Nikita Salnikov-Tarnovski - Multiplatform Java application develo...
JAZOON'13 - Nikita Salnikov-Tarnovski - Multiplatform Java application develo...jazoon13
 
JAZOON'13 - Pawel Wrzeszcz - Visibility Shift In Distributed Teams
JAZOON'13 - Pawel Wrzeszcz - Visibility Shift In Distributed TeamsJAZOON'13 - Pawel Wrzeszcz - Visibility Shift In Distributed Teams
JAZOON'13 - Pawel Wrzeszcz - Visibility Shift In Distributed Teamsjazoon13
 
JAZOON'13 - Sam Brannen - Spring Framework 4.0 - The Next Generation
JAZOON'13 - Sam Brannen - Spring Framework 4.0 - The Next GenerationJAZOON'13 - Sam Brannen - Spring Framework 4.0 - The Next Generation
JAZOON'13 - Sam Brannen - Spring Framework 4.0 - The Next Generationjazoon13
 
JAZOON'13 - Guide Schmutz - Kafka and Strom Event Processing In Realtime
JAZOON'13 - Guide Schmutz - Kafka and Strom Event Processing In RealtimeJAZOON'13 - Guide Schmutz - Kafka and Strom Event Processing In Realtime
JAZOON'13 - Guide Schmutz - Kafka and Strom Event Processing In Realtimejazoon13
 
JAZOON'13 - Andrej Vckovski - Go synchronized
JAZOON'13 - Andrej Vckovski - Go synchronizedJAZOON'13 - Andrej Vckovski - Go synchronized
JAZOON'13 - Andrej Vckovski - Go synchronizedjazoon13
 
JAZOON'13 - Paul Brauner - A backend developer meets the web: my Dart experience
JAZOON'13 - Paul Brauner - A backend developer meets the web: my Dart experienceJAZOON'13 - Paul Brauner - A backend developer meets the web: my Dart experience
JAZOON'13 - Paul Brauner - A backend developer meets the web: my Dart experiencejazoon13
 
JAZOON'13 - Anatole Tresch - Go for the money (JSR 354) !
JAZOON'13 - Anatole Tresch - Go for the money (JSR 354) !JAZOON'13 - Anatole Tresch - Go for the money (JSR 354) !
JAZOON'13 - Anatole Tresch - Go for the money (JSR 354) !jazoon13
 
JAZOON'13 - Abdelmonaim Remani - The Economies of Scaling Software
JAZOON'13 - Abdelmonaim Remani - The Economies of Scaling SoftwareJAZOON'13 - Abdelmonaim Remani - The Economies of Scaling Software
JAZOON'13 - Abdelmonaim Remani - The Economies of Scaling Softwarejazoon13
 
JAZOON'13 - Stefan Saasen - True Git: The Great Migration
JAZOON'13 - Stefan Saasen - True Git: The Great MigrationJAZOON'13 - Stefan Saasen - True Git: The Great Migration
JAZOON'13 - Stefan Saasen - True Git: The Great Migrationjazoon13
 
JAZOON'13 - Stefan Saasen - Real World Git Workflows
JAZOON'13 - Stefan Saasen - Real World Git WorkflowsJAZOON'13 - Stefan Saasen - Real World Git Workflows
JAZOON'13 - Stefan Saasen - Real World Git Workflowsjazoon13
 

Más de jazoon13 (18)

JAZOON'13 - Joe Justice - Test First Saves The World
JAZOON'13 - Joe Justice - Test First Saves The WorldJAZOON'13 - Joe Justice - Test First Saves The World
JAZOON'13 - Joe Justice - Test First Saves The World
 
JAZOON'13 - Ulrika Park- User Story telling The Lost Art of User Stories
JAZOON'13 - Ulrika Park- User Story telling The Lost Art of User StoriesJAZOON'13 - Ulrika Park- User Story telling The Lost Art of User Stories
JAZOON'13 - Ulrika Park- User Story telling The Lost Art of User Stories
 
JAZOON'13 - Bartosz Majsak - Git Workshop - Kung Fu
JAZOON'13 - Bartosz Majsak - Git Workshop - Kung FuJAZOON'13 - Bartosz Majsak - Git Workshop - Kung Fu
JAZOON'13 - Bartosz Majsak - Git Workshop - Kung Fu
 
JAZOON'13 - Thomas Hug & Bartosz Majsak - Git Workshop -Essentials
JAZOON'13 - Thomas Hug & Bartosz Majsak - Git Workshop -EssentialsJAZOON'13 - Thomas Hug & Bartosz Majsak - Git Workshop -Essentials
JAZOON'13 - Thomas Hug & Bartosz Majsak - Git Workshop -Essentials
 
JAZOON'13 - Oliver Zeigermann - Typed JavaScript with TypeScript
JAZOON'13 - Oliver Zeigermann - Typed JavaScript with TypeScriptJAZOON'13 - Oliver Zeigermann - Typed JavaScript with TypeScript
JAZOON'13 - Oliver Zeigermann - Typed JavaScript with TypeScript
 
JAZOON'13 - Andres Almiray - Spock: boldly go where no test has gone before
JAZOON'13 - Andres Almiray - Spock: boldly go where no test has gone beforeJAZOON'13 - Andres Almiray - Spock: boldly go where no test has gone before
JAZOON'13 - Andres Almiray - Spock: boldly go where no test has gone before
 
JAZOON'13 - Andres Almiray - Rocket Propelled Java
JAZOON'13 - Andres Almiray - Rocket Propelled JavaJAZOON'13 - Andres Almiray - Rocket Propelled Java
JAZOON'13 - Andres Almiray - Rocket Propelled Java
 
JAZOON'13 - Sven Peters - How to do Kick-Ass Software Development
JAZOON'13 - Sven Peters - How to do Kick-Ass Software DevelopmentJAZOON'13 - Sven Peters - How to do Kick-Ass Software Development
JAZOON'13 - Sven Peters - How to do Kick-Ass Software Development
 
JAZOON'13 - Nikita Salnikov-Tarnovski - Multiplatform Java application develo...
JAZOON'13 - Nikita Salnikov-Tarnovski - Multiplatform Java application develo...JAZOON'13 - Nikita Salnikov-Tarnovski - Multiplatform Java application develo...
JAZOON'13 - Nikita Salnikov-Tarnovski - Multiplatform Java application develo...
 
JAZOON'13 - Pawel Wrzeszcz - Visibility Shift In Distributed Teams
JAZOON'13 - Pawel Wrzeszcz - Visibility Shift In Distributed TeamsJAZOON'13 - Pawel Wrzeszcz - Visibility Shift In Distributed Teams
JAZOON'13 - Pawel Wrzeszcz - Visibility Shift In Distributed Teams
 
JAZOON'13 - Sam Brannen - Spring Framework 4.0 - The Next Generation
JAZOON'13 - Sam Brannen - Spring Framework 4.0 - The Next GenerationJAZOON'13 - Sam Brannen - Spring Framework 4.0 - The Next Generation
JAZOON'13 - Sam Brannen - Spring Framework 4.0 - The Next Generation
 
JAZOON'13 - Guide Schmutz - Kafka and Strom Event Processing In Realtime
JAZOON'13 - Guide Schmutz - Kafka and Strom Event Processing In RealtimeJAZOON'13 - Guide Schmutz - Kafka and Strom Event Processing In Realtime
JAZOON'13 - Guide Schmutz - Kafka and Strom Event Processing In Realtime
 
JAZOON'13 - Andrej Vckovski - Go synchronized
JAZOON'13 - Andrej Vckovski - Go synchronizedJAZOON'13 - Andrej Vckovski - Go synchronized
JAZOON'13 - Andrej Vckovski - Go synchronized
 
JAZOON'13 - Paul Brauner - A backend developer meets the web: my Dart experience
JAZOON'13 - Paul Brauner - A backend developer meets the web: my Dart experienceJAZOON'13 - Paul Brauner - A backend developer meets the web: my Dart experience
JAZOON'13 - Paul Brauner - A backend developer meets the web: my Dart experience
 
JAZOON'13 - Anatole Tresch - Go for the money (JSR 354) !
JAZOON'13 - Anatole Tresch - Go for the money (JSR 354) !JAZOON'13 - Anatole Tresch - Go for the money (JSR 354) !
JAZOON'13 - Anatole Tresch - Go for the money (JSR 354) !
 
JAZOON'13 - Abdelmonaim Remani - The Economies of Scaling Software
JAZOON'13 - Abdelmonaim Remani - The Economies of Scaling SoftwareJAZOON'13 - Abdelmonaim Remani - The Economies of Scaling Software
JAZOON'13 - Abdelmonaim Remani - The Economies of Scaling Software
 
JAZOON'13 - Stefan Saasen - True Git: The Great Migration
JAZOON'13 - Stefan Saasen - True Git: The Great MigrationJAZOON'13 - Stefan Saasen - True Git: The Great Migration
JAZOON'13 - Stefan Saasen - True Git: The Great Migration
 
JAZOON'13 - Stefan Saasen - Real World Git Workflows
JAZOON'13 - Stefan Saasen - Real World Git WorkflowsJAZOON'13 - Stefan Saasen - Real World Git Workflows
JAZOON'13 - Stefan Saasen - Real World Git Workflows
 

Último

Artificial Intelligence & SEO Trends for 2024
Artificial Intelligence & SEO Trends for 2024Artificial Intelligence & SEO Trends for 2024
Artificial Intelligence & SEO Trends for 2024D Cloud Solutions
 
activity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdf
activity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdf
activity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdfJamie (Taka) Wang
 
UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...
UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...
UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...UbiTrack UK
 
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...DianaGray10
 
UiPath Solutions Management Preview - Northern CA Chapter - March 22.pdf
UiPath Solutions Management Preview - Northern CA Chapter - March 22.pdfUiPath Solutions Management Preview - Northern CA Chapter - March 22.pdf
UiPath Solutions Management Preview - Northern CA Chapter - March 22.pdfDianaGray10
 
UiPath Studio Web workshop series - Day 8
UiPath Studio Web workshop series - Day 8UiPath Studio Web workshop series - Day 8
UiPath Studio Web workshop series - Day 8DianaGray10
 
Building AI-Driven Apps Using Semantic Kernel.pptx
Building AI-Driven Apps Using Semantic Kernel.pptxBuilding AI-Driven Apps Using Semantic Kernel.pptx
Building AI-Driven Apps Using Semantic Kernel.pptxUdaiappa Ramachandran
 
IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019
IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019
IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019IES VE
 
Linked Data in Production: Moving Beyond Ontologies
Linked Data in Production: Moving Beyond OntologiesLinked Data in Production: Moving Beyond Ontologies
Linked Data in Production: Moving Beyond OntologiesDavid Newbury
 
KubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCost
KubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCostKubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCost
KubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCostMatt Ray
 
Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...
Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...
Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...Will Schroeder
 
Machine Learning Model Validation (Aijun Zhang 2024).pdf
Machine Learning Model Validation (Aijun Zhang 2024).pdfMachine Learning Model Validation (Aijun Zhang 2024).pdf
Machine Learning Model Validation (Aijun Zhang 2024).pdfAijun Zhang
 
UiPath Community: AI for UiPath Automation Developers
UiPath Community: AI for UiPath Automation DevelopersUiPath Community: AI for UiPath Automation Developers
UiPath Community: AI for UiPath Automation DevelopersUiPathCommunity
 
Designing A Time bound resource download URL
Designing A Time bound resource download URLDesigning A Time bound resource download URL
Designing A Time bound resource download URLRuncy Oommen
 
The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...
The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...
The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...Aggregage
 
Secure your environment with UiPath and CyberArk technologies - Session 1
Secure your environment with UiPath and CyberArk technologies - Session 1Secure your environment with UiPath and CyberArk technologies - Session 1
Secure your environment with UiPath and CyberArk technologies - Session 1DianaGray10
 
Bird eye's view on Camunda open source ecosystem
Bird eye's view on Camunda open source ecosystemBird eye's view on Camunda open source ecosystem
Bird eye's view on Camunda open source ecosystemAsko Soukka
 
ADOPTING WEB 3 FOR YOUR BUSINESS: A STEP-BY-STEP GUIDE
ADOPTING WEB 3 FOR YOUR BUSINESS: A STEP-BY-STEP GUIDEADOPTING WEB 3 FOR YOUR BUSINESS: A STEP-BY-STEP GUIDE
ADOPTING WEB 3 FOR YOUR BUSINESS: A STEP-BY-STEP GUIDELiveplex
 
Meet the new FSP 3000 M-Flex800™
Meet the new FSP 3000 M-Flex800™Meet the new FSP 3000 M-Flex800™
Meet the new FSP 3000 M-Flex800™Adtran
 

Último (20)

Artificial Intelligence & SEO Trends for 2024
Artificial Intelligence & SEO Trends for 2024Artificial Intelligence & SEO Trends for 2024
Artificial Intelligence & SEO Trends for 2024
 
201610817 - edge part1
201610817 - edge part1201610817 - edge part1
201610817 - edge part1
 
activity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdf
activity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdf
activity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdf
 
UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...
UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...
UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...
 
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...
 
UiPath Solutions Management Preview - Northern CA Chapter - March 22.pdf
UiPath Solutions Management Preview - Northern CA Chapter - March 22.pdfUiPath Solutions Management Preview - Northern CA Chapter - March 22.pdf
UiPath Solutions Management Preview - Northern CA Chapter - March 22.pdf
 
UiPath Studio Web workshop series - Day 8
UiPath Studio Web workshop series - Day 8UiPath Studio Web workshop series - Day 8
UiPath Studio Web workshop series - Day 8
 
Building AI-Driven Apps Using Semantic Kernel.pptx
Building AI-Driven Apps Using Semantic Kernel.pptxBuilding AI-Driven Apps Using Semantic Kernel.pptx
Building AI-Driven Apps Using Semantic Kernel.pptx
 
IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019
IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019
IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019
 
Linked Data in Production: Moving Beyond Ontologies
Linked Data in Production: Moving Beyond OntologiesLinked Data in Production: Moving Beyond Ontologies
Linked Data in Production: Moving Beyond Ontologies
 
KubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCost
KubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCostKubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCost
KubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCost
 
Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...
Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...
Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...
 
Machine Learning Model Validation (Aijun Zhang 2024).pdf
Machine Learning Model Validation (Aijun Zhang 2024).pdfMachine Learning Model Validation (Aijun Zhang 2024).pdf
Machine Learning Model Validation (Aijun Zhang 2024).pdf
 
UiPath Community: AI for UiPath Automation Developers
UiPath Community: AI for UiPath Automation DevelopersUiPath Community: AI for UiPath Automation Developers
UiPath Community: AI for UiPath Automation Developers
 
Designing A Time bound resource download URL
Designing A Time bound resource download URLDesigning A Time bound resource download URL
Designing A Time bound resource download URL
 
The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...
The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...
The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...
 
Secure your environment with UiPath and CyberArk technologies - Session 1
Secure your environment with UiPath and CyberArk technologies - Session 1Secure your environment with UiPath and CyberArk technologies - Session 1
Secure your environment with UiPath and CyberArk technologies - Session 1
 
Bird eye's view on Camunda open source ecosystem
Bird eye's view on Camunda open source ecosystemBird eye's view on Camunda open source ecosystem
Bird eye's view on Camunda open source ecosystem
 
ADOPTING WEB 3 FOR YOUR BUSINESS: A STEP-BY-STEP GUIDE
ADOPTING WEB 3 FOR YOUR BUSINESS: A STEP-BY-STEP GUIDEADOPTING WEB 3 FOR YOUR BUSINESS: A STEP-BY-STEP GUIDE
ADOPTING WEB 3 FOR YOUR BUSINESS: A STEP-BY-STEP GUIDE
 
Meet the new FSP 3000 M-Flex800™
Meet the new FSP 3000 M-Flex800™Meet the new FSP 3000 M-Flex800™
Meet the new FSP 3000 M-Flex800™
 

JAZOON'13 - Kai Waehner - Hadoop Integration

  • 1. Big Data beyond Hadoop How to integrate ALL your Data Kai Wähner Principal Consultant at Talend @KaiWaehner www.kai-waehner.de
  • 2. Kai Wähner Main Tasks Requirements Engineering Enterprise Architecture Management Business Process Management Architecture and Development of Applications Service-oriented Architecture Integration of Legacy Applications Cloud Computing Big Data Consulting Developing Coaching Speaking Writing © Talend 2013 Contact Email: kontakt@kai-waehner.de Blog: www.kai-waehner.de/blog Twitter: @KaiWaehner Social Networks: Xing, LinkedIn “Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner
  • 3. Key messages You have to care about big data to be competitive in the future! You have to integrate different sources to get most value out of it! Big data integration is no (longer) rocket science! © Talend 2013 “Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner
  • 4. Agenda • Big data paradigm shift • Challenges for integrating big data • Choosing Hadoop for big data integration • Integration with an open source framework • Integration with an open source suite • Custom big data connectors © Talend 2013 “Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner
  • 5. Agenda • Big data paradigm shift • Challenges for integrating big data • Choosing Hadoop for big data integration • Integration with an open source framework • Integration with an open source suite • Custom big data connectors © Talend 2013 “Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner
  • 6. Why should you care about big data? “If you can't measure it, you can't manage it.” William Edwards Deming (1900 –1993) American statistician, professor, author, lecturer and consultant © Talend 2013 “Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner
  • 7. Why should you care about big data? „Silence the HiPPOs“ (highest-paid person‘s opinion)  Being able to interpret unimaginable large data stream, the gut feeling is no longer justified!  © Talend 2013 “Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner
  • 8. What is big data? The Vs of big data Volume (terabytes, petabytes) Velocity (realtime or nearrealtime) Variety (social networks, blog posts, logs, sensors, etc.) © Talend 2013 “Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner Value
  • 9. Big data tasks to solve - before analysis Big Data Integration – Land data in a Big Data cluster – Implement or generate parallel processes Big Data Manipulation – Simplify manipulation, such as sort and filter – Computational expensive functions Big Data Quality & Governance – Identify linkages and duplicates, validate big data – Match connector, execute basic quality features Big Data Project Management – Place frameworks around big data projects – Common Repository, scheduling, monitoring © Talend 2013 “Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner
  • 10. Use case: Clickstream Analysis “The advantage of their new system is that they can now look at their data [from their log processing system] in anyway they want: ➜ Nightly MapReduce jobs collect statistics about their mail system such as spam counts by domain, bytes transferred and number of logins. ➜ When they wanted to find out which part of the world their customers logged in from, a quick [ad hoc] MapReduce job was created and they had the answer within a few hours. Not really possible in your typical ETL system.” http://highscalability.com/how-rackspace-now-uses-mapreduce-and-hadoop-query-terabytes-data © Talend 2013 “Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner
  • 11. Use case: Risk management Deduce Customer Defections http://hkotadia.com/archives/5021 © Talend 2013 “Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner
  • 12. Use case: Flexible pricing ➜ ➜ ➜ ➜ With revenue of almost USD 30 billion and a network of 800 locations, Macy's is considered the largest store operator in the USA Daily price check analysis of its 10,000 articles in less than two hours Whenever a neighboring competitor anywhere between New York and Los Angeles goes for aggressive price reductions, Macy's follows its example If there is no market competitor, the prices remain unchanged http://www.t-systems.com/about-t-systems/examples-of-successes-companies-analyze-big-data-in-record-time-l-t-systems/1029702 © Talend 2013 “Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner
  • 13. Storage: Compliance Global Parcel Service A lot of data must be stored „forever“ ➜ Numbers increase exponentially ➜ Goal: As cheap as possible ➜ Problem: (Fast) queries must still be possible ➜ Solution: Commodity servers and „Hadoop querying“ ➜ http://archive.org/stream/BigDataImPraxiseinsatz-SzenarienBeispieleEffekte/Big_Data_BITKOM-Leitfaden_Sept.2012#page/n0/mode/2up © Talend 2013 “Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner
  • 14. Agenda • Big data paradigm shift • Challenges for integrating big data • Choosing Hadoop for big data integration • Integration with an open source framework • Integration with an open source suite • Custom big data connectors © Talend 2013 “Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner
  • 15. Limited big data experts This is your company Big Data Geek © Talend 2013 “Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner
  • 16. Data quality Big Data + Poor Data Quality = Big Problems © Talend 2013 “Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner
  • 17. Big data tool selection (business perspective) Wanna buy a big data solution for your industry? ➜ Maybe a competitor has a big data solution which adds business value? ➜ The competitor will never publish it (rat-race)! ➜ © Talend 2013 “Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner
  • 18. Big data tool selection (technical perspective) Looking for ‚your‘ required big data product? Support your data from scratch? Good luck!  © Talend 2013 “Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner
  • 19. How to solve these big data challenges? © Talend 2013 “Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner
  • 20. What is your Big Data process? Step 1 Step 2 Step 3 1) Do not begin with the data, think about business opportunities 2) Choose the right data (combine different data sources) 3) Use easy tooling http://hbr.org/2012/10/making-advanced-analytics-work-for-you © Talend 2013 “Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner
  • 21. Agenda • Big data paradigm shift • Challenges for integrating big data • Choosing Hadoop for big data integration • Integration with an open source framework • Integration with an open source suite • Custom big data connectors © Talend 2013 “Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner
  • 22. Technology perspective How to process big data? © Talend 2013 “Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner
  • 23. How to process big data? The critical flaw in parallel ETL tools is the fact that the data is almost never local to the processing nodes. This means that every time a large job is run, the data has to first be read from the source, split N ways and then delivered to the individual nodes. Worse, if the partition key of the source doesn’t match the partition key of the target, data has to be constantly exchanged among the nodes. In essence, parallel ETL treats the network as if it were a physical I/O subsystem. The network, which is always the slowest part of the process, becomes the weakest link in the performance chain. http://blog.syncsort.com/2012/08/parallel-etl-tools-are-dead © Talend 2013 “Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner
  • 24. How to process big data? Slides: http://www.slideshare.net/pavlobaron/100-big-data-0-hadoop-0-java Video: http://www.infoq.com/presentations/Big-Data-Hadoop-Java © Talend 2013 “Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner
  • 25. How to process big data? The defacto standard for big data processing © Talend 2013 “Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner
  • 26. How to process big data? “A big part of [the company’s strategy] includes wiring SQL Server 2012 (formerly known by the codename “Denali”) to the Hadoop distributed computing platform, and bringing Hadoop to Windows Server and Azure” Even Microsoft (the .NET house) relies on Hadoop since 2011 © Talend 2013 “Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner
  • 27. What is Hadoop? Apache Hadoop, an open-source software library, is a framework that allows for the distributed processing of large data sets across clusters of commodity hardware using simple programming models. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage. © Talend 2013 “Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner
  • 28. Map (Shuffle) Reduce Simple example • Input: (very large) text files with lists of strings, such as: „318, 0043012650999991949032412004...0500001N9+01111+99999999999...“ • We are interested just in some content: year and temperate (marked in red) • The Map Reduce function has to compute the maximum temperature for every year Example from the book “Hadoop: The Definitive Guide, 3rd Edition” © Talend 2013 “Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner
  • 29. How to process big data? © Talend 2013 “Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner
  • 30. Agenda • Big data paradigm shift • Challenges for integrating big data • Choosing Hadoop for big data integration • Integration with an open source framework • Integration with an open source suite • Custom big data connectors © Talend 2013 “Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner
  • 31. Alternatives for systems integration Integration Suite Enterprise Service Bus Integration Framework Low Connectivity Routing Transformation © Talend 2013 High + INTEGRATION Tooling Monitoring Support + BUSINESS PROCESS MGT. BIG DATA / MDM REGISTRY / REPOSITORY RULES ENGINE „YOU NAME IT“ “Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner Complexity of Integration
  • 32. Alternatives for systems integration Integration Framework Enterprise Service Bus Low © Talend 2013 Integration Suite High “Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner Complexity of Integration
  • 33. More details about integration frameworks... http://www.kai-waehner.de/blog/2012/12/20/showdown-integration-frameworkspring-integration-apache-camel-vs-enterprise-service-bus-esb/ © Talend 2013 “Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner
  • 34. Enterprise Integration Patterns (EIP) Apache Camel Implements the EIPs © Talend 2013 “Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner
  • 35. Enterprise Integration Patterns (EIP) © Talend 2013 “Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner
  • 36. Enterprise Integration Patterns (EIP) © Talend 2013 “Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner
  • 37. Architecture http://java.dzone.com/articles/apache-camel-integration © Talend 2013 “Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner
  • 38. Choose your required connectors TCP SQL SMTP Netty RMI FTP Lucene JMX MQ Atom LDAP Quartz XSLT Akka Many many more © Talend 2013 IRC Log HTTP File EJB AMQP RSS AWS-S3 Jetty JDBC Bean-Validation JMS CXF “Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner Custom Components
  • 39. Choose your favorite DSL XML (not production-ready yet) © Talend 2013 “Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner
  • 40. Deploy it wherever you need Standalone Application Server Web Container Spring Container OSGi Cloud © Talend 2013 “Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner
  • 41. Enterprise-ready • Open Source • Scalability • Error Handling • Transaction • Monitoring • Tooling • Commercial Support © Talend 2013 “Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner
  • 42. Example: Camel integration route © Talend 2013 “Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner
  • 43. Hadoop Integration with Apache Camel © Talend 2013 “Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner
  • 44. camel-hdfs connector // Producer from("ftp://user@myServer?password=secret") .to(“hdfs:///myDirectory/myFile.txt?append=true"); // Consumer from(“hdfs:///myDirectory/myBigDataAnalysis.csv") .to(“file:target/reports/report.csv"); © Talend 2013 “Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner
  • 45. camel-hbase connector © Talend 2013 “Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner
  • 46. Real World Use Case: Storage: Compliance Global Parcel Service A lot of data must be stored „forever“ ➜ Numbers increase exponentially ➜ Goal: As cheap as possible ➜ Problem: (Fast) queries must still be possible ➜ Solution: Commodity servers and „Hadoop querying“ ➜ http://archive.org/stream/BigDataImPraxiseinsatz-SzenarienBeispieleEffekte/Big_Data_BITKOM-Leitfaden_Sept.2012#page/n0/mode/2up © Talend 2013 “Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner
  • 47. Real world use case: Storage: Compliance Orders (Server 1) Payments (Server 2) ETL Log Files (Server 3) Log Files (Server 100) © Talend 2013 Storage “Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner Query
  • 48. Live demo Apache Camel in action... © Talend 2013 “Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner
  • 49. camel-pig? camel-hive? camel-hcatalog? Not available yet (current Camel version: 2.12)  Workarounds: • • • Use Pig / Hive-Query scripts (via camel-exec connector or any scripting language) Build your own connector (more details later ...) Use Hive-Hbase-Integration and store data in HBase ( „ugly“) © Talend 2013 “Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner
  • 50. camel-avro connector Camel is not just about connectors ... Camel supports a pluggable DataFormat to allow messages to be marshalled to and unmarshalled from binary or text formats, e.g. CSV, JSON, SOAP, EDI, ZIP, or ... ... Avro, a data serialization system used in Apache Hadoop. camel-avro connector provides a dataformat for Avro, which allows serialization and deserialization of messages using Apache Avro's binary data format. Moreover, it provides support for Apache Avro's RPC, by providing producers and consumers endpoint for using Avro over Netty or HTTP. © Talend 2013 “Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner
  • 51. Agenda • Big data paradigm shift • Challenges for integrating big data • Choosing Hadoop for big data integration • Integration with an open source framework • Integration with an open source suite • Custom big data connectors © Talend 2013 “Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner
  • 52. Alternatives for systems integration Integration Suite Enterprise Service Bus Integration Framework Low Connectivity Routing Transformation © Talend 2013 High + INTEGRATION Tooling Monitoring Support + BUSINESS PROCESS MGT. BIG DATA / MDM REGISTRY / REPOSITORY RULES ENGINE „YOU NAME IT“ “Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner Complexity of Integration
  • 53. Alternatives for systems integration Integration Framework Enterprise Service Bus Low © Talend 2013 Integration Suite High “Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner Complexity of Integration
  • 54. More details about ESBs and suites... http://www.kai-waehner.de/blog/2013/01/23/spoilt-for-choicehow-to-choose-the-right-enterprise-service-bus-esb/ © Talend 2013 “Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner
  • 55. Hadoop Integration with Talend Open Studio © Talend 2013 “Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner
  • 56. Vision: Democratize big data Talend Open Studio for Big Data • • …an open source ecosystem © Talend 2013 Generates Hadoop code and run transforms inside Hadoop • Native support for HDFS, Pig, Hbase, Hcatalog, Sqoop and Hive • Pig Improves efficiency of big data job design with graphic interface 100% open source under an Apache License • Standards based “Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner
  • 57. Vision: Democratize big data Talend Platform for Big Data • Builds on Talend Open Studio for Big Data • Adds data quality, advanced scalability and management functions • • © Talend 2013 Data cleansing • • Data quality and profiling • …an open source ecosystem Shared Repository and remote deployment • Pig MapReduce massively parallel data processing Reporting and dashboards Commercial support, warranty/IP indemnity under a subscription license “Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner
  • 58. Talend Open Studio for Big Data © Talend 2013 “Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner
  • 59. Real world Use case: Clickstream Analysis “The advantage of their new system is that they can now look at their data [from their log processing system] in anyway they want: ➜ Nightly MapReduce jobs collect statistics about their mail system such as spam counts by domain, bytes transferred and number of logins. ➜ When they wanted to find out which part of the world their customers logged in from, a quick [ad hoc] MapReduce job was created and they had the answer within a few hours. Not really possible in your typical ETL system.” http://highscalability.com/how-rackspace-now-uses-mapreduce-and-hadoop-query-terabytes-data © Talend 2013 “Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner
  • 60. Real world use case: Clickstream Analysis Log Files (Server 1) ETL Log Files (Server 2) Log Files (Server 100) © Talend 2013 “Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner
  • 61. Potential Uses of Clickstream Data One of the original uses of Hadoop at Yahoo was to store and process their massive volume of clickstream data. Now enterprises of all types can use Hadoop to refine and analyze clickstream data. They can then answer business questions such as: • • • What is the most efficient path for a site visitor to research a product, and then buy it? What products do visitors tend to buy together, and what are they most likely to buy in the future? Where should I spend resources on fixing or enhancing the user experience on my website? Goal: Data visualization can help you optimize your website and convert more visits into sales and revenue. Source: for Clickstream Example: „Hortonworks Hadoop Tutorials - Real Life Use Cases” http://hortonworks.com/blog/hadoop-tutorials-real-life-use-cases-in-the-sandbox © Talend 2013 “Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner
  • 62. Example: A semi-structured log file © Talend 2013 “Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner
  • 63. Example: ETL Job „... using Talend’s HDFS and Hive Components” © Talend 2013 “Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner
  • 64. Example: ETL Job „... using Talend’s Map Reduce Components*” * Not available in open source version of Talend Studio © Talend 2013 “Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner
  • 65. Live demo „Talend Open Studio for Big Data“ in action... © Talend 2013 “Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner
  • 66. Example: Analysis with Microsoft Excel We can see that the largest number of page hits in Florida were for clothing, followed by shoes. Source: for Clickstream Example: „Hortonworks Hadoop Tutorials - Real Life Use Cases” http://hortonworks.com/blog/hadoop-tutorials-real-life-use-cases-in-the-sandbox © Talend 2013 “Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner
  • 67. Example: Analysis with Microsoft Excel The chart shows that the majority of men shopping for clothing on our website are between the ages of 22 and 30. With this information, we can optimize our content for this market segment. Source: for Clickstream Example: „Hortonworks Hadoop Tutorials - Real Life Use Cases” http://hortonworks.com/blog/hadoop-tutorials-real-life-use-cases-in-the-sandbox © Talend 2013 “Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner
  • 68. Example: Analysis with Tableau Spoilt for Choice  Use your preferred BI or Analysis tool! © Talend 2013 “Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner
  • 69. Agenda • Big data paradigm shift • Challenges for integrating big data • Choosing Hadoop for big data integration • Integration with an open source framework • Integration with an open source suite • Custom big data connectors © Talend 2013 “Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner
  • 70. Custom connectors Easy to realize for all integration alternatives * • Integration Framework • Enterprise Service Bus • Integration Suite * At least for open source solutions © Talend 2013 “Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner
  • 71. Custom connectors You might need a ... • ... Hive connector for Camel • ... Impala connector for Talend • ... custom connector for your internal data format © Talend 2013 “Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner
  • 72. Live demo (Example: Apache Camel) Custom connectors in action... © Talend 2013 “Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner
  • 73. Alternative for custom connectors • SOAP • REST © Talend 2013 “Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner
  • 74. Code example: REST API for Salesforce object store // Salesforce Query (SOQL) via REST API from("direct:salesforceViaHttpLIST") .setHeader("X-PrettyPrint", 1) .setHeader("Authorization", accessToken) .setHeader(Exchange.CONTENT_TYPE, "application/json") .to("https://na14.salesforce.com/services/data/v20.0/query?q=SELECT+name+from +Article__c") // Salesforce CREATE via REST API from("direct:salesforceViaHttpCREATE") .setHeader("X-PrettyPrint", 1) .setHeader("Authorization", accessToken) .setHeader(Exchange.CONTENT_TYPE, "application/json“) .to("https://na14.salesforce.com/services/data/v20.0/sobjects/Article__c") © Talend 2013 “Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner
  • 75. Did you get the key message? © Talend 2013 “Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner
  • 76. Key messages You have to care about big data to be competitive in the future! You have to integrate different sources to get most value out of it! Big data integration is no (longer) rocket science! © Talend 2013 “Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner
  • 77. Did you get the key message? © Talend 2013 “Big Data beyond Hadoop – How to integrate ALL your Data” by Kai Wähner
  • 78. Thanks for the attention! Follow @KaiWaehner Contact Kai Wähner at Xing / LinkedIn Download Talend‘s open source software Mail kwaehner@talend.com Visit www.kai-waehner.de