SlideShare una empresa de Scribd logo
1 de 24
Descargar para leer sin conexión
Business Integration with
         CDH 4
       (including Apache Hadoop)

   Alexander Alten-Lorenz, Cloudera INC
       Muenchen, 22. February 2013
Challenges




Volume      Velocity   Variety
Business Integration
•   CRM               •   Invoicing

•   Analytics         •   Risk Management

•   Social Networks   •   Universal Data Access

•   Marketing         •   Data Governance

•   Document Store    •   SAP / Salesforce

•   Search-Indices    •   Article and Storage
                          Management
Use Cases
Risk Management

• Problem: Scoring of Customers and
  Projects
• Solution: Finance History, Communication
  and Pattern Detection
• User: Finance, Insurance
Recommendations
• Problem: Recommend convenient products
  to purchased products, matching the
  interests
• Solution: Statistical analysis of interests,
  purchase history, detect matching swarm
  patterns
• Users: eCommerce, Advertising
Graph-Analytics
• Problem: Detect trends and curves in large
  distributed networks (Wired, Social, Mesh)
• Solution: Collecting and Data Mining all
  data, applying to self learning patterns to
  detect trends and forecasts
• User: Enterprises, Gov, NGO, Provider,
  Telco, Stock Exchange
Detection of
       Dangerous Use
• Problem: Spam, Credit Card Abuse
• Solution: Pattern Detection, Prioritizing,
  heuristically Analytics
• Users: Retail, Finance, Reseller
Text Analysis

• Problem: Detect the meaning of the written
  word (Sentiment Analysis)
• Solution: Keyword patterns, Coherences
  detection, Path detection
• Users: eCommerce, Social Media Service
  Provider, Attitude Research
Amounts of real Data

• Ebay: 12 PB, Search Optimization
• Facebook: 50 PB, Logs, Reports
• Walmart, 4.5 PB, Customer Transactions
          http://wiki.apache.org/hadoop/PoweredBy
             http://en.wikipedia.org/wiki/Big_data
Apache Hadoop
• Software Framework for large amounts of
  unstructured data
• Apache-License
• Two main cores
 • HDFS: Distributed data storage
 • MapReduce: Distributed data handling
Hadoop Cluster
Data Node     Data Node   Data Node    Data Node
Data Node     Data Node   Data Node    Data Node
Data Node     Data Node   Data Node    Data Node
Data Node     Data Node   Data Node    Data Node
Data Node     Data Node   Data Node    Data Node
Data Node     Data Node   Data Node    Data Node
Data Node     Data Node   Data Node    Data Node


       Data Node: 4-16 Cores, 4-16 Disks,
        8-64 GB RAM, 1-10GB Network
Hadoop Distributed
      File System
                            File




Block    Block   Block     Block     Block   Block    Block




  Data Node              Data Node              Data Node
MapReduce
                 Data




RDBMS    Query




                 Data




Hadoop   Query
Features
                  HDFS   MapReduce

 Distribution      ✔        ✔


Fault Tolerance    ✔        ✔


  Scalability      ✔        ✔
Hadoop Eco System
         SQL               Scripts            HBase
                                                      Whirr
         Hive               Pig               Oozie


               MapReduce                                Avro
                                        Java API
                 HDFS
                                                 eeper
                                             Zook
 Sqoop           Flume     Connectors                    Hue


 RDBMS            Logs            ...          Mahout
Example of a
 Integration
Scope
• Successful Audits per ISO 27001
• Analyze different Data Sources from
  different Data Bases and CRM Systems
• Realtime and Lifetime Statistics per Product
• Periodical Analytic and Statistic Jobs
• Weekly Re-Import into CRM
• Single Queries per User (Analyst) over a
  Secured GUI
Solution Path
• Cluster Authentication and Authorization via
  Kerberos and crypted data communication / Data
  Protection
• Sqoop Connector to CRM / DB
   • Terradata, Oracle, Postgres, MySQL, MS SQL
• Hive - HBase Integration
• Hive Analytics, controlled automatically over Oozie
  Workload Orchestrator
• Hue Shell, Authentication via Kerberos SPNEGO
CRM Park         Integration         CDH    Authentification




                     Sqoop
                                                        Kerberos
                                                       (AD, MITv5)




Real Time    HBase                   Hive                       Oozie




                                                              Automation
   Enduser                     HUE
How to Manage?
Cloudera Manager
•   Automated Deployment   •   Reporting

•   Monitoring             •   Support Integration

•   Service Management

•   Log Management

•   Events and Alerts
Cloudera
• Founded 2009 in Palo Alto
• Cloudera's Distribution Including Hadoop
• CDH4 / Cloudera Manager 4
• > 320 employees worldwide
• Training, Consulting, Support, Development
• Enterprise Tools
Thank You!

• alexander@cloudera.com
• Twitter: @mapredit
• Blog: mapredit.blogspot.com
• http://www.cloudera.com/
• http://hadoop. apache.org/

Más contenido relacionado

La actualidad más candente

Hadoop Architecture
Hadoop Architecture Hadoop Architecture
Hadoop Architecture Ganesh B
 
Hadoop in three use cases
Hadoop in three use casesHadoop in three use cases
Hadoop in three use casesJoey Echeverria
 
Data lake-itweekend-sharif university-vahid amiry
Data lake-itweekend-sharif university-vahid amiryData lake-itweekend-sharif university-vahid amiry
Data lake-itweekend-sharif university-vahid amirydatastack
 
Big Data and Hadoop Introduction
 Big Data and Hadoop Introduction Big Data and Hadoop Introduction
Big Data and Hadoop IntroductionDzung Nguyen
 
Hadoop And Their Ecosystem
 Hadoop And Their Ecosystem Hadoop And Their Ecosystem
Hadoop And Their Ecosystemsunera pathan
 
The Hadoop Path by Subash DSouza of Archangel Technology Consultants, LLC.
The Hadoop Path by Subash DSouza of Archangel Technology Consultants, LLC.The Hadoop Path by Subash DSouza of Archangel Technology Consultants, LLC.
The Hadoop Path by Subash DSouza of Archangel Technology Consultants, LLC.Data Con LA
 
HADOOP TECHNOLOGY ppt
HADOOP  TECHNOLOGY pptHADOOP  TECHNOLOGY ppt
HADOOP TECHNOLOGY pptsravya raju
 
Apache hadoop introduction and architecture
Apache hadoop  introduction and architectureApache hadoop  introduction and architecture
Apache hadoop introduction and architectureHarikrishnan K
 
Learning How to Learn Hadoop
Learning How to Learn HadoopLearning How to Learn Hadoop
Learning How to Learn HadoopSilicon Halton
 
Hadoop in the Cloud - The what, why and how from the experts
Hadoop in the Cloud - The what, why and how from the expertsHadoop in the Cloud - The what, why and how from the experts
Hadoop in the Cloud - The what, why and how from the expertsDataWorks Summit/Hadoop Summit
 
What is HDFS | Hadoop Distributed File System | Edureka
What is HDFS | Hadoop Distributed File System | EdurekaWhat is HDFS | Hadoop Distributed File System | Edureka
What is HDFS | Hadoop Distributed File System | EdurekaEdureka!
 
HBaseCon 2012 | Overcoming Data Deluge with HBase to Help Save the Environmen...
HBaseCon 2012 | Overcoming Data Deluge with HBase to Help Save the Environmen...HBaseCon 2012 | Overcoming Data Deluge with HBase to Help Save the Environmen...
HBaseCon 2012 | Overcoming Data Deluge with HBase to Help Save the Environmen...Cloudera, Inc.
 
Hadoop File system (HDFS)
Hadoop File system (HDFS)Hadoop File system (HDFS)
Hadoop File system (HDFS)Prashant Gupta
 

La actualidad más candente (20)

Hadoop
HadoopHadoop
Hadoop
 
Hadoop Architecture
Hadoop Architecture Hadoop Architecture
Hadoop Architecture
 
Hadoop in three use cases
Hadoop in three use casesHadoop in three use cases
Hadoop in three use cases
 
Data lake-itweekend-sharif university-vahid amiry
Data lake-itweekend-sharif university-vahid amiryData lake-itweekend-sharif university-vahid amiry
Data lake-itweekend-sharif university-vahid amiry
 
Big data Hadoop
Big data  Hadoop   Big data  Hadoop
Big data Hadoop
 
Big Data and Hadoop Introduction
 Big Data and Hadoop Introduction Big Data and Hadoop Introduction
Big Data and Hadoop Introduction
 
Hadoop white papers
Hadoop white papersHadoop white papers
Hadoop white papers
 
Hadoop And Their Ecosystem
 Hadoop And Their Ecosystem Hadoop And Their Ecosystem
Hadoop And Their Ecosystem
 
The Hadoop Path by Subash DSouza of Archangel Technology Consultants, LLC.
The Hadoop Path by Subash DSouza of Archangel Technology Consultants, LLC.The Hadoop Path by Subash DSouza of Archangel Technology Consultants, LLC.
The Hadoop Path by Subash DSouza of Archangel Technology Consultants, LLC.
 
Hybrid Data Platform
Hybrid Data Platform Hybrid Data Platform
Hybrid Data Platform
 
HADOOP TECHNOLOGY ppt
HADOOP  TECHNOLOGY pptHADOOP  TECHNOLOGY ppt
HADOOP TECHNOLOGY ppt
 
PPT on Hadoop
PPT on HadoopPPT on Hadoop
PPT on Hadoop
 
Apache hadoop introduction and architecture
Apache hadoop  introduction and architectureApache hadoop  introduction and architecture
Apache hadoop introduction and architecture
 
Learning How to Learn Hadoop
Learning How to Learn HadoopLearning How to Learn Hadoop
Learning How to Learn Hadoop
 
Hadoop in the Cloud - The what, why and how from the experts
Hadoop in the Cloud - The what, why and how from the expertsHadoop in the Cloud - The what, why and how from the experts
Hadoop in the Cloud - The what, why and how from the experts
 
What is HDFS | Hadoop Distributed File System | Edureka
What is HDFS | Hadoop Distributed File System | EdurekaWhat is HDFS | Hadoop Distributed File System | Edureka
What is HDFS | Hadoop Distributed File System | Edureka
 
HBaseCon 2012 | Overcoming Data Deluge with HBase to Help Save the Environmen...
HBaseCon 2012 | Overcoming Data Deluge with HBase to Help Save the Environmen...HBaseCon 2012 | Overcoming Data Deluge with HBase to Help Save the Environmen...
HBaseCon 2012 | Overcoming Data Deluge with HBase to Help Save the Environmen...
 
SQOOP - RDBMS to Hadoop
SQOOP - RDBMS to HadoopSQOOP - RDBMS to Hadoop
SQOOP - RDBMS to Hadoop
 
Hadoop File system (HDFS)
Hadoop File system (HDFS)Hadoop File system (HDFS)
Hadoop File system (HDFS)
 
Hadoop
Hadoop Hadoop
Hadoop
 

Similar a Business Integration with CDH 4 (Apache Hadoop

HBaseCon 2012 | HBase Security for the Enterprise - Andrew Purtell, Trend Micro
HBaseCon 2012 | HBase Security for the Enterprise - Andrew Purtell, Trend MicroHBaseCon 2012 | HBase Security for the Enterprise - Andrew Purtell, Trend Micro
HBaseCon 2012 | HBase Security for the Enterprise - Andrew Purtell, Trend MicroCloudera, Inc.
 
Data Pipelines in Hadoop - SAP Meetup in Tel Aviv
Data Pipelines in Hadoop - SAP Meetup in Tel Aviv Data Pipelines in Hadoop - SAP Meetup in Tel Aviv
Data Pipelines in Hadoop - SAP Meetup in Tel Aviv larsgeorge
 
Big Data Analytics with Hadoop
Big Data Analytics with HadoopBig Data Analytics with Hadoop
Big Data Analytics with HadoopPhilippe Julio
 
Simple, Modular and Extensible Big Data Platform Concept
Simple, Modular and Extensible Big Data Platform ConceptSimple, Modular and Extensible Big Data Platform Concept
Simple, Modular and Extensible Big Data Platform ConceptSatish Mohan
 
Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q2
Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q2Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q2
Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q2tcloudcomputing-tw
 
Hadoop and the Data Warehouse: When to Use Which
Hadoop and the Data Warehouse: When to Use Which Hadoop and the Data Warehouse: When to Use Which
Hadoop and the Data Warehouse: When to Use Which DataWorks Summit
 
Hitachi Data Systems Hadoop Solution
Hitachi Data Systems Hadoop SolutionHitachi Data Systems Hadoop Solution
Hitachi Data Systems Hadoop SolutionHitachi Vantara
 
Big Data in the Microsoft Platform
Big Data in the Microsoft PlatformBig Data in the Microsoft Platform
Big Data in the Microsoft PlatformJesus Rodriguez
 
Hadoop Data Reservoir Webinar
Hadoop Data Reservoir WebinarHadoop Data Reservoir Webinar
Hadoop Data Reservoir WebinarPlatfora
 
Discover hdp 2.2 hdfs - final
Discover hdp 2.2   hdfs - finalDiscover hdp 2.2   hdfs - final
Discover hdp 2.2 hdfs - finalHortonworks
 
Discover hdp 2.2: Data storage innovations in Hadoop Distributed Filesystem (...
Discover hdp 2.2: Data storage innovations in Hadoop Distributed Filesystem (...Discover hdp 2.2: Data storage innovations in Hadoop Distributed Filesystem (...
Discover hdp 2.2: Data storage innovations in Hadoop Distributed Filesystem (...Hortonworks
 
What is Hadoop & its Use cases-PromtpCloud
What is Hadoop & its Use cases-PromtpCloudWhat is Hadoop & its Use cases-PromtpCloud
What is Hadoop & its Use cases-PromtpCloudPromptCloud
 
Modul_1_Introduction_to_Big_Data.pptx
Modul_1_Introduction_to_Big_Data.pptxModul_1_Introduction_to_Big_Data.pptx
Modul_1_Introduction_to_Big_Data.pptxNouhaElhaji1
 
Hadoop - Architectural road map for Hadoop Ecosystem
Hadoop -  Architectural road map for Hadoop EcosystemHadoop -  Architectural road map for Hadoop Ecosystem
Hadoop - Architectural road map for Hadoop Ecosystemnallagangus
 
Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q3
Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q3Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q3
Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q3tcloudcomputing-tw
 
Discover HDP 2.2: Even Faster SQL Queries with Apache Hive and Stinger.next
Discover HDP 2.2: Even Faster SQL Queries with Apache Hive and Stinger.nextDiscover HDP 2.2: Even Faster SQL Queries with Apache Hive and Stinger.next
Discover HDP 2.2: Even Faster SQL Queries with Apache Hive and Stinger.nextHortonworks
 
Predictive Analytics and Machine Learning …with SAS and Apache Hadoop
Predictive Analytics and Machine Learning…with SAS and Apache HadoopPredictive Analytics and Machine Learning…with SAS and Apache Hadoop
Predictive Analytics and Machine Learning …with SAS and Apache HadoopHortonworks
 

Similar a Business Integration with CDH 4 (Apache Hadoop (20)

HBaseCon 2012 | HBase Security for the Enterprise - Andrew Purtell, Trend Micro
HBaseCon 2012 | HBase Security for the Enterprise - Andrew Purtell, Trend MicroHBaseCon 2012 | HBase Security for the Enterprise - Andrew Purtell, Trend Micro
HBaseCon 2012 | HBase Security for the Enterprise - Andrew Purtell, Trend Micro
 
Data Pipelines in Hadoop - SAP Meetup in Tel Aviv
Data Pipelines in Hadoop - SAP Meetup in Tel Aviv Data Pipelines in Hadoop - SAP Meetup in Tel Aviv
Data Pipelines in Hadoop - SAP Meetup in Tel Aviv
 
Big Data Analytics with Hadoop
Big Data Analytics with HadoopBig Data Analytics with Hadoop
Big Data Analytics with Hadoop
 
Simple, Modular and Extensible Big Data Platform Concept
Simple, Modular and Extensible Big Data Platform ConceptSimple, Modular and Extensible Big Data Platform Concept
Simple, Modular and Extensible Big Data Platform Concept
 
Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q2
Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q2Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q2
Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q2
 
Hadoop and the Data Warehouse: When to Use Which
Hadoop and the Data Warehouse: When to Use Which Hadoop and the Data Warehouse: When to Use Which
Hadoop and the Data Warehouse: When to Use Which
 
Hitachi Data Systems Hadoop Solution
Hitachi Data Systems Hadoop SolutionHitachi Data Systems Hadoop Solution
Hitachi Data Systems Hadoop Solution
 
Big Data in the Microsoft Platform
Big Data in the Microsoft PlatformBig Data in the Microsoft Platform
Big Data in the Microsoft Platform
 
hadoop exp
hadoop exphadoop exp
hadoop exp
 
Hadoop Data Reservoir Webinar
Hadoop Data Reservoir WebinarHadoop Data Reservoir Webinar
Hadoop Data Reservoir Webinar
 
Discover hdp 2.2 hdfs - final
Discover hdp 2.2   hdfs - finalDiscover hdp 2.2   hdfs - final
Discover hdp 2.2 hdfs - final
 
Discover hdp 2.2: Data storage innovations in Hadoop Distributed Filesystem (...
Discover hdp 2.2: Data storage innovations in Hadoop Distributed Filesystem (...Discover hdp 2.2: Data storage innovations in Hadoop Distributed Filesystem (...
Discover hdp 2.2: Data storage innovations in Hadoop Distributed Filesystem (...
 
What is Hadoop & its Use cases-PromtpCloud
What is Hadoop & its Use cases-PromtpCloudWhat is Hadoop & its Use cases-PromtpCloud
What is Hadoop & its Use cases-PromtpCloud
 
Modul_1_Introduction_to_Big_Data.pptx
Modul_1_Introduction_to_Big_Data.pptxModul_1_Introduction_to_Big_Data.pptx
Modul_1_Introduction_to_Big_Data.pptx
 
Hadoop - Architectural road map for Hadoop Ecosystem
Hadoop -  Architectural road map for Hadoop EcosystemHadoop -  Architectural road map for Hadoop Ecosystem
Hadoop - Architectural road map for Hadoop Ecosystem
 
Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q3
Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q3Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q3
Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q3
 
Hadoop in a Nutshell
Hadoop in a NutshellHadoop in a Nutshell
Hadoop in a Nutshell
 
Discover HDP 2.2: Even Faster SQL Queries with Apache Hive and Stinger.next
Discover HDP 2.2: Even Faster SQL Queries with Apache Hive and Stinger.nextDiscover HDP 2.2: Even Faster SQL Queries with Apache Hive and Stinger.next
Discover HDP 2.2: Even Faster SQL Queries with Apache Hive and Stinger.next
 
Predictive Analytics and Machine Learning …with SAS and Apache Hadoop
Predictive Analytics and Machine Learning…with SAS and Apache HadoopPredictive Analytics and Machine Learning…with SAS and Apache Hadoop
Predictive Analytics and Machine Learning …with SAS and Apache Hadoop
 
Big data and hadoop
Big data and hadoopBig data and hadoop
Big data and hadoop
 

Más de Alexander Alten-Lorenz (12)

Is big data dead?
Is big data dead?Is big data dead?
Is big data dead?
 
Creating a value chain with IoT
Creating a value chain with IoTCreating a value chain with IoT
Creating a value chain with IoT
 
Big Data in an modern Enterprise
Big Data in an modern EnterpriseBig Data in an modern Enterprise
Big Data in an modern Enterprise
 
The Future of Energy
The Future of EnergyThe Future of Energy
The Future of Energy
 
Sentry - An Introduction
Sentry - An Introduction Sentry - An Introduction
Sentry - An Introduction
 
Cloudera Impala - HUG Karlsruhe, July 04, 2013
Cloudera Impala - HUG Karlsruhe, July 04, 2013Cloudera Impala - HUG Karlsruhe, July 04, 2013
Cloudera Impala - HUG Karlsruhe, July 04, 2013
 
BI mit Apache Hadoop (CDH)
BI mit Apache Hadoop (CDH)BI mit Apache Hadoop (CDH)
BI mit Apache Hadoop (CDH)
 
Flume and HBase
Flume and HBase Flume and HBase
Flume and HBase
 
Highlights Of Sqoop2
Highlights Of Sqoop2Highlights Of Sqoop2
Highlights Of Sqoop2
 
Apache Flume (NG)
Apache Flume (NG)Apache Flume (NG)
Apache Flume (NG)
 
Filesystems, RPC and HDFS
Filesystems, RPC and HDFSFilesystems, RPC and HDFS
Filesystems, RPC and HDFS
 
Big Data mit Apache Hadoop
Big Data mit Apache HadoopBig Data mit Apache Hadoop
Big Data mit Apache Hadoop
 

Último

Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxBkGupta21
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfMounikaPolabathina
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningLars Bell
 
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESmohitsingh558521
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxLoriGlavin3
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfPrecisely
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxLoriGlavin3
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxLoriGlavin3
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 

Último (20)

Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptx
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdf
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 

Business Integration with CDH 4 (Apache Hadoop

  • 1. Business Integration with CDH 4 (including Apache Hadoop) Alexander Alten-Lorenz, Cloudera INC Muenchen, 22. February 2013
  • 2. Challenges Volume Velocity Variety
  • 3. Business Integration • CRM • Invoicing • Analytics • Risk Management • Social Networks • Universal Data Access • Marketing • Data Governance • Document Store • SAP / Salesforce • Search-Indices • Article and Storage Management
  • 5. Risk Management • Problem: Scoring of Customers and Projects • Solution: Finance History, Communication and Pattern Detection • User: Finance, Insurance
  • 6. Recommendations • Problem: Recommend convenient products to purchased products, matching the interests • Solution: Statistical analysis of interests, purchase history, detect matching swarm patterns • Users: eCommerce, Advertising
  • 7. Graph-Analytics • Problem: Detect trends and curves in large distributed networks (Wired, Social, Mesh) • Solution: Collecting and Data Mining all data, applying to self learning patterns to detect trends and forecasts • User: Enterprises, Gov, NGO, Provider, Telco, Stock Exchange
  • 8. Detection of Dangerous Use • Problem: Spam, Credit Card Abuse • Solution: Pattern Detection, Prioritizing, heuristically Analytics • Users: Retail, Finance, Reseller
  • 9. Text Analysis • Problem: Detect the meaning of the written word (Sentiment Analysis) • Solution: Keyword patterns, Coherences detection, Path detection • Users: eCommerce, Social Media Service Provider, Attitude Research
  • 10. Amounts of real Data • Ebay: 12 PB, Search Optimization • Facebook: 50 PB, Logs, Reports • Walmart, 4.5 PB, Customer Transactions http://wiki.apache.org/hadoop/PoweredBy http://en.wikipedia.org/wiki/Big_data
  • 11. Apache Hadoop • Software Framework for large amounts of unstructured data • Apache-License • Two main cores • HDFS: Distributed data storage • MapReduce: Distributed data handling
  • 12. Hadoop Cluster Data Node Data Node Data Node Data Node Data Node Data Node Data Node Data Node Data Node Data Node Data Node Data Node Data Node Data Node Data Node Data Node Data Node Data Node Data Node Data Node Data Node Data Node Data Node Data Node Data Node Data Node Data Node Data Node Data Node: 4-16 Cores, 4-16 Disks, 8-64 GB RAM, 1-10GB Network
  • 13. Hadoop Distributed File System File Block Block Block Block Block Block Block Data Node Data Node Data Node
  • 14. MapReduce Data RDBMS Query Data Hadoop Query
  • 15. Features HDFS MapReduce Distribution ✔ ✔ Fault Tolerance ✔ ✔ Scalability ✔ ✔
  • 16. Hadoop Eco System SQL Scripts HBase Whirr Hive Pig Oozie MapReduce Avro Java API HDFS eeper Zook Sqoop Flume Connectors Hue RDBMS Logs ... Mahout
  • 17. Example of a Integration
  • 18. Scope • Successful Audits per ISO 27001 • Analyze different Data Sources from different Data Bases and CRM Systems • Realtime and Lifetime Statistics per Product • Periodical Analytic and Statistic Jobs • Weekly Re-Import into CRM • Single Queries per User (Analyst) over a Secured GUI
  • 19. Solution Path • Cluster Authentication and Authorization via Kerberos and crypted data communication / Data Protection • Sqoop Connector to CRM / DB • Terradata, Oracle, Postgres, MySQL, MS SQL • Hive - HBase Integration • Hive Analytics, controlled automatically over Oozie Workload Orchestrator • Hue Shell, Authentication via Kerberos SPNEGO
  • 20. CRM Park Integration CDH Authentification Sqoop Kerberos (AD, MITv5) Real Time HBase Hive Oozie Automation Enduser HUE
  • 22. Cloudera Manager • Automated Deployment • Reporting • Monitoring • Support Integration • Service Management • Log Management • Events and Alerts
  • 23. Cloudera • Founded 2009 in Palo Alto • Cloudera's Distribution Including Hadoop • CDH4 / Cloudera Manager 4 • > 320 employees worldwide • Training, Consulting, Support, Development • Enterprise Tools
  • 24. Thank You! • alexander@cloudera.com • Twitter: @mapredit • Blog: mapredit.blogspot.com • http://www.cloudera.com/ • http://hadoop. apache.org/