SlideShare una empresa de Scribd logo
1 de 23
Radoop: A Graphical Analytics Tool for Big Data


                 Gabor Makrai, CTO, Radoop
Who we are


• Active members of a Data Mining Research
  Group in Europe

• We started using Hadoop two years ago

• We are using basic Hadoop, Hive, and Mahout



11/9/2011          HadoopWorld 2011             2
Data mining tools

• Closed software
      – SAS Enterprise Miner
      – IBM SPSS Modeler
• Open-source software
      – Rapid-I RapidMiner
      –R
• Graphical user interface
• Data-flow structure
• Adaptability is important
11/9/2011                HadoopWorld 2011   3
Hadoop vs. Data mining tools


Hadoop                       Data mining tools




11/9/2011          HadoopWorld 2011              4
Why is it important?

• Barrier to entry for Hadoop
      – Using Hadoop without expert Hadoop knowledge


• Develop time vs. running time


• User-friendly graphical interface
      – Program readability


11/9/2011                HadoopWorld 2011              5
RapidMiner

•   The most used data mining tool in 2010*
•   Open-source software
•   Supports extensions
•   Data-flow structure
•   Marketplace




•   * http://www.kdnuggets.com/


11/9/2011                         HadoopWorld 2011   6
Radoop architecture




11/9/2011         HadoopWorld 2011   7
Implementation difficulties

  RapidMiner and Hive data types
  RapidMiner                        Hive
  • Nominal                         • TINYINT
        – Text                      • SMALLINT
        – Polynominal
                                    • INT
        – Binominal
                                    • BIGINT
  • Numeric
                                    • BOOLEAN
        – Integer
        – Real                      • FLOAT
  • Date and time                   • DOUBLE
        – Date                      • STRING
        – Time

11/9/2011               HadoopWorld 2011         8
Implementation difficulties

• Input data restrictions for Mahout
      – Conversion between Hive and Mahout
• Mahout needs data in special format
      – Data must be stored in VectorWritable class
• Hive can export data
      – Plain text or Sequence file format
• Solution: simple MapReduce jobs
      – Convert exported plain text Hive table to
        VectorWritable format and vica versa

11/9/2011                 HadoopWorld 2011            9
Implementation difficulties

• Remote Mahout’s jobs running

• Hadoop Commons and Hive handle remote
  connections well

• At the same time, Mahout does not support
  remote running

• Solution: modifications in the Mahout’s base
  source code

11/9/2011            HadoopWorld 2011            10
Implementation status

• Data imports and exports
      – CSV, Excel, and Database import/export


• Data transformations
      – Most used data manipulation functions


• Scalable machine learning and data mining
      – Clustering algorithms
      – Classifications

11/9/2011                 HadoopWorld 2011       11
Radoop base elements

• Operator




• Process




11/9/2011        HadoopWorld 2011   12
Radoop case study




11/9/2011           HadoopWorld 2011   13
Radoop case study




11/9/2011           HadoopWorld 2011   14
Radoop case study




11/9/2011           HadoopWorld 2011   15
Radoop case study

            Gets the Hive table




11/9/2011                  HadoopWorld 2011   16
Radoop case study

             Creates a new view with where statement




11/9/2011           HadoopWorld 2011                   17
Radoop case study




               Creates a new view with group by function




11/9/2011           HadoopWorld 2011                 18
Radoop case study




       Creates a new view with sort by function

11/9/2011                    HadoopWorld 2011     19
Radoop case study




                    Creates a new view with limit

11/9/2011           HadoopWorld 2011                20
Radoop case study




                         Creates a new table from the last view

11/9/2011           HadoopWorld 2011                     21
Future


• “We believe that more than half of the world’s
  data will be stored in Apache Hadoop within
  five years.” Hortonworks

• Radoop is opening the doors for people who
  are less comfortable with Hadoop but want to
  use Hadoop for Big Data analytics


11/9/2011           HadoopWorld 2011           22
Contacts


• Gabor Makrai
      – makrai@radoop.eu


• Webpage
      – http://www.radoop.eu/


• E-mail
      – radoop@radoop.eu


• Twitter
      – @radoopeu


11/9/2011                       HadoopWorld 2011   23

Más contenido relacionado

La actualidad más candente

Hdf5 parallel
Hdf5 parallelHdf5 parallel
Hdf5 parallelmfolk
 
The Whitebox Geospatial-Analyisis Tools Project and Open-Access GIS
The Whitebox Geospatial-Analyisis Tools Project and Open-Access GISThe Whitebox Geospatial-Analyisis Tools Project and Open-Access GIS
The Whitebox Geospatial-Analyisis Tools Project and Open-Access GISGolgi Alvarez
 
GI2013 ppt kafka&team-inspire in pocket
GI2013 ppt kafka&team-inspire in  pocketGI2013 ppt kafka&team-inspire in  pocket
GI2013 ppt kafka&team-inspire in pocketIGN Vorstand
 
Inspire in pocket dresden 2
Inspire in  pocket dresden 2Inspire in  pocket dresden 2
Inspire in pocket dresden 2Karel Charvat
 
AWS Public Data Sets: How to Stage Petabytes of Data for Analysis in AWS (WPS...
AWS Public Data Sets: How to Stage Petabytes of Data for Analysis in AWS (WPS...AWS Public Data Sets: How to Stage Petabytes of Data for Analysis in AWS (WPS...
AWS Public Data Sets: How to Stage Petabytes of Data for Analysis in AWS (WPS...Amazon Web Services
 
Hdf5 current future
Hdf5 current futureHdf5 current future
Hdf5 current futuremfolk
 
DataTalks #4: Построение хранилища данных на основе платформы hadoop / Игорь ...
DataTalks #4: Построение хранилища данных на основе платформы hadoop / Игорь ...DataTalks #4: Построение хранилища данных на основе платформы hadoop / Игорь ...
DataTalks #4: Построение хранилища данных на основе платформы hadoop / Игорь ...WG_ Events
 
SCAPE Information Day at BL - Large Scale Processing with Hadoop
SCAPE Information Day at BL - Large Scale Processing with HadoopSCAPE Information Day at BL - Large Scale Processing with Hadoop
SCAPE Information Day at BL - Large Scale Processing with HadoopSCAPE Project
 
This is an interesting metadata source. Can I import it into Koha?
This is an interesting metadata source. Can I import it into Koha?This is an interesting metadata source. Can I import it into Koha?
This is an interesting metadata source. Can I import it into Koha?Dobrica Pavlinušić
 
Linked Data, Ontologies and Inference
Linked Data, Ontologies and InferenceLinked Data, Ontologies and Inference
Linked Data, Ontologies and InferenceBarry Norton
 
Linked data enhanced publishing for special collections (with Drupal)
Linked data enhanced publishing for special collections (with Drupal)Linked data enhanced publishing for special collections (with Drupal)
Linked data enhanced publishing for special collections (with Drupal)Joachim Neubert
 

La actualidad más candente (20)

Hdf5 parallel
Hdf5 parallelHdf5 parallel
Hdf5 parallel
 
Update on HDF5 1.8
Update on HDF5 1.8Update on HDF5 1.8
Update on HDF5 1.8
 
The Whitebox Geospatial-Analyisis Tools Project and Open-Access GIS
The Whitebox Geospatial-Analyisis Tools Project and Open-Access GISThe Whitebox Geospatial-Analyisis Tools Project and Open-Access GIS
The Whitebox Geospatial-Analyisis Tools Project and Open-Access GIS
 
GI2013 ppt kafka&team-inspire in pocket
GI2013 ppt kafka&team-inspire in  pocketGI2013 ppt kafka&team-inspire in  pocket
GI2013 ppt kafka&team-inspire in pocket
 
Inspire in pocket dresden 2
Inspire in  pocket dresden 2Inspire in  pocket dresden 2
Inspire in pocket dresden 2
 
AWS Public Data Sets: How to Stage Petabytes of Data for Analysis in AWS (WPS...
AWS Public Data Sets: How to Stage Petabytes of Data for Analysis in AWS (WPS...AWS Public Data Sets: How to Stage Petabytes of Data for Analysis in AWS (WPS...
AWS Public Data Sets: How to Stage Petabytes of Data for Analysis in AWS (WPS...
 
HDF Update
HDF UpdateHDF Update
HDF Update
 
HDF5 High Level and Lite Libraries
HDF5 High Level and Lite LibrariesHDF5 High Level and Lite Libraries
HDF5 High Level and Lite Libraries
 
Hdf5 current future
Hdf5 current futureHdf5 current future
Hdf5 current future
 
DataTalks #4: Построение хранилища данных на основе платформы hadoop / Игорь ...
DataTalks #4: Построение хранилища данных на основе платформы hadoop / Игорь ...DataTalks #4: Построение хранилища данных на основе платформы hadoop / Игорь ...
DataTalks #4: Построение хранилища данных на основе платформы hadoop / Игорь ...
 
America Runs on Excel and HDF5 - Glued together by Python
America Runs on Excel and HDF5 - Glued together by PythonAmerica Runs on Excel and HDF5 - Glued together by Python
America Runs on Excel and HDF5 - Glued together by Python
 
HDF Cloud Services
HDF Cloud ServicesHDF Cloud Services
HDF Cloud Services
 
SCAPE Information Day at BL - Large Scale Processing with Hadoop
SCAPE Information Day at BL - Large Scale Processing with HadoopSCAPE Information Day at BL - Large Scale Processing with Hadoop
SCAPE Information Day at BL - Large Scale Processing with Hadoop
 
This is an interesting metadata source. Can I import it into Koha?
This is an interesting metadata source. Can I import it into Koha?This is an interesting metadata source. Can I import it into Koha?
This is an interesting metadata source. Can I import it into Koha?
 
hadoop_module
hadoop_modulehadoop_module
hadoop_module
 
Linked Data, Ontologies and Inference
Linked Data, Ontologies and InferenceLinked Data, Ontologies and Inference
Linked Data, Ontologies and Inference
 
Olap operations
Olap operationsOlap operations
Olap operations
 
GDAL Enhancement for ESDIS Project
GDAL Enhancement for ESDIS ProjectGDAL Enhancement for ESDIS Project
GDAL Enhancement for ESDIS Project
 
Linked data enhanced publishing for special collections (with Drupal)
Linked data enhanced publishing for special collections (with Drupal)Linked data enhanced publishing for special collections (with Drupal)
Linked data enhanced publishing for special collections (with Drupal)
 
HDF Project Update
HDF Project UpdateHDF Project Update
HDF Project Update
 

Destacado

M Chambers and RapidMiner Overview for Babson class
M Chambers and RapidMiner Overview for Babson classM Chambers and RapidMiner Overview for Babson class
M Chambers and RapidMiner Overview for Babson classmcAnalytics99
 
Data mining tools
Data mining toolsData mining tools
Data mining toolssuganmca14
 
Présentation on radoop
Présentation on radoop   Présentation on radoop
Présentation on radoop siliconsudipt
 
radoop - nlp matiné 2014
radoop - nlp matiné 2014radoop - nlp matiné 2014
radoop - nlp matiné 2014Zoltan Varju
 
Slides PAPIs.io'14 RapidMiner
Slides PAPIs.io'14 RapidMinerSlides PAPIs.io'14 RapidMiner
Slides PAPIs.io'14 RapidMinerSabrina Kirstein
 
RapidMiner, an entrance to explore MIMIC-III?
RapidMiner, an entrance to explore MIMIC-III?RapidMiner, an entrance to explore MIMIC-III?
RapidMiner, an entrance to explore MIMIC-III?Sven Van Poucke, MD, PhD
 
Data Mining: Implementation of Data Mining Techniques using RapidMiner software
Data Mining: Implementation of Data Mining Techniques using RapidMiner softwareData Mining: Implementation of Data Mining Techniques using RapidMiner software
Data Mining: Implementation of Data Mining Techniques using RapidMiner softwareMohammed Kharma
 
Introduction to RapidMiner Studio V7
Introduction to RapidMiner Studio V7Introduction to RapidMiner Studio V7
Introduction to RapidMiner Studio V7geraldinegray
 
RapidMiner: Introduction To Rapid Miner
RapidMiner: Introduction To Rapid MinerRapidMiner: Introduction To Rapid Miner
RapidMiner: Introduction To Rapid MinerRapidmining Content
 

Destacado (13)

M Chambers and RapidMiner Overview for Babson class
M Chambers and RapidMiner Overview for Babson classM Chambers and RapidMiner Overview for Babson class
M Chambers and RapidMiner Overview for Babson class
 
Data mining tools
Data mining toolsData mining tools
Data mining tools
 
Présentation on radoop
Présentation on radoop   Présentation on radoop
Présentation on radoop
 
radoop - nlp matiné 2014
radoop - nlp matiné 2014radoop - nlp matiné 2014
radoop - nlp matiné 2014
 
Slides PAPIs.io'14 RapidMiner
Slides PAPIs.io'14 RapidMinerSlides PAPIs.io'14 RapidMiner
Slides PAPIs.io'14 RapidMiner
 
Data mining tools overall
Data mining tools overallData mining tools overall
Data mining tools overall
 
RapidMiner, an entrance to explore MIMIC-III?
RapidMiner, an entrance to explore MIMIC-III?RapidMiner, an entrance to explore MIMIC-III?
RapidMiner, an entrance to explore MIMIC-III?
 
Data Mining: Implementation of Data Mining Techniques using RapidMiner software
Data Mining: Implementation of Data Mining Techniques using RapidMiner softwareData Mining: Implementation of Data Mining Techniques using RapidMiner software
Data Mining: Implementation of Data Mining Techniques using RapidMiner software
 
Rapid miner
Rapid minerRapid miner
Rapid miner
 
Rapidminer
RapidminerRapidminer
Rapidminer
 
Introduction to RapidMiner Studio V7
Introduction to RapidMiner Studio V7Introduction to RapidMiner Studio V7
Introduction to RapidMiner Studio V7
 
RapidMiner: Introduction To Rapid Miner
RapidMiner: Introduction To Rapid MinerRapidMiner: Introduction To Rapid Miner
RapidMiner: Introduction To Rapid Miner
 
Terminology Machine Learning
Terminology Machine LearningTerminology Machine Learning
Terminology Machine Learning
 

Similar a Hadoop World 2011: Radoop: a Graphical Analytics Tool for Big Data - Gabor Makrai, Radoop

Dallas TDWI Meeting Dec. 2012: Hadoop
Dallas TDWI Meeting Dec. 2012: HadoopDallas TDWI Meeting Dec. 2012: Hadoop
Dallas TDWI Meeting Dec. 2012: Hadooplamont_lockwood
 
YARN - Strata 2014
YARN - Strata 2014YARN - Strata 2014
YARN - Strata 2014Hortonworks
 
Hortonworks - What's Possible with a Modern Data Architecture?
Hortonworks - What's Possible with a Modern Data Architecture?Hortonworks - What's Possible with a Modern Data Architecture?
Hortonworks - What's Possible with a Modern Data Architecture?Hortonworks
 
Hortonworks and Red Hat Webinar - Part 2
Hortonworks and Red Hat Webinar - Part 2Hortonworks and Red Hat Webinar - Part 2
Hortonworks and Red Hat Webinar - Part 2Hortonworks
 
Introduction to BIg Data and Hadoop
Introduction to BIg Data and HadoopIntroduction to BIg Data and Hadoop
Introduction to BIg Data and HadoopAmir Shaikh
 
Apache Hadoop YARN: Past, Present and Future
Apache Hadoop YARN: Past, Present and FutureApache Hadoop YARN: Past, Present and Future
Apache Hadoop YARN: Past, Present and FutureDataWorks Summit
 
INTRODUCTION TO BIG DATA HADOOP
INTRODUCTION TO BIG DATA HADOOPINTRODUCTION TO BIG DATA HADOOP
INTRODUCTION TO BIG DATA HADOOPKrishna Sujeer
 
Hadoop Summit San Jose 2015: YARN - Past, Present and Future
Hadoop Summit San Jose 2015: YARN - Past, Present and FutureHadoop Summit San Jose 2015: YARN - Past, Present and Future
Hadoop Summit San Jose 2015: YARN - Past, Present and FutureVinod Kumar Vavilapalli
 
Open-BDA Hadoop Summit 2014 - Mr. Slim Baltagi (Building a Modern Data Archit...
Open-BDA Hadoop Summit 2014 - Mr. Slim Baltagi (Building a Modern Data Archit...Open-BDA Hadoop Summit 2014 - Mr. Slim Baltagi (Building a Modern Data Archit...
Open-BDA Hadoop Summit 2014 - Mr. Slim Baltagi (Building a Modern Data Archit...Innovative Management Services
 
Hadoop Summit Europe 2015 - YARN Present and Future
Hadoop Summit Europe 2015 - YARN Present and FutureHadoop Summit Europe 2015 - YARN Present and Future
Hadoop Summit Europe 2015 - YARN Present and FutureVinod Kumar Vavilapalli
 
Apache Hadoop YARN 2015: Present and Future
Apache Hadoop YARN 2015: Present and FutureApache Hadoop YARN 2015: Present and Future
Apache Hadoop YARN 2015: Present and FutureDataWorks Summit
 
Hadoop And Their Ecosystem ppt
 Hadoop And Their Ecosystem ppt Hadoop And Their Ecosystem ppt
Hadoop And Their Ecosystem pptsunera pathan
 
Hadoop And Their Ecosystem
 Hadoop And Their Ecosystem Hadoop And Their Ecosystem
Hadoop And Their Ecosystemsunera pathan
 
Intro to Big Data Analytics using Apache Spark and Apache Zeppelin
Intro to Big Data Analytics using Apache Spark and Apache ZeppelinIntro to Big Data Analytics using Apache Spark and Apache Zeppelin
Intro to Big Data Analytics using Apache Spark and Apache ZeppelinAlex Zeltov
 
SoCal BigData Day
SoCal BigData DaySoCal BigData Day
SoCal BigData DayJohn Park
 
M. Florence Dayana - Hadoop Foundation for Analytics.pptx
M. Florence Dayana - Hadoop Foundation for Analytics.pptxM. Florence Dayana - Hadoop Foundation for Analytics.pptx
M. Florence Dayana - Hadoop Foundation for Analytics.pptxDr.Florence Dayana
 

Similar a Hadoop World 2011: Radoop: a Graphical Analytics Tool for Big Data - Gabor Makrai, Radoop (20)

Hadoop pycon2011uk
Hadoop pycon2011ukHadoop pycon2011uk
Hadoop pycon2011uk
 
Hadoop In Action
Hadoop In ActionHadoop In Action
Hadoop In Action
 
Dallas TDWI Meeting Dec. 2012: Hadoop
Dallas TDWI Meeting Dec. 2012: HadoopDallas TDWI Meeting Dec. 2012: Hadoop
Dallas TDWI Meeting Dec. 2012: Hadoop
 
YARN - Strata 2014
YARN - Strata 2014YARN - Strata 2014
YARN - Strata 2014
 
Hortonworks - What's Possible with a Modern Data Architecture?
Hortonworks - What's Possible with a Modern Data Architecture?Hortonworks - What's Possible with a Modern Data Architecture?
Hortonworks - What's Possible with a Modern Data Architecture?
 
Hortonworks and Red Hat Webinar - Part 2
Hortonworks and Red Hat Webinar - Part 2Hortonworks and Red Hat Webinar - Part 2
Hortonworks and Red Hat Webinar - Part 2
 
Introduction to BIg Data and Hadoop
Introduction to BIg Data and HadoopIntroduction to BIg Data and Hadoop
Introduction to BIg Data and Hadoop
 
Apache Hadoop YARN: Past, Present and Future
Apache Hadoop YARN: Past, Present and FutureApache Hadoop YARN: Past, Present and Future
Apache Hadoop YARN: Past, Present and Future
 
INTRODUCTION TO BIG DATA HADOOP
INTRODUCTION TO BIG DATA HADOOPINTRODUCTION TO BIG DATA HADOOP
INTRODUCTION TO BIG DATA HADOOP
 
201305 hadoop jpl-v3
201305 hadoop jpl-v3201305 hadoop jpl-v3
201305 hadoop jpl-v3
 
Hadoop Summit San Jose 2015: YARN - Past, Present and Future
Hadoop Summit San Jose 2015: YARN - Past, Present and FutureHadoop Summit San Jose 2015: YARN - Past, Present and Future
Hadoop Summit San Jose 2015: YARN - Past, Present and Future
 
Hadoop Eco system
Hadoop Eco systemHadoop Eco system
Hadoop Eco system
 
Open-BDA Hadoop Summit 2014 - Mr. Slim Baltagi (Building a Modern Data Archit...
Open-BDA Hadoop Summit 2014 - Mr. Slim Baltagi (Building a Modern Data Archit...Open-BDA Hadoop Summit 2014 - Mr. Slim Baltagi (Building a Modern Data Archit...
Open-BDA Hadoop Summit 2014 - Mr. Slim Baltagi (Building a Modern Data Archit...
 
Hadoop Summit Europe 2015 - YARN Present and Future
Hadoop Summit Europe 2015 - YARN Present and FutureHadoop Summit Europe 2015 - YARN Present and Future
Hadoop Summit Europe 2015 - YARN Present and Future
 
Apache Hadoop YARN 2015: Present and Future
Apache Hadoop YARN 2015: Present and FutureApache Hadoop YARN 2015: Present and Future
Apache Hadoop YARN 2015: Present and Future
 
Hadoop And Their Ecosystem ppt
 Hadoop And Their Ecosystem ppt Hadoop And Their Ecosystem ppt
Hadoop And Their Ecosystem ppt
 
Hadoop And Their Ecosystem
 Hadoop And Their Ecosystem Hadoop And Their Ecosystem
Hadoop And Their Ecosystem
 
Intro to Big Data Analytics using Apache Spark and Apache Zeppelin
Intro to Big Data Analytics using Apache Spark and Apache ZeppelinIntro to Big Data Analytics using Apache Spark and Apache Zeppelin
Intro to Big Data Analytics using Apache Spark and Apache Zeppelin
 
SoCal BigData Day
SoCal BigData DaySoCal BigData Day
SoCal BigData Day
 
M. Florence Dayana - Hadoop Foundation for Analytics.pptx
M. Florence Dayana - Hadoop Foundation for Analytics.pptxM. Florence Dayana - Hadoop Foundation for Analytics.pptx
M. Florence Dayana - Hadoop Foundation for Analytics.pptx
 

Más de Cloudera, Inc.

Partner Briefing_January 25 (FINAL).pptx
Partner Briefing_January 25 (FINAL).pptxPartner Briefing_January 25 (FINAL).pptx
Partner Briefing_January 25 (FINAL).pptxCloudera, Inc.
 
Cloudera Data Impact Awards 2021 - Finalists
Cloudera Data Impact Awards 2021 - Finalists Cloudera Data Impact Awards 2021 - Finalists
Cloudera Data Impact Awards 2021 - Finalists Cloudera, Inc.
 
2020 Cloudera Data Impact Awards Finalists
2020 Cloudera Data Impact Awards Finalists2020 Cloudera Data Impact Awards Finalists
2020 Cloudera Data Impact Awards FinalistsCloudera, Inc.
 
Edc event vienna presentation 1 oct 2019
Edc event vienna presentation 1 oct 2019Edc event vienna presentation 1 oct 2019
Edc event vienna presentation 1 oct 2019Cloudera, Inc.
 
Machine Learning with Limited Labeled Data 4/3/19
Machine Learning with Limited Labeled Data 4/3/19Machine Learning with Limited Labeled Data 4/3/19
Machine Learning with Limited Labeled Data 4/3/19Cloudera, Inc.
 
Data Driven With the Cloudera Modern Data Warehouse 3.19.19
Data Driven With the Cloudera Modern Data Warehouse 3.19.19Data Driven With the Cloudera Modern Data Warehouse 3.19.19
Data Driven With the Cloudera Modern Data Warehouse 3.19.19Cloudera, Inc.
 
Introducing Cloudera DataFlow (CDF) 2.13.19
Introducing Cloudera DataFlow (CDF) 2.13.19Introducing Cloudera DataFlow (CDF) 2.13.19
Introducing Cloudera DataFlow (CDF) 2.13.19Cloudera, Inc.
 
Introducing Cloudera Data Science Workbench for HDP 2.12.19
Introducing Cloudera Data Science Workbench for HDP 2.12.19Introducing Cloudera Data Science Workbench for HDP 2.12.19
Introducing Cloudera Data Science Workbench for HDP 2.12.19Cloudera, Inc.
 
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19Cloudera, Inc.
 
Leveraging the cloud for analytics and machine learning 1.29.19
Leveraging the cloud for analytics and machine learning 1.29.19Leveraging the cloud for analytics and machine learning 1.29.19
Leveraging the cloud for analytics and machine learning 1.29.19Cloudera, Inc.
 
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19Cloudera, Inc.
 
Leveraging the Cloud for Big Data Analytics 12.11.18
Leveraging the Cloud for Big Data Analytics 12.11.18Leveraging the Cloud for Big Data Analytics 12.11.18
Leveraging the Cloud for Big Data Analytics 12.11.18Cloudera, Inc.
 
Modern Data Warehouse Fundamentals Part 3
Modern Data Warehouse Fundamentals Part 3Modern Data Warehouse Fundamentals Part 3
Modern Data Warehouse Fundamentals Part 3Cloudera, Inc.
 
Modern Data Warehouse Fundamentals Part 2
Modern Data Warehouse Fundamentals Part 2Modern Data Warehouse Fundamentals Part 2
Modern Data Warehouse Fundamentals Part 2Cloudera, Inc.
 
Modern Data Warehouse Fundamentals Part 1
Modern Data Warehouse Fundamentals Part 1Modern Data Warehouse Fundamentals Part 1
Modern Data Warehouse Fundamentals Part 1Cloudera, Inc.
 
Extending Cloudera SDX beyond the Platform
Extending Cloudera SDX beyond the PlatformExtending Cloudera SDX beyond the Platform
Extending Cloudera SDX beyond the PlatformCloudera, Inc.
 
Federated Learning: ML with Privacy on the Edge 11.15.18
Federated Learning: ML with Privacy on the Edge 11.15.18Federated Learning: ML with Privacy on the Edge 11.15.18
Federated Learning: ML with Privacy on the Edge 11.15.18Cloudera, Inc.
 
Analyst Webinar: Doing a 180 on Customer 360
Analyst Webinar: Doing a 180 on Customer 360Analyst Webinar: Doing a 180 on Customer 360
Analyst Webinar: Doing a 180 on Customer 360Cloudera, Inc.
 
Build a modern platform for anti-money laundering 9.19.18
Build a modern platform for anti-money laundering 9.19.18Build a modern platform for anti-money laundering 9.19.18
Build a modern platform for anti-money laundering 9.19.18Cloudera, Inc.
 
Introducing the data science sandbox as a service 8.30.18
Introducing the data science sandbox as a service 8.30.18Introducing the data science sandbox as a service 8.30.18
Introducing the data science sandbox as a service 8.30.18Cloudera, Inc.
 

Más de Cloudera, Inc. (20)

Partner Briefing_January 25 (FINAL).pptx
Partner Briefing_January 25 (FINAL).pptxPartner Briefing_January 25 (FINAL).pptx
Partner Briefing_January 25 (FINAL).pptx
 
Cloudera Data Impact Awards 2021 - Finalists
Cloudera Data Impact Awards 2021 - Finalists Cloudera Data Impact Awards 2021 - Finalists
Cloudera Data Impact Awards 2021 - Finalists
 
2020 Cloudera Data Impact Awards Finalists
2020 Cloudera Data Impact Awards Finalists2020 Cloudera Data Impact Awards Finalists
2020 Cloudera Data Impact Awards Finalists
 
Edc event vienna presentation 1 oct 2019
Edc event vienna presentation 1 oct 2019Edc event vienna presentation 1 oct 2019
Edc event vienna presentation 1 oct 2019
 
Machine Learning with Limited Labeled Data 4/3/19
Machine Learning with Limited Labeled Data 4/3/19Machine Learning with Limited Labeled Data 4/3/19
Machine Learning with Limited Labeled Data 4/3/19
 
Data Driven With the Cloudera Modern Data Warehouse 3.19.19
Data Driven With the Cloudera Modern Data Warehouse 3.19.19Data Driven With the Cloudera Modern Data Warehouse 3.19.19
Data Driven With the Cloudera Modern Data Warehouse 3.19.19
 
Introducing Cloudera DataFlow (CDF) 2.13.19
Introducing Cloudera DataFlow (CDF) 2.13.19Introducing Cloudera DataFlow (CDF) 2.13.19
Introducing Cloudera DataFlow (CDF) 2.13.19
 
Introducing Cloudera Data Science Workbench for HDP 2.12.19
Introducing Cloudera Data Science Workbench for HDP 2.12.19Introducing Cloudera Data Science Workbench for HDP 2.12.19
Introducing Cloudera Data Science Workbench for HDP 2.12.19
 
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
 
Leveraging the cloud for analytics and machine learning 1.29.19
Leveraging the cloud for analytics and machine learning 1.29.19Leveraging the cloud for analytics and machine learning 1.29.19
Leveraging the cloud for analytics and machine learning 1.29.19
 
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
 
Leveraging the Cloud for Big Data Analytics 12.11.18
Leveraging the Cloud for Big Data Analytics 12.11.18Leveraging the Cloud for Big Data Analytics 12.11.18
Leveraging the Cloud for Big Data Analytics 12.11.18
 
Modern Data Warehouse Fundamentals Part 3
Modern Data Warehouse Fundamentals Part 3Modern Data Warehouse Fundamentals Part 3
Modern Data Warehouse Fundamentals Part 3
 
Modern Data Warehouse Fundamentals Part 2
Modern Data Warehouse Fundamentals Part 2Modern Data Warehouse Fundamentals Part 2
Modern Data Warehouse Fundamentals Part 2
 
Modern Data Warehouse Fundamentals Part 1
Modern Data Warehouse Fundamentals Part 1Modern Data Warehouse Fundamentals Part 1
Modern Data Warehouse Fundamentals Part 1
 
Extending Cloudera SDX beyond the Platform
Extending Cloudera SDX beyond the PlatformExtending Cloudera SDX beyond the Platform
Extending Cloudera SDX beyond the Platform
 
Federated Learning: ML with Privacy on the Edge 11.15.18
Federated Learning: ML with Privacy on the Edge 11.15.18Federated Learning: ML with Privacy on the Edge 11.15.18
Federated Learning: ML with Privacy on the Edge 11.15.18
 
Analyst Webinar: Doing a 180 on Customer 360
Analyst Webinar: Doing a 180 on Customer 360Analyst Webinar: Doing a 180 on Customer 360
Analyst Webinar: Doing a 180 on Customer 360
 
Build a modern platform for anti-money laundering 9.19.18
Build a modern platform for anti-money laundering 9.19.18Build a modern platform for anti-money laundering 9.19.18
Build a modern platform for anti-money laundering 9.19.18
 
Introducing the data science sandbox as a service 8.30.18
Introducing the data science sandbox as a service 8.30.18Introducing the data science sandbox as a service 8.30.18
Introducing the data science sandbox as a service 8.30.18
 

Último

ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProduct Anonymous
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc
 
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Orbitshub
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistandanishmna97
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...Zilliz
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MIND CTI
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...apidays
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...apidays
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024The Digital Insurer
 
Spring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUKSpring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUKJago de Vreede
 
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Angeliki Cooney
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobeapidays
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdfSandro Moreira
 
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...apidays
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAndrey Devyatkin
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Jeffrey Haguewood
 
Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024The Digital Insurer
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century educationjfdjdjcjdnsjd
 

Último (20)

ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistan
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024
 
Spring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUKSpring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUK
 
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 

Hadoop World 2011: Radoop: a Graphical Analytics Tool for Big Data - Gabor Makrai, Radoop

  • 1. Radoop: A Graphical Analytics Tool for Big Data Gabor Makrai, CTO, Radoop
  • 2. Who we are • Active members of a Data Mining Research Group in Europe • We started using Hadoop two years ago • We are using basic Hadoop, Hive, and Mahout 11/9/2011 HadoopWorld 2011 2
  • 3. Data mining tools • Closed software – SAS Enterprise Miner – IBM SPSS Modeler • Open-source software – Rapid-I RapidMiner –R • Graphical user interface • Data-flow structure • Adaptability is important 11/9/2011 HadoopWorld 2011 3
  • 4. Hadoop vs. Data mining tools Hadoop Data mining tools 11/9/2011 HadoopWorld 2011 4
  • 5. Why is it important? • Barrier to entry for Hadoop – Using Hadoop without expert Hadoop knowledge • Develop time vs. running time • User-friendly graphical interface – Program readability 11/9/2011 HadoopWorld 2011 5
  • 6. RapidMiner • The most used data mining tool in 2010* • Open-source software • Supports extensions • Data-flow structure • Marketplace • * http://www.kdnuggets.com/ 11/9/2011 HadoopWorld 2011 6
  • 7. Radoop architecture 11/9/2011 HadoopWorld 2011 7
  • 8. Implementation difficulties RapidMiner and Hive data types RapidMiner Hive • Nominal • TINYINT – Text • SMALLINT – Polynominal • INT – Binominal • BIGINT • Numeric • BOOLEAN – Integer – Real • FLOAT • Date and time • DOUBLE – Date • STRING – Time 11/9/2011 HadoopWorld 2011 8
  • 9. Implementation difficulties • Input data restrictions for Mahout – Conversion between Hive and Mahout • Mahout needs data in special format – Data must be stored in VectorWritable class • Hive can export data – Plain text or Sequence file format • Solution: simple MapReduce jobs – Convert exported plain text Hive table to VectorWritable format and vica versa 11/9/2011 HadoopWorld 2011 9
  • 10. Implementation difficulties • Remote Mahout’s jobs running • Hadoop Commons and Hive handle remote connections well • At the same time, Mahout does not support remote running • Solution: modifications in the Mahout’s base source code 11/9/2011 HadoopWorld 2011 10
  • 11. Implementation status • Data imports and exports – CSV, Excel, and Database import/export • Data transformations – Most used data manipulation functions • Scalable machine learning and data mining – Clustering algorithms – Classifications 11/9/2011 HadoopWorld 2011 11
  • 12. Radoop base elements • Operator • Process 11/9/2011 HadoopWorld 2011 12
  • 13. Radoop case study 11/9/2011 HadoopWorld 2011 13
  • 14. Radoop case study 11/9/2011 HadoopWorld 2011 14
  • 15. Radoop case study 11/9/2011 HadoopWorld 2011 15
  • 16. Radoop case study Gets the Hive table 11/9/2011 HadoopWorld 2011 16
  • 17. Radoop case study Creates a new view with where statement 11/9/2011 HadoopWorld 2011 17
  • 18. Radoop case study Creates a new view with group by function 11/9/2011 HadoopWorld 2011 18
  • 19. Radoop case study Creates a new view with sort by function 11/9/2011 HadoopWorld 2011 19
  • 20. Radoop case study Creates a new view with limit 11/9/2011 HadoopWorld 2011 20
  • 21. Radoop case study Creates a new table from the last view 11/9/2011 HadoopWorld 2011 21
  • 22. Future • “We believe that more than half of the world’s data will be stored in Apache Hadoop within five years.” Hortonworks • Radoop is opening the doors for people who are less comfortable with Hadoop but want to use Hadoop for Big Data analytics 11/9/2011 HadoopWorld 2011 22
  • 23. Contacts • Gabor Makrai – makrai@radoop.eu • Webpage – http://www.radoop.eu/ • E-mail – radoop@radoop.eu • Twitter – @radoopeu 11/9/2011 HadoopWorld 2011 23