How can R and Hadoop be used together
1. Pingax
Big Data Analytics with R and Hadoop
http://pingax.com
How can R and Hadoop be used together?
Author : Vignesh Prajapati
Categories : Hadoop, Machine Learning, R
Tagged as : Hadoop, Machine Learning, R
Date : November 26, 2013
Inspired by this Quora question, I started looking into how R and Hadoop can be integrated
and used together. After a thorough verification process, I finally arrived at the possible ways
to use R and Hadoop together for performing Big Data Analytics.
This blog post is written to help Data Scientists, Data Engineers and Data Analysts who want
a solution for running Machine Learning applications on larger datasets, so I would like to
suggest some refined ways to make that possible. I assume here that you want to run Machine
Learning algorithms (Coursera - join the well-known online course by Professor Andrew Ng)
over a large dataset that runs into memory issues on a single machine.
With these integrations, R users are not required to learn a new language, e.g., Java, or a new
environment, e.g., cluster software and hardware, to work with Hadoop. Moreover, functionality
from open source R packages can be used when writing mapper and reducer functions.
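To make the idea of mapper and reducer functions concrete, here is a minimal sketch of how a computation can be split so that no single step needs the whole dataset in memory. It is written in Python rather than R purely so that it is easy to run anywhere, and it runs on one machine instead of a cluster. The names mapper, reducer, chunked and mapreduce_mean are illustrative assumptions, not part of Hadoop or of any R package mentioned in this post.

```python
from functools import reduce


def mapper(chunk):
    """Summarise one chunk as (sum, count); only this chunk has to fit in memory."""
    return (sum(chunk), len(chunk))


def reducer(left, right):
    """Combine two partial (sum, count) summaries into one."""
    return (left[0] + right[0], left[1] + right[1])


def chunked(values, size):
    """Yield consecutive slices of `values`, each with at most `size` items."""
    for start in range(0, len(values), size):
        yield values[start:start + size]


def mapreduce_mean(values, chunk_size):
    """Compute the mean by mapping over chunks and reducing the partial results."""
    partials = map(mapper, chunked(values, chunk_size))
    total, count = reduce(reducer, partials, (0, 0))
    if count == 0:
        raise ValueError("cannot take the mean of an empty dataset")
    return total / count


if __name__ == "__main__":
    data = list(range(1, 11))                   # the numbers 1..10
    print(mapreduce_mean(data, chunk_size=3))   # prints 5.5
```

On a real cluster, each chunk would be a block of a file stored in HDFS and the mapper calls would run on different machines; the sketch only shows why the work can be divided that way.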
As the popularity of the combined R and Hadoop platform keeps increasing, I think Big Data
Analytics can become an emerging trend. With the help of this parallel data analytics platform,
large organizations can easily derive useful insights and gain ever greater advantages from
Big Data Analytics.
Let's look at an outline of the ways R and Hadoop can be integrated to scale data analytics
up to Big Data Analytics. They are as given below:
1. RHadoop
2. RHIPE
3. ORCH
4. HadoopStreaming (R package)
5. Hadoop Streaming (HadoopStreaming Utility)
Now let's have a discussion of real-world test cases with popular Hadoop tools. To explain
how this is possible, I am going to use various R and Hadoop tools, so let's first check a list
of useful software that can be used.
We need the following data-driven tools, grouped by technology:
1. Linux-based operating system
   1. Ubuntu
      - Ubuntu is fast, secure and stylishly simple; the Ubuntu operating system is
        used by 20 million people worldwide every day.
   2. CentOS
      - CentOS is an Enterprise-class Linux distribution derived from sources freely
        provided to the public by a prominent North American Enterprise Linux vendor.
   3. Red Hat
2. R
   1. R - the R programming language, for working with Machine Learning concepts
   2. RStudio - a well-known IDE for R
3. Hadoop
   1. Hadoop
      - Hadoop is an open source Big Data solution. Since it is a little bit hard to install
        Hadoop with its components, I would suggest trying a Hadoop distribution
        provided by Hortonworks, Cloudera or MapR, or the Amazon EMR service.
There are possibly five ways to use R and Hadoop together. Let's look ahead at R and
Hadoop integration:
1. RHadoop - RHadoop is a great open source solution for R and Hadoop provided by
Revolution Analytics. RHadoop is bundled with four main R packages to manage and
analyze data with the Hadoop framework.
2. RHIPE - RHIPE is the R and Hadoop Integrated Programming Environment, specially
designed around Divide and Recombine (D&R) techniques for analyzing large datasets.
3. ORCH - ORCH is the Oracle R Connector for Hadoop. ORCH can be used on the Oracle Big
Data Appliance or on non-Oracle Hadoop clusters.
4. HadoopStreaming - HadoopStreaming is an R package, available on CRAN, that provides
utilities for using R scripts with Hadoop Streaming. It was developed by David S. Rosenberg
with the aim of making Hadoop Streaming as easy as possible for R users.
5. Hadoop Streaming - Hadoop Streaming is a Hadoop utility that allows users to develop
and run MapReduce programs in languages other than Java.
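As a rough illustration of the last item, Hadoop Streaming runs any executable as a mapper or reducer, as long as it reads lines from standard input and writes tab-separated key/value lines to standard output. The word count below is a minimal sketch of that contract in Python, chosen only because it is easy to run locally; an R script would follow the same stdin/stdout pattern. The function names map_lines and reduce_lines are hypothetical, and the local demo imitates Hadoop's sort between the two phases with sorted().

```python
from itertools import groupby


def map_lines(lines):
    """Mapper: emit one 'word<TAB>1' line for every word in the input lines."""
    for line in lines:
        for word in line.split():
            yield f"{word.lower()}\t1"


def reduce_lines(sorted_lines):
    """Reducer: sum the counts per word; assumes the input is sorted by key."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for word, group in groupby(pairs, key=lambda pair: pair[0]):
        total = sum(int(count) for _, count in group)
        yield f"{word}\t{total}"


if __name__ == "__main__":
    # Local stand-in for a streaming job: map, sort by key, then reduce.
    sample = ["R and Hadoop", "Hadoop and R together"]
    for output_line in reduce_lines(sorted(map_lines(sample))):
        print(output_line)
```

In an actual job, the mapper and reducer would be two separate scripts that loop over standard input, and they would be passed to the streaming jar through its -mapper and -reducer options. The location of that jar depends on the Hadoop distribution.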
In my next blog posts, I will write about how Machine Learning can be performed with the
R and Hadoop Big Data platform. If you want me to write about particular tools and
technologies that can be used for the same, let me know.