RHive tutorial - HDFS functions
Hive uses Hadoop's distributed file system, HDFS, to store and process data.
Thus, in order to use Hive and RHive effectively, you must be able to do
things like putting, getting, and removing big data in HDFS.
RHive provides functions that correspond to what the "hadoop fs" command
supports.
Using these functions, a user can handle HDFS from within the R environment,
without using the Hadoop CLI (command-line interface) or the Hadoop HDFS
library.
If you are more comfortable with the hadoop CLI or the Hadoop library,
it is also fine to use them.
But if you are not familiar with working from a terminal, or you work in
RStudio Server, the RHive HDFS functions should prove an easy-to-use way
for R users to handle HDFS.

Before Emulating this Example
The rhive.hdfs.* functions work after RHive has been successfully installed
and library(RHive) and rhive.connect() have been successfully executed.
Do not forget to run the following before trying the examples.

# Open R
library(RHive)
rhive.connect()
  


rhive.hdfs.connect
In order to use the RHive functions that access HDFS, a connection to HDFS
must be established.
But if the Hadoop configuration for HDFS is properly set, this connection is
made automatically when the rhive.connect function is executed, so there is
no need to call it separately.

If you need to connect to a different HDFS, you can do it like this:

rhive.hdfs.connect("hdfs://10.1.1.1:9000")
[1] "Java-Object{DFS[DFSClient[clientName=DFSClient_630489789, ugi=root]]}"
  
The connection will fail to be established if you do not supply the exact
hostname and port number of the HDFS service.
Ask your system administrator if you do not have this information.
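
To confirm that a manual connection works, a quick sanity check is to list
the HDFS root immediately after connecting. A minimal sketch (the host and
port are illustrative):

rhive.hdfs.connect("hdfs://10.1.1.1:9000")
rhive.hdfs.ls("/")   # should return the root listing without error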

rhive.hdfs.ls
This does the same thing as "hadoop fs -ls" and is used like this:

rhive.hdfs.ls("/")
   permission owner      group   length      modify-time        file
1   rwxr-xr-x  root supergroup        0 2011-12-07 14:27    /airline
2   rwxr-xr-x  root supergroup        0 2011-12-07 13:16 /benchmarks
3   rw-r--r--  root supergroup 11186419 2011-12-06 03:59   /messages
4   rwxr-xr-x  root supergroup        0 2011-12-07 22:05        /mnt
5   rwxr-xr-x  root supergroup        0 2011-12-13 20:24      /rhive
6   rwxr-xr-x  root supergroup        0 2011-12-07 20:19        /tmp
7   rwxr-xr-x  root supergroup        0 2011-12-14 01:14       /user
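
Because the listing prints like an R data frame, it can be post-processed
with ordinary R functions. A minimal sketch, assuming rhive.hdfs.ls()
returns the listing above as a data.frame with numeric length values:

# Keep only non-empty files and sort them by size, largest first.
listing <- rhive.hdfs.ls("/")
files <- listing[listing$length > 0, ]
files[order(files$length, decreasing = TRUE), c("file", "length")]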
  

This is the same as the following command using the Hadoop CLI:

hadoop fs -ls /
  


rhive.hdfs.get
The rhive.hdfs.get function brings data from HDFS to the local file system.
It works in the same way as "hadoop fs -get".
The next example takes the messages data in HDFS, saves it to /tmp/messages
on the local system, and then checks the number of records.

rhive.hdfs.get("/messages", "/tmp/messages")
[1] TRUE
system("wc -l /tmp/messages")
145889 /tmp/messages
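
Once the file is local, it can be inspected with ordinary R functions.
For example, peeking at the first few records of the copy fetched above:

# readLines() is base R; the path matches the example above.
readLines("/tmp/messages", n = 5)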
  


rhive.hdfs.put
The rhive.hdfs.put function uploads local data to HDFS.
It works like "hadoop fs -put" and is the opposite of rhive.hdfs.get.
The following example uploads /tmp/messages on the local system to
/messages_new in HDFS.

rhive.hdfs.put("/tmp/messages", "/messages_new")
rhive.hdfs.ls("/")
   permission owner      group   length      modify-time          file
1   rwxr-xr-x  root supergroup        0 2011-12-07 14:27      /airline
2   rwxr-xr-x  root supergroup        0 2011-12-07 13:16   /benchmarks
3   rw-r--r--  root supergroup 11186419 2011-12-06 03:59     /messages
4   rw-r--r--  root supergroup 11186419 2011-12-14 02:02 /messages_new
5   rwxr-xr-x  root supergroup        0 2011-12-07 22:05          /mnt
6   rwxr-xr-x  root supergroup        0 2011-12-13 20:24        /rhive
7   rwxr-xr-x  root supergroup        0 2011-12-14 01:14         /user
  

You can see that a new file, "/messages_new", now appears in HDFS.
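
The same function lets you move data created inside R into HDFS: write it to
a local file first, then upload that file. A minimal sketch with
illustrative paths:

# Write a small data frame to a local CSV, then push it into HDFS.
write.csv(head(iris), "/tmp/iris_sample.csv", row.names = FALSE)
rhive.hdfs.put("/tmp/iris_sample.csv", "/iris_sample.csv")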

rhive.hdfs.rm
This does the same thing as "hadoop fs -rm", deleting files in HDFS.
rhive.hdfs.rm("/messages_new")
rhive.hdfs.ls("/")
   permission owner      group   length      modify-time        file
1   rwxr-xr-x  root supergroup        0 2011-12-07 14:27    /airline
2   rwxr-xr-x  root supergroup        0 2011-12-07 13:16 /benchmarks
3   rw-r--r--  root supergroup 11186419 2011-12-06 03:59   /messages
4   rwxr-xr-x  root supergroup        0 2011-12-07 22:05        /mnt
5   rwxr-xr-x  root supergroup        0 2011-12-13 20:24      /rhive
6   rwxr-xr-x  root supergroup        0 2011-12-14 01:14       /user
  

You can see that the "/messages_new" file has been deleted from HDFS.
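
You can also check this programmatically instead of reading the listing by
eye. A minimal sketch, assuming rhive.hdfs.ls() returns a data.frame with
the file column shown above:

listing <- rhive.hdfs.ls("/")
"/messages_new" %in% listing$file   # FALSE after the delete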

rhive.hdfs.rename
This does the same thing as "hadoop fs -mv".
That is, it renames files in HDFS or moves files and directories.

rhive.hdfs.rename("/messages", "/messages_renamed")
[1] TRUE
rhive.hdfs.ls("/")
   permission owner      group   length      modify-time              file
1   rwxr-xr-x  root supergroup        0 2011-12-07 14:27          /airline
2   rwxr-xr-x  root supergroup        0 2011-12-07 13:16       /benchmarks
3   rw-r--r--  root supergroup 11186419 2011-12-06 03:59 /messages_renamed
4   rwxr-xr-x  root supergroup        0 2011-12-07 22:05              /mnt
5   rwxr-xr-x  root supergroup        0 2011-12-13 20:24            /rhive
6   rwxr-xr-x  root supergroup        0 2011-12-14 01:14             /user
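
Because it behaves like "hadoop fs -mv", rhive.hdfs.rename can also move a
file into an existing directory instead of renaming it in place. A minimal
sketch with illustrative paths:

# Move the renamed file under /tmp, then move it back.
rhive.hdfs.rename("/messages_renamed", "/tmp/messages_renamed")
rhive.hdfs.rename("/tmp/messages_renamed", "/messages_renamed")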
  




rhive.hdfs.exists
This checks whether a file exists within HDFS. There is no hadoop command
that serves as a direct counterpart.

rhive.hdfs.exists("/messages_renamed")
[1] TRUE
rhive.hdfs.exists("/foobar")
[1] FALSE
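
A typical use of rhive.hdfs.exists is guarding operations that would
otherwise fail or be destructive. A minimal sketch, with an illustrative
path:

# Delete the path only if it is actually there.
path <- "/messages_new"
if (rhive.hdfs.exists(path)) {
    rhive.hdfs.rm(path)
}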
  


rhive.hdfs.mkdirs
This does the same thing as "hadoop fs -mkdir".
It creates directories in HDFS, creating intermediate subdirectories as
needed.

rhive.hdfs.mkdirs("/newdir/newsubdir")
[1] TRUE
rhive.hdfs.ls("/")
   permission owner      group   length      modify-time              file
1   rwxr-xr-x  root supergroup        0 2011-12-07 14:27          /airline
2   rwxr-xr-x  root supergroup        0 2011-12-07 13:16       /benchmarks
3   rw-r--r--  root supergroup 11186419 2011-12-06 03:59 /messages_renamed
4   rwxr-xr-x  root supergroup        0 2011-12-07 22:05              /mnt
5   rwxr-xr-x  root supergroup        0 2011-12-14 02:13           /newdir
6   rwxr-xr-x  root supergroup        0 2011-12-13 20:24            /rhive
7   rwxr-xr-x  root supergroup        0 2011-12-14 01:14             /user

rhive.hdfs.ls("/newdir")
   permission owner      group length      modify-time              file
1   rwxr-xr-x  root supergroup      0 2011-12-14 02:13 /newdir/newsubdir
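
A common pattern is to create a directory first and then upload data into
it. A sketch combining the functions above (paths are illustrative):

rhive.hdfs.mkdirs("/newdir/logs")
rhive.hdfs.put("/tmp/messages", "/newdir/logs/messages")
rhive.hdfs.ls("/newdir/logs")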
  


rhive.hdfs.close
This closes the connection when you have finished working with HDFS and no
longer need it.

rhive.hdfs.close()
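
Putting it all together, a typical RHive HDFS session looks like this
sketch (paths are illustrative):

library(RHive)
rhive.connect()                                    # also connects to HDFS when configured
rhive.hdfs.put("/tmp/messages", "/messages_new")   # upload a local file
rhive.hdfs.ls("/")                                 # inspect the result
rhive.hdfs.rm("/messages_new")                     # clean up
rhive.hdfs.close()                                 # close the HDFS connection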
  
