2. Like the join operations in Apache Hive, Pig, SQL and Excel, R can join
two or more datasets using the same universal
join concepts:
Inner join
Outer join – Left outer join, right outer join and full outer join
Inner join: also known as an equijoin, it returns only the rows that have a match
in both tables, based on a common key or value.
#load the datasets (file names must be quoted strings)
>join1<-read.csv("join1.csv", header = TRUE)
>join2<-read.csv("join2.csv", header = TRUE)
#join the two tables based on Transaction_ID
>joined<-merge(x=join1,y=join2,by="Transaction_ID")
>View(joined)
Joins: Inner and Outer
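The inner join above can be tried without the CSV files. The sketch below uses two small, hypothetical data frames as stand-ins for join1.csv and join2.csv; the Amount and Store columns and all values are assumptions made up for illustration.

```r
# Hypothetical stand-ins for join1.csv and join2.csv (values are made up)
join1 <- data.frame(Transaction_ID = c(101, 102, 103, 104),
                    Amount = c(250, 120, 75, 310))
join2 <- data.frame(Transaction_ID = c(102, 104, 105),
                    Store = c("North", "South", "East"))

# Inner join: by default merge() keeps only keys present in both tables
joined <- merge(x = join1, y = join2, by = "Transaction_ID")
print(joined)
```

Only transactions 102 and 104 appear in both data frames, so only those two rows survive the join.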
3. Full outer join: returns all the rows from both
tables, whether or not they match.
Rows with no match in the common key are still included, with NA in the columns that come from the other table.
Example:
>joined_full<-merge(x=join1,y=join2,by="Transaction_ID", all = TRUE)
>View(joined_full)
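As a minimal sketch of how the unmatched rows look after a full outer join, the example below again uses hypothetical data frames (the Amount and Store columns and their values are assumptions, not the contents of the original CSV files).

```r
# Hypothetical stand-ins for join1.csv and join2.csv (values are made up)
join1 <- data.frame(Transaction_ID = c(101, 102, 103, 104),
                    Amount = c(250, 120, 75, 310))
join2 <- data.frame(Transaction_ID = c(102, 104, 105),
                    Store = c("North", "South", "East"))

# Full outer join: all = TRUE keeps every key from both tables
joined_full <- merge(x = join1, y = join2, by = "Transaction_ID", all = TRUE)
print(joined_full)

# Rows without a partner are padded with NA on the other table's columns
missing_store  <- joined_full$Transaction_ID[is.na(joined_full$Store)]
missing_amount <- joined_full$Transaction_ID[is.na(joined_full$Amount)]
```

Here missing_store holds the keys found only in join1 and missing_amount the keys found only in join2, which is a quick way to see which side each unmatched row came from.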
Full Outer Join (based on ID)
Left table:
ID   Name   Dept.
202  Bob    Eng.
203  Vika   Admin
207  Ryan   IT
209  Paul   IT
Right table:
ID   Name   Dept.
202  Bob    Eng.
203  Vika   Admin
204  Chris  Med.
205  Robin  Med.
Result:
ID   Name   Dept.
202  Bob    Eng.
203  Vika   Admin
204  Chris  Med.
205  Robin  Med.
206  Ryan   IT
209  Paul   IT
Rupak Roy
4. Left outer join: returns all the rows of the left table
and only the matching rows of the right table.
For example:
#to apply a left outer join, set all.x = TRUE
>joined_left<-merge(x=join1,y=join2, by="Transaction_ID", all.x = TRUE)
>View(joined_left)
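A small sketch of the left outer join on hypothetical data (the column names Amount and Store and all values are assumptions). It also checks that every row of the left table is preserved, which holds here because each key occurs at most once in the right table.

```r
# Hypothetical stand-ins for join1.csv and join2.csv (values are made up)
join1 <- data.frame(Transaction_ID = c(101, 102, 103, 104),
                    Amount = c(250, 120, 75, 310))
join2 <- data.frame(Transaction_ID = c(102, 104, 105),
                    Store = c("North", "South", "East"))

# Left outer join: all.x = TRUE keeps every row of x (the left table)
joined_left <- merge(x = join1, y = join2, by = "Transaction_ID", all.x = TRUE)
print(joined_left)

# Every left-table row is kept; key 105 (right table only) is dropped
same_row_count <- nrow(joined_left) == nrow(join1)
```

Transactions 101 and 103 have no match in join2, so their Store value is NA in the result.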
Left Outer Join (based on ID)
Left table:
ID   Name   Dept.
202  Bob    Eng.
203  Vika   Admin
207  Ryan   IT
209  Paul   IT
Right table:
ID   Name   Dept.
202  Bob    Eng.
203  Vika   Admin
204  Chris  Med.
205  Robin  Med.
Result:
ID   Name   Dept.
202  Bob    Eng.
203  Vika   Admin
206  Ryan   IT
209  Paul   IT
5. Right outer join: the opposite of a left outer join. It returns
all the rows of the right table and only the matching rows
of the left table.
For example:
#to apply a right outer join, set all.y = TRUE
>joined_right<-merge(x=join1,y=join2, by="Transaction_ID", all.y = TRUE)
>View(joined_right)
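The sketch below illustrates the "opposite of left join" point on hypothetical data (Amount, Store and all values are assumptions): a right outer join keeps the same rows as a left outer join with the two tables swapped, differing only in column order.

```r
# Hypothetical stand-ins for join1.csv and join2.csv (values are made up)
join1 <- data.frame(Transaction_ID = c(101, 102, 103, 104),
                    Amount = c(250, 120, 75, 310))
join2 <- data.frame(Transaction_ID = c(102, 104, 105),
                    Store = c("North", "South", "East"))

# Right outer join: all.y = TRUE keeps every row of y (the right table)
joined_right <- merge(x = join1, y = join2, by = "Transaction_ID", all.y = TRUE)
print(joined_right)

# Same rows via a left outer join with the tables swapped
swapped <- merge(x = join2, y = join1, by = "Transaction_ID", all.x = TRUE)
print(swapped)
```

In both results transaction 105 is kept with NA for Amount, since it exists only in join2.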
Right Outer Join (based on ID)
Left table:
ID   Name   Dept.
202  Bob    Eng.
203  Vika   Admin
207  Ryan   IT
209  Paul   IT
Right table:
ID   Name   Dept.
202  Bob    Eng.
203  Vika   Admin
204  Chris  Med.
205  Robin  Med.
Result:
ID   Name   Dept.
202  Bob    Eng.
203  Vika   Admin
204  Chris  Med.
205  Robin  Med.
6. merge() is one of the most important functions for joining datasets.
To learn more about the arguments of merge(), run ?merge
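Two of the arguments documented under ?merge come up often in practice: by.x / by.y for key columns that are named differently in each table, and suffixes for non-key columns that share a name. The sketch below is only an illustration; the orders and refunds data frames, their column names and their values are hypothetical.

```r
# Hypothetical tables: the key is named differently and both have "Amount"
orders  <- data.frame(Transaction_ID = c(101, 102, 103),
                      Amount = c(250, 120, 75))
refunds <- data.frame(Txn = c(102, 103, 106),
                      Amount = c(20, 75, 40))

# by.x / by.y name the key column in each table;
# suffixes disambiguates the two Amount columns in the result
joined <- merge(x = orders, y = refunds,
                by.x = "Transaction_ID", by.y = "Txn",
                suffixes = c("_order", "_refund"))
print(joined)
```

The result keeps the key under the name from x (Transaction_ID) and carries the two amounts as Amount_order and Amount_refund.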
Merging Tables
7. Next:
We will see how to impute the missing values.