Process and Visualize Your Data with Revolution R, Hadoop and GoogleVis

1. Quick Housekeeping Rules
• Q&A panel is available if you have any questions during the
webinar
• There will be time for Q&A at the end
• We will record the webinar for future viewing
• All attendees will receive a copy of the slides and the recording
Page 1
© Hortonworks Inc. 2013
2. Hadoop, R, and Google Chart Tools
Data Visualization for Application Developers
Jeff Markham
Solution Engineer
jmarkham@hortonworks.com
3. Agenda
• Introductions
• Use Case Description
• Preparation
• Demo
• Review
• Q&A
4. Use Case Description
• Visualizing data
• Tools vs. application development
• Choosing the technology
• Hortonworks Data Platform
• RHadoop
• Google Charts
5. Preparation: Install HDP
[HDP architecture diagram: Hadoop core (HDFS, MapReduce; YARN in 2.0) with
operational services (Ambari, Oozie) and data services (Flume, Sqoop, WebHDFS,
Hive, Pig, HBase, HCatalog), plus platform services for enterprise readiness:
HA, DR, snapshots, security. Runs on OS, cloud, VM, or appliance.]
• The ONLY 100% open source and complete distribution
• Enterprise grade, proven and tested at scale
• Ecosystem endorsed to ensure interoperability
6. Preparation: Install R
• Install R language
• Install appropriate packages
– rhdfs
– rmr2
– googleVis
– shiny
– Dependencies for all above
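The installation steps above can be sketched as follows. The CRAN package names are real; rmr2 and rhdfs are distributed outside CRAN by the RHadoop project, so the tarball names below are placeholders, and the HADOOP_CMD path is an assumption matching the output shown later in the deck:

```r
# googleVis and shiny (and their dependencies) come from CRAN
install.packages(c("googleVis", "shiny"))

# rmr2 and rhdfs are not on CRAN; they are installed from downloaded
# source tarballs, e.g. (file names are placeholders):
# install.packages("rmr2_<version>.tar.gz", repos = NULL, type = "source")
# install.packages("rhdfs_<version>.tar.gz", repos = NULL, type = "source")

# rmr2/rhdfs need to know where the Hadoop binary lives
Sys.setenv(HADOOP_CMD = "/usr/bin/hadoop")
```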
7. Preparation
• rmr2
– Functions to allow for MapReduce in R apps
• rhdfs
– Functions allowing HDFS access in R apps
• googleVis
– Use of Google Chart Tools in R apps
• shiny
– Interactive web apps for R developers
8. Demo Walkthrough
Using Hadoop, R, and Google Chart Tools
9. Visualization Use Case
• Data from CDC
– Vital statistics publicly available data
– 2010 US birth data file
SAMPLE RECORD
S 201001 7 2 2 30105
2 011 06 1 123 3405 1 06 01 2 2
0321 1006 314 2000 2 222 22
2 2 2 122222 11 3 094 1 M 04 200940 39072 3941
083 22 2 2 22 110 110 00
0000000 00 000000000 000000 000 000000000000000000011
101 1 111 10 1 1 1 111111 11 1 1 11
source: http://www.cdc.gov/nchs/data_access/vitalstatsonline.htm
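Each record is fixed-width: every field lives at a known column offset, which is why the MapReduce script later reads columns 89–90 for mother's age. A minimal sketch of fixed-width extraction in R (the toy record below is illustrative, not the real CDC layout):

```r
# A fixed-width record is just a long string; build a toy record with
# "31" placed at columns 89-90 (padding lengths are illustrative)
record <- paste0(strrep(" ", 88), "31", strrep(" ", 10))

# Extract characters 89-90 and convert to an integer, exactly as the
# mapper shown later does for mother's age
mothers.age <- as.integer(substr(record, 89, 90))
mothers.age  # 31
```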
10. Visualization Use Case
• Put data into HDFS
– Create input directory
– Put data into input directory
CREATE HDFS DIR
> hadoop fs -mkdir /user/jeff/natality

PUT DATA INTO HDFS
> hadoop fs -put ~/VS2010NATL.DETAILUS.DAT /user/jeff/natality/
11. Visualization Use Case
• Write R script
– Specify use of RHadoop packages
– Initialize HDFS
– Specify data input and output location
R SCRIPT
#!/usr/bin/env Rscript
require('rmr2')
require('rhdfs')
hdfs.init()
hdfs.data.root = 'natality'
hdfs.data = file.path(hdfs.data.root, 'VS2010NATL.DETAILUS.DAT')
hdfs.out.root = hdfs.data.root
hdfs.out = file.path(hdfs.out.root, 'out')
...
12. Visualization Use Case
• Write R script
– Write mapper function
– Write reducer function
R SCRIPT
...
mapper = function(k, fields) {
  keyval(as.integer(substr(fields, 89, 90)), 1)
}
reducer = function(key, vv) {
  # count values for each key
  keyval(key, sum(as.numeric(vv), na.rm=TRUE))
}
...
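What the mapper and reducer compute together — a count of births per mother's age — can be simulated locally in plain R, without Hadoop, to sanity-check the logic (the ages below are made up for illustration):

```r
# Hypothetical keys the mapper would emit from six records;
# the mapper pairs each age with the value 1
ages <- c(25L, 30L, 25L, 41L, 30L, 25L)

# The reducer sums the 1s per key; tapply() does the same grouping
# and summing in-memory
counts <- tapply(rep(1, length(ages)), ages, sum)
counts
#  25 30 41
#   3  2  1
```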
13. Visualization Use Case
• Write R script
– Write job function
R SCRIPT
...
job = function(input, output) {
  mapreduce(input = input,
            output = output,
            input.format = "text",
            map = mapper,
            reduce = reducer,
            combine = T)
}
...
14. Visualization Use Case
• Write R script
– Write result to HDFS output directory
R SCRIPT
...
out = from.dfs(job(hdfs.data, hdfs.out))
results.df = as.data.frame(out, stringsAsFactors=F)
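from.dfs() pulls the job's key/value pairs back into the R session, so the resulting data frame has one row per mother's age. A mock of the expected shape (the counts below are invented, not real CDC figures), including the sort step you typically want before charting, since MapReduce output order is not guaranteed:

```r
# Mock of the key/value pairs the job returns: key = mother's age,
# val = number of births (values are illustrative only)
results.df <- data.frame(key = c(30L, 25L, 41L),
                         val = c(180234, 220511, 9021))

# Sort by age so the chart's x-axis is ordered
results.df <- results.df[order(results.df$key), ]
results.df$key  # 25 30 41
```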
15. Visualization Use Case
• Create Shiny application
– Create directory
– Create ui.R
– Create server.R
SHINY APP DIR
> mkdir ~/my-shiny-app
16. Visualization Use Case
• Create Shiny application
– Create ui.R
UI.R SOURCE
shinyUI(pageWithSidebar(
  # Application title
  headerPanel("2010 US Births"),
  sidebarPanel(...),
  mainPanel(
    tabsetPanel(
      tabPanel("Line Chart", htmlOutput("lineChart")),
      tabPanel("Column Chart", htmlOutput("columnChart"))
    )
  )
))
17. Visualization Use Case
• Create Shiny application
– Create server.R
SERVER.R SOURCE
library(googleVis)
library(shiny)
library(rmr2)
library(rhdfs)
hdfs.init()
hdfs.data.root = 'natality'
hdfs.data = file.path(hdfs.data.root, 'out')
df = as.data.frame(from.dfs(hdfs.data))
...
18. Visualization Use Case
• Create Shiny application
– Create server.R
SERVER.R SOURCE
...
shinyServer(function(input, output) {
  output$lineChart <- renderGvis({
    gvisLineChart(df, options=list(
      vAxis="{title:'Number of Births'}",
      hAxis="{title:'Age of Mother'}",
      legend="none"
    ))
  })
...
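The ui.R shown earlier also declares a "Column Chart" tab, whose renderer the slides elide. A sketch of what that function would look like, assumed by mirroring the lineChart renderer above with googleVis's gvisColumnChart (this fragment continues inside shinyServer and is not shown in the original deck):

```r
# Assumed counterpart to output$lineChart, mirroring its options;
# gvisColumnChart is the googleVis column-chart wrapper
  output$columnChart <- renderGvis({
    gvisColumnChart(df, options=list(
      vAxis="{title:'Number of Births'}",
      hAxis="{title:'Age of Mother'}",
      legend="none"
    ))
  })
})
```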
19. Visualization Use Case
• Run Shiny application
RUN SHINY APP
> shiny::runApp('~/my-shiny-app')
Loading required package: shiny
Welcome to googleVis version 0.4.0
...
HADOOP_CMD=/usr/bin/hadoop
Be sure to run hdfs.init()
Listening on port 8100
21. Live Demo
Using Hadoop, R, and Google Chart Tools
22. Visualization Use Case
• Architecture recap
– Analyze data sets with R on Hadoop
– Choose RHadoop packages
– Visualize data with Google Chart Tools via googleVis package
– Render googleVis output in Shiny applications
• Architecture next steps
– Integrate Shiny application into existing web apps
– Create further data models with R
23. HDP: Enterprise Hadoop Distribution
[HDP architecture diagram, repeated from slide 5: Hadoop core (HDFS, MapReduce;
YARN in 2.0) with operational services (Ambari, Oozie) and data services
(Flume, Sqoop, WebHDFS, Hive, Pig, HBase, HCatalog), plus platform services
for enterprise readiness: HA, DR, snapshots, security. Runs on OS, cloud, VM,
or appliance.]
• The ONLY 100% open source and complete distribution
• Enterprise grade, proven and tested at scale
• Ecosystem endorsed to ensure interoperability
Speaker notes
• Hi, I'm Jeff Markham and I wanted to talk today about…
• Agenda points
• Describe the use case and how to choose the tech
• Start by installing HDP
• Install R and dependencies
• Go into more detail on the R packages
• Walk through the demo before actually doing the demo
• Describe the data set
• Start with the very beginning: getting the downloaded data into Hadoop
• Start explaining the R script. Kick it off with an explanation of the RHadoop packages and what they're doing
• Explain the mapper and reducer functions
• Explain the job function
• Wrap up with showing where the data lands
• Show how to create the Shiny app. Start with creating the directory.
• This is the entirety of the Shiny UI. Help text in the sidebar is omitted for real estate.
• Explain the server.R code. Note the imports of the relevant R packages.
• Move to one of the functions that shows how Shiny wraps googleVis, which in turn wraps Google Chart Tools
• Show how to kick off the Shiny app and note the listening port
• Go to the browser and view the Shiny app
• Cut to the live demo
• Recap what we just saw and suggest possible future steps to further develop the app
• Hammer home HDP as the bedrock for the app
• Suggest getting started with the Sandbox
• Wrap up with Q&A