Application Programming Interfaces
Why?
I want my code to have access to your code or data... from a different
computer!
    we might be using different operating systems!

    different programming languages!

    have different compression capabilities!

    security!

    etc.

At least you don't have to install tons of code or download all of the data.
The Internet Suggests a Solution
HyperText Transfer Protocol: HTTP

    Since the WWW has caught on, HTTP has become a dominant protocol.

    Pretty much all computers support some kind of HTTP client

    Browsers are just fancy HTTP clients

    R can be a client too!

Duncan Temple Lang's RCurl package offers R access to libcurl, a popular HTTP library.
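As a rough sketch of what "R as an HTTP client" involves, the snippet below assembles a GET request URL from a base address and named parameters, using only base R. The host api.example.com, the path, the parameter names, and the helper build_url are all hypothetical; the final commented line shows where RCurl's getURL would perform the actual request.

```r
# Hypothetical helper: build a GET URL from a base address and named parameters.
build_url <- function(base, params) {
    # Percent-encode each value so spaces and punctuation survive the trip.
    encoded <- vapply(params,
                      function(p) URLencode(as.character(p), reserved = TRUE),
                      character(1))
    query <- paste(names(params), encoded, sep = "=", collapse = "&")
    paste0(base, "?", query)
}

# api.example.com is a placeholder host, not a real service.
url <- build_url("http://api.example.com/counts",
                 list(source = "twitter_sample", gram = "Ron Paul"))
print(url)

# With RCurl installed, the request itself would be a single call:
# library(RCurl); response <- getURL(url)
```

Keeping URL construction separate from the request makes it easy to inspect exactly what will be sent before any network traffic happens.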
But what data will we transfer?
HTTP gives us a nearly universal way to pass data between machines; now we have to decide what format
the messages ought to have.

    Let's choose something lightweight and human readable

        (so no XML :p)

    but it should be something easily serializable, and should have some structure

        JSON is the popular choice
JSON
JSON looks like this:

{
    "hello"    : "world",
    "universe" : 42,
    "pizza"    : null,
    "cookies"  : ["chocolate", "molasses", "oatmeal"],
    "eggs"     : {
        "over" : "easy"
    }
}

JSON has types, can be nested, and has analogues (e.g. 'dicts', 'hashes', or 'maps') in most major programming
languages.

smells like a list in R

The RJSONIO package, also by Duncan Temple Lang, converts R lists to and from their JSON representations.
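To make the list/JSON correspondence concrete, here is a toy serializer in base R applied to the JSON example above, written as a nested R list. The function to_json is a hypothetical illustration only: it does no string escaping and is not how RJSONIO works internally. In practice RJSONIO's toJSON and fromJSON would do this job.

```r
# Toy serializer (illustration only, no string escaping).
to_json <- function(x) {
    if (is.null(x)) return("null")
    if (is.list(x)) {
        parts <- vapply(x, to_json, character(1))
        # Unnamed lists become JSON arrays; named lists become JSON objects.
        if (is.null(names(x))) {
            return(paste0("[", paste(parts, collapse = ","), "]"))
        }
        pairs <- paste0('"', names(x), '":', parts)
        return(paste0("{", paste(pairs, collapse = ","), "}"))
    }
    if (is.character(x)) {
        atoms <- paste0('"', x, '"')
    } else if (is.logical(x)) {
        atoms <- tolower(as.character(x))
    } else {
        atoms <- as.character(x)
    }
    # Length-one vectors are written as scalars, longer ones as arrays.
    if (length(atoms) == 1) atoms else paste0("[", paste(atoms, collapse = ","), "]")
}

# The slide's JSON example as an R list.
x <- list(hello    = "world",
          universe = 42,
          pizza    = NULL,
          cookies  = c("chocolate", "molasses", "oatmeal"),
          eggs     = list(over = "easy"))

json <- to_json(x)
print(json)
```

Named lists map to JSON objects and atomic vectors map to arrays, which is why JSON "smells like a list" in R.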
Numerous Examples
Computational
    geocoding, Google, et al.

    face-recognition, face.com

    prediction, Google

Data
    Federal Register

    Bloomberg

"Data APIs/feeds available as packages in R"
was asked on stats.stackexchange.com a couple of months ago. The list of packages included:

quantmod, tseries, fImport, WDI, RGoogleTrends, RGoogleDocs, twitteR, Zillow, RNYTimes,
UScensus2000, infochimps, rdatamarket, factualR, RDSTK, RBloomberg, LIM, RTAQ, IBrokers,
rnpn, RClimate
API example: TopicWatch


TopicWatch is a platform for text analytics and visualization

    currently developing 3 interfaces to the API:

        iPad app

        web app

        R package

We collect streaming data from a variety of sources including Twitter, RSS feeds, government publications,
and others.
API Outline
The API is still under development and is unstable. We're always adding new features and polishing old ones.
Just a few concrete capabilities that are already running:

    time series of n-gram frequencies & counts

        aggregated at several resolutions

    n-grams ranked by frequency

        also aggregated at several resolutions

        can be filtered by sub-grams

    raw documents that contain a gram

    topics that contain a gram

    time series counts of documents that contain co-occurring n-grams

    ranking grams by usage change between any two times
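One way to picture the last capability, ranking grams by usage change, is the small base R sketch below. The data are made up, and "usage change" is assumed here to be a simple difference in frequency between two times; the API's actual definition may differ.

```r
# Made-up frequencies for three grams at two times (illustrative only).
freqs <- data.frame(gram = c("debate", "poll", "budget"),
                    t1   = c(0.0010, 0.0040, 0.0020),
                    t2   = c(0.0030, 0.0035, 0.0020),
                    stringsAsFactors = FALSE)

# Assumption: usage change is the difference in frequency between the two times.
freqs$change <- freqs$t2 - freqs$t1

# Rank grams from largest increase to largest decrease.
ranked <- freqs[order(freqs$change, decreasing = TRUE), ]
print(ranked$gram)
```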
TopicWatchr
The R package is a thin wrapper for the HTTP API. It (unsurprisingly) works
by
   sending a request to a URL

   parsing JSON results

   re-arranging lists into data frames

But it has some nice functionality to make working with the API a bit
smoother:
   parses timestamps in data

   paginates large requests automatically

   handles authentication
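The "lists into data frames" and "parses timestamps" steps might look roughly like the following in base R. The record layout (fields time, gram, count) is a hypothetical stand-in for a parsed JSON response, not the actual TopicWatch format.

```r
# Hypothetical parsed JSON: one list element per time bucket.
parsed <- list(
    list(time = "2011-11-15 08:00:00", gram = "Ron Paul", count = 0),
    list(time = "2011-11-15 08:30:00", gram = "Ron Paul", count = 3)
)

# Turn each record into a one-row data frame, then bind the rows together.
rows <- lapply(parsed, function(r) as.data.frame(r, stringsAsFactors = FALSE))
df <- do.call(rbind, rows)

# Convert timestamp strings to POSIXct so they can be compared and plotted.
df$time <- as.POSIXct(df$time, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")
str(df)
```

Once the timestamps are POSIXct, time arithmetic and date axes in plots work without further conversion.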
Example 1: Presidential Candidates
Code to get data:

library(TopicWatchr)

# Authenticate with the API.
set_credentials("PRUG", "12345")

# Search terms: one column of results per candidate.
candidates <- c("Herman Cain", "Mitt Romney", "Rick Perry",
                "Newt Gingrich", "Ron Paul", "Michelle Bachmann",
                "Jon Huntsman", "Rick Santorum")

# Fetch frequencies for every candidate from two different sources.
twitter_counts <- wordCounts("twitter_sample", candidates)
rss_counts     <- wordCounts("rss-majorpapers", candidates)

The wordCounts function constructs the proper API call, makes the call, and arranges the results into a data
frame. Each data frame looks like this:

'data.frame':   5 obs. of 9 variables:
$ times            : POSIXct, format: "2011-11-15 08:00:00" "2011-11-15 08:30:00" ...
$ Herman Cain      : num 0 0.00148 0 0.00326 0.00274
$ Mitt Romney      : num 0 0.00148 0 0.00326 0.00548
$ Rick Perry       : num 0 0.00148 0 0 0
$ Newt Gingrich    : num 0 0.00148 0 0.00326 0
$ Ron Paul         : num 0 0 0 0 0
$ Michelle Bachmann: num 0 0 0 0 0
$ Jon Huntsman     : num 0 0 0 0 0
$ Rick Santorum    : num 0 0.00148 0 0 0


Then we combine data frames and polish with ggplot2 ...
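The combining step is not shown on the slide; one plausible approach is to reshape each wide data frame into long form and stack them, which is the layout ggplot2 expects. The sketch below uses small made-up stand-ins for twitter_counts and rss_counts, and the helper to_long is hypothetical.

```r
# Small made-up stand-ins with the same shape as the wordCounts results.
times <- as.POSIXct(c("2011-11-15 08:00:00", "2011-11-15 08:30:00"), tz = "UTC")
twitter_counts <- data.frame(times = times,
                             "Ron Paul"   = c(0, 0.001),
                             "Rick Perry" = c(0.002, 0),
                             check.names = FALSE)
rss_counts <- data.frame(times = times,
                         "Ron Paul"   = c(0.003, 0),
                         "Rick Perry" = c(0, 0),
                         check.names = FALSE)

# Hypothetical helper: one row per (time, candidate) pair, tagged with its source.
to_long <- function(df, src) {
    candidates <- setdiff(names(df), "times")
    data.frame(times     = rep(df$times, times = length(candidates)),
               candidate = rep(candidates, each = nrow(df)),
               frequency = unlist(df[candidates], use.names = FALSE),
               source    = src,
               stringsAsFactors = FALSE)
}

combined <- rbind(to_long(twitter_counts, "twitter"),
                  to_long(rss_counts, "rss"))
print(combined)

# With ggplot2 installed, a plot could then be drawn along these lines:
# library(ggplot2)
# ggplot(combined, aes(times, frequency, colour = candidate)) +
#     geom_line() + facet_wrap(~ source)
```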
Final Result
Example 2: Likely Phrase Generator
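# Return the second word of a two-word gram such as "follow back".
lastGram <- function(g){
    strsplit(g, " ")[[1]][[2]]
}

# The slide leaves these three undefined; the values here are assumptions.
twsrc <- "twitter_sample"   # source to query
first <- "hello"            # seed word
len   <- 10                 # number of words to generate

# Most frequent bigram that starts with the seed word.
vc <- topGrams(twsrc,
               filter=first, limit=1,
               m=1, n=2, prefix=TRUE,
               resolution="daily")$gram

phrase <- c()

# Repeatedly take the last word of the top bigram and use it as the next prefix.
for (i in seq_len(len)){
    vc <- lastGram(vc)
    phrase <- c(phrase, vc)
    vc <- topGrams(twsrc, filter=vc, limit=1, m=1, n=2,
                   prefix=TRUE, dev_server=TRUE,
                   resolution="daily")$gram
}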
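The same greedy chaining idea can be tried offline. In the sketch below a small made-up bigram table stands in for the API, and the hypothetical function next_word plays the role of the topGrams call: it returns the most frequent word following a given word.

```r
# Made-up bigram counts standing in for the API (illustrative only).
bigrams <- data.frame(
    first  = c("please", "please", "follow", "back", "back"),
    second = c("follow", "retweet", "back", "please", "soon"),
    count  = c(9, 4, 7, 5, 2),
    stringsAsFactors = FALSE)

# Hypothetical local replacement for topGrams: most frequent follower of `word`.
next_word <- function(word) {
    hits <- bigrams[bigrams$first == word, ]
    if (nrow(hits) == 0) return(NA_character_)
    hits$second[which.max(hits$count)]
}

# Greedily chain the most likely next word, starting from a seed.
word <- "please"
phrase <- character(0)
for (i in seq_len(5)) {
    word <- next_word(word)
    if (is.na(word)) break
    phrase <- c(phrase, word)
}
print(paste(phrase, collapse = " "))
```

Because each step always takes the single most frequent follower, the chain can fall into a cycle, which is one plausible reason for the repetition in generated phrases.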
'Likely' phrases from earlier today:
Twitter: "im going back :) lt3 please follow back :) lt3 please"

Technology RSS feeds: "user interface displays users click scheme federal trade commission ftc antitrust
complaint outside occupy wall street"

same source, seeded with the word "statistics": "statistics showing highlights google apps like behavioral
advertising refers obliquely suggested session sounded viable business edition"

Politics RSS feeds: "washington university battleground poll numbers superfan badge request may become
president obama administration asked whether congress approval"

Major papers RSS feeds: "percent stake throughout california chapter 11 years ago effectively sealed george
w street movement prefers birds early"

Federal Register: "revision incorporates provisions related investigative actions could result based upon fresh
prunes grown ornamentals ca fip"
Feeling Adventurous?
We're looking for beta testers for the R package! In Shackleton's words, what to expect:

...BITTER COLD, LONG MONTHS OF COMPLETE DARKNESS, CONSTANT DANGER, SAFE RETURN DOUBTFUL...

But it can still be fun! You can talk with me about it, or get in touch later at

homer@luckysort.com
That's all!
Thanks for listening. Questions?

R, HTTP, and APIs, with a preview of TopicWatchr
