Analyzing Power of Tweets in Predicting Commodity Futures
Srivatsan Ramanujam
Extracting signals from tweets to predict commodity futures.
1.
Analyzing the Power of Tweets in Predicting Commodity Futures
Mar 17, 2014
Srivatsan Ramanujam, Senior Data Scientist, Pivotal
@gopivotal @being_bayesian
© Copyright 2013 Pivotal. All rights reserved.
2.
Problem Definition
• Can we predict corn, soybean and wheat futures based on social chatter on Twitter?
• The customer: a major agricultural cooperative
3.
Data
4.
Obtaining Data
• Fetched 5 years of historical tweets matching any of a list of keywords of interest
[Figures: Tweets Table; Poster Information]
5.
GNIP
• As plugged-in partners, we had worked with GNIP before, and the experience was great
• We needed historical data, and GNIP's Historical PowerTrack came in handy
• Clean API, quick quotes, and convenient downloads of historical job results
6.
Grain Futures vs. Volume of Tweets
7.
The Platform
8.
Data Science Toolkit
• Appliance
  – Full Rack DCA with Greenplum Database
• ETL
  – Python
• Modeling
  – SQL
  – MADlib
  – PL/Python, PL/Java
  – Ark-Tweet-NLP (1) with PL/Java wrappers
• Visualization
  – Tableau
(1) CMU ARK Twitter Parts-of-Speech tagger: http://www.ark.cs.cmu.edu/TweetNLP (GPL 2)
9.
Pivotal Greenplum MPP DB
• Think of it as multiple PostgreSQL servers
• Rows are distributed across segments by a particular field (or randomly)
[Diagram: Master; Segments/Workers]
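The row-distribution idea on this slide can be sketched in plain Python. This is only an illustration: the `segment_for` and `distribute` helpers, the CRC32 hash, and the sample rows are assumptions made for the example, not Greenplum's actual hashing or segment mapping.

```python
import zlib
from collections import defaultdict


def segment_for(key, num_segments):
    """Map a distribution-key value to a segment index (hypothetical helper)."""
    # CRC32 is a stand-in hash chosen for determinism, not Greenplum's own.
    return zlib.crc32(str(key).encode("utf-8")) % num_segments


def distribute(rows, key_field, num_segments):
    """Group rows by the segment their distribution key hashes to."""
    segments = defaultdict(list)
    for row in rows:
        segments[segment_for(row[key_field], num_segments)].append(row)
    return segments


# Made-up rows: nine tweets from three posters, spread over four segments.
rows = [{"tweet_id": i, "poster": f"user{i % 3}"} for i in range(9)]
by_segment = distribute(rows, "poster", num_segments=4)
```

Because the segment depends only on the key value, all rows sharing a `poster` land on the same segment, which is what lets per-key work run without moving data between segments.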
10.
PL/X : X in {pgsql, R, Python, Java, Perl, C etc.}
• Allows users to write Greenplum/PostgreSQL functions in the R, Python, Java, Perl, pgsql or C languages
• The interpreter/VM of the language 'X' is installed on each node of the Greenplum Database cluster
• Data parallelism: PL/X piggybacks on Greenplum's MPP architecture
[Diagram: SQL arrives at the Master Host (Master, Standby), which connects through the Interconnect to several Segment Hosts, each running multiple Segments]
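The data-parallel pattern described here (the same function runs next to each segment's rows, and the master combines the partial results) can be approximated in ordinary Python. The sketch below is an analogy only: threads stand in for segments, and `count_tokens`, `run_on_segments`, and the sample tweets are hypothetical names and data invented for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def count_tokens(rows):
    """Per-segment work: sees only the rows held by one segment."""
    return sum(len(text.split()) for text in rows)


def run_on_segments(segments, func):
    """Apply func to every segment's rows concurrently, keeping segment order."""
    with ThreadPoolExecutor(max_workers=len(segments)) as pool:
        return list(pool.map(func, segments))


# Made-up tweets, already split across three "segments".
segments = [
    ["corn prices up", "wheat harvest late"],
    ["soybean exports strong"],
    ["rain delays corn planting", "wheat steady"],
]
partials = run_on_segments(segments, count_tokens)
total = sum(partials)  # the "master" combines per-segment partial results
```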
11.
Scalable, in-database ML
• Open source: https://github.com/madlib/madlib
• Works on Greenplum DB and PostgreSQL
• Active development by Pivotal
  – Latest release: 1.4 (Dec 2013)
• Downloads and docs: http://madlib.net/
12.
MADlib In-Database Functions
Predictive Modeling Library Generalized Linear Models • Linear Regression • Logistic Regression • Multinomial Logistic Regression • Cox Proportional Hazards • Regression • Elastic Net Regularization • Sandwich Estimators (Huber white, clustered, marginal effects) Matrix Factorization • Single Value Decomposition (SVD) • Low-Rank @gopivotal @being_bayesian Machine Learning Algorithms • Principal Component Analysis (PCA) • Association Rules (Affinity Analysis, Market Basket) • Topic Modeling (Parallel LDA) • Decision Trees • Ensemble Learners (Random Forests) • Support Vector Machines • Conditional Random Field (CRF) • Clustering (K-means) • Cross Validation Linear Systems • Sparse and Dense Solvers Descriptive Statistics Sketch-based Estimators • CountMin (Cormode- Muthukrishnan) • FM (Flajolet-Martin) • MFV (Most Frequent Values) Correlation Summary Support Modules Array Operations Sparse Vectors Random Sampling Probability Functions © Copyright 2013 Pivotal. All rights reserved. 12
13.
The Models
14.
The Approach
• In addition to identifying textual cues in tweets that were correlated with commodity futures, we also wanted to analyze whether tweet sentiment was correlated with commodity futures
15.
Sentiment Analysis – Challenges
• Language on Twitter doesn't adhere to the rules of grammar, syntax or spelling
• We don't have labeled data for our problem: the tweets aren't tagged with sentiment
• Semi-supervised sentiment prediction can be achieved by dictionary look-ups of the tokens in a tweet, but without context, sentiment prediction is futile
[Example shown on slide: "Cool"]
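A small sketch shows why context-free dictionary look-ups fall short, using one possible reading of the "Cool" example. The lexicon, the `lexicon_score` helper, and both sample tweets are invented for illustration and are not taken from the talk.

```python
# Hypothetical toy polarity lexicon; not the dictionary used in the talk.
LEXICON = {"cool": 1, "great": 1, "good": 1, "bad": -1, "poor": -1}


def lexicon_score(tweet):
    """Sum per-token polarities, ignoring context entirely."""
    tokens = tweet.lower().replace(",", " ").replace(".", " ").split()
    return sum(LEXICON.get(tok, 0) for tok in tokens)


positive_use = "Cool, corn yields look great this year"
weather_use = "Cool weather is bad news for corn planting"
scores = (lexicon_score(positive_use), lexicon_score(weather_use))
```

In the second tweet "cool" describes temperature, yet the look-up still counts it as positive and cancels out "bad", so a clearly negative tweet is scored as neutral.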
16.
Sentiment Analysis – Approach
• Parallelized ArkTweetNLP to achieve fast part-of-speech tagging on tweets
• Custom (patent pending) algorithm to extract contextual cues and score the sentiment of tweets
[Diagram: Part-of-speech tagger (1) (break up tweets into tokens and tag their parts of speech) → Phrase Extraction → Semi-Supervised Sentiment Classification → Phrasal Polarity Scoring (use learned phrasal polarities to score the sentiment of new tweets) → Sentiment Scored Tweets]
(1) Parts-of-speech tagger: Gp-Ark-Tweet-NLP (http://vatsan.github.io/gp-ark-tweet-nlp/)
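The general shape of phrase extraction followed by phrasal polarity scoring can be sketched as below. This is not the patent-pending algorithm from the talk, whose details the slides do not give. It is a deliberately simple stand-in that assumes tagger output as (token, tag) pairs with "A" for adjective and "N" for noun, keeps only adjective-noun pairs, and averages hypothetical learned polarities.

```python
def extract_phrases(tagged):
    """Return adjacent (adjective, noun) pairs as lowercase phrases."""
    phrases = []
    for (w1, t1), (w2, t2) in zip(tagged, tagged[1:]):
        if t1 == "A" and t2 == "N":
            phrases.append(f"{w1.lower()} {w2.lower()}")
    return phrases


def score_tweet(tagged, phrase_polarity):
    """Average the polarity of known phrases; 0.0 when none are known."""
    known = [
        phrase_polarity[p] for p in extract_phrases(tagged) if p in phrase_polarity
    ]
    return sum(known) / len(known) if known else 0.0


# Hypothetical tagger output and learned polarities, for illustration only.
tagged_tweet = [
    ("Cool", "A"), ("weather", "N"), ("hurts", "V"),
    ("corn", "N"), ("planting", "N"),
]
phrase_polarity = {"cool weather": -0.5, "cool idea": 0.8}
score = score_tweet(tagged_tweet, phrase_polarity)
```

Scoring the phrase rather than the single token lets "cool weather" and "cool idea" carry different polarities, which is the kind of contextual cue a token-level dictionary cannot express.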
17.
Text Analytics Pipeline with GNIP stream
[Diagram: Tweet Stream → Stored on HDFS → (gpfdist) Loaded as external tables into GPDB → Parallel parsing of JSON and extraction of fields using PL/Python → Topic Analysis through MADlib pLDA; Sentiment Analysis through custom PL/Python functions → D3.js]
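The JSON-parsing stage might look roughly like the function below, shown as ordinary Python rather than as a PL/Python function body. The field names (`id`, `postedTime`, `body`, `actor.preferredUsername`) and the sample record are assumptions about the tweet payload and are not confirmed by the slides.

```python
import json


def parse_tweet(raw):
    """Extract a few fields from one JSON-encoded tweet; None if malformed."""
    try:
        doc = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(doc, dict):
        return None
    actor = doc.get("actor")
    if not isinstance(actor, dict):
        actor = {}
    # Field names below are assumed for illustration.
    return {
        "id": doc.get("id"),
        "posted_time": doc.get("postedTime"),
        "body": doc.get("body"),
        "poster": actor.get("preferredUsername"),
    }


# Made-up sample record.
raw = (
    '{"id": "tag:example:1", "postedTime": "2014-03-17T00:00:00.000Z", '
    '"body": "corn futures up", "actor": {"preferredUsername": "farmer_joe"}}'
)
parsed = parse_tweet(raw)
```

Returning `None` for malformed input keeps one bad record from failing a whole batch, which matters when the same function is applied to every row of a large table.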
18.
Key Take-Aways
• There is significant signal in tweets for predicting commodity futures
• Sentiment analysis of tweets can provide an additional signal in predicting commodity futures. Twitter sentiment was negatively correlated with commodity futures in the sample we analyzed
• A blended model of Text Regression, Sentiment Analysis and Tweet Actor information gave us encouraging results, and we believe that combining it with market fundamentals like weather or yield will give better models
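A claim such as "sentiment was negatively correlated with futures" is typically checked with a correlation coefficient over aligned daily series. The sketch below computes a Pearson correlation; the slides do not say which statistic was used, and the two series are invented toy numbers, not data from the study.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two series of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (sd_x * sd_y)


# Invented toy series, aligned by day: mean tweet sentiment and a futures price.
daily_sentiment = [0.4, 0.1, -0.2, -0.5, 0.3]
daily_price = [412.0, 423.0, 441.0, 452.0, 418.0]
r = pearson(daily_sentiment, daily_price)
```

A value of `r` below zero means that days with higher sentiment tended to have lower prices in this toy sample. It says nothing about causation or about predictive power on unseen days.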
19.
What's in it for me?
20.
Pivotal Open Source Contributions
http://gopivotal.com/pivotal-products/open-source-software
• MADlib – In-database parallel ML - https://github.com/madlib/madlib
• PyMADlib – Python wrapper for MADlib - https://github.com/gopivotal/pymadlib
• PivotalR – R wrapper for MADlib - https://github.com/madlib-internal/PivotalR
• Part-of-speech tagger for Twitter via SQL - http://vatsan.github.io/gp-ark-tweet-nlp/
Questions? @being_bayesian