Presented By : RANJAN KUMAR BAITHA
INTRODUCTION
PROBLEM DEFINITION
APPLICATION
DATA COLLECTION
DESCRIPTION ABOUT DATA
REFERENCES
About Twitter
• Social networking and microblogging service
• Enables users to send and read messages
• Messages are up to 140 characters long and are known as "tweets"
• Tweets contain rich information about people's preferences
• People share their thoughts about matches and player statistics on Twitter
People's opinions about a match can have a large impact on its success.
Our project covers prediction using Twitter data and analysis of the prediction results.
A high volume of positive tweets may indicate the performance and result of a match and its players. But how can this be quantified?
• The problem in Twitter analytics is classifying the polarity of a given text at the document, sentence, or feature/aspect level.
• That is, deciding whether the given document, sentence, or feature/aspect of an entity is positive, negative, or neutral.
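As a minimal sketch of sentence-level polarity classification, the R snippet below counts hits against two tiny word lists. The word lists, the function name `classify_polarity`, and the sample tweets are all made up for illustration; this is a simple lexicon approach, not necessarily the classifier used in this project.

```r
# Tiny illustrative word lists (assumed; a real lexicon is far larger).
positive_words <- c("good", "great", "win", "brilliant", "superb")
negative_words <- c("bad", "poor", "lose", "terrible", "awful")

# Label one text as "positive", "negative", or "neutral" by counting lexicon hits.
classify_polarity <- function(text) {
  # Lower-case the text, then split on anything that is not a letter.
  words <- unlist(strsplit(tolower(text), "[^a-z]+"))
  score <- sum(words %in% positive_words) - sum(words %in% negative_words)
  if (score > 0) "positive" else if (score < 0) "negative" else "neutral"
}

# Made-up sample tweets.
tweets <- c("What a brilliant win today",
            "Terrible batting, poor fielding",
            "The match starts at 7 pm")
labels <- vapply(tweets, classify_polarity, character(1), USE.NAMES = FALSE)
print(labels)
```

For these three sample tweets the labels come out as positive, negative, and neutral, in that order.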
Using social media to predict the future has become very popular in recent years.
• Predicting the Future with Social Media (Bernardo) tries to show that Twitter-based prediction can be applied to the result and performance of matches and players.
• Predicting matches and players performance using social media (Andrei Oghina, Mathias Breuss, Manos Tsagkias & Maarten de Rijke, 2012) uses Twitter and Facebook data to predict the scores and result, as well as which players are likely to perform in that match.
My project covers prediction using Twitter data and an investigation of two new topics based on the prediction results.
• Data collection: an existing Twitter data set and recent tweets fetched via the Twitter API
• Data pre-processing: obtain "clean" data and transform it into the format we need
• Analysis: train a classifier to label the tweets as positive, negative, neutral, or irrelevant
• Prediction: use the statistics of the tweets' labels to predict the match result (win/loss)
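The prediction step can be sketched in R as follows, assuming the tweets have already been labelled. The rule used here (predict a win when positive tweets outnumber negative ones) and the name `predict_result` are assumptions for illustration only; the deck does not state which statistic it uses.

```r
# Hypothetical classifier output for the tweets about one team before a match.
labels <- c("positive", "positive", "negative", "neutral",
            "irrelevant", "positive", "negative")

# Predict "win" when positive tweets outnumber negative ones, otherwise "loss".
# Neutral and irrelevant tweets are ignored; this threshold rule is an assumption.
predict_result <- function(labels) {
  pos <- sum(labels == "positive")
  neg <- sum(labels == "negative")
  if (pos + neg == 0) return(NA_character_)  # no opinionated tweets, no prediction
  if (pos / (pos + neg) > 0.5) "win" else "loss"
}

print(predict_result(labels))
```

With three positive and two negative labels the positive share is 0.6, so the sketch predicts a win.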
MapReduce – Data Reduction
The processing pillar of the Hadoop ecosystem is the MapReduce framework.
The framework lets you specify an operation to apply to a huge data set, divides the problem and the data, and runs the operation in parallel.
From an analyst's point of view, this can occur on multiple dimensions. For example, a very large dataset can be reduced to a smaller subset on which analytics can be applied.
MapReduce – R
Executing R code in the context of a MapReduce job expands the kinds and size of analytics that can be applied to huge datasets.
Problems that fit nicely into this model include "pleasingly parallel" scenarios.
Here's a simple use case: scoring a dataset against a model built in R.
HDFS Architecture
NameNode
• Manages the file system's namespace, metadata, and file blocks
• Runs on one machine to several machines
DataNode
• Stores and retrieves data blocks
• Reports to the NameNode
• Runs on many machines
Secondary NameNode
• Performs housekeeping work so the NameNode doesn't have to
• Requires hardware similar to the NameNode machine
• Not used for high availability; it is not a backup for the NameNode
The MapReduce model:
• Imposes key-value input/output
• Defines map and reduce functions
  map: (K1, V1) → list(K2, V2)
  reduce: (K2, list(V2)) → list(K3, V3)
• The map function is applied to every input key-value pair
• The map function generates intermediate key-value pairs
• Intermediate key-values are sorted and grouped by key
• Reduce is applied to the sorted and grouped intermediate key-values
• Reduce emits result key-values
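To make the map and reduce signatures concrete, here is a small single-machine simulation in base R that counts words in tweets. It only imitates the data flow (map, then sort and group by key, then reduce); it does not use Hadoop, and the names `map_fn`, `reduce_fn`, and the sample input are invented for this sketch.

```r
# Input key-value pairs: (tweet id, tweet text). The data is made up.
input <- list(t1 = "good match good win", t2 = "bad match")

# map: (K1, V1) -> list(K2, V2); emits (word, 1) for every word in a tweet.
map_fn <- function(key, value) {
  words <- strsplit(value, " ", fixed = TRUE)[[1]]
  lapply(words, function(w) list(key = w, value = 1L))
}

# reduce: (K2, list(V2)) -> (K3, V3); sums the counts collected for one word.
reduce_fn <- function(key, values) {
  list(key = key, value = sum(unlist(values)))
}

# Map phase: apply map_fn to every input pair and flatten the results.
intermediate <- unlist(Map(map_fn, names(input), input), recursive = FALSE)

# Shuffle and sort: group the intermediate values by key, keys in sorted order.
keys    <- vapply(intermediate, function(p) p$key, character(1))
values  <- lapply(intermediate, function(p) p$value)
grouped <- split(values, keys)

# Reduce phase: apply reduce_fn to each (key, list of values) group.
result <- Map(reduce_fn, names(grouped), grouped)
counts <- vapply(result, function(p) p$value, integer(1))
print(counts)
```

For the two sample tweets this gives a count of 2 for "good" and "match" and 1 for "bad" and "win". On a real cluster the framework performs the grouping step itself, between the map and reduce tasks.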
The framework takes care of distributed processing and coordination:
• Scheduling
– Jobs are broken down into smaller chunks called tasks, and these tasks are scheduled
• Task localization with data
– The framework strives to place tasks on the nodes that host the segment of data to be processed by that specific task
– Code is moved to where the data is
• Error handling
– Failures are expected, so tasks are automatically retried on other machines
• Data synchronization
– The shuffle-and-sort barrier rearranges and moves data between machines
– Input and output are coordinated by the framework
Scoring a dataset against a model involves pushing the model to the task nodes in the Hadoop cluster, running a MapReduce job that loads the model into R on a task node, scoring the data either row by row or in aggregates, and writing the results back to HDFS.
• In the simplest case this can be done with just a map task.
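The map-only scoring pattern can be imitated locally in base R. In this sketch the model is saved to a file (standing in for pushing it to the task nodes), each chunk of rows is scored independently (standing in for one map task per input split), and the output is written to a local CSV file (standing in for HDFS). The `mtcars` data, the linear model, and the name `score_chunk` are chosen only for illustration.

```r
# Build a small model in R on the built-in mtcars data, purely for illustration.
model <- lm(mpg ~ wt, data = mtcars)

# "Push the model to the task nodes": serialize it to a file each task can load.
model_path <- tempfile(fileext = ".rds")
saveRDS(model, model_path)

# Map task: load the model and score one chunk of rows. No reduce step is needed.
score_chunk <- function(chunk, path) {
  m <- readRDS(path)
  cbind(chunk, predicted_mpg = predict(m, newdata = chunk))
}

# Stand-in for input splits: divide the rows into 4 chunks and score each one.
chunks <- split(mtcars, rep(1:4, length.out = nrow(mtcars)))
scored <- do.call(rbind, lapply(chunks, score_chunk, path = model_path))

# Stand-in for writing the results back to HDFS: write a local CSV file.
out_path <- tempfile(fileext = ".csv")
write.csv(scored, out_path, row.names = FALSE)
```

Because each chunk is scored without reference to any other chunk, the work is "pleasingly parallel" and no shuffle or reduce phase is required.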
HDFS Overview
To meet these challenges we have to start with some basics.
First, we need to understand data storage in Hadoop, how it can be leveraged from R, and why it is important.
The basic storage mechanism in Hadoop is HDFS (Hadoop Distributed File System).
For an R programmer, being able to read and write files in HDFS from a standalone R session is the first step in working within the Hadoop ecosystem.
• Avoid sampling and aggregation;
• Reduce data movement and replication;
• Bring the analytics as close as possible to the data; and
• Optimize computation speed.
Creating a Twitter Application
The first step in performing Twitter analysis is to create a Twitter application. This application will allow you to perform analysis by connecting your R console to Twitter through the Twitter API. The steps for creating your Twitter application are:
• Go to https://dev.twitter.com and log in using your Twitter account.
• Then go to My Applications → Create a new application.
• Give your application a name, describe it in a few words, and provide your website's URL or, if you don't have a website, your blog address.
• Leave the Callback URL blank for now.
• Complete the other formalities and create your Twitter application.
Once all the steps are done, the created application is displayed.
Note the Consumer Key and Consumer Secret, as they will be used in RStudio later.
This step is done. Next, I will work in RStudio.
Working in RStudio - Building the corpus
In this section, I will first use some packages in R. These are twitteR, ROAuth, plyr, stringr, RJSONIO, RCurl, bitops and ggplot2.
You can install these packages with the following commands:
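The original installation commands are not preserved in this transcript. A minimal sketch, assuming all eight packages are still available from CRAN under these names (note the CRAN spelling is RCurl), could look like this:

```r
# Packages named in this deck.
pkgs <- c("twitteR", "ROAuth", "plyr", "stringr",
          "RJSONIO", "RCurl", "bitops", "ggplot2")

# Install only the ones that are not already present.
missing_pkgs <- setdiff(pkgs, rownames(installed.packages()))
if (length(missing_pkgs) > 0) {
  install.packages(missing_pkgs)
}

# Load them for the current session.
invisible(lapply(pkgs, library, character.only = TRUE))
```

Checking `installed.packages()` first avoids reinstalling packages on every run of the script.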
Now run the following R script code snippet. After running this section of the script, the console will look like this:
• Once this file is downloaded, we move on to accessing the Twitter API. This step includes the script code that performs the handshake using the Consumer Key and Consumer Secret of your own application.
• You have to replace these entries with the keys from your application.
The following is the code you have to run to perform the handshake:
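The handshake code itself is not preserved in this transcript. As a hedged sketch, later versions of the twitteR package provide `setup_twitter_oauth()`, which can perform the authorization from the consumer key and secret alone; this may differ from the code originally shown, and Twitter's developer site and API access rules have changed since the deck was written. The key strings below are placeholders.

```r
library(twitteR)

# Placeholders: replace with the values shown on your own application page.
consumer_key    <- "YOUR_CONSUMER_KEY"
consumer_secret <- "YOUR_CONSUMER_SECRET"

# Authenticate this R session against the Twitter API.
# With only the consumer key and secret, authorization is completed interactively
# in the browser; an access token and access secret can be passed as the third
# and fourth arguments instead.
setup_twitter_oauth(consumer_key, consumer_secret)
```

Keep the real keys out of shared scripts, for example by reading them from environment variables with `Sys.getenv()`.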
Saving Tweets
Once the handshake is done and authorized by Twitter, we can fetch the most recent tweets related to any keyword. I have used #Kejriwal, as Mr. Arvind Kejriwal is the most talked-about person in Delhi nowadays.
The code for getting tweets related to #Kejriwal is:
This command requests up to 1000 tweets related to Kejriwal. The function "searchTwitter" searches Twitter for tweets matching the query and downloads them. We then need to convert this list of tweets into a data frame so that we can work on it. Finally, we write the data frame to a .csv file.
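The three steps described here (search, convert to a data frame, save as CSV) can be sketched with the twitteR package as below. The original code is not preserved in this transcript, so the output file name is an arbitrary choice, and the snippet assumes the session has already been authenticated.

```r
library(twitteR)  # assumes the handshake has already been completed

# Request up to 1000 recent tweets matching the hashtag.
tweets <- searchTwitter("#Kejriwal", n = 1000)

# Convert the list of status objects into a data frame, one row per tweet.
tweets_df <- twListToDF(tweets)

# Save the data frame as a CSV file; the file name is an arbitrary choice.
write.csv(tweets_df, file = "kejriwal_tweets.csv", row.names = FALSE)
```

The search may return fewer tweets than requested if not enough recent matches are available.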
http://www.google.com
http://www.wikipedia.com
http://txcdk.unt.edu/iralab/sentiment_analysis
https://sites.google.com/site/miningtwitter/questions/sentiment/analysis
http://yourwhatyourepeatedlydo.blogspot.in/2013/04/downloading-twitter-data-using-r.html
http://davetang.org/muse/2013/04/06/using-the-r_twitter-package/
Twitter_Sentiment_analysis.pptx
