Chapter 4
Social Media and Text Analytics
Asst. Prof. Rushikesh Chikane, MIT ACSC, Alandi
Overview of Social Media Analytics
• Social media analytics is the process of collecting and analyzing audience data shared on social networks to improve an organization's strategic business decisions.
• It is the ability to gather data from social channels and find meaning in it to support business decisions, and to measure the performance of actions based on those decisions.
• Social media analytics uses specifically designed software platforms that work similarly to web search tools.
• Data about keywords or topics is retrieved through search queries or web 'crawlers' that span channels.
• Fragments of text are returned, loaded into a database, categorized, and analyzed to derive meaningful insights.
Social Media Analytics Process
Seven Layers of Social Media Analytics
• Social media has at least seven layers of data.
• Each layer carries potentially valuable information and insights that can be harvested for business intelligence purposes.
• Of the seven layers, some are visible or easily identifiable (e.g., text and actions) and others are invisible (e.g., social media and hyperlink networks).
• LAYER ONE: TEXT
  • Social media text analytics deals with the extraction and analysis of business insights from textual elements of social media content, such as comments, tweets, blog posts, and Facebook status updates. Text analytics is mostly used to understand social media users' sentiments or to identify emerging themes and topics.
• LAYER TWO: NETWORKS
  • Social media network analytics extracts, analyzes, and interprets personal and professional social networks, for example Facebook friendship networks and Twitter networks. Network analytics seeks to identify influential nodes (e.g., people and organizations) and their positions in the network.
• LAYER THREE: ACTIONS
  • Social media actions analytics deals with extracting, analyzing, and interpreting the actions performed by social media users, including likes, dislikes, shares, mentions, and endorsements. Actions analytics is mostly used to measure popularity and influence and to support prediction in social media.
• LAYER FOUR: MOBILE
  • Mobile analytics is the next frontier in the social business landscape. It deals with measuring and optimizing user engagement with mobile applications (apps for short), and with analyzing and understanding in-app purchases, customer engagement, and mobile user demographics.
• LAYER FIVE: HYPERLINKS
  • Hyperlink analytics is about extracting, analyzing, and interpreting social media hyperlinks (e.g., in-links and out-links).
  • Hyperlink analysis can reveal, for example, Internet traffic patterns and the sources of incoming or outgoing traffic to and from a source.
• LAYER SIX: LOCATION
  • Location analytics, also known as spatial analysis or geospatial analytics, is concerned with mining and mapping the locations of social media users, contents, and data.
• LAYER SEVEN: SEARCH ENGINES
  • Search engine analytics focuses on analyzing historical search data to gain valuable insights into a range of areas, including trend analysis, keyword monitoring, search result and advertisement history, and advertisement spending statistics.
Accessing Social Media Data
• Social media data is any type of data that can be gathered through social media. In general, the term refers to social media metrics and demographics collected through analytics tools on social platforms.
• Social media data can also refer to data collected from content people post publicly on social media. For marketing, this type of data can be collected through social listening tools.
Social Network Analysis
• Social network analysis (SNA) is the process of investigating social structures through the use of networks and graph theory. It characterizes networked structures in terms of nodes (individual actors, people, or things within the network) and the ties, edges, or links (relationships or interactions) that connect them.
• SNA is the practice of representing networks of people as graphs and then exploring these graphs. A typical social network representation has nodes for people, and edges connecting two nodes to represent one or more relationships between them.
• The resulting graph can reveal patterns of connection among people. Small networks can be represented visually; these visualizations are intuitive and may make patterns of connection apparent, revealing nodes that are highly connected or that play a critical role in connecting groups together.
• SNA is a process of quantitative and qualitative analysis of a social network. It measures and maps the flow of relationships and relationship changes between knowledge-possessing entities.
• These entities can be simple or complex and include websites, computers, animals, humans, groups, organizations, and nations.
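The idea of nodes, edges, and "highly connected" nodes can be sketched in a few lines of Python. This is a minimal illustration, not a full SNA toolkit: the names in `edges` are hypothetical, and degree centrality is only one of several measures that could be used to find well-connected nodes.

```python
from collections import defaultdict

def build_graph(edges):
    """Return an undirected adjacency map {node: set of neighbours}."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return dict(graph)

def degree_centrality(graph):
    """Fraction of the other nodes that each node is directly tied to."""
    n = len(graph)
    if n < 2:
        return {node: 0.0 for node in graph}
    return {node: len(nbrs) / (n - 1) for node, nbrs in graph.items()}

# Hypothetical friendship ties (illustrative data only)
edges = [("Asha", "Ben"), ("Asha", "Chen"), ("Asha", "Dev"), ("Dev", "Esha")]
graph = build_graph(edges)
centrality = degree_centrality(graph)
most_connected = max(centrality, key=centrality.get)
print(most_connected, centrality[most_connected])  # Asha 0.75
```

Here "Dev" has a lower degree than "Asha" but is the only tie to "Esha", which is the kind of bridging role that degree alone does not capture.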
Benefits of social network analysis:
• Helps you understand your audience better
• Used for customer segmentation
• Used to design recommendation systems
• Helps detect fake news, among other things
• Link Prediction:
  • Link prediction is one of the most important research topics in the field of graphs and networks. The objective of link prediction is to identify which pairs of nodes are likely to form a link in the future and which are not.
• Link prediction has many uses in real-world applications:
  • Predicting which customers are likely to buy which products on online marketplaces like Amazon, which can help in making better product recommendations
  • Suggesting interactions or collaborations between employees in an organization
  • Extracting vital insights from terrorist networks
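One simple way to score candidate links is neighbourhood overlap. The sketch below uses the Jaccard coefficient, which is a swapped-in illustrative heuristic (the slides do not name a specific method), on a small hypothetical graph: pairs that share many neighbours are treated as more likely to connect.

```python
from itertools import combinations

def jaccard_scores(graph):
    """Score every currently unlinked pair by the overlap of their neighbour sets."""
    scores = {}
    for u, v in combinations(sorted(graph), 2):
        if v in graph[u]:
            continue  # already linked, nothing to predict
        union = graph[u] | graph[v]
        common = graph[u] & graph[v]
        scores[(u, v)] = len(common) / len(union) if union else 0.0
    return scores

# Hypothetical undirected network as an adjacency map
graph = {
    "A": {"B", "C"},
    "B": {"A", "C", "D"},
    "C": {"A", "B", "D"},
    "D": {"B", "C", "E"},
    "E": {"D"},
}
scores = jaccard_scores(graph)
best_pair = max(scores, key=scores.get)
print(best_pair, round(scores[best_pair], 3))  # ('A', 'D') 0.667
```

"A" and "D" share two of three neighbours, so they rank first; "A" and "E" share none and score zero.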
Introduction to Natural Language Processing
• Natural Language Processing (NLP) is a branch of computer science that deals with understanding and processing natural language, e.g. texts or voice recordings.
• The goal is for a machine to be able to communicate with humans in the same way that humans have been communicating with each other for centuries.
• Learning a new language is not easy for humans either; it requires a lot of time and perseverance.
• It is no different when a machine learns a natural language.
• Therefore, several sub-areas have emerged within Natural Language Processing, each of which is necessary for language to be fully understood.
Text Analytics
• Tokenization
• Bag of words
• Word weighting: TF-IDF
• N-grams
• Stop words
• Stemming and lemmatization
• Synonyms and part-of-speech tagging
Tokenization
• The text is cut into pieces called "tokens" or "terms."
• These tokens are the most basic unit of information you'll use for your model.
• The terms are often words, but this isn't a necessity: entire sentences can be used for analysis.
• We'll use unigrams: terms consisting of one word.
• Often, however, it's useful to include bigrams (two words per token) or trigrams (three words per token) to capture extra meaning and increase the performance of your models.
• This does come at a cost, though, because you're building bigger term vectors by including bigrams and/or trigrams.
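A minimal tokenizer can be sketched with a regular expression. The pattern below is an assumption chosen for illustration (lowercase letters, digits, and apostrophes); real tokenizers handle many more cases. The example also shows why adding bigrams enlarges the term vector.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into word tokens (unigrams)."""
    return re.findall(r"[a-z0-9']+", text.lower())

tokens = tokenize("Data science is fun. Isn't it?")
print(tokens)  # ['data', 'science', 'is', 'fun', "isn't", 'it']

# Adding bigrams grows the vocabulary, which is the cost mentioned above.
bigrams = list(zip(tokens, tokens[1:]))
unigram_vocabulary = set(tokens)
combined_vocabulary = unigram_vocabulary | set(bigrams)
print(len(unigram_vocabulary), len(combined_vocabulary))  # 6 11
```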
Bag of words
• To build our classification model we'll go with the bag-of-words approach.
• Bag of words is the simplest way of structuring textual data: every document is turned into a word vector.
• If a certain word is present in the document, its entry in the vector is labeled "True"; the other entries are labeled "False". The figure shows a simplified example with only two documents: one about the television show Game of Thrones and one about data science.
• The two word vectors together form the document-term matrix.
• The document-term matrix holds a column for every term and a row for every document.
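The binary document-term matrix described above can be sketched as follows. The two sentences are hypothetical stand-ins for the two documents in the example, and splitting on whitespace is a simplifying assumption.

```python
def document_term_matrix(documents):
    """Build a True/False document-term matrix: one row per document, one column per term."""
    tokenized = [set(doc.lower().split()) for doc in documents]
    vocabulary = sorted(set().union(*tokenized))
    matrix = [[term in words for term in vocabulary] for words in tokenized]
    return vocabulary, matrix

# Hypothetical documents (illustrative only)
docs = [
    "game of thrones is a great television series",
    "doing data science is great",
]
vocabulary, matrix = document_term_matrix(docs)

for doc, row in zip(docs, matrix):
    present = [term for term, flag in zip(vocabulary, row) if flag]
    print(doc[:20], "->", present)
```

Each row has one entry per vocabulary term, so the matrix gets wider as the vocabulary grows, even though most entries are "False".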
Word weighting: TF-IDF
• Term Frequency - Inverse Document Frequency (TF-IDF) is a widely used statistical method in natural language processing and information retrieval. It measures how important a term is within a document relative to a collection of documents (i.e., relative to a corpus). A text vectorization process transforms the words in a text document into importance numbers. There are many different text vectorization scoring schemes, with TF-IDF being one of the most common.
• As its name implies, TF-IDF scores a word by multiplying the word's Term Frequency (TF) by its Inverse Document Frequency (IDF).
• Term Frequency: the TF of a term is the number of times the term appears in a document divided by the total number of words in the document.
  $$\mathrm{TF}(t, d) = \frac{\text{number of times } t \text{ appears in } d}{\text{total number of words in } d}$$
• Inverse Document Frequency: the IDF of a term reflects how rare the term is across the corpus. It is commonly computed as the logarithm of the total number of documents divided by the number of documents that contain the term. Words unique to a small percentage of documents (e.g., technical jargon terms) receive higher importance values than words common across all documents (e.g., a, the, and).
  $$\mathrm{IDF}(t) = \log \frac{\text{number of documents in the corpus}}{\text{number of documents that contain } t}$$
• The TF-IDF of a term is calculated by multiplying the TF and IDF scores.
  $$\text{TF-IDF}(t, d) = \mathrm{TF}(t, d) \times \mathrm{IDF}(t)$$
• TF-IDF is useful in many natural language processing applications. For example, search engines use TF-IDF to rank the relevance of a document for a query. TF-IDF is also employed in text classification, text summarization, and topic modeling.
Example
• Imagine the term 't' appears 20 times in a document that contains a total of 100 words.
• The Term Frequency (TF) of 't' can be calculated as follows:
  $$\mathrm{TF} = \frac{20}{100} = 0.2$$
• Assume a collection of related documents contains 10,000 documents. If 100 of those documents contain the term 't', the Inverse Document Frequency (IDF) of 't' can be calculated as follows (assuming a base-10 logarithm):
  $$\mathrm{IDF} = \log_{10} \frac{10000}{100} = 2$$
• Using these two quantities, we can calculate the TF-IDF score of the term 't' for the document:
  $$\text{TF-IDF} = 0.2 \times 2 = 0.4$$
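The same arithmetic can be checked in code. This is a sketch under one stated assumption: the logarithm is taken to base 10. Other bases and smoothed IDF variants are also common, and they change the numbers but not the idea.

```python
import math

def term_frequency(term_count, total_terms):
    """Occurrences of the term divided by the number of words in the document."""
    return term_count / total_terms

def inverse_document_frequency(total_docs, docs_with_term):
    """Log of (corpus size / documents containing the term); base 10 is an assumption."""
    return math.log10(total_docs / docs_with_term)

tf = term_frequency(20, 100)                   # 20 occurrences in a 100-word document
idf = inverse_document_frequency(10_000, 100)  # 100 of 10,000 documents contain the term
tf_idf = tf * idf
print(tf, idf, tf_idf)
```

With these inputs, TF is 0.2, IDF is 2, and the TF-IDF score is 0.4. A term that appeared in all 10,000 documents would get an IDF of 0 and therefore a TF-IDF of 0, however often it occurred.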
N-Grams
• An n-gram is a contiguous sequence of n items from a given sample of text or speech. The items can be letters, words, or base pairs, depending on the application. N-grams are typically collected from a text or speech corpus (a long text dataset).
• N-grams are used extensively in text mining and natural language processing tasks. They are essentially sets of co-occurring words within a given window; when computing n-grams you typically move one word forward (although you can move X words forward in more advanced scenarios).
• For example, for the sentence "I reside in Bengaluru":

  Sl. No.  Type of n-gram  Generated n-grams
  1        Unigram         ["I", "reside", "in", "Bengaluru"]
  2        Bigram          ["I reside", "reside in", "in Bengaluru"]
  3        Trigram         ["I reside in", "reside in Bengaluru"]

• When N=1, these are called unigrams; they are essentially the individual words in a sentence.
• When N=2, they are called bigrams, and when N=3, trigrams.
• When N>3, they are usually referred to as four-grams, five-grams, and so on.
• How many n-grams are there in a sentence?
  • If X is the number of words in a given sentence K, the number of n-grams for sentence K is:
    $$\text{N-grams}_K = X - (N - 1)$$
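The sliding-window idea and the count X - (N - 1) can be sketched as below. Splitting on whitespace is a simplifying assumption, and `ngrams` is an illustrative helper name.

```python
def ngrams(words, n):
    """Slide a window of n words over the list, moving one word forward each time."""
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]

words = "I reside in Bengaluru".split()
for n in (1, 2, 3):
    grams = ngrams(words, n)
    # The count matches X - (N - 1), where X is the number of words.
    assert len(grams) == len(words) - (n - 1)
    print(n, grams)
```

For n larger than the sentence length the helper returns an empty list, since no full window fits.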
Stop words
• Stop words are a set of commonly used words in a language. Examples of stop words in English are "a," "the," "is," and "are."
• Stop word lists are commonly used in text mining and Natural Language Processing (NLP) to eliminate words that are so widely used that they carry very little useful information.
• When to remove stop words?
  • For text classification or sentiment analysis, removing stop words is often helpful because they usually contribute little information to the model and only add unwanted words to the corpus. For language translation, however, stop words are useful, as they have to be translated along with the other words.
• Pros:
  • Stop words are often removed from the text before training deep learning and machine learning models because they occur in abundance and therefore provide little to no unique information that can be used for classification or clustering.
  • Removing stop words decreases the dataset size, and the time to train the model also decreases, usually without a large impact on the model's accuracy.
  • Stop word removal can potentially improve performance, as fewer and only significant tokens are left, so classification accuracy could improve.
• Cons:
  • Improper selection and removal of stop words can change the meaning of the text, so stop words have to be chosen carefully.
  • Example: "This movie is not good." If "not" is removed in the pre-processing step, the sentence becomes "this movie is good", which is wrongly interpreted as positive.
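The pitfall in the movie example can be reproduced with a small filter. The stop word list below is a tiny hypothetical one that deliberately includes "not"; the `keep` parameter is an illustrative way to protect negations, not a standard API.

```python
STOP_WORDS = {"a", "an", "the", "is", "are", "this", "not"}  # small illustrative list
NEGATIONS = {"not", "no", "never"}

def remove_stop_words(tokens, stop_words, keep=frozenset()):
    """Drop stop words, except those explicitly protected in `keep`."""
    return [t for t in tokens if t not in stop_words or t in keep]

tokens = "this movie is not good".split()
naive = remove_stop_words(tokens, STOP_WORDS)
careful = remove_stop_words(tokens, STOP_WORDS, keep=NEGATIONS)
print(naive)    # ['movie', 'good']
print(careful)  # ['movie', 'not', 'good']
```

The naive result reads as positive, while the version that keeps negations preserves the original meaning.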
Stemming and Lemmatization
• What is Stemming?
• Stemming is a technique for extracting the base form of words by removing affixes from them. It is like cutting the branches of a tree down to its stem. For example, the stem of the words eating, eats, and eaten is eat.
• Search engines use stemming when indexing words. Rather than storing all forms of a word, a search engine can store only the stems. In this way, stemming reduces the size of the index and can improve retrieval, because a query matches documents that contain other forms of the same word.
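A deliberately naive suffix stripper is enough to show the idea. This is not the Porter algorithm or any library stemmer: the suffix list and the minimum stem length are assumptions made for this sketch.

```python
SUFFIXES = ("ing", "en", "es", "s")  # checked in this order; illustrative only

def naive_stem(word, min_stem=3):
    """Strip the first matching suffix, provided at least min_stem letters remain."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            return word[: -len(suffix)]
    return word

forms = ["eating", "eats", "eaten", "eat"]
index_terms = {naive_stem(w) for w in forms}
print(index_terms)  # {'eat'}
```

All four forms collapse to one index entry. The same rules also turn "garden" into "gard", which shows how crude suffix stripping can over-stem.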
What is Lemmatization?
• Lemmatization is a refinement of stemming: it groups together the different inflected forms of a word so they can be analyzed as a single item.
• Lemmatization is similar to stemming, but it brings context to the words, so it links forms with the same underlying meaning to one base word (the lemma).
• Lemmatization algorithms usually also take the word's part of speech as an input, such as whether the word is an adjective, noun, or verb.
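The role of the part of speech can be sketched with a tiny lookup table. Real lemmatizers rely on large dictionaries and morphological rules; the `LEMMAS` table here is a hypothetical stand-in with a handful of entries.

```python
# Hypothetical mini-dictionary: (word form, part of speech) -> lemma
LEMMAS = {
    ("eating", "verb"): "eat",
    ("eaten", "verb"): "eat",
    ("ate", "verb"): "eat",
    ("better", "adjective"): "good",
    ("meeting", "verb"): "meet",
    ("meeting", "noun"): "meeting",
}

def lemmatize(word, pos):
    """Look up the lemma for a word form and part of speech; fall back to the word itself."""
    return LEMMAS.get((word.lower(), pos), word.lower())

print(lemmatize("meeting", "verb"))  # meet
print(lemmatize("meeting", "noun"))  # meeting
print(lemmatize("ate", "verb"))      # eat
```

Unlike suffix stripping, the lookup handles irregular forms such as "ate" and gives different answers for the same spelling depending on the part of speech.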
Synonyms and Part of speech tagging
• Part-of-speech (POS) tagging is a process in natural language processing (NLP) where each word in a text is labeled with its corresponding part of speech. This can include nouns, verbs, adjectives, and other grammatical categories.
• POS tagging is useful for a variety of NLP tasks, such as information extraction, named entity recognition, and machine translation. It can also be used to identify the grammatical structure of a sentence and to disambiguate words that have multiple meanings.
• POS tagging is typically performed using machine learning algorithms, which are trained on a large annotated corpus of text. The algorithm learns to predict the correct POS tag for a given word based on the context in which it appears.
• Why POS tagging?
• POS tagging is an important part of NLP because it is a prerequisite for further NLP analysis, such as:
  • Chunking
  • Syntax parsing
  • Information extraction
  • Machine translation
  • Sentiment analysis
  • Grammar analysis and word-sense disambiguation
• Tagging a list of sentences
  • Rather than tagging a single sentence, NLTK's TaggerI class also provides a tag_sents() method, with which we can tag a list of sentences.
• Un-tagging a sentence
  • We can also un-tag a sentence. NLTK provides the nltk.tag.untag() function for this purpose. It takes a tagged sentence as input and returns a list of words without tags.
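The shape of these two operations can be sketched without installing NLTK. The code below does not call NLTK: `tag`, `tag_sents`, and `untag` are plain-Python stand-ins named after the NLTK functions, and the tiny `LEXICON` (with Penn Treebank style tags) is a hypothetical lookup rather than a trained tagger.

```python
# Hypothetical word -> tag lookup; a real tagger is trained on an annotated corpus.
LEXICON = {"the": "DT", "dog": "NN", "cat": "NN", "barks": "VBZ", "sleeps": "VBZ"}

def tag(tokens, default="NN"):
    """Tag one tokenized sentence, falling back to a default tag for unknown words."""
    return [(tok, LEXICON.get(tok.lower(), default)) for tok in tokens]

def tag_sents(sentences):
    """Tag a list of tokenized sentences (mirrors the role of tag_sents())."""
    return [tag(sentence) for sentence in sentences]

def untag(tagged_sentence):
    """Strip the tags again, leaving only the words (mirrors the role of untag())."""
    return [word for word, _ in tagged_sentence]

sentences = [["The", "dog", "barks"], ["The", "cat", "sleeps"]]
tagged = tag_sents(sentences)
print(tagged[0])         # [('The', 'DT'), ('dog', 'NN'), ('barks', 'VBZ')]
print(untag(tagged[0]))  # ['The', 'dog', 'barks']
```

A tagged sentence is a list of (word, tag) pairs, so un-tagging is simply a matter of dropping the second element of each pair.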
Use of Parts of Speech Tagging in NLP
• To understand the grammatical structure of a sentence:
  • By labeling each word with its POS, we can better understand the syntax and structure of a sentence. This is useful for tasks such as machine translation and information extraction, where it is important to know how words relate to each other in the sentence.
• To disambiguate words with multiple meanings:
  • Some words, such as "bank," can have multiple meanings depending on the context in which they are used (for example, as a noun or as a verb). Labeling each word with its POS helps disambiguate such words and clarify their intended meaning.
• To improve the accuracy of NLP tasks:
  • POS tagging can help improve the performance of various NLP tasks, such as named entity recognition and text classification. By providing additional context and information about the words in a text, we can build more accurate and sophisticated algorithms.
• To facilitate research in linguistics:
  • POS tagging can also be used to study the patterns and characteristics of language use and to gain insights into the structure and function of different parts of speech.
Application of POS Tagging
• Information extraction:
  • POS tagging can help identify specific types of information in a text, such as names, locations, and organizations. This is useful for tasks such as extracting data from news articles or building knowledge bases for artificial intelligence systems.
• Named entity recognition:
  • POS tagging can help identify and classify named entities in a text, such as people, places, and organizations. This is useful for tasks such as building customer profiles or identifying key figures in a news story.
• Text classification:
  • POS tagging can help classify texts into different categories, for tasks such as spam detection or sentiment analysis. By analyzing the POS tags of the words in a text, algorithms can better understand the content and tone of the text.
• Machine translation:
  • POS tagging can help translate texts from one language to another by identifying the grammatical structure and relationships between words in the source language and mapping them to the target language.
• Natural language generation:
  • POS tagging can help generate natural-sounding text by selecting appropriate words and constructing grammatically correct sentences. This is useful for applications such as chatbots and virtual assistants.
Sentiment Analysis
• Sentiment analysis is the process of classifying whether a block of text is positive, negative, or neutral.
• Sentiment analysis is a subset of natural language processing (NLP) that uses machine learning to analyze and classify the emotional tone of text data.
• The goal of sentiment analysis is to analyze people's opinions in a way that can help businesses grow.
• It focuses not only on polarity (positive, negative, and neutral) but also on emotions (happy, sad, angry, etc.) and on intentions, such as the intention to buy.
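Polarity classification can be sketched with a word-list scorer. This is a swapped-in, lexicon-based illustration rather than the machine learning approach mentioned above; the word lists are tiny hypothetical examples, and the negation rule (flip the next sentiment word) is a simplifying assumption.

```python
import re

POSITIVE = {"good", "great", "love", "excellent", "happy"}
NEGATIVE = {"bad", "poor", "hate", "terrible", "sad"}
NEGATIONS = {"not", "never", "no"}

def sentiment(text):
    """Classify text as positive, negative, or neutral by counting lexicon words."""
    score = 0
    negate = False
    for token in re.findall(r"[a-z']+", text.lower()):
        if token in NEGATIONS:
            negate = True  # flips the polarity of the next sentiment word
            continue
        polarity = (token in POSITIVE) - (token in NEGATIVE)
        if polarity:
            score += -polarity if negate else polarity
            negate = False
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(sentiment("This movie is not good."))                  # negative
print(sentiment("I love this phone, the camera is great!"))  # positive
print(sentiment("The package arrived on Tuesday."))          # neutral
```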
Why Use Sentiment Analysis?
• Sentiment analysis examines the contextual meaning of words to gauge the social sentiment toward a brand, and helps a business determine whether the product it is manufacturing is likely to find demand in the market.
• Businesses can use insights from sentiment analysis to improve their products, fine-tune marketing messages, correct misconceptions, and identify positive influencers.
• It helps businesses gain insights, understand customers, predict and enhance the customer experience, tailor marketing campaigns, and make decisions.
Types of Sentiment Analysis
Fine-grained sentiment analysis:
• This is based on degrees of polarity. The categories can be defined as very positive, positive, neutral, negative, and very negative. The rating is done on a scale of 1 to 5: for example, a rating of 5 is very positive, 3 is neutral, and 2 is negative.
Emotion detection:
• Sentiments such as happy, sad, angry, upset, jolly, and pleasant come under emotion detection. It is often implemented with lexicon-based methods.
Aspect-based sentiment analysis:
• This focuses on particular aspects of a product. For instance, when evaluating a cell phone, it analyzes the sentiment toward aspects such as the battery, screen, and camera quality.
Multilingual sentiment analysis:
• Here the text to be classified as positive, negative, or neutral comes in different languages. This is highly challenging and comparatively difficult.
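The five-point scale used in fine-grained sentiment analysis can be written as a simple mapping. The labels for ratings 1 and 4 are filled in by symmetry with the five categories listed above, which is an assumption of this sketch.

```python
RATING_LABELS = {
    1: "very negative",
    2: "negative",
    3: "neutral",
    4: "positive",
    5: "very positive",
}

def label_rating(rating):
    """Map a 1-to-5 rating to its fine-grained sentiment label."""
    if rating not in RATING_LABELS:
        raise ValueError(f"rating must be an integer from 1 to 5, got {rating!r}")
    return RATING_LABELS[rating]

ratings = [5, 2, 3]  # hypothetical review ratings
print([label_rating(r) for r in ratings])  # ['very positive', 'negative', 'neutral']
```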
Applications
Social media:
• On social media sites such as Instagram, comments and reviews are analyzed and categorized as positive, negative, or neutral.
Customer service:
• In app stores such as the Play Store, comments rated on a scale of 1 to 5 are processed with the help of sentiment analysis approaches.
Marketing sector:
• In marketing, sentiment analysis is used where a particular product needs to be reviewed as good or bad.
Reviewer side:
• Reviewers look at the comments and give an overall review of the product.
Document or text summarization
• Text summarization is a very useful and important part of Natural Language Processing (NLP).
• We can summarize a text in a few lines by removing unimportant text and condensing the rest into a shorter form that keeps its meaning.
• In this approach we build algorithms or programs that reduce the text size and create a summary of our text data. This is called automatic text summarization.
• Text summarization is the process of creating a shorter text without losing the meaning of the original text.
• Text summarization is the practice of condensing long publications into manageable paragraphs or sentences.
• The procedure extracts important information while ensuring that the meaning of the text is preserved. This shortens the time it takes to comprehend long materials such as research articles, without omitting critical information.
• Text summarization presents a number of challenges, including text identification, interpretation, and summary generation, as well as analysis of the resulting summary.
• Identifying important phrases in the document and using them to uncover relevant information to include in the summary are critical tasks in extraction-based summarization.
• Two approaches to text summarization:
  • Extraction-based summarization
  • Abstractive summarization
• Extraction-based summarization
  • The extractive text summarization approach entails extracting essential words, phrases, or sentences from the source material and combining them to create a summary.
  • The extraction is done according to a given measure, without making any modifications to the extracted text.
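Extraction according to a measure can be sketched with word frequencies. The measure used here (average corpus frequency of a sentence's non-stop words) is one simple choice assumed for illustration; the stop word list, the sentence splitter, and the sample text are likewise hypothetical.

```python
import re
from collections import Counter

STOP_WORDS = {"a", "an", "the", "is", "was", "to", "into", "of", "and"}

def content_words(text):
    """Lowercased words with the illustrative stop words removed."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOP_WORDS]

def summarize(text, max_sentences=1):
    """Return the highest-scoring sentences, unmodified and in their original order."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    freq = Counter(content_words(text))

    def score(sentence):
        words = content_words(sentence)
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    chosen = sorted(ranked[:max_sentences])
    return " ".join(sentences[i] for i in chosen)

text = (
    "Social media analytics turns social media data into insights. "
    "The weather was pleasant yesterday. "
    "Analysts use social media analytics to track brands."
)
print(summarize(text, max_sentences=1))
```

The selected sentences are copied verbatim, which is what distinguishes extraction from approaches that generate new sentences.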
• Abstractive summarization
  • Abstractive summarization is another approach to text summarization. Here, new sentences are generated from the original content.
  • This is in contrast to the extractive technique, in which only phrases already present in the text are used. The phrases produced by abstractive summarization may not appear in the original text.
  • When abstraction is used for text summarization with deep learning, it can overcome the grammatical errors of the extractive method.
  • Abstraction can produce better summaries than extraction. However, the text summarization algorithms required for abstraction are more complex to build, which is why extraction is still widely used.
Trend Analytics
• Trend analysis, also known as technical analysis in financial markets, is used to monitor metrics and their development over time. As such, the technique relies on effective historical analysis.
• Trend analysis is a research methodology for gathering and studying observed and recorded data from past and ongoing trends in order to make predictions about future consumer behavior.
• It helps determine the main characteristics of the stock market and the consumers associated with it.
• Trend analysis makes it possible to look at data over time, for example in a long-running survey.
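Monitoring a metric over time can be sketched with a moving average and a least-squares slope. Both are common, simple choices assumed here for illustration, and the weekly mention counts are hypothetical data.

```python
def moving_average(values, window):
    """Average of each run of `window` consecutive values, to smooth short-term noise."""
    return [
        sum(values[i - window + 1:i + 1]) / window
        for i in range(window - 1, len(values))
    ]

def linear_trend(values):
    """Least-squares slope of the values against their position 0, 1, 2, ..."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations to estimate a trend")
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    denominator = sum((x - mean_x) ** 2 for x in range(n))
    return numerator / denominator

weekly_mentions = [120, 135, 128, 150, 162, 158, 175]  # hypothetical brand mentions
smoothed = moving_average(weekly_mentions, window=3)
slope = linear_trend(weekly_mentions)
print(smoothed[-1], slope)  # 165.0 8.75
```

A positive slope of 8.75 means mentions grew by roughly nine per week over the period, even though individual weeks went down.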
• Types of trend analysis:
  • Temporal method
  • Geographic method
  • Intuitive method
• Temporal method
  • This methodology is used to analyze patterns and trends in a given group of relevant data or objects of study over a specific period of time, as well as how they change in that period.
  • A clear example is a longitudinal study, carried out with the intention of detecting and analyzing trends that emerge from historical data.
  • It is mainly used in ethnographic research and other types of event-focused studies. The main disadvantage of this type of trend analysis is that it is exposed to many variables that could affect the final result of the study.
• Geographic method
  • The geographic method of trend analysis is generally easy and reliable; it can be used to identify commonalities and differences between user groups belonging to the same or different geographies.
  • Its main purpose is the analysis of market trends that develop in groups of users identified by their geographic location.
  • The downside of the geographic method is the geographic limitation it places on the analysis: results can be influenced by factors such as culture and traditions that are specific to the user groups' location.
• Intuitive method
  • The intuitive method analyzes trends within groups of users based on logical explanations, behavioral patterns, or other elements perceived by a futurist.
  • This kind of market trend analysis is helpful for making predictions without the need for large amounts of statistical data. However, the methodology relies heavily on the knowledge and logic of futurists and researchers, which makes it prone to researcher bias.
  • The intuitive method is the most difficult type of trend analysis and might not be as precise as the others.
Challenges to Social Media Analytics
Data cleansing
• Cleaning unstructured textual data (e.g., normalizing text), especially high-frequency streamed real-time data, still presents numerous problems and research challenges.
Scraping
• Although social media data is accessible through APIs, the commercial value of the data means that most of the major sources, such as Facebook and Google, are making it increasingly difficult for academics to obtain comprehensive access to their 'raw' data; very few social data sources provide affordable data offerings to academia and researchers. News services such as Thomson Reuters and Bloomberg typically charge a premium for access to their data.
• In contrast, Twitter has announced the Twitter Data Grants program, through which researchers can apply for access to Twitter's public tweets and historical data in order to get insights from its massive data set (Twitter has more than 500 million tweets a day).
Data protection
• Once you have created a 'big data' resource, the data needs to be secured, ownership and IP issues resolved (e.g., storing scraped data is against most publishers' terms of service), and users provided with different levels of access; otherwise, users may attempt to extract all the valuable data from the database.
Holistic data sources
• Researchers are increasingly bringing together and combining novel data sources: social media data, real-time market and customer data, and geospatial data for analysis.
Data visualization
• Data visualization is the visual representation of data, whereby information is abstracted into some schematic form with the goal of communicating it clearly and effectively through graphical means. Given the magnitude of the data involved, visualization is becoming increasingly important.
Analytics dashboards
• Many social media platforms require users to write code against APIs to access feeds, or to program analytics models in a programming language such as Java.
• While reasonable for computer scientists, these skills are typically beyond most (social science) researchers.
• Non-programming interfaces are required to give what might be referred to as 'deep' access to 'raw' data, for example for configuring APIs, merging social media feeds, combining holistic sources, and developing analytical models.
THANK YOU

Social Media Analytics and NLP

  • 1. Chapter 4 Social Media and Text Analytics 1 Asst. Prof. Rushikesh Chikane, MIT ACSC, Alandi
  • 2. Overview of Social Media Analytics  Social media analytics is the process of collecting and analyzing audience data shared on social networks to improve an organization's strategic business decisions.  Social media analytics is the ability to gather and find meaning in data gathered from social channels to support business decisions — and measure the performance of actions based on those decisions through social media.  Social media analytics uses specifically designed software platforms that work similarly to web search tools.  Data about keywords or topics is retrieved through search queries or web ‘crawlers’ that span channels.  Fragments of text are returned, loaded into a database, categorized and analyzed to derive meaningful insights. 2 Asst. Prof. Rushikesh Chikane, MIT ACSC, Alandi
  • 3. Social Media Analytics Process 3 Asst. Prof. Rushikesh Chikane, MIT ACSC, Alandi
  • 4. Seven Layers of Social Media Analytics  Social media at a minimum has seven layers of data.  Each layer carries potentially valuable information and insights that can be harvested for business intelligence purposes.  Out of the seven layers, some are visible or easily identifiable (e.g., text and actions) and other are invisible (e.g., social media and hyperlink networks). 4 Asst. Prof. Rushikesh Chikane, MIT ACSC, Alandi
  • 5. 5 Asst. Prof. Rushikesh Chikane, MIT ACSC, Alandi
  • 6.  LAYER ONE: TEXT  Social media text analytics deals with the extraction and analysis of business insights from textual elements of social media content, such as comments, tweets, blog posts, and Facebook status updates. Text analytics is mostly used to understand social media users’ sentiments or identify emerging themes and topics.  LAYER TWO: NETWORKS  Social media network analytics extract, analyze, and interpret personal and professional social networks, for example, Facebook, Friendship Network, and Twitter. Network analytics seeks to identify influential nodes (e.g., people and organizations) and their position in the network. 6 Asst. Prof. Rushikesh Chikane, MIT ACSC, Alandi
  • 7.  LAYER THREE: ACTIONS  Social media actions analytics deals with extracting, analyzing, and interpreting the actions performed by social media users, including likes, dislikes, shares, mentions, and endorsement. Actions analytics are mostly used to measure popularity, influence, and prediction in social media.  LAYER FOUR: MOBILE  Mobile analytics is the next frontier in the social business landscape. Mobile analytics deals with measuring and optimizing user engagement with mobile applications (or apps for short), analyzing and understanding in-app purchases, customer engagement, and mobile user demographics. 7 Asst. Prof. Rushikesh Chikane, MIT ACSC, Alandi
  • 8.  LAYER FIVE: HYPERLINKS  Hyperlink analytics is about extracting, analyzing, and interpreting social media hyperlinks (e.g., in-links and out- links).  Hyperlink analysis can reveal, for example, Internet traffic patterns and sources of incoming or outgoing traffic to and from a source.  LAYER SIX: LOCATION  Location analytics, also known as spatial analysis or geospatial analytics, is concerned with mining and mapping the locations of social media users, contents, and data. 8 Asst. Prof. Rushikesh Chikane, MIT ACSC, Alandi
  • 9.  LAYER SEVEN: SEARCH ENGINES  Search engines analytics focuses on analyzing historical search data for gaining a valuable insight into a range of areas, including trends analysis, keyword monitoring, search result and advertisement history, and advertisement spending statistics. 9 Asst. Prof. Rushikesh Chikane, MIT ACSC, Alandi
  • 10. Accessing Social Media Data  Social media data is any type of data that can be gathered through social media. In general, the term refers to social media metrics and demographics collected through analytics tools on social platforms.  Social media data can also refer to data collected from content people post publicly on social media. This type of social media data for marketing can be collected through social listening tools. 10 Asst. Prof. Rushikesh Chikane, MIT ACSC, Alandi
  • 11. Social Network Analysis  Social network analysis (SNA) is the process of investigating social structures through the use of networks and graph theory. It characterizes networked structures in terms of nodes (individual actors, people, or things within the network) and the ties, edges, or links (relationships or interactions) that connect them.  SNA is the practice of representing networks of people as graphs and then exploring these graphs. A typical social network representation has nodes for people, and edges connecting two nodes to represent one or more relationships between them 11 Asst. Prof. Rushikesh Chikane, MIT ACSC, Alandi
  • 12. Asst. Prof. Rushikesh Chikane, MIT ACSC, Alandi 12  The resulting graph can reveal patterns of connection among people. Small networks can be represented visually, and these visualizations are intuitive and may make apparent patterns of connections, and reveal nodes that are highly connected or which play a critical role in connecting groups together  Social network analysis (SNA) is a process of quantitative and qualitative analysis of a social network. SNA measures and maps the flow of relationships and relationship changes between knowledge-possessing entities.  Simple and complex entities include websites, computers, animals, humans, groups, organizations and nations.
  • 13. The benefits of social network: Asst. Prof. Rushikesh Chikane, MIT ACSC, Alandi 13  Helps you understand your audience better  Used for customer segmentation  Used to design Recommendation Systems  Detect fake news, among other things
  • 14. Asst. Prof. Rushikesh Chikane, MIT ACSC, Alandi 14  Link Prediction:  Link prediction is one of the most important research topics in the field of graphs and networks. The objective of link prediction is to identify pairs of nodes that will either form a link or not in the future.
  • 15. Asst. Prof. Rushikesh Chikane, MIT ACSC, Alandi 15  Link prediction has a ton of use in real-world applications.  Predict which customers are likely to buy what products on online marketplaces like Amazon. It can help in making better product recommendations  Suggest interactions or collaborations between employees in an organization  Extract vital insights from terrorist networks
  • 16. Introduction to Natural Language Processing  Natural Language Processing is a branch of Computer Science that deals with the understanding and processing of natural language, e.g. texts or voice recordings.  The goal is for a machine to be able to communicate with humans in the same way that humans have been communicating with each other for centuries.  Learning a new language is not easy for us humans either and requires a lot of time and perseverance.  When a machine wants to learn a natural language, it is no different.  Therefore, some sub-areas have emerged within Natural Language Processing that are necessary for language to be completely understood. 16 Asst. Prof. Rushikesh Chikane, MIT ACSC, Alandi
  • 17. Text Analytics  Tokenization  Bag of words  Word weighting : TF-IDF  N-Grams  Stop word  Stemming and Lemmatization  Synonyms and Part of speech tagging 17 Asst. Prof. Rushikesh Chikane, MIT ACSC, Alandi
  • 18. Tokenization  The text is cut into pieces called “tokens” or “terms.”  These tokens are the most basic unit of information you’ll use for your model.  The terms are often words but this isn’t a necessity. Entire sentences can be used for analysis.  We’ll use unigrams: terms consisting of one word.  Often, however, it’s useful to include bigrams (two words per token) or trigrams (three words per token) to capture extra meaning and increase the performance of your models.  This does come at a cost, though, because you’re building bigger term-vectors by including bigrams and/or trigrams in the equation.
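The trade-off described above can be seen in a few lines of Python. This is a minimal sketch that assumes simple regular-expression tokenization (real tokenizers handle punctuation, contractions and URLs far more carefully); the sample text is made up for illustration.

```python
import re

text = "Social media is noisy. Text analytics helps. Social media is big."

# Sentence-level tokens: split after ., ! or ? when followed by whitespace.
sentences = re.split(r"(?<=[.!?])\s+", text.strip())

# Word-level tokens (unigrams): lowercase runs of letters, digits or apostrophes.
unigrams = re.findall(r"[a-z0-9']+", text.lower())

# Bigrams: each token paired with its successor.
# Note that this naive version also pairs words across sentence boundaries.
bigrams = [f"{first} {second}" for first, second in zip(unigrams, unigrams[1:])]

unigram_vocab = set(unigrams)
combined_vocab = unigram_vocab | set(bigrams)

print(len(sentences), "sentences")
print(len(unigram_vocab), "distinct unigrams")
print(len(combined_vocab), "distinct terms once bigrams are included")
```

Adding bigrams doubles the number of distinct terms even for this tiny text, which is the "bigger term-vectors" cost the slide mentions.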
  • 19. Bag of words  To build our classification model we’ll go with the bag of words approach.  Bag of words is the simplest way of structuring textual data: every document is turned into a word vector.  If a certain word is present in the document, its entry in the vector is labeled “True”; all other entries are labeled “False”. The figure shows a simplified example with only two documents: one about the television show Game of Thrones and one about data science.  The two word vectors together form the document-term matrix.  The document-term matrix holds a column for every term and a row for every document.
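A binary document-term matrix of this kind can be sketched as follows. The two example sentences are invented stand-ins for the Game of Thrones and data science documents, and the `tokenize` helper is a simplifying assumption.

```python
import re

# Hypothetical documents standing in for the two documents on the slide.
docs = {
    "game_of_thrones": "Game of Thrones is a great television series",
    "data_science": "Data science is the discipline of doing great analyses",
}


def tokenize(text):
    """Lowercase the text and keep runs of letters as tokens."""
    return re.findall(r"[a-z]+", text.lower())


# One column per distinct term across all documents.
vocabulary = sorted({token for text in docs.values() for token in tokenize(text)})

# One row per document: True if the term occurs in it, False otherwise.
matrix = {
    name: [term in set(tokenize(text)) for term in vocabulary]
    for name, text in docs.items()
}

for name, row in matrix.items():
    present = [term for term, flag in zip(vocabulary, row) if flag]
    print(name, present)
```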
  • 20. [Figure: document-term matrix for two example documents, one about Game of Thrones and one about data science]
  • 21. Word weighting : TF-IDF  Term Frequency - Inverse Document Frequency (TF-IDF) is a widely used statistical method in natural language processing and information retrieval. It measures how important a term is within a document relative to a collection of documents (i.e., relative to a corpus). A text vectorization process transforms the words of a text document into numeric importance scores. There are many text vectorization scoring schemes, with TF-IDF being one of the most common.
  • 22.  As its name implies, TF-IDF scores a word by multiplying the word’s Term Frequency (TF) by its Inverse Document Frequency (IDF).  Term Frequency: the TF of a term is the number of times the term appears in a document divided by the total number of words in that document.
  • 23.  Inverse Document Frequency: the IDF of a term is based on the inverse of the proportion of documents in the corpus that contain the term, usually taken on a logarithmic scale, so the fewer documents contain the term, the higher its IDF. Words found in only a small percentage of documents (e.g., technical jargon terms) receive higher importance values than words common across all documents (e.g., a, the, and).
  • 24.  The TF-IDF of a term is calculated by multiplying its TF and IDF scores.  TF-IDF is useful in many natural language processing applications. For example, search engines use TF-IDF to rank the relevance of a document for a query. TF-IDF is also employed in text classification, text summarization, and topic modeling.
  • 25. Example  Imagine the term ‘t’ appears 20 times in a document that contains a total of 100 words.  The Term Frequency (TF) of ‘t’ can be calculated as follows: TF = 20 / 100 = 0.2.  Assume a collection of related documents contains 10,000 documents. If 100 of the 10,000 documents contain the term ‘t’, the Inverse Document Frequency (IDF) of ‘t’ can be calculated as follows: IDF = log(10,000 / 100) = log(100), which equals 2 when a base-10 logarithm is used.
  • 26.  Using these two quantities, we can calculate the TF-IDF score of the term ‘t’ for the document: TF-IDF = TF × IDF = 0.2 × 2 = 0.4 (with the base-10 logarithm).
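The worked example can be checked with a short script. This sketch assumes the plain (unsmoothed) definitions and a base-10 logarithm; libraries often use a natural logarithm and add smoothing terms, so their scores will differ. The function names are illustrative.

```python
import math


def term_frequency(term_count, total_terms):
    """Occurrences of the term divided by the document length."""
    return term_count / total_terms


def inverse_document_frequency(total_docs, docs_with_term):
    """Base-10 log of (corpus size / number of documents containing the term)."""
    return math.log10(total_docs / docs_with_term)


tf = term_frequency(20, 100)                    # term appears 20 times in 100 words
idf = inverse_document_frequency(10_000, 100)   # 100 of 10,000 documents contain it
tf_idf = tf * idf

print(f"TF = {tf}, IDF = {idf}, TF-IDF = {tf_idf:.2f}")
```

A term that appeared in all 10,000 documents would get IDF = log10(1) = 0, and therefore a TF-IDF of 0 no matter how often it occurs in a single document.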
  • 27. N-Grams  An n-gram is a contiguous sequence of n items from a given sample of text or speech. The items can be letters, words, or base pairs, depending on the application. N-grams are typically collected from a text or speech corpus (a large collection of text).  N-grams are used extensively in text mining and natural language processing tasks. They are essentially sets of co-occurring words within a given window; when computing n-grams you typically move the window one word forward at a time (although you can move X words forward in more advanced scenarios).
  • 28.  For example, for the sentence “I reside in Bengaluru”:
      Sl. No.  Type of n-gram  Generated n-grams
      1        Unigram         [“I”, “reside”, “in”, “Bengaluru”]
      2        Bigram          [“I reside”, “reside in”, “in Bengaluru”]
      3        Trigram         [“I reside in”, “reside in Bengaluru”]
  • 29.  When N=1, the n-grams are called unigrams; these are essentially the individual words in a sentence.  When N=2, they are called bigrams.  When N=3, they are called trigrams.  When N>3, they are usually referred to as four-grams, five-grams and so on.  How many N-grams are there in a sentence?  If X is the number of words in a given sentence K, the number of N-grams for sentence K is X - (N - 1). For the four-word sentence “I reside in Bengaluru”, this gives 4 unigrams, 3 bigrams and 2 trigrams.
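The sliding-window idea and the count X - (N - 1) can be illustrated with a small helper. The function name `ngrams` and its `step` parameter (the "move X words forward" variant) are assumptions for this sketch.

```python
def ngrams(items, n, step=1):
    """Return the n-grams of a sequence, sliding the window `step` items at a time."""
    return [tuple(items[i:i + n]) for i in range(0, len(items) - n + 1, step)]


words = "I reside in Bengaluru".split()

for n, name in ((1, "unigrams"), (2, "bigrams"), (3, "trigrams")):
    grams = ngrams(words, n)
    # With step=1, a sentence of X words yields X - (N - 1) n-grams.
    assert len(grams) == len(words) - (n - 1)
    print(name, [" ".join(gram) for gram in grams])

# The items do not have to be words: here they are the letters of one word.
char_trigrams = ["".join(gram) for gram in ngrams("reside", 3)]
print(char_trigrams)
```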
  • 30. Stop word  Stop words are a set of commonly used words in a language. Examples of stop words in English are “a,” “the,” “is,” “are,” etc.  In Text Mining and Natural Language Processing (NLP), stop words are commonly removed because they are so widely used that they carry very little useful information.  When to remove stop words?  For tasks such as text classification or sentiment analysis, stop words are usually removed, since they add little information for the model and removing them keeps unwanted words out of the corpus. For language translation, however, stop words are useful, as they have to be translated along with the other words.
  • 31.  Pros:  Stop words are often removed from the text before training deep learning and machine learning models, since they occur in abundance and therefore provide little to no unique information that can be used for classification or clustering.  Removing stop words decreases the dataset size, and the time to train the model also decreases, without a huge impact on the accuracy of the model.  Stop word removal can potentially improve performance, as fewer and only significant tokens are left. Thus, the classification accuracy could be improved.
  • 32.  Cons:  Improper selection and removal of stop words can change the meaning of the text, so stop words have to be chosen carefully.  Example: “This movie is not good.” If “not” is removed in the pre-processing step, the sentence becomes “This movie is good”, which is wrongly interpreted as positive.
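The pitfall with “not” is easy to reproduce. In this sketch the stop-word list is a tiny hand-made set (real lists, such as the ones shipped with NLP libraries, are much longer), and keeping negation words is one possible safeguard rather than a universal rule.

```python
# Hypothetical, deliberately small stop-word list that includes a negation word.
STOP_WORDS = {"a", "an", "the", "is", "are", "this", "not"}
NEGATIONS = {"not", "no", "never"}


def remove_stop_words(tokens, stop_words):
    """Keep only the tokens that are not in the stop-word set."""
    return [token for token in tokens if token not in stop_words]


tokens = "this movie is not good".split()

# Naive removal drops "not" and flips the apparent meaning.
naive = remove_stop_words(tokens, STOP_WORDS)

# Excluding negation words from the stop list preserves the meaning.
careful = remove_stop_words(tokens, STOP_WORDS - NEGATIONS)

print(naive)
print(careful)
```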
  • 33. Stemming and Lemmatization  What is Stemming?  Stemming is a technique that extracts the base form of a word by removing its affixes, much like cutting the branches of a tree down to the stem. For example, the stem of the words eating, eats, eaten is eat.  Search engines use stemming when indexing words: rather than storing all forms of a word, a search engine can store only the stems. In this way, stemming reduces the size of the index and can improve retrieval accuracy, because a query matches every form of a word.
  • 34. What is Lemmatization?  Lemmatization is a development of Stemming and describes the process of grouping together the different inflected forms of a word so they can be analyzed as a single item.  Lemmatization is similar to Stemming but it brings context to the words: it links the different forms of a word to one dictionary base form (the lemma), for example “better” to “good”.  Lemmatization algorithms usually also take the part of speech as an input, i.e. whether the word is an adjective, noun, or verb.
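The difference between the two techniques can be sketched with toy implementations. Both are deliberate simplifications: the suffix list is not the Porter algorithm, and the lemma table is a hand-written dictionary with made-up coverage; production code would use a library stemmer and lemmatizer instead.

```python
# Toy suffix list, checked in order. This is NOT the Porter stemming rule set.
SUFFIXES = ("ing", "en", "s")


def toy_stem(word):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


# Hypothetical lemma table keyed by (word, part of speech).
LEMMAS = {
    ("better", "adj"): "good",
    ("saw", "verb"): "see",
    ("saw", "noun"): "saw",
    ("eaten", "verb"): "eat",
}


def toy_lemmatize(word, pos):
    """Look up the dictionary form; fall back to the word itself."""
    return LEMMAS.get((word, pos), word)


print([toy_stem(w) for w in ("eating", "eats", "eaten")])
print(toy_stem("better"), "vs", toy_lemmatize("better", "adj"))
print(toy_lemmatize("saw", "verb"), "vs", toy_lemmatize("saw", "noun"))
```

Suffix stripping cannot map “better” to “good”, and it cannot tell the verb “saw” from the noun “saw”; the lemmatizer can, because it uses a dictionary together with the part of speech.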
  • 35. Synonyms and Part of speech tagging  Part-of-speech (POS) tagging is a process in natural language processing (NLP) where each word in a text is labeled with its corresponding part of speech. This can include nouns, verbs, adjectives, and other grammatical categories.  POS tagging is useful for a variety of NLP tasks, such as information extraction, named entity recognition, and machine translation. It can also be used to identify the grammatical structure of a sentence and to disambiguate words that have multiple meanings.  POS tagging is typically performed using machine learning algorithms, which are trained on a large annotated corpus of text. The algorithm learns to predict the correct POS tag for a given word based on the context in which it appears.
  • 36.  Why POS tagging?  POS tagging is an important part of NLP because it works as the prerequisite for further NLP analysis as follows −  Chunking  Syntax Parsing  Information extraction  Machine Translation  Sentiment Analysis  Grammar analysis & word-sense disambiguation
  • 37.  Tagging a list of sentences  Rather than tagging a single sentence, NLTK’s TaggerI class also provides a tag_sents() method, with which we can tag a list of sentences in one call, for example two simple sentences.  Un-tagging a sentence  We can also un-tag a sentence. NLTK provides the nltk.tag.untag() method for this purpose. It takes a tagged sentence as input and returns a list of words without tags.
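To show the shape of the data involved, here is a stdlib-only stand-in. It does not use NLTK: `tag`, `tag_sents` and `untag` below are hypothetical local functions that only mirror the behaviour the slide describes for tagging and un-tagging, and the lexicon and suffix rules are invented. Real taggers learn their predictions from an annotated corpus.

```python
# Hypothetical mini-lexicon; anything not listed falls back to suffix rules.
LEXICON = {
    "the": "DET", "a": "DET",
    "dog": "NOUN", "cat": "NOUN",
    "barks": "VERB", "sleeps": "VERB",
}


def tag(tokens):
    """Tag one tokenized sentence as a list of (word, tag) pairs."""
    tagged = []
    for token in tokens:
        word = token.lower()
        if word in LEXICON:
            pos = LEXICON[word]
        elif word.endswith("ly"):
            pos = "ADV"
        elif word.endswith(("ing", "ed")):
            pos = "VERB"
        else:
            pos = "NOUN"  # default guess for unknown words
        tagged.append((token, pos))
    return tagged


def tag_sents(sentences):
    """Tag a list of tokenized sentences in one call."""
    return [tag(sentence) for sentence in sentences]


def untag(tagged_sentence):
    """Drop the tags and return the plain list of words."""
    return [word for word, _ in tagged_sentence]


sentences = [["The", "dog", "barks"], ["A", "cat", "sleeps", "quietly"]]
tagged = tag_sents(sentences)
print(tagged)
print(untag(tagged[0]))
```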
  • 38. Use of Parts of Speech Tagging in NLP  To understand the grammatical structure of a sentence:  By labeling each word with its POS, we can better understand the syntax and structure of a sentence. This is useful for tasks such as machine translation and information extraction, where it is important to know how words relate to each other in the sentence.  To disambiguate words with multiple meanings:  Some words, such as “bank,” can have multiple meanings depending on the context in which they are used. By labeling each word with its POS, we can help disambiguate these words (for example, “bank” used as a noun versus “bank” used as a verb) and better understand their intended meaning.  To improve the accuracy of NLP tasks:  POS tagging can help improve the performance of various NLP tasks, such as named entity recognition and text classification. By providing additional context and information about the words in a text, we can build more accurate and sophisticated algorithms.  To facilitate research in linguistics:  POS tagging can also be used to study the patterns and characteristics of language use and to gain insights into the structure and function of different parts of speech.
  • 39. Application of POS Tagging  Information extraction:  POS tagging can help identify specific types of information in a text, such as names, locations, and organizations. This is useful for tasks such as extracting data from news articles or building knowledge bases for artificial intelligence systems.  Named entity recognition:  POS tags can be used as features to identify and classify named entities in a text, such as people, places, and organizations. This is useful for tasks such as building customer profiles or identifying key figures in a news story.  Text classification:  POS tagging can help classify texts into different categories, for tasks such as spam detection or sentiment analysis. By analyzing the POS tags of the words in a text, algorithms can better understand the content and tone of the text.  Machine translation:  POS tagging can help translate texts from one language to another by identifying the grammatical structure and relationships between words in the source language and mapping them to the target language.  Natural language generation:  POS tagging can help generate natural-sounding text by guiding the selection of appropriate words and the construction of grammatically correct sentences. This is useful for applications such as chatbots and virtual assistants.
  • 40. Sentiment Analysis  Sentiment analysis is the process of classifying whether a block of text is positive, negative, or neutral.  Sentiment analysis is a subset of natural language processing (NLP) that uses machine learning to analyze and classify the emotional tone of text data.  The goal of sentiment analysis is to analyze people’s opinions in a way that can help businesses expand.  It focuses not only on polarity (positive, negative and neutral) but also on emotions (happy, sad, angry, etc.) as well as intentions to buy.
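Polarity classification can be illustrated without any training data by counting words from a sentiment lexicon. Note that this swaps in a simpler lexicon-based technique for the machine-learning approach the slide describes; the word lists, the negation rule and the function name `polarity` are all assumptions for this sketch.

```python
import re

# Hypothetical, very small sentiment lexicon.
POSITIVE = {"good", "great", "love", "excellent", "happy"}
NEGATIVE = {"bad", "poor", "hate", "terrible", "sad"}
NEGATIONS = {"not", "never", "no"}


def polarity(text):
    """Classify text as positive, negative or neutral by summing word scores."""
    score = 0
    negate = False
    for token in re.findall(r"[a-z']+", text.lower()):
        if token in NEGATIONS:
            negate = True  # flip the next sentiment-bearing word
            continue
        value = (token in POSITIVE) - (token in NEGATIVE)  # +1, -1 or 0
        if value:
            score += -value if negate else value
            negate = False
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


for review in (
    "I love this phone, the camera is excellent",
    "This movie is not good",
    "The parcel arrived on Monday",
):
    print(polarity(review), "<-", review)
```

A trained classifier generally copes better with sarcasm, domain-specific vocabulary and longer-range negation than a word count like this one.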
  • 41. Why Use Sentiment Analysis? Sentiment analysis interprets the contextual meaning of words to reveal the social sentiment around a brand, and it helps a business determine whether the product it is manufacturing is likely to find demand in the market. Businesses can use insights from sentiment analysis to improve their products, fine-tune marketing messages, correct misconceptions, and identify positive influencers. It helps businesses gain insights, understand customers, predict and enhance the customer experience, tailor marketing campaigns, and make decisions.
  • 42. Types of Sentiment Analysis Fine-grained sentiment analysis: • This type is based on graded polarity. The categories are very positive, positive, neutral, negative, and very negative, rated on a scale of 1 to 5: a rating of 5 is very positive, 4 positive, 3 neutral, 2 negative, and 1 very negative. Emotion detection: • Sentiments such as happy, sad, angry, upset, jolly, and pleasant come under emotion detection. It is often implemented with lexicon-based methods. Aspect-based sentiment analysis: • This type focuses on particular aspects of a product. For instance, when assessing a cell phone, it evaluates sentiment toward individual aspects such as the battery, screen, and camera quality. Multilingual sentiment analysis: • Here the text to be classified as positive, negative, or neutral is written in several different languages. This is highly challenging and comparatively difficult.
  • 43. Applications Social Media: • Comments and reviews on social media sites such as Instagram are analyzed and categorized as positive, negative, or neutral. Customer Service: • In the Play Store, user comments accompanied by ratings from 1 to 5 are analyzed with the help of sentiment analysis approaches. Marketing Sector: • In marketing, sentiment analysis is used where a particular product needs to be reviewed as good or bad. Reviewer side: • Reviewers look at the comments, check them, and give an overall review of the product.
  • 44. Document or text summarization  Text summarization is a very useful and important part of Natural Language Processing (NLP).  We can summarize a text in a few lines by removing unimportant content and condensing the rest into a shorter form that keeps its meaning.  In this approach we build algorithms or programs that reduce the size of the text and create a summary of our text data. In machine learning this is called automatic text summarization.  Text summarization is the process of creating a shorter text without losing the semantic structure of the original.
  • 45.  Text summarization is the practice of breaking down long publications into manageable paragraphs or sentences.  The procedure extracts important information while also ensuring that the paragraph's sense is preserved. This shortens the time it takes to comprehend long materials like research articles without omitting critical information.  Text summarization presents a number of challenges, including text identification, interpretation, and summary generation, as well as analysis of the resulting summary.  In extraction-based summarization, the critical tasks are identifying important phrases in the document and using them to find the relevant information to include in the summary.
  • 47. Two approaches to text summarization:  Extraction based summarization  Abstractive Summarization
  • 48.  Extraction based summarization  The extractive text summarization approach entails extracting the essential words, phrases or sentences from a source text and combining them to create a summary.  The extraction is done according to a given measure of importance, without making any modifications to the extracted text.
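A common baseline for extractive summarization scores each sentence by the frequency of the words it contains and keeps the top-scoring sentences unchanged. The sketch below assumes that measure; the stop-word list, the sample text and the function name `summarize` are illustrative choices, not taken from the slides.

```python
import re
from collections import Counter

# Hypothetical minimal stop-word list so that function words do not dominate.
STOP_WORDS = {"the", "was", "from", "into", "a", "of", "and", "is", "to"}


def content_words(text):
    """Lowercased word tokens with stop words removed."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOP_WORDS]


def summarize(text, max_sentences=1):
    """Return the highest-scoring sentences, verbatim and in original order."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    frequency = Counter(content_words(text))

    def score(sentence):
        words = content_words(sentence)
        # Average word frequency, so long sentences are not favored automatically.
        return sum(frequency[w] for w in words) / len(words) if words else 0.0

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:max_sentences])
    return [sentences[i] for i in keep]


text = (
    "Social media analytics collects data from social networks. "
    "The weather was pleasant. "
    "Analytics turns social media data into decisions."
)
summary = summarize(text, max_sentences=2)
print(" ".join(summary))
```

Every sentence in the summary is copied from the source without modification, which is what distinguishes extraction from abstraction.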
  • 49.  Abstractive Summarization  Another approach to text summarization is abstractive summarization, in which we create new sentences from the original content.  This is in contrast to the extractive technique, in which we only use phrases that are already present. The phrases formed by abstractive summarization may not appear in the original text at all.  When abstraction is used for text summarization in deep learning settings, it can overcome the grammatical inconsistencies of the extractive method.  Abstraction can produce better summaries than extraction. The text summarization algorithms necessary for abstraction, on the other hand, are more complex to build, which is why extraction is still widely used.
  • 50. Trend Analytics  Trend analysis (also known as technical analysis, particularly in finance) is used to monitor metrics and their development over time. As such, the technique relies on effective historical analysis.  Trend analysis is a research methodology for gathering and studying observed and recorded data from past and ongoing trends in order to make predictions about future consumer behavior.  It helps determine the main characteristics of the stock market and the consumers associated with it.  In a long-running survey, trend analysis gives us the ability to look at how the data changes over time.
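Monitoring a metric over time often starts with two simple computations: a moving average to smooth out noise and a least-squares slope to summarize the overall direction. The sketch below assumes equally spaced observations; the weekly mention counts are invented data and the function names are illustrative.

```python
def moving_average(values, window):
    """Mean of each run of `window` consecutive values."""
    return [
        sum(values[i - window + 1:i + 1]) / window
        for i in range(window - 1, len(values))
    ]


def linear_trend_slope(values):
    """Least-squares slope of the values against their index 0, 1, 2, ..."""
    n = len(values)
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    variance = sum((x - mean_x) ** 2 for x in range(n))
    return covariance / variance


# Hypothetical weekly brand-mention counts.
weekly_mentions = [120, 135, 128, 150, 162, 158, 175]

smoothed = moving_average(weekly_mentions, window=3)
slope = linear_trend_slope(weekly_mentions)

print([round(value, 1) for value in smoothed])
print(f"average change per week: {slope:+.2f} mentions")
```

A positive slope indicates an upward trend over the observed period; it describes the past and is only a starting point for any prediction.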
  • 51. Types of Trend Analysis  Temporal Method  Geographic Method  Intuitive Method
  • 52.  Temporal Method  This method is used to analyze patterns and trends in a given group of relevant data or objects of study over a specific period of time, as well as how they change during that period.  A clear example is a longitudinal study, which is designed to detect and analyze trends that emerge from historical data.  It is mainly used in ethnographic research and other types of event-focused studies. The great disadvantage of this type of trend analysis is that it is exposed to many variables that could affect the final result of the study.
  • 53.  Geographic Method  The geographic method of trend analysis is generally easy and reliable; it can be used to identify commonalities and differences between user groups belonging to the same or different geographies.  The main purpose of the geographic method is the analysis of market trends that develop in groups of users identified by their geographic location.  The downside of the geographic method is, consequently, that the analysis is geographically limited: results can be influenced by factors such as culture and traditions that are specific to the user groups’ geographic location.
  • 54.  Intuitive Method  The intuitive method is a type of trend analysis used to analyze trends within groups of users based on logical explanations, behavioral patterns, or other elements perceived by a futurist.  This kind of market trend analysis is helpful for making predictions without the need for large amounts of statistical data. However, the methodology relies heavily on the knowledge and logic of futurists and researchers, which makes it prone to researcher bias.  The intuitive method is the most difficult type of trend analysis and might not be as precise as the others.
  • 55. Challenges to Social media analytics Data cleansing • Cleaning unstructured textual data (e.g., normalizing text), especially high-frequency streamed real-time data, still presents numerous problems and research challenges. Scraping • Although social media data is accessible through APIs, due to the commercial value of the data, most of the major sources such as Facebook and Google are making it increasingly difficult for academics to obtain comprehensive access to their ‘raw’ data; very few social data sources provide affordable data offerings to academia and researchers. News services such as Thomson Reuters and Bloomberg typically charge a premium for access to their data.
  • 56. Scraping • In contrast, Twitter has recently announced the Twitter Data Grants program, through which researchers can apply for access to Twitter’s public tweets and historical data in order to get insights from its massive data set (Twitter has more than 500 million tweets a day). Data protection • Once you have created a ‘big data’ resource, the data needs to be secured, ownership and IP issues resolved (e.g., storing scraped data is against most publishers’ terms of service), and users provided with different levels of access; otherwise, users may attempt to ‘suck’ all the valuable data from the database.
  • 57. Holistic data sources • Researchers are increasingly bringing together and combining novel data sources: social media data, real-time market and customer data, and geospatial data for analysis. Data visualization • The visual representation of data, in which information is abstracted into some schematic form with the goal of communicating it clearly and effectively through graphical means. Given the magnitude of the data involved, visualization is becoming increasingly important.
  • 58. Analytics dashboards • Many social media platforms require users to write code against APIs to access feeds, or to program analytics models in a programming language such as Java. • While reasonable for computer scientists, these skills are typically beyond most (social science) researchers. • Non-programming interfaces are required to give what might be referred to as ‘deep’ access to ‘raw’ data, for example, configuring APIs, merging social media feeds, combining holistic sources and developing analytical models.