Project Final Report
Exploratory Analysis of Social Media
Images to Inform Product Innovation,
Marketing & Promotions
Big Data Analytics, Summer 2015
Matthew Blough, Eric DeFina, Zixin Mao, Sandilya Tumma
8/12/2015
Abstract
Social listening is an established activity that allows organizations to generate
consumer and customer insights and make more informed business decisions from public data in
social media. While it has traditionally been based on text analytics tools, the rise of platforms
such as Instagram, Pinterest, Snapchat, Tumblr, and others has transformed the content, and
therefore the data, generated in social media. As such, the analysis of unstructured data from
images will be critical to “social listening” on today’s platforms in order to fully understand
context, sentiment, and meaning. Through this research, we explore whether big data platforms
can be used to read, analyze (for trends and commonalities), and summarize unstructured data
from social media images to develop insights that feed business and marketing decisions for an
online travel agency (e.g., Travelocity).
Introduction
Social Listening is an activity that allows brands and organizations to learn from public
data generated by consumers in social networks. By mining this unstructured data, companies
can generate insights from observing online consumer conversation, and then use these insights
to make smarter and more informed business decisions, such as product innovations, decisions
and changes, marketing campaigns, promotional offers, and more.
Over the past couple of years, however, there has been a transformational shift in the
content that consumers publish to social networks. With the rise of platforms such as
Instagram, Pinterest, and Snapchat, social conversation has become dominated by visual
communication and content. In addition, “traditional” social networks such as Facebook and
Twitter have seen an influx of visual posts versus traditional comments, tags, and other
text-based content. In 2014, 500 million image-based posts were shared each day in social media,
often without accompanying text to provide context for the image. This shift has been largely
enabled by the advancement and adoption of smartphones, as well as faster data connection
speeds. For users, visuals are easier and faster to consume.
In order to continue to mine the full sphere of social media for business insights and
questions, we must go beyond text analytics and use big data tools to collect and analyze
imagery quickly and efficiently. As image content now dominates the social web, it will be
critical to understand the context, sentiment and meaning of images in the same way tools have
historically parsed this data from text.
Business Significance
Market research, and specifically understanding consumer needs and the market
environment, has long been a tenet of running a successful and profitable business. In 2014
alone, the market research industry reported over $40 billion in sales globally. Today, the
Internet and the rise of social media have created new opportunities for research and insights.
In many cases it is no longer necessary to set up formal, expensive studies in order to
understand and listen to consumers. In addition, the scale of the data makes it possible to
analyze the behavior of much larger groups of people than the small samples of traditional
research studies. Since access to social media has become freely available to interested
organizations, many have turned to the analysis of this massive public data set as a new source
of consumer, market, and competitive insights.
Advertising has been a key activity for large online travel agencies to convert consumers
and drive sales. Expedia and Travelocity alone spent over $4 billion on advertising in 2014.
To make that advertising most effective, it is critical to understand the consumer and to
create advertising plans that move consumers down the purchase funnel, from product
awareness to actually purchasing a trip. Analysis of social media images can provide the key
insights that bolster our advertising effectiveness and, ultimately, sales. By knowing what
types of images consumers are posting, and what those images contain, we can draw
conclusions about what travel options consumers are looking for, who people most commonly
travel with, and what activities they are doing. This information, far more insightful than the
transactional data we have traditionally had access to, can be used to create more engaging
advertising creative and promotional bundles that better meet the wants and needs of our
target audience.
Problem Statement
An online travel agency, TravelWeb, would like to determine the most effective new
advertising and promotion campaign based on consumers’ travel behavior, activity and trends, in
order to increase sales. Based on information extracted from social media imagery, we want to
answer:
• What imagery and creative should we be using for our marketing campaigns, ads,
website imagery and social media content?
• What deals and packages should we be offering?
• How should we structure our offers to best meet the wants and desires of
consumers?
• What bundles and deals should we create?
Methodology
The process of this project is to take a large set of unstructured data from social media in
the form of images, transform it into a structured data set by using a computer vision algorithm,
then analyze the structured data set via data mining techniques in order to gain insights into
consumer compositions and preferences.
The first step of our methodology is hypothesis development; in other words, we needed
to outline our business interests. Our hypotheses span several topics. One is to understand
which types of imagery, whether hiking, camping, skiing, or cruise travel, can be used to create
marketing campaigns or ads for prospective clients. Another is to see who people are
traveling with, in order to understand how to cater to their needs and interests on vacation trips.
A third is to determine what kinds of bundles should be offered in terms of activities, foods, and
excursions, based on these social media images. For example, if people take more pictures while
water skiing than while sitting around a campfire, perhaps because of the thrill of the activity,
travel companies can offer packages for jet skiing in order to maximize revenue for that
specific activity.
The dataset of images was provided to us. Following Figure 1.2, the next task is
to run the images through the Microsoft ComputerVision API to extract structured
quantitative and qualitative information from each picture. The API returns information such as
facial attributes (gender and age), image colors, and object categories, along with a score
indicating the confidence of each prediction. The dataset has over 150,000 images, and a big
data platform is useful for processing them quickly and efficiently. Since API calls are made
per image and one output in JSON format is produced for each image, we end up with 150,000
individual JSON records.
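The per-image extraction loop described above can be sketched roughly as follows. This is a minimal illustration, not the script used in the project: the function name `extract_all`, the `image` key, and the `analyze` callable are assumptions. `analyze` stands in for whatever client code sends one image to the vision service and returns its parsed JSON response; it is passed in by the caller so that the sketch does not presume any particular API signature.

```python
import json
from pathlib import Path
from typing import Callable

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def extract_all(image_dir: Path, out_dir: Path,
                analyze: Callable[[bytes], dict]) -> int:
    """Run `analyze` on every image in image_dir and write one JSON file per image.

    `analyze` is a hypothetical, caller-supplied function that wraps the
    vision API call and returns the response as a dict.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    written = 0
    for image_path in sorted(image_dir.iterdir()):
        if image_path.suffix.lower() not in IMAGE_SUFFIXES:
            continue  # ignore anything that is not an image file
        try:
            record = analyze(image_path.read_bytes())
        except Exception as err:
            # One failed call should not stop a batch of 150,000 images.
            print(f"skipped {image_path.name}: {err}")
            continue
        record["image"] = image_path.name  # keep a key back to the source image
        (out_dir / f"{image_path.stem}.json").write_text(json.dumps(record))
        written += 1
    return written
```

Because each image is processed independently, a loop of this shape can be split across workers on a big data platform without changing the per-image logic.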
Figure 1.1
Hypothesis
• Background Information
• Hypothesis Statement
• Business Insight
• Big Data usefulness
Data Collection/ETL
• Vision API
• Python to extract raw data
• Clean up the data for analysis (ETL process)
Analysis
• Data Mining Tool: IBM SPSS Modeler
• Classification
• Clustering
• Association
• Exploratory analysis
Expected Conclusion/Implication
• Find insight for business value
• Business decisions for advertising companies

While the JSON records we obtained from the ComputerVision API are structured data, the
data set needs to be further transformed into a relational structure before it can be analyzed
conveniently. This takes two steps. The first step is to aggregate the individual JSON records
into a single file. This is necessary because the JSON format is flexible: it does not require
individual files to share the same set of fields. Therefore, to make sure fields from all the
files are included, the files need to be properly aggregated. The second step is to convert the
single JSON data file into a simple relational structure with columns and rows. To accomplish
this, we used a tool named Konklone.
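The two transformation steps (aggregating the per-image JSON records, then flattening them into rows and columns) can be illustrated with a short standard-library sketch. It is a stand-in for the conversion tool named in the text, not a description of how that tool works internally; the function names and the dotted column naming (`faces.0.age`) are assumptions made for the example.

```python
import csv
import json
from pathlib import Path


def flatten(record, prefix=""):
    """Flatten nested dicts and lists into a single-level dict.

    Example: {"faces": [{"age": 30}]} becomes {"faces.0.age": 30}.
    """
    if isinstance(record, dict):
        items = record.items()
    elif isinstance(record, list):
        items = enumerate(record)
    else:
        return {prefix: record}
    flat = {}
    for key, value in items:
        name = f"{prefix}.{key}" if prefix else str(key)
        flat.update(flatten(value, name))
    return flat


def json_dir_to_csv(json_dir: Path, csv_path: Path) -> list:
    """Aggregate every *.json file in json_dir into one CSV; return the column names."""
    rows = [flatten(json.loads(p.read_text())) for p in sorted(json_dir.glob("*.json"))]
    # Records need not share fields, so the header is the union of all keys.
    columns = sorted({key for row in rows for key in row})
    with csv_path.open("w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=columns, restval="")
        writer.writeheader()
        writer.writerows(rows)
    return columns
```

Taking the union of keys across all records is what makes the aggregation step necessary: a record with no detected faces simply gets empty cells in the face columns instead of being dropped.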
After the relational data set is constructed, it is time to perform analytics. The platforms
we selected are Microsoft Azure and its Machine Learning Studio component. Azure is a
powerful big data platform with easy navigation and access to numerous plug-ins. Its Machine
Learning Studio allows us to apply different types of analytical techniques to the data. We can
easily perform descriptive analytics by slicing and dicing the data set using SQLite queries and
then calculating statistics on each slice. At a more advanced level, we can create and run data
mining models such as classification, clustering, and association. Here we attempt to find
patterns, trends, and correlations that are useful to the business question at hand. The
business implication is to see what kinds of images consumers are taking and to advise
travel companies and agencies on how to better target their advertisements to specific
activities and leisure events. The goal is to help these companies increase revenue and
maximize profits by avoiding wasted spending on promoting the wrong activities. Why
promote parasailing at a location that isn’t suited for it, rather than at offshore islands
where consumers are posting parasailing pictures daily?
Project Domain
The project domain is organized around the ETL process: Extraction, Transformation, and
Load. Extraction is the challenging part of the process, in which we must connect the online
vision API to Microsoft Azure. This allows us to feed images into Python so that the script
can pull information from the API and give us an output JSON file for each image. The Python
script loops through all the images in the directory and writes out thousands of files for
analysis. Once we have compiled these JSON files, we are ready to transform them into useful
data. We have a couple of options here; for example, we can go through Amazon Web Services’
MapReduce in order to consolidate the data into one big file. Once we have one big file of the
data, we can open it in Microsoft Excel as a CSV file and clean it up into a proper dataset.
This dataset will be our primary source for analysis once we load it into the IBM SPSS Modeler
application. The load process takes us to the modeler, where we can perform exploratory
analysis and find key insights in the data. These key findings are what will be useful to
businesses in promoting their vacation packages more efficiently and spending resources where
they will best maximize revenue.

Figure 1.2
Data Extraction
• Vision API
• Python
• JSON
Data Transformation (Classification)
• Clean up data from JSON output
• AWS/Cloudera
• Microsoft Excel
Data Loading
• Creation of CSV files
• Load into IBM SPSS Modeler
• Use of Text Mining application of SPSS Modeler
Data Analysis
• Classification
• Clustering
• Association
• Word count/frequency
Analytical Methods
We use three main categories of analytics: classification, clustering, and
association. Classification allows us to understand how categories of color, objects, age, and
scores occur together across a large collection of images. Which image characteristics are most
common in social media images? Are pictures more commonly taken of younger people than of
older people? Classification can help us understand such trends. Another tool is clustering,
which groups together image characteristics that are closely related to each other, such as
colors or image categories. The association method gives us connection analysis on
images of various categories. Age, gender, and color attributes can be analyzed to see which
combinations of characteristics are closely associated with each other. Characteristics that
are strongly associated should show a high confidence percentage, which measures how closely
they are related. One example question: are there more pictures of buildings alone, or more
pictures of faces with buildings in the background?
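To make the confidence measure concrete, the sketch below computes support and confidence for a single association rule over a handful of made-up image tag sets. The data and the function name `rule_metrics` are illustrative assumptions, not results or code from the project.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Return (support, confidence) for the rule antecedent -> consequent.

    support    = share of all images that contain both sides of the rule
    confidence = share of images containing the antecedent that also
                 contain the consequent
    """
    antecedent, consequent = set(antecedent), set(consequent)
    n_total = len(transactions)
    n_antecedent = sum(1 for t in transactions if antecedent <= set(t))
    n_both = sum(1 for t in transactions if (antecedent | consequent) <= set(t))
    support = n_both / n_total if n_total else 0.0
    confidence = n_both / n_antecedent if n_antecedent else 0.0
    return support, confidence


# Hypothetical tags for four images (not project data).
images = [
    {"building", "face"},
    {"building"},
    {"building", "face"},
    {"beach"},
]
support, confidence = rule_metrics(images, {"building"}, {"face"})
```

Here two of the four images contain both tags (support 0.5), and two of the three images with a building also contain a face (confidence of about 0.67), which is the kind of figure the buildings-versus-faces question above would compare.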
Output/Results
We were interested in finding insightful results through exploratory analysis and in seeing
whether we could identify any patterns or trends in the dataset of image information. To start,
we built a simple model in Microsoft Azure to compute descriptive statistics for the
variables most relevant to the insight. Figure 1.3 shows the connected nodes: we imported our
dataset in the Reader node and connected it to the Project Columns node to select the columns
we were most interested in. The Project Columns node resembles a “filter” node in other
applications and lets us concentrate on the specific variables that are of value to us. Lastly,
the Descriptive Statistics node was connected to the Project Columns node in order to output
the statistical values of our categories for analysis.
Figure 1.3
To get deeper insights, we needed to drill down into the data set by slicing and
dicing it. For instance, an interesting aspect to look at is consumer composition by gender,
gender association, and age group. Figure 1.4 shows the different slices we created for our
analysis. For example, we ran a query to isolate the records for images with two male
faces.
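The original query is not reproduced in this text. A query of that kind might look like the sketch below, which runs against an in-memory SQLite table. The table name `images` and the columns `face_count`, `face1_gender`, and `face2_gender` are hypothetical; the actual column names depend on how the JSON output was flattened.

```python
import sqlite3


def build_demo_table():
    """Create an in-memory table with made-up rows standing in for the image data."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE images (image TEXT, face_count INTEGER, "
        "face1_gender TEXT, face2_gender TEXT)"
    )
    conn.executemany(
        "INSERT INTO images VALUES (?, ?, ?, ?)",
        [
            ("a.jpg", 1, "Male", None),
            ("b.jpg", 2, "Male", "Male"),
            ("c.jpg", 2, "Male", "Female"),
            ("d.jpg", 0, None, None),
        ],
    )
    return conn


# Keep only images with exactly two detected faces, both labeled male.
TWO_MALE_FACES = """
SELECT image
FROM images
WHERE face_count = 2
  AND face1_gender = 'Male'
  AND face2_gender = 'Male'
ORDER BY image
"""


def two_male_faces(conn):
    """Return the file names of images that match the two-male-faces slice."""
    return [row[0] for row in conn.execute(TWO_MALE_FACES)]
```

Other slices (for example two female faces, or one male and one female face) follow the same pattern with different values in the WHERE clause.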
Another important use for queries is to separate trustworthy data from noisy data.
This matters because although computer vision is getting more accurate day by day, it is still
far from 100% accurate. Therefore, we need to filter out records that have very low
prediction accuracy or that fall into ambiguous categories too generic for any meaningful
analysis. For example, one query removes records for which the ComputerVision API produced a
prediction score lower than 10% in the first object category. In addition, it removes records
that are categorized as “abstract” or “others”.
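The filtering query itself is not reproduced in this text. A sketch of such a filter, again against an in-memory SQLite table, is shown below. The column names `category1_name` and `category1_score` are assumptions, and the score is assumed to be stored as a fraction between 0 and 1, so that 10% corresponds to 0.10.

```python
import sqlite3


def build_demo_table():
    """Create an in-memory table with made-up category predictions."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE images (image TEXT, category1_name TEXT, category1_score REAL)"
    )
    conn.executemany(
        "INSERT INTO images VALUES (?, ?, ?)",
        [
            ("a.jpg", "outdoor", 0.85),
            ("b.jpg", "abstract", 0.60),  # too generic, dropped
            ("c.jpg", "food", 0.05),      # score below 10%, dropped
            ("d.jpg", "others", 0.02),    # generic and low score, dropped
            ("e.jpg", "beach", 0.10),     # exactly 10% is kept
        ],
    )
    return conn


# Rows with a NULL score or NULL category also fail these conditions,
# so records without a first category are excluded as well.
TRUSTWORTHY = """
SELECT image, category1_name, category1_score
FROM images
WHERE category1_score >= 0.10
  AND category1_name NOT IN ('abstract', 'others')
ORDER BY image
"""


def trustworthy_rows(conn):
    """Return (image, category, score) rows that pass the noise filter."""
    return list(conn.execute(TRUSTWORTHY))
```

The 10% cutoff comes from the text; where exactly to draw it is a judgment call that trades data volume against label quality.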
Figure 1.5 shows the detailed statistics. The count, median, mode, range, minimum,
maximum, and average are displayed for numeric variables. These are especially informative
for variables such as category score and face age, where we can identify patterns: for
example, the average age of the faces detected in this collection of images is around 30,
while individual face ages range from as low as 1 to as high as 96.
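The statistics listed above can be reproduced for any numeric column with the Python standard library. The sketch below uses a few made-up ages rather than the project data, and the function name `describe` is an assumption.

```python
import statistics


def describe(values):
    """Return count, min, max, range, mean, median, and mode for a numeric column.

    Missing values (None) are skipped. On Python 3.8 and later,
    statistics.mode returns the first of several equally common values.
    """
    values = [v for v in values if v is not None]
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "range": max(values) - min(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),
    }


# Hypothetical face ages; None marks an image without a detected face.
summary = describe([1, 25, 30, 30, 96, None])
```

A wide range next to a moderate mean, as in this toy column, is a reminder to check the extreme values before drawing conclusions from the average.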
Figure 1.4
Figure 1.5
Clustering analysis helps identify records with similar characteristics and group them
together. With image clustering, one can identify the types of images that are similar to each
other according to various measures. One of these is the Euclidean distance, which measures how
far one point or cluster is from another. Figure 1.6 shows how the cluster model was built in
Microsoft Azure. A Reader node and a Project Columns node were once again used to select
variables. We were most interested in the three category columns along with the main color
categorization. We then added the “Train Clustering Model” node in order to fit the selected
variables into four clusters. This step took some extra time, since the model had to be
trained before we could extract results from it. Even so, running it in Azure was about 5
times as fast as running it on other platforms such as IBM SPSS Modeler. This was a major
advantage to us from a big data perspective: processing 120,000+ images is quicker in Azure
than on other platforms. The K-Means Clustering node was added to cluster our final results.
The Metadata Editor was used to name the clusters with the numeric values 1, 2, 3, and 4.
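For readers unfamiliar with the technique, the following is a small pure-Python sketch of k-means with Euclidean distance on one-hot encoded category and color labels. It illustrates the general algorithm only and is not the Azure implementation; the encoding, the function names, and the fixed iteration count are assumptions made for the example.

```python
import math
import random


def one_hot(rows):
    """Turn rows of categorical labels into 0/1 vectors, one column per label."""
    vocabulary = sorted({label for row in rows for label in row})
    vectors = [[1.0 if label in row else 0.0 for label in vocabulary] for row in rows]
    return vectors, vocabulary


def euclidean(a, b):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def k_means(points, k, iterations=20, seed=0):
    """Cluster points into k groups; return (assignment, centroids)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        assignment = [
            min(range(k), key=lambda c: euclidean(p, centroids[c])) for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment, centroids


# Hypothetical (category, color) rows, not project data.
rows = [("beach", "blue"), ("beach", "blue"), ("building", "grey"), ("building", "grey")]
vectors, vocabulary = one_hot(rows)
assignment, centroids = k_means(vectors, k=2)
```

With the report's setup, `k` would be 4 and each row would carry the three category columns plus the main color. One-hot encoding is one common way to give categorical labels a Euclidean geometry; it is an assumption here, since the text does not say how the labels were encoded.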
Figure 1.6
The clustering results were exported in CSV format from the K-Means Clustering node.
From there we compiled the CSV results into the clusters shown in Figure 1.7. The four
clusters group categories and colors that are close to each other in their characteristics.
The clusters are distinctly different from one another, and each shows the category names
associated with particular colors.
Figure 1.7
Cluster 1: Outdoor, Building, Street, Tree, Text; colors Grey, Black
Cluster 2: Food, Drinks, Crowd, People; colors Yellow, Blue, Green, Black
Cluster 3: Abstract, Others
Cluster 4: Beach, Water, Sky; colors Blue, White
Scope & Limitations
The main limitation of this project is the amount of data. With more data, the scores
generated would be more robust, resulting in greater precision and accuracy in analyzing the
images. The fact that our images come from just one source is another limitation. While the old
adage “a picture is worth a thousand words” may hold true, we are hoping to narrow results to
find the most important descriptive elements for analysis of an image. Additionally, due to the
lack of specificity of the ComputerVision API output, classification of the data could not be
performed for enhanced insights.
Policy/Managerial Implications
Improved picture recognition and description can allow managers to bolster “social
listening” on today’s media platforms and better understand the context, the sentiment, and the
reasons why a picture is shared. With enhanced image recognition, managers can make more
informed business decisions, such as product innovations and changes, marketing campaigns, and
promotional offers, and can model their own campaigns on fields that have been targeted
successfully. Moreover, this project can help identify which elements of an image make it go
viral. A greater understanding of what customers take pictures of and what they share
also allows a business to create a better product search for customers.
Conclusions & Future Research
Images are the new text on the web. They are easy to share and more engaging than text.
The trend will continue in favor of images, and we believe that the analysis of images will grow
tremendously in the coming years. Expanding on the importance of social listening, more
insights can be drawn from interpreting pictures. Enhancements in image perspective analysis,
GPS, and sentiment overlay will improve clustering and classification in order to better predict
what appeals to specific customers and increase sales. Social media company Snapchat, an
ephemeral photo and video sharing app, currently charges $400,000 for ad space in a story
generating 20 million views. Meanwhile, Facebook has used its massive store of photos to
develop a way to recognize people in photos even if their faces are obstructed, identifying
individuals with 83% accuracy using a method dubbed PIPER, an acronym for pose invariant
person recognition. As the quantity of images shared online increases, the quality of the
algorithms processing those photos will improve, bolstering analysis. Why pictures were taken,
which elements are important, what sparked the moment, and how to better react and cater to
what customers desire are the questions that will drive how image analytics proceeds in the
future.
Sources
1. http://www.fastcompany.com/3000794/rise-visual-social-media
2. http://blogs.adobe.com/digitalmarketing/social-media/visual-social-snapchat-pinterest-and-the-rise-of-media-rich-marketing/
3. http://wersm.com/visual-web-the-next-big-thing/
4. http://blogs.wsj.com/digits/2015/06/23/facebook-claims-photo-recognition-breakthrough/
5. http://recode.net/2015/06/17/snapchats-making-some-pretty-serious-money-from-live-stories/
6. https://www.esomar.org/uploads/industry/reports/global-market-research-2014/ESOMAR-GMR2014-Preview.pdf
7. http://skift.com/2015/02/20/priceline-and-expedias-advertising-arms-race-in-2014/
8. Mary Meeker; 2014 Internet Trends Report
9. https://www.forrester.com/Big+Datas+Big+Meaning+For+Marketing/quickscan/-/E-res114782
10. http://www.forbes.com/sites/groupthink/2015/05/01/visual-listening-social-medias-next-frontier/3/

 
SAAL A - 1445 - KEYNOTE - The Content Marketing Overhaul with Mike King, iPul...
SAAL A - 1445 - KEYNOTE - The Content Marketing Overhaul with Mike King, iPul...SAAL A - 1445 - KEYNOTE - The Content Marketing Overhaul with Mike King, iPul...
SAAL A - 1445 - KEYNOTE - The Content Marketing Overhaul with Mike King, iPul...
 
The Marketer’s Guide to Social Customer Data
The Marketer’s Guide to Social Customer DataThe Marketer’s Guide to Social Customer Data
The Marketer’s Guide to Social Customer Data
 
Big data
Big data Big data
Big data
 
SP192221
SP192221SP192221
SP192221
 
Big data
Big data Big data
Big data
 
about digital marketing
about digital marketing about digital marketing
about digital marketing
 
A strategic framework for digital measurement
A strategic framework for digital measurementA strategic framework for digital measurement
A strategic framework for digital measurement
 
Performics 2014 Digital Trends: Participation Activated
Performics 2014 Digital Trends: Participation ActivatedPerformics 2014 Digital Trends: Participation Activated
Performics 2014 Digital Trends: Participation Activated
 
Digital Marketing Project Report
Digital Marketing Project Report Digital Marketing Project Report
Digital Marketing Project Report
 
How does big data impact you
How does big data impact youHow does big data impact you
How does big data impact you
 
Social media analytics - 5 key trends
Social media analytics - 5 key trendsSocial media analytics - 5 key trends
Social media analytics - 5 key trends
 
AB1401-SEM5-GROUP4
AB1401-SEM5-GROUP4AB1401-SEM5-GROUP4
AB1401-SEM5-GROUP4
 
Web for Non-Profits
Web for Non-ProfitsWeb for Non-Profits
Web for Non-Profits
 
Digital marketing analytics paths of value - 12-4-17
Digital marketing analytics   paths of value - 12-4-17Digital marketing analytics   paths of value - 12-4-17
Digital marketing analytics paths of value - 12-4-17
 
