Analytics: Key to go from generating big data to
deriving business value
Deepali Arora1, Piyush Malik2,
1Dept. of Electrical and Computer Engineering, University of Victoria, P.O. Box 3055 STN CSC, Victoria, B.C
{darora}@ece.uvic.ca
2Business Analytics and Strategy, IBM Global Business Services, 4400 N 1st Street, San Jose, CA
{Piyush.Malik}@us.ibm.com
Abstract—The potential to extract actionable insights from big
data has gained increasing attention from researchers in academia
as well as from several industrial sectors. The field has become
interesting and problems look even more exciting to solve ever
since organizations have been trying to tame large volumes
of complex and fast arriving big data streams through newer
computing paradigms. However, extracting meaningful and ac-
tionable information from big data is a challenging and daunting
task. The ability to generate value from large volumes of data is
an art which, combined with analytical skills, needs to be mastered
in order to gain competitive advantage in business. The ability of
organizations to leverage the emerging technologies and integrate
big data into their enterprise architectures effectively depends
on the maturity level of the technology and business teams,
capabilities they develop as well as the strategies they adopt.
In this paper, through selected use cases, we demonstrate how
statistical analyses, machine learning algorithms, optimization
and text mining algorithms can be applied to extract meaningful
insights from the data available through social media, online
commerce, the telecommunication industry and smart utility meters,
and how these insights can be used for a variety of business
benefits, including improving security.
The nature of applied analytical techniques largely depends on
the underlying nature of the problem so a one-size-fits-all solution
hardly exists. Deriving information from big data is also subject
to challenges associated with data security and privacy. These and
other challenges are discussed in context of the selected problems
to illustrate the potential of big data analytics.
I. INTRODUCTION
The analysis of big data and the associated potential to extract
actionable information has gained the attention of researchers
in both academia and industry [1], [2], [3], [4]. Researchers
in both communities have emphasized developing
new tools and techniques for better storing, managing, and
analyzing big data [5]. The business community, however,
is looking for ways to improve its profits by leveraging
information hidden in big data through analytics [6]. Massive
amounts of data are generated on a daily basis from
various sources including (but not limited to) online shop-
ping transactions, gas and electric meters, electronic health
records, social networking interactions, weather and satellite
data, embedded sensors in industrial machinery as well as in
automobiles and aircraft, data center computing equipment
and telecommunication industry equipment. According
to International Data Corporation (IDC), cumulative digital
data is predicted to grow from 4.4 zettabytes (ZB) in 2013 to
44 ZB by the year 2020 [7]. Data is now considered the
“new oil” of the economy, defined mainly by four prominent
characteristics: volume, velocity, variety and veracity [8].
While better understanding of the knowledge hidden within the
large datasets generated from various sources can potentially
help businesses, deriving useful information from these data
is a big challenge.
Before any kind of actionable insights from data can
be derived using advanced analysis techniques, several pre-
processing steps are involved. These steps include data collec-
tion, data preparation and cleansing, data storage, and manage-
ment [9]. The analysis of data can be broadly classified into
three categories based on the depth of analysis: 1) descriptive
analytics which exploits the historical trends to extract useful
information from the data, 2) predictive analytics, which focuses
on predicting the future probability of occurrence of patterns or
trends, and 3) prescriptive analytics, which focuses on decision
making by gaining insights into the system behavior [9].
Regardless of the depth of the analysis, extracting information
from data requires a solid understanding of techniques comprising
statistical analysis, optimization, machine learning,
text-mining algorithms, etc.
A number of studies have highlighted the tools/algorithms
that can be used to derive solutions for various problems
associated with big data [1], [2], [3], [4], [10]. For example, [1]
and [2] presented a brief review of the challenges and issues
surrounding big data. Some of the popular tools, frameworks
and technologies that can be used to aggregate, manage
and analyze big data include Hadoop and its ecosystem
of techniques and tools (such as Pig, Hive, HBase and Spark),
High Performance Computing Cluster (HPCC), in-memory
computing engines, NoSQL databases and cloud-based data
service engines. Many of these are still nascent and continually
evolving under the open source software movement. A brief overview
of how big data can be used to derive value for various
organizations including government, educational institutions
and industries is presented in [11]. Possibilities and challenges
in implementing big data related technologies in organizations,
including storage of the data, lack of skilled people and time
involved in processing of huge datasets are discussed in [12].
[3] and [4] presented a general overview of how big data
can be used to generate value for businesses. An in-depth
tutorial on big data analytics is presented by Hu et al. [13],
2015 IEEE First International Conference on Big Data Computing Service and Applications
978-1-4799-8128-1/15 $31.00 © 2015 IEEE
DOI 10.1109/BigDataService.2015.62
who assessed different techniques that can be used for data
acquisition and pre-processing, data storage and management
and different analytics techniques to derive information.
While these studies provide a good overview of the big
data opportunity, issues and challenges involved, value to
businesses, and how various techniques can be used for data
storage, processing or analytics in general, none of these
studies has discussed the application of different algorithms
to derive value for specific applications, which is the main
focus of this paper. Using five different use cases,
we illustrate how big data analytics has been used to obtain
meaningful information. The use cases considered in this
paper include sentiment analysis for social media, preventing
customer churn in the telecommunication sector, enhancing
customers’ online shopping experience, generating value from
smart utility meters and improving data security. The objective
of this paper is not to present new algorithms for any of
the selected industry use cases but rather to provide a brief,
illustrative overview of the existing algorithms and
methods that can be applied to derive value from big data.
Detailed discussion of how different innovative algorithms can
be applied to realize value for each of these use cases and
challenges associated with them is beyond the scope of the
current paper and could be presented in an extended version
of this paper in the future.
This paper is organized as follows: applications of data
analytics to different domains are discussed in Section II,
Section III highlights some of the challenges around big data
analytics and, finally, conclusions are presented in Section IV.
II. APPLICATION OF BIG DATA ANALYTICS
A. Sentiment analysis in social networks
The explosion of data in the form of blogs, online forums
and social media channels such as Facebook, Twitter,
LinkedIn, Instagram, Pinterest, YouTube, etc. has given consumers
a new way of expressing their opinions about any
product or service, which may in turn influence other
potential buyers. Investigation of users’ opinions or sentiments
about any product or service, expressed in textual form,
on these websites/blogs is referred to as sentiment analysis
[14]. Sentiment analysis combines natural language processing
with artificial intelligence capability and text analytics to
evaluate statements found across various social platforms to
determine whether they are positive or negative with respect to
a particular brand, product or service [15]. Sentiment analysis
thus provides business intelligence which can be used to
make impactful decisions. In addition, consumers routinely
look for online reviews before buying any product or service.
Developing techniques that can better automate the process of
analyzing user generated web content about a given product
or service is now the focus of research in both academia and
industry. Several companies are also involved in designing
algorithms/tools that can perform sentiment analysis either
online for free or at nominal cost. One such example is IBM
Watson’s user modeling service, which uses linguistic analytics
to generate psychographic profiles and extract cognitive and
social characteristics based on users’ emails, text messages,
tweets, forum posts, etc. [16]. Other examples of
sentiment analysis tools include Google Analytics, Tweetstats,
Social Mention, and Twendz [17].
There are three main classification levels in sentiment analysis:
the document level, the sentence level, and the aspect
level [18]. The methodologies that can
be used at these levels are broadly classified into three main
categories, i.e., lexicon-based techniques, machine learning
techniques and hybrid approaches [19]. The lexicon-based
approach relies on a collection of known and pre-compiled
sentiment terms, machine learning approaches apply
algorithms that can be trained, and
hybrid approaches combine the two [20].
A number of studies have used lexicon-based
approaches [21], supervised [22] or
unsupervised [23], [24] machine learning approaches, and combined
machine learning and lexicon-based approaches [25], [26] to classify
sentiments into positive or negative categories.
Sentiment analysis has been used by researchers to find
people’s opinions, expressed on social media sites including
Twitter, about products/services launched by a company [27].
It has also been used in a real-world industrial application (based
on the second author’s experience) in which one of IBM’s clients leveraged
sentiments from social media to identify influencers of a
public policy. The general methodology in both these use
cases involved four main steps: gathering data, generating
features, designing a classifier that can differentiate between
different sentiments, i.e., positive, negative or neutral, and
finally deriving a sentiment score.
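As an illustration of the four steps above, the following minimal sketch scores a few posts with a lexicon-based approach. The word lists, the example posts and the rule that a negation flips the next sentiment word are assumptions made purely for illustration and are not taken from the cited studies; a practical system would rely on a curated lexicon and real social media data.

```python
import re

# Hypothetical mini-lexicon; a real system would use a curated sentiment lexicon.
POSITIVE = {"good", "great", "love", "excellent", "happy"}
NEGATIVE = {"bad", "poor", "hate", "terrible", "slow"}
NEGATIONS = {"not", "no", "never"}


def tokenize(text):
    """Step 2 (feature generation): lowercase word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def sentiment_score(text):
    """Step 4: sum of word polarities; a negation flips the next sentiment word."""
    score = 0
    negate = False
    for token in tokenize(text):
        if token in NEGATIONS:
            negate = True
            continue
        polarity = (token in POSITIVE) - (token in NEGATIVE)
        if polarity:
            score += -polarity if negate else polarity
            negate = False
    return score


def classify(text):
    """Step 3: map the score to positive, negative or neutral."""
    score = sentiment_score(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


# Step 1 (data gathering) is replaced here by a hard-coded list of made-up posts.
posts = [
    "I love the new phone, the camera is great",
    "Battery life is not good and the service is slow",
    "The package arrived on Tuesday",
]
labels = [classify(post) for post in posts]
print(list(zip(labels, posts)))
```

Such a scorer is easy to inspect but ignores context, which is one reason the trained classifiers discussed in this section are often preferred.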
However, deriving information from the user created web
content remains a daunting task as the sentiments may carry
varying meanings in different disciplines and cultures. Thus,
to derive meaningful results, data features such as individ-
ual keywords and their frequency of occurrences; parts of
speech such as adjectives, adverbs; opinion words and phrases
including good or bad, likes, dislikes; and negations [28],
[29] should be carefully derived following feature selection
techniques [18]. Supervised machine learning approaches such
as classification algorithms can then be designed by converting
the sentiment analysis problem to a simple text classification
problem. For a standard text classification problem, a subset
of the data is used to form a training set whose records are labeled
with the different classes. These classes are related to the underlying
feature values. The trained classification model can then be used to
predict the class label for any new instance. Several classification models
are discussed in the literature [18]. Some of the commonly
used classifiers include the Naive Bayes classifier, support
vector machines (SVM), maximum entropy based classifier,
decision trees, and neural networks [18]. Similarly unsuper-
vised techniques can also be used to derive users’ sentiments
about products/services [23], [24]. The power of integrating
sentiments and intelligence trends from social media was
recently hailed as the reason for IBM and Twitter to forge an
alliance to incorporate Twitter analytics into their consulting
business [30].
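To make the text classification formulation concrete, the sketch below implements a small multinomial Naive Bayes classifier with add-one smoothing, one of the classifiers listed above. The class name `NaiveBayesText`, the whitespace tokenization and the four training sentences are illustrative assumptions; they are not the models or data used in the cited studies.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesText:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, documents, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(documents, labels):
            self.word_counts[label].update(text.lower().split())
        self.vocabulary = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocabulary)
        best_label, best_log_prob = None, -math.inf
        for label, doc_count in self.class_counts.items():
            # Start from the log prior of the class.
            log_prob = math.log(doc_count / total_docs)
            total_words = sum(self.word_counts[label].values())
            for word in text.lower().split():
                if word not in self.vocabulary:
                    continue  # ignore words never seen in training
                count = self.word_counts[label][word]
                log_prob += math.log((count + 1) / (total_words + vocab_size))
            if log_prob > best_log_prob:
                best_label, best_log_prob = label, log_prob
        return best_label


# Made-up labeled training records (the "training record set").
train_docs = [
    "great product love it",
    "excellent service very happy",
    "terrible quality very bad",
    "bad support hate it",
]
train_labels = ["positive", "positive", "negative", "negative"]

model = NaiveBayesText().fit(train_docs, train_labels)
prediction = model.predict("love the excellent quality")
print(prediction)
```

Working in log space avoids numerical underflow when many word probabilities are multiplied, and the smoothing term keeps a single unseen word-class pair from driving a class probability to zero.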
B. Preventing customer churn in telecommunication sector
The strong competition amongst telecommunication service
providers has compelled them to offer packages that could
potentially attract either more customers or at least help
them retain their existing ones. Since the cost of acquiring a
new customer is high relative to that of retaining an
existing customer [31], companies are developing new and
competitive ways to retain their customers and maintain long-term
relationships with them to avoid customer churn. Churners
are customers who leave their existing telecommunication
service provider and switch to a new one for various reasons
[32]. Customers generally switch services for lower prices
or better services. Predicting customer churn is important for
companies as it directly affects their revenues. It can also help
companies take action by offering better service or attractive
packages to prevent their existing customers from switching
to a different service provider. The literature indicates [33] that
telecommunication companies face, on average, around 2.2%
customer churn each month. Designing algorithms that can
predict and in turn prevent customer churn is important to the
telecommunication industry.
The problem of distinguishing churn from non-churn customers
has been addressed in a number of studies [31], [32]. However,
with increasing competition, companies are now turning towards
machine learning algorithms to gain early insights about
their customers’ behavior so that timely action can be taken
to prevent customer churn. One simple approach to predicting whether
a customer will churn is to formulate the problem as a
two-class classification problem that uses the underlying feature
values to predict the outcome. Some of the possible features that can be
used to define the churn and non-churn classes include the duration
of customers’ calls, services subscribed, usage patterns, and
demographics [31]. A comprehensive review of the approaches
that can be followed to predict churning customers is presented
in [31], [32].
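The two-class formulation can be sketched with a logistic regression model trained by batch gradient descent. This is only one of many possible classifiers and is not claimed to be the method of [31] or [32]; the two pre-scaled features (share of plan minutes used and fraction of available services subscribed), the eight customer records and the hyperparameters are all made-up assumptions.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(rows, labels, learning_rate=0.5, epochs=5000):
    """Fit weights (bias first) by batch gradient descent on the log-loss."""
    weights = [0.0] * (len(rows[0]) + 1)
    n = len(rows)
    for _ in range(epochs):
        gradient = [0.0] * len(weights)
        for features, label in zip(rows, labels):
            x = [1.0] + list(features)
            error = sigmoid(sum(w * xi for w, xi in zip(weights, x))) - label
            for j, xi in enumerate(x):
                gradient[j] += error * xi
        weights = [w - learning_rate * g / n for w, g in zip(weights, gradient)]
    return weights


def churn_probability(weights, features):
    """Estimated probability that a customer with these features churns."""
    x = [1.0] + list(features)
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)))


# Hypothetical, pre-scaled features: (share of plan minutes used, services / 5).
customers = [
    (0.9, 0.8), (0.8, 0.6), (0.7, 1.0), (0.85, 0.4),  # stayed
    (0.2, 0.2), (0.1, 0.4), (0.3, 0.2), (0.15, 0.6),  # churned
]
churned = [0, 0, 0, 0, 1, 1, 1, 1]

weights = train_logistic(customers, churned)
at_risk = churn_probability(weights, (0.1, 0.2))
loyal = churn_probability(weights, (0.9, 0.8))
print(round(at_risk, 3), round(loyal, 3))
```

Because the model outputs a probability rather than a hard label, a provider could rank customers by risk and target retention offers at those above a chosen threshold.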
Telecommunication service providers can also use information
about customers’ usage patterns, subscribed services
and demographics to design and offer customized packages
to their users [34]. One possible approach is to use clustering
algorithms for customer segmentation based on the services
they use [35], [36], where clustering refers to partitioning
data points into a small number of groups of similar points.
This allows companies to identify customers for future product
promotions, to retain their existing customers and to
attract new customers by offering customized packages to
targeted audiences based on their usage behavior.
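A minimal sketch of such a segmentation is given below using k-means, a common clustering algorithm chosen here for illustration; the cited studies may use other techniques. The nine customer records (monthly voice hours, monthly data usage in GB), the number of clusters and the farthest-point initialization are assumptions.

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=20):
    """Plain k-means; returns the centroids and a cluster index per point."""
    # Farthest-point initialisation: deterministic and adequate for a sketch.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(squared_distance(p, c) for c in centroids))
        )
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: squared_distance(p, centroids[i]))
            clusters[nearest].append(p)
        # Move each centroid to the mean of its cluster (keep it if the cluster is empty).
        centroids = [
            tuple(sum(dim) / len(cluster) for dim in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    assignments = [
        min(range(k), key=lambda i: squared_distance(p, centroids[i])) for p in points
    ]
    return centroids, assignments


# Made-up usage records: (voice hours per month, data GB per month).
usage = [
    (15, 1), (14, 2), (16, 1.5),    # heavy voice users
    (2, 20), (3, 25), (2.5, 22),    # heavy data users
    (1, 1), (1.5, 2), (0.5, 1.5),   # light users
]
centroids, assignments = kmeans(usage, k=3)
print(assignments)
```

Each resulting segment can then be matched to a package, for example a voice-centric plan for the first group and a data-centric plan for the second. In practice the features would need to be scaled to comparable ranges before clustering.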
A real-world example is Celcom, a telecommunication
service provider in Asia, which uses predictive personalized
analytics to predict the churn probability of its customers. It
also offers personalized incentives and geolocation-based
cross-brand promotional offers and coupons, thereby
increasing engagement and loyalty among its customer base [37].
C. Enhancing customers’ online shopping experience
With advancements in technology and the introduction of
smartphones and tablets, online shopping has become convenient,
ubiquitous and so popular that it is predicted to
grow to $370 billion in 2017 [38]. Businesses are now using
advanced analytics to predict customer behavior and to carry
out customer segmentation based on the characteristics
of the customer groups [39]. While data from online clicks
on stores’ inventory does yield information about what a user
is looking for, it still does not give companies complete
information about their consumers, as many of them still go
to retail malls to buy a product [40]. Retailers need to merge
both offline and online data to design algorithms for better
understanding of their customers’ behavior and for designing
product recommendation engines for different audiences [41].
One of the approaches followed to predict customer behavior
is the use of transactional data. For example, [42]
developed a model using hierarchical clustering and a hidden
Markov model (HMM) to predict customer behavior based on
transactional data. [43] also used a Markov model to predict the
probability of click to conversion based on the time spent by
the customer on the site. [44] compared the performance of aggregate
(developing one model for all customers), segmented
(developing models for different segments of customers) and
1-to-1 (developing models for individual users) marketing
approaches across a broad range of experimental settings
including multiple segmentation levels, real-world marketing
datasets, dependent variables, different types of classifiers,
segmentation/clustering techniques, and different predictive
measures. Their results showed that both the 1-to-1 and segmentation
approaches significantly outperform the aggregate modelling
approach. However, when little transactional
data is available, the segmentation models outperformed both the
1-to-1 and aggregate modelling approaches.
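The idea of using a Markov model to estimate conversion can be sketched as follows. This is a deliberately simplified first-order Markov chain over page states, not the HMM of [42] or the time-based model of [43]; the state names (`browse`, `cart`, `purchase`, `exit`) and the four sessions are invented for illustration.

```python
from collections import Counter, defaultdict


def estimate_transitions(sessions):
    """Maximum-likelihood transition probabilities from observed state sequences."""
    counts = defaultdict(Counter)
    for session in sessions:
        for current, nxt in zip(session, session[1:]):
            counts[current][nxt] += 1
    return {
        state: {nxt: c / sum(following.values()) for nxt, c in following.items()}
        for state, following in counts.items()
    }


def conversion_probability(transitions, sweeps=200):
    """Probability of eventually reaching 'purchase' from each non-terminal state."""
    prob = {state: 0.0 for state in transitions}
    prob["purchase"], prob["exit"] = 1.0, 0.0  # terminal states (hypothetical names)
    for _ in range(sweeps):
        for state, following in transitions.items():
            prob[state] = sum(p * prob.get(nxt, 0.0) for nxt, p in following.items())
    return prob


# Made-up clickstream sessions, each ending in a purchase or an exit.
sessions = [
    ["browse", "cart", "purchase"],
    ["browse", "exit"],
    ["browse", "cart", "exit"],
    ["browse", "browse", "cart", "purchase"],
]
transitions = estimate_transitions(sessions)
prob = conversion_probability(transitions)
print(prob["browse"], prob["cart"])
```

The repeated sweeps solve the linear equations that relate each state's conversion probability to those of its successors; for a small state space this simple iteration is sufficient.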
Once a retailer knows the underlying behavior of a consumer,
it can use the products that the customer selected in
the past to design recommender systems that assist the customer
in selecting similar products [45]. The underlying assumption
is that consumers follow patterns similar to their past
spending habits and are likely to repeat them in the future. Using
different machine learning techniques such as classification,
genetic algorithms, clustering or K-nearest neighbor algo-
rithms [45], retailers can potentially identify different customer
segments and predict customers’ preference and spending
abilities. This can help retailers in better advertising of their
products to the right audiences.
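A nearest-neighbor recommender of the kind mentioned above can be sketched in a few lines. The version below is a user-based variant with cosine similarity, chosen for brevity rather than taken from [45]; the shoppers, the products and the choice of k = 2 neighbors are hypothetical.

```python
import math

# Made-up purchase histories: shopper -> {product: purchase indicator}.
purchases = {
    "alice": {"tent": 1, "stove": 1, "boots": 1},
    "bob": {"tent": 1, "stove": 1, "lantern": 1},
    "carol": {"novel": 1, "lamp": 1},
    "dave": {"tent": 1, "boots": 1, "lantern": 1, "map": 1},
}


def cosine(a, b):
    """Cosine similarity between two sparse purchase vectors."""
    dot = sum(a[item] * b[item] for item in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def recommend(target, history, k=2):
    """Rank unseen products by the summed similarity of the k nearest shoppers."""
    neighbours = sorted(
        (user for user in history if user != target),
        key=lambda user: cosine(history[target], history[user]),
        reverse=True,
    )[:k]
    scores = {}
    for user in neighbours:
        weight = cosine(history[target], history[user])
        for item in history[user]:
            if item not in history[target]:
                scores[item] = scores.get(item, 0.0) + weight
    return sorted(scores, key=lambda item: (-scores[item], item))


suggestions = recommend("alice", purchases)
print(suggestions)
```

Here the two shoppers most similar to the target both bought a lantern, so it is ranked first. This mirrors the assumption stated above that customers with similar past purchases tend to make similar future purchases.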
Data mining techniques can also be used to market
products to consumers based on their demographic information
combined with their online activities. By combining
information about the geographic location of a user, the time of
day or week they visit a store and the products they buy, and mapping
those attributes against the actual sales data, it is possible to
highlight hidden interactions between the online and offline sales
activity of a consumer. However, combining online and offline
information is a real challenge for retailers [46].
While online retailers like Amazon and eBay are already
using sophisticated data analytic techniques to enhance cus-
tomers’ online shopping experience, the traditional brick and
mortar retailers are also now realizing the benefits of analytics
for increased profits. The acquisition of Kosmix labs by Walmart
in 2011 is one such example [47]. Recently, Macy’s, a mid-scale
retailer, has also leveraged big data analytics for better
inventory management based on customers’ segmentation
characteristics. It developed a unique omnichannel strategy
in which customers can order via different channels and pick up
their order in a store of their choice, through a central online
fulfillment center. In-store customer localization capabilities using
either WiFi or beacons as underlying technologies are also
emerging and should further enhance consumers’
shopping experiences in the future [48], [49].
D. Generating value from smart utility meters
With rapid deployment of smart electricity and gas meters,
especially in developed countries, the utility companies are
also leaning towards extracting and utilizing the information
generated from smart meter data for increased profits, im-
proved customer satisfaction and better resource management
[50]. A meter is called smart or intelligent due to its ability
to measure electricity usage in real time at much smaller
time intervals than traditional meters (which keep a record
of cumulative electricity consumption) [50]. Smart meters also
allow utilities to remotely control the consumption of electricity and
to switch off supply when needed. To convert the data into
actionable insights, utility companies need to adopt techniques
for accurate and timely collection, transfer, storage, processing
and analysis of data. Many established companies including
IBM, SAP, Oracle, as well as startups like Autogrid are
currently assisting utility companies in designing solutions for
better understanding the hidden potential of the data generated
from smart meters [51], [52].
Several machine learning algorithms have been proposed
in the literature for better management and control of data
for utility companies [50]. [53] suggested that by grouping
customers based on usage readings using clustering techniques,
utility companies can identify consumers for targeted
services. Knowledge of customer usage patterns can also
assist utility companies in designing better demand response
tariff plans. For example, utility companies can encourage
consumers with flexible consumption patterns to minimize
their usage during the peak hours by offering incentives [54].
Likewise, consumers with high energy usage patterns can be
penalized if they are unable to curtail their consumption by
limiting use of household appliances during the peak energy
usage hours. Machine learning algorithms such as independent
component analysis [55] and clustering techniques [56] have
also been used to identify the type of demand faced by
different consumer groups during the day [55]. Multiple linear
regression models have also been used to predict the usage of
power in households [56]. Support vector machine classifiers
have been used to distinguish user groups based on their usage
patterns [57]. Customers’ load profiles can potentially assist
in detecting irregularities or abnormalities
caused by either faulty metering or human intervention
and fraud [51]. Finally, machine learning techniques can also
be used to predict congestion or instability conditions within
a network. This information can be used by utility companies
to identify overloaded or ageing components and carry out
timely preventive maintenance to avoid power losses and lost
revenues [58].
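As a sketch of how multiple linear regression can be used to predict household power usage, the code below fits a least-squares model by solving the normal equations. The two predictors (number of occupants and number of major appliances) and the five household records are made-up assumptions and do not come from [56]; the data were constructed to follow an exact linear relation so that the fit is easy to check.

```python
def solve_linear_system(a, b):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_linear_regression(rows, targets):
    """Least-squares coefficients (intercept first) via the normal equations."""
    design = [[1.0] + list(r) for r in rows]
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * y for d, y in zip(design, targets)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def predict(coefficients, features):
    return coefficients[0] + sum(c * f for c, f in zip(coefficients[1:], features))


# Hypothetical households: (occupants, major appliances) and daily usage in kWh.
households = [(1, 2), (2, 3), (3, 5), (4, 4), (2, 6)]
daily_kwh = [4.0, 6.5, 9.5, 11.0, 8.0]

coefficients = fit_linear_regression(households, daily_kwh)
predicted = predict(coefficients, (3, 4))
print([round(c, 3) for c in coefficients], round(predicted, 3))
```

With real meter data the relation would be noisy, and the fitted coefficients would indicate how strongly each household attribute is associated with consumption.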
Real-world examples of data analysis for
utility companies include EnerNOC and Comverge, which
are assisting utility companies by designing tools such as
demand response programs that encourage customers
to reduce load demand during peak times, such as late
afternoon during a heat wave when the air conditioning load
stresses the grid’s capacity. In exchange for lowering power
consumption, consumers are offered rebates. Leveraging big
data technologies, AutoGrid software service also analyzes
grid usage patterns to predict power demand a day ahead thus
encouraging both utilities and consumers to participate in load-
shedding programs to prevent outages [59].
E. Improving Security
Cybercrime costs $118 billion annually and this figure
is expected to grow significantly [60]. With easy access to
information available online, sophisticated cybercrimes are
occurring at an alarming rate, and traditional security
solutions are no longer sufficient to defend against these escalating
threats. Incidents of hacking, identity theft and theft of
credit card data from retailers and banks are in the news quite
regularly, but recent sophisticated and organized breaches at
Sony involving an unreleased movie have shaken the world.
While a lot still needs to be done to prevent cyberterrorism,
big data analytics in security now offers promising solutions
for efficient detection of suspicious activities over the
network. It is expected that big data analytics will impact
various aspects of information security such as network monitoring,
user authentication and control, authorization, identity
management, fraud detection, and data loss prevention and control
[61]. By using big data analytics to detect threats and design
security solutions, enterprises are now better able to protect their
systems from future threats.
A number of data mining techniques to detect cybercrimes
have been proposed in the literature [61]. For example, classification
models such as Naive Bayes, support vector machines, neural
networks and decision trees have long been used to detect spam,
i.e., unsolicited emails [62].
Support vector machine techniques have also been used to
prevent Denial of Service (DoS) attacks, where a DoS attack
refers to the process of making a system inaccessible to its
users [63], [64]. While [63] used Enhanced Multi Class
Support Vector Machines (EMCSVM) to predict various kinds
of DoS attacks, [64] proposed a radial-basis function neural
network (RBFNN) and support vector machines (SVM) to
solve the DoS problem, with the ability to detect or predict new
attacks based on patterns similar to the attack patterns that
appeared in the past. Classification models have also been used
to detect malware [65] and phishing URLs [66] and emails
[67].
Data mining techniques have also been used for anomaly
detection to search for unusual patterns and network behaviors
[68]. While feature selection approaches are used to prioritize
features that can assist in differentiating normal behavior from
the one affected by the presence of anomalies, classifiers are
used to differentiate between patterns [69]. These anomalies
could be present either due to internal system failure or due
to external attacks. In case of external attacks, identifying
the intruders that carry out these malicious activities and
identifying the types of attacks are other major issues. Machine
learning approaches can now also be used for both intruder
detection [70] and finding the types of attacks [71].
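A very simple statistical baseline for spotting unusual network behavior is sketched below. It flags observations whose modified z-score, computed from the median and the median absolute deviation (MAD), exceeds a threshold. This is a swapped-in baseline rather than the feature selection and classifier pipeline described above, and the hourly failed-login counts and the threshold of 3.5 are assumptions.

```python
from statistics import median


def mad_anomalies(values, threshold=3.5):
    """Indices of values whose modified z-score exceeds the threshold."""
    center = median(values)
    mad = median(abs(v - center) for v in values)
    if mad == 0:
        return []  # no spread around the median; nothing can be scored
    return [
        i
        for i, v in enumerate(values)
        if 0.6745 * abs(v - center) / mad > threshold
    ]


# Made-up hourly counts of failed logins on one server.
failed_logins = [12, 15, 11, 14, 13, 12, 240, 13, 14, 12]
suspicious_hours = mad_anomalies(failed_logins)
print(suspicious_hours)
```

Using the median instead of the mean keeps a single extreme value from masking itself by inflating the estimate of normal variation. A flagged hour would still need to be examined to decide whether it reflects an internal failure or an external attack.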
Finally, as more companies turn towards cloud computing
for storage and processing of big data, the security of the cloud
becomes essential. Cloud computing is vulnerable to security
threats including insecure application and programming in-
terfaces, malicious insiders, shared technology vulnerabilities,
data leakages and account hacking [72].
A number of companies are also working on designing
solutions to protect users from cybercrime. For instance,
IBM’s QRadar security intelligence platform is designed to
deliver the benefits of next-generation security information and
event management technology to various companies [73]. En-
terprises use QRadar solutions to collect and correlate billions
of events and network flows per day in deployments that span
multiple locations. By analyzing structured, enriched security
data alongside unstructured data from across the enterprise
using QRadar solutions, the malicious activities hidden deep
in the masses of an organization’s data can be potentially
detected.
III. CHALLENGES IN BIG DATA ANALYTICS
While big data analyses provide value to businesses, there
are general issues surrounding them that must be carefully dealt
with to exploit their full potential [1], [31]. One of the primary
concerns around big data is security and privacy. Access to
large data implies the potential to identify individuals and
to profile them on the basis of their behavior, likes, dislikes,
daily routine, etc. Thus companies must take extra precautions
to protect the confidentiality of users’ sensitive information.
Another major challenge is data access and storage. With
huge volumes of data being generated, it is not feasible to
store it on a single machine, compelling companies to rely
on the cloud for storage. Cloud computing can be used
to manage and store these large datasets, but privacy
in the cloud remains an open research problem. The risk of storing
sensitive information on the cloud without sufficient security
measures has unfortunately been illustrated in a number
of instances. Eliminating a single point of failure by creating
multiple copies of data and storing them on different nodes is also
a challenge, as these nodes have to be synchronized to retrieve
data efficiently. Since data is available in different formats,
extracting it and combining it in a format that can be easily
imported for analysis is another challenge. Finally, the skillset
required to extract meaningful information from big data
(a combination of advanced statistical techniques,
data optimization methods, machine learning algorithms and a
thorough understanding of business value) is seldom available.
While these challenges are applicable in general to all
industrial domains, there are also challenges specific to each
of the applications considered in this study, which are briefly
discussed below.
• Sentiment analysis: Sentiment analysis typically classifies text
into three main classes, i.e., positive, negative and neutral,
but given the subjectivity of text, in reality it
can fall into many more categories [74]. Therefore,
instead of simple two-class classifiers, multi-class
classifiers should be used for better results. Designing
a classifier for sentiment analysis when only a
limited amount of training data is available is
quite challenging [14]. Moreover, the training data used
for designing a classifier should be selected carefully, as
the same word may have different meanings in different
domains depending on the context [75]. Sarcastic or ironic
sentences often lead to wrong classification. Using only
words rather than sentences can also lead to
erroneous classification. Finally, drawing general conclusions
about any product or service based on the limited
number of tweets or posts available on the web can yield
misleading results, so the results must be checked for
statistical significance.
• Predicting customer churn: Cost constraints dictate that
telecommunication companies focus more on retaining
existing customers than on acquiring new ones, and
thus offer promotions to the existing customers
who are likely to churn. However, finding the real cause
of customer churn is not always easy, because identifying
the underlying variables that best describe a customer’s
behavioral profile is a challenging task and may not
always reveal users’ true intentions, thus leading to wrong
predictions. Moreover, integration of data from miscellaneous
sources such as the customer base, call center inbound
and outbound calls, billing, etc., to gather information
about a customer is not always straightforward. With
intense competition, companies are now offering
service plans suitable for different customer segments,
but designing algorithms to group customers with similar
preferences based on partial information alone may not
yield feasible solutions.
• Enhancing online shopping experience: Despite its
popularity, online shopping still has to overcome certain
challenges to encourage customers. One of the main challenges
in predicting customers’ behavior is merging online
data with offline transaction data, as these datasets may
not be managed by a single entity. Customers’ security
and privacy concerns around the use of their transactional
data for predicting their spending behavior also need
to be addressed satisfactorily. Analyzing data to predict
customers’ product preferences, in order to promote similar
products or relevant coupons to targeted audiences, is a
challenging issue that only gets harder with time due
to users’ changing shopping preferences.
• Smart utility meters: One of the major challenges faced
by utility companies is merging data that resides in
disparate databases across their various departments.
Credibility of data is another major challenge
that could have a devastating effect on a firm’s reputation.
The data generated by smart meters may contain
abnormalities due to faulty behavior caused either
by natural conditions or by human interference, so
decisions based on faulty data can potentially
impact utility companies’ revenues. Lack of infrastructure
to support processing and analysis of the data generated from
smart meters is another major challenge faced by utility
companies. Predicting customers’ profile patterns, including
the number of people living in a household, the appliances
they use and the time of usage of different appliances,
based on their electricity usage bills for promotional
offers could also raise privacy concerns for users.
• Security: Although the application of big data analytics
to improving security looks promising, it has its own
challenges [76]. One of the major challenges faced by
organizations is data leakage caused by third-party
intervention. Data is even more vulnerable to loss if it is
housed in the cloud. Ownership of information hosted in
the cloud is another major issue faced by organizations, and
trust boundaries need to be established carefully between
the data owners and the data storage owners. With large
datasets stored in the cloud, proper security measures must
be taken to prevent re-identification of users based on the
information available through different datasets.
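As a minimal illustration of the re-identification concern raised in the Security item above, the following Python sketch checks how easily records in a released dataset can be linked to named individuals in a second dataset through shared attributes (quasi-identifiers). It is a sketch under stated assumptions, not a method taken from the cited studies: the field names (postal_code, birth_year, gender, usage_kwh, name), the sample records and the function names are all hypothetical.

```python
from collections import Counter

# Hypothetical quasi-identifier columns, chosen only for illustration.
QUASI_IDENTIFIERS = ("postal_code", "birth_year", "gender")


def group_sizes(records, keys=QUASI_IDENTIFIERS):
    """Count how many records share each combination of quasi-identifier values."""
    return Counter(tuple(r[k] for k in keys) for r in records)


def k_anonymity(records, keys=QUASI_IDENTIFIERS):
    """Return the size of the smallest group; 1 means some record is unique."""
    sizes = group_sizes(records, keys)
    return min(sizes.values()) if sizes else 0


def linkable_records(released, external, keys=QUASI_IDENTIFIERS):
    """Return (name, record) pairs for released records whose quasi-identifiers
    match exactly one person in the external dataset."""
    index = {}
    for person in external:
        index.setdefault(tuple(person[k] for k in keys), []).append(person)
    matches = []
    for rec in released:
        candidates = index.get(tuple(rec[k] for k in keys), [])
        if len(candidates) == 1:
            matches.append((candidates[0]["name"], rec))
    return matches


# Hypothetical "anonymized" release: names removed, usage data kept.
RELEASED = [
    {"postal_code": "V8W", "birth_year": 1980, "gender": "F", "usage_kwh": 310},
    {"postal_code": "V8W", "birth_year": 1980, "gender": "F", "usage_kwh": 295},
    {"postal_code": "V8P", "birth_year": 1975, "gender": "M", "usage_kwh": 520},
]

# Hypothetical second dataset that still carries names.
EXTERNAL = [
    {"name": "Alice", "postal_code": "V8W", "birth_year": 1980, "gender": "F"},
    {"name": "Carol", "postal_code": "V8W", "birth_year": 1980, "gender": "F"},
    {"name": "Bob", "postal_code": "V8P", "birth_year": 1975, "gender": "M"},
]

if __name__ == "__main__":
    print("k-anonymity of release:", k_anonymity(RELEASED))
    for name, rec in linkable_records(RELEASED, EXTERNAL):
        print("re-identified:", name, rec)
```

In this toy data the two records sharing the same postal code, birth year and gender cannot be attributed to a single person, whereas the record with a unique combination can. Coarsening or suppressing quasi-identifiers until every group contains several records is one common way to reduce this risk.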
IV. CONCLUSIONS
The unprecedented growth in data in almost every sector
provides businesses with a unique opportunity to use analytics
to decipher hidden insights that can be used for making
better decisions. In this paper, through five different use cases,
we have illustrated how analytics can be applied to derive
value from big data for various industrial applications. The
examples considered in this study include sentiment analy-
sis for social media, preventing churn of telecommunication
customers, enhancing customers’ online shopping experience,
generating value from smart utility meters and improving
security. While a number of different techniques have been
proposed in the existing literature to derive value for these
use cases, classification and clustering models have been most
widely used for these applications. The continuing growth of
studies that attempt to derive value from big data suggests that
big data analytics can provide useful insights for businesses,
potentially also leading to increased revenues and business
advantages over the competition. However, big data analytics also
faces challenges that need to be addressed together in
order to exploit the full potential of the hidden insights within
these large datasets.
REFERENCES
[1] A. Katal, M. Wazid, and R. Goudar, “Big data: Issues, challenges,
tools and good practices,” in Contemporary Computing (IC3), Sixth
International Conference on, Aug 2013, pp. 404–409.
[2] S. Sagiroglu and D. Sinanc, “Big data: A review,” in Collaboration
Technologies and Systems (CTS), International Conference on, May
2013, pp. 42–47.
[3] F. Muhtaroglu, S. Demir, M. Obali, and C. Girgin, “Business model
canvas perspective on big data applications,” in Big Data, IEEE Inter-
national Conference on, Oct 2013, pp. 32–37.
[4] A. Rajpurohit, “Big data for business managers; bridging the gap be-
tween potential and value,” in Big Data, IEEE International Conference
on, Oct 2013, pp. 29–31.
[5] Z. Liu, P. Yang, and L. Zhang, “A sketch of big data technologies,” in
Internet Computing for Engineering and Science, Seventh International
Conference on, Sept 2013, pp. 26–29.
[6] S. Dhar and S. Mazumdar, “Challenges and best practices for enterprise
adoption of big data technologies,” in Technology Management Confer-
ence (ITMC), 2014 IEEE International, June 2014, pp. 1–4.
[7] The digital universe of opportunities: Rich data and the increasing
value of the internet of things. [Online]. Available:
http://www.emc.com/leadership/digital-universe/2014iview/executive-summary.htm
[8] P. Malik, “Governing big data: Principles and practices,” IBM Journal
of Research and Development, vol. 57, no. 3/4, pp. 1:1–1:13, May 2013.
[9] H. Hu, Y. Wen, T.-S. Chua, and X. Li, “Toward scalable systems for
big data analytics: A technology tutorial,” Access, IEEE, vol. 2, pp.
652–687, 2014.
[10] M. Zaharia, M. Chowdhury, M. J. Franklin, S. Shenker, and I. Stoica,
“Spark: Cluster computing with working sets,” in Proceedings of the
2nd USENIX Conference on Hot Topics in Cloud Computing, ser.
HotCloud’10, 2010, pp. 10–15.
[11] N. Y. Xin and L. Y. Ling, “How we could realize big data value,”
in Instrumentation and Measurement, Sensor Network and Automation
(IMSNA), 2013 2nd International Symposium on, Dec 2013, pp. 425–
427.
[12] J. Wielki, “Implementation of the big data concept in organizations -
possibilities, impediments and challenges,” in Computer Science and
Information Systems (FedCSIS), 2013 Federated Conference on, Sept
2013, pp. 985–989.
[13] H. Hu, Y. Wen, T.-S. Chua, and X. Li, “Toward scalable systems for
big data analytics: A technology tutorial,” Access, IEEE, vol. 2, pp.
652–687, 2014.
[14] M. Taboada, J. Brooke, M. Tofiloski, K. Voll, and M. Stede, “Lexicon-
based methods for sentiment analysis,” Comput. Linguist., vol. 37, no. 2,
pp. 267–307, 2011.
[15] M. Hu and B. Liu, “Mining and summarizing customer reviews,” in
Proceedings of the Tenth ACM SIGKDD International Conference on
Knowledge Discovery and Data Mining, 2004, pp. 168–177.
[16] User modeling improves understanding of people’s preferences
to help engage users on their own terms. [Online]. Available:
http://www.ibm.com/smarterplanet/us/en/ibmwatson/developercloud/user-modeling.html
[17] Five sentiment analysis tools that wont cost you a
cent. [Online]. Available:
http://www.fieldassignment.com/2011/04/free-sentiment-analysis-tools.html
[18] W. Medhat, A. Hassan, and H. Korashy, “Sentiment analysis
algorithms and applications: A survey,” Ain Shams Engineering Journal,
2014. [Online]. Available: http://www.sciencedirect.com/science/article/
pii/S2090447914000550
[19] E. Boiy, P. Hens, K. Deschacht, and M.-F. Moens, “Automatic
sentiment analysis in on-line text,” in Proceedings of the 11th
International Conference on Electronic Publishing, 2007, pp. 349–360.
[20] D. Maynard and A. Funk, “Automatic detection of political opinions in
tweets,” in The Semantic Web: ESWC 2011 Workshops, vol. 7117, 2012,
pp. 88–99.
[21] B. Liu, Sentiment Analysis and Opinion Mining. Morgan and Claypool
Publishers, 2012.
[22] B. Pang, L. Lee, and S. Vaithyanathan, “Thumbs up?: Sentiment
classification using machine learning techniques,” in Proceedings of
the ACL-02 Conference on Empirical Methods in Natural Language
Processing, 2002, pp. 79–86.
[23] M. Usha and M. Indra Devi, “Analysis of sentiments using unsupervised
learning techniques,” in Information Communication and Embedded
Systems, International Conference on, Feb 2013, pp. 241–245.
[24] G. Li and F. Liu, “A clustering-based approach on sentiment analysis,”
in Intelligent Systems and Knowledge Engineering, International Con-
ference on, Nov 2010, pp. 331–337.
[25] L. Zhang, R. Ghosh, M. Dekhil, M. Hsu, and B. Liu. (2011) Combining
lexicon-based and learning-based methods for twitter sentiment
analysis. [Online]. Available: http://www.hpl.hp.com/techreports/2011/
HPL-2011-89.html
[26] P. P. Balage Filho, L. V. Avanço, M. d. G. V. Nunes, and T. A. S.
Pardo, “NILC USP: An improved hybrid system for sentiment analysis
in twitter messages,” in Proceedings of the 8th International Workshop
on Semantic Evaluation. Association for Computational Linguistics
and Dublin City University, 2014, pp. 428–432.
[27] M. Neethu and R. Rajasree, “Sentiment analysis in twitter using machine
learning techniques,” in Computing, Communications and Networking
Technologies (ICCCNT),2013 Fourth International Conference on, July
2013, pp. 1–5.
[28] C. C. Aggarwal and C. Zhai, Mining Text Data. Springer, 2012.
[29] Y. Mejova and P. Srinivasan, “Exploring feature definition and selection
for sentiment classifiers,” in ICWSM’11, 2011, pp. 1–6.
[30] Twitter, IBM announce a new data analytics partnership.
[Online]. Available:
http://fortune.com/2014/10/29/twitter-ibm-data-analytics-partnership/
[31] N. Kamalraj and A. Malathi, “A survey on churn prediction techniques in
communication sector,” International Journal of Computer Applications,
vol. 64, no. 5, pp. 39–42, February 2013.
[32] W. Bandara, A. Perera, and D. Alahakoon, “Churn prediction method-
ologies in the telecommunications sector: A survey,” in Advances in
ICT for Emerging Regions, International Conference on, Dec 2013, pp.
172–176.
[33] C.-P. Wei and I.-T. Chiu, “Turning telecommunications call details
to churn prediction: a data mining approach,” Expert Systems with
Applications, vol. 23, no. 2, pp. 103 – 112, 2002.
[34] C. Zhao, Y. Wu, and H. Gao, “Study on knowledge acquisition of
the telecom customers’ consuming behaviour based on data mining,”
in Wireless Communications, Networking and Mobile Computing, 4th
International Conference on, Oct 2008, pp. 1–5.
[35] J. Zhao, W. Zhang, and Y. Liu, “Improved k-means cluster algorithm in
telecommunications enterprises customer segmentation,” in Information
Theory and Information Security, IEEE International Conference on,
Dec 2010, pp. 167–169.
[36] L. Ye, C. Qiu-ru, X. Hai-xu, L. Yi-jun, and Y. Zhi-min, “Telecom
customer segmentation with k-means clustering,” in Computer Science
Education, 7th International Conference on, July 2012, pp. 648–651.
[37] Celcom loyalty deals. [Online]. Available: http://www2.nst.com.my/
nation/celcom-loyalty-deals-1.558917
[38] J. Li. (2013) Study: Online shopping behavior in the
digital era. [Online]. Available:
http://www.iacquire.com/blog/study-online-shopping-behavior-in-the-digital-era
[39] P. Yang, Q. lun Zheng, H. Peng, and Q. Tan, “A stepwise learning
approach to automatic discovery of interest data blocks,” in Machine
Learning and Cybernetics, 2004. Proceedings of 2004 International
Conference on, vol. 3, Aug. 2004, pp. 1441–1446.
[40] (2014) Making online shopping smarter with advanced
analytics. [Online]. Available:
www.cognizant.com/.../Making-Online-Shopping-Smarter-with-Advanced-analytics.pdf
[41] R. Dewan, M. Freimer, and Y. Jiang, “Using online competitor’s inven-
tory information for pricing,” in System Sciences, 40th Annual Hawaii
International Conference on, Jan 2007, pp. 210a–210a.
[42] M. Mestre and P. Vitoria, “Tracking of consumer behaviour in e-
commerce,” in Information Fusion, 16th International Conference on,
July 2013, pp. 1214–1221.
[43] M. Gupta, H. Mittal, P. Singla, and A. Bagchi, “Characterizing compar-
ison shopping behavior: A case study,” in Data Engineering Workshops
(ICDEW), 2014 IEEE 30th International Conference on, March 2014,
pp. 115–122.
[44] T. Jiang and A. Tuzhilin, “Segmenting customers from population to
individuals: Does 1-to-1 keep your customers forever?” Knowledge and
Data Engineering, IEEE Transactions on, vol. 18, no. 10, pp. 1297–
1311, Oct 2006.
[45] H.-W. Yang, Z. geng Pan, X.-Z. Wang, and B. Xu, “A personalized
products selection assistance based on e-commerce machine learning,”
in Machine Learning and Cybernetics, 2004. Proceedings of 2004
International Conference on, vol. 4, Aug. 2004, pp. 2629–2633.
[46] P. Henry and H. Luo, “Wifi: what’s next?” Communications Magazine,
IEEE, vol. 40, no. 12, pp. 66–72, Dec 2002.
[47] Wal-mart paid 300 million-plus for kosmix. [Online]. Available:
http://allthingsd.com/20110418/exclusive-wal-mart-paid-300-million-plus-for-kosmix/
[48] Beacons, beacons, everywhere beacons. [Online]. Available:
http://www.mediapost.com/publications/article/231059/beacons-beacons-everywhere-beacons.html
[49] Stores sniff out smartphones to follow shoppers. [Online]. Available:
http://www.technologyreview.com/news/520811/stores-sniff-out-smartphones-to-follow-shoppers/
[50] D. Alahakoon and X. Yu, “Advanced analytics for harnessing the
power of smart meter big data,” in Intelligent Energy Systems, IEEE
International Workshop on, Nov 2013, pp. 40–45.
[51] Generating big value from big data in energy and utilities.
[Online]. Available: http://www-01.ibm.com/software/data/bigdata/
industry-energy.html3
[52] Utilities and big data: Using analytics for increased customer
satisfaction. [Online]. Available:
http://www.oracle.com/us/industries/utilities/big-data-analytics-customer-wp-2075868.pdf
[53] S. Valero, M. Ortiz, C. Senabre, C. Alvarez, F. Franco, and A. Gabaldon,
“Methods for customer and demand response policies selection in new
electricity markets,” Generation, Transmission Distribution, IET, vol. 1,
no. 1, pp. 104–110, January 2007.
[54] A. Albert and R. Rajagopal, “Smart meter driven segmentation: What
your consumption says about you,” Power Systems, IEEE Transactions
on, vol. 28, no. 4, pp. 4019–4030, Nov 2013.
[55] H. Liao and D. Niebur, “Load profile estimation in electric transmission
networks using independent component analysis,” Power Systems, IEEE
Transactions on, vol. 18, no. 2, pp. 707–715, May 2003.
[56] C. Beckel, L. Sadamori, T. Staake, and S. Santini, “Revealing household
characteristics from smart meter data,” Energy, 2014.
[57] J. Nagi, K. S. Yap, S. K. Tiong, and S. K. Ahmed, “Power load forecasting
using hybrid self-organizing maps and support vector machines,” in
2nd International Power Engineering and Optimization Conference, June
2008.
[58] F. Zhao, G. Wang, C. Deng, and Y. Zhao, “A real-time intelligent
abnormity diagnosis platform in electric power system,” in Advanced
Communication Technology (ICACT), 2014 16th International Confer-
ence on, Feb 2014, pp. 83–87.
[59] M. LaMonica. Bringing big data to smart meters. [Online]. Available:
http://www.technologyreview.com/view/506476/bringing-big-data-to-smart-meters/
[60] Cyber security analytics. [Online]. Available: http://www.teradata.com/
Cyber-Security-Analytics/
[61] T. Mahmood and U. Afzal, “Security analytics: Big data analytics for
cybersecurity: A review of trends, techniques and tools,” in Information
Assurance (NCIA), 2013 2nd National Conference on, Dec 2013, pp.
129–134.
[62] P. Panigrahi, “A comparative study of supervised machine learning
techniques for spam e-mail filtering,” in Computational Intelligence
and Communication Networks, Fourth International Conference on, Nov
2012, pp. 506–512.
[63] T. Subbulakshmi, S. Shalinie, V. GanapathiSubramanian, K. BalaKrish-
nan, D. AnandKumar, and K. Kannathal, “Detection of ddos attacks
using enhanced support vector machines with real time generated
dataset,” in Advanced Computing (ICoAC), 2011 Third International
Conference on, Dec 2011, pp. 17–22.
[64] G. Tsang, P. Chan, D. Yeung, and E. Tsang, “Denial of service detection
by support vector machines and radial-basis function neural network,” in
Machine Learning and Cybernetics, Proceedings of 2004 International
Conference on, vol. 7, Aug 2004, pp. 4263–4268.
[65] M. Mas’ud, S. Sahib, M. Abdollah, S. Selamat, and R. Yusof, “Analysis
of features selection and machine learning classifier in android malware
detection,” in Information Science and Applications, International Con-
ference on, May 2014, pp. 1–5.
[66] J. James, L. Sandhya, and C. Thomas, “Detection of phishing urls
using machine learning techniques,” in Control Communication and
Computing, International Conference on, Dec 2013, pp. 304–309.
[67] A. Almomani, B. Gupta, S. Atawneh, A. Meulenberg, and E. Almomani,
“A survey of phishing email filtering techniques,” Communications
Surveys Tutorials, IEEE, vol. 15, no. 4, pp. 2070–2090, Fourth Quarter 2013.
[68] B. Thuraisingham, “Data mining for security applications,” in Machine
Learning and Applications, 2004. Proceedings. 2004 International Con-
ference on, Dec 2004, pp. 3–4.
[69] A. Aziz, A. Hassanien, S.-O. Hanaf, and M. Tolba, “Multi-layer hybrid
machine learning techniques for anomalies detection and classification
approach,” in Hybrid Intelligent Systems (HIS), 2013 13th International
Conference on, Dec 2013, pp. 215–220.
[70] L. Khan, M. Awad, and B. Thuraisingham, “A new intrusion detection
system using support vector machines and hierarchical clustering,” The
VLDB Journal, vol. 16, no. 4, pp. 507–521, Oct. 2007.
[71] T. Subbulakshmi, S. Shalinie, V. GanapathiSubramanian, K. BalaKrish-
nan, D. AnandKumar, and K. Kannathal, “Detection of ddos attacks
using enhanced support vector machines with real time generated
dataset,” in Advanced Computing, Third International Conference on,
Dec 2011, pp. 17–22.
[72] M. Khorshed, A. Ali, and S. Wasimi, “Trust issues that create threats for
cyber attacks in cloud computing,” in Parallel and Distributed Systems,
IEEE 17th International Conference on, Dec 2011, pp. 900–905.
[73] IBM security intelligence with big data. [Online]. Available:
http://www-03.ibm.com/security/solution/intelligence-big-data/
[74] J. T. Mr. Saifee Vohra, “Applications and challenges for sentiment
analysis : A survey,” International Journal of Engineering Research and
Technology, vol. 2, 2013.
[75] H. R. P, “Opinion mining and sentiment analysis - challenges and
applications,” International Journal of Application or Innovation in
Engineering and Management (IJAIEM), vol. 3, 2014.
[76] A. A. Cardenas, P. K. Manadhata, and S. P. Rajan, “Big data analytics
for security,” IEEE Security and Privacy, vol. 11, no. 6, pp. 74–76, 2013.