UNIT III
BIG DATA ANALYTICS
What is Data?
Data are the numbers, characters, or symbols on which a computer performs operations, and
which may be stored and transmitted in the form of electrical signals and recorded on
magnetic, optical, or mechanical recording media.
What is Big Data?
Big Data is a collection of data that is huge in volume and keeps growing exponentially with
time. Its size and complexity are so great that no traditional data management tool can store
or process it efficiently. In short, big data is still data, but at an enormous scale.
Big Data: Today nearly all data are captured digitally. As a result, data have been growing
at an overwhelming rate and are now measured in terabytes (about 10^12 bytes), petabytes
(about 10^15 bytes) and even larger units.
1 bit = 0 or 1
1 byte = 8 bits
1024 bytes = 1 Kilobyte (KB)
1024 Kilobytes = 1 Megabyte (MB)
1024 Megabytes = 1 Gigabyte (GB)
1024 Gigabytes = 1 Terabyte (TB) – 1024 GB
1024 Terabytes = 1 Petabyte (PB) – 10,48,576 GB
1024 Petabytes = 1 Exabyte (EB) – 1,07,37,41,824 GB
1024 Exabytes = 1 Zettabyte (ZB) – 1099511627776 GB
1024 Zettabytes = 1 Yottabyte (YB) – 1125899906842624 GB
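The unit ladder above can be checked with a few lines of Python. This is only a sketch: it assumes the binary convention used in the table (each step is a factor of 1024), and the names UNITS, bytes_in and in_gigabytes are illustrative, not from the notes.

```python
# Binary storage units: each step up is a factor of 1024 (2**10).
UNITS = ["KB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB"]


def bytes_in(unit: str) -> int:
    """Return the number of bytes in one `unit` (1 KB = 1024 bytes)."""
    return 1024 ** (UNITS.index(unit) + 1)


def in_gigabytes(unit: str) -> int:
    """Express one `unit` in gigabytes (meaningful for GB and larger)."""
    return bytes_in(unit) // bytes_in("GB")


# Print the GB equivalents for TB and everything above it.
for unit in UNITS[3:]:
    print(f"1 {unit} = {in_gigabytes(unit):,} GB")
```

Python integers have arbitrary precision, so the yottabyte figure is computed exactly rather than rounded.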
Is there anything bigger than yottabyte?
Just think of the amount of data stored on Facebook, Twitter or Amazon servers. Walmart
handles over 1 million transactions each hour, yielding more than 2.5 petabytes (25 lakh GB)
of data.
Analytics professionals coined the term big data to refer to massive amounts of business
data from a wide variety of sources, much of which is available in real time, and much of
which is uncertain or unpredictable. These characteristics are usually known as the
volume, variety, velocity and veracity of data. Big data gives organisations an opportunity
to gain a competitive advantage: if the data can be understood and analysed effectively,
they can make better business decisions. A study by the McKinsey Global Institute noted
that the effective use of big data has the potential to transform economies, delivering a new
wave of productivity growth and consumer surplus. Using big data will become a key basis of
competition for existing companies, and will create new competitors who are able to attract
employees that have the critical skills for a big data world.
Following are some examples of Big Data:
1. The New York Stock Exchange generates about one terabyte of new trade data per day.
2. Social Media: Statistics show that 500+ terabytes of new data are ingested into
the databases of the social media site Facebook every day. This data is mainly generated
through photo and video uploads, message exchanges, comments, etc.
3. A single jet engine can generate 10+ terabytes of data in 30 minutes of flight time.
With many thousands of flights per day, data generation reaches many petabytes.
Characteristics of Big Data
Big data can be described by the following characteristics:
(i) Volume – The name Big Data itself refers to a size which is enormous. The size of data
plays a crucial role in determining the value that can be derived from it. Whether a
particular data set can be considered Big Data at all also depends on its volume.
Hence, 'Volume' is one characteristic which needs to be considered while dealing with Big
Data.
(ii) Variety – The next aspect of Big Data is its variety.
Variety refers to heterogeneous sources and the nature of data, both structured and
unstructured. During earlier days, spreadsheets and databases were the only sources of data
considered by most of the applications. Nowadays, data in the form of emails, photos, videos,
monitoring devices, PDFs, audio, etc. are also being considered in the analysis applications.
This variety of unstructured data poses certain issues for storage, mining and analyzing data.
(iii) Velocity – The term 'velocity' refers to the speed at which data is generated. How fast
the data is generated and processed to meet demand determines the real potential in the data.
Big Data velocity deals with the speed at which data flows in from sources like business
processes, application logs, networks, social media sites, sensors, mobile devices, etc.
The flow of data is massive and continuous.
(iv) Variability – This refers to the inconsistency which the data can show at times,
which hampers the ability to handle and manage the data effectively.
Advantages of Big Data Processing
1. Data accumulation from multiple sources, including the Internet, social media
platforms, online shopping sites, company databases, external third-party sources, etc.
2. Real-time forecasting and monitoring of business as well as the market.
3. Identify crucial points hidden within large datasets to influence business decisions.
4. Promptly mitigate risks by optimizing complex decisions for unforeseen events and
5. Identify issues in systems and business processes in real-time.
6. Unlock the true potential of data-driven marketing.
7. Dig into customer data to create tailor-made products, services, offers, discounts, etc.
8. Facilitate speedy delivery of products/services that meet and exceed client
expectations.
9. Diversify revenue streams to boost company profits and ROI.
10. Respond to customer requests, grievances, and queries in real-time.
11. Foster innovation of new business strategies, products, and services.
Types of Big Data
Following are the types of Big Data: structured, unstructured and semi-structured.
Any data that can be stored, accessed and processed in a fixed format is termed
'structured' data. Over time, computer science has achieved great success in developing
techniques for working with such data (where the format is well known in advance) and
deriving value from it. However, issues now arise when the size of such data grows to a
huge extent, with typical sizes in the range of multiple zettabytes.
Examples of Structured Data
An 'Employee' table in a database is an example of structured data:
Employee_ID  Employee_Name    Gender  Department  Salary_In_lacs
2365         Rajesh Kulkarni  Male    Finance     650000
3398         Pratibha Joshi   Female  Admin       650000
7465         Shushil Roy      Male    Admin       500000
7500         Shubhojit Das    Male    Finance     500000
7699         Priya Sane       Female  Finance     550000
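Because structured data has a fixed format known in advance, it can be loaded into a relational table and queried directly. The sketch below uses Python's built-in sqlite3 module with the rows from the table above; the table and column names (employee, salary and so on) are illustrative choices, and the salary figures are copied as they appear in the notes.

```python
import sqlite3

# An in-memory database with a fixed schema declared up front.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE employee (
           employee_id   INTEGER PRIMARY KEY,
           employee_name TEXT NOT NULL,
           gender        TEXT NOT NULL,
           department    TEXT NOT NULL,
           salary        INTEGER NOT NULL
       )"""
)

rows = [
    (2365, "Rajesh Kulkarni", "Male", "Finance", 650000),
    (3398, "Pratibha Joshi", "Female", "Admin", 650000),
    (7465, "Shushil Roy", "Male", "Admin", 500000),
    (7500, "Shubhojit Das", "Male", "Finance", 500000),
    (7699, "Priya Sane", "Female", "Finance", 550000),
]
conn.executemany("INSERT INTO employee VALUES (?, ?, ?, ?, ?)", rows)

# Every row has the same columns, so aggregation is a one-line query.
avg_salary_by_dept = dict(
    conn.execute(
        "SELECT department, AVG(salary) FROM employee GROUP BY department"
    ).fetchall()
)
print(avg_salary_by_dept)
```

The query only works because each record is guaranteed to carry a department and a salary, which is exactly what the fixed format provides.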
Any data with an unknown form or structure is classified as unstructured data. In addition to
its huge size, unstructured data poses multiple challenges in terms of processing it to
derive value. A typical example of unstructured data is a heterogeneous data source
containing a combination of simple text files, images, videos, etc. Nowadays, organizations
have a wealth of data available to them but, unfortunately, they don't know how to derive
value from it since this data is in its raw, unstructured form.
Examples of Un-structured Data
The output returned by 'Google Search'
Semi-structured data can contain both forms of data. Semi-structured data looks structured
in form, but it is not actually defined by, for example, a table definition in a relational
DBMS. An example of semi-structured data is data represented in an XML file.
Characteristics of Semi-structured data
1. Data does not conform to a data model but has some structure.
2. Data cannot be stored in the form of rows and columns as in databases.
3. Semi-structured data contains tags and elements (metadata) which are used to group data
and describe how the data is stored.
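A small XML document illustrates these characteristics. The sketch below parses it with Python's standard xml.etree.ElementTree module; the document, its tag names and the email address are made up for illustration. Note that the second record carries a field the first one lacks, which a fixed table schema would not allow without a schema change.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML: tags describe the data, but records need not share fields.
XML_DOC = """
<employees>
  <employee id="2365">
    <name>Rajesh Kulkarni</name>
    <department>Finance</department>
  </employee>
  <employee id="3398">
    <name>Pratibha Joshi</name>
    <department>Admin</department>
    <email>pratibha@example.com</email>
  </employee>
</employees>
"""

root = ET.fromstring(XML_DOC)

# Turn each <employee> element into a dict keyed by its child tag names.
records = []
for emp in root.findall("employee"):
    record = {"id": emp.get("id")}
    for child in emp:
        record[child.tag] = child.text
    records.append(record)

for record in records:
    print(record)
```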
Data Growth over the years
[Figure: Data growth over the years]
Please note that web application data, which is unstructured, consists of log files, transaction
history files etc. OLTP systems are built to work with structured data wherein data is stored
in relations (tables).
The rapid development of technology has changed how the world works. User behaviour and
media consumption have taken an entirely different route, thanks to the internet. This change
has forced marketers to follow and adapt to the changing marketing channels.
Even advertising isn't limited to traditional channels anymore. It's now the era of digital
advertising. Just like traditional advertising, digital advertising comes in numerous shapes,
sizes, and forms. The basic principle behind digital ads is the same as that of conventional
ads. Still, these ads have their own set of importance, types, advantages, disadvantages, and
even operations that differ from traditional ads.
Digital advertising is the communication made by a company to advertise and promote its
brand, product, or service using various platforms and digital channels. It consists of actions
in web browsers, social media pages, blogs, apps, or any other form of contact through the
Internet. With the digital transformation, more and more options arise for companies to
communicate with the market and, of course, its audience. This way, everything that is done
using digital platforms and resources can be considered digital advertising. The main goal is
to be present precisely where the public is.
Digital advertising is also how companies migrate and adapt to the movement that society as
a whole has undergone. After all, if 4.18 billion people are using their smartphones to
connect to the Internet, a business also needs to be present in this channel, strengthening
its relevance and brand awareness. Digital advertising is the action of calling public
attention to an offering through online and digital paid channels by an identified sponsor.
Precisely, digital advertising is any form of advertising that appears online or on digital
channels like websites, search engines, social media platforms, mobile apps, digital OOH,
and other channels that can be accessed digitally.
Digital advertising is a branch of digital marketing and deals only with the promotional mix,
and not with other Ps of the digital marketing mix.
Characteristics of Digital Advertising
Digital advertisements come with their own unique set of characteristics. These are:
Paid form: Digital advertising, just like other forms of advertising requires the
advertiser (also called the sponsor) to pay to create the advertising message and
creative, buy advertising space or slot, and monitor advertising efforts.
Measurable: Digital ads are highly measurable in terms of how many people view
them and how many people interact with them. Often, advertisers are even able to
calculate accurate ROI of these ads.
Goal-oriented: These ads are always backed by goals – to promote, sell, increase
Data backed: Digital ads are backed by data of what they are about, whom they are
targeted at, and how the target audience interacts with them. Data forms the backbone
of digital ads.
Personal or non-personal: Ads on digital media can be highly personalised based on
user activity over the internet or non-personal with a motive to
enhance brand awareness.
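The "measurable" characteristic can be made concrete with the usual campaign ratios. The function below is a sketch: the metric definitions (click-through rate, cost per click, conversion rate, ROI) are the common ones, while the function name and the campaign figures are invented for illustration.

```python
def ad_metrics(impressions, clicks, conversions, cost, revenue):
    """Compute common digital-ad metrics from raw campaign counts."""
    return {
        "ctr": clicks / impressions,             # click-through rate
        "cpc": cost / clicks,                    # cost per click
        "conversion_rate": conversions / clicks,
        "roi": (revenue - cost) / cost,          # return on investment
    }


# Hypothetical campaign figures.
metrics = ad_metrics(
    impressions=100_000, clicks=2_000, conversions=100,
    cost=1_000.0, revenue=4_000.0,
)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```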
Evolution of Advertising
The internet has changed the landscape of advertising. At the beginning of the 1990s,
investment in digital advertising was zero. In 2025, digital advertising is expected to
cross $500 billion worldwide.
It all started in 1994 when hotwired.com, a Wired website, released the first banner ads on its
website. The same year saw the development of HTTP cookie that helped the advertiser and
publisher track user behaviour.
Soon after, in 1996, Flash was introduced, which formed the framework for web advertising.
In 1997, pop-up ads were introduced and found wide usage all over the internet.
In 1998, Google launched and brought the concept of a minimal search engine.
The company started monetising in 2000 and paved the way for pay per click advertising and
search engine marketing.
Mobile advertising also started in the year 2000.
In the early 2000s, Facebook launched and brought in its hyper-targeted ads model that used
user interests and demographics for advertising. This started the trend of targeted ads.
The late 2000s and early 2010s saw the launch of YouTube, Smartphones, iPad, Instagram,
Snapchat, Internet of Things, and Real-Time Bidding that later changed the advertising
In-app and mobile
The digital advertising agency is still growing, and everyday finds its application in a new
Types of Digital Advertising
The digital media landscape is vast. It caters to every activity of the user on a digital device
and the web. Digital advertising makes sure to tap the user‘s attention on every touchpoint it
can. Thus, it can be divided into different types depending upon the type of ad, channel, and
the ad‘s intent.
Also called search engine marketing, this type of advertising uses search engine results to
promote the offering. Advertisers target keywords that people search for on a search engine
and push their webpages at the top of the results by paying such search engines like Google,
Such ad campaigns are intent oriented and are also called pay per click or PPC campaigns as
they‘re paid for by the advertiser only when a user clicks on the result.
These ads are further divided into search ads, shopping ads, maps ads, etc. depending upon
the search engine they‘re listed on and the intent behind the target user‘s search.
Display advertising is the most common form of digital advertising. It comprises images,
text, and animation, and shows up as banners on websites and blogs.
These ads can be personalised according to the user activity on the internet or non-
personalised and are usually released to increase brand exposure, offer exposure, and fulfil
other such motives.
Usually, display ads are further categorised as:
Traditional Display Ads: These ads have fixed sizes and occupy a fixed space
irrespective of the device the website is loaded on.
Responsive Display Ads: These ads adapt to the size of the screen a website is
loaded on.
Native ads are camouflaged ads that blend with the content they are added to. Sponsored
listings within or after blog posts are an example of native ads. These ads result in better
user interaction than other forms of digital ads because they match the structure and
function of the platform they appear on.
Video ads are digital ads used to promote an offering using videos or motion graphics that
play before, during or after streaming content, or as a standalone banner or native ads.
Used mostly on video streaming platforms like YouTube, Facebook Watch, etc. video ads are
also used on websites and blogs as out-stream ads to gain website users‘ attention.
Audio ads are used mainly on audio streaming platforms like Spotify, LiveXLive, Pandora,
etc. The advertiser enters into a contract with the streaming platform or the content creator
to add the brand's ads within, before, or after the content.
Mobile ads are digital ads that are delivered on mobile devices. These ads use two different
Mobile Web: These are the websites, blogs, and webpages arranged to fit the mobile
In-App: Mobile applications are specialised applications developed to target
specialised moments. These apps are easy to download, navigate, and involve a higher
retention rate than other channels.
Remarketing ads are digital ads targeted at a brand's online visitors with the aim of
bringing them back to the website or application, or getting them to perform an action. For
example, a brand may target a visitor to its ecommerce store with ads that prompt them to
complete an unfinished transaction. Another example is an event brand that targets its
website visitors with informative ads about new events that they could attend.
Social Media Ads
Social media ads appear on social media platforms like Facebook, Instagram, Snapchat,
Reddit, LinkedIn, etc. These ads can be display ads, native ads, video ads, audio ads,
remarketing ads or mobile ads.
Social ads are hyper-targeted ads that target the users depending upon their demographics,
locations, interests, and even psychographic and behavioural interests.
Influencer And Curator Ads
Influencer ads and curator ads are comparatively new ad forms where a brand directly
contacts the content developers and/or content curators with good followership to place the
brand or offering in their content. This helps the brand to gain exposure and trust of the
How Does Digital Advertising Work?
Even though digital advertising is different from traditional advertising, it works in a
similar way. There are parties involved, a contract backs the transaction, and the
creative and the ad copy are developed to meet the advertising goals.
A typical digital advertisement involves three different parties:
Advertiser: It‘s the brand that creates and funds the advertisement. For example,
Nike with its campaign ‗Just Do It‘.
Advertising Network: It‘s the middleman that connects the advertiser with the
publishers and the advertisement space providers. For example, Google runs
AdWords for advertisers and AdSense for publishers.
Publisher: A publisher is anyone who owns a digital property and is willing to
monetise that property by selling ad spaces. For example, Feedough.com.
Generally, an advertising network acts as a mediator that connects an advertiser with
numerous publishers or digital property owners.
However, in cases of big players like with social media advertisements, the publisher, like
Facebook, LinkedIn, etc., becomes the advertising network itself.
Digital Advertising Vs Traditional Advertising
Traditional advertising is offline advertising. It involves using channels like magazine,
newspaper, television, radio, direct mail and billboards to advertise an offering or an idea.
These ads differ considerably from digital advertising.
Digital Advertising | Traditional Advertising
Digital advertising is the act of calling public attention to an offering or an idea through online and digital paid channels by an identified sponsor. | Traditional advertising is the act of calling public attention to an offering or an idea through offline paid channels by an identified sponsor.
Online and digital channels like websites, search engines and social media. | Offline channels like radio, television, newspaper, etc.
Digital ads development, deployment, and measurement are backed by data. | It isn't possible to get accurate data to back all traditional advertisements.
Digital ads include both one-sided and two-sided communication. The user gets to interact with the advertisement. | There's no way for a user to interact with a traditional advertisement.
Pros And Cons Of Digital Advertising
Digital advertising comes with its own set of advantages and disadvantages that sets it apart
from traditional advertising.
Advantages:
User-Targeted: Digital advertising is highly user-targeted and can even
be microtargeted to the smallest section of the target audience.
Inexpensive: Considering the ROI of digital ads, it is considered an inexpensive form
of advertising.
Data-Backed: Digital ads are data-backed. This data can be used to develop
campaigns that were not possible before.
Interactive: Digital ads can be made interactive, increasing engagement and proving
beneficial for both the advertiser and users.
Real-Time: Changes in digital ads can be made real-time. Even the analytics and data
can be collected in real-time. This proves to be a great advantage.
Global Coverage: It‘s easy to launch digital ads to a worldwide audience without
even going to such places.
Wide Range Of Formats: Digital ads come in numerous shapes and sizes, and the
scope is still untapped.
Disadvantages:
Limited Audience: Only 59 percent of the global population uses the internet. The
rest don‘t have access to it yet. So, if the brand‘s target audience doesn‘t have internet
access, digital advertising could be of no use.
Competition: Many advertisers bid for one ad space that increases the competition
and prices of ads.
Increasing Ad-Blockers: Ads are everywhere. This bugs digital users who look for
ways to block such ads.
Requires A Specialised Skillset: Running digital ads requires a specialised skill set
to develop and optimise ads and bidding for the same.
During the last few decades, with the rise of YouTube, Amazon, Netflix and many other such
web services, recommender systems have taken up more and more space in our lives. From
e-commerce (suggesting to buyers articles that could interest them) to online advertisement
(suggesting to users the right content, matching their preferences), recommender systems are
today unavoidable in our daily online journeys. In a very general way, recommender systems
are algorithms aimed at suggesting relevant items to users (items being movies to watch, text
to read, products to buy or anything else, depending on the industry).
Recommender systems are critical in some industries, as they can generate a huge amount of
income when they are efficient and can also be a way to stand out significantly from
competitors. As proof of the importance of recommender systems, we can mention that, a
few years ago, Netflix organised a challenge (the "Netflix prize") where the goal was to
produce a recommender system that performs better than its own algorithm, with a prize of
1 million dollars to win.
Data required for recommender systems stems from explicit user ratings after watching a
movie or listening to a song, from implicit search engine queries and purchase histories, or
from other knowledge about the users/items themselves. Sites like Spotify, YouTube or
Netflix use that data to suggest playlists, so-called Daily mixes, or to make video
recommendations.
What is Recommender System?
A recommender system or a recommendation system is a subclass of information
filtering system that seeks to predict the "rating" or "preference" that a user would give
to an item.
A software system that provides a single target user within a single context with personalized
recommendations of items such as goods, services or information to guide the target user to
find most relevant items using ratings on a single relevance criterion (i.e., overall) and where
both users and items are in a single domain.
A recommender system is a computer program that uses its recommendations to help users
make informed buying decisions based on their preference, browsing history and their buying
pattern. Online product recommendation (OPR) is a strategy that enables products to be
dynamically populated with customer data such as browsing history and context. This
strategy provides a personalized shopping experience.
A system that aids in decision making by providing users with suggestions. These
suggestions are developed based on past information or domain knowledge. A
computerized system that suggests goods and services by predicting the user's preference and
Methods or Categories of Recommender system
The purpose of a recommender system is to suggest relevant items to users. To achieve this
task, there exist two major categories of methods: collaborative filtering methods and
content based methods. Before digging into the details of particular algorithms, let's
briefly discuss these two main paradigms.
1. Collaborative filtering methods
Collaborative methods for recommender systems are based solely on the past interactions
recorded between users and items in order to produce new recommendations. These
interactions are stored in the so-called "user-item interactions matrix". The main idea
behind collaborative methods is that these past user-item interactions are sufficient to
detect similar users and/or similar items and to make predictions based on these estimated
proximities.
The class of collaborative filtering algorithms is divided into two sub-categories:
a. Memory based approach
b. Model based approach
Memory based approaches work directly with the values of recorded interactions, assuming no
model, and are essentially based on nearest neighbours search (for example, find the users
closest to a user of interest and suggest the most popular items among these neighbours).
Model based approaches assume an underlying "generative" model that explains the user-item
interactions and try to discover it in order to make new predictions.
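The memory based idea (find the nearest users, then suggest what they liked) can be sketched in plain Python. Everything here is an illustrative assumption rather than a method prescribed by these notes: the toy ratings, the choice of cosine similarity as the proximity measure, and the similarity-weighted scoring of the neighbours' items.

```python
from math import sqrt

# Hypothetical user-item interactions matrix: ratings[user][item] = rating.
ratings = {
    "alice": {"m1": 5, "m2": 4, "m3": 1},
    "bob":   {"m1": 5, "m2": 5, "m4": 4},
    "carol": {"m1": 1, "m3": 5, "m5": 4},
    "dave":  {"m1": 4, "m2": 4, "m4": 5, "m5": 2},
}


def cosine_similarity(a, b):
    """Cosine similarity of two rating dicts; unrated items count as 0."""
    dot = sum(a[i] * b[i] for i in set(a) & set(b))
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b)


def nearest_neighbours(user, k):
    """Return the k users most similar to `user`."""
    others = [u for u in ratings if u != user]
    others.sort(
        key=lambda u: cosine_similarity(ratings[user], ratings[u]),
        reverse=True,
    )
    return others[:k]


def recommend(user, k=2, n=3):
    """Rank items the user has not rated by the neighbours' weighted ratings."""
    scores = {}
    for neighbour in nearest_neighbours(user, k):
        sim = cosine_similarity(ratings[user], ratings[neighbour])
        for item, rating in ratings[neighbour].items():
            if item not in ratings[user]:
                scores[item] = scores.get(item, 0.0) + sim * rating
    return sorted(scores, key=scores.get, reverse=True)[:n]


print(nearest_neighbours("alice", 2))
print(recommend("alice"))
```

No model is fitted here: the recommendation is read directly off the stored interactions, which is what distinguishes this from a model based approach.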
The main advantage of collaborative approaches is that they require no information about
users or items, so they can be used in many situations. Moreover, the more users interact
with items, the more accurate new recommendations become: for a fixed set of users and
items, new interactions recorded over time bring new information and make the system more
and more effective.
However, because it only considers past interactions to make recommendations, collaborative
filtering suffers from the "cold start problem": it is impossible to recommend anything to
new users or to recommend a new item to any user, and many users or items have too few
interactions to be handled efficiently. This drawback can be addressed in different ways:
recommending random items to new users or new items to random users (random strategy),
recommending popular items to new users or new items to the most active users (maximum
expectation strategy), recommending a set of various items to new users or a new item to a
set of various users (exploratory strategy) or, finally, using a non-collaborative method
for the early life of the user or the item.
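One of these workarounds, the maximum expectation strategy for new users, can be sketched as a popularity fallback. The interaction log, the min_interactions threshold and the personalised callback are all hypothetical; the callback stands in for whatever collaborative recommender is used once a user has enough history.

```python
from collections import Counter

# Hypothetical interaction log: (user, item) pairs.
interactions = [
    ("alice", "m1"), ("alice", "m2"),
    ("bob", "m1"), ("bob", "m3"),
    ("carol", "m1"), ("carol", "m2"),
]


def most_popular(n):
    """Return the n items with the most recorded interactions."""
    counts = Counter(item for _, item in interactions)
    return [item for item, _ in counts.most_common(n)]


def recommend(user, personalised, n=2, min_interactions=1):
    """Use popularity for cold-start users, the personalised model otherwise."""
    history = {item for u, item in interactions if u == user}
    if len(history) < min_interactions:
        # Cold start: no usable history, so fall back to popular items.
        return most_popular(n)
    return personalised(user, n)


# "zoe" has no history, so she gets the most popular items.
print(recommend("zoe", personalised=lambda user, n: []))
```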
2. Content Based methods
Unlike collaborative methods, which rely only on the user-item interactions, content based
approaches use additional information about users and/or items. If we consider the example
of a movie recommender system, this additional information can be, for example, the age, the
sex, the job or any other personal information for users, as well as the category, the main
actors, the duration or other characteristics for the movies (items).
Then, the idea of content based methods is to try to build a model, based on the available
"features", that explains the observed user-item interactions. Still considering users and
movies, we will try, for example, to model the fact that young women tend to rate some movies
more highly, that young men tend to rate some other movies more highly, and so on. If we
manage to get such a model, then making new predictions for a user is pretty easy: we just
need to look at the profile (age, sex, …) of this user and, based on this information,
determine relevant movies to suggest.
Content based methods suffer far less from the cold start problem than collaborative
approaches: new users or items can be described by their characteristics (content), so
relevant suggestions can be made for these new entities. Only new users or items with
previously unseen features will logically suffer from this drawback, but once the system is
old enough, this has little to no chance of happening.
Later in this post, we will further discuss content based approaches and see that, depending on
our problem, various classification or regression models can be used, ranging from very
simple to much more complex models.
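At the very simple end of that range, the "model" can be nothing more than the average rating per combination of user features and item features. The sketch below assumes made-up observations and uses a group mean as a stand-in for a real classification or regression model; the names fit and rank_genres are illustrative.

```python
from collections import defaultdict

# Hypothetical training data: (user_age_group, user_sex, movie_genre, rating).
observations = [
    ("young", "F", "romance", 5),
    ("young", "F", "romance", 4),
    ("young", "F", "action", 2),
    ("young", "M", "action", 5),
    ("young", "M", "action", 4),
    ("young", "M", "romance", 2),
]


def fit(observations):
    """Learn the mean rating for each (age group, sex, genre) combination."""
    totals = defaultdict(lambda: [0.0, 0])
    for age, sex, genre, rating in observations:
        entry = totals[(age, sex, genre)]
        entry[0] += rating
        entry[1] += 1
    return {key: total / count for key, (total, count) in totals.items()}


def rank_genres(model, age, sex):
    """Order genres by predicted rating for a user profile, best first."""
    scores = {g: r for (a, s, g), r in model.items() if (a, s) == (age, sex)}
    return sorted(scores, key=scores.get, reverse=True)


model = fit(observations)

# A brand-new user needs no interaction history, only a profile.
print(rank_genres(model, "young", "F"))
```

Because the prediction depends only on the profile, a new user can be served immediately, which is why the cold start problem is much milder here.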
Why do we need recommender systems?
Companies using recommender systems focus on increasing sales as a result of very
personalized offers and an enhanced customer experience.
Recommendations typically speed up searches and make it easier for users to access
content they‘re interested in, and surprise them with offers they would have never
What is more, companies are able to gain and retain customers by sending out emails with
links to new offers that meet the recipients' interests, or suggestions of films and TV shows
that suit their profiles.
The user starts to feel known and understood and is more likely to buy additional products or
consume more content. By knowing what a user wants, the company gains competitive
advantage and the threat of losing a customer to a competitor decreases.
Providing that added value to users by including recommendations in systems and products is
appealing. Furthermore, it allows companies to position ahead of their competitors and
eventually increase their earnings.
How does a recommender system work?
Recommender systems function with two kinds of information:
Characteristic information. This is information about items (keywords, categories, etc.) and
users (preferences, profiles, etc.).
User-item interactions. This is information such as ratings, number of purchases, likes, etc.
Based on this, we can distinguish between three types of algorithms used in recommender systems:
Content-based systems, which use characteristic information.
Collaborative filtering systems, which are based on user-item interactions.
Hybrid systems, which combine both types of information with the aim of avoiding problems
that are generated when working with just one kind.
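One simple way to combine both kinds of information is a weighted blend of the scores produced by each system. This is a sketch of one possible hybrid, not the only design: the function name, the alpha weight and the example scores are assumptions made for illustration.

```python
def hybrid_scores(content_scores, collaborative_scores, alpha=0.5):
    """Blend two score dicts; alpha is the weight of the content-based part."""
    items = set(content_scores) | set(collaborative_scores)
    return {
        item: alpha * content_scores.get(item, 0.0)
        + (1 - alpha) * collaborative_scores.get(item, 0.0)
        for item in items
    }


# Hypothetical scores from the two kinds of system for one user.
content = {"m1": 0.9, "m2": 0.2}
collaborative = {"m2": 0.8, "m3": 0.6}

blended = hybrid_scores(content, collaborative, alpha=0.5)
ranking = sorted(blended, key=blended.get, reverse=True)
print(ranking)
```

An item missing from one system simply contributes a score of 0 from that side, so an item known to only one system can still be ranked.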
Next, we will dig a little deeper into content-based and collaborative filtering systems and see
how they differ.
Customer analytics, also called customer data analytics, is the systematic examination of a
company's customer information and customer behaviour to identify, attract and retain the
most profitable customers.
Customer analytics refers to the processes and technologies that give organizations the
customer insight necessary to deliver offers that are anticipated, relevant and timely.
The importance of customer analytics
Customers have access to information anywhere, anytime including where to shop, what to
buy, how much to pay and so on. This makes it increasingly important to utilize predictive
analytics and data to forecast how customers will behave when interacting with brands.
The goal of customer analytics is to create an accurate view of a customer to make decisions
about how best to acquire and retain customers, identify high-value customers and
proactively interact with them. The better the understanding of a customer's buying habits
and lifestyle preferences, the more accurate the predictions of behaviour become and the
better the customer journey becomes. Without large amounts of accurate data, any insight
derived from analysis could be wildly inaccurate.
How to use customer analytics
Customer analytics is often managed by an interdisciplinary group made up of business
owners from different departments within the company, including marketing, sales, customer
service, IT and business analysts.
To be effective and obtain the most meaningful insights, the group must first agree upon
which business metrics they need to achieve a single view of the customer experience.
Multiple instances of customer relationship management (CRM) applications, disparate
enterprise resource planning (ERP) systems and poor customer data integration (CDI) can
leave group members with a fragmented view of the customer.
Customer analytics best practices
By measuring and analyzing data using specific metrics, organizations can create successful
customer interactions. Some customer analytics best practices and common metrics that can
help drive better business decisions include:
Targeting customers across all channels and analyzing the various ways a product or
service can be distributed.
Assessing and understanding customers in relation to the brand and whether a customer is
satisfied. This can be achieved through a combination of quantitative and qualitative
Engaging with customers at the right moment through the right channel.
Predicting churn rate and taking actions to extend a customer's lifetime value.
Spotting trends in big data and analyzing online behaviour to increase sales.
Maximizing the customer journey through personalized selling and market segmentation
by assessing which customers might buy one type of product versus another.
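Two of the metrics named above, churn rate and customer lifetime value, can be illustrated with a short calculation. The formulas are common textbook simplifications and are assumptions here: churn is customers lost divided by customers at the start of the period, and lifetime value assumes a constant margin per period and a constant churn rate. The figures are invented.

```python
def churn_rate(customers_at_start, customers_lost):
    """Fraction of customers lost during one period."""
    return customers_lost / customers_at_start


def simple_lifetime_value(avg_margin_per_period, churn):
    """Margin per period times expected lifetime (1 / churn periods).

    Assumes churn and margin stay constant, which is a simplification.
    """
    return avg_margin_per_period / churn


# Hypothetical figures: 1,000 customers at the start of the month, 50 lost.
churn = churn_rate(customers_at_start=1000, customers_lost=50)
clv = simple_lifetime_value(avg_margin_per_period=20.0, churn=churn)
print(f"churn rate: {churn:.1%}, lifetime value: {clv:.2f}")
```

Under these assumptions, lowering the churn rate raises the lifetime value, which is why predicting churn and acting on it extends a customer's value.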
Customer analytics tools
Customer analytics tools are specialized apps used to gain insight into the customer
experience, understand customer behaviour and to help tailor marketing campaigns to
specific customer segments.
These customer data analysis tools can be part of a CRM suite or sold as stand-alone
platforms that do everything from collecting customer data from different systems in
different locations (data integration) to data analysis and visualization. These tools also
connect to
popular sales and marketing applications along with web content management systems,
email, social platforms and customer loyalty programs.
There are a number of customer analytics tools to choose from, provided by major CRM
vendors and niche software providers. Tools from major vendors in this space include:
Adobe Analytics
Google Analytics 360
IBM Watson Customer Experience Analytics
SAP Hybris Marketing Cloud
SAS Customer Intelligence 360
Some of the tools integrate features such as user segmentation with systems that
personalize websites and build niche marketing campaigns. As more customer analytics
tools emerge, major software providers will likely improve usability further so their tools
appeal to a wide range of users, and they'll add integration and new services. In addition,
advanced features will be built into connected systems, including omnichannel content
Application of Customer Analytics
Although until recently over 90% of retailers had limited visibility on their customers,
increasing investments in loyalty programs, customer tracking solutions and
market research have led this industry to make greater use of customer analytics in decisions
about product, promotion, price and distribution management. The most
obvious use of customer analytics in retail today is the development of personalized
communications and offers and/or different marketing programs by
segment. Additional reasons set forth by Bain & Co. include: prioritizing product
development efforts, designing distribution strategies and determining product
pricing. Demographic, lifestyle, preference, loyalty data, behaviour, shopper value
and predictive behavior data points are key to the success of customer analytics.
Companies can use data about customers to restructure retail management. This
restructuring using data often occurs in dynamic scheduling and worker evaluations.
Through dynamic scheduling, companies optimize staffing through predictive scheduling
software based on predictive customer traffic. Worker schedules can be adjusted in
response to updated forecasts at short notice. Customer analytics allows retail companies
to evaluate workers by comparing daily sales to daily traffic in a store. The use of
customer analytics data to manage retail workers is a phenomenon
known as refractive surveillance. The model of refractive surveillance describes how the
collection of information on one group can affect and allow for the control of an entirely
Criticisms of Use
As retail technologies become more data driven, the use of customer
analytics has raised criticisms, specifically about how it affects the retail worker. Data
driven staffing algorithms can lead to irregular working schedules because they can
change on short notice to adapt to predicted traffic. Data driven assessment of sales can
also be misleading as daily traffic counters do not accurately distinguish between
customers and staff and cannot accurately account for workers’ breaks.
Banks, insurance companies and pension funds make use of customer analytics in
understanding customer lifetime value, identifying below-zero customers which are
estimated to be around 30% of customer base, increasing cross-sales,
managing customer attrition as well as migrating customers to lower cost channels in
a targeted manner.
Municipalities utilize customer analytics in an effort to lure retailers to their cities.
Using psychographic variables, communities can be segmented based on attributes
like personality, values, interests, and lifestyle. Using this information, communities
can approach retailers that match their community’s profile.
Customer relationship management
Analytical Customer Relationship Management, commonly abbreviated as CRM,
enables measurement of and prediction from customer data to provide a 360° view of
Operational analytics is associated with the data that businesses need to improve their
existing operations. Today companies are beginning to view it as a strategic priority; in fact,
more than 80% of respondents in one survey agreed that operations analytics “plays a pivotal
role in driving profits or creating competitive advantage.”
One of the industries taking advantage of the benefits of operational analytics is
manufacturing. Manufacturers especially benefit from focusing on operations
because the insights gleaned can help reduce downtime, improve productivity,
improve forecasting accuracy, maximize capacity, and increase flexibility in
response to external events. But, manufacturing is not the only industry realizing
substantial benefits. Studies show that analytics initiatives aimed at operational
improvements across industries jumped from 26% in 2013 to 70% in 2016.
Benefits of Operational Analytics
1. Increased profits
Most businesses today have the goal of reducing costs. With the help of operational analytics,
you can identify areas that need streamlining, which helps you save money, become more
efficient, and improve profits. A whitepaper by Capgemini found that improvement
in operations using data can help raise profits by up to $117 billion worldwide yearly. That’s a
steep increase compared to customer analytics, which drives only about $38 billion in profits.
The improvement in your bottom line means you are able to scale your business, no matter
what size it may be.
2. Better decision making
Instead of relying on high-end consultancy firms to make big decisions for your organization,
why not let the data do the talking for you? Many companies would prefer to rely on consultants,
but operational analytics is just as good an option, and often more cost-effective.
Data comes through faster, allowing you to make the most
important business decisions quickly. With your business acting on problems quickly, your
bottom line doesn’t suffer from inefficiencies.
3. Competitive advantage
With the help of cognitive computing, companies are able to understand what all their data
means and carry out more efficient processes. This gives you a better advantage among your
competitors. While they are focused on analyzing customer data, you are looking at
operational data so you can save money and reinvest it in more profitable pursuits.
If you have not started yet, now is the time to do so; otherwise you will get left behind. In one survey,
Capgemini Consulting found that 70% of companies have started focusing on operations
processes instead of consumer processes.
4. Customer satisfaction
Although it may sound counterintuitive after the point above, operational analytics can
actually help increase customer satisfaction. Sometimes, it takes weeks or even months
before an organization can figure out exactly what is causing a drop in customer
satisfaction. There are simply too many factors to consider. With operational analytics, you can
get to the root cause of a performance issue and fix it right away.
5. Streamlining data
Place all your security, business, and IT data in one place where you can manage it
efficiently and easily spot system problems in real time. In addition, you can easily create
backup and recovery for all data to ensure its safety despite natural disasters. And with the
ability to share it with your employees, you can gain relevant insights from them too. At this
point, they are not merely employees, but also stakeholders who are empowered to help the
6. Holistic operations
Operations analytics allows you to get a holistic view of your data, where you discover
certain networks that are actually interrelated with each other. When you realize that certain
types of data are dependent on a specific environment, you are able to perform a more
efficient root-cause analysis should a problem arise later on.
7. Better employee engagement
Having access to data insights encourages employees to be more engaged. It promotes
collaboration within the group, and this time, it’s not just the data that’s doing the talking but
the whole organization itself working together for the success of the business.
Since operations analytics is a relatively new concept in the business world, its benefits may
not always be felt immediately. This can happen for a number of reasons, such as siloed data
sets, difficulties accessing third-party data, and lack of strong mandate from organizations
Social Analytics refers to the collection and analysis of statistical and digital data on how users
interface with an organization, particularly online.
Over the last decade, social analytics has become a primary form of business intelligence, used to
identify, predict, and respond to consumer behaviour. Throughout our everyday lives, when
browsing on an online store, using a member card to buy groceries on sale, or sharing special offers
from our favourite coffee shop on our social networks, each of us continually drops pieces of
intelligence. With nearly every click we make, data about our online activity is being collected; it
would be difficult to find a website that didn’t monitor and analyse its usage in some way. Some
websites use only one social analytics tool (e.g. UBC uses Google Analytics), while others use
many more. Indeed, this site is being analysed using Google
Analytics. Social analytics programs enable analysts to glimpse meaningful trends in this mass of
Forms of Social Analytics
There are two forms of social analytics that have applicability to learning technologies: web
analytics and social media analytics.
1. Web Analytics
Website administrators use a social analytics service, such as Google Analytics, in order to
capture and analyse data. Google Analytics, first launched in 2005, is the most
widely used social analytics tool globally. Web analytics are useful for understanding the following:
Site visits and unique site visits (i.e. unique, independent visitors as opposed to one visitor
visiting a site multiple times)
The pages that are the most and the least viewed
Search terms used to find the site
Physical location of site visitors (city/country) and the time of day that most visitors access
The last page site visitors access before leaving
The web browsers and operating systems that visitors use (for instance, the Google
Analytics on this page reveals that our team includes three Windows and one Mac users,
and two Firefox and two Chrome users!)
This information can be used to identify which parts of a website are effectively serving the site
owner’s objectives (“Which links are directing lots of traffic to my site? Should I deepen my
relationship with that organization?”), and which are detracting from those objectives (“Why do
people leave my site from that page more than from any others?”).
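As a rough illustration of how such web analytics measures could be derived, the following Python sketch summarizes a small page-view log. The log format, the visitor IDs, the page names, and the `summarize` function are all hypothetical and exist only for this example; a real service such as Google Analytics collects and processes its data in its own way.

```python
from collections import Counter

# Hypothetical page-view log in time order: (visitor_id, page).
page_views = [
    ("v1", "/home"), ("v1", "/pricing"),
    ("v2", "/home"),
    ("v3", "/blog"), ("v3", "/home"), ("v3", "/pricing"),
    ("v1", "/contact"),
]

def summarize(views):
    """Compute a few basic web analytics measures from a page-view log."""
    # How often each page was viewed (most and least viewed pages).
    views_per_page = Counter(page for _, page in views)
    # Unique visitors: each visitor counted once, however many pages they viewed.
    unique_visitors = len({visitor for visitor, _ in views})
    # Exit page: the last page each visitor accessed before leaving.
    last_page = {}
    for visitor, page in views:
        last_page[visitor] = page  # later views overwrite earlier ones
    exit_pages = Counter(last_page.values())
    return {
        "page_views": len(views),
        "unique_visitors": unique_visitors,
        "views_per_page": views_per_page,
        "exit_pages": exit_pages,
    }

summary = summarize(page_views)
print(summary["page_views"], "page views from",
      summary["unique_visitors"], "unique visitors")
print("Most viewed:", summary["views_per_page"].most_common(1))
print("Exit pages:", dict(summary["exit_pages"]))
```

A page that appears often among the exit pages is a candidate for the question of why people leave the site from that page.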
2. Social Media Analytics
In the last two to three years, we have witnessed the emergence of more sophisticated social
analytics tools that measure an organization’s 'influence' over social media. These analytics
perform this task by collecting and analysing data related to a given organization across
various social media sites (“Do people tweet favourably about my company? Do they tweet
about it at all?” “What are the major trends in my field that we can see over social media?
How can I capitalize on them?”). This data can help to provide useful demographic
information on who an organization's audience is. Consider, for instance, the following
graphic developed by the company 'Viral heat', which compares television ratings with the
number of mentions a show gets over social media. How do you think the intelligence
gathered by social media might be useful to the producers and advertisers?
Social media analytics helps organizations to identify which social media tools and strategies
are measurably benefiting their objectives -- and which have a neutral effect or may even be
hindering those objectives. This data helps organizations measure the return on investment
(ROI) of their social media strategies, and to continually plan how to best use social media to
The Limitations of Social Analytics
Case study: Klout
While social analytics usefully helps to identify broad trends in quantitative, digital data - and
may be used to successfully predict individual consumer behaviour - like all statistics, it
provides neither understandings of content (i.e. the content of a tweet), nor interpretations or
explanations of behaviour. As with all statistics programs, social analytics requires skillful
analysts to statistically test findings to determine their significance, and to offer meaningful
interpretations and explanations of them. The ongoing controversy surrounding the social
analytics service Klout illustrates some of the limitations of social analytics. It also raises
questions about the proper - and improper - ethical and scientific use of statistical social
Klout is a social analytics service that sets out to measure an individual’s “influence based on
your ability to drive action” (www.Klout.com); this metric is based entirely on that person’s
presence on social media websites. It assigns individuals a numerical influence score, visible
to all members of Klout. Some report that Klout scores have factored into hiring decisions.
Unlike many social media websites that one needs to actively sign on to, Klout created
profiles for people who happen to be connected to Klout members on other social networks
(e.g. a Klout member’s Facebook friends). This included minors. Klout’s creation of profiles
for people, especially minors, without their knowledge initiated a broad controversy
about online privacy and ethics; Klout has since ceased to create profiles for people without
their knowledge. Online discussions concerning Klout have also foregrounded that statistical
analyses and algorithms cannot measure intangible, abstract qualities, such as “influence.”
Compliance Analytics
Many organizations obtain large amounts of data, which creates management
challenges. Although a company can use big data to gain a competitive advantage in
its industry, increased regulation makes analytics on that data a challenge. This is
particularly important when compliance with various data protection regulations is needed.
The rise in data insecurity has also led to increased regulation of big data to ensure its
security. If a company manages big data, it should build a system to ensure that the data is
interpreted correctly and to seal all the gaps that could lead to breaches.
The data stored in your organization may include data that you have received from your
customers or collected from public sources. To meet the requirements of various
regulatory bodies, you will be expected to provide a detailed report stating the type of data
you collect, how you use it, how you share it with suppliers, and the security measures in place
to avoid data breaches.
Big Data and Compliance
Big data affects the compliance process directly because you will be expected to account for
its flow inside your organization. The regulatory bodies are keen to examine every stage of
data handling, including the collection, processing, and storage of data. The primary reason
for the comprehensive evaluation is to make sure that the data is safe from cyberattacks.
To achieve compliance status, you must build security measures to protect the data. During
the analysis, you are expected to show how each of the techniques for risk mitigation works
and how effective it is. This thorough report on the data protection programs will
make the organization’s certification easier.
Unlike organizations dealing with small data, organizations handling big data will need
sophisticated analytics tools. You must also employ qualified professionals to analyze the data,
identify security threats, and suggest strategies for mitigating them. During the enforcement process,
managing big data takes more resources than handling small data.
However, organizations can benefit from managing big data, which helps to provide direct
predictive information about the probability of an attack. In auditing, the auditors are likely to
adopt more rigorous steps when using this type of data than when using small data. As such,
the use of big data analytics is one of the surest ways of building some of the organization’s
most robust security systems.
How Big Data is used in the Process of Compliance
Big data assists the creation of a comprehensive risk assessment framework in the following ways:
Fraudulent Crime Prevention: The use of big data strengthens the approach to predictive
analysis, which is an effective way of detecting criminal activities such as money laundering.
If a compliance officer uses big data for internal audits, cyber risks are discovered, and they
intervene to avoid their occurrence. It speeds up the process of compliance and builds trust
among your clients.
Managing Third Parties Threat: If you are in the process of obtaining compliance
certifications, you must appropriately manage the risk associated with sharing data with vendors.
Big data analytics can help you manage vendor-related risks. You can
accomplish this by carefully evaluating their ability to protect your data before sharing it with them.
Helps in Customer Service: You are required to prove that your customers are pleased with
how you treat their data before you get any compliance certification. If you apply big data
analytics, you will understand your customers’ behavior, which will directly influence the
decision-making process, thereby enabling the compliance process.
What is data compliance?
In a broad sense, data compliance is following the law of the land. It requires implementing
policies, procedures, workflows and operations to ensure legal obligations are met. Data is a
fluid entity, and compliance requires constant time and attention.
Data compliance regulations can differ from country to country or continent to continent. For
instance, U.S. companies operating globally must be cognizant of the EU's GDPR.
Why Data Compliance
All data within an organization must meet data compliance standards. It's important
employees work for a data-compliant company with a strong set of ethics for a number of
Data compliance signals to customers that their personal information will remain secure,
thereby building brand loyalty.
It helps avoid noncompliance and reduces the chance of bad publicity, which can damage
When a corporate code of ethics includes data compliance, it helps attract high-caliber
Data compliance enhances the bottom line and is a best practice for any data-driven
Perception is reality when it comes to protecting data
Strong brand perception is a valuable asset. Privacy policies can help strengthen a company's
image and can be a valuable marketing tool. On the flip side, data breaches are poison for
social media platforms and can lead to lawsuits. In recent years, Facebook's brand has been
tarnished by data privacy issues, which make some users hesitant to try the platform's new
Data analytics can be a positive business attribute when adhering to compliance parameters.
Many companies mine data -- often via mobile apps used to establish user profiles, which
then offer users customized deals.
This practice, however, comes with a mandate to use the information responsibly. Dynamic
Yield, a startup that provides retailers with algorithmically driven "decision logic"
technology, was purchased by McDonald's in 2019 to help companies mine data. As a result,
drive-thru customers are recommended items based on their app purchasing history. The
technology helps increase business and brand popularity.
When managing data compliance regulations, it's important to establish an internal data ethics
framework. The process should encompass regulatory obligations, while also creating a
balance between the commercial and ethical value of the company's data. When data
compliance results in lower profits, an ethics framework helps organizations avoid sacrificing
compliance for profit.
Data regulations vary across countries and industries. Some IT professionals worry that if, or
when, universal data compliance policies are established, they will give hackers a clearer
roadmap to follow. That's why companies always need to be diligent about protecting data --
and a reason why following basic regulations isn't enough.
One size does not fit all
Data compliance is not the same across the board. Variations in data retention under HIPAA,
for example, make transference of health records more difficult. In some states, medical
records are owned by the healthcare provider. Minimum retention periods can vary from
seven to 10 years. HIPAA Journal also noted that the most common violations include "the
failure to perform an organization-wide risk analysis to identify risks to the confidentiality,
integrity and availability of protected health information … [and] delayed breach
In another example, CCPA, modeled after GDPR, was implemented to enhance privacy
rights and consumer protection for residents of California. It gives consumers more control
over how their data can be shared or managed. Social media users in California who want
their profile deleted from a platform and all related ecosystems can have the task performed
immediately -- not a month later as is standard for some social media platforms. Users can
also demand a report on how their data is shared with other digital platforms and request that
sharing be stopped in many cases.
Consumer awareness surrounding data can also prompt companies to rethink or modify how
they go to market.
The awareness factor
A lack of awareness of what constitutes adequate data compliance is a real concern. Some
smaller entities have limited knowledge about what it means to follow guidelines -- and some
of these firms may decide to risk bad publicity to avoid the added security expenses of
preparing for a data breach.
Taking a customer-first approach by being upfront when there is a breach -- not weeks or
months later as has been true with some high-profile incidents -- may keep some of those
customers from turning away when they see efforts are being made to fix the problem.
Sharing personal data online requires a leap of faith and means the end user is saying, "We
trust you to use this information ethically and to protect it from bad actors." Businesses
should not misuse the trust placed in them to be data-compliant. They must establish
guidelines and ethics that go above and beyond any current regulations.
Fraud analytics is the use of big data analysis techniques to prevent online financial fraud. It
can help financial organizations predict future fraudulent behavior, and help them apply fast
detection and mitigation of fraudulent activity in real time.
More people are using online banking or managing their finances online every year. In 2020,
the worldwide lockdown due to COVID-19 convinced even more customers to use online
banking for at least a portion of their financial activities. Online fraud, already increasing
year over year, has followed suit. Account takeover (ATO), a particularly popular form of
financial fraud, jumped over 280 percent between Q2 2019 and Q2 2020. Financial
institutions must, more than ever, apply comprehensive fraud management measures to
protect their customers’ accounts.
Types of Online Financial crimes
Account Takeover: ATO is when a fraudster uses stolen credentials to access an existing
online account, for example at a bank or merchant.
SIM Swapping: This is a form of ATO where the fraudster uses a victim’s personal
information, stolen from a data breach or gleaned from other information sources such as
social media, to convince the mobile company to port the victim’s phone number to the
fraudster’s mobile phone.
Phishing: A phishing attack is when the fraudster impersonates a legitimate website in an
email or text to get the victim ultimately to divulge personal information or transfer funds.
Malware: Fraudsters use various methods, phishing for example, to trick the victim into
loading malicious software onto their device to log keystrokes, corrupt data, or render the
device unusable unless a ransom is paid.
Card Not Present (CNP): In CNP fraud the fraudster uses a stolen credit card account to
make a transaction where the physical card is not required, for example an online purchase
from an ecommerce site. As with other forms of fraud, credit card fraud is on the rise.
Man in the Middle Attack: A MitM attack occurs when a fraudster intercepts communication
between an online service and the customer for the purpose of stealing information or
hijacking the online session.
The bad news is that online fraud is constantly evolving. As banks put remediation measures
in place, new threats appear. Traditional, static rules-based fraud prevention systems can’t
keep pace. The good news is that there is a wealth of data available to financial organizations
that can be used to predict and detect financial fraud and adapt to new threats.
Collecting a username and password at login is no longer sufficient to guard against
fraudulent activity. When someone accesses, or attempts to access, an account there is other
data that can be used to determine whether or not this is a legitimate customer and whether or
not the transaction requested is legitimate. This includes data like:
What device are they using?
Has this device been previously registered with the bank?
Can they verify their identity with a fingerprint?
Does the transaction being requested fit their historical patterns?
In an authentication sense, this data can be broken out into four categories:
Knowledge: something the user knows, e.g. their password, social security number, etc.
Possession: something the user has, e.g. their mobile phone, etc.
Inherence: something the user is, e.g. their fingerprint, palm print, etc.
Behavioral: something the user does or is doing, e.g. their requested transaction
Answering all these questions requires accessing and analyzing big data. It would be
impossible for fraud analysts or data scientists to process such requests manually. One thing
banks absolutely don’t want to do is add any unwarranted friction into the customer session.
Traditionally, banks had in place a set of rules that would examine requests and offer a go/no-go
decision. These rules-based anti-fraud systems keep expanding their rule sets and become
extremely complex, yet don’t adapt to hidden or unknown threats. They typically result in too
many false positives (blocking legitimate transactions) and missed fraudulent
transactions. On the other hand, machine learning (ML) provides the ability to collect
massive amounts of disparate data, analyze that data at scale and in context, and assign a risk
score in real-time. This enables a risk-based fraud analytics solution to apply the precise level
of security, at the right time, through step-up authentication.
Machine learning models for fraud detection and prevention
Fraud analytics applies machine learning techniques to financial data. Machine learning is a
subset of Artificial Intelligence (AI). Where AI is the computer implementation of a human-
like thought or decision-making process, machine learning uses mathematical algorithmic
techniques to extract complex relationships within the data being analyzed. Fraud analytics
uses machine learning to examine all the pertinent data regarding a transaction and assigns a
risk score to the transaction. Based on the risk score it makes a recommendation to allow the
transaction, block the transaction, or ask for step-up authentication before allowing the
transaction. And this can all be done in real time with or without human intervention,
providing the financial institution with enhanced fraud prevention without causing undue
friction in the customer session. Every transaction, from login to logout, can be examined for
potential fraud risk.
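The allow, block, or step-up decision described above can be sketched in Python as follows. This is only a minimal illustration: the signal names, the point weights, and the thresholds are all assumptions made for the example, and a hand-weighted sum stands in for the score that a trained machine learning model would produce in a real fraud analytics system.

```python
from dataclasses import dataclass

@dataclass
class SessionSignals:
    """Hypothetical signals gathered when a transaction is requested."""
    device_registered: bool   # possession: is this a known device?
    biometric_verified: bool  # inherence: did a fingerprint check pass?
    fits_history: bool        # behavioral: does the request fit past patterns?

def risk_score(signals: SessionSignals) -> int:
    """Toy risk score from 0 to 100; the weights are assumptions, not learned."""
    score = 0
    if not signals.device_registered:
        score += 40
    if not signals.biometric_verified:
        score += 20
    if not signals.fits_history:
        score += 40
    return score

def decide(score: int, step_up_at: int = 30, block_at: int = 80) -> str:
    """Map a risk score to a recommendation; the thresholds are hypothetical."""
    if score >= block_at:
        return "block"
    if score >= step_up_at:
        return "step-up"  # ask for additional authentication first
    return "allow"

trusted = SessionSignals(True, True, True)
new_device = SessionSignals(False, True, True)
suspicious = SessionSignals(False, False, False)
for signals in (trusted, new_device, suspicious):
    print(signals, "->", decide(risk_score(signals)))
```

Keeping the scoring and the decision in separate functions means the thresholds can be tuned, for example to reduce friction for low-risk sessions, without touching how the score is computed.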
A machine learning system can be supervised or unsupervised. Unsupervised machine
learning models analyze unlabeled data to identify anomalies between what is usual and what
is unusual. The model can then detect otherwise hidden relationships in the data to infer a
function or instruction set that describes the underlying structure and dimensions of the data.
This function or instruction set can then be applied to new and unseen data to continue the
That’s good. But a supervised model is better. With supervised machine learning, the model
is trained using labelled data (fraud data and other data) and predicts the likelihood of
fraud. You train a supervised model by presenting it with both fraudulent and legitimate
events and running it to develop an instruction set or algorithm that is applied to further
examples. The trained model can then identify unknown as well as known patterns to produce
an accurate risk score for a requested transaction.
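To make the idea of learning what is usual from unlabeled data and then applying it to new, unseen data more concrete, here is a small Python sketch. It uses a simple z-score rule as a stand-in for an unsupervised model; the transaction amounts, the function names, and the threshold of 3 standard deviations are assumptions chosen for illustration, and production systems use far richer features and models.

```python
from statistics import mean, stdev

def fit_profile(amounts):
    """Learn what is 'usual' from unlabeled historical transaction amounts."""
    return mean(amounts), stdev(amounts)

def is_anomalous(amount, profile, threshold=3.0):
    """Flag an amount that lies more than `threshold` standard deviations from the mean."""
    average, spread = profile
    return abs(amount - average) / spread > threshold

# Hypothetical past transaction amounts for one customer (no fraud labels).
history = [20, 25, 22, 30, 24, 21, 27, 23]
profile = fit_profile(history)

# Apply the learned profile to new, unseen transactions.
new_amounts = [26, 950, 19]
flagged = [amount for amount in new_amounts if is_anomalous(amount, profile)]
print("Flagged as unusual:", flagged)
```

The profile is fitted on historical data only, so a single extreme new value cannot distort the mean and standard deviation it is compared against.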
Big data analytics techniques to combat financial fraud
Data science is part of the solution. Financial Institutions collect huge amounts of behavioral,
device and transactional data. Analysis of this data by the fraud detection system and/or fraud
investigations team can be used in the prevention and detection of financial fraud. But the
analysis will only be as good as the data in the data set. With good data, there are a number of
big data analysis techniques that a machine learning-based fraud analytics system can use to
combat financial fraud.
Predictive analytics looks at patterns to make predictions on future, heretofore unknown
events to understand the potential or propensity for fraud.
Pattern recognition and anomaly detection identifies events that don’t conform to expected
patterns. Machine learning algorithms can learn from the data and make predictions on future
Visual analytics tools include digital channel unification which automatically aggregates and
monitors transactions for suspicious activity, web-based case management for fraud analysts
to review fraud cases and analyze key fraud indicators, and fraud visualization tools to
quickly identify the source of potentially fraudulent transactions.
Forensic analysis, the examination of the causes and consequences of a financial fraud event,
can benefit from visual analytics data which provides data on the users, devices, locations, IP
addresses and relationships associated with a fraud case. Analysis of the data and
relationships can identify potentially fraudulent behavior and expose cooperation between
Deploying a fraud analytics solution
Tier 2 and tier 3 financial institutions typically do not have large fraud teams or deep
resources to devote to fraud prevention. But they need comprehensive fraud prevention
because these institutions experience the same fraud use cases and fraud scenarios as
institutions with global operations. So it is imperative to choose wisely when selecting a
fraud prevention solution. One key consideration is making sure your
preferred vendor has deep experience in fraud prevention within the banking industry.
Evaluate the solution for security and controls, scalability, and infrastructure capability. It
should use the latest advanced analytics. In addition, a risk analytics-based fraud prevention
solution should cover all of your transaction scenarios and support historical data migration.
For the tier 2 and tier 3 financial institutions with smaller, often overworked IT teams,
solution providers like OneSpan have professional services teams that can help design,
implement and manage a comprehensive fraud analytics solution that works for your
A final word on prevention of financial fraud using fraud analytics
Fraud will continue to grow, whether that is financial fraud, insurance fraud, or even fraud in
the healthcare industry. This has the potential to significantly disrupt the customer
relationship and customer loyalty. The challenge for financial institutions is to deploy
comprehensive fraud protection to mitigate attacks without injecting needless friction into the
customer experience. Fraud analytics solutions based on machine learning techniques have
been effective in meeting both of these objectives.
Many a compliance officer has said one of the biggest challenges they face is that they don’t
know what they don’t know, a fear traditionally heightened by not having enough visibility
into the overall operations of the business. But in a digital age, most of the answers are there,
buried in an ocean of data, waiting to be discovered.
Once unearthed, that data, the holy grail of compliance, must be deciphered if it is to
unlock any true value. That, in essence, is compliance analytics: the process of gathering
all the data the company holds (and even data that it does not hold) and analyzing it using
statistical algorithms to mine for patterns and anomalies to uncover things like fraud, policy
violations, and other misconduct.
Data Searching Algorithms in Search Engines
Searching Algorithms are designed to check for an element or retrieve an element from any
data structure where it is stored. Based on the type of search operation, these algorithms are
generally classified into two categories:
1. Sequential Search: In this, the list or array is traversed sequentially and every element is
checked. For example: Linear Search.
2. Interval Search: These algorithms are specifically designed for searching in sorted data
structures. These types of searching algorithms are much more efficient than Linear Search as
they repeatedly target the center of the search structure and divide the search space in half. For
example: Binary Search.
Linear Search to find the element “20” in a given list of numbers
Binary Search to find the element “23” in a given list of numbers
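The two categories can be illustrated with a short Python sketch. The example lists below are assumptions chosen to contain the elements 20 and 23 mentioned above; they are not taken from the original figures. Both functions return the index of the target, or -1 if it is not present.

```python
def linear_search(items, target):
    """Sequential search: check every element in order until the target is found."""
    for index, value in enumerate(items):
        if value == target:
            return index
    return -1

def binary_search(sorted_items, target):
    """Interval search: repeatedly halve the search space of a sorted list."""
    low, high = 0, len(sorted_items) - 1
    while low <= high:
        mid = (low + high) // 2
        if sorted_items[mid] == target:
            return mid
        if sorted_items[mid] < target:
            low = mid + 1   # target can only be in the right half
        else:
            high = mid - 1  # target can only be in the left half
    return -1

# Assumed example data; linear search works on unsorted input.
numbers = [10, 50, 30, 70, 80, 60, 20, 90, 40]
print(linear_search(numbers, 20))        # index 6

# Binary search requires the list to be sorted.
sorted_numbers = [2, 5, 8, 12, 16, 23, 38, 56, 72, 91]
print(binary_search(sorted_numbers, 23))  # index 5
```

Linear search may examine every element, while binary search on a sorted list of n elements needs only about log2(n) comparisons, which is why the interval approach is much more efficient on large sorted data.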