FDA_SAKEC2018.pptx

Statistics: Unlocking the Power of Data Lock5
Financial Data Analytics
Dr. M.Vijayalakshmi, VESIT
4 January 2018, SAKEC Mumbai

Financial Data
The financial industry has always been driven by data.
Today, Big Data is prevalent at various levels of this field, ranging from
the financial services sector to capital markets.
The availability of Big Data in this domain has opened up new avenues
for innovation and has offered immense opportunities for growth and
sustainability.
At the same time, it has presented several new challenges that must be
overcome to gain the maximum value out of it.

Financial Data Analytics in a Nutshell

Motivation
There has been an explosion in the velocity, variety and volume of financial
data. Social media activity, mobile interactions, server logs, real-time market
feeds, customer service records, transaction details, information from existing
databases – there’s no end to the flood.
To make sense of these giant data sets, companies are increasingly turning to
data scientists for answers. These numbers gurus are:
 Capturing and analyzing new sources of data, building predictive models and running
live simulations of market events
 Using technologies such as Hadoop, NoSQL and Storm to tap into non-traditional data
sets (e.g., geolocation, sentiment data) and integrate them with more traditional
numbers (e.g., trade data)
 Finding and storing increasingly diverse data in its raw form for future analysis
They’ve been aided in this quest by the development of cloud-based data
storage and the surge of sophisticated (and sometimes free or open-source)
analytics tools.

Important Applications of Financial Data Analytics
1. Predictive Analytics / Trading
2. Sentiment Analysis
3. Financial Fraud
4. Credit Scoring and Ratings
5. Pricing
6. Customer Segmentation
7. Know Your Customer

Sentiment Analysis
Sentiment analysis (aka opinion mining) applies natural-language
processing, text analysis and computational linguistics to source material
to discover what folks really think.
Several big businesses, such as MarketPsy Capital, Think Big Analytics and
MarketPsych Data, are using it to:
Build algorithms around market sentiment data (e.g., Twitter feeds) that
can short the market when disasters (e.g., storms, terrorist attacks) occur
Track trends, monitor the launch of new products, respond to issues and
improve overall brand perception
Analyze unstructured voice recordings from call centers and recommend
ways to reduce customer churn, up-sell and cross-sell products and detect
fraud
Some data companies are even acting as intermediaries, collecting and
selling sentiment indicators to retail investors.

Automated Credit Risk Management
Internet finance companies are finding new ways to automate loan approval and risk management.
Aliloan (from Alibaba) is an automated online system that provides flexible
micro-loans to entrepreneurial online vendors.
To gauge whether a vendor is creditworthy, Alibaba collects data from its e-commerce
and payment platforms and analyzes transaction records, customer
ratings, shipping records and a host of other information.
These findings are confirmed by third-party verification and cross-checked
against external data sets (e.g., customs, tax data, electricity records, etc.).
Once the loan is granted, Alibaba continues to monitor the use of funds and
assess the business’s strategic development.
Entrepreneurs in emerging markets are also reaping the benefits. Like Aliloan,
companies such as Kreditech and Lenddo provide automated small loans based
on innovative credit scoring techniques. In these cases, much of the score is
calculated from applicants’ online social networking data.

Real Time Analytics
In days of yore, financial institutions were hampered by the lag-time between data
collection and data analysis. Real-time analytics short-circuits this problem and provides
the industry with new ways to:
Fight Financial Fraud: Banks and credit card companies routinely analyze account
balances, spending patterns, credit history, employment details, location and a load of
other data points to determine whether transactions are above board. If suspicious
activity is detected, they can immediately suspend the account and alert the owner.
Improve Credit Ratings: A continuous feed of online data means credit ratings can
be updated in real time. This provides lenders with a more accurate picture of a
customer’s assets, business operations and transaction history.
Provide More Accurate Pricing: Progressive Insurance already tailors its policies to
account for a customer’s changing financial situation. With the Internet of Things, data
from automobile sensors will also help insurance companies issue warnings to their
policyholders about accidents, traffic jams and weather conditions. That makes for
safer drivers and fewer payouts.

Customer Segmentation
Like every other industry on the planet, banks and financial
institutions are hungry to know more about the people using their
products and services. And though they already store a ton of data
– from credit scores to day-to-day transactions – they’re not too
proud to look for it elsewhere.
 This kind of customer segmentation allows them to:
 Offer customized product offerings and services
 Improve existing profitable relationships and avoid customer churn
 Create better marketing campaigns and more attractive product offerings
 Tailor product development to specific customer segments

Predictive Analytics
By combining segmentation with predictive analytics, companies can also cut down on
risk. For example, to decide whether certain customers are likely to pay off their credit
cards, some major banks use technology developed by the company Sqrrl. This analysis
takes into account the demographic characteristics of customers’ neighborhoods and
makes calculated predictions.
Similar strides have been made in forecasting market behavior. Around 2009,
high-frequency trading (HFT), the speedy exchange of securities, was hugely
lucrative. With competition came a drop in profits and the need for a new strategy.
HFT traders adapted by employing strategic sequential trading, using big data analytics
to identify specific market participants and anticipate their future actions. In a field of
breakneck speed, this gives HFT traders an unmistakable advantage.
Researchers studying search volume data provided by Google Trends have also identified
online precursors of stock market moves. Their results suggest that increases in search
volume for financially relevant search terms usually precede big losses in financial
markets.

Analytics of Financial Time Series
The vast majority of financial data takes the form of time series
 Stock prices (ticker data)
 Asset prices
 Customer numbers
 etc.
So financial data analytics places a lot of importance on financial time
series analytics

Examples of financial time series
Daily log returns of Apple stock: 2007 to 2016 (10 years)
BSE index
Quarterly earnings of Coca-Cola Company: 1983-2009 (a seasonal time series).
Seasonal time series are useful in
 earnings forecasts
 pricing weather-related derivatives (e.g., energy)
 modeling intraday behavior of asset returns
Exchange rate between the US dollar and the Indian rupee
Sizes of insurance claims
High-frequency financial data: tick-by-tick stock data, etc.
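The first example above uses daily log returns rather than raw prices. Below is a minimal sketch of how they are computed. The prices are made-up numbers, not actual Apple data, and `log_returns` is a hypothetical helper. The log return on day t is $r_t = \ln(P_t / P_{t-1})$. One convenient property is that log returns add up over time.

```python
import math


def log_returns(prices):
    """Return r_t = ln(P_t / P_{t-1}) for consecutive closing prices."""
    if any(p <= 0 for p in prices):
        raise ValueError("prices must be strictly positive")
    return [math.log(curr / prev) for prev, curr in zip(prices, prices[1:])]


# Hypothetical closing prices (illustrative only).
closes = [100.0, 102.0, 101.0, 105.0]
rets = log_returns(closes)

# Multi-period log return = sum of daily log returns = ln(P_last / P_first).
total = sum(rets)
print([round(r, 4) for r in rets], round(total, 4))
```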

Mining Time-Series Data
A time series is a sequence of data points, measured typically at
successive times, spaced at (often uniform) time intervals
Time series analysis: A subfield of statistics, comprises methods that
attempt to understand such time series, often either to understand the
underlying context of the data points or to make forecasts (or
predictions)
Methods for time series analyses
 Frequency-domain methods: Model-free analyses, well-suited to
exploratory investigations
 spectral analysis and wavelet analysis
 Time-domain methods: Auto-correlation and cross-correlation
analysis
 Motif-based time-series analysis
Applications
 Financial: stock price, inflation
 Industry: power consumption
 Scientific: experiment results
 Meteorological: precipitation

Time-Series Data Analysis: Prediction & Regression Analysis
(Numerical) prediction is similar to classification
 construct a model
 use model to predict continuous or ordered value for a given input
Prediction is different from classification
 Classification predicts categorical class labels
 Prediction models continuous-valued functions
Major method for prediction: regression
 model the relationship between one or more independent or
predictor variables and a dependent or response variable
Regression analysis
 Linear and multiple regression
 Non-linear regression
 Other regression methods: generalized linear model, Poisson
regression, log-linear models, regression trees

What is Regression?
Modeling the relationship between one response variable and one or
more predictor variables
Analyzing the confidence of the model
E.g., height vs. weight

Regression Yields Analytical Model
Discrete data points →Analytical model
 General relationship
 Easy calculation
 Further analysis
Application - Prediction

Application - Detrending
Obtain the trend for irregular data series
Subtract trend
Reveal oscillations

Linear Regression - Single Predictor
Model is linear
y = w0 + w1 x
where w0 (y-intercept) and w1
(slope) are regression coefficients
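Method of least squares (y: response variable, x: predictor variable):
$$w_1 = \frac{\sum_{i=1}^{|D|} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{|D|} (x_i - \bar{x})^2}, \qquad w_0 = \bar{y} - w_1 \bar{x}$$
where $\bar{x}$ and $\bar{y}$ are the means of the x and y values in the training data D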
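The two least-squares formulas above translate directly into code. Below is a minimal pure-Python sketch with hypothetical data and a hypothetical function name. Subtracting the fitted line from the data is also one simple way to carry out the detrending step described earlier.

```python
def least_squares_line(xs, ys):
    """Return (w0, w1) minimizing sum((y_i - (w0 + w1*x_i))**2)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two (x, y) pairs")
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are equal, so the slope is undefined")
    # w1 = sum((x_i - x_bar)(y_i - y_bar)) / sum((x_i - x_bar)^2)
    w1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    w0 = y_bar - w1 * x_bar  # the fitted line passes through (x_bar, y_bar)
    return w0, w1


# Hypothetical observations scattered around a rising line.
xs = [0, 1, 2, 3, 4]
ys = [1.1, 2.9, 5.2, 6.8, 9.0]
w0, w1 = least_squares_line(xs, ys)

# Detrending: subtract the fitted trend, leaving the oscillations around it.
residuals = [y - (w0 + w1 * x) for x, y in zip(xs, ys)]
print(round(w0, 3), round(w1, 3), [round(e, 3) for e in residuals])
```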



Linear Regression – Multiple Predictor
Training data is of the form (X1, y1), (X2, y2), …, (X|D|, y|D|)
E.g., for 2-D data: y = w0 + w1 x1 + w2 x2
Solvable by
 Extension of the least-squares method: $(X^T X)\,W = X^T Y \;\Rightarrow\; W = (X^T X)^{-1} X^T Y$
 Commercial software (SAS, S-Plus)
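Below is a minimal sketch of the normal-equation approach, assuming a small, well-conditioned problem. The function names and data are hypothetical. The design matrix gets a leading column of ones so that W also contains the intercept w0.

```python
def solve(a, b):
    """Solve a @ w = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("X^T X is singular; predictors are collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        w[r] = (m[r][n] - sum(m[r][c] * w[c] for c in range(r + 1, n))) / m[r][r]
    return w


def fit_multiple(rows, ys):
    """Least-squares W for y = w0 + w1*x1 + w2*x2 + ... via (X^T X) W = X^T Y."""
    X = [[1.0] + list(r) for r in rows]  # leading 1 gives the intercept w0
    p = len(X[0])
    xtx = [[sum(x[i] * x[j] for x in X) for j in range(p)] for i in range(p)]
    xty = [sum(x[i] * y for x, y in zip(X, ys)) for i in range(p)]
    return solve(xtx, xty)


# Hypothetical 2-D data generated exactly from y = 1 + 2*x1 - 3*x2.
rows = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1)]
ys = [1, 3, -2, 0, 2]
print([round(w, 6) for w in fit_multiple(rows, ys)])
```

Forming $X^T X$ explicitly squares the condition number of the problem. For real data, a QR- or SVD-based least-squares solver such as `numpy.linalg.lstsq` is usually preferable.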

Nonlinear Regression with Linear Method
Polynomial regression model
 E.g., $y = w_0 + w_1 x + w_2 x^2 + w_3 x^3$
Let $x_2 = x^2$ and $x_3 = x^3$; the model becomes linear in the new variables:
$y = w_0 + w_1 x + w_2 x_2 + w_3 x_3$
Log-linear regression model
 E.g., $y = \exp(w_0 + w_1 x + w_2 x^2 + w_3 x^3)$
Let $y' = \log(y)$:
$y' = w_0 + w_1 x + w_2 x^2 + w_3 x^3$
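The log-linear trick can be sketched with a single predictor. This is a simplification of the cubic exponent above, chosen to keep the example short. The steps are: take logs, fit a straight line by least squares, then map predictions back with exp. All names and data here are hypothetical.

```python
import math


def fit_log_linear(xs, ys):
    """Fit y = exp(w0 + w1*x) by ordinary least squares on y' = log(y)."""
    if any(y <= 0 for y in ys):
        raise ValueError("the log-linear model needs strictly positive y")
    yp = [math.log(y) for y in ys]  # y' = log(y) makes the model linear in w
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(yp) / n
    w1 = (sum((x - x_bar) * (v - y_bar) for x, v in zip(xs, yp))
          / sum((x - x_bar) ** 2 for x in xs))
    w0 = y_bar - w1 * x_bar
    return w0, w1


def predict(w0, w1, x):
    """Back-transform the linear prediction of y' to the original scale."""
    return math.exp(w0 + w1 * x)


# Hypothetical data generated from y = 2 * exp(0.3 * x).
xs = [0, 1, 2, 3, 4]
ys = [2 * math.exp(0.3 * x) for x in xs]
w0, w1 = fit_log_linear(xs, ys)
```

This fit minimizes squared error on log(y), not on y itself. It therefore effectively weights relative rather than absolute errors, and it requires every y to be positive.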

Generalized Linear Regression
Response y
 Follows a distribution from the exponential family
 Variance of y depends on E(y); it is not a constant
$E(y) = g^{-1}(w_0 + w_1 x + w_2 x^2 + w_3 x^3)$, where g is the link function
Examples
 Logistic regression (binomial regression): probability of some
event occurring
 Poisson regression: number of customers
 …
References: Nelder and Wedderburn, 1972; McCullagh and
Nelder, 1989

Regression Tree (Breiman et al., 1984)
Partition the domain space
Leaf: (1) a continuous-valued
prediction; (2) average value

Model Tree
Leaf – a linear equation
More general than regression tree
Figure source: http://datamining.ihe.nl/research/model-trees.htm

Regression Trees and Model Trees
Regression tree: proposed in CART system (Breiman et al. 1984)
 CART: Classification And Regression Trees
 Each leaf stores a continuous-valued prediction
 It is the average value of the predicted attribute for the training tuples that
reach the leaf
Model tree: proposed by Quinlan (1992)
 Each leaf holds a regression model—a multivariate linear equation for the
predicted attribute
 A more general case than regression tree
Regression and model trees tend to be more accurate than linear
regression when the data cannot be represented well by a simple
linear model
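The sketch below makes "each leaf stores the average" concrete with a one-level regression tree (a stump) on a single numeric attribute. It is a simplification, not the CART implementation. It tries each threshold between neighbouring x values and keeps the split with the smallest total squared error. A full regression tree would apply the same split recursively, and a model tree would fit a linear equation in each leaf instead of a mean. Names and data are hypothetical.

```python
def fit_stump(xs, ys):
    """One-level regression tree on one numeric attribute.

    Returns (threshold, left_mean, right_mean); each leaf predicts the mean y
    of the training tuples that reach it.
    """
    pairs = sorted(zip(xs, ys))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # equal x values cannot be separated by a threshold
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((y - lm) ** 2 for y in left) + sum((y - rm) ** 2 for y in right)
        if best is None or sse < best[0]:
            threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
            best = (sse, threshold, lm, rm)
    if best is None:
        raise ValueError("need at least two distinct x values")
    return best[1:]


def predict(stump, x):
    threshold, left_mean, right_mean = stump
    return left_mean if x <= threshold else right_mean


# Hypothetical data with a jump between x = 3 and x = 10.
stump = fit_stump([1, 2, 3, 10, 11, 12], [5, 5, 5, 20, 20, 20])
```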

A time series can be illustrated as a time-series graph
which describes a point moving with the passage of time

Categories of Time-Series Movements
 Long-term or trend movements (trend curve): general direction in
which a time series is moving over a long interval of time
 Cyclic movements or cycle variations: long term oscillations about a
trend line or curve
e.g., business cycles, may or may not be periodic
 Seasonal movements or seasonal variations
i.e., almost identical patterns that a time series appears to follow
during corresponding months of successive years.
 Irregular or random movements
Time series analysis: decomposition of a time series into these four
basic movements
 Additive Model: TS = T + C + S + I
 Multiplicative Model: TS = T × C × S × I

Estimation of Trend Curve
The freehand method
 Fit the curve by looking at the graph
 Costly and barely reliable for large-scale data mining
The least-square method
 Find the curve minimizing the sum of the squares of the deviation of points on
the curve from the corresponding data points
The moving-average method
27

Moving Average
Moving average of order n
 Smooths the data
 Reduces or eliminates cyclic, seasonal and irregular movements
 Loses data at the beginning and end of a series
 Sensitive to outliers (can be reduced by weighted moving
average)
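Below is a minimal sketch of a moving average of order n, plus a weighted variant. The names and data are hypothetical. The output is n - 1 points shorter than the input, which is the loss of data at the ends mentioned above.

```python
def moving_average(values, n):
    """Simple moving average of order n: the mean of each run of n values."""
    if not 1 <= n <= len(values):
        raise ValueError("order n must be between 1 and len(values)")
    window = sum(values[:n])
    out = [window / n]
    for i in range(n, len(values)):
        window += values[i] - values[i - n]  # slide the window one step right
        out.append(window / n)
    return out


def weighted_moving_average(values, weights):
    """Moving average in which position j of each window gets weight weights[j]."""
    k, total = len(weights), sum(weights)
    if not 1 <= k <= len(values) or total == 0:
        raise ValueError("need 1 <= len(weights) <= len(values) and a nonzero weight sum")
    return [sum(w * v for w, v in zip(weights, values[i:i + k])) / total
            for i in range(len(values) - k + 1)]


series = [3, 5, 4, 6, 5, 7, 6]  # hypothetical data
print(moving_average(series, 3))
print(weighted_moving_average(series, [1, 2, 1]))
```

Choosing n equal to the length of the seasonal cycle (for example, n = 12 for monthly data) makes every window span one full cycle. This is why such a moving average smooths out seasonal movements.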

Trend Discovery in Time-Series (1): Estimation of Seasonal Variations
Seasonal index
 Set of numbers showing the relative values of a variable during the
months of the year
 E.g., if the sales during October, November, and December are 80%,
120%, and 140% of the average monthly sales for the whole year,
respectively, then 80, 120, and 140 are seasonal index numbers for
these months
Deseasonalized data
 Data adjusted for seasonal variations for better trend and cyclic
analysis
 Divide the original monthly data by the seasonal index numbers for
the corresponding months
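Below is one simple way to compute the index, assuming several years of monthly data and no strong trend. (In practice, a ratio-to-moving-average method is often used to remove the trend first.) The steps are:

- Average each calendar month across years.
- Express each monthly average as a percentage of the overall monthly average.
- Divide the raw data by the index, as described above.

The function names and data are hypothetical.

```python
def seasonal_index(years):
    """Seasonal index per calendar month, in percent of the overall monthly mean.

    `years` is a list of years, each a list of 12 monthly values.
    """
    if not years or any(len(y) != 12 for y in years):
        raise ValueError("expected one or more complete years of 12 months")
    month_means = [sum(y[m] for y in years) / len(years) for m in range(12)]
    overall = sum(month_means) / 12
    return [100 * mm / overall for mm in month_means]


def deseasonalize(years, index):
    """Divide each monthly value by its month's index (as a ratio, e.g. 120 -> 1.2)."""
    return [[v / (index[m] / 100) for m, v in enumerate(y)] for y in years]


# Hypothetical sales: flat at 100 except October (80) and November (120).
year = [100] * 9 + [80, 120, 100]
data = [year, year]
idx = seasonal_index(data)
adjusted = deseasonalize(data, idx)
```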

February 2, 2023 Data Mining: Concepts and Techniques
Seasonal Index
[Figure: seasonal index (0 to 160) by month (1 to 12); raw data from http://www.bbk.ac.uk/manop/man/docs/QII_2_2003%20Time%20series.pdf]

Trend Discovery in Time-Series (2)
Estimation of cyclic variations
 If (approximate) periodicity of cycles occurs, cyclic index can be constructed in
much the same manner as seasonal indexes
Estimation of irregular variations
 By adjusting the data for trend, seasonal and cyclic variations
With the systematic analysis of the trend, cyclic, seasonal, and irregular
components, it is possible to make long- or short-term predictions with
reasonable quality
31

Similarity Search in Time-Series Analysis
Normal database query finds exact match
Similarity search finds data sequences that differ only
slightly from the given query sequence
Two categories of similarity queries
 Whole matching: find a sequence that is similar to the query
sequence
 Subsequence matching: find all pairs of similar sequences
Typical Applications
 Financial market
 Market basket data analysis
 Scientific databases
 Medical diagnosis

Data Transformation
Many techniques for signal analysis require the data to be
in the frequency domain
Usually data-independent transformations are used
 The transformation matrix is determined a priori
 discrete Fourier transform (DFT)
 discrete wavelet transform (DWT)
The distance between two signals in the time domain is
the same as their Euclidean distance in the frequency
domain

Discrete Fourier Transform
DFT does a good job of concentrating energy in the first
few coefficients
If we keep only the first few DFT coefficients, we can compute
a lower bound on the actual distance
Feature extraction: keep the first few coefficients (F-index)
as representative of the sequence

DFT (continued)
Parseval’s Theorem
The Euclidean distance between two signals in the time
domain is the same as their distance in the frequency
domain
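Keeping only the first few (say, 3) coefficients underestimates the
distance, so there will be no false dismissals (though there may be false alarms)
$$\sum_{t=0}^{n-1} |x[t]|^2 = \sum_{f=0}^{n-1} |X[f]|^2$$
$$\sum_{t=0}^{n-1} |S[t] - Q[t]|^2 = \sum_{f=0}^{n-1} |F(S)[f] - F(Q)[f]|^2 \;\ge\; \sum_{f=0}^{2} |F(S)[f] - F(Q)[f]|^2$$
where X = F(x) denotes the (orthonormal) DFT of x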
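Parseval's theorem and the lower bound can be checked numerically. This sketch uses a naive O(n²) DFT with the orthonormal scaling 1/√n. That scaling is an assumption: with the unscaled convention, frequency-domain distances are multiplied by √n. NumPy offers the orthonormal form via `numpy.fft.fft(x, norm="ortho")`. The two series are made up.

```python
import cmath
import math


def dft(x):
    """Orthonormal DFT: X[f] = (1/sqrt(n)) * sum_t x[t] * exp(-2j*pi*f*t/n)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * f * t / n) for t in range(n)) / math.sqrt(n)
            for f in range(n)]


def euclid(a, b):
    """Euclidean distance; abs() handles both real and complex entries."""
    return math.sqrt(sum(abs(u - v) ** 2 for u, v in zip(a, b)))


q = [1, 2, 3, 4, 3, 2, 1, 0]
s = [0, 1, 3, 5, 4, 2, 0, 1]
Q, S = dft(q), dft(s)

d_time = euclid(q, s)            # distance in the time domain
d_freq = euclid(Q, S)            # equal to d_time by Parseval's theorem
d_first3 = euclid(Q[:3], S[:3])  # keeps f = 0, 1, 2 only: a lower bound
print(round(d_time, 6), round(d_freq, 6), round(d_first3, 6))
```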


Multidimensional Indexing in Time-Series
Multidimensional index construction
 Constructed for efficient accessing using the first few Fourier coefficients
Similarity search
 Use the index to retrieve the sequences that are at most a certain small distance
away from the query sequence
 Perform post-processing by computing the actual distance between sequences in
the time domain and discard any false matches
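The filter-and-refine idea can be sketched as follows. As a stated simplification, a plain scan over precomputed feature vectors stands in for the multidimensional index (an R*-tree or similar in practice). The names `build_features` and `range_query` are hypothetical. Because the k-coefficient distance never exceeds the true distance, the filter step cannot drop a true match. The refine step then removes the false alarms.

```python
import cmath
import math


def dft_prefix(x, k):
    """First k coefficients of the orthonormal DFT of x."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * f * t / n) for t in range(n)) / math.sqrt(n)
            for f in range(k)]


def dist(a, b):
    return math.sqrt(sum(abs(u - v) ** 2 for u, v in zip(a, b)))


def build_features(database, k=3):
    """Index-building step: compute the k-coefficient feature vector once per sequence."""
    return {sid: dft_prefix(seq, k) for sid, seq in database.items()}


def range_query(database, features, query, eps, k=3):
    """Whole matching: ids of sequences whose true distance to query is <= eps."""
    qf = dft_prefix(query, k)
    # Filter: feature distance is a lower bound, so no true match is dismissed.
    candidates = [sid for sid, f in features.items() if dist(f, qf) <= eps]
    # Refine: compute the actual time-domain distance and discard false alarms.
    return [sid for sid in candidates if dist(database[sid], query) <= eps]


db = {"a": [1, 2, 3, 4], "b": [1, 2, 3, 5], "c": [10, 10, 10, 10]}
feats = build_features(db)
print(range_query(db, feats, [1, 2, 3, 4], eps=1.5))
```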

Subsequence Matching
Break each sequence into a set of pieces of window with length w
Extract the features of the subsequence inside the window
Map each sequence to a “trail” in the feature space
Divide the trail of each sequence into “subtrails” and represent each of
them with minimum bounding rectangle
Use a multi-piece assembly algorithm to search for longer sequence
matches
37

Analysis of Similar Time Series

Enhanced Similarity Search Methods
Allow for gaps within a sequence or differences in offsets or amplitudes
Normalize sequences with amplitude scaling and offset translation
Two subsequences are considered similar if one lies within an envelope of
ε width around the other, ignoring outliers
Two sequences are said to be similar if they have enough non-
overlapping time-ordered pairs of similar subsequences
Parameters specified by a user or expert: sliding window size, width of an
envelope for similarity, maximum gap, and matching fraction
39

Steps for Performing a Similarity Search
Atomic matching
 Find all pairs of gap-free windows of a small length that are
similar
Window stitching
 Stitch similar windows to form pairs of large similar
subsequences allowing gaps between atomic matches
Subsequence Ordering
 Linearly order the subsequence matches to determine whether
enough similar pieces exist

Similar Time Series Analysis
VanEck International Fund vs. Fidelity Selective Precious Metal and Mineral Fund
Two similar mutual funds in different fund groups

Sequence Distance
A function that measures the dissimilarity of two
sequences (of possibly unequal length)
Example: Euclidean distance between two time series Q and C of length n
$$D(Q, C) = \sqrt{\sum_{i=1}^{n} (q_i - c_i)^2}$$

Motif: Basic Concepts
What is a motif? A previously unknown, frequently
occurring sequential pattern
Match: Given subsequences Q, C ⊆ T,
C is a match for Q iff $D(Q, C) \le R$ for some range R
Non-Trivial Match: C = T[p..*], Q = T[q..*] and C matches Q.
If p = q or ∄ non-match N = T[s..*] such that s lies between p and q,
then the match is trivial
(i.e., in a non-trivial match, C and Q must be separated by a non-match)
1-Motif: the subsequence with the most non-trivial matches
(least variance decides ties)
k-Motif: $C_k$ such that $D(C_k, C_i) > 2R$ for all $i \in [1, k)$
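Below is a brute-force sketch of 1-motif discovery with hypothetical names and data. It simplifies the non-trivial-match rule above in two ways:

- A match only counts if the two windows do not overlap (start positions at least n apart).
- Ties go to the earliest start instead of the least variance.

It compares every pair of subsequences. The EMMA approach on the following slides uses SAX hashing to avoid that cost.

```python
import math


def dist(a, b):
    """Euclidean distance between two equal-length subsequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def one_motif(ts, n, r):
    """Brute-force 1-motif of length n: the subsequence with the most matches.

    Simplifying assumption: a match at position q counts for position p only
    if |p - q| >= n (the windows do not overlap), which stands in for the
    "separated by a non-match" rule. Ties go to the earliest start rather than
    the least variance. Runs in O(m^2 * n) for a series of length m.
    """
    subs = [ts[i:i + n] for i in range(len(ts) - n + 1)]
    best_start, best_count = None, -1
    for p, c in enumerate(subs):
        count = sum(1 for q, s in enumerate(subs)
                    if abs(p - q) >= n and dist(c, s) <= r)
        if count > best_count:
            best_start, best_count = p, count
    return best_start, best_count


# Hypothetical series in which the shape 1, 2, 1 recurs three times.
ts = [5, 1, 2, 1, 7, 0, 3, 1, 2, 1, 9, 4, 1, 2, 1, 8]
start, count = one_motif(ts, n=3, r=0.1)
```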

SAX: Symbolic Aggregate approXimation
Dim. Reduction/Compression
“Symbolic Aggregate approXimation”
SAX : ℝ → Σ (real values to symbols of an alphabet Σ)
SAX : (a time series) ↦ ccbaabbbabcbcb
Essentially an alphabet over the Piecewise Aggregate
Approximation (PAA) rank
Faster, simpler, more compression, yet on par with DFT,
DWT and other dim. reductions

SAX Illustration

SAX Algorithm
Parameters: alphabet size, word (segment) length (or output
rate)
1. Select a probability distribution for the TS
2. z-Normalize the TS
3. PAA: within each time interval, calculate the aggregated value (mean) of the segment
4. Partition the TS range by equal-area partitioning of the PDF into n partitions (equal-frequency binning)
5. Label each segment with a rank ∈ Σ for the aggregate's corresponding partition rank
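Below is a minimal sketch of steps 2 to 5. It assumes the Gaussian distribution that SAX normally uses for step 1. To keep the code short, it also assumes the series length is divisible by the word length. `statistics.NormalDist.inv_cdf` supplies the equal-area breakpoints. The function names are hypothetical.

```python
from statistics import NormalDist, mean, pstdev


def znorm(ts):
    """Step 2: z-normalize (zero mean, unit standard deviation)."""
    mu, sigma = mean(ts), pstdev(ts)
    if sigma == 0:
        return [0.0] * len(ts)  # a constant series has no shape to normalize
    return [(v - mu) / sigma for v in ts]


def paa(ts, w):
    """Step 3: Piecewise Aggregate Approximation, the mean of each of w segments."""
    if len(ts) % w:
        raise ValueError("this sketch needs len(ts) to be a multiple of w")
    size = len(ts) // w
    return [mean(ts[i * size:(i + 1) * size]) for i in range(w)]


def sax(ts, w, a):
    """Steps 4-5: map each PAA value to one of a letters via N(0,1) breakpoints."""
    # a - 1 breakpoints cut the standard normal into a equal-probability bins.
    breakpoints = [NormalDist().inv_cdf(i / a) for i in range(1, a)]
    word = ""
    for v in paa(znorm(ts), w):
        rank = sum(v > b for b in breakpoints)  # how many breakpoints lie below v
        word += chr(ord("a") + rank)
    return word


print(sax([1, 2, 3, 4, 5, 6, 7, 8], w=4, a=4))
```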

Finding Motifs in a Time Series
EMMA Algorithm: Finds 1-(k-)motif of fixed length n
SAX Compression (Dim. Reduction)
 Possible to store D(i, j) ∀ (i, j) ∈ Σ × Σ
 Allows use of various distance measures (Minkowski, Dynamic Time
Warping)
Multiple Tiers
 Tier 1: Uses a sliding window to hash length-w SAX subsequences
(a^w addresses, total size O(m)).
Bucket B with the most collisions and buckets with
MINDIST(B) < R form the neighborhood of B.
 Tier 2: The neighborhood is pruned using the more precise ADM
algorithm. The N_i with the most matches is the 1-motif. Early stop if |ADM
matches| > max_{k>i}(|neighborhood_k|)

Hashing
[Figure: SAX words taken by a sliding window of length n (word length w) over the SAX string and hashed into buckets]

Classification in Time Series
Application: Finance
1-Nearest Neighbor
 Pros: accurate, robust, simple
 Cons: time and space complexity (lazy learning); results are not
interpretable
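Below is a minimal 1-nearest-neighbour classifier for equal-length series, using the Euclidean distance D(Q, C) defined earlier. The training series, labels and function names are hypothetical. There is no training phase, which is what "lazy learning" refers to. Every prediction scans the whole training set, and that is the time and space cost noted above.

```python
import math


def euclidean(q, c):
    """D(Q, C) = sqrt(sum_i (q_i - c_i)^2) for equal-length series."""
    if len(q) != len(c):
        raise ValueError("Euclidean distance needs equal-length series")
    return math.sqrt(sum((qi - ci) ** 2 for qi, ci in zip(q, c)))


def classify_1nn(train, query):
    """Return the label of the training series nearest to `query`.

    `train` is a list of (series, label) pairs; nothing is precomputed, so
    each call scans the whole training set (lazy learning).
    """
    _, label = min(train, key=lambda pair: euclidean(pair[0], query))
    return label


# Hypothetical labelled price patterns.
train = [([0, 0, 0, 0], "flat"), ([0, 1, 2, 3], "up"), ([3, 2, 1, 0], "down")]
print(classify_1nn(train, [0, 1, 2, 4]), classify_1nn(train, [3, 2, 1, 1]))
```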

Financial Data Applications
Fraud Detection - Anomaly Analysis

What are Anomalies?
Anomaly is a pattern in the data that does not conform to
the expected behavior
Also referred to as outliers, exceptions, peculiarities,
surprise, etc.
Anomalies translate to significant (often critical) real life
entities
 Cyber intrusions
 Credit card fraud

Real World Anomalies
Credit Card Fraud
 An abnormally high purchase made on a
credit card
Cyber Intrusions
 A web server involved in ftp traffic

Simple Example
N1 and N2 are regions of
normal behavior
Points o1 and o2 are
anomalies
Points in region O3 are
anomalies
[Figure: 2-D data (axes X, Y) with normal regions N1 and N2, anomalous points o1 and o2, and anomalous region O3]

Related problems
Rare Class Mining
Chance discovery
Novelty Detection
Exception Mining
Noise Removal
Black Swan*

Key Challenges
Defining a representative normal region is
challenging
The boundary between normal and outlying
behavior is often not precise
The exact notion of an outlier is different for
different application domains
Availability of labeled data for training/validation
Malicious adversaries
Data might contain noise
Normal behavior keeps evolving

Data Labels
Supervised Anomaly Detection
 Labels available for both normal data and anomalies
 Similar to rare class mining
Semi-supervised Anomaly Detection
 Labels available only for normal data
Unsupervised Anomaly Detection
 No labels assumed
 Based on the assumption that anomalies are very rare compared to normal data

Applications of Anomaly Detection
Insurance / Credit card fraud detection
Anti-Money Laundering (AML)
Fraud
Identity Theft and Fake Account Registration
Risk Modeling
Account Takeover
Promotion Credit Abuse
Customer Behavior Analytics
Cyber Security

Fraud Detection
Fraud detection refers to detection of criminal activities
occurring in commercial organizations
 Malicious users might be the actual customers of the organization
or might be posing as a customer (also known as identity theft).
Types of fraud
 Credit card fraud
 Insurance claim fraud
 Mobile / cell phone fraud
 Insider trading
Challenges
 Fast and accurate real-time detection
 Misclassification cost is very high

Classification Based Techniques
Main idea: build a classification model for normal (and rare anomalous)
events based on labeled training data, and use it to classify each new
unseen event
Classification models must be able to handle skewed (imbalanced) class
distributions
Categories:
 Supervised classification techniques
 Require knowledge of both normal and anomaly class
 Build classifier to distinguish between normal and known anomalies
 Semi-supervised classification techniques
 Require knowledge of normal class only!
 Use modified classification model to learn the normal behavior and then detect any
deviations from normal behavior as anomalous

Classification Based Techniques
Advantages:
 Supervised: models are easily understood; high accuracy in detecting many kinds of known anomalies
 Semi-supervised: models are easily understood; normal behavior can be accurately learned
Drawbacks:
 Supervised: require labels from both the normal and the anomaly class; cannot detect unknown and emerging anomalies
 Semi-supervised: require labels from the normal class; possibly high false alarm rate, since previously unseen (yet legitimate) data records
may be recognized as anomalies

Supervised Classification Techniques
Manipulating data records (oversampling /
undersampling / generating artificial examples)
Rule based techniques
Model based techniques
 Neural network based approaches
 Support Vector machines (SVM) based approaches
 Bayesian networks based approaches
Cost-sensitive classification techniques
Ensemble based algorithms (SMOTEBoost,
RareBoost)

Semi-supervised Classification Techniques
Use modified classification model to learn the
normal behavior and then detect any deviations
from normal behavior as anomalous
Recent approaches:
 Neural network based approaches
 Support Vector machines (SVM) based approaches
 Markov model based approaches
 Rule-based approaches

Nearest Neighbor Based Techniques
Key assumption: normal points have close neighbors
while anomalies are located far from other points
General two-step approach
1. Compute neighborhood for each data record
2. Analyze the neighborhood to determine whether data
record is anomaly or not
Categories:
 Distance based methods
 Anomalies are data points most distant from other points
 Density based methods
 Anomalies are data points in low density regions
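Below is a minimal distance-based sketch with hypothetical names and data. Each point is scored by the distance to its k-th nearest neighbour, so isolated points score high. It is brute force, O(n²). At scale, the neighbour search would use an index structure such as a k-d tree.

```python
import math


def knn_scores(points, k):
    """Anomaly score of each point = distance to its k-th nearest other point."""
    if not 1 <= k < len(points):
        raise ValueError("k must be between 1 and len(points) - 1")
    scores = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(dists[k - 1])
    return scores


# Hypothetical 2-D data: a tight group of four points and one far-away point.
pts = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
scores = knn_scores(pts, k=2)
ranked = sorted(range(len(pts)), key=lambda i: scores[i], reverse=True)
print(ranked[0])  # index of the most anomalous point
```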

Clustering Based Techniques
Key assumption: normal data records belong to large and
dense clusters, while anomalies either do not belong to any of
the clusters or form very small clusters
Categorization according to labels
 Semi-supervised – cluster normal data to create modes of normal
behavior. If a new instance does not belong to any of the clusters, or is
not close to any cluster, it is an anomaly
 Unsupervised – post-processing is needed after the clustering step to
determine the size of the clusters and how far from the clusters a point
must be to count as an anomaly
Anomalies detected using clustering based methods can be:
 Data records that do not fit into any cluster (residuals from clustering)
 Small clusters
 Low density clusters or local anomalies (far from other points within the
same cluster)

Clustering Based Techniques
Advantages:
 No need to be supervised
 Easily adaptable to on-line / incremental mode suitable for
anomaly detection from temporal data
Drawbacks:
 Computationally expensive
Using indexing structures (k-d tree, R* tree) may alleviate this
problem
 If normal points do not create any clusters the techniques
may fail
 In high dimensional spaces, data is sparse and distances
between any two data records may become quite similar.
Clustering algorithms may not give any meaningful clusters

Statistics Based Techniques
Data points are modeled using a stochastic distribution ⇒
points are determined to be outliers depending on their
relationship with this model
Advantage
 Utilize existing statistical modeling techniques to model various types
of distributions
Challenges
 With high dimensions, difficult to estimate distributions
 Parametric assumptions often do not hold for real data sets

Types of Statistical Techniques
Parametric Techniques
 Assume that the normal (and possibly anomalous) data is generated
from an underlying parametric distribution
 Learn the parameters from the normal sample
 Determine the likelihood of a test instance to be generated from this
distribution to detect anomalies
Non-parametric Techniques
 Do not assume any knowledge of parameters
 Use non-parametric techniques to learn a distribution – e.g., Parzen
window estimation
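Below is a minimal parametric sketch, assuming the normal data are roughly Gaussian (one common choice, not the only one). It learns the mean and standard deviation from normal samples. It then flags a test value whose two-sided tail probability under that model is below a threshold `alpha`. That parameter is hypothetical, and 0.001 is an arbitrary illustrative value.

```python
from statistics import NormalDist, mean, stdev


def fit_normal(sample):
    """Learn the Gaussian parameters (mean, standard deviation) from normal data."""
    return NormalDist(mean(sample), stdev(sample))


def is_anomaly(model, x, alpha=0.001):
    """True if a value at least as extreme as x has two-sided probability < alpha."""
    p_lower = model.cdf(x)
    tail = 2 * min(p_lower, 1 - p_lower)
    return tail < alpha


# Hypothetical amounts from one customer's normal transactions.
normal_amounts = [100, 102, 98, 101, 99, 100, 103, 97]
model = fit_normal(normal_amounts)
print(model.mean, model.stdev, is_anomaly(model, 101), is_anomaly(model, 130))
```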

Information Theory Based Techniques
Compute information content in data using information
theoretic measures, e.g., entropy, relative entropy, etc.
Key idea: Outliers significantly alter the information content
in a dataset
Approach: Detect data instances that significantly alter the
information content
 Require an information theoretic measure
Advantage
 Operate in an unsupervised mode
Challenges
 Require an information theoretic measure sensitive enough to detect
irregularity induced by very few outliers
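The sketch below makes "outliers alter the information content" concrete, assuming categorical data and Shannon entropy as the measure. Each record is scored by how much the dataset's entropy falls when that record is removed. This leave-one-out scoring is an illustrative choice, not a specific published algorithm, and the names and data are hypothetical.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy, in bits, of a list of categorical values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def entropy_drop_scores(values):
    """Score each record by how much removing it lowers the entropy of the rest."""
    base = entropy(values)
    return [base - entropy(values[:i] + values[i + 1:]) for i in range(len(values))]


# Hypothetical transaction channels: nine card payments and one wire transfer.
channels = ["card"] * 9 + ["wire"]
scores = entropy_drop_scores(channels)
most_unusual = max(range(len(channels)), key=lambda i: scores[i])
```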

Visualization Based Techniques
Use visualization tools to observe the data
Provide alternate views of data for manual
inspection
Anomalies are detected visually
Advantages
 Keeps a human in the loop
Disadvantages
 Works well only for low-dimensional data
 Can provide only aggregated or partial views for high
dimension data

Visual Data Mining*
Detecting Tele-
communication fraud
Display telephone call
patterns as a graph
Use colors to identify
fraudulent telephone
calls (anomalies)

Contextual Anomaly Detection
Detect context anomalies
General Approach
 Identify a context around a data instance (using a set of
contextual attributes)
 Determine if the data instance is anomalous w.r.t. the context
(using a set of behavioral attributes)
Assumption
 All normal instances within a context will be similar (in terms of
behavioral attributes), while the anomalies will be different

Contextual Attributes
Contextual attributes define a neighborhood
(context) for each instance
For example:
 Spatial Context
Latitude, Longitude
 Graph Context
Edges, Weights
 Sequential Context
Position, Time
 Profile Context
User demographics

Sequential Anomaly Detection
Detect anomalous sequences in a database of
sequences, or
Detect anomalous subsequence within a sequence
Data is presented as a set of symbolic sequences
 System call intrusion detection
 Proteomics
 Climate data

Motivation for On-line Anomaly Detection
Data in many rare events applications arrives continuously
at an enormous pace
There is a significant challenge to analyze such data
Examples of such rare events applications:
 Video analysis
 Network traffic monitoring
 Credit card fraudulent transactions

Sentiment Analysis for Finance
Sentiment analysis is an emerging area where structured and
unstructured data is analyzed to generate useful insights leading to
improved performance.
Information obtained from multiple sources, including news wires, macroeconomic
announcements, social media, microblogs (e.g., Twitter), online (search)
information such as Google Trends, and Wikipedia, influences both
business intelligence and performance evaluation.
This sentiment data can help investors and finance professionals to
exploit the market and manage their risk exposure.
 Stock market prediction
 New product review
 Stock Trading
 Customer Brand Building

Sentiment Analysis in Finance

Thank You
