Qu Speaker Series
Frontiers in Alternative Data : Techniques and Use
Cases
2020 Copyright QuantUniversity LLC.
Hosted By:
Sri Krishnamurthy, CFA, CAP
sri@quantuniversity.com
www.qu.academy
09/22/2020
Online
https://quspeakerseries9.splashthat.com/
QuantUniversity
• Boston-based Data Science, Quant
Finance and Machine Learning
training and consulting advisory
• Trained more than 1000 students in
Quantitative methods, Data Science
and Big Data Technologies using
MATLAB, Python and R
• Building a platform for AI
and Machine Learning Exploration
and Experimentation
For registration information, go to
https://QuSummerSchool.splashthat.com
For registration information, go to
https://QuFallSchool.splashthat.com
https://qufintech.splashthat.com/
Speakers
The Book of Alternative Data
The Book of Alternative
Data
SEPTEMBER 2020
The Book of Alternative Data
Maximising Data Value, a Vendor Perspective | Deloitte LLP 2019
• Co-authored by Alexander
Denev and Saeed Amen
• The Book of Alternative Data
(on Wiley)
• Hardback available on Amazon
USA now (elsewhere in Sep)
• Kindle available on Amazon
worldwide
• Presentation is based on the
book
• Common properties
• Less commonly used by market participants
• Tends to be more expensive
• Often outside financial markets (is tick data “alternative”?)
• Shorter history
• More challenging to use
• “Exhaust data” a byproduct of other processes
• Digital footprint from individual and corporate activity
• Resulted in a rapid rise in the number of alternative datasets
• Can provide an additional revenue stream for those who collect “exhaust data”
• Not all alternative data is necessarily Big Data (but it can be!)
Saeed Amen / @saeedamenfx
What is alternative data?
Alternative data & investments case studies
Several clear case studies have emerged demonstrating the value of analytics in combination with alternative data applied to the investment process
ON-LINE PRICE = INFLATION
Global FSI firm employs technology to track prices of 5 million products on-line to understand price shocks and monitor shifts in inflation across 70 countries [1]
MOBILE FOOT TRAFFIC = ECONOMY
Hedge funds using location data pulled from mobile devices to predict outlook on economy and REIT values [4]
SOCIAL + SEARCH = EARNINGS
$90B AUM global asset manager mines search engine data combined with social-media data to predict results of corporate events like quarterly earnings [3]
SATELLITE + SHIPS = MISPRICED SECURITY
Hedge fund using satellite intelligence on ships and tank levels to identify upcoming impact to oil producers and commodity prices [5]
WEB + TWITTER = MARKET MOVING EVENT
Data provider using 300M websites, 150M Twitter feeds in combination with analyst presentations and FactSet reports to measure rise up media food chain (e.g. blogs to newswire) to highlight potentially market moving events [6]
APP + CREDIT CARD = PERFORMANCE
Hedge fund looks at combination of alternative data including credit card transactions, geo-location, and app downloads to analyze burger chain performance [2]
1. Innovative Asset Managers, Eagle Alpha
2. “Foursquare Wants To Be The Nielsen Of Measuring The Real World,” Research Briefs, CBInsights, June 8, 2016.
3. Simone Foxman and Taylor Hall, “Acadian to Use Microsoft's Big Data Technology to Help Make Bets,” Bloomberg, March 7, 2017.
4. Rob Matheson, “Measuring the Economy With Location Data,” MIT News, March 27, 2018.
5. Fred R. Bleakley, “CargoMetrics Cracks the Code on Shipping Data,” Institutional Investor, February 04, 2016.
6. Accern website
AUM of UK-Based Man Group’s
AI/Analytics driven AHL Dimension
fund up 5x over 3 years
Accelerating AI Adoption
Deployed AI (Artificial Intelligence)
techniques to four additional funds
managing $12.3B
Varied Data Sources
Processes terabytes of data ranging
from weather forecasts to container
ship movements
Increasing Valuation
Man Group’s stock price has increased
by 55% from January to October 2017
AI Driving Profit
Artificial intelligence contributed roughly
50% of 2015 profits for the AHL
Dimension Fund
Source: Adam Satariano, “The Massive Hedge Fund Betting on AI,” Bloomberg,
September 27, 2017.
• Volume (increasing) – lots of data
• Variety (increasing) – not just numerical data, can be text, image, video etc.
• Velocity (increasing) – speed that data is being generated
• Variability (increasing) – inconsistencies in the data
• Veracity (decreasing) – difficult to tell if accurate (e.g. social media)
• Value (increasing) – business value of the data
The Vs of Big Data
Quantitative investment strategies and vendor solutions with alpha generation capabilities are becoming a critical component in
returning the buy and sell side’s ROE to pre-crisis levels
Addressing Market Challenges
Systematic/quant Investors, typically building
their own analytics
Who:
• Hedge Funds
• Sophisticated Buy Side Firms
Key Challenges:
• Access to good quality raw data or to
curated alternative data
• Maintaining access to cutting edge
technology and algorithms
Customer Needs:
• Co-location of analytics and data
• Simplified access to data and computation
• Simplified, but bespoke, data access
Sophisticated Quants
Most intuitive solutions needed. Limited
technology and programming capability
Who:
• Smaller Sell Side (DSIBs)
• Small Buy Side + Family offices
Key Challenges:
• Reducing technology costs associated
with efficient research tools
• Building/maintaining an edge against
passive benchmark returns
Customer Needs:
• Simplified access to data and computation
• Curated Signals
• Sophisticated, but low maintenance/build
cost analytics platforms
• Elastic access to analytics and associated
data science talent
Traditional Investors
Interested in derived analytics and more
intuitive solutions
Who:
• Large Sell Side (GSIBs)
• Traditional Buy Side Firms
Key Challenges:
• Reducing technology costs associated
with efficient research tools
• Retention and expansion of innovation
talent
Customer Needs:
• Simplified access to data and computation
• Curated Signals
• Simplified, but bespoke, data access
Traditional Quants
Sophisticated but ultra small scale with a
focus on highly scalable business models
Who:
• Alternative Data Providers
• Signal Factories
Key Challenges:
• Simplified access to data
• Ability and agility to scale
• Support of cutting edge algorithms and
alternative data sets
Customer Needs:
• Simplified access to data and computation
• Simplified, but bespoke, data access
• Sophisticated, but low maintenance/build
cost analytics platforms
• Marketplace creation
Fintechs
Sources: Macro trends database, January 2019; The Shift From Active to Passive Investing: Potential Risks to Financial Stability?, Federal Reserve System 2019; Deloitte Global Cost Survey, 2019; Alternative data for investment decisions:
Today’s innovation could be tomorrow’s requirement, Deloitte Centre for Financial Services, 2018
The Rising Adoption of Alternative Data
Hedge funds were the innovators in this space, but the technology is reaching a tipping point and may see exponential growth
over the next year
Alternative data adoption curve – investment management constituents by phase
Largely hedge funds aggressively seeking information
advantage
Likely constituents
Aggressive long-only managers and PE
firms
Tech savvy large
complex IM firms
Traditional large
complex IM firms
Firms reluctant to
embrace new approaches
Innovators Early adopters Early majority Late majority Laggards
With large scale adoption of alternative data, early
majority firms may face regulatory and talent risks
Late majority firms and laggards may face strategic
risks as they defer or decline the use of alternative
assets
Innovators and early adopters faced data and
model risks as data sets were sourced from
nontraditional, heterogeneous sources
Getting to Grips with New Data Sources and Techniques
Investors are increasingly spending on alternative data, but building data science and engineering teams, and the associated
analytics platforms to fully harness such diverse data, remains a significant barrier for all but the largest firms.
Setting up a data science/engineering team capable of harnessing alternative data
signals can be both expensive and time consuming:
• A diverse talent pool, typically not found within existing functions, is required
to find, analyse, model and productionise alternative insights
• The technology infrastructure required to integrate alternative and traditional
datasets further increases costs, serving as a pre-requisite for analysis
• Processes that are not optimally engineered (e.g. by inappropriate staffing)
often lead to technical debt, production failures and associated costs
• Alternative data variability makes proactive quality monitoring and
remediation an issue in which significant resources are invested
Market Trends Barriers to Entry
Source: Alternativedata.org
AUM              2016    2017    2018E   2019E
<$1bn              35      63      107     158
$1bn - $10bn      213     340      506     764
>$10bn          1,288   1,954    3,104   4,041
Buy Side Avg.     841   1,267    2,005   2,640

Role             Entry Level Salary   Bonus
Data Analyst     $80k-$100k           ~25%
Data Scientist   $80k-$100k           ~40%
Data Scout       $70k-$90k            ~15%
Data Engineer    $80k-$110k           ~30%
Head of Data     $250k-$1,000k        ~100%
Buy side spend on alternative data has increased over the previous 3 years and
is expected to continue to grow:
• Poor active investment performance is driving shift to passive products and
fee compression
• Active investing strategies are starting to require more diverse data to
generate strong alpha and beta predictive signals
• Savings from bundling of data streams are not currently possible due to the
segmentation of the data providers market but are becoming highly
desirable
Minimum Data Team
1 Head of Data
1 Data Scientist
1 Data Engineer
1 Data Scout
3 Data Analysts
Anticipated minimum spend
between $1m-$2m p/a,
dependent on technology
maturity, existing talent and
complexity of ambition
Average buy side spend on datasets ($k)
Total Buy Side Spend on Alternative Data ($m)
Annual Salaries of Associated Talent
The Buy Side is Increasing its Focus on Alternative Data
The majority of buy side believe alternative data will positively impact their investment performance. Deloitte has surveyed
over 100 investment managers (IMs) and has observed significant technological, talent and risk challenges that integrating
such diverse data presents.
Chart values (order as extracted): 10%, 40%, 42%, 8%
IM firms’ opinion about the impact of alternative data on investment processes:
Minimal Impact
Some impact, firms that utilise alternative data early may see
some temporary advantages
Alternative data leaders will see sustained advantages in some
asset classes
Alternative data represents a secular change in IM and expertise
in this area will separate winners and losers over the next 5 years
What is your organization’s status for utilizing alternative data?
No part of the strategic plan
Considering it, but no action at this point
Currently using alternative data in a test environment
Using alternative data to augment portfolio management
decisions
Source: c. 110 responses from IM firms from the polls conducted during the Alternative Data Dbrief session on April 24, 2018. Data has been cleaned to
exclude blank and ‘Don’t Know/Not Applicable’ responses
Chart values (order as extracted): 13%, 9%, 49%, 29%
Chart values (order as extracted): 8%, 11%, 51%, 15%, 15%
Do you think utilization of alternative data (or not) presents new or different risks
to IM firms?
No, it’s business as usual
Our existing risk mgmt. framework can be adapted in the normal
course of business to handle alternative data
A fresh look at the risks associated with this development is
appropriate
The issues presented by alternative data are significant – our
firm needs to refresh the risk mgmt. framework to assess them
Other
Chart values (order as extracted): 46%, 15%, 9%, 14%, 16%
Our organization’s adoption journey for alternative data utilization will likely or
already includes:
Proprietary platforms and processes
Use of alternative data aggregators or brokers to
facilitate acquisition
Use of alternative data crowd-sourced insights supplied
by vendors
Use of insights developed by sell-side analysts from
alternative data
More than one of these
Data as a Service Infra/Platform as a Service Analytics as a Service
Minimally refined data supplied directly
to customers.
State of the art provides:
• Connected data, via a single point of
access, and the ability to customize
the data feed to a client’s specific
requirements
• Cleansed data with appropriate
imputation and normalised data
concepts and entities
Flexible cloud infrastructure (and
platforms) provisioned with simplified
access to Data.
State of the art provides:
• Simplified access to data, while
improving usage monitoring
• Co-located cloud infrastructure
capable of supporting ultra low
latency algorithmic decisions (and
reducing comms infra costs)
• Access to cloud based elastic/burst
computing capabilities and a variety
of price point storage solutions
Analytics data platform built upon
IaaS/PaaS with pre-built environments
for large scale.
State of the art provides:
• Simplified access to data processing,
providing off-the-rack data platform
solutions that can be readily
accessed
• App store engagement model that
fosters agile fintech ecosystem
Combining AaaS model with a diverse
data science talent pool.
State of the art provides:
• Access to seasoned users of the
AaaS platform
• Access to rare skill sets such as
graph theory, natural language
processing, image processing etc.
able to generate signals from data
outside customer competencies
• Ultra flexible staffing model
minimising overheads for R&D
efforts
Pre-generated signals that are sold to
clients at a premium.
State of the art provides:
• Pre-built signals targeting market
segments and use cases; where
alternative data is used a series of
robust quality checks
• Support for 3rd party vendors (i.e.
those employing AaaS) to sell
signals
• Utilize spare capacity within the
Managed Analytics service
Managed Analytics Service Signal as a Service
Primary Buyers
Sophisticated quants who build their
own analytics and associated platforms
e.g.:
• Large Sell Side institutions
• Quantitatively advanced Hedge
Funds
As per DaaS, with greater focus on
latency dependent trading strategies.
• Large scale seeking ultra low latency
• Mid-Scale unable/unwilling to
develop complex, data-processing
centric, cloud platforms
• Large scale looking to simplify path
to innovation
• Fintechs seeking lean data science
focused operating model
• Mid-Scale unable/unwilling to
develop complex, data-processing
centric, cloud platforms
• Large scale looking to simplify path
to innovation
• Dependent on nature and pricing
strategy of signal
• Smaller Scale Wealth Managers
1 2 3 4 5
Individual investor firms must assess where their comparative advantage exists and opt for a consumption pattern that
maximizes their return on investment from data
Understanding Comparative Advantage in Data
Example Providers
Data Vendors:
• IHS Markit
• Bloomberg
• Refinitiv
• Euroclear (developing)
• Deutsche Borse (developing)
Data Vendors:
• Refinitiv
Other Examples:
• Google, Amazon, Microsoft (without
co-lo)
Data Vendors:
• No comprehensive/deep offerings
within the major players
Other Examples:
• Generic analytics vendors e.g. SAS,
Cloudera, Pivotal
Data Vendors:
• No major players (however Quandl
model is similar)
Other Examples:
• Prof. services; e.g. Deloitte,
Accenture, BCG Gamma etc.
Data Vendors:
• IHS Markit (Research Signals
service ~60 clients)
• Refinitiv (white papers on signals)
• Quandl
In order to realise maximum value for the data assets a combination of prioritization, enhancement and analysis is required,
together with a sophisticated valuation structure that reflects the value of data assets to the firm
The Information Value Chain
Thorough risk assessment is required throughout the value chain to ensure that the data stored within the
vendor and delivered to customers is regulatory compliant, technologically robust and ethically sound!
Alternative data carry greater risk than traditional data and these datasets may also introduce newer risk types
Alternative Data Adoption Alters Risk Exposure
The potential of new data sources to impact the investment
models and perhaps decision making, if:
• Applicability: where data is incorporated in the model
incorrectly
• Variability: where the trading signal generated is irregular or
inconsistent under certain conditions
• Integration: where the output of the model is improperly
linked to the trading process
IM firms may face the following risks due to the rise in demand
for data science and advanced analytical skills to process
alternative data:
• Loss of intellectual capital through talent turnover
• Impact on alternative data utilization ability due to delayed
training for existing employees
Firms may face these types of data risks due to immature risk control
processes at data providers
• Data provenance risk: Violation of the terms and conditions from
the data originator while scraping websites
• Accuracy/validity risk: Data may prove unreliable or produce an
inaccurate trading signal
• Material non-public information (MNPI) risk: Receipt of a dataset
containing MNPI could result in risk events
Regulations governing the use of alternative data are still in the
early stages of maturity. There are open questions about
acceptable practices with respect to the use of alternative data.
Furthermore, recent regulation introduces significant penalties for
leaks of personally identifiable information, which could be included in a
dataset received from a source
Data Risk Model Risk
Regulatory Risk Talent Risk
Define Value
Simplify
Entities
• Reconcile duplicative data assets and cleanse where appropriate to
drive data efficiency and minimise the risk of divergent and/or
conflicting data
• Link data from different sources together to realise network valuation
benefits
Access
• Document the accesses available per data source
Map
• Map all assets and
associated dictionaries
• Document existing
distribution and storage
approaches
Quality
• Assess data quality within
assets, focusing on Clarity &
Uniqueness, Validity &
Consistency, Timeliness &
Completeness and the
Accuracy, Credibility &
Confidence of the data
sources
Assess
• Third party risks
• Information compliance risks
e.g. GDPR
Plan
Internal
• Define and share explicit valuation methodology encompassing
collection, usage, storage coverage and governance
External
• Define appropriate pricing strategy for assets (exhaust data)
Maximising the value of a data estate requires a comprehensive mapping of the estate and embedding an appropriate
governance model prioritized by the estimated value of data
Mapping the Data Estate
Market Map & Gap
• Map the data assets to the
current and potential
consumers
• Match the demand for
analytics services with the
investment in data assets
Network Maximisation
• Close gaps in coverage to
realise network benefits of
connected data
• Enhance depth of assets with
proven value
People
A diverse talent pool is required to both build
and maintain the data engineering and
analytics structure, but also to support an
external signals managed service model:
• Data & Machine Learning Engineers -
expected to both build and maintain the
infrastructure, and productize models
developed within the data science pool
• Data Scientists - including image, NLP and
network specialists, in addition to more
traditional finance quant analysts
• Business Analysts - expected to contain
financial analysts capable of analyzing and
translating business requests into data
science problems
• Data Scouts - to explore new datasets that
appear in the market
Process
Building a frictionless signal factory platform
and the data science talent that supports it
must rely upon robust governance of data,
technology and talent:
• Clear duty segregation to minimize key
person risks and bottlenecks
• Models to support autonomy and agility
• Strong data governance and stewardship to
ensure that data management is scalable
without the need to scale effort
• Fail fast proof-of-concepts
• State of the art cyber security, to both ring-
fence sensitive data and prevent external
attacks
Creating and maintaining a signals factory requires a diverse talent pool as a foundation, well designed processes and a high
end technology stack but reduces costs and allows scaling
Developing a Signals Factory Proposition
Technology
A robust and well maintained technology
platform is critical to a signal factory success
with a partnership with a cloud supplier likely
to be a pre-requisite. Key considerations
include:
• Building in a cloud native fashion, to take
full advantage of elastic storage and
compute capabilities
• Support for a variety of data storage
paradigms (e.g. graph, key value,
columnar, relational etc.)
• Seamless integration of exploration tools,
e.g. Jupyter Notebooks, Tableau etc.
• Model management frameworks, to simplify
the promotion to production of models
(likely to involve containerization)
• Support for diverse hardware including
GPUs, FPGAs etc.
Valuation of Ingested Data Assets
As a non-depletable and non-degradable asset, data represents a unique valuation and backtesting challenge, particularly
pertinent in financial markets where the greater usage of an asset crowds out value.
Qualitative
A qualitative approach is likely required to
support a benchmark to measure/complement
other approaches. Considerations include:
• Cost of integration and storage
• Data quality of signal (degree of imputation
etc.)
• Depth and breadth of signal coverage
• Value of other similar signal assets
• Uniqueness of the dataset/signal
License & Latency
Constraining the number of consumers of high value
data feeds is a useful heuristic to prevent over-
exploitation but few data vendors do it. Consumers
should:
• Negotiate licensing or latency based consumption
constraints to ensure they either receive data that
other investors do not have access to or before the
market in general
• Factor in the absence of these constraints when
valuing vendors’ signal data
Profit Sharing
While complex profit sharing mechanisms create
feedback within a pricing system that incentivizes
both vendor and consumer to maximize the value
of a given signal asset, significant complexities
exist within:
• Implementation e.g. the negotiation of the
degree of profit share, exposure in the event of
signal failures
• Monitoring the agreed terms of the share
Value Maximisation
Strategies
Backtesting
Solid backtesting program to understand the
alpha from alternative data is needed, but one
needs:
• to account for the usually short history of
alternative data
• to incorporate the statistical uncertainty of the
backtesting results into the price of data
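One way to quantify the statistical uncertainty mentioned above is the standard error of the Sharpe ratio. The sketch below is an illustration only, not the presenters' method: it assumes IID returns and uses the common large-sample approximation SE ≈ sqrt((1 + SR²/2) / T); the function names and the 252-day year are assumptions.

```python
import math


def sharpe_standard_error(sharpe_per_period, n_periods):
    """Approximate standard error of a Sharpe ratio estimated from IID returns."""
    return math.sqrt((1.0 + 0.5 * sharpe_per_period ** 2) / n_periods)


def annualised_sharpe_interval(annual_sharpe, years, periods_per_year=252, z=1.96):
    """Rough confidence interval for an annualised Sharpe ratio."""
    scale = math.sqrt(periods_per_year)
    per_period = annual_sharpe / scale            # convert to a per-period Sharpe
    n = int(years * periods_per_year)             # number of return observations
    half_width = z * sharpe_standard_error(per_period, n) * scale
    return (annual_sharpe - half_width, annual_sharpe + half_width)


# Same backtested Sharpe ratio, short versus long history
short_history = annualised_sharpe_interval(1.0, years=2)
long_history = annualised_sharpe_interval(1.0, years=10)
```

With only two years of daily data the interval around a backtested Sharpe ratio of 1.0 is wide enough to include zero, which is the kind of uncertainty a data buyer may want reflected in the price.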
This publication has been written in general terms and we recommend that you obtain professional advice before acting or refraining from action on any of the contents of this
publication. Deloitte LLP accepts no liability for any loss occasioned to any person acting or refraining from action as a result of any material in this publication.
Deloitte LLP is a limited liability partnership registered in England and Wales with registered number OC303675 and its registered office at 2 New Street Square, London, EC4A 3BZ,
United Kingdom.
Deloitte LLP is the United Kingdom affiliate of Deloitte NWE LLP, a member firm of Deloitte Touche Tohmatsu Limited, a UK private company limited by guarantee (“DTTL”). DTTL and
each of its member firms are legally separate and independent entities. DTTL and Deloitte NWE LLP do not provide services to clients. Please see www.deloitte.com/about to learn
more about our global network of member firms.
© 2019 Deloitte LLP. All rights reserved.
The Book of Alternative Data
Use cases
A Guide for Investors, Traders and Risk Managers
Saeed Amen, Cuemacro
Co-authored by Alexander Denev & Saeed Amen
Case study: Federal Reserve
Communications Cuemacro
Index
Federal Reserve data
• Federal Reserve regularly communicates with markets
• Through speeches, statements, minutes etc.
• Market reacts to this!
• Can read publicly available communications from the web
• Create a dataset of web communications
• Apply NLP to determine the sentiment of individual texts
• Construct an index to give an overall view of FOMC sentiment
• Positive sentiment is hawkish whilst negative sentiment is dovish
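The steps above can be sketched as follows. This is a toy illustration, not the Cuemacro index: the word lists, the `score_text` and `build_index` names and the sample texts are all hypothetical, and a real implementation would use a proper NLP model rather than word counting.

```python
from collections import defaultdict
from datetime import date
from statistics import mean

# Hypothetical toy lexicons; a real system would use a trained NLP model.
HAWKISH = {"tighten", "tightening", "inflation", "hike", "overheating"}
DOVISH = {"accommodative", "easing", "cut", "slack", "downside"}


def score_text(text):
    """Return a score in [-1, 1]: positive is hawkish, negative is dovish."""
    words = [w.strip(".,;:!?").lower() for w in text.split()]
    hawk = sum(w in HAWKISH for w in words)
    dove = sum(w in DOVISH for w in words)
    total = hawk + dove
    return 0.0 if total == 0 else (hawk - dove) / total


def build_index(communications):
    """Average the per-text scores by date to form a sentiment index."""
    by_day = defaultdict(list)
    for day, text in communications:
        by_day[day].append(score_text(text))
    return {day: mean(scores) for day, scores in sorted(by_day.items())}


# Made-up example texts, not actual Fed communications
sample = [
    (date(2020, 1, 29), "Policy remains accommodative given downside risks."),
    (date(2020, 1, 29), "Some members see inflation risks and favour a hike."),
    (date(2020, 3, 15), "The Committee decided to cut rates; further easing is possible."),
]
fed_index = build_index(sample)
```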
Fed sentiment vs. UST10Y yield
changes
• Can see a relationship between them, as we would expect
Case study: CLS FX flow data
to trade FX spot
CLS data
• FX is a more fragmented market than other asset classes
• Vast majority is OTC
• Many different trading venues
• Bilateral trading
• Difficult to find comprehensive FX volume & flow data
• CLS settle most OTC deliverable FX – coverage over 50% of market
• They collect and distribute
• Hourly FX volume data
• Hourly FX flow data for price takers
• 30 minute lag – historical data since late 2012
Create fund FX flow index
• Use fund FX flow data – tends to be more directional and positively
correlated with spot
• Create fund FX flow index
• Buy spot when very positive
• Sell spot when very negative
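A minimal sketch of this rule, under stated assumptions: "very positive" and "very negative" are taken to mean a z-score beyond a threshold relative to recent history. The window, the threshold, the `flow_signals` name and the sample flows are hypothetical, not the parameters used in the study.

```python
from statistics import mean, pstdev


def flow_signals(flows, window=20, threshold=1.0):
    """Map net fund flow to +1 (buy spot), -1 (sell spot) or 0 (flat).

    Each observation is z-scored against the preceding `window` values,
    so the signal only uses information available at the time.
    """
    signals = []
    for i, flow in enumerate(flows):
        history = flows[max(0, i - window):i]
        if len(history) < 2:
            signals.append(0)  # not enough history yet
            continue
        sd = pstdev(history)
        z = 0.0 if sd == 0 else (flow - mean(history)) / sd
        if z > threshold:
            signals.append(1)
        elif z < -threshold:
            signals.append(-1)
        else:
            signals.append(0)
    return signals


# Hypothetical net buy-minus-sell fund flows for one currency pair
flows = [10, -5, 8, -12, 3, 60, -2, 4, -70, 1]
signals = flow_signals(flows, window=5)
```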
Creating daily & hourly flow baskets
• Create trading baskets for daily and hourly flow strategies
• Historically, improves risk adjusted returns vs trend alone
• In-sample (left) and out-of-sample (right)
• Flow outperforms trend out-of-sample
Case study: Geospatial
Insights satellite data to
estimate EPS
Geospatial Insights: RetailWatch
• It is well known that satellite photography can be used to help
forecast earnings per share for retail stocks
• Has been used extensively in US markets (Orbital Insight), but not
as much for European firms
• Uses car counts as a proxy for retail activity
• RetailWatch covers a number of European retailers (both publicly
traded and private companies)
• Relatively new dataset
Using car counts to estimate EPS
• Created a car count score based upon the 6 months of activity
related to the earnings period
• Compare against Bloomberg’s consensus and actual EPS
• Present results for Marks & Spencer
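The slides do not give the exact score construction, so the following is only one possible sketch. It assumes the score is the change in average car count versus a comparison window, and it compares the score's sign with the sign of the EPS surprise (actual minus consensus). All names and numbers are hypothetical.

```python
from statistics import mean


def car_count_score(current_counts, prior_counts):
    """Relative change in average car count versus a comparison window."""
    return mean(current_counts) / mean(prior_counts) - 1.0


def direction_hit_rate(scores, actual_eps, consensus_eps):
    """Share of periods where the score's sign matches the EPS surprise's sign."""
    hits = 0
    for score, actual, consensus in zip(scores, actual_eps, consensus_eps):
        surprise = actual - consensus
        if (score > 0) == (surprise > 0):
            hits += 1
    return hits / len(scores)


# Hypothetical monthly car counts for two six-month windows
score = car_count_score([105, 110, 108, 112, 115, 110], [100] * 6)

# Hypothetical scores and EPS figures for three earnings periods
hit_rate = direction_hit_rate(
    scores=[0.10, -0.05, 0.02],
    actual_eps=[1.2, 0.9, 1.0],
    consensus_eps=[1.0, 1.0, 1.1],
)
```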
Case Study: Saving “alpha”
with transaction cost
analysis
TCA to “save” alpha
• Big Data and alternative data isn’t just for generating alpha
• It can also be used to “save” alpha, to reduce our transaction
costs
• How much is each LP charging?
• Is one algo better than another?
• tcapy is a Python based library by Cuemacro which does
transaction cost analysis to identify how much traders are paying
for their liquidity
• Needs high frequency market tick data and also trade data from
the client
• Will do a quick demo if there’s time
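The core calculation behind such an analysis can be sketched in plain Python. This does not use or reproduce the tcapy API; the function names, the trade fields and the sign convention (negative slippage is a cost) are assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean


def slippage_bps(side, exec_price, benchmark_mid):
    """Signed slippage in basis points; negative means a cost to the trader."""
    direction = 1 if side == "buy" else -1
    return -direction * (exec_price - benchmark_mid) / benchmark_mid * 1e4


def average_slippage_by(trades, key):
    """Average slippage grouped by a trade field such as 'broker' or 'algo'."""
    groups = defaultdict(list)
    for t in trades:
        groups[t[key]].append(slippage_bps(t["side"], t["price"], t["mid"]))
    return {name: mean(values) for name, values in groups.items()}


# Hypothetical client trades, each paired with the market mid at execution time
trades = [
    {"broker": "LP1", "side": "buy", "price": 1.1002, "mid": 1.1000},
    {"broker": "LP1", "side": "sell", "price": 1.0999, "mid": 1.1000},
    {"broker": "LP2", "side": "buy", "price": 1.1005, "mid": 1.1000},
]
by_broker = average_slippage_by(trades, "broker")
```

Grouping by a different key (for example a hypothetical `"algo"` field) answers the "is one algo better than another?" question in the same way.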
Detailed screen
• Plot a specific currency pair over a period of time, breaking down results by broker, algo etc.
Plotting of trades/orders
• We can plot the trades/orders in the web app alongside market data
Aggregated timeline of metrics (eg.
slippage)
Aggregated metrics by ticker/venue
Aggregated distribution of slippage
Case study: Bloomberg News
to trade FX spot
@saeedamenfx / Copyright Cuemacro
Unstructured & structured news data
• Unstructured news data
• Read news articles, blogs etc. in their raw text form, then clean and then
directly apply text based analysis to add tagging and other fields
• Very time consuming as we need to handle large amounts of data and
also need to do natural language processing, which is non trivial
• Structured news data
• Vendors process a large amount of news from numerous sources into a
more manageable dataset for us to explore
• Data more easily accessible with additional fields (eg. tagging topics)
• Traders can concentrate on creating effective trading rules and running
risk, rather than spending that time dealing with cleaning up massive
quantities of unstructured news
Automating news filtering
• Using news to trade markets is not a new idea
• A trader essentially “filters” news into the “signal and the noise”
• But there is simply too much news for humans to read!
• How can we read news in an automated fashion?
• Easier to use structured news datasets
• However, what news filters do we use?
• News related to unemployment?
• Buy/sell signals?
[Chart: US Jobless Claims vs. NI UNEMPLOY BBG count (smoothed), 2002-2014]
General approach to news filtering
• Several approaches
• Pick words or sectors which are relatively generic (and also intuitive) like “job cuts”
• The approach to this “picking” depends on our data source, each one is different
• Fit the best words according to a backtest!
• “Fitting” words which are not obviously related is data mining
• Resulting model will likely be unstable when run live
• Also caution when using hindsight to pick words
• For example, “Greek debt crisis” was obvious
• But only after the event!
• NT<GO> is a nice way to visualise news
• Bloomberg has machine readable news
• Use natural language processing
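The "pick generic, intuitive words" approach can be sketched as a simple keyword count per month. The keyword list, the `keyword_counts` name and the sample headlines are hypothetical; the point is that the words are chosen up front for being intuitive, not fitted to a backtest.

```python
from collections import Counter
from datetime import date

KEYWORDS = ("job cuts", "layoffs")  # hypothetical, deliberately generic phrases


def keyword_counts(articles, keywords=KEYWORDS):
    """Count articles per (year, month) whose headline contains any keyword."""
    counts = Counter()
    for day, headline in articles:
        text = headline.lower()
        if any(k in text for k in keywords):
            counts[(day.year, day.month)] += 1
    return counts


# Made-up headlines for illustration
articles = [
    (date(2009, 1, 5), "Manufacturer announces job cuts as orders fall"),
    (date(2009, 1, 20), "Retailer plans layoffs after weak holiday sales"),
    (date(2009, 2, 3), "Central bank holds rates steady"),
]
monthly = keyword_counts(articles)
```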
[Chart: BBG count of Greek Debt Crisis keywords, 2008-2014]
Specific steps for text datasets
• We can formulate a few generic steps that are used when dealing with a text based
dataset for trading purposes
• Raw data collection – web scraping and accessing internal databases
• Cleaning dataset – removing HTML tags and invalid observations
• Structuring dataset – adding tags (eg. sentiment) and compress into single database record
• Filtering dataset – choose most relevant entities/topics to prune search space
• Create an indicator – aggregate records to create indicators
• Apply a trading rule to the indicator – how to convert into buy/sell signals directly or added to other
trading factors (eg. carry)
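The cleaning, structuring and filtering steps can be sketched as below. This is a simplified illustration: the regex-based tag stripping is crude compared with a real HTML parser, and the `Record` type, topic keyword map and sample items are hypothetical.

```python
import html
import re
from dataclasses import dataclass
from datetime import datetime

TAG_RE = re.compile(r"<[^>]+>")  # crude tag matcher, adequate for a sketch


@dataclass
class Record:
    timestamp: datetime
    text: str
    topics: tuple


def clean(raw):
    """Strip HTML tags, decode entities and collapse whitespace."""
    no_tags = TAG_RE.sub(" ", raw)
    return " ".join(html.unescape(no_tags).split())


def structure(raw_items, topic_keywords):
    """Turn (timestamp, raw_html) pairs into tagged records, dropping empties."""
    records = []
    for ts, raw in raw_items:
        text = clean(raw)
        if not text:
            continue  # invalid observation: nothing left after cleaning
        lowered = text.lower()
        topics = tuple(topic for topic, kws in topic_keywords.items()
                       if any(k in lowered for k in kws))
        records.append(Record(ts, text, topics))
    return records


def filter_by_topic(records, topic):
    """Keep only records tagged with the given topic."""
    return [r for r in records if topic in r.topics]


# Made-up raw items for illustration
raw_items = [
    (datetime(2020, 9, 1, 8, 0), "<p>Fed signals rates on hold &amp; more QE</p>"),
    (datetime(2020, 9, 1, 9, 0), "<div></div>"),
    (datetime(2020, 9, 2, 7, 30), "<p>ECB weighs euro strength</p>"),
]
topic_keywords = {"FED": ("fed", "fomc"), "ECB": ("ecb",)}
records = structure(raw_items, topic_keywords)
fed_records = filter_by_topic(records, "FED")
```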
Using Bloomberg News dataset
• We shall use a dataset consisting of Bloomberg News articles from 2009-2017
• It is a structured dataset, which saves time (eg. we avoid the time consuming raw
data collection step)
• Bloomberg News is written in a consistent style, so easier to process than general
web content
• Each news article has a number of fields tagged including:
• Timestamp of news article
• Title of news article
• Text body of the news article
• Tagging for tradable tickers related to the news (eg. %EUR for EURUSD)
• Tagging for the topic related to the news (eg. FED for articles related to Federal Reserve)
• Company specific news also has additional news analytics fields such as sentiment,
readership statistics etc.
• Topics we choose will depend on underlying dataset
Generate news signals for FX
• We want to use news to inform FX trading strategies
• Want to develop longer term strategies (ie. not high frequency headline trading)
• Hence, focus will be on macro specific news to trade FX in particular
• Tickers: %EUR, %GBP, %AUD, %NZD, %USD, %CAD, %NOK, %SEK and %JPY
• Topics: FED and ECB
• Could have chosen many other relevant macro topics
• Helps us prune the search space to most relevant news
• Steps we shall do
• Clean body text slightly (eg. remove start of article)
• Ignore very short articles as difficult to gauge sentiment
• Apply sentiment analysis for each article (shall use open source Python based libraries)
• Aggregate data into daily observations (careful about holidays!)
• Create indices for each currency/topic (Z scores for comparability)
• Also generate a news volume score (Z score for comparability)
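The aggregation steps listed above might look roughly like the sketch below. It assumes each article already carries a sentiment score from whichever library is used; the `MIN_WORDS` threshold, the function names and the sample articles are hypothetical, and holiday handling is deliberately left out.

```python
from collections import defaultdict
from datetime import date
from statistics import mean, pstdev

MIN_WORDS = 5  # hypothetical cut-off for "very short" articles


def daily_aggregate(articles, min_words=MIN_WORDS):
    """articles: iterable of (day, body_text, sentiment).

    Returns {day: (mean sentiment, article count)}, skipping short articles.
    """
    by_day = defaultdict(list)
    for day, body, sentiment in articles:
        if len(body.split()) >= min_words:
            by_day[day].append(sentiment)
    return {d: (mean(v), len(v)) for d, v in sorted(by_day.items())}


def zscore(series):
    """Z score each value of a {day: value} series over the full sample."""
    values = list(series.values())
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return {d: 0.0 for d in series}
    return {d: (x - mu) / sigma for d, x in series.items()}


# Made-up articles with made-up sentiment scores, for illustration only.
articles = [
    (date(2017, 1, 2), "Fed holds rates steady as expected today", 0.2),
    (date(2017, 1, 2), "Fed signals gradual hikes later this year", 0.4),
    (date(2017, 1, 2), "Too short", 0.9),
    (date(2017, 1, 3), "ECB extends bond buying programme into next year", -0.3),
    (date(2017, 1, 4), "Dollar slips after cautious Fed minutes are released", 0.0),
]
daily = daily_aggregate(articles)
sentiment_z = zscore({d: s for d, (s, _) in daily.items()})
volume_z = zscore({d: n for d, (_, n) in daily.items()})
```

Note that a full-sample Z score uses future information; in a backtest a rolling or expanding window would be the safer choice.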
Currency pair sentiment score
• Currency pair score = base score – terms score
• When the pair score (eg. for USD/JPY) is positive, buy the pair; otherwise sell
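As a small worked example of the rule above, the pair score and the resulting signal can be written as follows. The per-currency scores are made-up numbers and the function names are hypothetical.

```python
def pair_score(currency_scores, pair):
    """Currency pair score = base score - terms score, for a pair like 'USD/JPY'."""
    base, terms = pair.split("/")
    return currency_scores[base] - currency_scores[terms]


def signal(score):
    """+1 (buy the pair) when the score is positive, otherwise -1 (sell)."""
    return 1 if score > 0 else -1


# Made-up per-currency news sentiment Z scores.
scores = {"USD": 0.8, "JPY": -0.4, "EUR": 1.1}
usdjpy = pair_score(scores, "USD/JPY")   # 0.8 - (-0.4), roughly 1.2
usdjpy_signal = signal(usdjpy)           # positive score, so buy
```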
News trading rule by currency pair
• Present risk adjusted returns and compare to a generic trend following strategy
• Apply vol targeting in each instance
• News based trading rule outperforms trend significantly in our sample
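The slides do not say how vol targeting is implemented. One common form, sketched here under that assumption, scales the position by target vol divided by trailing realized vol, using only returns observed before each day. The 10% target, 20 day window and leverage cap are hypothetical parameters.

```python
from math import sqrt
from statistics import stdev

TRADING_DAYS = 252  # annualisation factor (assumption)


def vol_target_leverage(returns, target_vol=0.10, window=20, max_leverage=5.0):
    """Leverage for each day t, computed only from returns strictly before t."""
    leverage = []
    for t in range(len(returns)):
        past = returns[max(0, t - window):t]
        if len(past) < 2:
            leverage.append(0.0)  # not enough history to estimate vol
            continue
        realized = stdev(past) * sqrt(TRADING_DAYS)
        if realized > 0:
            leverage.append(min(max_leverage, target_vol / realized))
        else:
            leverage.append(0.0)
    return leverage


# Made-up daily strategy returns alternating +1% / -1%.
rets = [0.01, -0.01] * 15
lev = vol_target_leverage(rets)
scaled = [l * r for l, r in zip(lev, rets)]  # vol targeted returns
```

Applying the same scaling to both the news rule and the trend rule puts their risk adjusted returns on a comparable footing.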
News trading rule as basket
• Create news and trend baskets
• News basket heavily outperforms trend basket
What about news volume?
• News volume on a currency pair is heavily correlated with its implied volatility, which
seems intuitive!
• T statistics show a statistically significant relationship in nearly every currency pair in
our sample
• News volume can be used to help us model FX volatility – is FX volatility in line with
what we could expect based on newsflow?
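To make the T statistic concrete, here is a minimal sketch of a one-variable OLS regression of implied vol on news volume, returning the slope and its t statistic. The talk does not specify the regression set-up, so this is an assumption; the data points are made up, and plain OLS standard errors ignore any autocorrelation in the series.

```python
from math import sqrt
from statistics import mean


def slope_t_stat(x, y):
    """OLS of y on x with an intercept; returns (slope, t statistic of slope).

    Requires more than two observations and a non-perfect fit.
    """
    n = len(x)
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    beta = sxy / sxx
    alpha = my - beta * mx
    rss = sum((yi - alpha - beta * xi) ** 2 for xi, yi in zip(x, y))
    se_beta = sqrt(rss / (n - 2) / sxx)
    return beta, beta / se_beta


# Made-up observations: news volume score (x) and implied vol (y).
news_volume = [1.0, 2.0, 3.0, 4.0, 5.0]
implied_vol = [2.1, 3.9, 6.2, 7.8, 10.1]
beta, t_stat = slope_t_stat(news_volume, implied_vol)
```

A t statistic well above roughly 2 in absolute value is the usual rule of thumb for a statistically significant slope.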
Scheduled events
• Before scheduled events, FX vol market makers will mark up vol curve
• Known as event volatility add-on
• LHS shows EUR/USD ON vol on Fed days, and RHS on ECB days (ignores all other days)
• Have a model for estimating the add-on (assumes only one big event per day)
• Typically, realized vol underperforms implied on these days. Sell vol*!
• *within reason…
News for scheduled events
• Can we use news around scheduled events (eg. FED and ECB topics in our case) to inform where the add-on is?
• And also to give us an idea of where realized vol would be subsequently? Gamma traders are taking a view on where implied minus realized will be
• There does seem to be a relationship between EUR/USD vol and news before FOMC
and ECB meetings
EUR/USD vol and news on FOMC days
• Showing news volume versus add-on, implied and realized ON in EUR/USD on FOMC
days
EUR/USD vol and news on ECB days
• Showing news volume versus add-on, implied and realized ON in EUR/USD on ECB
days
Conclusion
• Alternative data primer, introducing the topic
• Talked about where to find data
• Showed examples of how to generate (and save!) alpha using alternative data, examining:
• CLS FX flow data to generate FX trading signals
• Text based datasets for Fed communications
• Geospatial Insights satellite imagery to estimate EPS
• tcapy to reduce trading costs for FX
Any questions?
• Drop me an e-mail at saeed@cuemacro.com, ring me or tweet to @saeedamenfx (or even talk to me now, the old school way!)
Demos, slides and video available on QuAcademy
Go to www.qu.academy
Instructions for the Lab:
1. Go to https://academy.qusandbox.com/#/register and register using the code:
"QUSUMMERSCHOOL"
See you in FALL 2020!
Thank you!
Sri Krishnamurthy, CFA, CAP
Founder and CEO
QuantUniversity LLC.
srikrishnamurthy
www.QuantUniversity.com
Contact
Information, data and drawings embodied in this presentation are strictly a property of QuantUniversity LLC. and shall not be
distributed or used in any other publication without the prior written consent of QuantUniversity LLC.
TataKelola dan KamSiber Kecerdasan Buatan v022.pdf
 
ENGLISH6-Q4-W3.pptxqurter our high choom
ENGLISH6-Q4-W3.pptxqurter our high choomENGLISH6-Q4-W3.pptxqurter our high choom
ENGLISH6-Q4-W3.pptxqurter our high choom
 

Frontiers in Alternative Data : Techniques and Use Cases

  • 1. Qu Speaker Series Frontiers in Alternative Data : Techniques and Use Cases 2020 Copyright QuantUniversity LLC. Hosted By: Sri Krishnamurthy, CFA, CAP sri@quantuniversity.com www.qu.academy 09/22/2020 Online https://quspeakerseries9.splashthat.com/
  • 2. 2 QuantUniversity • Boston-based Data Science, Quant Finance and Machine Learning training and consulting advisory • Trained more than 1000 students in Quantitative methods, Data Science and Big Data Technologies using MATLAB, Python and R • Building a platform for AI and Machine Learning Exploration and Experimentation
  • 3. For registration information, go to https://QuSummerSchool.splashthat.com 3
  • 4. 4 For registration information, go to https://QuFallSchool.splashthat.com
  • 6. 6 Frontiers in Alternative Data : Techniques and Use Cases
  • 7. 7
  • 9. 9 The Book of Alternative Data
  • 10. 1 The Book of Alternative Data S E P T E M B E R 2 0 2 0
  • 11. 2 The Book of Alternative Data Maximising Data Value, a Vendor Perspective | Deloitte LLP 2019 • Co-authored by Alexander Denev and Saeed Amen • The Book of Alternative Data (on Wiley) • Hardback available on Amazon USA now (elsewhere in Sep) • Kindle available on Amazon worldwide • Presentation is based on the book
  • 12. 3 • Common properties • Less commonly used by market participants • Tends to be more expensive • Often outside financial markets (is tick data “alternative”?) • Shorter history • More challenging to use • “Exhaust data” a byproduct of other processes • Digital footprint from individual and corporate activity • Resulted in a rapid rise in the number of alternative datasets • Can provide an additional revenue stream for those who collect “exhaust data” • Not all alternative data is necessarily Big Data (but it can be!) Saeed Amen / @saeedamenfx What is alternative data?
  • 13. Alternative data & investments case studies: Several clear case studies have emerged demonstrating the value of analytics in combination with alternative data applied to the investment process
      • ON-LINE PRICE = INFLATION: Global FSI Firm employs technology to track prices of 5 million products on-line to understand price shocks and monitor shifts in inflation across 70 countries [1]
      • MOBILE FOOT TRAFFIC = ECONOMY: Hedge Funds using location data pulled from mobile devices to predict outlook on economy and REIT values [4]
      • SOCIAL + SEARCH = EARNINGS: $90B AUM Global Asset Manager mines search engine data combined with social-media data to predict results of corporate events like quarterly earnings [3]
      • SATELLITE + SHIPS = MISPRICED SECURITY: Hedge fund using satellite intelligence on ships and tank levels to identify upcoming impact to oil producers and commodity prices [5]
      • WEB + TWITTER = MARKET MOVING EVENT: Data provider using 300M Websites, 150M Twitter feeds in combination with analyst presentations and FactSet reports to measure rise up media food chain (e.g. blogs to newswire) to highlight potentially market moving events [6]
      • APP + CREDIT CARD = PERFORMANCE: Hedge Fund looks at combination of alternative data including credit card transactions, geo-location, and app downloads to analyze burger chain performance [2]
    Sources:
      [1] Innovative Asset Managers, Eagle Alpha.
      [2] “Foursquare Wants To Be The Nielsen Of Measuring The Real World,” Research Briefs, CBInsights, June 8, 2016.
      [3] Simone Foxman and Taylor Hall, “Acadian to Use Microsoft's Big Data Technology to Help Make Bets,” Bloomberg, March 7, 2017.
      [4] Rob Matheson, “Measuring the Economy With Location Data,” MIT News, March 27, 2018.
      [5] Fred R. Bleakley, “CargoMetrics Cracks the Code on Shipping Data,” Institutional Investor, February 04, 2016.
      [6] Accern website.
    AUM of UK-Based Man Group’s AI/Analytics driven AHL Dimension fund up 5x over 3 years
      • Accelerating AI Adoption: Deployed AI (Artificial Intelligence) techniques to four additional funds managing $12.3B
      • Varied Data Sources: Processes terabytes of data ranging from weather forecasts to container ship movements
      • Increasing Valuation: Man Group’s stock price has increased by 55% from January to October 2017
      • AI Driving Profit: Artificial intelligence contributed roughly 50% of 2015 profits for the AHL Dimension Fund
    Source: Adam Satariano, “The Massive Hedge Fund Betting on AI,” Bloomberg, September 27, 2017.
  • 14. 5 • Volume (increasing) – lots of data • Variety (increasing) – not just numerical data, can be text, image, video etc. • Velocity (increasing) – speed that data is being generated • Variability (increasing) – inconsistencies in the data • Veracity (decreasing) – difficult to tell if accurate (e.g. social media) • Value (increasing) – business value of the data Saeed Amen / @saeedamenfx The Vs of Big Data
  • 15. 6 Quantitative investment strategies and vendor solutions with alpha generation capabilities are becoming critical component to the return of the buy and sell side’s ROE to pre crisis levels Addressing Market Challenges Systematic/quant Investors, typically building their own analytics Who: • Hedge Funds • Sophisticated Buy Side Firms Key Challenges: • Access to good quality raw data or to curated alternative data • Maintaining access to cutting edge technology and algorithms Customer Needs: • Co-location of analytics and data • Simplified access to data and computation • Simplified, but bespoke, data access Sophisticated Quants Most intuitive solutions needed. Limited technology and programming capability Who: • Smaller Sell Side (DSIBs) • Small Buy Side + Family offices Key Challenges: • Reducing technology costs associated with efficient research tools • Building/maintaining an edge against passive benchmark returns Customer Needs: • Simplified access to data and computation • Curated Signals • Sophisticated, but low maintenance/build cost analytics platforms • Elastic access to analytics and associated data science talent Traditional Investors Interested in derived analytics and more intuitive solutions Who: • Large Sell Side (GSIBs) • Traditional Buy Side Firms Key Challenges: • Reducing technology costs associated with efficient research tools • Retention and expansion of innovation talent Customer Needs: • Simplified access to data and computation • Curated Signals • Simplified, but bespoke, data access Traditional Quants Sophisticated but ultra small scale with a focus on highly scalable business models Who: • Alternative Data Providers • Signal Factories Key Challenges: • Simplified access to data • Ability and agility to scale • Support of cutting edge algorithms and alternative data sets Customer Needs: • Simplified access to data and computation • Simplified, but bespoke, data access • Sophisticated, but low maintenance/build cost analytics 
platforms • Marketplace creation Fintechs Sources: Macro trends database, January 2019; The Shift From Active to Passive Investing: Potential Risks to Financial Stability?, Federal Reserve System 2019; Deloitte Global Cost Survey, 2019; Alternative data for investment decisions: Today’s innovation could be tomorrow’s requirement. Deloitte Centre for Financial Services, 2018 Maximising Data Value, a Vendor Perspective | Deloitte LLP 2019
  • 16. 7 Maximising Data Value, a Vendor Perspective | Deloitte LLP 2019 The Rising Adoption of Alternative Data Hedge funds were the innovators in this space, but the technology is reaching a tipping point and may see exponential growth over the next year Alternative data adoption curve – investment management constituents by phase Largely hedge funds aggressively seeking information advantage Likely constituents Aggressive long-only managers and PE firms Tech savvy large complex IM firms Traditional large complex IM firms Firms reluctant to embrace new approaches Innovators Early adopters Early majority Late majority Laggards With large scale adoption of alternative data, early majority firms may face regulatory and talent risks Late majority firms and laggards may face strategic risks as they defer or decline the use of alternative assets Innovators and early adopters faced data and model risks as data sets were sourced from nontraditional, heterogeneous sources
  • 17. Getting to Grips with New Data Sources and Techniques: Investors are increasingly spending on alternative data, but building data science and engineering teams, and the associated analytics platforms to fully harness such diverse data, remains a significant barrier for all but the largest firms.
    Market Trends: Buy side spend on alternative data has increased over the previous 3 years and is expected to continue to grow:
      • Poor active investment performance is driving a shift to passive products and fee compression
      • Active investing strategies are starting to require more diverse data to generate strong alpha and beta predictive signals
      • Savings from bundling of data streams are not currently possible due to the segmentation of the data providers market but are becoming highly desirable
    Average buy side spend on datasets ($k):
      AUM              2016    2017    2018E   2019E
      <$1bn              35      63      107     158
      $1bn - $10bn      213     340      506     764
      >$10bn          1,288   1,954    3,104   4,041
      Buy Side Avg.     841   1,267    2,005   2,640
    Total Buy Side Spend on Alternative Data ($m): [chart]
    Source: Alternativedata.org
    Barriers to Entry: Setting up a data science/engineering team capable of harnessing alternative data signals can be both expensive and time consuming:
      • A diverse talent pool, typically not found within existing functions, is required to find, analyse, model and productionise alternative insights
      • The technology infrastructure required to integrate alternative and traditional datasets further increases costs, serving as a pre-requisite for analysis
      • Processes that are not optimally engineered (e.g. by inappropriate staffing) often lead to technical debt, production failures and associated costs
      • Alternative data variability makes proactive quality monitoring and remediation an issue in which significant resources are invested
    Annual Salaries of Associated Talent:
      Role             Entry Level Salary   Bonus
      Data Analyst     $80k-$100k           ~25%
      Data Scientist   $80k-$100k           ~40%
      Data Scout       $70k-$90k            ~15%
      Data Engineer    $80k-$110k           ~30%
      Head of Data     $250k-$1,000k        ~100%
    Minimum Data Team: 1 Head of Data, 1 Data Scientist, 1 Data Engineer, 1 Data Scout, 3 Data Analysts. Anticipated minimum spend between $1m-$2m p/a, dependent on technology maturity, existing talent and complexity of ambition
  • 18. The Buy Side is Increasing its Focus on Alternative Data: The majority of the buy side believe alternative data will positively impact their investment performance. Deloitte has surveyed over 100 investment managers (IMs) and has observed significant technological, talent and risk challenges that integrating such diverse data presents.
      • IM firms’ opinion about the impact of alternative data on investment processes (chart values as extracted: 10%, 40%, 42%, 8%):
          • Minimal impact
          • Some impact; firms that utilise alternative data early may see some temporary advantages
          • Alternative data leaders will see sustained advantages in some asset classes
          • Alternative data represents a secular change in IM and expertise in this area will separate winners and losers over the next 5 years
      • What is your organization’s status for utilizing alternative data? (chart values as extracted: 13%, 9%, 49%, 29%):
          • No part of the strategic plan
          • Considering it, but no action at this point
          • Currently using alternative data in a test environment
          • Using alternative data to augment portfolio management decisions
      • Do you think utilization of alternative data (or not) presents new or different risks to IM firms? (chart values as extracted: 8%, 11%, 51%, 15%, 15%):
          • No, it’s business as usual
          • Our existing risk mgmt. framework can be adapted in the normal course of business to handle alternative data
          • A fresh look at the risks associated with this development is appropriate
          • The issues presented by alternative data are significant; our firm needs to refresh the risk mgmt. framework to assess them
          • Other
      • Our organization’s adoption journey for alternative data utilization will likely include, or already includes (chart values as extracted: 46%, 15%, 9%, 14%, 16%):
          • Proprietary platforms and processes
          • Use of alternative data aggregators or brokers to facilitate acquisition
          • Use of alternative data crowd-sourced insights supplied by vendors
          • Use of insights developed by sell-side analysts from alternative data
          • More than one of these
    Source: c. 110 responses from IM firms from the polls conducted during the Alternative Data Dbrief session on April 24, 2018. Data has been cleaned to exclude blank and ‘Don’t Know/Not Applicable’ responses
  • 19. 10 Data as a Service Infra/Platform as a Service Analytics as a Service Minimally refined data supplied directly to customers. State of the art provides: • Connected data, via a single point of access, and the ability to customize the data feed to a client’s specific requirements • Cleansed data with appropriate imputation and normalised data concepts and entities Flexible cloud infrastructure (and platforms) provisioned with simplified access to Data. State of the art provides: • Simplified access to data, while improving usage monitoring • Co-located cloud infrastructure capable of supporting ultra low latency algorithmic decisions (and reducing comms infra costs) • Access to cloud based elastic/burst computing capabilities and a variety of price point storage solutions Analytics data platform built upon IaaS/PaaS with pre-built environments for large scale. State of the art provides: • Simplified access to data processing, providing off-the-rack data platform solutions that can be readily accessed • App store engagement model that fosters agile fintech ecosystem Combining AaaS model with a diverse data science talent pool. State of the art provides: • Access to seasoned users of the AaaS platform • Access to rare skill sets such as graph theory, natural language processing, image processing etc. able to generate signals from data outside customer competencies • Ultra flexible staffing model minimising overheads for R&D efforts Pre-generated signals that are sold to clients at a premium. State of the art provides: • Pre-built signals targeting market segments and use cases; where alternative data is used a series of robust quality checks • Support for 3rd party vendors (i.e. 
those employing AaaS) to sell signals • Utilize spare capacity within the Managed Analytics service Managed Analytics Service Signal as a Service Primary Buyers Sophisticated quants who build their own analytics and associated platforms e.g.: • Large Sell Side institutions • Quantitatively advanced Hedge Funds As per DaaS, with greater focus on latency dependent trading strategies. • Large scale seeking ultra low latency • Mid-Scale unable/unwilling to develop complex, data-processing centric, cloud platforms • Large scale looking to simplify path to innovation • Fintechs seeking lean data science focused operating model • Mid-Scale unable/unwilling to develop complex, data-processing centric, cloud platforms • Large scale looking to simplify path to innovation • Dependent on nature and pricing strategy of signal • Smaller Scale Wealth Managers 1 2 3 4 5 Individual investor firms must assess where their comparative advantage exists and opt for a consumption pattern that maximizes their return on investment from data Understanding Comparative Advantage in Data Example Providers Data Vendors: • IHS Markit • Bloomberg • Refinitiv • Euroclear (developing) • Deutsche Borse (developing) Data Vendors: • Refinitiv Other Examples: • Google, Amazon, Microsoft (without co-lo) Data Vendors: • No comprehensive/deep offerings within the major players Other Examples: • Generic analytics vendors e.g. SAS, Cloudera, Pivotal Data Vendors: • No major players (however Quandl model is similar) Other Examples: • Prof. services; e.g. Deloitte, Accenture, BCG Gamma etc. Data Vendors: • IHS Markit (Research Signals service ~60 clients) • Refinitiv (white papers on signals) • Quandl Maximising Data Value, a Vendor Perspective | Deloitte LLP 2019
  • 20. 11 In order to realise maximum value for the data assets a combination of prioritization, enhancement and analysis is required, together with a sophisticated valuation structure that reflects the value of data assets to the firm The Information Value Chain Thorough risk assessment is required throughout the value chain to ensure that the data stored within the vendor and delivered to customers is regulatory compliant, technologically robust and ethically sound! Maximising Data Value, a Vendor Perspective | Deloitte LLP 2019
  • 21. Alternative Data Adoption Alters Risk Exposure: Alternative data carry greater risk than traditional data and these datasets may also introduce newer risk types
      • Model Risk: New data sources have the potential to impact investment models, and perhaps decision making, in the following ways:
          • Applicability: where data is incorporated in the model incorrectly
          • Variability: where the trading signal generated is irregular or inconsistent under certain conditions
          • Integration: where the output of the model is improperly linked to the trading process
      • Talent Risk: IM firms may face the following risks due to the rise in demand for data science and advanced analytical skills to process alternative data:
          • Loss of intellectual capital through talent turnover
          • Impact on alternative data utilization ability due to delayed training for existing employees
      • Data Risk: Firms may face these types of data risks due to immature risk control processes at data providers:
          • Data provenance risk: Violation of the terms and conditions from the data originator while scraping websites
          • Accuracy/validity risk: Data may prove unreliable or produce an inaccurate trading signal
          • Material non-public information (MNPI) risk: Receipt of a dataset containing MNPI could result in risk events
      • Regulatory Risk: Regulations governing the use of alternative data are still in the early stages of maturity. There are open questions about acceptable practices with respect to the use of alternative data. Furthermore, recent regulation introduces significant penalties for leaks of personally identifiable information, which could be included in a dataset received from a source
  • 22. 13 Define Value Simplify Entities • Reconcile duplicative data assets and cleanse where appropriate to drive data efficiency and minimise the risk of divergent and/or conflicting data • Link data from different sources together to realise network valuation benefits Access • Document the accesses available per data source Map • Map all assets and associated dictionaries • Document existing distribution and storage approaches Quality • Assess data quality within assets, focusing on Clarity & Uniqueness, Validity & Consistency, Timeliness & Completeness and the Accuracy, Credibility & Confidence of the data sources Assess • Third party risks • Information compliance risks e.g. GDPR, Plan Internal • Define and share explicit valuation methodology encompassing collection, usage, storage coverage and governance External • Define appropriate pricing strategy for assets (exhaust data) Maximising the value of a data estate requires a comprehensive mapping of the estate and embedding an appropriate governance model prioritized by the estimated value of data Mapping the Data Estate Market Map & Gap • Map the data assets to the current and potential consumers • Match the demand for analytics services with the investment in data assets Network Maximisation • Close gaps in coverage to realise network benefits of connected data • Enhance depth of assets with proven value Maximising Data Value, a Vendor Perspective | Deloitte LLP 2019
  • 23. 14 People A diverse talent pool is required to both build and maintain the data engineering and analytics structure, but also to support an external signals managed service model: • Data & Machine Learning Engineers - expected to both build and maintain the infrastructure, and productize models developed within the data science pool • Data Scientists - including image, NLP and network specialists, in addition to more traditional finance quant analysts • Business Analysts - expected to contain financial analysts capable of analyzing and translating business requests into data science problems • Data Scouts - to explore new datasets that appear in the market Process Building a frictionless signal factory platform and the data science talent that supports it must rely upon robust governance of data, technology and talent: • Clear duty segregation to minimize key person risks and bottlenecks • Models to support autonomy and agility • Strong data governance and stewardship to ensure that data management is scalable without the need to scale effort • Fail fast proof-of-concepts • State of the art cyber security, to both ring- fence sensitive data and prevent external attacks Creating and maintaining a signals factory requires a diverse talent pool as a foundation, well designed processes and a high end technology stack but reduces costs and allows scaling Developing a Signals Factory Proposition Technology A robust and well maintained technology platform is critical to a signal factory success with a partnership with a cloud supplier likely to be a pre-requisite. Key considerations include: • Building in a cloud native fashion, to take full advantage of elastic storage and compute capabilities • Support for a variety of data storage paradigms (e.g. graph, key value, columnar, relational etc.) • Seamless integration of exploration tools, e.g. Jupyter Notebooks, Tableau etc. 
• Model management frameworks, to simplify the promotion to production of models (likely to involve containerization) • Support for diverse hardware including GPUs, FPGAs etc. Maximising Data Value, a Vendor Perspective | Deloitte LLP 2019
  • 24. Valuation of Ingested Data Assets: As a non-depletable and non-degradable asset, data represents a unique valuation and backtesting challenge, particularly pertinent in financial markets where the greater usage of an asset crowds out value.
    Value Maximisation Strategies:
      (1) Qualitative: A qualitative approach is likely required to support a benchmark to measure/complement other approaches. Considerations include:
          • Cost of integration and storage
          • Data quality of signal (degree of imputation etc.)
          • Depth and breadth of signal coverage
          • Value of other similar signal assets
          • Uniqueness of the dataset/signal
      (2) License & Latency: Constraining the number of consumers of high value data feeds is a useful heuristic to prevent over-exploitation but few data vendors do it. Consumers should:
          • Negotiate licensing or latency based consumption constraints to ensure they either receive data that other investors do not have access to or receive it before the market in general
          • Factor in the absence of these constraints when valuing vendors' signal data
      (3) Profit Sharing: While complex profit sharing mechanisms create feedback within a pricing system that incentivizes both vendor and consumer to maximize the value of a given signal asset, significant complexities exist within:
          • Implementation, e.g. the negotiation of the degree of profit share, exposure in the event of signal failures
          • Monitoring the agreed terms of the share
      (4) Backtesting: A solid backtesting program is needed to understand the alpha from alternative data, but one needs:
          • to account for the usually short history of alternative data
          • to incorporate the statistical uncertainty of the backtesting results into the price of data
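The backtesting point on this slide (short histories make results statistically uncertain) can be made concrete with a small sketch. This is an illustration only, not a method from the deck: it assumes i.i.d. returns and uses the common approximation SE(SR) ≈ sqrt((1 + SR²/2) / T) for the standard error of a Sharpe ratio estimated from T observations. The function name `sharpe_with_error` and the return series are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def sharpe_with_error(returns, periods_per_year=12):
    """Annualised Sharpe ratio and its approximate standard error.

    Assumes i.i.d. returns and uses SE(SR) ~ sqrt((1 + SR^2 / 2) / T)
    on the per-period Sharpe ratio, where T is the number of observations.
    """
    t = len(returns)
    sr = mean(returns) / stdev(returns)   # per-period Sharpe ratio
    se = sqrt((1 + 0.5 * sr ** 2) / t)    # per-period standard error
    scale = sqrt(periods_per_year)        # annualisation factor
    return sr * scale, se * scale


# Made-up monthly strategy returns; the pattern is repeated to mimic
# a longer history with the same average behaviour.
short_history = [0.02, -0.01, 0.03, 0.00, 0.01, -0.02] * 2  # 12 months
long_history = short_history * 5                            # 60 months

sr_short, se_short = sharpe_with_error(short_history)
sr_long, se_long = sharpe_with_error(long_history)
```

With similar Sharpe ratios, the 12-month sample carries a much wider error band than the 60-month one, which is one way to express why a short-history dataset might warrant a lower price.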
  • 25. This publication has been written in general terms and we recommend that you obtain professional advice before acting or refraining from action on any of the contents of this publication. Deloitte LLP accepts no liability for any loss occasioned to any person acting or refraining from action as a result of any material in this publication. Deloitte LLP is a limited liability partnership registered in England and Wales with registered number OC303675 and its registered office at 2 New Street Square, London, EC4A 3BZ, United Kingdom. Deloitte LLP is the United Kingdom affiliate of Deloitte NWE LLP, a member firm of Deloitte Touche Tohmatsu Limited, a UK private company limited by guarantee (“DTTL”). DTTL and each of its member firms are legally separate and independent entities. DTTL and Deloitte NWE LLP do not provide services to clients. Please see www.deloitte.com/about to learn more about our global network of member firms. © 2019 Deloitte LLP. All rights reserved. Deloitte's response to Aviva's Group Data Strategy RFP Maximising Data Value, a Vendor Perspective | Deloitte LLP 2019
  • 26. The Book of Alternative Data Use cases A Guide for Investors, Traders and Risk Managers Saeed Amen, Cuemacro Co-authored by Alexander Denev & Saeed Amen Saeed Amen / @saeedamenfx
  • 27. Case study: Federal Reserve Communications Cuemacro Index Saeed Amen / @saeedamenfx
  • 28. Federal Reserve data • Federal Reserve regularly communicates with markets • Through speeches, statements, minutes etc. • Market reacts to this! • Can read publicly available communications from the web • Create a dataset of web communications • Apply NLP to determine the sentiment of individual texts • Construct an index to give an overall view of FOMC sentiment • Positive sentiment is hawkish whilst negative sentiment is dovish Saeed Amen / @saeedamenfx
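The steps on this slide (score each text, then aggregate the scores into an index) can be sketched as follows. This is a toy illustration and not the Cuemacro index methodology, which the slides do not describe: the word lists, the `score_text` and `sentiment_index` functions, the rolling window, and the sample texts are all hypothetical. The sign convention follows the slide: positive is hawkish, negative is dovish.

```python
from datetime import date
from statistics import mean

# Hypothetical word lists, chosen only for illustration.
HAWKISH = {"tighten", "hike", "inflation", "overheating"}
DOVISH = {"ease", "cut", "accommodative", "slack"}


def score_text(text):
    """Return a score in [-1, 1]: positive is hawkish, negative is dovish."""
    words = [w.strip(".,;:!?").lower() for w in text.split()]
    hawk = sum(w in HAWKISH for w in words)
    dove = sum(w in DOVISH for w in words)
    total = hawk + dove
    return 0.0 if total == 0 else (hawk - dove) / total


def sentiment_index(communications, window=3):
    """Average the scores of the most recent `window` texts at each date."""
    ordered = sorted(communications, key=lambda c: c[0])
    scores = [score_text(text) for _, text in ordered]
    index = []
    for i, (day, _) in enumerate(ordered):
        recent = scores[max(0, i - window + 1): i + 1]
        index.append((day, mean(recent)))
    return index


# Made-up communications (date, text).
communications = [
    (date(2020, 1, 29), "Inflation pressures may require us to tighten policy."),
    (date(2020, 3, 15), "We will cut rates and keep policy accommodative."),
    (date(2020, 4, 29), "Slack remains; we stand ready to ease further."),
]
fed_index = sentiment_index(communications)
```

A production version would replace the word counting with a proper NLP sentiment model; the aggregation step would look broadly similar.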
  • 29. Fed sentiment vs. UST10Y yield changes • Can see a relationship between them, as we would expect Saeed Amen / @saeedamenfx
  • 30. Case study: CLS FX flow data to trade FX spot Saeed Amen / @saeedamenfx
  • 31. CLS data • FX is a more fragmented market than other asset classes • Vast majority is OTC • Many different trading venues • Bilateral trading • Difficult to find comprehensive FX volume & flow data • CLS settle most OTC deliverable FX – coverage over 50% of market • They collect and distribute • Hourly FX volume data • Hourly FX flow data for price takers • 30 minute lag – historical data since late 2012
  • 32. Create fund FX flow index • Use fund FX flow data – tends to be more directional and positively correlated with spot • Create fund FX flow index • Buy spot when very positive • Sell spot when very negative
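The rule on this slide (buy spot when the flow index is very positive, sell when very negative) can be sketched as a threshold on a standardised flow measure. The slides do not say how the index is built, so everything here is an assumption for illustration: the rolling z-score, the `lookback` and `threshold` values, the function names, and the flow numbers.

```python
from statistics import mean, pstdev


def flow_zscores(flows, lookback=5):
    """Z-score each net flow against the preceding `lookback` observations."""
    zscores = []
    for i, flow in enumerate(flows):
        history = flows[max(0, i - lookback):i]
        if len(history) < lookback:
            zscores.append(None)  # not enough history yet
            continue
        mu, sigma = mean(history), pstdev(history)
        zscores.append(0.0 if sigma == 0 else (flow - mu) / sigma)
    return zscores


def flow_signal(zscore, threshold=1.5):
    """+1 = buy spot, -1 = sell spot, 0 = stay flat."""
    if zscore is None:
        return 0
    if zscore > threshold:
        return 1
    if zscore < -threshold:
        return -1
    return 0


# Made-up net fund flows (buy minus sell volume), one value per period.
net_flows = [10, 12, 8, 11, 9, 30, 10, -15]
signals = [flow_signal(z) for z in flow_zscores(net_flows)]
```

Each z-score uses only earlier observations, so the signal at a given period does not peek at its own or future flows.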
  • 33. Creating daily & hourly flow baskets • Create trading baskets for daily and hourly flow strategies • Historically, improves risk adjusted returns vs trend alone • In-sample (left) and out-of-sample (right) • Flow outperforms trend out-of-sample Saeed Amen / @saeedamenfx
  • 34. Case study: Geospatial Insights satellite data to estimate EPS Saeed Amen / @saeedamenfx
  • 35. Geospatial Insights: RetailWatch • It is well known that satellite photography can be used to help forecast earnings per share for retail stocks • Has been used extensively in US markets (Orbital Insight), but not as much for European firms • Uses car counts as a proxy for retail activity • RetailWatch covers a number of European retailers (both publicly traded and private companies) • Relatively new dataset
  • 36. Using car counts to estimate EPS • Created a car count score based upon the 6 months of activity related to the earnings period • Compare against Bloomberg’s consensus and actual EPS • Present results for Marks & Spencer Saeed Amen / @saeedamenfx
  • 37. Case Study: Saving “alpha” with transaction cost analysis Saeed Amen / @saeedamenfx
  • 38. TCA to “save” alpha • Big Data and alternative data aren’t just for generating alpha • They can also be used to “save” alpha, by reducing our transaction costs • How much is each LP charging? • Is one algo better than another? • tcapy is a Python-based library by Cuemacro which does transaction cost analysis to identify how much traders are paying for their liquidity • Needs high frequency market tick data and also trade data from the client • Will do a quick demo if there’s time
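The questions on this slide (how much is each LP charging, is one algo better than another) come down to comparing executed prices against a market benchmark. The sketch below shows that idea in plain Python. It does not use or reproduce the tcapy API: the function names, the field names, the sign convention (positive means a cost to the trader) and the trades are all hypothetical.

```python
from collections import defaultdict
from statistics import mean


def slippage_bp(side, executed_price, benchmark_mid):
    """Cost versus the benchmark mid in basis points (positive = paid more)."""
    direction = 1 if side == "buy" else -1
    return direction * (executed_price - benchmark_mid) / benchmark_mid * 1e4


def average_slippage_by(trades, key):
    """Average slippage grouped by a trade field such as 'broker' or 'algo'."""
    groups = defaultdict(list)
    for trade in trades:
        groups[trade[key]].append(
            slippage_bp(trade["side"], trade["price"], trade["mid"])
        )
    return {name: mean(values) for name, values in groups.items()}


# Made-up EURUSD trades; 'mid' is the market mid at the time of execution.
trades = [
    {"broker": "LP1", "side": "buy", "price": 1.1002, "mid": 1.1000},
    {"broker": "LP1", "side": "sell", "price": 1.0999, "mid": 1.1000},
    {"broker": "LP2", "side": "buy", "price": 1.1005, "mid": 1.1000},
]
by_broker = average_slippage_by(trades, "broker")
```

In practice the benchmark mid would be looked up from high frequency tick data at each trade's timestamp, which is why the slide lists both tick data and client trade data as inputs.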
  • 39. Detailed screen • Plot a specific currency pair over a period of time, breaking down results by broker, algo etc. Saeed Amen / @saeedamenfx
  • 40. Plotting of trades/orders • We can plot the trades/orders in the web app alongside market data Saeed Amen / @saeedamenfx
  • 41. Aggregated timeline of metrics (eg. slippage) Saeed Amen / @saeedamenfx
  • 42. Aggregated metrics by ticker/venue Saeed Amen / @saeedamenfx
  • 43. Aggregated distribution of slippage Saeed Amen / @saeedamenfx
  • 44. Case study: Bloomberg News to trade FX spot @saeedamenfx / Copyright Cuemacro
  • 45. Unstructured & structured news data • Unstructured news data • Read news articles, blogs etc. in their raw text form, then clean and then directly apply text based analysis to add tagging and other fields • Very time consuming as we need to handle large amounts of data and also need to do natural language processing, which is non trivial • Structured news data • Vendors processes a large amount of news from numerous sources into a more manageable dataset for us to explore • Data more easily accessible with additional fields (eg. tagging topics) • Traders can concentrate on creating effective trading rules and running risk, rather than spending that time dealing with cleaning up massive quantities of unstructured news @saeedamenfx / Copyright Cuemacro
  • 46. Automating news filtering • Using news to trade markets is not new idea • A trader essentially “filters” news into the “signal and the noise” • But there is simply too much news for humans to read! • How can we read news in automated fashion? • Easier to use structured news datasets • However, what news filters do we use? • News related to unemployment? • Buy/sell signals? 0 20 40 60 80 100 120 140 200 250 300 350 400 450 500 550 600 650 700 2002 2006 2010 2014 US Jobless Claims NI UNEMPLOY BBG count (smoothed) Claims BBG @saeedamenfx / Copyright Cuemacro
  • 47. General approach to news filtering • Several approaches • Pick words or sectors which are relatively generic (and also intuitive) like “job cuts” • The approach to this “picking” depends on our data source, each one is different • Fit the best words according to a backtest! • “Fitting” words which are not obviously related is data mining • Resulting model will likely be unstable when run live • Also be cautious when using hindsight to pick words • For example, “Greek debt crisis” was obvious • But only after the event! • NT<GO> is a nice way to visualise news • Bloomberg has machine readable news • Use natural language processing • [Chart: Bloomberg news count for Greek debt crisis keywords, 2008 to 2014]
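A keyword filter of the kind described above can be sketched as a count of matching headlines per period, followed by smoothing. The headlines, the keyword list and the smoothing window below are all hypothetical; a real run would read from a news dataset rather than an in-memory list.

```python
from collections import Counter

# Hypothetical, deliberately generic keywords picked by intuition, not by backtest.
KEYWORDS = ("job cuts", "layoffs")

# Hypothetical (period, headline) observations.
headlines = [
    ("2009-01", "Carmaker announces job cuts"),
    ("2009-01", "Central bank holds rates"),
    ("2009-02", "Retailer plans layoffs"),
    ("2009-02", "More JOB CUTS at lender"),
    ("2009-03", "Stocks rally"),
]


def keyword_counts(items, keywords):
    """Count headlines per period that mention any keyword (case-insensitive)."""
    counts = Counter()
    for period, text in items:
        lowered = text.lower()
        # Adding 0 keeps periods with no matches in the output.
        counts[period] += int(any(k in lowered for k in keywords))
    return dict(counts)


def trailing_mean(values, window):
    """Smooth a series with a trailing moving average (shorter at the start)."""
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        out.append(sum(chunk) / len(chunk))
    return out


counts = keyword_counts(headlines, KEYWORDS)
periods = sorted(counts)
smoothed = trailing_mean([counts[p] for p in periods], window=2)
print(dict(zip(periods, smoothed)))
```

Because the keywords are fixed up front, the count series does not depend on any backtest result, which is the point of choosing generic, intuitive words.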
  • 48. Specific steps for text datasets • We can formulate a few generic steps that are used when dealing with a text-based dataset for trading purposes • Raw data collection: web scraping and accessing internal databases • Cleaning dataset: removing HTML tags and invalid observations • Structuring dataset: adding tags (eg. sentiment) and compressing into a single database record • Filtering dataset: choosing the most relevant entities/topics to prune the search space • Creating an indicator: aggregating records to create indicators • Applying a trading rule to the indicator: converting it into buy/sell signals directly, or adding it to other trading factors (eg. carry)
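The generic steps listed above can be strung together as a toy pipeline. Everything here is an assumption made for illustration: the records are invented, the word-list sentiment scorer is a placeholder for a proper NLP library, and the raw collection step is replaced by an in-memory list.

```python
import re
from collections import defaultdict
from html import unescape
from statistics import mean

# Placeholder word lists; a real project would use a sentiment library instead.
POSITIVE = {"strong", "growth", "gains"}
NEGATIVE = {"weak", "cuts", "losses"}

# Step 1 (raw data collection) is assumed done; these records are hypothetical.
raw_records = [
    {"date": "2017-01-02", "topic": "FED", "html": "<p>Strong growth &amp; jobs gains</p>"},
    {"date": "2017-01-02", "topic": "SPORT", "html": "<p>Team posts strong win</p>"},
    {"date": "2017-01-03", "topic": "FED", "html": "<p>Weak data, job cuts feared</p>"},
    {"date": "2017-01-03", "topic": "FED", "html": ""},
]


def clean(markup):
    """Step 2: strip HTML tags, decode entities and collapse whitespace."""
    text = re.sub(r"<[^>]+>", " ", markup)
    return re.sub(r"\s+", " ", unescape(text)).strip()


def sentiment(text):
    """Step 3 helper: naive score in [-1, 1] from positive/negative word counts."""
    words = re.findall(r"[a-z]+", text.lower())
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    return 0.0 if pos + neg == 0 else (pos - neg) / (pos + neg)


# Steps 2 and 3: clean, drop invalid (empty) observations, add a sentiment tag.
structured = []
for rec in raw_records:
    text = clean(rec["html"])
    if text:
        structured.append(
            {"date": rec["date"], "topic": rec["topic"], "sentiment": sentiment(text)}
        )

# Step 4: filter to the topic of interest to prune the search space.
filtered = [r for r in structured if r["topic"] == "FED"]

# Step 5: aggregate records into a daily indicator.
by_date = defaultdict(list)
for r in filtered:
    by_date[r["date"]].append(r["sentiment"])
indicator = {d: mean(v) for d, v in sorted(by_date.items())}

# Step 6: a simple trading rule, long when positive and short otherwise.
signals = {d: (1 if s > 0 else -1) for d, s in indicator.items()}
print(indicator, signals)
```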
  • 49. Using Bloomberg News dataset • We shall use a dataset consisting of Bloomberg News articles from 2009-2017 • It is a structured dataset, which saves time (eg. we avoid the time-consuming raw data collection step) • Bloomberg News is written in a consistent style, so it is easier to process than general web content • Each news article has a number of tagged fields, including: • Timestamp of news article • Title of news article • Text body of the news article • Tagging for tradable tickers related to the news (eg. %EUR for EURUSD) • Tagging for the topic related to the news (eg. FED for articles related to Federal Reserve) • Company specific news also has additional news analytics fields such as sentiment, readership statistics etc. • Topics we choose will depend on underlying dataset
  • 50. Generate news signals for FX • We want to use news to inform FX trading strategies • Want to develop longer term strategies (ie. not high frequency headline trading) • Hence, focus will be on macro specific news to trade FX in particular • Tickers: %EUR, %GBP, %AUD, %NZD, %USD, %CAD, %NOK, %SEK and %JPY • Topics: FED and ECB • Could have chosen many other relevant macro topics • Helps us prune the search space to most relevant news • Steps we shall take • Clean body text slightly (eg. remove start of article) • Ignore very short articles, as their sentiment is difficult to gauge • Apply sentiment analysis for each article (shall use open source Python-based libraries) • Aggregate data into daily observations (careful about holidays!) • Create indices for each currency/topic (Z scores for comparability) • Also generate a news volume score (Z score for comparability)
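The aggregation and Z score steps above can be sketched as follows. The article scores, the holiday set and the window length are hypothetical, and skipping weekends and holidays is only one possible treatment (rolling such articles into the next business day is another). The Z score uses a trailing window so that each value depends only on data available at that point.

```python
from collections import defaultdict
from datetime import date
from statistics import mean, pstdev

# Hypothetical (publication date, article sentiment) pairs for one currency/topic.
articles = [
    (date(2017, 1, 2), 0.4),
    (date(2017, 1, 2), 0.2),
    (date(2017, 1, 3), -0.1),
    (date(2017, 1, 7), 0.9),  # a Saturday
]


def daily_mean(items, holidays=frozenset()):
    """Average article scores per business day, skipping weekends and holidays."""
    buckets = defaultdict(list)
    for day, score in items:
        if day.weekday() < 5 and day not in holidays:
            buckets[day].append(score)
    return {day: mean(scores) for day, scores in sorted(buckets.items())}


def trailing_zscore(values, window):
    """Z score of each value against its own trailing window (no lookahead)."""
    out = []
    for i in range(len(values)):
        hist = values[max(0, i - window + 1): i + 1]
        sd = pstdev(hist) if len(hist) > 1 else 0.0
        out.append(0.0 if sd == 0 else (values[i] - mean(hist)) / sd)
    return out


daily = daily_mean(articles)
index = trailing_zscore(list(daily.values()), window=20)
print(daily, index)
```

A news volume score could be built the same way by replacing the daily mean with a daily article count before applying the Z score.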
  • 51. Currency pair sentiment score • Currency pair score = base score – terms score • When, eg., the USD/JPY score is positive, buy; otherwise sell
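The pair score rule is simple enough to write out directly. The per-currency scores below are hypothetical numbers, and the function names are illustrative only.

```python
# Hypothetical per-currency news sentiment Z scores.
currency_scores = {"USD": 0.8, "JPY": -0.3, "EUR": 0.1}


def pair_score(scores, pair):
    """Currency pair score = base score - terms score, eg. USD/JPY = USD - JPY."""
    base, terms = pair.split("/")
    return scores[base] - scores[terms]


def signal(score):
    """Buy (+1) when the pair score is positive, otherwise sell (-1)."""
    return 1 if score > 0 else -1


for pair in ("USD/JPY", "EUR/USD"):
    score = pair_score(currency_scores, pair)
    print(pair, round(score, 2), signal(score))
```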
  • 52. News trading rule by currency pair • Present risk adjusted returns and compare to a generic trend following strategy • Apply vol targeting in each instance • News based trading rule outperforms trend significantly in our sample
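Vol targeting scales the position so that expected volatility matches a chosen level. The sketch below is one common formulation, not necessarily the one used in the presentation: the 10% target, the leverage cap, the 252-day annualisation and the sample returns are all assumptions.

```python
import math
from statistics import pstdev


def vol_target_leverage(returns, target_vol=0.10, max_leverage=5.0, periods_per_year=252):
    """Leverage that scales recent realized vol to the target, subject to a cap."""
    realized = pstdev(returns) * math.sqrt(periods_per_year)
    if realized == 0:
        return max_leverage
    return min(target_vol / realized, max_leverage)


# Hypothetical recent daily strategy returns with a 1% daily standard deviation.
recent_returns = [0.01, -0.01] * 10

leverage = vol_target_leverage(recent_returns)
print(round(leverage, 3))
```

Applying the same target to both the news and trend strategies puts their returns on a comparable risk footing before risk adjusted measures are compared.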
  • 53. News trading rule as basket • Create news and trend baskets • News basket heavily outperforms trend basket
  • 54. What about news volume? • News volume on a currency pair is heavily correlated with its implied volatility, which seems intuitive! • t-statistics show a statistically significant relationship in nearly every currency pair in our sample • News volume can be used to help us model FX volatility: is FX volatility in line with what we could expect based on newsflow?
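A t-statistic for the relationship between news volume and implied volatility can be obtained from a simple linear regression. The sketch below uses ordinary least squares with the textbook standard error of the slope; the data points are invented, and the exact regression specification used in the presentation is not given, so this is only one plausible setup.

```python
import math
from statistics import mean


def ols_slope_tstat(x, y):
    """Return (slope, t-statistic of slope) for the regression y = a + b * x."""
    n = len(x)
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    residuals = [b - intercept - slope * a for a, b in zip(x, y)]
    residual_var = sum(r * r for r in residuals) / (n - 2)
    std_err = math.sqrt(residual_var / sxx)
    return slope, slope / std_err


# Hypothetical news volume Z scores and implied vol levels for one currency pair.
news_volume = [1.0, 2.0, 3.0, 4.0, 5.0]
implied_vol = [2.1, 3.9, 6.2, 7.8, 10.1]

slope, t_stat = ols_slope_tstat(news_volume, implied_vol)
print(round(slope, 2), round(t_stat, 1))
```

As a rough rule of thumb, an absolute t-statistic above about 2 is often read as significant at the 5% level for reasonably large samples.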
  • 55. Scheduled events • Before scheduled events, FX vol market makers will mark up the vol curve • Known as the event volatility add-on • LHS shows EUR/USD ON vol on Fed days, and RHS on ECB days (all other days are ignored) • Have a model for estimating the add-on (assumes only one big event per day) • Typically, realized vol underperforms implied on these days, so sell vol* • *within reason…
  • 56. News for scheduled events • Can we use news around scheduled events, eg. FED and ECB topics in our case, to inform where the add-on is? • And also to give us an idea of where realized vol would be subsequently? Gamma traders are taking a view on where implied – realized will be • There does seem to be a relationship between EUR/USD vol and news before FOMC and ECB meetings
  • 57. EUR/USD vol and news on FOMC days • Showing news volume versus add-on, implied and realized ON in EUR/USD on FOMC days
  • 58. EUR/USD vol and news on ECB days • Showing news volume versus add-on, implied and realized ON in EUR/USD on ECB days
  • 59. Conclusion • Alternative data primer, introducing the topic • Talked about where to find data • Showed examples of how to generate (and save!) alpha using alternative data, examining: • CLS FX flow data to generate FX trading signals • Text-based datasets for Fed communications • Geospatial Insights satellite imagery to estimate EPS • tcapy to reduce trading costs for FX
  • 60. Any questions? • Drop me an e-mail at saeed@cuemacro.com, ring me or tweet to @saeedamenfx (or even talk to me now, the old school way!)
  • 64. Demos, slides and video available on QuAcademy • Go to www.qu.academy
  • 65. Instructions for the Lab: 1. Go to https://academy.qusandbox.com/#/register and register using the code: "QUSUMMERSCHOOL"
  • 66. See you in FALL 2020!
  • 67. Thank you! • Sri Krishnamurthy, CFA, CAP • Founder and CEO, QuantUniversity LLC. • Contact: srikrishnamurthy, www.QuantUniversity.com • Information, data and drawings embodied in this presentation are strictly a property of QuantUniversity LLC. and shall not be distributed or used in any other publication without the prior written consent of QuantUniversity LLC.