Real-Time Machine Learning Architecture & Sentiment Analysis
QuantCon 2016, Singapore
Juan CHENG, PhD
Data Scientist
cheng.juan@infotrie.com
www.infotrie.com
@infotrie
www.finsents.com
@finsents
● About us
● News analytics in finance
● A news analytics case
○ Information extraction of text
○ Text feature extraction for machine learning classification
○ Big data tools applied
○ Architecture that combines all
Frederic GEORJON, CEO
Ajil GEORGE, Head of Development Center
Daniel ABROUK, Head of EMEA
LONG Zhicheng, CTO
Locations: Paris/Singapore, London, Singapore, India
FinSentS.com
➔ Real-time information and trading portal
➔ Millions of sources / multilingual
➔ SaaS or on-premises
➔ Real-time alerts
➔ Actionable signals
Sentiment Data
➔ Through API or third parties
➔ Up to 15 years of history
➔ Low latency / tick by tick
➔ 50,000+ entities
➔ Stocks, forex, commodities, indices, macroeconomic topics, etc.
Consultancy and Training
➔ Trading technology
➔ Algorithmic trading
➔ Big Data
➔ Natural Language Processing (NLP)
➔ Machine Learning
A. No, I find news noisy. There is simply too much of it.
B. No, I'm a quant. I find it hard to quantify news.
C. Yes, but I find using news inefficient. I have to manually relate it to my portfolio.
Access to News / News
management
- Visualization tools
- Filtering tools
- On demand view
Feed from multiple sources:
- Social Media
- Web based content
- Private sources
- Internal data
News content alerts based on sentiment indicators
Provide accurate information from a Big Data environment and push it to users in real time for risk management
Dashboard
- Consolidated
Dashboard
- Portfolio Alerts
Actionable indicators
Users receive news signals for trading / hedging / risk management based on sentiment indicators
Algo Trading / Robo Trading
Real Time algorithmic trading
Sentiment indicator and News
Analytics
Equity Research / Sales Team
Hedging Trader / Prop Trader
- News tag cloud
- Filtered newsfeed with social media blotter and news blotter
- On-demand search engine
- Topic detection
- Rumour alerts
- News ranked by importance
- Relevant information on a single screen
- Automatic alerts
- Integrated with OMS
Provide relevant news analytics indicators for hedging or trade idea generation
News analytics signals fully integrated into algo trading strategies
Reuters
MARKET NEWS | Fri Oct 21, 2016 | 2:18am EDT
AT&T acquires Time Warner for $85 billion
NEW YORK- AT&T Inc said it agreed to buy Time Warner Inc for $85.4 billion,
the boldest move yet by a telecommunications company to acquire content to
stream over its high-speed network to attract a growing number of online
viewers.
The trend of consolidation comes as technology advances have been upending
traditional entertainment companies. Many in the industry believe that getting
bigger is the best way to compete with companies like Google, Apple, Netflix and
Facebook.
David Goldman and Paul R. La Monica contributed to this report.
Extracted fields: Source, Category, Time, Location, Named Entity, Sentiment, Event
Tools: hacking skills, regular expressions (regex), NLP, named entity recognition, part-of-speech (POS) taggers
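Several of these fields can be extracted with plain rules before any machine learning is involved. The following is a minimal sketch, not Infotrie's extractor: it applies regular expressions to the first lines of the Reuters snippet to pull out the source, category, date, time, dateline location and dollar amounts, and finds company names with a small hand-written gazetteer (`KNOWN_COMPANIES`, a hypothetical stand-in for a trained named entity recognizer). Sentiment and event detection are left out.

```python
import re

# First lines of the Reuters snippet above (body truncated for brevity).
ARTICLE = (
    "Reuters\n"
    "MARKET NEWS | Fri Oct 21, 2016 | 2:18am EDT\n"
    "AT&T acquires Time Warner for $85 billion\n"
    "NEW YORK- AT&T Inc said it agreed to buy Time Warner Inc for $85.4 billion, "
    "the boldest move yet by a telecommunications company to acquire content."
)

# Hypothetical gazetteer; a production system would use a trained NER model.
KNOWN_COMPANIES = ["AT&T", "Time Warner", "Google", "Apple", "Netflix", "Facebook"]


def extract_fields(text):
    lines = text.splitlines()
    fields = {"source": lines[0].strip()}
    # Category, date and time share one line, separated by "|".
    category, date, clock = (part.strip() for part in lines[1].split("|"))
    fields.update(category=category, date=date, time=clock)
    # Dateline: an upper-case place name followed by "-" at the start of the body.
    dateline = re.match(r"([A-Z][A-Z ]+)-", lines[3])
    fields["location"] = dateline.group(1).strip() if dateline else None
    # Money amounts such as "$85.4 billion".
    fields["amounts"] = re.findall(r"\$\d+(?:\.\d+)?\s(?:million|billion)", text)
    # Named entities by simple dictionary lookup.
    fields["entities"] = [name for name in KNOWN_COMPANIES if name in text]
    return fields


print(extract_fields(ARTICLE))
```

The rules are tuned to this one layout (source on line 1, `|`-separated header on line 2, dateline on line 4); real feeds need per-source parsers.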
Train Document Set:
d1: The sky is blue.
d2: The sun is bright.
Test Document Set:
d3: The sun in the sky is bright.
d4: We can see the shining sun, the bright sun.
Vector Space Model (VSM): a document-term matrix with one row per document (d1, d2, ...) and one column per term (t1, t2, ...)
Vocabulary
Term frequency (TF)
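The vocabulary and term-frequency step for the example documents can be sketched as follows. Two choices here are assumptions rather than something the slide states: the vocabulary is built from the training documents only, and a tiny illustrative stop-word set (`STOP_WORDS`) removes words such as "the" and "is".

```python
import re

TRAIN = ["The sky is blue.", "The sun is bright."]
TEST = ["The sun in the sky is bright.", "We can see the shining sun, the bright sun."]

# Illustrative stop-word set (an assumption, not a standard list).
STOP_WORDS = {"the", "is", "in", "we", "can"}


def tokenize(text):
    """Lower-case the text and keep alphabetic tokens that are not stop words."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]


def build_vocabulary(docs):
    """Sorted list of distinct terms seen in the given documents."""
    return sorted({w for doc in docs for w in tokenize(doc)})


def term_frequencies(doc, vocabulary):
    """Raw count of each vocabulary term in doc; out-of-vocabulary words are ignored."""
    tokens = tokenize(doc)
    return [tokens.count(term) for term in vocabulary]


vocab = build_vocabulary(TRAIN)
print(vocab)
for doc in TEST:
    print(term_frequencies(doc, vocab))
```

With the vocabulary [blue, bright, sky, sun], d3 maps to [0, 1, 1, 1] and d4 to [0, 1, 0, 2]; words outside the training vocabulary, such as "shining", are ignored.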
TF alone over-emphasizes terms that are present in almost the entire corpus
TF-IDF
TF example / IDF example
Normalized TF-IDF
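The TF and IDF formulas are not reproduced in the text. One common formulation, assumed here, is idf(t) = ln(N / df(t)), where N is the number of documents and df(t) the number of documents containing term t, then tfidf(t, d) = tf(t, d) × idf(t), with each document vector divided by its Euclidean (L2) norm. Libraries differ in details such as smoothing df(t). The sketch computes IDF over all four example documents and keeps the four-term training vocabulary.

```python
import math
import re

DOCS = [
    "The sky is blue.",
    "The sun is bright.",
    "The sun in the sky is bright.",
    "We can see the shining sun, the bright sun.",
]
VOCAB = ["blue", "bright", "sky", "sun"]  # vocabulary of the training documents


def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())


def idf(term, docs):
    """ln(N / df(t)); assumes the term occurs in at least one document."""
    df = sum(1 for doc in docs if term in tokenize(doc))
    return math.log(len(docs) / df)


def tfidf_vector(doc, docs, vocab):
    tokens = tokenize(doc)
    weights = [tokens.count(t) * idf(t, docs) for t in vocab]
    norm = math.sqrt(sum(w * w for w in weights))
    # L2-normalize so that long and short documents are comparable.
    return [w / norm for w in weights] if norm else weights


for doc in DOCS[2:]:
    print([round(w, 3) for w in tfidf_vector(doc, DOCS, VOCAB)])
```

Because "bright" and "sun" occur in three of the four documents while "sky" occurs in two, "sky" gets the largest weight in d3. In d4, "bright" and "sun" share the same IDF, so the normalized vector is proportional to the raw counts [0, 1, 0, 2], i.e. about [0, 0.447, 0, 0.894].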
Machine Learning
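Once documents are vectors, a standard classifier can be trained on them. As one illustration, the sketch below trains a multinomial Naive Bayes sentiment classifier with add-one smoothing on bag-of-words counts. The talk does not say which classifier is used, and the four labelled headlines are invented toy data.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayes:
    """Multinomial Naive Bayes with Laplace (add-one) smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(tokenize(doc))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, doc):
        n_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n in self.class_counts.items():
            counts = self.word_counts[label]
            total = sum(counts.values())
            score = math.log(n / n_docs)  # log prior
            for w in tokenize(doc):
                if w in self.vocab:  # skip words never seen in training
                    score += math.log((counts[w] + 1) / (total + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy training data.
headlines = [
    "profit beats expectations shares surge",
    "record revenue strong growth",
    "shares plunge after profit warning",
    "company misses estimates weak outlook",
]
labels = ["positive", "positive", "negative", "negative"]

model = NaiveBayes().fit(headlines, labels)
print(model.predict("strong profit growth"))
print(model.predict("weak outlook shares plunge"))
```

Working in log space avoids numerical underflow when many word probabilities are multiplied together.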
- Companies, indexes
- People, locations, organizations
- Events
- Regions
NLP
Text
- Dow Jones, Bloomberg
- Web news, blogs, Twitter
- 1000+ sources
Feature Extraction
Classification
Sentiment
- 15 years of history
- Tens of millions of articles
Training
Indexing
- Sector/industry
- Commodity, FX, ETFs
- Political, country risk
- Macroeconomic
- Fear, greed, anger, happiness
Aggregation
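Aggregation turns article-level scores into indicators per entity (or per sector, country or theme). A minimal sketch, assuming each scored article arrives as a `(timestamp, entity, score)` tuple with score in [-1, 1] and that the indicator is a plain average over a trailing 24-hour window; the window, the scores and the use of the tickers T (AT&T) and TWX (Time Warner) are illustrative assumptions.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def aggregate_sentiment(events, now, window=timedelta(hours=24)):
    """Average score and article count per entity over [now - window, now]."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for ts, entity, score in events:
        if now - window <= ts <= now:
            totals[entity] += score
            counts[entity] += 1
    return {e: (totals[e] / counts[e], counts[e]) for e in counts}


# Hypothetical scored articles: (timestamp, entity, sentiment score in [-1, 1]).
now = datetime(2016, 10, 22, 12, 0)
events = [
    (datetime(2016, 10, 22, 9, 0), "T", 0.6),
    (datetime(2016, 10, 22, 10, 0), "T", -0.2),
    (datetime(2016, 10, 22, 11, 0), "TWX", 0.8),
    (datetime(2016, 10, 20, 9, 0), "T", -0.9),  # older than the window: ignored
]
print(aggregate_sentiment(events, now))
```

Returning the article count alongside the average lets a consumer discount indicators built from very few articles.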
❏ Guaranteed data processing
❏ Horizontal scalability
❏ Fault tolerance
❏ Higher-level abstraction than message passing
❏ Real-time machine learning for classification and predictive analytics
Analytics on massive historical text data
Analytics on the recent past
Real-time analytics
Batch layer | Real-time layer
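The batch/real-time split follows the lambda architecture pattern: a batch layer periodically recomputes views from the full history, a real-time layer covers data that arrived since the last batch run, and queries merge both. A toy sketch with in-memory counters standing in for both views; the class and method names are hypothetical, and it assumes each batch run covers everything received so far, which lets it simply reset the real-time view.

```python
from collections import Counter


class LambdaViews:
    """Per-entity article counts served by merging a batch view and a real-time view."""

    def __init__(self):
        self.batch_view = Counter()     # recomputed from full history (e.g. nightly)
        self.realtime_view = Counter()  # incremented as each new article arrives

    def on_new_article(self, entity):
        # Real-time layer: cheap incremental update.
        self.realtime_view[entity] += 1

    def recompute_batch(self, history):
        # Batch layer: rebuild from the complete history. Simplifying assumption:
        # the history includes every article seen so far, so the real-time
        # increments are now redundant and can be dropped.
        self.batch_view = Counter(history)
        self.realtime_view.clear()

    def query(self, entity):
        # Serving layer: merge both views.
        return self.batch_view[entity] + self.realtime_view[entity]


views = LambdaViews()
views.recompute_batch(["AAPL", "AAPL", "GOOG"])
views.on_new_article("AAPL")
print(views.query("AAPL"))
```

In a real deployment the batch run covers data only up to a cutoff time, and only real-time increments older than that cutoff are discarded.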
Apache Spark: a fast and general engine for large-scale distributed data processing
[Diagram: memory, network, CPUs, disk]
Reference: Apache Spark project website
[Chart: logistic regression running time in Hadoop vs. Spark]
Apache Storm: an open-source distributed real-time computation system that makes it easy to process unbounded streams of data
Storm was benchmarked at processing one million 100-byte messages per second per node on hardware with the following specs:
● Processor: 2x Intel E5645 @ 2.4 GHz
● Memory: 24 GB
Reference: Apache Storm project website
Topology components: spouts (sources of streams) and bolts (processing steps)
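In a Storm topology, spouts emit streams of tuples and bolts consume, transform and re-emit them. The sketch below is a plain-Python simulation of that idea, not the Storm API: every class and function name is hypothetical. The spout keeps emitted tuples pending until they are acked and re-queues any that fail, mimicking Storm's at-least-once guaranteed processing; one bolt fails on its first call to show a replay.

```python
from collections import deque


class NewsSpout:
    """Emits (msg_id, headline) tuples and replays any tuple that fails."""

    def __init__(self, headlines):
        self.queue = deque(enumerate(headlines))
        self.pending = {}  # emitted but not yet acknowledged

    def next_tuple(self):
        if not self.queue:
            return None
        msg_id, text = self.queue.popleft()
        self.pending[msg_id] = text
        return msg_id, text

    def ack(self, msg_id):
        del self.pending[msg_id]

    def fail(self, msg_id):
        # Re-queue the tuple so it is processed again (at-least-once delivery).
        self.queue.append((msg_id, self.pending.pop(msg_id)))


def filter_bolt(text):
    """Drop headlines that mention no tracked company (returns None)."""
    return text if any(name in text for name in ("AT&T", "Apple")) else None


class FlakyTickerBolt:
    """Maps company names to tickers; fails once to demonstrate replay."""

    TICKERS = {"AT&T": "T", "Apple": "AAPL"}

    def __init__(self):
        self.failed_once = False

    def __call__(self, text):
        if not self.failed_once:
            self.failed_once = True
            raise RuntimeError("simulated transient failure")
        return [ticker for name, ticker in self.TICKERS.items() if name in text]


def run_topology(spout, bolts):
    """Pass each tuple through the bolt chain; ack on success, fail on error."""
    results = []
    while True:
        item = spout.next_tuple()
        if item is None:
            break
        msg_id, value = item
        try:
            for bolt in bolts:
                value = bolt(value)
                if value is None:  # filtered out: nothing more to do
                    break
        except RuntimeError:
            spout.fail(msg_id)
            continue
        if value is not None:
            results.append(value)
        spout.ack(msg_id)
    return results


spout = NewsSpout(["AT&T acquires Time Warner", "Weather update", "Apple unveils new iPhone"])
print(run_topology(spout, [filter_bolt, FlakyTickerBolt()]))
```

The headline that failed is processed again after the others, so results arrive out of order; consumers of an at-least-once stream should tolerate reordering and duplicates.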
✓ Guaranteed data processing
✓ Horizontal scalability
✓ Fault-tolerance
✓ Higher-level abstraction than message passing
✓ Real-time machine learning for classification and predictive analytics
Architecture components:
- Producers: blogs, Twitter, news, Bloomberg...
- Kafka
- Apache Storm: filter, topic classification, sentiment calculation, entity detection, stock mapping, sentiment aggregation
- NoSQL database: cache and persistent storage
- DFS: NLP models, ML models
- Apache Spark: model training, batch cleaning, batch calculation
- Solr
- Relational database
- Web app
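The streaming stage chains several steps per message: filter, topic classification, entity detection, stock mapping, sentiment calculation and aggregation. A deliberately simplified, hypothetical sketch of that chain as plain functions: a keyword filter stands in for topic classification, dictionary lookup for entity detection and stock mapping, and a tiny word lexicon for sentiment calculation, with a dictionary playing the role of the NoSQL cache. None of the word lists reflect the models described in the talk.

```python
import re
from collections import defaultdict

# Illustrative assumptions: ticker map, sentiment lexicon and market keywords.
TICKERS = {"AT&T": "T", "Time Warner": "TWX", "Apple": "AAPL"}
LEXICON = {"surge": 1.0, "beat": 0.5, "plunge": -1.0, "lawsuit": -0.5}
MARKET_WORDS = {"acquire", "buy", "deal", "shares", "earnings"}

cache = defaultdict(lambda: {"sum": 0.0, "n": 0})  # stands in for the NoSQL cache


def words(text):
    return re.findall(r"[a-z]+", text.lower())


def is_relevant(text):  # filter / topic classification
    return bool(MARKET_WORDS & set(words(text)))


def detect_entities(text):  # entity detection
    return [name for name in TICKERS if name in text]


def to_tickers(entities):  # stock mapping
    return [TICKERS[e] for e in entities]


def sentiment(text):  # lexicon-based sentiment calculation
    hits = [LEXICON[w] for w in words(text) if w in LEXICON]
    return sum(hits) / len(hits) if hits else 0.0


def process(text):
    if not is_relevant(text):
        return
    score = sentiment(text)
    for ticker in to_tickers(detect_entities(text)):  # sentiment aggregation
        cache[ticker]["sum"] += score
        cache[ticker]["n"] += 1


for msg in ["AT&T agrees to buy Time Warner",
            "Apple shares surge as earnings beat estimates",
            "Sunny weather expected this weekend"]:
    process(msg)

print({ticker: v["sum"] / v["n"] for ticker, v in cache.items()})
```

Keeping each step a separate function mirrors how the steps would map onto separate bolts, so each can be scaled or replaced independently.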
Similar architectures can be applied to:
➔ Scalable analysis pipelines
➔ Live stats
➔ Recommendations
➔ Predictions
➔ Real-time analytics
➔ Online machine learning
Available at www.infotrie.com
contact@infotrie.com
Sentiment is in itself a powerful trading indicator from which multiple trading strategies can be built
Simulate the impact of complex events
MiFID alert
Improve client communication
Regulatory
Process complex / weak-signal events
ESG monitoring
Ecological – Social –
Governance
A union calls for a strike at a factory in Argentina?
Negative news coverage of a stock I hold is accelerating in the Chinese press but has not yet reached the English press?
A European company
employs children in
Bangladesh (*)?
ACTIONS
[Diagram: Spark word count on a text file stored in DFS, with the job distributed to executors]
counts = text_file.flatMap(lambda line: line.split(" ")).map(lambda word: (word, 1)).reduceByKey(lambda a, b: a + b)
[Diagram: Storm cluster with Nimbus, Zookeeper and worker nodes]
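The PySpark word count (`text_file.flatMap(...).map(...).reduceByKey(...)`) chains three transformations. To show what each one does without a cluster, here is a plain-Python equivalent of their semantics. It is an illustration only: Spark evaluates these transformations lazily and runs `reduceByKey` in parallel, combining values within each partition before shuffling. The sample `lines` are invented.

```python
from itertools import chain


def reduce_by_key(pairs, func):
    """Combine the values of each key pairwise with func, like RDD.reduceByKey."""
    result = {}
    for key, value in pairs:
        result[key] = func(result[key], value) if key in result else value
    return result


lines = ["the sun is bright", "the sky is blue"]  # stands in for text_file

# flatMap: each line yields several words; the results are flattened.
words = chain.from_iterable(line.split(" ") for line in lines)
# map: each word becomes a (word, 1) pair.
pairs = ((word, 1) for word in words)
# reduceByKey: sum the 1s for each distinct word.
counts = reduce_by_key(pairs, lambda a, b: a + b)
print(counts)
```

Because the reducing function is associative and commutative, Spark is free to apply it to partial results in any order, which is what makes the distributed version safe.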
Velocity
Big Data
Variety
- News, blogs, social media, analyst reports, company announcements, traders' chat rooms…
- Financial reports, prices, economic events...
- Weather, GPS, images...
Volume
- ETL
- Machine learning
- Correlation analysis
- Regressions…
- As fast as possible

“Real Time Machine Learning Architecture and Sentiment Analysis Applied to Finance” by Juan Cheng, Data Scientist at Infotrie
