SlideShare una empresa de Scribd logo
1 de 16
Descargar para leer sin conexión
Discontinuities Demo
Shrayes Ramesh, PhD.
Data Tactics Corporation

Thursday, November 14, 13
Challenges
• how do you decide which variables capture
what happened?
• /when/ did an event happen
• what's the effect of the event on the variables
• Can we construct a UI and algorithm to tackle
all three problem simultaneously?

Thursday, November 14, 13
Goals
• The goal is to feed in raw data as the sole
input, and obtain answers to all three
questions:
• (1) when did an event likely occur
• (2) what variables can we use to measure the
event
• (3) what was the effect of the event on those
variables

Thursday, November 14, 13
variable outcome

Simple example

effect

event

Thursday, November 14, 13

time
Scaled up

Thursday, November 14, 13
With limited insight…
• if we know timing and the important
variables, we can measure the effect of the
shock on the variables. (standard regression
techniques)
• if we know the set of important variables and
track variables over time, we can identify
timing of shocks.
• if we know timing and have a long history of
variable evolution, we can cluster variables by
their behavior at the important point in time
(relative to other points in time)
Thursday, November 14, 13
estimated effect

Methodology

correct
effect

correct timing
Thursday, November 14, 13
Methodology
•

For every time T and variable K, run an OLS, under the hypothesis that a shock
occurred at time T to variable K

•

Sample is restricted to variables for a neighborhood around t, i.e. [t-bandwith , t
+bandwidth]:
Y(K,t) = A(K,T) + B(K,T)S(t) + e(K,t) with
S(t) = 1(t>T) is an indicator with T as the time to test

•

Results are stored as the matrix of coefficients B(K,T)

•

OLS estimates of B(K,T) are biased towards zero to the extent that S(t) is
misspecified.

•

In other words, B(K,T) will be maximally different from zero (and unbiased) at the
true break T

Thursday, November 14, 13
Methodology

Thursday, November 14, 13
Answers
• when did an event likely occur?
– aggregate (sum) effects across all variables

• what variables can we use to measure the
event?
– which variables had the largest effect at time point?

• what was the effect of the event on those
variables?
– we just measured that

• what variables move together often across
time?
– show similar variables
Thursday, November 14, 13
Example 1: Super Bowl tweets
• Twitter streaming API (every tweet)
• Sample of data selected from Sunday,
February 3, 1600-2210 hours
• Binned into minute-by-minute word counts
• Out of 651k 1-grams, kept 1035 least sparse
(> 30% sparse) words.
• Input data is 371x1035 matrix

Thursday, November 14, 13
SUPERBOWL SHINY

Thursday, November 14, 13
Network graph of variables with
correlations > .95

Halftime show

Power outage

Thursday, November 14, 13
Deployable and Repeatable
• The model only requires data to be
transformed to a KxT matrix.
– K variables
– T time periods

We could use this model on many other data sets!
• minute-by-minute word count in twitter
• stock prices
• chatter on social media forums

Thursday, November 14, 13
Network graph of forums with
correlations > .27

Hezbollah

Thursday, November 14, 13
Future improvements
• OLS is simple and efficient, but other models
may be more accurate at estimating effects in
some cases
• exploring different approaches to choosing
which variables to consider and approach to
aggregating variable effects.
• massively parallel on all 630k words
simultaneously?
• real-time analytics on streaming data

Thursday, November 14, 13

Más contenido relacionado

Más de DataTactics

NETWORK CENTRALITY IN SUB-NATIONAL AREAS OF INTEREST USING GDELT DATA
NETWORK CENTRALITY IN SUB-NATIONAL AREAS OF INTEREST USING GDELT DATANETWORK CENTRALITY IN SUB-NATIONAL AREAS OF INTEREST USING GDELT DATA
NETWORK CENTRALITY IN SUB-NATIONAL AREAS OF INTEREST USING GDELT DATADataTactics
 
C Star Analytic Presentation
C Star Analytic PresentationC Star Analytic Presentation
C Star Analytic PresentationDataTactics
 
Text Analysis Using Twitter: A Case Study in Dhaka
Text Analysis Using Twitter: A Case Study in Dhaka Text Analysis Using Twitter: A Case Study in Dhaka
Text Analysis Using Twitter: A Case Study in Dhaka DataTactics
 
Data Science and Analytics Brown Bag
Data Science and Analytics Brown BagData Science and Analytics Brown Bag
Data Science and Analytics Brown BagDataTactics
 
Data Tactics Analytics Practice
Data Tactics Analytics PracticeData Tactics Analytics Practice
Data Tactics Analytics PracticeDataTactics
 
Big Data Conference
Big Data ConferenceBig Data Conference
Big Data ConferenceDataTactics
 
Analytics Brownbag
Analytics Brownbag Analytics Brownbag
Analytics Brownbag DataTactics
 
Big Data Taxonomy 8/26/2013
Big Data Taxonomy 8/26/2013Big Data Taxonomy 8/26/2013
Big Data Taxonomy 8/26/2013DataTactics
 
Ontology and Reports
Ontology and ReportsOntology and Reports
Ontology and ReportsDataTactics
 
Data Tactics Unified Dataspace Architecture and Description
Data Tactics Unified Dataspace Architecture and DescriptionData Tactics Unified Dataspace Architecture and Description
Data Tactics Unified Dataspace Architecture and DescriptionDataTactics
 
Data Tactics Semantic and Interoperability Summit Feb 12, 2013
Data Tactics Semantic and Interoperability Summit Feb 12, 2013Data Tactics Semantic and Interoperability Summit Feb 12, 2013
Data Tactics Semantic and Interoperability Summit Feb 12, 2013DataTactics
 
Horizontal Integration of Big Intelligence Data
Horizontal Integration of Big Intelligence DataHorizontal Integration of Big Intelligence Data
Horizontal Integration of Big Intelligence DataDataTactics
 
Bill Ontology Summit (08 feb 1400hrs) v2
Bill Ontology Summit (08 feb 1400hrs) v2Bill Ontology Summit (08 feb 1400hrs) v2
Bill Ontology Summit (08 feb 1400hrs) v2DataTactics
 
DT Company Overview January 2013
DT Company Overview January 2013DT Company Overview January 2013
DT Company Overview January 2013DataTactics
 
Capabilities Brief Analytics
Capabilities Brief AnalyticsCapabilities Brief Analytics
Capabilities Brief AnalyticsDataTactics
 
Data Tactics dhs introduction to cloud technologies wtc
Data Tactics dhs introduction to cloud technologies wtcData Tactics dhs introduction to cloud technologies wtc
Data Tactics dhs introduction to cloud technologies wtcDataTactics
 
Multi Discipline Intelligence Production Teams 1
Multi Discipline Intelligence Production Teams 1Multi Discipline Intelligence Production Teams 1
Multi Discipline Intelligence Production Teams 1DataTactics
 
Data Tactics and Nervve Integrated Big Data v3
Data Tactics and Nervve Integrated Big Data v3Data Tactics and Nervve Integrated Big Data v3
Data Tactics and Nervve Integrated Big Data v3DataTactics
 
Data Tactics Open Source Brief
Data Tactics Open Source BriefData Tactics Open Source Brief
Data Tactics Open Source BriefDataTactics
 

Más de DataTactics (20)

NETWORK CENTRALITY IN SUB-NATIONAL AREAS OF INTEREST USING GDELT DATA
NETWORK CENTRALITY IN SUB-NATIONAL AREAS OF INTEREST USING GDELT DATANETWORK CENTRALITY IN SUB-NATIONAL AREAS OF INTEREST USING GDELT DATA
NETWORK CENTRALITY IN SUB-NATIONAL AREAS OF INTEREST USING GDELT DATA
 
C Star Analytic Presentation
C Star Analytic PresentationC Star Analytic Presentation
C Star Analytic Presentation
 
Text Analysis Using Twitter: A Case Study in Dhaka
Text Analysis Using Twitter: A Case Study in Dhaka Text Analysis Using Twitter: A Case Study in Dhaka
Text Analysis Using Twitter: A Case Study in Dhaka
 
Data Science and Analytics Brown Bag
Data Science and Analytics Brown BagData Science and Analytics Brown Bag
Data Science and Analytics Brown Bag
 
Data Tactics Analytics Practice
Data Tactics Analytics PracticeData Tactics Analytics Practice
Data Tactics Analytics Practice
 
Big Data Conference
Big Data ConferenceBig Data Conference
Big Data Conference
 
DLISA
DLISADLISA
DLISA
 
Analytics Brownbag
Analytics Brownbag Analytics Brownbag
Analytics Brownbag
 
Big Data Taxonomy 8/26/2013
Big Data Taxonomy 8/26/2013Big Data Taxonomy 8/26/2013
Big Data Taxonomy 8/26/2013
 
Ontology and Reports
Ontology and ReportsOntology and Reports
Ontology and Reports
 
Data Tactics Unified Dataspace Architecture and Description
Data Tactics Unified Dataspace Architecture and DescriptionData Tactics Unified Dataspace Architecture and Description
Data Tactics Unified Dataspace Architecture and Description
 
Data Tactics Semantic and Interoperability Summit Feb 12, 2013
Data Tactics Semantic and Interoperability Summit Feb 12, 2013Data Tactics Semantic and Interoperability Summit Feb 12, 2013
Data Tactics Semantic and Interoperability Summit Feb 12, 2013
 
Horizontal Integration of Big Intelligence Data
Horizontal Integration of Big Intelligence DataHorizontal Integration of Big Intelligence Data
Horizontal Integration of Big Intelligence Data
 
Bill Ontology Summit (08 feb 1400hrs) v2
Bill Ontology Summit (08 feb 1400hrs) v2Bill Ontology Summit (08 feb 1400hrs) v2
Bill Ontology Summit (08 feb 1400hrs) v2
 
DT Company Overview January 2013
DT Company Overview January 2013DT Company Overview January 2013
DT Company Overview January 2013
 
Capabilities Brief Analytics
Capabilities Brief AnalyticsCapabilities Brief Analytics
Capabilities Brief Analytics
 
Data Tactics dhs introduction to cloud technologies wtc
Data Tactics dhs introduction to cloud technologies wtcData Tactics dhs introduction to cloud technologies wtc
Data Tactics dhs introduction to cloud technologies wtc
 
Multi Discipline Intelligence Production Teams 1
Multi Discipline Intelligence Production Teams 1Multi Discipline Intelligence Production Teams 1
Multi Discipline Intelligence Production Teams 1
 
Data Tactics and Nervve Integrated Big Data v3
Data Tactics and Nervve Integrated Big Data v3Data Tactics and Nervve Integrated Big Data v3
Data Tactics and Nervve Integrated Big Data v3
 
Data Tactics Open Source Brief
Data Tactics Open Source BriefData Tactics Open Source Brief
Data Tactics Open Source Brief
 

Último

Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfOrbitshub
 
Six Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal OntologySix Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal Ontologyjohnbeverley2021
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...apidays
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Victor Rentea
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdfSandro Moreira
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businesspanagenda
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProduct Anonymous
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxRustici Software
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...apidays
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWERMadyBayot
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusZilliz
 
Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Zilliz
 
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot ModelMcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot ModelDeepika Singh
 
WSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering DevelopersWSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering DevelopersWSO2
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAndrey Devyatkin
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...Zilliz
 

Último (20)

+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
 
Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..
 
Six Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal OntologySix Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal Ontology
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with Milvus
 
Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)
 
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot ModelMcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
 
WSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering DevelopersWSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering Developers
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 

Discontinuities Demo

  • 1. Discontinuities Demo Shrayes Ramesh, PhD. Data Tactics Corporation Thursday, November 14, 13
  • 2. Challenges • how do you decide which variables capture what happened? • /when/ did an event happen • what's the effect of the event on the variables • Can we construct a UI and algorithm to tackle all three problem simultaneously? Thursday, November 14, 13
  • 3. Goals • The goal is to feed in raw data as the sole input, and obtain answers to all three questions: • (1) when did an event likely occur • (2) what variables can we use to measure the event • (3) what was the effect of the event on those variables Thursday, November 14, 13
  • 6. With limited insight… • if we know timing and the important variables, we can measure the effect of the shock on the variables. (standard regression techniques) • if we know the set of important variables and track variables over time, we can identify timing of shocks. • if we know timing and have a long history of variable evolution, we can cluster variables by their behavior at the important point in time (relative to other points in time) Thursday, November 14, 13
  • 8. Methodology • For every time T and variable K, run an OLS, under the hypothesis that a shock occurred at time T to variable K • Sample is restricted to variables for a neighborhood around t, i.e. [t-bandwith , t +bandwidth]: Y(K,t) = A(K,T) + B(K,T)S(t) + e(K,t) with S(t) = 1(t>T) is an indicator with T as the time to test • Results are stored as the matrix of coefficients B(K,T) • OLS estimates of B(K,T) are biased towards zero to the extent that S(t) is misspecified. • In other words, B(K,T) will be maximally different from zero (and unbiased) at the true break T Thursday, November 14, 13
  • 10. Answers • when did an event likely occur? – aggregate (sum) effects across all variables • what variables can we use to measure the event? – which variables had the largest effect at time point? • what was the effect of the event on those variables? – we just measured that • what variables move together often across time? – show similar variables Thursday, November 14, 13
  • 11. Example 1: Super Bowl tweets • Twitter streaming API (every tweet) • Sample of data selected from Sunday, February 3, 1600-2210 hours • Binned into minute-by-minute word counts • Out of 651k 1-grams, kept 1035 least sparse (> 30% sparse) words. • Input data is 371x1035 matrix Thursday, November 14, 13
  • 13. Network graph of variables with correlations > .95 Halftime show Power outage Thursday, November 14, 13
  • 14. Deployable and Repeatable • The model only requires data to be transformed to a KxT matrix. – K variables – T time periods We could use this model on many other data sets! • minute-by-minute word count in twitter • stock prices • chatter on social media forums Thursday, November 14, 13
  • 15. Network graph of forums with correlations > .27 Hezbollah Thursday, November 14, 13
  • 16. Future improvements • OLS is simple and efficient, but other models may be more accurate at estimating effects in some cases • exploring different approaches to choosing which variables to consider and approach to aggregating variable effects. • massively parallel on all 630k words simultaneously? • real-time analytics on streaming data Thursday, November 14, 13