From QuantCon 2017: Lookahead bias and stale data when used in an algorithm are generally categorized as "incorrect data". In fact, the issue does not lie with the data itself, but instead is an issue of perspective. This talk will examine how data is typically viewed through the lens of time, and why, on the whole, that approach is wrong.
At Quantopian, we've tried several ways of handling data with regard to time, and we'll talk about lessons learned along the way. We'll also discuss what multidimensionality means for financial data specifically, and how we can apply this to get better results in backtesting.
Additionally, we'll touch on how to apply multidimensionality to more general data, and why it's important for anyone working with applied data to take this approach.
2. Disclaimer
This presentation is for informational purposes only and does not constitute an offer to sell, a solicitation to buy, or a recommendation for
any security; nor does it constitute an offer to provide investment advisory or other services by Quantopian, Inc. ("Quantopian"). Nothing
contained herein constitutes investment advice or offers any opinion with respect to the suitability of any security, and any views expressed
herein should not be taken as advice to buy, sell, or hold any security or as an endorsement of any security or company. In preparing the
information contained herein, Quantopian has not taken into account the investment needs, objectives, and financial circumstances of any
particular investor. Additionally, this presentation is being provided on the express basis that it and any related communications (whether
written or oral) will not cause Quantopian to become an investment advice fiduciary under ERISA or the Internal Revenue Code with respect
to any retirement plan or IRA investor, as the recipients are fully aware that the Quantopian (i) is not undertaking to provide impartial
investment advice, make a recommendation regarding the acquisition, holding or disposal of an investment, act as an impartial adviser, or
give advice in a fiduciary capacity, and (ii) has a financial interest in the offering and sale of one or more products and services, which may
depend on a number of factors relating to Quantopian’s internal business objectives, and which has been disclosed to the recipient. Nothing
set forth herein or any information conveyed (in writing or orally) in connection with this presentation is intended to constitute a
recommendation that any person take or refrain from taking any course of action within the meaning of U.S. Department of Labor Regulation
§2510.3-21(b)(1), including without limitation buying, selling or continuing to hold any security. No information contained herein should be
regarded as a suggestion to engage in or refrain from any investment-related course of action as none of Quantopian nor any of its affiliates
is undertaking to provide investment advice, act as an adviser to any plan or entity subject to the Employee Retirement Income Security Act
of 1974, as amended, individual retirement account or individual retirement annuity, or give advice in a fiduciary capacity with respect to the
materials presented herein. You are advised to contact your own financial advisor or other fiduciary unrelated to Quantopian about whether
any given course of action may be appropriate for your circumstances. The information provided herein is intended to be used solely by the
recipient in considering the products or services described herein and may not be used for any other reason, personal or otherwise. Any
views expressed and data illustrated herein were prepared based upon information, believed to be reliable, available to Quantopian at the
time of publication. Quantopian makes no guarantees as to their accuracy or completeness. All information is subject to change and may
quickly become unreliable for various reasons, including changes in market conditions or economic circumstances.
3. What’ll It Be?
Let’s chat about how data is typically viewed through the lens of time.
Because that way is usually wrong to some degree. Let me tell you why.
Lessons learned at Q along the way.
What does “multidimensional” data mean for MY data?
More importantly, what does it mean for ANYONE’S data?
Quantopian.com
11. Enter, Fundamentals
What if we captured, every day, what we knew the latest value to be for every
piece of known information?
It should now be the corrected state of the world for any present moment.
[Figure: daily timeline, 3/1/17 through 3/13/17.
First Known: 10, 12
Revisions: 11
Seen (one value per day): 10 10 10 10 10 10 11 12 12 12 12 12 12 12]
12. Still Not Quite Right
But we still have the same problem, for revisions to updated data.
And what happens if you have 250GB of sparse data alone, before you even
forward fill those values?
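The forward fill mentioned here can be sketched in a few lines. This is only an illustration, not Quantopian's implementation: the `forward_fill` helper and the three capture dates are assumptions, chosen so that the dense result matches the "Seen" row from slide 11. It shows why sparse data grows once every day must carry a value.

```python
from datetime import date, timedelta

def forward_fill(sparse, start, end):
    """Expand sparse {date: value} captures into one value per calendar day."""
    dense = {}
    last = None  # nothing known before the first capture
    day = start
    while day <= end:
        if day in sparse:
            last = sparse[day]  # a new "latest known" value arrives
        dense[day] = last       # otherwise carry the previous value forward
        day += timedelta(days=1)
    return dense

# Hypothetical captures: only the days on which the latest known value changed.
sparse = {date(2017, 3, 1): 10, date(2017, 3, 7): 11, date(2017, 3, 8): 12}
dense = forward_fill(sparse, date(2017, 3, 1), date(2017, 3, 14))

print(len(sparse), "stored values become", len(dense))
print(list(dense.values()))
```

Three stored values become fourteen after the fill, and the ratio only gets worse for long histories with rare updates.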
14. Lookahead Bias
Using data for a backtest that we didn’t know at the time is called lookahead
bias.
We try VERY hard to avoid this, because it corrupts evaluation of any strategy.
“I know that Apple did well in the past, so I’m going to backtest a strategy that
just holds Apple after 2005.”
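A minimal sketch of how lookahead bias sneaks into a lookup, assuming each row carries both the date it describes ("asof") and the date we learned it ("known"). The row layout, field names, and values are hypothetical, not taken from the talk.

```python
from datetime import date

# Hypothetical rows: the 3/1 value was first learned on 3/2 and revised on 3/7.
ROWS = [
    {"asof": date(2017, 3, 1), "known": date(2017, 3, 2), "value": 10},
    {"asof": date(2017, 3, 1), "known": date(2017, 3, 7), "value": 11},
]

def naive_latest(rows, sim_date):
    """Lookahead-prone: filters on what the data describes, not when we knew it."""
    usable = [r for r in rows if r["asof"] <= sim_date]
    return max(usable, key=lambda r: r["known"])["value"] if usable else None

def point_in_time_latest(rows, sim_date):
    """Only rows already known on the simulated date are visible."""
    usable = [r for r in rows if r["known"] <= sim_date]
    return max(usable, key=lambda r: r["known"])["value"] if usable else None

sim = date(2017, 3, 3)
print(naive_latest(ROWS, sim))          # uses the revision from 3/7
print(point_in_time_latest(ROWS, sim))  # uses only what was known by 3/3
```

On a simulated 3/3 the naive lookup returns the value that was not published until 3/7, which is exactly the kind of knowledge a backtest must not have.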
15. Stale Data
This can be equally disastrous!
If the data is never updated, you may be stuck with hilariously incorrect values.
“My vendor told me that company ABCD announced a split of 1:25, so on that
day I traded 25 times what I normally would. But when it actually happened, it
was only 1:5!”
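The split story can be sketched as the difference between keeping the first value forever and picking up corrections as they arrive. The message dates and both helper names are assumptions for illustration; only the 1:25 and 1:5 ratios come from the slide.

```python
from datetime import date

# Hypothetical vendor messages for ABCD: (date received, split ratio 1:N).
UPDATES = [
    (date(2017, 3, 1), 25),  # original announcement
    (date(2017, 3, 5), 5),   # later correction
]

def stale_ratio(updates):
    """Stale: keep whatever arrived first and never refresh it."""
    return min(updates, key=lambda u: u[0])[1]

def latest_known_ratio(updates, on):
    """Newest message received on or before `on`, or None if nothing has arrived."""
    seen = [u for u in updates if u[0] <= on]
    return max(seen, key=lambda u: u[0])[1] if seen else None

print(stale_ratio(UPDATES))                           # 25, regardless of the date
print(latest_known_ratio(UPDATES, date(2017, 3, 6)))  # 5, after the correction
```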
16. So, What Do We Do?
This is referred to as point-in-time data.
It’s a BIG deal.
Personally, I think it’s the BIGGEST deal.
(I’m really biased because I do this for a living.)
21. Perspective Matters
It’s not just about when the data happened.
It’s also about when you’re observing the data.
If you have a dataset that ever has updates, revisions, or corrections, the data
for a single date can change as you move through time.
25. Bi-Temporal Data
Separates the concepts of when the information HAPPENED from when we
KNOW it.
Maintains accuracy with regard to data changes through history.
Allows questions to be answered from a chosen perspective.
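One way to picture a bi-temporal record is as a value tagged with two dates. The sketch below is an assumption-laden illustration, not the system described in the talk: the `Fact` type, the `value_as_known` helper, and the sample figures are all hypothetical.

```python
from datetime import date
from typing import NamedTuple, Optional

class Fact(NamedTuple):
    asof: date    # when the information HAPPENED
    known: date   # when we came to KNOW it
    value: float

def value_as_known(facts, asof, perspective) -> Optional[float]:
    """Value for `asof` as it was understood on `perspective`, or None."""
    candidates = [f for f in facts if f.asof == asof and f.known <= perspective]
    if not candidates:
        return None
    return max(candidates, key=lambda f: f.known).value  # most recent knowledge wins

# Hypothetical quarter-end figure, first reported in February and revised in March.
FACTS = [
    Fact(date(2016, 12, 31), date(2017, 2, 1), 1.50),
    Fact(date(2016, 12, 31), date(2017, 3, 15), 1.42),
]

quarter = date(2016, 12, 31)
print(value_as_known(FACTS, quarter, date(2017, 1, 15)))  # None: not yet reported
print(value_as_known(FACTS, quarter, date(2017, 2, 10)))  # 1.5: first report
print(value_as_known(FACTS, quarter, date(2017, 4, 1)))   # 1.42: after revision
```

Nothing is ever overwritten: a revision is a new fact with a later knowledge date, so any past perspective can be asked again and gives the same answer.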
29. [Figure: value rows "5 297 9" and "5 87 9", with labels "as-of date 1, timestamp 1" and "as-of date 1, timestamp 2"]
30. But...Do We Care?
PROS
Reproduces events EXACTLY as they occurred.
Allows for accurate modeling of simulations from the past.
Easily allows for vendor updates to the most accurate known data.
CONS
Can force modeling of atypical past events that wouldn’t happen today.
System errors can be propagated even into past data.
Data shown can be “imperfect” from vendor or ingestor error.
31. Data Analyses Should be Replicable
Realistic view of data delivery, instead of the optimized view.
The world isn’t perfect, and your data is DEFINITELY not perfect.
But, it should at least be consistently imperfect.
32. Different Users, Different Needs
Point-in-time data is a layer of complexity.
In evaluation, we only care about one thing: does it have alpha?
Users further on have the luxury of checking survivability.
99% of users don’t want to see a platform’s mistakes.
(I made that statistic up, but I’m pretty sure it’s accurate.)
33. What Else Can this Do for Me?
TWTR
                 Actual Time
Perspective      2006       2017
2006             Tweeter    -------
2017             Tweeter    Twitter
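The TWTR grid can be reproduced with a two-key lookup: the year being asked about and the year the question is asked from. The fact list and the `name_for` helper are hypothetical; the sketch assumes each name became known in the same year it applied.

```python
# Hypothetical facts for the ticker TWTR: (actual year, year we learned it, name).
FACTS = [
    (2006, 2006, "Tweeter"),
    (2017, 2017, "Twitter"),
]

def name_for(actual_year, perspective_year):
    """Name attached to TWTR in `actual_year`, as seen from `perspective_year`."""
    for actual, known, name in FACTS:
        if actual == actual_year and known <= perspective_year:
            return name
    return None  # not knowable from this perspective

# Rebuild the grid: one row per perspective, one column per actual year.
for perspective in (2006, 2017):
    row = [name_for(actual, perspective) or "-------" for actual in (2006, 2017)]
    print(perspective, *row)
```

From the 2006 perspective the 2017 cell stays empty, because nobody in 2006 could know what the ticker would point to later.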
34. Verify data model assumptions
“We never change our data after the fact. We wouldn’t do that.”
~A Quantopian Data Vendor
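A claim like this can be checked if raw deliveries are kept. A rough sketch, assuming two saved snapshots of the same vendor file keyed by as-of date; the `find_restatements` helper and the snapshot contents are made up for illustration.

```python
def find_restatements(earlier, later):
    """Return {key: (old, new)} for keys in both snapshots whose value changed."""
    return {
        key: (earlier[key], later[key])
        for key in sorted(earlier.keys() & later.keys())
        if earlier[key] != later[key]
    }

# Hypothetical snapshots of the same vendor file, keyed by as-of date.
snapshot_monday = {"2017-03-01": 10, "2017-03-02": 10}
snapshot_friday = {"2017-03-01": 11, "2017-03-02": 10, "2017-03-03": 12}

changed = find_restatements(snapshot_monday, snapshot_friday)
print(changed)  # {'2017-03-01': (10, 11)}
```

New keys are ordinary new data and are ignored; only values that differ for an as-of date already delivered count as a change after the fact.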
Is that All?
38. Fundamentals, Redux
We just deployed a new system.
Capture not only the first known value, but also the adjustments to those
values.
[Figure: daily timeline, 3/1/17 through 3/13/17.
First Known: 10, 12
Revisions: 11
Perspective of 3/6/17:  10 10 10 10 10 10 - - - - - - - -
Perspective of 3/14/17: 11 11 11 11 11 11 11 11 12 12 12 12 12 12]
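The two perspective rows in the figure above can be reproduced from three stored facts. This is a sketch under stated assumptions, not the deployed system: the `view` helper is hypothetical, and the dates on which the revision (11) and the second value (12) became known are guesses chosen to be consistent with the figure.

```python
from datetime import date, timedelta

# (asof date, date we learned it, value). Knowledge dates are assumptions.
FACTS = [
    (date(2017, 3, 1), date(2017, 3, 1), 10),  # first known value
    (date(2017, 3, 1), date(2017, 3, 7), 11),  # revision of the 3/1 value
    (date(2017, 3, 9), date(2017, 3, 9), 12),  # next first known value
]

def view(facts, perspective, start, end):
    """One value per day in [start, end], as seen from `perspective`."""
    row = []
    day = start
    while day <= end:
        # Facts that apply on or before this day and were known by the perspective.
        visible = [f for f in facts if f[0] <= day and f[1] <= perspective]
        if day > perspective or not visible:
            row.append(None)  # the future, or nothing known yet
        else:
            # Latest as-of date wins; among ties, the latest knowledge wins.
            row.append(max(visible, key=lambda f: (f[0], f[1]))[2])
        day += timedelta(days=1)
    return row

start, end = date(2017, 3, 1), date(2017, 3, 14)
for perspective in (date(2017, 3, 6), date(2017, 3, 14)):
    cells = ["-" if v is None else str(v) for v in view(FACTS, perspective, start, end)]
    print(perspective, " ".join(cells))
```

Unlike the forward-filled daily capture from slide 11, the later perspective shows the corrected value for the whole period it applies to, while the earlier perspective still shows exactly what was believed at the time.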
39. Point in Timeness as a Service
Raw data history for all (ingested) time
But we’d like to update the way that users can give us data too!
40. But Wait, there’s More
Point-in-time data doesn’t have to be just stock-specific data.
This should be applicable to any field, any data.