
Big Data On Data You Don’t Have


Traditional Big Data works on data you have: you load the data into a repository and run map-reduce or similar calculations on it. Certain industries, however, need to perform complex operations on data they might not have. Data you can acquire, data that can be shared with you, and data that you can model are all types of data you may not have but may need to integrate instantly into a complex data analysis. The problem is that you may not even know you need this data until deep into the execution stack at runtime. This talk discusses a new functional language paradigm for dealing naturally with data you don't have, and how to make all data first-class citizens regardless of whether you have it or not. We will also give a demo of a project written in Scala that deals with exactly this issue.

Published in: Software


  1. Information Classification: GENERAL. BIG DATA (on Data you don't have)
  2. HOW DO WE DEAL WITH INFINITE-DIMENSIONAL DATA? BY GENERALIZING THE TRADITIONAL MAP-REDUCE PARADIGM.
  3. DISCLAIMER
  4. THERE ARE FOUR SOURCES OF DATA: • Data I have (traditional "Big Data") • Data I can model • Data I can acquire • Data someone else can acquire or model
  5. HOW WE REPRESENT THESE ITEMS: • Pre-calculated data • Formulas you have • Services • Things people can share with me
  6. MSCI Beon™: • Pre-calculated data • Formulas • Services • Things people can share with me
  7. Jim Burns, David Clark
  8. MSCI PLATFORM – A NEXT GENERATION LEAP. Traditional Big Data ("data you have") paradigm: Big Data repository (Hadoop / Cloudera etc.), slice/dice. NEW Big Data paradigm: Beon, new front end, calculation and data services, on-demand data, expressions. Other diagram labels: the morning load, virtual fields, dynamic new data.
  9. COMPLEX QUESTIONS
  10. WHAT IS A COMPLEX QUESTION VERSUS A SPECIFIC QUESTION? Specific questions can be hard, for example: • What happens to sea level if the temperature goes up 1.5 degrees by 2035? • What properties in Marbella are on the beach and more than x meters above sea level? • What are the biggest real estate bargains in a portfolio? Complex questions are combinations of specific questions: • What should I buy if I believe that temperatures are going to rise 1.5 degrees by 2035 and I only want property that will still be on the beach and at least 1 meter above sea level in 2035?
  11. HOW TO ANSWER A COMPLEX QUESTION. To answer a complex question you need something that can evaluate this: Let Portfolio = All the houses in Marbella; safeHouses = Filter (SeaLevel >= 1.0 + seaLevelRise(1.5 C)) Portfolio; BestBargains = BargainFinder safeHouses. The Platform executes the question above (filtering, etc.) and calls the services below for certain calculations: House Database (Marbella houses), Planet Simulator (sea level rise), Bargain Finder.
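The composition on slide 11 can be sketched in Python. This is only an illustration, not the Beon platform: the three service functions, the `House` fields, the sample houses, and the linear sea-level model are all made-up stand-ins for the House Database, Planet Simulator, and Bargain Finder services named on the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class House:
    name: str
    on_beach: bool
    elevation_m: float  # metres above today's sea level
    price: float
    fair_value: float


def marbella_houses():
    """Hypothetical stand-in for the House Database service."""
    return [
        House("Casa A", True, 0.8, 900_000, 1_000_000),
        House("Casa B", True, 2.5, 700_000, 1_000_000),
        House("Casa C", True, 3.0, 950_000, 1_000_000),
        House("Casa D", False, 40.0, 300_000, 600_000),
    ]


def sea_level_rise(delta_temp_c):
    """Hypothetical stand-in for the Planet Simulator (made-up linear model)."""
    return 0.5 * delta_temp_c


def bargain_finder(houses):
    """Hypothetical stand-in for Bargain Finder: cheapest relative to fair value first."""
    return sorted(houses, key=lambda h: h.price / h.fair_value)


def best_bargains(delta_temp_c, margin_m=1.0):
    # The complex question is just a composition of the specific ones.
    rise = sea_level_rise(delta_temp_c)
    portfolio = [h for h in marbella_houses() if h.on_beach]
    safe_houses = [h for h in portfolio if h.elevation_m >= margin_m + rise]
    return bargain_finder(safe_houses)


result = [h.name for h in best_bargains(1.5)]
```

With these invented numbers the threshold is 1.0 + 0.75 = 1.75 m, so only Casa B and Casa C pass the filter before ranking.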
  12. GENERALIZING MAP-REDUCE. UH OH – SOME MATH…
  13. Standard map-reduce: $\mathrm{map}\ x$ where $x: \Gamma \to \mathbb{R}$. $\Gamma$ is your class object/structure/thing. Just for simplicity, let's assume we only care about real numbers (obviously, we could have tuples, strings, dictionaries, any valid type, honestly). First things first, we need a context $\mathbb{C}$: $\mathrm{map}\ x$ where $x: (\Gamma, \mathbb{C}) \to \{\mathbb{R}, \mathbb{C}\}$.
  14. Yesterday my portfolio was worth $43 (€35.83); today it is worth $40 (€36.36). In dollars, the result is "I lost $3", which converts to $3/1.1 ≈ €2.73 lost. Yet in euros I made €0.53. The reason for the error is that "I lost $3" is a lie. You DID NOT LOSE $3. The answer is "I have made or lost ($40 in today's context − $43 in yesterday's context)".
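The arithmetic on slide 14 can be checked with a small Python sketch in which every value carries its context. The exchange rates (1.2 USD per EUR yesterday, 1.1 today) are not stated on the slide; they are inferred from its figures ($43 ≈ €35.83 and $40 ≈ €36.36). The `Context` and `Valued` classes are hypothetical names for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Context:
    label: str
    usd_per_eur: float  # assumed rate, inferred from the slide's numbers


@dataclass(frozen=True)
class Valued:
    amount_usd: float
    context: Context

    def in_eur(self):
        # Convert using the rate of the value's own context.
        return self.amount_usd / self.context.usd_per_eur


yesterday = Context("yesterday", 1.2)
today = Context("today", 1.1)

v_yesterday = Valued(43.0, yesterday)
v_today = Valued(40.0, today)

# Wrong: subtract first, then convert the difference at today's rate.
naive_eur = (v_today.amount_usd - v_yesterday.amount_usd) / today.usd_per_eur

# Right: evaluate each value in its own context, then subtract.
correct_eur = v_today.in_eur() - v_yesterday.in_eur()
```

Here `naive_eur` is about -2.73 while `correct_eur` is about +0.53, which is why the mapped function has to return the context along with the number.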
  15. Now we also toss in some services. $\mathrm{map}\ x$ where $x: (\Gamma, \mathbb{C}) \to \{\mathbb{R}, \mathbb{C}\}$ becomes $\mathrm{map}\ x$ where $x: (\Gamma, S, \mathbb{C}) \to \{\mathbb{R}, \mathbb{C}\}$, where $S = \{S_1, S_2, \dots, S_n\}$ are our services. But what are our services? This is a functional language conference, so we use functions to access services. Let $F = \{F_{i,j}\}$ for all $i, j$, with $F_{i,j}: \{\Gamma, S_1, S_2, \dots, S_i, \mathbb{C}\} \to \{\mathbb{R}, \mathbb{C}\}$. Then $\mathrm{map}\ x$ where $x: (\Gamma, F, \mathbb{C}) \to \{\mathbb{R}, \mathbb{C}\}$. So new services can leverage old services.
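One way to read slide 15 is that every service is a function of (record, services, context) that returns a (value, context) pair, so a newer service can call older ones. The Python sketch below assumes that reading; the service names `price_usd`, `fx_rate`, `price_eur` and the sample records are invented for illustration.

```python
def price_usd(record, services, context):
    """Base service: data you have."""
    return record["price_usd"], context


def fx_rate(record, services, context):
    """Base service: a rate taken from the context."""
    return context["usd_per_eur"], context


def price_eur(record, services, context):
    """Newer service that leverages the two older ones."""
    usd, context = services["price_usd"](record, services, context)
    rate, context = services["fx_rate"](record, services, context)
    return usd / rate, context


SERVICES = {"price_usd": price_usd, "fx_rate": fx_rate, "price_eur": price_eur}


def map_field(field, records, context):
    """map x where x: (record, services, context) -> (value, context)."""
    return [SERVICES[field](r, SERVICES, context)[0] for r in records]


records = [{"price_usd": 110.0}, {"price_usd": 220.0}]
values = map_field("price_eur", records, {"usd_per_eur": 1.1})
```

The caller asks only for `price_eur`; which underlying services get invoked is decided inside the function, at runtime.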
  16. $\mathrm{map}\ x$ where $x: (\Gamma, F, \mathbb{C}) \to \{\mathbb{R}, \mathbb{C}\}$. $z: \bigoplus_{i=1}^{m} (\Gamma, F, \mathbb{C}) \to \{\mathbb{R}, \mathbb{C}\}_{k=1,\dots,n}$. Diagram labels: Data You Have, Data You Can Acquire, Data You Can Model. Obvious extensions…
  17. $\mathrm{map}\ x$ where $x: (\Gamma, F, \mathbb{C}) \to \{\mathbb{R}, \mathbb{C}\}$ works on an abstract object space $(\Gamma, F, \mathbb{C})$, but there are others. Example, customer space: • $\Gamma$ = Customer Records • $F$ = purchasesOfWine(tenor) • $\mathbb{C}$ = Date. TRANSFORM. Example, country space: • $\Gamma$ = CountryList + wineSales • $F$ = weather(), totalWineSales(tenor) • $\mathbb{C}$ = Date, weather.
  18. Customer space $(\Gamma_1, F_1, \mathbb{C}_1)$:
      Customer | Location | Wine: purchasesOfWine(tenor)
      Bob | Spain | 1/1/2019: 3 btl; 15/3/2019: 2 btl
      Mary | France | 15/1/2019: 2 btl
      Juan | Spain | 12/5/2019: 6 btl
      Edward | England | 13/4/2019: 8 btl
      TRANSFORM $\mathcal{T}_1$ gives the country space $(\Gamma_2, F_2, \mathbb{C}_2)$:
      Country | Purchases: totalWineSales(tenor) | Weather()
      Spain | 11 bottles |
      France | 2 bottles |
      England | 8 bottles |
      $(\Gamma_2, F_2, \mathbb{C}_2) = \mathcal{T}_1 \circ (\Gamma_1, F_1, \mathbb{C}_1)$
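The transform on slide 18 can be reproduced with a short Python sketch using the slide's own figures. The record layout and function names are assumptions, and the `tenor` argument shown on the slide is ignored here: the sketch simply sums every purchase listed.

```python
from collections import defaultdict

# Customer space, copied from the slide (dates kept as plain strings).
customers = [
    {"customer": "Bob", "location": "Spain",
     "purchases": [("1/1/2019", 3), ("15/3/2019", 2)]},
    {"customer": "Mary", "location": "France",
     "purchases": [("15/1/2019", 2)]},
    {"customer": "Juan", "location": "Spain",
     "purchases": [("12/5/2019", 6)]},
    {"customer": "Edward", "location": "England",
     "purchases": [("13/4/2019", 8)]},
]


def purchases_of_wine(record):
    """Field function on the customer space: bottles bought by one customer."""
    return sum(bottles for _, bottles in record["purchases"])


def transform(customer_records):
    """T1: customer space -> country space (total bottles per country)."""
    totals = defaultdict(int)
    for record in customer_records:
        totals[record["location"]] += purchases_of_wine(record)
    return dict(totals)


country_space = transform(customers)
```

The result matches the slide's second table: Spain 11, France 2, England 8.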
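  19. Goal: $z: \bigoplus_{i=1}^{m} (\Gamma, F, \mathbb{C}) \to \{\mathbb{R}, \mathbb{C}\}_{k=1,\dots,n}$. Step 1: $\mathcal{T}_k: \bigoplus_{i=1}^{m} (\Gamma_k, F_k, \mathbb{C}_k) \to \bigoplus_{i=1}^{n} (\Gamma_{k+1}, F_{k+1}, \mathbb{C}_{k+1})$, so that $\bigoplus_{i=1}^{m} (\Gamma_{k+1}, F_{k+1}, \mathbb{C}_{k+1}) = \mathcal{T}_k \circ \mathcal{T}_{k-1} \circ \cdots \circ \mathcal{T}_1 \circ \bigoplus_{i=1}^{n} (\Gamma_1, F_1, \mathbb{C}_1)$. Step 2: $z: \bigoplus_{i=1}^{m} (\Gamma_{k+1}, F_{k+1}, \mathbb{C}_{k+1}) \to \{\mathbb{R}, \mathbb{C}\}_{i=1,\dots,n}$. THE FINAL FORMULA: $\{\mathbb{R}, \mathbb{C}\}_{i=1,\dots,m} = z \circ \mathcal{T}_k \circ \mathcal{T}_{k-1} \circ \cdots \circ \mathcal{T}_1 \circ \bigoplus_{i=1}^{n} (\Gamma_1, F_1, \mathbb{C}_1)$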
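The final formula is ordinary function composition, which the Python sketch below illustrates with two toy transforms and an evaluation step. The functions `t1`, `t2`, `z` and the sample data are invented for illustration; only the shape of the composition comes from the slide.

```python
from functools import reduce


def compose(*funcs):
    """compose(f, g, h)(x) == f(g(h(x))), matching the right-to-left formula."""
    return reduce(lambda f, g: lambda x: f(g(x)), funcs)


def t1(purchases):
    """Transform 1: (country, bottles) pairs -> total bottles per country."""
    totals = {}
    for country, bottles in purchases:
        totals[country] = totals.get(country, 0) + bottles
    return totals


def t2(totals):
    """Transform 2: keep only countries with at least 5 bottles."""
    return {country: n for country, n in totals.items() if n >= 5}


def z(totals):
    """Evaluation: reduce the final space to plain numbers."""
    return sorted(totals.values(), reverse=True)


pipeline = compose(z, t2, t1)  # z after T2 after T1
space_1 = [("Spain", 5), ("France", 2), ("Spain", 6), ("England", 8)]
result = pipeline(space_1)
```

Each stage only needs to agree with its neighbour on the shape of the space it receives, so transforms can be added or reordered without touching `z`.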
  20. WEBSMACK FRAMEWORK
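  22. $x = z \circ \mathcal{T}_k \circ \mathcal{T}_{k-1} \circ \cdots \circ \mathcal{T}_1: \bigoplus_{i=1}^{n} (\Gamma_1, F_1, \mathbb{C}_1) \to \{\mathbb{R}, \mathbb{C}\}_{i=1,\dots,m}$. Traditional: $\mathrm{map}\ x$ where $x: \Gamma \to \mathbb{R}$ (evaluation). NEW AND IMPROVED: $\mathrm{map}\ x$ where $x: \Gamma \to \mathbb{R}$ (evaluation).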
  23. rootVFP |> scenario (asOf(15-05-2019)) |> Load "position" (filter("MSCI USA – Daily")) |> filter (instrument.ESG.WomenOnBoard = true). THIS IS NOT AN IMPERATIVE ORDERING! Result, companies with women on board: MSCI, IBM, Apple. With |> scenario (timeseries(Date(1,1,2019), Date(15,5,2019))), the result is companies with women on board per date: 1/1/2019: {list of companies}; 2/1/2019: {list of companies}; 3/1/2019: {list of companies}.
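A rough Python sketch of what "not an imperative ordering" could mean: the pipeline stages only build a description of the query, and nothing runs until it is evaluated, so the stages commute. The `Query` class, its methods, and the sample companies and flags are hypothetical and do not reflect the Beon API.

```python
from dataclasses import dataclass, replace
from datetime import date

# Hypothetical data: company -> women-on-board flag, per date.
DATA = {
    date(2019, 1, 1): {"CompanyA": True, "CompanyB": False, "CompanyC": True},
    date(2019, 1, 2): {"CompanyA": True, "CompanyB": True, "CompanyC": True},
}


@dataclass(frozen=True)
class Query:
    dates: tuple = ()
    predicates: tuple = ()

    def scenario(self, *dates):
        # Sets the context; does not execute anything.
        return replace(self, dates=dates)

    def filter(self, predicate):
        # Records the predicate; does not execute anything.
        return replace(self, predicates=self.predicates + (predicate,))

    def evaluate(self):
        # Only here is the description run, once per date in the scenario.
        return {
            d: sorted(name for name, flag in DATA[d].items()
                      if all(p(flag) for p in self.predicates))
            for d in self.dates
        }


def women_on_board(flag):
    return flag is True


d1, d2 = date(2019, 1, 1), date(2019, 1, 2)
as_of = Query().scenario(d2).filter(women_on_board).evaluate()
reordered = Query().filter(women_on_board).scenario(d2).evaluate()
series = Query().filter(women_on_board).scenario(d1, d2).evaluate()
```

Swapping the scenario from a single date to a range of dates turns the same query into a time series without changing the filter.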
  24. THIS NATURALLY LETS YOU MAKE A 5TH GENERATION FRONT END
  26. HOW THE MACHINE WORKS
  27. MSCI BEON – A NEW PARADIGM. Framework based on the Beon Engine. Service API layer: each process announces what it can provide ("I'm Process X and I can provide x", "I'm Process Y and I can provide y"), likewise Process S, Process T, and Process C. Functions Library: x -> ProcessX, y -> ProcessY, s -> ProcessS, t -> ProcessT, c -> ProcessC. Beon Engine: a = x + y, b = s / t.
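The registration step on slide 27 might look like the Python sketch below: each process announces the fields it can provide, and the engine builds the routing table (x -> ProcessX, and so on) from those announcements. The `Process` and `Engine` classes and the numeric values are assumptions made for illustration.

```python
class Process:
    """A data service that announces which fields it can provide."""

    def __init__(self, name, values):
        self.name = name
        self._values = values  # field -> value (made-up numbers)

    def announce(self):
        # "I'm Process X and I can provide x"
        return list(self._values)

    def fetch(self, field):
        return self._values[field]


class Engine:
    """Routes each field to whichever process announced it."""

    def __init__(self):
        self.routes = {}

    def register(self, process):
        for field in process.announce():
            self.routes[field] = process

    def value(self, field):
        return self.routes[field].fetch(field)


engine = Engine()
for proc in (Process("ProcessX", {"x": 2.0}), Process("ProcessY", {"y": 3.0}),
             Process("ProcessS", {"s": 10.0}), Process("ProcessT", {"t": 4.0})):
    engine.register(proc)

a = engine.value("x") + engine.value("y")  # a = x + y
b = engine.value("s") / engine.value("t")  # b = s / t
routing = {field: proc.name for field, proc in engine.routes.items()}
```

The formulas never name a process; they name fields, and the routing table decides who answers.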
  28. MSCI BEON – A NEW PARADIGM. Everything starts with a question: a ResultSpec request arrives at the Query API. (Same diagram as slide 27: Service API layer, Functions Library, Beon Engine.)
  29. MSCI BEON – A NEW PARADIGM. The question is then expanded, compiled into byte code, and then parametrized with a context. (Same diagram as slide 28, with Compiler, Execution Engine, and Context added.)
  30. MSCI BEON – A NEW PARADIGM. The question is then executed against the various data services. Results are then recombined and presented back. (Same diagram as slide 29, with Context and Processing added.)
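The request flow of slides 28 to 30 (expand, compile, parametrize with a context, execute, recombine) can be approximated in a few lines of Python. This is a toy under stated assumptions, not the Beon engine: the formulas a = x + y and b = s / t come from the slides, while the `SERVICES` table, the `fx` context key, and the use of Python's built-in `compile` and `eval` are illustrative choices.

```python
FORMULAS = {"a": "x + y", "b": "s / t"}  # from the slides

# Hypothetical data services: each leaf field is a function of the context.
SERVICES = {
    "x": lambda ctx: 2.0 * ctx["fx"],
    "y": lambda ctx: 3.0 * ctx["fx"],
    "s": lambda ctx: 10.0,
    "t": lambda ctx: 4.0,
}


def compile_field(field):
    """Compile a formula into a Python code object (byte code)."""
    return compile(FORMULAS[field], field, "eval")


def expand(field):
    """Expand a requested field into the leaf fields it depends on."""
    if field not in FORMULAS:
        return {field}
    leaves = set()
    for name in compile_field(field).co_names:
        leaves |= expand(name)
    return leaves


def run(request, context):
    # 1. Expand the question into the leaf fields it needs.
    leaves = set().union(*(expand(f) for f in request))
    # 2. Execute each leaf once against its service, under the context.
    fetched = {leaf: SERVICES[leaf](context) for leaf in sorted(leaves)}

    # 3. Recombine: evaluate the compiled formulas over the fetched values.
    def value(field):
        if field in fetched:
            return fetched[field]
        code = compile_field(field)
        env = {name: value(name) for name in code.co_names}
        return eval(code, {"__builtins__": {}}, env)

    return {field: value(field) for field in request}


result = run(["a", "b"], {"fx": 1.0})
scaled = run(["a"], {"fx": 2.0})
```

The same request evaluated under a different context returns different numbers without any change to the formulas.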
