Cascalog Workshop
Example query
Execution

1. Pre-aggregation
2. Aggregation
3. Post-aggregation
Variable dependencies
Pre-aggregation
• Start from generator variables
• Resolve as many variables as possible using:
 • Joins
 • Functions
• Use as many filters as possible
• Join all sources into one set of tuples
Aggregation


• Group by resolved output variables
• Apply all aggregators to each group
Post-aggregation


• Resolve the rest of the variables
• Apply rest of filters
Example query
Query planner

1. Start with generators
2. Add functions and filters until fixed point
   [?person2 ?age2 ?double-age2]
3. Do a join
   [?person1 ?person2 ?age2 ?double-age2]
4. Add functions and filters until fixed point
5. Do a join
   [?person1 ?age1 ?person2 ?age2 ?double-age2]
6. Add functions and filters until fixed point
   [?person1 ?age1 ?person2 ?age2 ?double-age2 ?delta]
7. Group by already satisfied output vars
   Group by ?delta
8. Execute aggregators on each group
   [?delta ?count]
9. Add functions and filters until fixed point
10. Project fields to [?delta ?count]
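The planner steps can be traced by hand on a toy dataset. The sketch below is plain Python, not Cascalog: the `age` and `follows` data, and the definitions of ?double-age2 (twice ?age2) and ?delta (?age2 minus ?age1), are assumptions made for illustration, since the slides show only the variable names.

```python
from collections import Counter

# Hypothetical generator data (not taken from the slides).
age = [("alice", 28), ("bob", 33), ("carol", 28)]
follows = [("alice", "bob"), ("carol", "bob"), ("bob", "alice")]

# Start with a generator, then add functions until nothing new resolves:
# [?person2 ?age2 ?double-age2]  (assumed: ?double-age2 = 2 * ?age2)
step1 = [(p2, a2, 2 * a2) for p2, a2 in age]

# Do a join on ?person2: [?person1 ?person2 ?age2 ?double-age2]
step2 = [(p1, p2, a2, d2)
         for p1, followed in follows
         for p2, a2, d2 in step1
         if followed == p2]

# Do a join on ?person1: [?person1 ?age1 ?person2 ?age2 ?double-age2]
step3 = [(p1, a1, p2, a2, d2)
         for p1, p2, a2, d2 in step2
         for q1, a1 in age
         if q1 == p1]

# Add functions: append ?delta (assumed: ?delta = ?age2 - ?age1)
step4 = [(p1, a1, p2, a2, d2, a2 - a1) for p1, a1, p2, a2, d2 in step3]

# Group by ?delta and run the count aggregator on each group.
counts = Counter(row[5] for row in step4)

# Project fields to [?delta ?count]
result = sorted(counts.items())
print(result)
```

Every row carries all resolved variables until the group-by, which is why the final projection is a separate step.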
Cascading pipes

• Each: can occur in Map or Reduce
• GroupBy: Causes a Reduce step
• Every: One or more follow GroupBy
• CoGroup: Join implementation, causes
  Reduce step
To Cascading

• Each: [?person2 ?age2 ?double-age2]
• CoGroup: [?person1 ?person2 ?age2 ?double-age2]
• CoGroup: [?person1 ?age1 ?person2 ?age2 ?double-age2]
• Each, Each: [?person1 ?age1 ?person2 ?age2 ?double-age2 ?delta]
• GroupBy: Group by ?delta
• Every: Execute aggregators on each group, [?delta ?count]
• Each: functions and filters after aggregation
• Each: Project fields to [?delta ?count]
To MapReduce

• Job 1: first join, producing [?person1 ?person2 ?age2 ?double-age2]
• Job 2: second join, producing [?person1 ?age1 ?person2 ?age2 ?double-age2]
• Job 3: Group by ?delta, aggregators, Project fields to [?delta ?count]
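Since GroupBy and CoGroup each cause a Reduce step, the number of jobs can be estimated by counting those pipes. The sketch below is a simplified model in Python, not Cascading's actual planner: the `count_jobs` helper and the `plan` list are hypothetical, and the model ignores map-only jobs and any planner optimizations.

```python
# Pipes that force a Reduce step, per the "Cascading pipes" slide.
REDUCE_PIPES = {"GroupBy", "CoGroup"}

def count_jobs(pipes):
    """Estimate the number of MapReduce jobs as one per reduce-causing pipe."""
    return sum(1 for pipe in pipes if pipe in REDUCE_PIPES)

# Pipe sequence for the example query, as listed in "To Cascading".
plan = ["Each", "CoGroup", "CoGroup", "Each", "Each",
        "GroupBy", "Every", "Each", "Each"]

jobs = count_jobs(plan)
print(jobs)
```

Under this model the example plan needs three jobs, matching the Job 1 to Job 3 labels on the slides.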
defmapop
[A1, B1, C1]  ->  [A1, B1, C1, D1, E1]
[A2, B2, C2]  ->  [A2, B2, C2, D2, E2]
[A3, B3, C3]  ->  [A3, B3, C3, D3, E3]

Appends fields to tuple
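The append-fields behaviour can be modelled in a few lines of Python. This is only a sketch of the semantics shown on the slide, not Cascalog code: `apply_mapop` and `sum_and_product` are hypothetical names introduced here.

```python
def apply_mapop(op, tuples):
    """Call op on each tuple's fields and append the returned fields."""
    return [t + tuple(op(*t)) for t in tuples]

def sum_and_product(a, b, c):
    # Hypothetical operation that produces two new fields (D and E).
    return (a + b + c, a * b * c)

rows = [(1, 2, 3), (4, 5, 6)]
out = apply_mapop(sum_and_product, rows)
print(out)
```

Each input tuple yields exactly one output tuple, with the original fields kept in front of the new ones.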
deffilterop
[A1, B1, C1]  ->  true   ->  [A1, B1, C1]
[A2, B2, C2]  ->  false  ->  (dropped)
[A3, B3, C3]  ->  true   ->  [A3, B3, C3]
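A filter operation keeps a tuple only when its predicate returns true. The Python sketch below illustrates that behaviour; `apply_filterop` and the predicate are hypothetical, not part of Cascalog.

```python
def apply_filterop(pred, tuples):
    """Keep only the tuples for which pred returns a truthy value."""
    return [t for t in tuples if pred(*t)]

rows = [(1, 2, 3), (4, 5, 6), (7, 8, 9)]

# Hypothetical predicate: reject any tuple whose middle field is 5.
kept = apply_filterop(lambda a, b, c: b != 5, rows)
print(kept)
```

Unlike a map operation, a filter never changes the fields of a tuple; it only decides whether the tuple survives.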
defmapcatop
Map:
[“a red dog”]  ->  [ [“a red dog”, “a”], [“a red dog”, “red”], [“a red dog”, “dog”] ]
[“ ”]          ->  []
[“hello”]      ->  [ [“hello”, “hello”] ]

Concat:
[“a red dog”, “a”]
[“a red dog”, “red”]
[“a red dog”, “dog”]
[“hello”, “hello”]
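The map-then-concat behaviour can be reproduced in Python with the same inputs as the slide. This is a sketch of the semantics only: `apply_mapcatop` and `split_words` are hypothetical names, and whitespace splitting is assumed to be the operation being illustrated.

```python
def apply_mapcatop(op, tuples):
    """Emit one output tuple per value returned by op, then concatenate."""
    out = []
    for t in tuples:
        for emitted in op(*t):
            out.append(t + (emitted,))
    return out

def split_words(sentence):
    # Assumed operation: split on whitespace; a blank string yields no words.
    return sentence.split()

rows = [("a red dog",), (" ",), ("hello",)]
words = apply_mapcatop(split_words, rows)
for row in words:
    print(row)
```

An input that produces an empty list, such as the blank string, contributes no output tuples at all.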
Aggregators
Map Task 1 output: [“key1”, 1] [“key3”, 3]
Map Task 2 output: [“key2”, 3] [“key1”, 2] [“key3”, 1]

Reduce Task 1: [“key1”, 1] [“key1”, 2]  ->  [“key1”, 3]
Reduce Task 2: [“key2”, 3] [“key3”, 3] [“key3”, 1]  ->  [“key2”, 3] [“key3”, 4]

Regular aggregators - all data goes to reducers
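The data flow for a regular aggregator can be simulated in Python: every map output pair is shuffled to a reducer, and summing happens only there. The `shuffle` and `reduce_sum` helpers are hypothetical, and the key-to-reducer routing is hard-coded to mirror the slide rather than computed by a real partitioner.

```python
from collections import defaultdict

def shuffle(map_outputs, partition):
    """Send every (key, value) pair to the reducer chosen by partition."""
    reducers = defaultdict(list)
    for task_output in map_outputs:
        for key, value in task_output:
            reducers[partition(key)].append((key, value))
    return reducers

def reduce_sum(pairs):
    """Sum the values of each key seen by one reducer."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)

map_task_1 = [("key1", 1), ("key3", 3)]
map_task_2 = [("key2", 3), ("key1", 2), ("key3", 1)]

# Assumed routing, chosen to match the slide: key1 -> reducer 0, others -> reducer 1.
routing = {"key1": 0, "key2": 1, "key3": 1}
reducers = shuffle([map_task_1, map_task_2], routing.get)

reduce_task_1 = reduce_sum(reducers[0])
reduce_task_2 = reduce_sum(reducers[1])
print(reduce_task_1, reduce_task_2)
```

All five pairs cross the network to the reducers, which is the cost that parallel aggregators reduce.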
defparallelagg
Map Task 1
  Input:          [“nathan”] [“alice”] [“nathan”]
  Init:           [“nathan”, 1] [“alice”, 1] [“nathan”, 1]
  Combine (Map):  [“nathan”, 2] [“alice”, 1]

Map Task 2
  Input:          [“nathan”] [“sally”]
  Init:           [“nathan”, 1] [“sally”, 1]
  Combine (Map):  [“nathan”, 1] [“sally”, 1]

Reduce Task 1
  Combine (Reduce):  [“nathan”, 3]

Reduce Task 2
  Combine (Reduce):  [“sally”, 1] [“alice”, 1]

Parallel aggregators - partial aggregation done in mappers
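The init and combine stages can be sketched in Python for a count aggregator. This models the idea on the slide and is not the defparallelagg API: `init`, `combine`, `partial_aggregate`, and `final_aggregate` are hypothetical helpers.

```python
def init(_name):
    """Init: turn one input tuple into a partial count of 1."""
    return 1

def combine(left, right):
    """Combine: merge two partial counts."""
    return left + right

def partial_aggregate(names):
    """Map side: init each tuple, then combine per key within the task."""
    partial = {}
    for name in names:
        value = init(name)
        partial[name] = combine(partial[name], value) if name in partial else value
    return partial

def final_aggregate(partials):
    """Reduce side: combine the partial results from all map tasks."""
    total = {}
    for partial in partials:
        for key, value in partial.items():
            total[key] = combine(total[key], value) if key in total else value
    return total

map_task_1 = partial_aggregate(["nathan", "alice", "nathan"])
map_task_2 = partial_aggregate(["nathan", "sally"])
counts = final_aggregate([map_task_1, map_task_2])
print(map_task_1, map_task_2, counts)
```

This works because the same combine function is safe to apply in any grouping, so the mappers can shrink five input tuples to four partial results before the shuffle.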
combine
Inputs:  [1] [2] [3]   and   [3] [4] [5]

Output:  [1] [2] [3] [3] [4] [5]
union
Inputs:  [1] [2] [3]   and   [3] [4] [5]

Output:  [1] [2] [3] [4] [5]
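The difference between the two operations is whether duplicates survive. The Python sketch below reproduces both slides; the function names mirror the slide titles but are local helpers written for illustration, not the Cascalog functions.

```python
def combine(*sources):
    """Concatenate all sources, keeping duplicates."""
    return [t for source in sources for t in source]

def union(*sources):
    """Concatenate all sources, keeping the first occurrence of each tuple."""
    seen = set()
    out = []
    for t in combine(*sources):
        if t not in seen:
            seen.add(t)
            out.append(t)
    return out

a = [(1,), (2,), (3,)]
b = [(3,), (4,), (5,)]

combined = combine(a, b)
unioned = union(a, b)
print(combined)
print(unioned)
```

The tuple [3] appears in both inputs, so it shows up twice in the combined output and once in the union.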
ElephantDB
Key/Value pairs  ->  Pre-shard and index in MapReduce  ->  Shard 0 to Shard 5 on the Distributed Filesystem

Generation of domain of data
ElephantDB
DFS (Shard 0 to Shard 5)  ->  three ElephantDB Servers

Serving domain of data
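The generate-then-serve split can be illustrated with a small Python sketch. Everything here is an assumption for illustration: the slides do not say how ElephantDB assigns keys to shards, so the MD5-based `shard_for` function, the in-memory dictionaries standing in for indexed shard files, and the sample keys are all hypothetical.

```python
import hashlib

NUM_SHARDS = 6  # Shard 0 to Shard 5, as on the slides

def shard_for(key):
    """Assumed sharding scheme: stable hash of the key modulo the shard count."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % NUM_SHARDS

def build_shards(pairs):
    """Generation: pre-shard the key/value pairs (done in MapReduce on the slides)."""
    shards = [dict() for _ in range(NUM_SHARDS)]
    for key, value in pairs:
        shards[shard_for(key)][key] = value
    return shards

def lookup(shards, key):
    """Serving: go straight to the one shard that can hold the key."""
    return shards[shard_for(key)].get(key)

shards = build_shards([("nathan", 3), ("alice", 1), ("sally", 1)])
print(lookup(shards, "nathan"), lookup(shards, "missing"))
```

Because the same function is used when writing and when reading, a server only has to open the shards it hosts to answer a lookup.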
