Reducers
A library and model for collection processing in Clojure

Leonardo Borges
@leonardo_borges
http://www.leonardoborges.com
http://www.thoughtworks.com
Thursday, 30 August 12
...in 20 mins or less

Reducers huh? Here’s the gist

You get parallel versions of reduce, map and filter

Ta-da! I’m done!
...and well under my 20 min limit :)

Alright, alright, I’m kidding

How do reducers make parallelism possible?

• JVM’s Fork/Join framework
• Reduction transformers

Before we start: this is bleeding-edge stuff

Java requirements
• Fork/Join framework
  • Java 7 [1], or
  • Java 6 + the JSR166 jar [2]

Clojure requirements
• 1.5.0-* (still unreleased, on the master branch on GitHub [3] as of 30/08/2012)

[1] - http://jdk7.java.net/
[2] - http://gee.cs.oswego.edu/dl/jsr166/dist/jsr166.jar
[3] - https://github.com/clojure/clojure

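Here is a quick way to check both requirements from a REPL. `reducers-ready?` is a hypothetical helper, not part of Clojure. The Java 6 class name `jsr166y.ForkJoinPool` is an assumption: as far as I know, that is the package the reducers library falls back to when `java.util.concurrent.ForkJoinPool` is missing.

```clojure
(defn reducers-ready?
  "Hypothetical helper: true when a Fork/Join pool class can be loaded
  (Java 7's java.util.concurrent, or the assumed jsr166y package on Java 6)
  and the running Clojure is 1.5 or later."
  []
  (let [{:keys [major minor]} *clojure-version*
        class-available? (fn [class-name]
                           (try (Class/forName class-name) true
                                (catch ClassNotFoundException _ false)))]
    (boolean
     (and (or (class-available? "java.util.concurrent.ForkJoinPool")
              (class-available? "jsr166y.ForkJoinPool"))
          (or (> major 1)
              (and (= major 1) (>= minor 5)))))))

(reducers-ready?)
```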
The Fork/Join Framework

• Based on divide and conquer
• Work-stealing algorithm
• Uses deques (double-ended queues)
• Progressively divides the workload into tasks, up to a threshold
• Once a worker finishes a task, it pops another one from its deque
• Once at least two tasks have finished, their results can be combined (joined)
• Idle workers can steal tasks from the deques of workers that fall behind

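The steps above can be sketched directly against the JDK's Fork/Join classes from Clojure. This is a minimal illustration of divide and conquer with fork/join, not how the reducers library itself is written. `sum-task` and `threshold` are hypothetical names, and the sketch assumes Java 7+ (`java.util.concurrent.ForkJoinPool` and `RecursiveTask`).

```clojure
(import '(java.util.concurrent ForkJoinPool RecursiveTask))

;; Hypothetical cut-off: ranges at or below this size are not split further.
(def threshold 10000)

(defn sum-task
  "Returns a Fork/Join task that sums the elements of vector v in [lo, hi)."
  [v lo hi]
  (proxy [RecursiveTask] []
    (compute []
      (if (<= (- hi lo) threshold)
        ;; small enough: do the work sequentially
        (reduce + (subvec v lo hi))
        ;; too big: split the range in two (divide and conquer)
        (let [mid   (quot (+ lo hi) 2)
              left  (sum-task v lo mid)
              right (sum-task v mid hi)]
          (.fork left)            ; push the left half onto this worker's deque
          (+ (.invoke right)      ; compute the right half in the current thread
             (.join left)))))))   ; then wait for the left half (an idle worker may have stolen it)

(def pool (ForkJoinPool.))

(.invoke pool (sum-task (vec (range 1000000)) 0 1000000))
;; 499999500000
```

The owner of a deque works on the tasks it pushed most recently. Idle workers steal from the other end, which tends to hand them the larger, older chunks of work.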
Text is boring

Fork/Join algorithm - simplified view

[Diagram sequence with two workers, Worker 1 and Worker 2: the workload is put in “deques” and progressively halved, up to a configured threshold. Each worker processes the tasks in its own deque, and finished results are combined pairwise (“Combine”). Idle workers can “steal” items from other workers. The combining continues until a single final result remains.]

Let’s talk about Reducers

Motivations
• Performance
  • via less allocation
  • via parallelism (leverage Fork/Join)

Issues
• Lists and seqs are sequential
• map / filter imply order

A closer look at what map does

    ;; a naive map implementation
    ;; (named naive-map here so it doesn't shadow clojure.core/map)
    (defn naive-map [f coll]
      (if (seq coll)
        (cons (f (first coll)) (naive-map f (rest coll)))
        '()))

• Recursion
• Order
• Laziness (not shown)
• Consumes List
• Builds List

Oh, and it also applies the function to each item before putting the result into the new list. This is what mapping means!

Reduction Transformers

• The idea is to build map / filter on top of reduce, breaking away from sequentiality
• map / filter then build nothing and consume nothing
• Instead, they change what reduce means to the collection by transforming the reducing function

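What map is really all about

    ;; mapping takes a function f and returns a reduction transformer:
    ;; a function from a reducing function f1 to a new reducing function
    ;; that applies f to each input before handing it to f1
    (defn mapping [f]
      (fn [f1]
        (fn [result input]
          (f1 result (f input)))))
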
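filter fits the same shape. Here is a sketch, assuming a hypothetical `filtering` written in the style of `mapping` above. Instead of transforming each input, it decides whether to pass the input on to `f1` at all. Because transformers are just functions from reducing fn to reducing fn, they compose with plain `comp`.

```clojure
;; mapping, as on the slide
(defn mapping [f]
  (fn [f1]
    (fn [result input]
      (f1 result (f input)))))

;; filtering (hypothetical): only inputs satisfying pred reach f1
(defn filtering [pred]
  (fn [f1]
    (fn [result input]
      (if (pred input)
        (f1 result input)
        result))))

;; Filter evens, then increment. The leftmost transformer sees each input first.
(def inc-evens (comp (filtering even?) (mapping inc)))

(reduce (inc-evens +) 0 [1 2 3 4])      ;; 8
(reduce (inc-evens conj) [] [1 2 3 4])  ;; [3 5]
```

No intermediate filtered list is built: each element flows through both transformations inside a single reduce.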
But wait!
If map doesn’t consume the list any longer, who does?

• reduce does!
• Since Clojure 1.4, reduce lets the collection reduce itself (through the CollReduce / CollFold protocols)
  • Think of what this means for tree-like structures such as vectors
• This is key to leveraging the Fork/Join framework

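To make "the collection reduces itself" concrete, here is a sketch of a hypothetical collection, `counting-coll`. It has no seq at all and only implements `clojure.core.protocols/CollReduce`, which `reduce` dispatches to. It assumes Clojure 1.5 or later (for `reduced?`).

```clojure
(require '[clojure.core.protocols :as p]
         '[clojure.core.reducers :as r])

(defn counting-coll
  "Hypothetical collection of the integers 0 .. n-1. It has no seq;
  it only knows how to reduce itself, using a plain loop."
  [n]
  (letfn [(step [f acc start]
            (loop [acc acc, i start]
              (cond
                (reduced? acc) @acc                  ; honour early termination
                (< i n)        (recur (f acc i) (inc i))
                :else          acc)))]
    (reify p/CollReduce
      ;; reduce without an init: the first element (0) becomes the seed
      (coll-reduce [_ f]
        (if (pos? n) (step f 0 1) (f)))
      (coll-reduce [_ f init]
        (step f init 0)))))

(reduce + (counting-coll 5))              ;; 10
(reduce + 100 (counting-coll 5))          ;; 110
(reduce + (r/map inc (counting-coll 4)))  ;; 10, r/map works on anything reducible
```

Because the loop lives inside the collection, a vector can do the same thing by walking its internal tree directly instead of going through `first`/`rest`.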
Now we can use mapping to create reducing functions

    (reduce ((mapping inc) +) 0 [1 2 3 4])
    ;; 14

    ;; ((mapping inc) +) is equivalent to:
    (fn [result input]
      (+ result (inc input)))

    (reduce ((mapping inc) conj) [] [1 2 3 4])
    ;; [2 3 4 5]

    ;; ((mapping inc) conj) is equivalent to:
    (fn [result input]
      (conj result (inc input)))

But it feels awkward to use it in this form

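One way out of the awkwardness is to attach the transformer to the collection rather than to the reducing function. `clojure.core.reducers/reducer` does exactly that: it takes a reducible collection and a transformer, and returns a reducible collection. `my-map` is a hypothetical name for this sketch. As far as I know, the library's own `r/map` goes one step further and uses `r/folder`, so its result also supports `fold`.

```clojure
(require '[clojure.core.reducers :as r])

;; mapping, as defined earlier in the talk
(defn mapping [f]
  (fn [f1]
    (fn [result input]
      (f1 result (f input)))))

;; my-map (hypothetical): pair the collection with the transformer.
;; Nothing is computed until something reduces the result.
(defn my-map [f coll]
  (r/reducer coll (mapping f)))

(reduce + 0 (my-map inc [1 2 3 4]))  ;; 14
(into [] (my-map inc [1 2 3 4]))     ;; [2 3 4 5]
```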
What do we have so far?

• Performance has improved thanks to fewer allocations
  • No intermediate lists need to be built (see Haskell’s stream fusion [4])
• However, reduce is still sequential

[4] - http://bit.ly/streamFusion

Enter fold

• Takes the sequentiality out of foldl, foldr and reduce
• Potentially parallel (falls back to a standard reduce otherwise)
• Reduce/combine strategy (think Fork/Join framework)
• Segments the collection
• Runs multiple reduces in parallel
• Uses a combining function to join/reduce the partial results

    (defn fold [combinef reducef coll]
      ...)

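As a mental model only, here is a hypothetical `toy-fold` over vectors. It splits the vector in halves until a piece has at most `n` elements. It then reduces each piece with `reducef`, seeded by `(combinef)`, and merges neighbouring results with `combinef`. It runs sequentially. The real `r/fold` also accepts an optional partition size `n` (512 by default), dispatches through the CollFold protocol, and runs the independent halves as Fork/Join tasks.

```clojure
(require '[clojure.core.reducers :as r])

(defn toy-fold
  "Hypothetical, sequential model of fold over a vector."
  [n combinef reducef v]
  (if (<= (count v) n)
    ;; leaf: an ordinary reduce, seeded with the combining fn's identity
    (reduce reducef (combinef) v)
    (let [mid (quot (count v) 2)]
      ;; the two halves are independent; the real fold runs them in parallel,
      ;; here they are simply evaluated one after the other
      (combinef (toy-fold n combinef reducef (subvec v 0 mid))
                (toy-fold n combinef reducef (subvec v mid))))))

(def v (vec (range 100000)))

(toy-fold 512 + + v)  ;; same value as (reduce + v)
(r/fold 512 + + v)    ;; the library version with the same n, combinef and reducef
```

The sketch shows why `combinef` must be called with zero arguments: every leaf needs its own seed.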
The combining function is a monoid

• An associative binary function with an identity element
• All of the following functions are equivalent monoids

    +
    (+ 2 3) ; 5
    (+) ; 0

    (defn my-+
      ([] 0)
      ([a b] (+ a b)))

    (my-+ 2 3) ; 5
    (my-+) ; 0

    (require '[clojure.core.reducers :as r])

    (def my-+
      (r/monoid + (fn [] 0)))

    (my-+ 2 3) ; 5
    (my-+) ; 0

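Here is a sketch of what `r/monoid` builds, using a hypothetical re-implementation named `my-monoid`. The 0-arity returns `(ctor)` as the identity, and the 2-arity calls `op`. The example uses the map-merging combiner that the parallel word count uses later in the talk. It also checks associativity, which is what lets fold group partial results any way it likes.

```clojure
;; my-monoid (hypothetical): same shape as r/monoid
(defn my-monoid [op ctor]
  (fn m
    ([] (ctor))
    ([a b] (op a b))))

;; Combining fn for word counts: merge two count maps by adding counts
(def merge-counts (my-monoid (partial merge-with +) hash-map))

(merge-counts)                        ;; {}
(merge-counts {"a" 1} {"a" 2 "b" 1})  ;; {"a" 3, "b" 1}

;; Associativity: grouping doesn't change the result
(= (merge-counts (merge-counts {"a" 1} {"b" 2}) {"a" 3})
   (merge-counts {"a" 1} (merge-counts {"b" 2} {"a" 3})))  ;; true
```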
fold by examples

    ;; all examples assume the reducers library is available as r
    (ns reducers-playground.core
      (:require [clojure.core.reducers :as r]))

fold by examples:
increment all even integers from 0 up to (but not including) 10 million, and add them all up

    ;; these were taken from Rich’s reducers talk
    (def my-vector (into [] (range 10000000)))

    (time (reduce + (map inc (filter even? my-vector))))
    ;; 500msecs

    (time (reduce + (r/map inc (r/filter even? my-vector))))
    ;; 260msecs

    (time (r/fold + (r/map inc (r/filter even? my-vector))))
    ;; 130msecs

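All three expressions compute the same number, and it can be checked by hand. For an even n, incrementing the even numbers 0, 2, ..., n-2 gives the odd numbers 1, 3, ..., n-1. The sum of the first n/2 odd numbers is (n/2)². For the slide's 10 million elements, that is 5,000,000² = 25,000,000,000,000. `sum-inc-evens` is a hypothetical helper that runs the three variants on `(range n)` so the arithmetic can be tried with a small n.

```clojure
(require '[clojure.core.reducers :as r])

(defn sum-inc-evens
  "Runs the three variants from the slide on (range n) and returns
  their results as a vector: [seq-based, reducers+reduce, reducers+fold]."
  [n]
  (let [v (into [] (range n))]
    [(reduce + (map inc (filter even? v)))
     (reduce + (r/map inc (r/filter even? v)))
     (r/fold + (r/map inc (r/filter even? v)))]))

(sum-inc-evens 10)      ;; [25 25 25]
(sum-inc-evens 100000)  ;; [2500000000 2500000000 2500000000]
```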
fold by examples:
standard word count

    (def wiki-dump (slurp "subset-wiki-dump50")) ;50 MB

    (defn count-words [text]
      (reduce
       (fn [memo word]
         (assoc memo word (inc (get memo word 0))))
       {}
       (map #(.toLowerCase %) (into [] (re-seq #"\w+" text)))))

    (time (count-words wiki-dump)) ;; 45 secs

fold by examples:
parallel word count

    (def wiki-dump (slurp "subset-wiki-dump50")) ;50 MB

    (defn p-count-words [text]
      (r/fold
       ;; combining fn: merges the partial counts of two segments;
       ;; called with no arguments, it provides the seed value ({}) for each segment
       (r/monoid (partial merge-with +) hash-map)
       ;; reducing fn: counts the words within a single segment
       (fn [memo word]
         (assoc memo word (inc (get memo word 0))))
       (r/map #(.toLowerCase %) (into [] (re-seq #"\w+" text)))))

    (time (p-count-words wiki-dump)) ;; 30 secs

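Here is a small check that the parallel version agrees with a sequential one, without needing the 50 MB dump. It uses `clojure.core/frequencies` as the sequential baseline. `p-word-freqs`, `word-freqs` and `sample` are hypothetical names. The sample is long enough (5,000 words) that fold actually splits it at the default partition size.

```clojure
(require '[clojure.core.reducers :as r])

;; Same shape as p-count-words on the slide
(defn p-word-freqs [text]
  (r/fold
   (r/monoid (partial merge-with +) hash-map)
   (fn [memo word] (assoc memo word (inc (get memo word 0))))
   (r/map #(.toLowerCase ^String %) (into [] (re-seq #"\w+" text)))))

;; Sequential baseline from clojure.core
(defn word-freqs [text]
  (frequencies (map #(.toLowerCase ^String %) (re-seq #"\w+" text))))

(def sample (apply str (repeat 1000 "The cat and the hat. ")))

(= (word-freqs sample) (p-word-freqs sample))  ;; true
(get (p-word-freqs sample) "the")              ;; 2000
```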
fold by examples:
Load 100k records into PostgreSQL

    (import '(java.io BufferedReader FileReader))

    ;; into [] gives fold a vector, which it can split in parallel
    (def records
      (into [] (line-seq
                 (BufferedReader. (FileReader. "dump.txt")))))

fold by examples:
Load 100k records into PostgreSQL

    (time (doseq [record records]
            (let [tokens (clojure.string/split record #"\t")]
              (insert users/users
                      (values {
                               :account-id (nth tokens 0)
                               ...
                               })))))

    ;; 90 secs

fold by examples:
Load 100k records into PostgreSQL in parallel

    (time (r/fold
           +
           (r/map (fn [record]
                    (let [tokens (clojure.string/split record #"\t")]
                      (insert users/users
                              (values {
                                       :account-id (nth tokens 0)
                                       ...
                                       }))
                      ;; each record counts as 1, so fold + returns the number of inserts
                      1))
                  records)))

    ;; 50 secs

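The same pattern can be tried without a database. In this sketch, `insert-row!` is a hypothetical stand-in for the `insert`/`values` calls on the slide; it just records each row in an atom. The code inside `r/map` runs on Fork/Join worker threads, so the side effect must be thread-safe. An atom is thread-safe, and a real database would likewise need connections that can handle concurrent inserts.

```clojure
(require '[clojure.core.reducers :as r]
         '[clojure.string :as str])

;; Hypothetical stand-in for the database: a thread-safe atom
(def inserted-rows (atom []))

(defn insert-row! [row]
  (swap! inserted-rows conj row))

(defn load-records
  "Inserts every tab-separated record and returns how many were inserted.
  records should be a vector so fold can split it."
  [records]
  (r/fold + (r/map (fn [record]
                     (let [tokens (str/split record #"\t")]
                       (insert-row! {:account-id (nth tokens 0)})
                       1))
                   records)))

(def sample-records
  (vec (for [i (range 2000)] (str "acc-" i "\tuser-" i))))

(load-records sample-records)  ;; 2000
```

Insertion order is not preserved: rows from different segments interleave in whatever order the workers finish.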
When to use it




Thursday, 30 August 12
When to use it

                         • Exploring decision trees




Thursday, 30 August 12
When to use it

                         • Exploring decision trees
                         • Image processing




Thursday, 30 August 12
When to use it

                         • Exploring decision trees
                         • Image processing
                         • As a building block for bigger, distributed systems such as Datomic and
                          Cascalog (maybe around parallel agregators)




Thursday, 30 August 12
When to use it

                         • Exploring decision trees
                         • Image processing
                         • As a building block for bigger, distributed systems such as Datomic and
                          Cascalog (maybe around parallel agregators)
                         • Basically any list intensive program




Thursday, 30 August 12
When to use it

                         • Exploring decision trees
                         • Image processing
                         • As a building block for larger distributed systems such as Datomic and
                           Cascalog (possibly for parallel aggregators)
                         • Basically any program that does heavy list/collection processing


                                    But the tools are available to anyone, so be creative!
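One caveat when applying this to list-heavy code: `r/fold` only runs in parallel when the collection knows how to split itself. In `clojure.core.reducers` that means vectors and hash maps; lists, lazy seqs and other collections fall back to a single sequential `reduce`, which is why `records` is built with `(into [] ...)`. A minimal sketch that makes the fallback visible by recording which threads run the reducing function (`fold-with-threads` is a hypothetical helper):

```clojure
(ns reducers-sketch.fold-fallback
  (:require [clojure.core.reducers :as r]))

(defn fold-with-threads
  "Sums coll with r/fold and records the names of the threads that ran
  the reducing function. Returns [sum thread-names]."
  [coll]
  (let [threads (atom #{})
        sum (r/fold +                      ; combinef: (+) => 0, joins partial sums
                    (fn [acc x]            ; reducef: note the thread, then add
                      (swap! threads conj (.getName (Thread/currentThread)))
                      (+ acc x))
                    coll)]
    [sum @threads]))
```

Called on a lazy seq such as `(map inc (range 100000))`, every step runs on the calling thread. Called on the same data as a vector, the sum is identical, but the work is split into chunks and handed to the Fork/Join pool, so on a multi-core machine the set typically contains several worker-thread names.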



Resources

                         • The Anatomy of a Reducer - http://bit.ly/anatomyReducers
                         • Rich’s announcement post on Reducers - http://bit.ly/reducersANN
                         • Rich Hickey - Reducers - EuroClojure 2012 - http://bit.ly/reducersVideo
                          (this presentation was heavily inspired by this video)
                         • The source on GitHub - http://bit.ly/reducersCore



                                                                                      Leonardo Borges
                                                                                      @leonardo_borges
                                                                                      http://www.leonardoborges.com
                                                                                      http://www.thoughtworks.com
Thanks!




                             Questions?



                                 Leonardo Borges
                                @leonardo_borges
                         http://www.leonardoborges.com
                          http://www.thoughtworks.com

Clojure Reducers / clj-syd Aug 2012

  • 1. Reducers A library and model for collection processing in Clojure Leonardo Borges @leonardo_borges http://www.leonardoborges.com http://www.thoughtworks.com Thursday, 30 August 12
  • 2. Reducers A library and model for collection processing in Clojure less or m i ns in 20 ... Leonardo Borges @leonardo_borges http://www.leonardoborges.com http://www.thoughtworks.com Thursday, 30 August 12
  • 3. Reducers huh? Here’s the gist Thursday, 30 August 12
  • 4. Reducers huh? Here’s the gist You get parallel versions of reduce, map and filter Thursday, 30 August 12
  • 5. Reducers huh? Here’s the gist You get parallel versions of reduce, map and filter Ta-da! I’m done! Thursday, 30 August 12
  • 6. Reducers huh? Here’s the gist You get parallel versions of reduce, map and filter Ta-da! I’m done! and well under my 20 min limit :) Thursday, 30 August 12
  • 7. Alright, alright I’m kidding Thursday, 30 August 12
  • 8. How do reducers make parallelism possible? Thursday, 30 August 12
  • 9. How do reducers make parallelism possible? • JVM’s Fork/Join framework • Reduction Transformers Thursday, 30 August 12
  • 10. Before we start - this is bleeding edge stuff Java requirements • Fork/Join framework • Java 7 [1] or • Java 6 + the JSR166 jar [2] Clojure requirements • 1.5.0-* (this is still MASTER on github [3] as of 30/08/2012) [1] - http://jdk7.java.net/ [2] - http://gee.cs.oswego.edu/dl/jsr166/dist/jsr166.jar [3] - https://github.com/clojure/clojure Thursday, 30 August 12
  • 12. The Fork/Join Framework •Based on divide and conquer Thursday, 30 August 12
  • 13. The Fork/Join Framework •Based on divide and conquer •Work stealing algorithm Thursday, 30 August 12
  • 14. The Fork/Join Framework •Based on divide and conquer •Work stealing algorithm •Uses deques - double ended queues. Thursday, 30 August 12
  • 15. The Fork/Join Framework •Based on divide and conquer •Work stealing algorithm •Uses deques - double ended queues. •Progressively divides the workload into tasks, up to a threshold Thursday, 30 August 12
  • 16. The Fork/Join Framework •Based on divide and conquer •Work stealing algorithm •Uses deques - double ended queues. •Progressively divides the workload into tasks, up to a threshold •Once it finished one task, it pops another one form its deque Thursday, 30 August 12
  • 17. The Fork/Join Framework •Based on divide and conquer •Work stealing algorithm •Uses deques - double ended queues. •Progressively divides the workload into tasks, up to a threshold •Once it finished one task, it pops another one form its deque •After at least two tasks have finished, results can be combined/joined Thursday, 30 August 12
  • 18. The Fork/Join Framework •Based on divide and conquer •Work stealing algorithm •Uses deques - double ended queues. •Progressively divides the workload into tasks, up to a threshold •Once it finished one task, it pops another one form its deque •After at least two tasks have finished, results can be combined/joined •Idle workers can pop tasks from the deques of workers which fall behind Thursday, 30 August 12
  • 19. Text is boring Thursday, 30 August 12
  • 20. Fork/Join algorithm - simplified view Thursday, 30 August 12
  • 21. Fork/Join algorithm - simplified view Workload is put in “deques” Thursday, 30 August 12
  • 22. Fork/Join algorithm - simplified view ...and progressively halved Thursday, 30 August 12
  • 23. Fork/Join algorithm - simplified view Thursday, 30 August 12
  • 24. Fork/Join algorithm - simplified view ...up to a configured threshold Thursday, 30 August 12
  • 25. Fork/Join algorithm - simplified view Worker 1 Worker 2 Thursday, 30 August 12
  • 26. Fork/Join algorithm - simplified view Worker 1 Worker 2 Thursday, 30 August 12
  • 27. Fork/Join algorithm - simplified view Combine Worker 1 Worker 2 Thursday, 30 August 12
  • 28. Fork/Join algorithm - simplified view Worker 1 Worker 2 Thursday, 30 August 12
  • 29. Fork/Join algorithm - simplified view Worker 1 Worker 2 Thursday, 30 August 12
  • 30. Fork/Join algorithm - simplified view Combine Worker 1 Worker 2 Thursday, 30 August 12
  • 31. Fork/Join algorithm - simplified view Combine Combine Worker 1 Worker 2 Thursday, 30 August 12
  • 32. Fork/Join algorithm - simplified view Combine Worker 1 Worker 2 Thursday, 30 August 12
  • 33. Fork/Join algorithm - simplified view Worker 1 Worker 2 Thursday, 30 August 12
  • 34. Fork/Join algorithm - simplified view Combine Worker 1 Worker 2 Thursday, 30 August 12
  • 35. Fork/Join algorithm - simplified view Combine Worker 1 Worker 2 Thursday, 30 August 12
  • 36. Fork/Join algorithm - simplified view Worker 1 Worker 2 Thursday, 30 August 12
  • 37. Fork/Join algorithm - simplified view Worker 1 Worker 2 Idle workers can “steal” items from other workers Thursday, 30 August 12
  • 38. Fork/Join algorithm - simplified view Combine Combine Worker 1 Worker 2 Thursday, 30 August 12
  • 39. Fork/Join algorithm - simplified view Combine Worker 1 Worker 2 Thursday, 30 August 12
  • 40. Fork/Join algorithm - simplified view Combine Worker 1 Worker 2 Thursday, 30 August 12
  • 41. Fork/Join algorithm - simplified view Combine Worker 1 Worker 2 Thursday, 30 August 12
  • 42. Fork/Join algorithm - simplified view Final result Worker 1 Worker 2 Thursday, 30 August 12
  • 43. Let’s talk about Reducers Thursday, 30 August 12
  • 44. Let’s talk about Reducers Motivations • Performance • via less allocation • via parallelism (leverage Fork/Join) Thursday, 30 August 12
  • 45. Let’s talk about Reducers Motivations Issues • Performance • Lists and Seqs are sequential • via less allocation • map / filter implies order • via parallelism (leverage Fork/Join) Thursday, 30 August 12
  • 46. A closer look at what map does ;; a naive map implementation (defn map [f coll] (if (seq coll) (cons (f (first coll)) (map f (rest coll))) '())) Thursday, 30 August 12
  • 47. A closer look at what map does ;; a naive map implementation (defn map [f coll] (if (seq coll) (cons (f (first coll)) (map f (rest coll))) '())) • Recursion Thursday, 30 August 12
  • 48. A closer look at what map does ;; a naive map implementation (defn map [f coll] (if (seq coll) (cons (f (first coll)) (map f (rest coll))) '())) • Recursion • Order Thursday, 30 August 12
  • 49. A closer look at what map does ;; a naive map implementation (defn map [f coll] (if (seq coll) (cons (f (first coll)) (map f (rest coll))) '())) • Recursion • Order • Laziness (not shown) Thursday, 30 August 12
  • 50. A closer look at what map does ;; a naive map implementation (defn map [f coll] (if (seq coll) (cons (f (first coll)) (map f (rest coll))) '())) • Recursion • Order • Laziness (not shown) • Consumes List Thursday, 30 August 12
  • 51. A closer look at what map does ;; a naive map implementation (defn map [f coll] (if (seq coll) (cons (f (first coll)) (map f (rest coll))) '())) • Recursion • Order • Laziness (not shown) • Consumes List • Builds List Thursday, 30 August 12
  • 52. A closer look at what map does ;; a naive map implementation (defn map [f coll] (if (seq coll) (cons (f (first coll)) (map f (rest coll))) '())) • Recursion • Order Oh, and it also applies the function • Laziness (not shown) to each item before putting the result • Consumes List into the new list • Builds List Thursday, 30 August 12
  • 53. A closer look at what map does ;; a naive map implementation (defn map [f coll] (if (seq coll) (cons (f (first coll)) (map f (rest coll))) '())) This is what mapping means! • Recursion • Order Oh, and it also applies the function • Laziness (not shown) to each item before putting the result • Consumes List into the new list • Builds List Thursday, 30 August 12
  • 55. Reduction Transformers • Idea is to build map / filter on top of reduce to break from sequentiality Thursday, 30 August 12
  • 56. Reduction Transformers • Idea is to build map / filter on top of reduce to break from sequentiality • map / filter then builds nothing and consumes nothing Thursday, 30 August 12
  • 57. Reduction Transformers • Idea is to build map / filter on top of reduce to break from sequentiality • map / filter then builds nothing and consumes nothing • It changes what reduce means to the collection by transforming the reducing functions Thursday, 30 August 12
  • 58. What map is really all about (defn mapping [f] (fn [f1] (fn [result input] (f1 result (f input))))) Thursday, 30 August 12
  • 59. But wait! If map doesn’t consume the list any longer, who does? • reduce does! • Since Clojure 1.4 reduce lets the collection reduce itself (through the CollReduce / CollFold protocols) • Think of what this means for tree-like structures such as vectors • This is key to leveraging the Fork/Join framework Thursday, 30 August 12
  • 60. Now we can use mapping to create reducing functions (reduce ((mapping inc) +) 0 [1 2 3 4]) ;; 14 Thursday, 30 August 12
  • 61. Now we can use mapping to create reducing functions (reduce ((mapping inc) +) 0 [1 2 3 4]) ;; 14 (fn [result input] (+ result (inc input))) Thursday, 30 August 12
  • 62. Now we can use mapping to create reducing functions (reduce ((mapping inc) conj) [] [1 2 3 4]) ;; [2 3 4 5] Thursday, 30 August 12
  • 63. Now we can use mapping to create reducing functions (reduce ((mapping inc) conj) [] [1 2 3 4]) ;; [2 3 4 5] (fn [result input] (conj result (inc input))) Thursday, 30 August 12
  • 64. Now we can use mapping to create reducing functions (reduce ((mapping inc) conj) [] [1 2 3 4]) ;; [2 3 4 5] (fn [result input] (conj result (inc input))) But it feels awkward to use it in this form Thursday, 30 August 12
  • 65. What do we have so far? • Performance has been improved due to less allocations • No intermediary lists need to be built (see Haskell’s StreamFusion [4]) • However reduce is still sequential [4] - http://bit.ly/streamFusion Thursday, 30 August 12
  • 67. Enters fold • Takes the sequentiality out or foldl, foldr and reduce Thursday, 30 August 12
  • 68. Enters fold • Takes the sequentiality out or foldl, foldr and reduce • Potentially parallel (fallsback to standard reduce otherwise) Thursday, 30 August 12
  • 69. Enters fold • Takes the sequentiality out or foldl, foldr and reduce • Potentially parallel (fallsback to standard reduce otherwise) • Reduce/Combine strategy (think Fork/Join Framework) Thursday, 30 August 12
  • 70. Enters fold • Takes the sequentiality out or foldl, foldr and reduce • Potentially parallel (fallsback to standard reduce otherwise) • Reduce/Combine strategy (think Fork/Join Framework) • Segments the collection Thursday, 30 August 12
  • 71. Enters fold • Takes the sequentiality out or foldl, foldr and reduce • Potentially parallel (fallsback to standard reduce otherwise) • Reduce/Combine strategy (think Fork/Join Framework) • Segments the collection • Runs multiple reduces in parallel Thursday, 30 August 12
  • 72. Enters fold • Takes the sequentiality out or foldl, foldr and reduce • Potentially parallel (fallsback to standard reduce otherwise) • Reduce/Combine strategy (think Fork/Join Framework) • Segments the collection • Runs multiple reduces in parallel • Uses a combining function to join/reduce results Thursday, 30 August 12
  • 73. Enters fold • Takes the sequentiality out or foldl, foldr and reduce • Potentially parallel (fallsback to standard reduce otherwise) • Reduce/Combine strategy (think Fork/Join Framework) • Segments the collection • Runs multiple reduces in parallel • Uses a combining function to join/reduce results (defn fold [combinef reducef coll] ...) Thursday, 30 August 12
  • 74. The combining function is a monoid • A binary function with an identity element • All the following functions are equivalent monoids Thursday, 30 August 12
  • 75. The combining function is a monoid • A binary function with an identity element • All the following functions are equivalent monoids + (+ 2 3) ; 5 (+) ; 0 Thursday, 30 August 12
  • 76. The combining function is a monoid • A binary function with an identity element • All the following functions are equivalent monoids (defn my-+ ([] 0) ([a b] (+ a b))) (my-+ 2 3) ; 5 (my-+) ; 0 Thursday, 30 August 12
  • 77. The combining function is a monoid • A binary function with an identity element • All the following functions are equivalent monoids (require ‘[clojure.core.reducers :as r]) (def my-+ (r/monoid + (fn [] 0))) (my-+ 2 3) ; 5 (my-+) ; 0 Thursday, 30 August 12
  • 78. fold by examples ;; all examples assume the reducers library is available as r (ns reducers-playground.core (:require [clojure.core.reducers :as r])) Thursday, 30 August 12
  • 79. fold by examples: increment all even positive integers up to 10 million and add them all up Thursday, 30 August 12
  • 80. fold by examples: increment all even positive integers up to 10 million and add them all up ;; these were taken from Rich’s reducers talk Thursday, 30 August 12
  • 81. fold by examples: increment all even positive integers up to 10 million and add them all up ;; these were taken from Rich’s reducers talk (def my-vector (into [] (range 10000000))) Thursday, 30 August 12
  • 82. fold by examples: increment all even positive integers up to 10 million and add them all up ;; these were taken from Rich’s reducers talk (def my-vector (into [] (range 10000000))) (time (reduce + (map inc (filter even? my-vector)))) Thursday, 30 August 12
  • 83. fold by examples: increment all even positive integers up to 10 million and add them all up ;; these were taken from Rich’s reducers talk (def my-vector (into [] (range 10000000))) (time (reduce + (map inc (filter even? my-vector)))) ;; 500msecs Thursday, 30 August 12
  • 84. fold by examples: increment all even positive integers up to 10 million and add them all up ;; these were taken from Rich’s reducers talk (def my-vector (into [] (range 10000000))) (time (reduce + (map inc (filter even? my-vector)))) ;; 500msecs (time (reduce + (r/map inc (r/filter even? my-vector)))) Thursday, 30 August 12
  • 85. fold by examples: increment all even positive integers up to 10 million and add them all up ;; these were taken from Rich’s reducers talk (def my-vector (into [] (range 10000000))) (time (reduce + (map inc (filter even? my-vector)))) ;; 500msecs (time (reduce + (r/map inc (r/filter even? my-vector)))) ;; 260msecs Thursday, 30 August 12
  • 86. fold by examples: increment all even positive integers up to 10 million and add them all up ;; these were taken from Rich’s reducers talk (def my-vector (into [] (range 10000000))) (time (reduce + (map inc (filter even? my-vector)))) ;; 500msecs (time (reduce + (r/map inc (r/filter even? my-vector)))) ;; 260msecs (time (r/fold + (r/map inc (r/filter even? my-vector)))) Thursday, 30 August 12
  • 87. fold by examples: increment all even positive integers up to 10 million and add them all up ;; these were taken from Rich’s reducers talk (def my-vector (into [] (range 10000000))) (time (reduce + (map inc (filter even? my-vector)))) ;; 500msecs (time (reduce + (r/map inc (r/filter even? my-vector)))) ;; 260msecs (time (r/fold + (r/map inc (r/filter even? my-vector)))) ;; 130msecs Thursday, 30 August 12
  • 88. fold by examples: standard word count (def wiki-dump (slurp "subset-wiki-dump50")) ;50 MB (defn count-words [text] (reduce (fn [memo word] (assoc memo word (inc (get memo word 0)))) {} (map #(.toLowerCase %) (into [] (re-seq #"w+" text))))) Thursday, 30 August 12
  • 89. fold by examples: standard word count (def wiki-dump (slurp "subset-wiki-dump50")) ;50 MB (defn count-words [text] (reduce (fn [memo word] (assoc memo word (inc (get memo word 0)))) {} (map #(.toLowerCase %) (into [] (re-seq #"w+" text))))) (time (count-words wiki-dump)) ;; 45 secs Thursday, 30 August 12
  • 95. fold by examples: parallel word count
      (def wiki-dump (slurp "subset-wiki-dump50")) ;; 50 MB
      (defn p-count-words [text]
        (r/fold
          ;; Combining fn: called to merge the partial computations
          ;; (word-count maps) produced by the parallel sub-reductions.
          ;; Called with no arguments, it returns (hash-map), i.e. {},
          ;; to provide a seed value for each partition.
          (r/monoid (partial merge-with +) hash-map)
          ;; Reducing fn: counts the words within one partition
          (fn [memo word]
            (assoc memo word (inc (get memo word 0))))
          (r/map #(.toLowerCase %) (into [] (re-seq #"\w+" text)))))
      (time (p-count-words wiki-dump))
      ;; 30 secs
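The combining function is the least obvious part of the parallel version. This sketch, assuming Clojure 1.5 or later, shows what r/monoid builds: called with no arguments it returns (hash-map), an empty map that seeds each partition; called with two maps it merges their counts. The explicit partition size of 2 is an assumption added only so the tiny input really gets split and merged; the slide relies on fold's default.

```clojure
(require '[clojure.core.reducers :as r])

;; Combining fn from the slide: 0 args -> seed, 2 args -> merge partial results
(def combine (r/monoid (partial merge-with +) hash-map))

(combine)                        ;; => {}
(combine {"a" 1} {"a" 2, "b" 1}) ;; => {"a" 3, "b" 1}

(defn p-count-words [text]
  (r/fold 2 ;; partition size: illustrative assumption, not from the slide
          combine
          ;; reducing fn: counts the words inside one partition
          (fn [memo word]
            (assoc memo word (inc (get memo word 0))))
          (r/map #(.toLowerCase %) (into [] (re-seq #"\w+" text)))))

(p-count-words "The cat and the hat")
;; => {"the" 2, "cat" 1, "and" 1, "hat" 1}
```

Because merge-with + is associative and {} is its identity, it does not matter how fold splits the vector: the merged result equals the sequential count.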
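  • 96. fold by examples: Load 100k records into PostgreSQL
      (import '(java.io BufferedReader FileReader))
      ;; with-open closes the reader; into [] realizes every line first
      (def records
        (with-open [rdr (BufferedReader. (FileReader. "dump.txt"))]
          (into [] (line-seq rdr))))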
  • 98. fold by examples: Load 100k records into PostgreSQL
      (require 'clojure.string)
      (time
        (doseq [record records]
          (let [tokens (clojure.string/split record #"\t")] ;; tab-separated fields
            (insert users/users
              (values {:account-id (nth tokens 0) ... })))))
      ;; 90 secs
  • 100. fold by examples: Load 100k records into PostgreSQL in parallel
      (time
        (r/fold +
          (r/map (fn [record]
                   (let [tokens (clojure.string/split record #"\t")]
                     (insert users/users
                       (values {:account-id (nth tokens 0) ... }))
                     1)) ;; each insert counts as 1, so fold returns the row count
                 records)))
      ;; 50 secs
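The parallel loader's trick is to make every insert return 1 so that fold's + doubles as a row counter. This sketch reproduces that shape without a database: insert-record! is a hypothetical stand-in that records account ids in an atom, and the tab-separated input lines are made up.

```clojure
(require '[clojure.core.reducers :as r]
         '[clojure.string :as str])

;; Hypothetical stand-in for the database insert: remembers each
;; account id in an atom (thread-safe) instead of talking to PostgreSQL.
(def inserted (atom []))

(defn insert-record! [account-id]
  (swap! inserted conj account-id))

;; Hypothetical tab-separated input lines, already in a vector so fold can split them.
(def records
  (into [] (for [i (range 1000)] (str "acct-" i "\tsome\tfields"))))

(defn load-records [records]
  (r/fold +
          (r/map (fn [record]
                   (let [tokens (str/split record #"\t")]
                     (insert-record! (nth tokens 0)) ;; side effect
                     1))                             ;; counts as one row
                 records)))

(load-records records)
;; => 1000
```

Since fold runs the mapped function on several fork/join threads, the inserts happen concurrently and in no particular order; that is why the stand-in uses an atom, and presumably why the real version needs a database setup that tolerates concurrent connections.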
  • 106. When to use it
      • Exploring decision trees
      • Image processing
      • As a building block for bigger, distributed systems such as Datomic and Cascalog (possibly around parallel aggregators)
      • Basically any list-intensive program
      But the tools are available to anyone, so be creative!
  • 107. Resources
      • The Anatomy of a Reducer - http://bit.ly/anatomyReducers
      • Rich Hickey's announcement post on Reducers - http://bit.ly/reducersANN
      • Rich Hickey - Reducers - EuroClojure 2012 - http://bit.ly/reducersVideo (this presentation was heavily inspired by this video)
      • The source on GitHub - http://bit.ly/reducersCore
  • 108. Thanks! Questions?
      Leonardo Borges
      @leonardo_borges
      http://www.leonardoborges.com
      http://www.thoughtworks.com