This document discusses doing data science with Clojure. Clojure excels at structure manipulation and encoding: analysis is written as functions over plain collections rather than against rigid data structures. That keeps data analysis composable and fast, while consistent, curried APIs keep the code focused on intent. Live programming is presented as a way to catch errors early and to iterate faster, because it provides more context and makes debugging easier. The Clojure ecosystem is shown to cover tasks such as machine learning, plotting, and using notebooks as dashboards.
3. The analytics chasm
[Figure: time-to-answer scale marked "< 2min", "< 20min" and "project". Labels, in slide order: "Ideal. Almost real-time, can be done during brainstorming without disrupting flow"; "squeeze in somewhere in the day"; "fail"; "roadmap ahoy!"]
4. Easy things should be easy, and hard things should be possible.
— L. Wall
5. Data frames considered harmful
• Data frame (=table) conflates representation and abstraction
• Clojure excels in structure manipulation/encoding
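To make the point about structure manipulation concrete, here is a minimal sketch that treats a "table" as a plain vector of maps and reshapes it with `clojure.core` alone. The `orders` data and all names in it are made up for illustration; nothing here comes from the slides.

```clojure
;; A "table" as a plain vector of maps; no dedicated data-frame type.
;; The data is invented for this example.
(def orders
  [{:customer "ana"  :country "SI" :total 120.0}
   {:customer "ben"  :country "UK" :total  80.0}
   {:customer "cleo" :country "SI" :total  45.5}])

;; Filter, project and aggregate with clojure.core only.
(def si-total
  (->> orders
       (filter #(= "SI" (:country %)))
       (map :total)
       (reduce +)))

;; Re-encode the same rows into a different shape: totals keyed by country.
(def total-by-country
  (reduce (fn [acc {:keys [country total]}]
            (update acc country (fnil + 0) total))
          {}
          orders))
```

The same rows serve both as the row-oriented input and as the source of a map keyed by country, so the representation can change without the analysis being tied to one table abstraction.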
6. github.com/sbelak/huri
• No data structures, just functions over collections
• Composable (even DSLs — no macros!)
• Reasonably fast (transducers <3)
• Do-what-I-mean (auto-sort, liberal with inputs, …)
• Minimal buy-in
• Support reaching into nested structures everywhere
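The bullets above can be illustrated without Huri itself. The sketch below uses only `clojure.core` (it does not show Huri's API) to combine a transducer with `get-in` for reaching into nested maps; the `events` data is invented for the example.

```clojure
;; Nested rows; the data is invented for this example.
(def events
  [{:user {:id 1 :geo {:country "SI"}} :amount 10}
   {:user {:id 2 :geo {:country "UK"}} :amount 25}
   {:user {:id 3 :geo {:country "SI"}} :amount 5}])

;; A transducer: the filter and map steps are composed once and run
;; in a single pass, without building intermediate sequences.
(def si-amounts
  (comp (filter #(= "SI" (get-in % [:user :geo :country])))
        (map :amount)))

;; The same transformation feeds different reductions.
(def si-sum  (transduce si-amounts + 0 events))
(def si-list (into [] si-amounts events))
```

Because `si-amounts` is just a value, it can be reused with `transduce`, `into` or `sequence` depending on what the analysis needs next.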
7. Composability is key to quick iteration
• Curried versions where possible
• ->> and partial friendly
• Side benefit: consistent API
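One way to get curried, `->>`- and `partial`-friendly functions is to take the collection as the last argument and add a shorter arity that returns a function awaiting it. The sketch below assumes that convention; `keep-where` and `sum-of` are hypothetical helpers written for this example, not functions from Huri.

```clojure
(defn keep-where
  "Hypothetical helper: keeps rows whose value at key k satisfies pred.
  Called without rows, it returns a function that awaits them."
  ([k pred] (fn [rows] (keep-where k pred rows)))
  ([k pred rows] (filter #(pred (get % k)) rows)))

(defn sum-of
  "Hypothetical helper: sums the values at key k across rows.
  Called without rows, it returns a function that awaits them."
  ([k] (fn [rows] (sum-of k rows)))
  ([k rows] (transduce (map k) + 0 rows)))

;; Data invented for this example.
(def rows [{:kind :a :n 1} {:kind :b :n 2} {:kind :a :n 3}])

;; Collection-last arguments thread naturally with ->>.
(def via-thread (->> rows (keep-where :kind #{:a}) (sum-of :n)))

;; The curried arities compose into a reusable pipeline.
(def sum-a (comp (sum-of :n) (keep-where :kind #{:a})))
(def via-curry (sum-a rows))

;; partial works for the same reason: the collection comes last.
(def via-partial ((partial keep-where :kind #{:a}) rows))
```

Since every helper follows the same argument order, the three call styles are interchangeable, which is the consistency benefit the slide mentions.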
8. “This is possibly Clojure’s most important property: the syntax expresses the code’s semantic layers. An experienced reader of Clojure can skip over most of the code and have a lossless understanding of its high-level intent.”
— Z. Tellman, Elements of Clojure
21. huri.plot
• DSL that compiles to ggplot2
• Targets Gorilla REPL
• Follows the rest of Huri’s design philosophy
• bar chart, scatter plot, line chart, box & violin plot, heatmap, histogram
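As a rough idea of what "a DSL that compiles to ggplot2" can mean, the sketch below turns a plain Clojure map into a string of ggplot2 R code. It is not huri.plot's implementation or API: the spec keys (`:data`, `:x`, `:y`, `:geom`), the `geoms` table and the `->ggplot` function are all assumptions made for this example. Only the emitted ggplot2 function names are real.

```clojure
(require '[clojure.string :as str])

;; Hypothetical mapping from chart kinds to ggplot2 geom functions.
(def geoms
  {:scatter   "geom_point"
   :line      "geom_line"
   :bar       "geom_col"
   :box       "geom_boxplot"
   :violin    "geom_violin"
   :histogram "geom_histogram"})

(defn ->ggplot
  "Hypothetical compiler: turns a plot spec (a plain map) into ggplot2 code.
  :data is the name of an R data frame, :x and :y are column keywords."
  [{:keys [data x y geom]}]
  (let [aes (str/join ", " (cond-> [(str "x = " (name x))]
                             y (conj (str "y = " (name y)))))]
    (str "ggplot(" data ", aes(" aes ")) + " (geoms geom) "()")))

(def scatter-code
  (->ggplot {:data "df" :x :height :y :weight :geom :scatter}))
;; => "ggplot(df, aes(x = height, y = weight)) + geom_point()"
```

Because the spec is ordinary data, it can be built, merged and transformed with the same collection functions as everything else before it is compiled, with no macros involved.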
23. Takeaways
• Speed-of-answer matters
• Data science is about communication
• We don’t have to reinvent every wheel in Clojure
• Clojure is fantastic at structure manipulation, play to its strengths
• Blurring the line between environment and work is a powerful idea