5. Accidental complexity in existing tools
Pig, Hive
The query language is different from the programming language
6. When query tool is separate from programming language
Friction when embedding custom operations
Interlacing queries with regular application logic is unnatural
Generating queries dynamically is difficult
7. Clojure
General-purpose programming language
Dialect of Lisp that compiles to Java bytecode
“Programmable programming language”: Easy to build Domain Specific Languages (DSLs) in Clojure
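A minimal sketch of the "programmable programming language" idea, in plain Clojure with no Cascalog involved. The `where` macro, the `select-names` helper, and the `people` data are hypothetical names made up for illustration; the point is only that a few lines of macro code give a small declarative mini-language.

```clojure
;; Hypothetical mini-DSL: `where` turns a declarative clause into a predicate.
(defmacro where
  "Expands (where :age > 25) into a function that tests one map record."
  [field op value]
  `(fn [record#] (~op (get record# ~field) ~value)))

;; Illustrative data, not from the talk.
(def people
  [{:name "alice" :age 28}
   {:name "bob"   :age 25}
   {:name "chris" :age 40}])

(defn select-names
  "Returns the :name of every record that satisfies pred, in input order."
  [pred records]
  (mapv :name (filter pred records)))

;; The clause reads like a query but is ordinary Clojure after expansion.
(select-names (where :age > 25) people)
;; => ["alice" "chris"]
```

Because the macro expands to a regular function, the result can be passed to `filter`, stored, or composed like any other value.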
11. Cascalog
Full power of a general-purpose programming language available at all times
Cascalog is a Clojure library
Example query: (?<- (stdout) [?p] (age ?p 25))
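A runnable sketch of the example query, assuming Cascalog is on the classpath and running in local mode. The `age` dataset here is a hypothetical in-memory vector of [person age] tuples standing in for whatever source the talk used; only `?<-`, `??<-`, and `stdout` from `cascalog.api` are used.

```clojure
(require '[cascalog.api :refer [?<- ??<- stdout]])

;; Hypothetical in-memory dataset of [person age] tuples.
(def age
  [["alice" 28]
   ["bob" 25]
   ["chris" 40]
   ["david" 25]])

;; Who is 25? The constant 25 acts as a filter on the second field.
(?<- (stdout) [?p] (age ?p 25))

;; Bind the age to ?a and filter it with an ordinary Clojure predicate.
(?<- (stdout) [?p ?a] (age ?p ?a) (< ?a 30))

;; ??<- runs the same kind of query but returns the tuples
;; to the caller instead of printing them.
(def under-30
  (??<- [?p ?a] (age ?p ?a) (< ?a 30)))
```

Every variable listed in the output vector has to be bound by some predicate in the query body, which is why the second query introduces `?a` in the `age` predicate before using it.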
13. Some of Cascalog’s features
Inner and outer joins
Aggregators
Functions
Subqueries
Sorting
Read from and write to arbitrary data sources
› HDFS
› HBase
› MySQL
› Etc.
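A sketch of a few of the features listed above (joins, aggregators, subqueries), under some assumptions: Cascalog 2.x, where the built-in aggregators live in `cascalog.logic.ops` (in 1.x the namespace is `cascalog.ops`), and two hypothetical in-memory datasets, `age` and `gender`, invented for illustration.

```clojure
(require '[cascalog.api :refer [<- ??<-]]
         '[cascalog.logic.ops :as c]) ; assumption: Cascalog 2.x namespace

;; Hypothetical datasets; "dave" deliberately has no gender record.
(def age    [["alice" 28] ["bob" 25] ["chris" 40] ["dave" 31]])
(def gender [["alice" "f"] ["bob" "m"] ["chris" "m"]])

;; Inner join: reusing ?p in both predicates joins the sources on person.
(def inner
  (??<- [?p ?a ?g] (age ?p ?a) (gender ?p ?g)))

;; Outer join: a !! variable is nil when the other side has no match.
(def outer
  (??<- [?p ?a !!g] (age ?p ?a) (gender ?p !!g)))

;; Aggregators: number of people and total age, grouped by gender.
(def stats-by-gender
  (??<- [?g ?count ?total]
        (age ?p ?a)
        (gender ?p ?g)
        (c/count ?count)
        (c/sum ?a :> ?total)))

;; Subquery: a query built with <- is used as a source by another query.
(def adult-genders
  (let [adults (<- [?p] (age ?p ?a) (>= ?a 26))]
    (??<- [?p ?g] (adults ?p) (gender ?p ?g))))
```

The in-memory vectors could be swapped for taps over HDFS or another store without changing the query bodies.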
14. When query tool is separate from programming language
Friction when embedding custom operations
Interlacing queries with regular application logic is unnatural
Generating queries dynamically is difficult
15. Cascalog, on the other hand...
Custom operations defined just like any other function
Interlacing queries with regular application logic is trivial
Generating queries dynamically is easy and idiomatic
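The three points above can be sketched as follows, assuming Cascalog 2.x in local mode. The `age` dataset and the functions `decade`, `older-than`, and `names-of` are hypothetical names invented for this example.

```clojure
(require '[cascalog.api :refer [<- ??- ??<-]])

;; Hypothetical in-memory dataset of [person age] tuples.
(def age [["alice" 28] ["bob" 25] ["chris" 40]])

;; A custom operation is an ordinary Clojure function.
(defn decade
  "Rounds an age down to its decade, e.g. 28 -> 20."
  [a]
  (* 10 (quot a 10)))

;; Used inside a query, with :> naming the output variable.
(def decades
  (??<- [?p ?d] (age ?p ?a) (decade ?a :> ?d)))

;; Generating queries dynamically: a plain function that builds a query
;; from a source and a threshold supplied by application logic.
(defn older-than
  [source min-age]
  (<- [?p ?a] (source ?p ?a) (> ?a min-age)))

;; Queries are values, so they can be passed around and composed.
(defn names-of
  "Projects a [person age] query down to just the person."
  [query]
  (<- [?p] (query ?p _)))

;; ??- executes queries and returns one result seq per query given.
(def over-26
  (first (??- (names-of (older-than age 26)))))
```

Nothing is executed until `??-` (or `?-`, `??<-`, `?<-`) is called; `older-than` and `names-of` only build query definitions, which is what makes parameterizing and composing them cheap.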
16. Try Cascalog yourself!
Project Page
http://www.github.com/nathanmarz/cascalog
Introductory Tutorial
http://nathanmarz.com/blog/introducing-cascalog/
5 minutes to install Clojure, Hadoop, and Cascalog locally! See project README
18. More benefits of being a Clojure DSL
Excellent module system
Interactive REPL
Make use of any Clojure function in queries
Editor's notes
- UDFs, custom duct tape for registering and finding dependencies, separate files
- separate files, testing?, error handling
- things that you didn’t think were possible become idiomatic: compose queries, parameterize them, pass queries and operations around