Dask tutorial at PyCon.DE / PyData Karlsruhe 2018. These introductory slides mainly contain the link to Matthew Rocklin's Dask workshop at PyData NYC 2018, on which this workshop was based.
Scalable Scientific Computing with Dask
1. 1
PyCon.DE / PyData Karlsruhe 2018
Uwe L. Korn
Scalable Scientific Computing with
Dask
2. 2
• Senior Data Scientist at Blue Yonder
(@BlueYonderTech)
• Apache {Arrow, Parquet} PMC
• Data Engineer and Architect with a heavy
focus on pandas
About me
xhochy
mail@uwekorn.com
3. 3
• Execution and definition of task graphs
• A parallel computing library that scales the existing Python ecosystem
• Scales down to your laptop
• Scales up to a cluster
What is Dask?
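To make "definition and execution of task graphs" concrete, here is a minimal sketch using `dask.delayed`. The functions `inc` and `add` are illustrative names that do not appear on the slides; the point is only that calling a delayed function records a task, and nothing runs until `compute()` is called.

```python
from dask import delayed


@delayed
def inc(x):
    # Hypothetical example task: add one to the input.
    return x + 1


@delayed
def add(x, y):
    # Hypothetical example task: combine two intermediate results.
    return x + y


# Calling delayed functions only builds the task graph; no work happens yet.
a = inc(1)
b = inc(2)
total = add(a, b)

# compute() hands the graph to a scheduler. The two inc tasks are
# independent of each other, so the scheduler is free to run them in parallel.
result = total.compute()
print(result)
```

The same code runs unchanged on a laptop with the default local scheduler, which is the "scales down" half of the claim above; running on a cluster requires a distributed scheduler, which this sketch does not set up.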
4. 4
• multi-core and distributed parallel execution
• low-level: task schedulers for computation graphs
• high-level: Array, Bag and DataFrame
More than a single CPU
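The two levels mentioned on this slide can be sketched side by side. This is an illustrative example, not taken from the workshop material: the graph keys, the `add` helper, and the array shape and chunk size are all assumptions chosen for brevity.

```python
import dask.array as da
from dask.threaded import get


def add(x, y):
    # Hypothetical helper used as a task in the hand-written graph.
    return x + y


# Low level: a task graph is a plain dict. Each key maps either to a value
# or to a tuple of (callable, *arguments), where arguments may be other keys.
graph = {
    "x": 1,
    "y": 2,
    "z": (add, "x", "y"),
    "w": (add, "z", 10),
}

# The threaded scheduler walks the graph and computes the requested key.
low_level = get(graph, "w")

# High level: a Dask Array is split into chunks, and each operation on it
# generates a task graph like the one above behind the scenes.
x = da.ones((1000, 1000), chunks=(250, 250))  # 4 x 4 = 16 chunks
high_level = x.sum().compute()

print(low_level, high_level)
```

In practice most users stay at the high level (Array, Bag, DataFrame) and only drop down to raw graphs or `dask.delayed` when a computation does not fit one of those collections.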
5. 5
Dask is
• More light-weight
• In Python, operates well with C/C++/Fortran/LLVM or other natively
compiled code
• Part of the Python ecosystem
What about Spark?
6. 6
Spark is
• Written in Scala and works well within the JVM
• Python support is very limited
• Brings its own ecosystem
• Able to provide more high-level optimizations
What about Spark?