SlideShare una empresa de Scribd logo
1 de 29
Descargar para leer sin conexión
Anna Holschuh, Target
Extending Apache Spark APIs
Without Going Near Spark Source
or a Compiler
#DevSAIS19
What This Talk is About
• Scala programming constructs
• Functional programming paradigms
• Tips for organizing code in production systems
2#DevSAIS19
Who am I
• Lead Data Engineer at Target since 2016
• Deep love of all things Target
• Primary career focus has been building backend
systems with a personal passion for Machine Learning
problems
• Started working in Spark in 2015
3#DevSAIS19
Agenda
• Motivation
• Scala’s “Enrich My Library” Pattern
• An Example
• Other Uses
4#DevSAIS19
Agenda
• Motivation
• Scala’s “Enrich My Library” Pattern
• An Example
• Other Uses
5#DevSAIS19
Motivation
Let’s go through an example…
• We have a system of Authors, Articles, and
Comments on those Articles
• From the example, Spark/Scala lends itself
well to functional programming paradigms
• What happens when the system grows in
size/complexity and it becomes necessary
to inject more custom code into the mix?
• Can we keep things concise, readable, and
efficient using the same functional style of
code development?
6#DevSAIS19
Motivation
Functional Programming Refresher
• Declarative style of writing code (vs.
Imperative)
• Favors composition with functions
• Avoids shared state, mutability, and side
effects.
7#DevSAIS19
Motivation
A Validation Framework was born…
• Tasked with building an on-demand
computation system consuming various
data sources
• There were many ways for this data to go
wrong
• Needed a way to fail fast and in a
predictable way when a certain bar for
quality was not being met
8#DevSAIS19
Motivation
A Validation Framework was born…
• Desired ability to “sprinkle” .validate() calls
throughout our existing Spark ETL code
9#DevSAIS19
This is possible with
Scala’s
“Enrich My Library”
Pattern
Agenda
• Motivation
• Scala’s “Enrich My Library” Pattern
• An Example
• Other Uses
10#DevSAIS19
“Enrich My Library”
A Scala programming pattern…
• Allows us to augment existing APIs
• Analogous features in other languages
• Also known as “Pimp My Library” for
Googling purposes
• Syntactic sugar that uses implicit classes
to guide the compiler
11#DevSAIS19
Reference: https://docs.scala-lang.org/overviews/core/implicit-classes.html
“Enrich My Library”
What are implicits?
Scala supports a keyword “implicit” that allows
the compiler to implicitly make connections at
compile-time as opposed to explicitly having to
call a function or feed in a variable. Scala
supports implicit values, parameters, functions,
and classes.
What is an implicit class?
Introduced formally with Scala 2.10 although
it’s possible to achieve the same effect in
previous versions through different constructs.
Allows extension of classes one normally
wouldn’t have access to.
12#DevSAIS19
Reference: https://docs.scala-lang.org/overviews/core/implicit-classes.html
Agenda
• Motivation
• Scala’s “Enrich My Library” Pattern
• An Example
• Other Uses
13#DevSAIS19
An Example
Back to our example…
How do we go from
THIS
14#DevSAIS19
Motivation
15#DevSAIS19
Back to our example…
To
THIS
An Example
16#DevSAIS19
Step 1: Build a Validation class to
work with
• Abstract class parameterized with type T
representing the object type that we plan to
validate
• Contains metadata relevant to running a
validation
• Has an abstract .execute() method to be filled
in by concrete subclasses
• Contains a concrete implementation
.performValidation() that calls on the abstract
execute method
An Example
17#DevSAIS19
Step 2: Add an implicit class to allow
the decoration of existing types with
new methods
• The class can be named anything
• It must be nested in a package or object
• It can only have one parameter. This defines
what class it’s augmenting.
• Extra arguments can be passed through the
implicit parameter list.
• .validate() delegates back to the validation object
being passed into the method and uses the
object being decorated to carry out the
validation.
An Example
18#DevSAIS19
Step 3: Define a validation
• Our validation extends a Validation typed with
Dataset[Article]
• It fills in the abstract method .execute() which
defines what the validation is checking for
• This means that any time the compiler finds a
Dataset[Article] type, we can call .validate() on
it with this validation supplied because of our
implicit class
• Roughly 20 lines of concise and isolated code
is nicely separated from the core ETL job
An Example
19#DevSAIS19
Step 4: Instantiate your validation
and pull it in scope
• This is what triggers the compiler to link
Datasets of Articles to the .validate()
method through the defined implicit class
An Example
20#DevSAIS19
Step 5: Don’t forget Unit Tests
• It is straightforward to develop concise
and isolated unit tests for each validation
that is developed
• ScalaTest with FunSpec are used to
achieve BDD-style tests
An Example
21#DevSAIS19
Step 6: And we’re done!
• We have been able to develop concise,
isolated, testable code that can fit
seamlessly into existing Spark jobs
• Data is messy, and we have the ability to
address this problem in an elegant way
• “Enrich my library” has allowed us to
extend Spark APIs so we can stay true to
functional programming paradigms
Agenda
• Motivation
• Scala’s “Enrich My Library” Pattern
• An Example
• Other Uses
22#DevSAIS19
Other Uses
23#DevSAIS19
Code organization and readability
• Move long blocks of related ETL code into
implicit class function definitions to help
organize code
Other Uses
24#DevSAIS19
Support other common functionalities
used in production systems
ü Validations
• Metrics Collection
• Logging
• Checkpointing
• Notifications
• …
Disclaimer
These are powerful programming constructs that
can greatly increase productivity and enable the
buildout of concise and elegant framework code.
Overuse can lead to cryptic and esoteric systems
that can cause engineers great pain and suffering.
Find the right balance!
25#DevSAIS19
Takeaways
• The “Enrich My Library” programming pattern
enables concise, clean, and readable code
• It enabled us to create a framework that supports
rapid development of new validations with a
relatively small amount of code
• The resulting code is isolated, testable, and easy to
understand
26#DevSAIS19
Come Work At Target
• We are hiring in Data Science and Data Engineering
• Solve real-world problems ranging from supply chain
logistics to smart stores to personalization and so on
• Offices in…
o Sunnyvale, CA
o Minneapolis, MN
o Pittsburgh, PA
o Bangalore, India
27#DevSAIS19
work somewhere you
Acknowledgements
• Thank you Spark Summit
• Thank you Target
• Thank you wonderful team members at Target
• Thank you vibrant Spark and Scala communities
28#DevSAIS19
QUESTIONS
29#DevSAIS19
annamaria.holschuh@target.com

Más contenido relacionado

La actualidad más candente

Spark Summit EU talk by Kaarthik Sivashanmugam
Spark Summit EU talk by Kaarthik SivashanmugamSpark Summit EU talk by Kaarthik Sivashanmugam
Spark Summit EU talk by Kaarthik SivashanmugamSpark Summit
 
Spark Summit EU talk by Emlyn Whittick
Spark Summit EU talk by Emlyn WhittickSpark Summit EU talk by Emlyn Whittick
Spark Summit EU talk by Emlyn WhittickSpark Summit
 
Building a Business Logic Translation Engine with Spark Streaming for Communi...
Building a Business Logic Translation Engine with Spark Streaming for Communi...Building a Business Logic Translation Engine with Spark Streaming for Communi...
Building a Business Logic Translation Engine with Spark Streaming for Communi...Spark Summit
 
A Collaborative Data Science Development Workflow
A Collaborative Data Science Development WorkflowA Collaborative Data Science Development Workflow
A Collaborative Data Science Development WorkflowDatabricks
 
How We Optimize Spark SQL Jobs With parallel and sync IO
How We Optimize Spark SQL Jobs With parallel and sync IOHow We Optimize Spark SQL Jobs With parallel and sync IO
How We Optimize Spark SQL Jobs With parallel and sync IODatabricks
 
Build, Scale, and Deploy Deep Learning Pipelines with Ease Using Apache Spark
Build, Scale, and Deploy Deep Learning Pipelines with Ease Using Apache SparkBuild, Scale, and Deploy Deep Learning Pipelines with Ease Using Apache Spark
Build, Scale, and Deploy Deep Learning Pipelines with Ease Using Apache SparkDatabricks
 
Apache® Spark™ MLlib 2.x: migrating ML workloads to DataFrames
Apache® Spark™ MLlib 2.x: migrating ML workloads to DataFramesApache® Spark™ MLlib 2.x: migrating ML workloads to DataFrames
Apache® Spark™ MLlib 2.x: migrating ML workloads to DataFramesDatabricks
 
Getting Ready to Use Redis with Apache Spark with Tague Griffith
Getting Ready to Use Redis with Apache Spark with Tague GriffithGetting Ready to Use Redis with Apache Spark with Tague Griffith
Getting Ready to Use Redis with Apache Spark with Tague GriffithDatabricks
 
Spark Summit EU talk by Yiannis Gkoufas
Spark Summit EU talk by Yiannis GkoufasSpark Summit EU talk by Yiannis Gkoufas
Spark Summit EU talk by Yiannis GkoufasSpark Summit
 
Apache® Spark™ MLlib: From Quick Start to Scikit-Learn
Apache® Spark™ MLlib: From Quick Start to Scikit-LearnApache® Spark™ MLlib: From Quick Start to Scikit-Learn
Apache® Spark™ MLlib: From Quick Start to Scikit-LearnDatabricks
 
Infrastructure for Deep Learning in Apache Spark
Infrastructure for Deep Learning in Apache SparkInfrastructure for Deep Learning in Apache Spark
Infrastructure for Deep Learning in Apache SparkDatabricks
 
Dr. Elephant for Monitoring and Tuning Apache Spark Jobs on Hadoop with Carl ...
Dr. Elephant for Monitoring and Tuning Apache Spark Jobs on Hadoop with Carl ...Dr. Elephant for Monitoring and Tuning Apache Spark Jobs on Hadoop with Carl ...
Dr. Elephant for Monitoring and Tuning Apache Spark Jobs on Hadoop with Carl ...Databricks
 
Build, Scale, and Deploy Deep Learning Pipelines Using Apache Spark
Build, Scale, and Deploy Deep Learning Pipelines Using Apache SparkBuild, Scale, and Deploy Deep Learning Pipelines Using Apache Spark
Build, Scale, and Deploy Deep Learning Pipelines Using Apache SparkDatabricks
 
Insights Without Tradeoffs: Using Structured Streaming
Insights Without Tradeoffs: Using Structured StreamingInsights Without Tradeoffs: Using Structured Streaming
Insights Without Tradeoffs: Using Structured StreamingDatabricks
 
Introduction to Big Data Analytics using Apache Spark and Zeppelin on HDInsig...
Introduction to Big Data Analytics using Apache Spark and Zeppelin on HDInsig...Introduction to Big Data Analytics using Apache Spark and Zeppelin on HDInsig...
Introduction to Big Data Analytics using Apache Spark and Zeppelin on HDInsig...Alex Zeltov
 
Spark Summit EU talk by Bas Geerdink
Spark Summit EU talk by Bas GeerdinkSpark Summit EU talk by Bas Geerdink
Spark Summit EU talk by Bas GeerdinkSpark Summit
 
Using Databricks as an Analysis Platform
Using Databricks as an Analysis PlatformUsing Databricks as an Analysis Platform
Using Databricks as an Analysis PlatformDatabricks
 
Spark Summit EU talk by Stephan Kessler
Spark Summit EU talk by Stephan KesslerSpark Summit EU talk by Stephan Kessler
Spark Summit EU talk by Stephan KesslerSpark Summit
 
Apache Spark At Apple with Sam Maclennan and Vishwanath Lakkundi
Apache Spark At Apple with Sam Maclennan and Vishwanath LakkundiApache Spark At Apple with Sam Maclennan and Vishwanath Lakkundi
Apache Spark At Apple with Sam Maclennan and Vishwanath LakkundiDatabricks
 
Overview of Apache Spark 2.3: What’s New? with Sameer Agarwal
 Overview of Apache Spark 2.3: What’s New? with Sameer Agarwal Overview of Apache Spark 2.3: What’s New? with Sameer Agarwal
Overview of Apache Spark 2.3: What’s New? with Sameer AgarwalDatabricks
 

La actualidad más candente (20)

Spark Summit EU talk by Kaarthik Sivashanmugam
Spark Summit EU talk by Kaarthik SivashanmugamSpark Summit EU talk by Kaarthik Sivashanmugam
Spark Summit EU talk by Kaarthik Sivashanmugam
 
Spark Summit EU talk by Emlyn Whittick
Spark Summit EU talk by Emlyn WhittickSpark Summit EU talk by Emlyn Whittick
Spark Summit EU talk by Emlyn Whittick
 
Building a Business Logic Translation Engine with Spark Streaming for Communi...
Building a Business Logic Translation Engine with Spark Streaming for Communi...Building a Business Logic Translation Engine with Spark Streaming for Communi...
Building a Business Logic Translation Engine with Spark Streaming for Communi...
 
A Collaborative Data Science Development Workflow
A Collaborative Data Science Development WorkflowA Collaborative Data Science Development Workflow
A Collaborative Data Science Development Workflow
 
How We Optimize Spark SQL Jobs With parallel and sync IO
How We Optimize Spark SQL Jobs With parallel and sync IOHow We Optimize Spark SQL Jobs With parallel and sync IO
How We Optimize Spark SQL Jobs With parallel and sync IO
 
Build, Scale, and Deploy Deep Learning Pipelines with Ease Using Apache Spark
Build, Scale, and Deploy Deep Learning Pipelines with Ease Using Apache SparkBuild, Scale, and Deploy Deep Learning Pipelines with Ease Using Apache Spark
Build, Scale, and Deploy Deep Learning Pipelines with Ease Using Apache Spark
 
Apache® Spark™ MLlib 2.x: migrating ML workloads to DataFrames
Apache® Spark™ MLlib 2.x: migrating ML workloads to DataFramesApache® Spark™ MLlib 2.x: migrating ML workloads to DataFrames
Apache® Spark™ MLlib 2.x: migrating ML workloads to DataFrames
 
Getting Ready to Use Redis with Apache Spark with Tague Griffith
Getting Ready to Use Redis with Apache Spark with Tague GriffithGetting Ready to Use Redis with Apache Spark with Tague Griffith
Getting Ready to Use Redis with Apache Spark with Tague Griffith
 
Spark Summit EU talk by Yiannis Gkoufas
Spark Summit EU talk by Yiannis GkoufasSpark Summit EU talk by Yiannis Gkoufas
Spark Summit EU talk by Yiannis Gkoufas
 
Apache® Spark™ MLlib: From Quick Start to Scikit-Learn
Apache® Spark™ MLlib: From Quick Start to Scikit-LearnApache® Spark™ MLlib: From Quick Start to Scikit-Learn
Apache® Spark™ MLlib: From Quick Start to Scikit-Learn
 
Infrastructure for Deep Learning in Apache Spark
Infrastructure for Deep Learning in Apache SparkInfrastructure for Deep Learning in Apache Spark
Infrastructure for Deep Learning in Apache Spark
 
Dr. Elephant for Monitoring and Tuning Apache Spark Jobs on Hadoop with Carl ...
Dr. Elephant for Monitoring and Tuning Apache Spark Jobs on Hadoop with Carl ...Dr. Elephant for Monitoring and Tuning Apache Spark Jobs on Hadoop with Carl ...
Dr. Elephant for Monitoring and Tuning Apache Spark Jobs on Hadoop with Carl ...
 
Build, Scale, and Deploy Deep Learning Pipelines Using Apache Spark
Build, Scale, and Deploy Deep Learning Pipelines Using Apache SparkBuild, Scale, and Deploy Deep Learning Pipelines Using Apache Spark
Build, Scale, and Deploy Deep Learning Pipelines Using Apache Spark
 
Insights Without Tradeoffs: Using Structured Streaming
Insights Without Tradeoffs: Using Structured StreamingInsights Without Tradeoffs: Using Structured Streaming
Insights Without Tradeoffs: Using Structured Streaming
 
Introduction to Big Data Analytics using Apache Spark and Zeppelin on HDInsig...
Introduction to Big Data Analytics using Apache Spark and Zeppelin on HDInsig...Introduction to Big Data Analytics using Apache Spark and Zeppelin on HDInsig...
Introduction to Big Data Analytics using Apache Spark and Zeppelin on HDInsig...
 
Spark Summit EU talk by Bas Geerdink
Spark Summit EU talk by Bas GeerdinkSpark Summit EU talk by Bas Geerdink
Spark Summit EU talk by Bas Geerdink
 
Using Databricks as an Analysis Platform
Using Databricks as an Analysis PlatformUsing Databricks as an Analysis Platform
Using Databricks as an Analysis Platform
 
Spark Summit EU talk by Stephan Kessler
Spark Summit EU talk by Stephan KesslerSpark Summit EU talk by Stephan Kessler
Spark Summit EU talk by Stephan Kessler
 
Apache Spark At Apple with Sam Maclennan and Vishwanath Lakkundi
Apache Spark At Apple with Sam Maclennan and Vishwanath LakkundiApache Spark At Apple with Sam Maclennan and Vishwanath Lakkundi
Apache Spark At Apple with Sam Maclennan and Vishwanath Lakkundi
 
Overview of Apache Spark 2.3: What’s New? with Sameer Agarwal
 Overview of Apache Spark 2.3: What’s New? with Sameer Agarwal Overview of Apache Spark 2.3: What’s New? with Sameer Agarwal
Overview of Apache Spark 2.3: What’s New? with Sameer Agarwal
 

Similar a Extending Apache Spark APIs Without Going Near Spark Source or a Compiler with Anna Holschuh

Continuous Integration & Continuous Delivery
Continuous Integration & Continuous DeliveryContinuous Integration & Continuous Delivery
Continuous Integration & Continuous DeliveryDatabricks
 
IncQuery Server for Teamwork Cloud - Talk at IW2019
IncQuery Server for Teamwork Cloud - Talk at IW2019IncQuery Server for Teamwork Cloud - Talk at IW2019
IncQuery Server for Teamwork Cloud - Talk at IW2019Istvan Rath
 
Azure Resource Manager templates: Improve deployment time and reusability
Azure Resource Manager templates: Improve deployment time and reusabilityAzure Resource Manager templates: Improve deployment time and reusability
Azure Resource Manager templates: Improve deployment time and reusabilityStephane Lapointe
 
Evolving IGN’s New APIs with Scala
 Evolving IGN’s New APIs with Scala Evolving IGN’s New APIs with Scala
Evolving IGN’s New APIs with ScalaManish Pandit
 
Mastering azure devOps - Dot Net Tricks
Mastering azure devOps - Dot Net TricksMastering azure devOps - Dot Net Tricks
Mastering azure devOps - Dot Net TricksGaurav Singh
 
7 steps to simplifying your AI workflows
7 steps to simplifying your AI workflows7 steps to simplifying your AI workflows
7 steps to simplifying your AI workflowsWisecube AI
 
SplunkLive London 2014 Developer Presentation
SplunkLive London 2014  Developer PresentationSplunkLive London 2014  Developer Presentation
SplunkLive London 2014 Developer PresentationDamien Dallimore
 
Setting Up CircleCI Workflows for Your Salesforce Apps
Setting Up CircleCI Workflows for Your Salesforce AppsSetting Up CircleCI Workflows for Your Salesforce Apps
Setting Up CircleCI Workflows for Your Salesforce AppsDaniel Stange
 
49.INS2065.Computer Based Technologies.TA.NguyenDucAnh.pdf
49.INS2065.Computer Based Technologies.TA.NguyenDucAnh.pdf49.INS2065.Computer Based Technologies.TA.NguyenDucAnh.pdf
49.INS2065.Computer Based Technologies.TA.NguyenDucAnh.pdfcNguyn506241
 
CodeIgniter for Startups, cicon2010
CodeIgniter for Startups, cicon2010CodeIgniter for Startups, cicon2010
CodeIgniter for Startups, cicon2010Joel Gascoigne
 
Testing with laravel
Testing with laravelTesting with laravel
Testing with laravelDerek Binkley
 
Tips for Tuning Solr Search: No Coding Required
Tips for Tuning Solr Search: No Coding RequiredTips for Tuning Solr Search: No Coding Required
Tips for Tuning Solr Search: No Coding RequiredAcquia
 
Beyond Domino Designer
Beyond Domino DesignerBeyond Domino Designer
Beyond Domino DesignerPaul Withers
 
Enabling Scalable Data Science Pipeline with Mlflow at Thermo Fisher Scientific
Enabling Scalable Data Science Pipeline with Mlflow at Thermo Fisher ScientificEnabling Scalable Data Science Pipeline with Mlflow at Thermo Fisher Scientific
Enabling Scalable Data Science Pipeline with Mlflow at Thermo Fisher ScientificDatabricks
 
Add-On Development: EE Expects that Every Developer will do his Duty
Add-On Development: EE Expects that Every Developer will do his DutyAdd-On Development: EE Expects that Every Developer will do his Duty
Add-On Development: EE Expects that Every Developer will do his Dutyreedmaniac
 
Integrating Machine Learning Capabilities into your team
Integrating Machine Learning Capabilities into your teamIntegrating Machine Learning Capabilities into your team
Integrating Machine Learning Capabilities into your teamCameron Vetter
 

Similar a Extending Apache Spark APIs Without Going Near Spark Source or a Compiler with Anna Holschuh (20)

Continuous Integration & Continuous Delivery
Continuous Integration & Continuous DeliveryContinuous Integration & Continuous Delivery
Continuous Integration & Continuous Delivery
 
IncQuery Server for Teamwork Cloud - Talk at IW2019
IncQuery Server for Teamwork Cloud - Talk at IW2019IncQuery Server for Teamwork Cloud - Talk at IW2019
IncQuery Server for Teamwork Cloud - Talk at IW2019
 
Azure Resource Manager templates: Improve deployment time and reusability
Azure Resource Manager templates: Improve deployment time and reusabilityAzure Resource Manager templates: Improve deployment time and reusability
Azure Resource Manager templates: Improve deployment time and reusability
 
Introduction to Scala
Introduction to ScalaIntroduction to Scala
Introduction to Scala
 
Evolving IGN’s New APIs with Scala
 Evolving IGN’s New APIs with Scala Evolving IGN’s New APIs with Scala
Evolving IGN’s New APIs with Scala
 
Apereo OAE - Bootcamp
Apereo OAE - BootcampApereo OAE - Bootcamp
Apereo OAE - Bootcamp
 
Mastering azure devOps - Dot Net Tricks
Mastering azure devOps - Dot Net TricksMastering azure devOps - Dot Net Tricks
Mastering azure devOps - Dot Net Tricks
 
7 steps to simplifying your AI workflows
7 steps to simplifying your AI workflows7 steps to simplifying your AI workflows
7 steps to simplifying your AI workflows
 
SplunkLive London 2014 Developer Presentation
SplunkLive London 2014  Developer PresentationSplunkLive London 2014  Developer Presentation
SplunkLive London 2014 Developer Presentation
 
Setting Up CircleCI Workflows for Your Salesforce Apps
Setting Up CircleCI Workflows for Your Salesforce AppsSetting Up CircleCI Workflows for Your Salesforce Apps
Setting Up CircleCI Workflows for Your Salesforce Apps
 
49.INS2065.Computer Based Technologies.TA.NguyenDucAnh.pdf
49.INS2065.Computer Based Technologies.TA.NguyenDucAnh.pdf49.INS2065.Computer Based Technologies.TA.NguyenDucAnh.pdf
49.INS2065.Computer Based Technologies.TA.NguyenDucAnh.pdf
 
CodeIgniter for Startups, cicon2010
CodeIgniter for Startups, cicon2010CodeIgniter for Startups, cicon2010
CodeIgniter for Startups, cicon2010
 
Testing with laravel
Testing with laravelTesting with laravel
Testing with laravel
 
Tips for Tuning Solr Search: No Coding Required
Tips for Tuning Solr Search: No Coding RequiredTips for Tuning Solr Search: No Coding Required
Tips for Tuning Solr Search: No Coding Required
 
Beyond Domino Designer
Beyond Domino DesignerBeyond Domino Designer
Beyond Domino Designer
 
Enabling Scalable Data Science Pipeline with Mlflow at Thermo Fisher Scientific
Enabling Scalable Data Science Pipeline with Mlflow at Thermo Fisher ScientificEnabling Scalable Data Science Pipeline with Mlflow at Thermo Fisher Scientific
Enabling Scalable Data Science Pipeline with Mlflow at Thermo Fisher Scientific
 
Agile sites2
Agile sites2Agile sites2
Agile sites2
 
Add-On Development: EE Expects that Every Developer will do his Duty
Add-On Development: EE Expects that Every Developer will do his DutyAdd-On Development: EE Expects that Every Developer will do his Duty
Add-On Development: EE Expects that Every Developer will do his Duty
 
presentation
presentationpresentation
presentation
 
Integrating Machine Learning Capabilities into your team
Integrating Machine Learning Capabilities into your teamIntegrating Machine Learning Capabilities into your team
Integrating Machine Learning Capabilities into your team
 

Más de Databricks

DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDatabricks
 
Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Databricks
 
Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Databricks
 
Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Databricks
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Databricks
 
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of HadoopDatabricks
 
Democratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDemocratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDatabricks
 
Learn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceLearn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceDatabricks
 
Why APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringWhy APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringDatabricks
 
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixThe Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixDatabricks
 
Stage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationStage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationDatabricks
 
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchSimplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchDatabricks
 
Scaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesScaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesDatabricks
 
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesScaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesDatabricks
 
Sawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsSawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsDatabricks
 
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkRedis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkDatabricks
 
Re-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkRe-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkDatabricks
 
Raven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesRaven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesDatabricks
 
Processing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkProcessing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkDatabricks
 
Massive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeMassive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeDatabricks
 

Más de Databricks (20)

DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptx
 
Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1
 
Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2
 
Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4
 
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
 
Democratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDemocratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized Platform
 
Learn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceLearn to Use Databricks for Data Science
Learn to Use Databricks for Data Science
 
Why APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringWhy APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML Monitoring
 
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixThe Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
 
Stage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationStage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI Integration
 
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchSimplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorch
 
Scaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesScaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on Kubernetes
 
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesScaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
 
Sawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsSawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature Aggregations
 
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkRedis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
 
Re-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkRe-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and Spark
 
Raven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesRaven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction Queries
 
Processing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkProcessing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache Spark
 
Massive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeMassive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta Lake
 

Último

Charbagh + Female Escorts Service in Lucknow | Starting ₹,5K To @25k with A/C...
Charbagh + Female Escorts Service in Lucknow | Starting ₹,5K To @25k with A/C...Charbagh + Female Escorts Service in Lucknow | Starting ₹,5K To @25k with A/C...
Charbagh + Female Escorts Service in Lucknow | Starting ₹,5K To @25k with A/C...HyderabadDolls
 
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...nirzagarg
 
Gomti Nagar & best call girls in Lucknow | 9548273370 Independent Escorts & D...
Gomti Nagar & best call girls in Lucknow | 9548273370 Independent Escorts & D...Gomti Nagar & best call girls in Lucknow | 9548273370 Independent Escorts & D...
Gomti Nagar & best call girls in Lucknow | 9548273370 Independent Escorts & D...HyderabadDolls
 
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...gajnagarg
 
Kings of Saudi Arabia, information about them
Kings of Saudi Arabia, information about themKings of Saudi Arabia, information about them
Kings of Saudi Arabia, information about themeitharjee
 
Lecture_2_Deep_Learning_Overview-newone1
Lecture_2_Deep_Learning_Overview-newone1Lecture_2_Deep_Learning_Overview-newone1
Lecture_2_Deep_Learning_Overview-newone1ranjankumarbehera14
 
Sealdah % High Class Call Girls Kolkata - 450+ Call Girl Cash Payment 8005736...
Sealdah % High Class Call Girls Kolkata - 450+ Call Girl Cash Payment 8005736...Sealdah % High Class Call Girls Kolkata - 450+ Call Girl Cash Payment 8005736...
Sealdah % High Class Call Girls Kolkata - 450+ Call Girl Cash Payment 8005736...HyderabadDolls
 
Statistics notes ,it includes mean to index numbers
Statistics notes ,it includes mean to index numbersStatistics notes ,it includes mean to index numbers
Statistics notes ,it includes mean to index numberssuginr1
 
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...Elaine Werffeli
 
High Profile Call Girls Service in Jalore { 9332606886 } VVIP NISHA Call Girl...
High Profile Call Girls Service in Jalore { 9332606886 } VVIP NISHA Call Girl...High Profile Call Girls Service in Jalore { 9332606886 } VVIP NISHA Call Girl...
High Profile Call Girls Service in Jalore { 9332606886 } VVIP NISHA Call Girl...kumargunjan9515
 
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteedamy56318795
 
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...Health
 
Sonagachi * best call girls in Kolkata | ₹,9500 Pay Cash 8005736733 Free Home...
Sonagachi * best call girls in Kolkata | ₹,9500 Pay Cash 8005736733 Free Home...Sonagachi * best call girls in Kolkata | ₹,9500 Pay Cash 8005736733 Free Home...
Sonagachi * best call girls in Kolkata | ₹,9500 Pay Cash 8005736733 Free Home...HyderabadDolls
 
Top Call Girls in Balaghat 9332606886Call Girls Advance Cash On Delivery Ser...
Top Call Girls in Balaghat  9332606886Call Girls Advance Cash On Delivery Ser...Top Call Girls in Balaghat  9332606886Call Girls Advance Cash On Delivery Ser...
Top Call Girls in Balaghat 9332606886Call Girls Advance Cash On Delivery Ser...kumargunjan9515
 
Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...
Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...
Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...gajnagarg
 
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Valters Lauzums
 
Jodhpur Park | Call Girls in Kolkata Phone No 8005736733 Elite Escort Service...
Jodhpur Park | Call Girls in Kolkata Phone No 8005736733 Elite Escort Service...Jodhpur Park | Call Girls in Kolkata Phone No 8005736733 Elite Escort Service...
Jodhpur Park | Call Girls in Kolkata Phone No 8005736733 Elite Escort Service...HyderabadDolls
 
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...Klinik kandungan
 
TrafficWave Generator Will Instantly drive targeted and engaging traffic back...
TrafficWave Generator Will Instantly drive targeted and engaging traffic back...TrafficWave Generator Will Instantly drive targeted and engaging traffic back...
TrafficWave Generator Will Instantly drive targeted and engaging traffic back...SOFTTECHHUB
 

Último (20)

Charbagh + Female Escorts Service in Lucknow | Starting ₹,5K To @25k with A/C...
Charbagh + Female Escorts Service in Lucknow | Starting ₹,5K To @25k with A/C...Charbagh + Female Escorts Service in Lucknow | Starting ₹,5K To @25k with A/C...
Charbagh + Female Escorts Service in Lucknow | Starting ₹,5K To @25k with A/C...
 
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
 
Abortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get CytotecAbortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get Cytotec
 
Gomti Nagar & best call girls in Lucknow | 9548273370 Independent Escorts & D...
Gomti Nagar & best call girls in Lucknow | 9548273370 Independent Escorts & D...Gomti Nagar & best call girls in Lucknow | 9548273370 Independent Escorts & D...
Gomti Nagar & best call girls in Lucknow | 9548273370 Independent Escorts & D...
 
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
 
Kings of Saudi Arabia, information about them
Kings of Saudi Arabia, information about themKings of Saudi Arabia, information about them
Kings of Saudi Arabia, information about them
 
Lecture_2_Deep_Learning_Overview-newone1
Lecture_2_Deep_Learning_Overview-newone1Lecture_2_Deep_Learning_Overview-newone1
Lecture_2_Deep_Learning_Overview-newone1
 
Sealdah % High Class Call Girls Kolkata - 450+ Call Girl Cash Payment 8005736...
Sealdah % High Class Call Girls Kolkata - 450+ Call Girl Cash Payment 8005736...Sealdah % High Class Call Girls Kolkata - 450+ Call Girl Cash Payment 8005736...
Sealdah % High Class Call Girls Kolkata - 450+ Call Girl Cash Payment 8005736...
 
Statistics notes ,it includes mean to index numbers
Statistics notes ,it includes mean to index numbersStatistics notes ,it includes mean to index numbers
Statistics notes ,it includes mean to index numbers
 
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
 
High Profile Call Girls Service in Jalore { 9332606886 } VVIP NISHA Call Girl...
High Profile Call Girls Service in Jalore { 9332606886 } VVIP NISHA Call Girl...High Profile Call Girls Service in Jalore { 9332606886 } VVIP NISHA Call Girl...
High Profile Call Girls Service in Jalore { 9332606886 } VVIP NISHA Call Girl...
 
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
 
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
 
Sonagachi * best call girls in Kolkata | ₹,9500 Pay Cash 8005736733 Free Home...
Sonagachi * best call girls in Kolkata | ₹,9500 Pay Cash 8005736733 Free Home...Sonagachi * best call girls in Kolkata | ₹,9500 Pay Cash 8005736733 Free Home...
Sonagachi * best call girls in Kolkata | ₹,9500 Pay Cash 8005736733 Free Home...
 
Top Call Girls in Balaghat 9332606886Call Girls Advance Cash On Delivery Ser...
Top Call Girls in Balaghat  9332606886Call Girls Advance Cash On Delivery Ser...Top Call Girls in Balaghat  9332606886Call Girls Advance Cash On Delivery Ser...
Top Call Girls in Balaghat 9332606886Call Girls Advance Cash On Delivery Ser...
 
Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...
Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...
Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...
 
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
 
Jodhpur Park | Call Girls in Kolkata Phone No 8005736733 Elite Escort Service...
Jodhpur Park | Call Girls in Kolkata Phone No 8005736733 Elite Escort Service...Jodhpur Park | Call Girls in Kolkata Phone No 8005736733 Elite Escort Service...
Jodhpur Park | Call Girls in Kolkata Phone No 8005736733 Elite Escort Service...
 
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
 
TrafficWave Generator Will Instantly drive targeted and engaging traffic back...
TrafficWave Generator Will Instantly drive targeted and engaging traffic back...TrafficWave Generator Will Instantly drive targeted and engaging traffic back...
TrafficWave Generator Will Instantly drive targeted and engaging traffic back...
 

Extending Apache Spark APIs Without Going Near Spark Source or a Compiler with Anna Holschuh

  • 1. Anna Holschuh, Target Extending Apache Spark APIs Without Going Near Spark Source or a Compiler #DevSAIS19
  • 2. What This Talk is About • Scala programming constructs • Functional programming paradigms • Tips for organizing code in production systems 2#DevSAIS19
  • 3. Who am I • Lead Data Engineer at Target since 2016 • Deep love of all things Target • Primary career focus has been building backend systems with a personal passion for Machine Learning problems • Started working in Spark in 2015 3#DevSAIS19
  • 4. Agenda • Motivation • Scala’s “Enrich My Library” Pattern • An Example • Other Uses 4#DevSAIS19
  • 5. Agenda • Motivation • Scala’s “Enrich My Library” Pattern • An Example • Other Uses 5#DevSAIS19
  • 6. Motivation Let’s go through an example… • We have a system of Authors, Articles, and Comments on those Articles • From the example, Spark/Scala lends itself well to functional programming paradigms • What happens when the system grows in size/complexity and it becomes necessary to inject more custom code into the mix? • Can we keep things concise, readable, and efficient using the same functional style of code development? 6#DevSAIS19
  • 7. Motivation Functional Programming Refresher • Declarative style of writing code (vs. Imperative) • Favors composition with functions • Avoids shared state, mutability, and side effects. 7#DevSAIS19
  • 8. Motivation A Validation Framework was born… • Tasked with building an on-demand computation system consuming various data sources • There were many ways for this data to go wrong • Needed a way to fail fast and in a predictable way when a certain bar for quality was not being met 8#DevSAIS19
  • 9. Motivation A Validation Framework was born… • Desired ability to “sprinkle” .validate() calls throughout our existing Spark ETL code 9#DevSAIS19 This is possible with Scala’s “Enrich My Library” Pattern
  • 10. Agenda • Motivation • Scala’s “Enrich My Library” Pattern • An Example • Other Uses 10#DevSAIS19
  • 11. “Enrich My Library” A Scala programming pattern… • Allows us to augment existing APIs • Analogous features in other languages • Also known as “Pimp My Library” for Googling purposes • Syntactic sugar that uses implicit classes to guide the compiler 11#DevSAIS19 Reference: https://docs.scala-lang.org/overviews/core/implicit-classes.html
  • 12. “Enrich My Library” What are implicits? Scala supports a keyword “implicit” that allows the compiler to implicitly make connections at compile-time as opposed to explicitly having to call a function or feed in a variable. Scala supports implicit values, parameters, functions, and classes. What is an implicit class? Introduced formally with Scala 2.10 although it’s possible to achieve the same effect in previous versions through different constructs. Allows extension of classes one normally wouldn’t have access to. 12#DevSAIS19 Reference: https://docs.scala-lang.org/overviews/core/implicit-classes.html
  • 13. Agenda • Motivation • Scala’s “Enrich My Library” Pattern • An Example • Other Uses 13#DevSAIS19
  • 14. An Example Back to our example… How do we go from THIS 14#DevSAIS19
  • 16. An Example 16#DevSAIS19 Step 1: Build a Validation class to work with • Abstract class parameterized with type T representing the object type that we plan to validate • Contains metadata relevant to running a validation • Has an abstract .execute() method to be filled in by concrete subclasses • Contains a concrete implementation .performValidation() that calls on the abstract execute method
  • 17. An Example 17#DevSAIS19 Step 2: Add an implicit class to allow the decoration of existing types with new methods • The class can be named anything • It must be nested in a package or object • It can only have one parameter. This defines what class it’s augmenting. • Extra arguments can be passed through the implicit parameter list. • .validate() delegates back to the validation object being passed into the method and uses the object being decorated to carry out the validation.
  • 18. An Example 18#DevSAIS19 Step 3: Define a validation • Our validation extends a Validation typed with Dataset[Article] • It fills in the abstract method .execute() which defines what the validation is checking for • This means that any time the compiler finds a Dataset[Article] type, we can call .validate() on it with this validation supplied because of our implicit class • Roughly 20 lines of concise and isolated code is nicely separated from the core ETL job
  • 19. An Example 19#DevSAIS19 Step 4: Instantiate your validation and pull it in scope • This is what triggers the compiler to link Datasets of Articles to the .validate() method through the defined implicit class
  • 20. An Example 20#DevSAIS19 Step 5: Don’t forget Unit Tests • It is straightforward to develop concise and isolated unit tests for each validation that is developed • ScalaTest with FunSpec are used to achieve BDD-style tests
  • 21. An Example 21#DevSAIS19 Step 6: And we’re done! • We have been able to develop concise, isolated, testable code that can fit seamlessly into existing Spark jobs • Data is messy, and we have the ability to address this problem in an elegant way • “Enrich my library” has allowed us to extend Spark APIs so we can stay true to functional programming paradigms
  • 22. Agenda • Motivation • Scala’s “Enrich My Library” Pattern • An Example • Other Uses 22#DevSAIS19
  • 23. Other Uses 23#DevSAIS19 Code organization and readability • Move long blocks of related ETL code into implicit class function definitions to help organize code
  • 24. Other Uses 24#DevSAIS19 Support other common functionalities used in production systems ü Validations • Metrics Collection • Logging • Checkpointing • Notifications • …
  • 25. Disclaimer These are powerful programming constructs that can greatly increase productivity and enable the buildout of concise and elegant framework code. Overuse can lead to cryptic and esoteric systems that can cause engineers great pain and suffering. Find the right balance! 25#DevSAIS19
  • 26. Takeaways • The “Enrich My Library” programming pattern enables concise, clean, and readable code • It enabled us to create a framework that supports rapid development of new validations with a relatively small amount of code • The resulting code is isolated, testable, and easy to understand 26#DevSAIS19
  • 27. Come Work At Target • We are hiring in Data Science and Data Engineering • Solve real-world problems ranging from supply chain logistics to smart stores to personalization and so on • Offices in… o Sunnyvale, CA o Minneapolis, MN o Pittsburgh, PA o Bangalore, India 27#DevSAIS19 work somewhere you
  • 28. Acknowledgements • Thank you Spark Summit • Thank you Target • Thank you wonderful team members at Target • Thank you vibrant Spark and Scala communities 28#DevSAIS19