Python meetup
1. The Snake in Your Data
How Python is Used Today by Data Science Teams
Matt Price
Principal Research Engineer
2019.09.24
2. Agenda
● About ZeroFOX
● The Data Science Lifecycle
● Data Science at ZeroFOX
● Data Science Tools
● Prodigy Demo
● Q & A
3. About ZeroFOX
It’s a Digital World. Engage Securely.
Our Mission
ZeroFOX exists to protect digital engagement.
Our Story
ZeroFOX was founded with the goal of creating customer champions. With global reach and operation centers in the United States, United Kingdom, Chile and India, ZeroFOX provides best-in-class software, support and services to organizations of all sizes.
Most Recognized. Most Awarded.
4. The ZeroFOX Platform
Complete Digital Visibility & Protection
Social and Digital Channels
● Your Organization: Domains | Executives | VIPs | Employees | Brands | Locations
● AI-Driven Analysis: Automated Analysis | Alerts | Reporting
● Human-Driven Analysis: ZeroFOX OnWatch™ | ZeroFOX Alpha Team
● Remediation: Takedown-as-a-Service™
● Identify: Risks on social and digital platforms
● Protect: What matters to your organization
● Remediate: Threats to your brand and business
[Diagram labels: Protection, Identification, Analysis, Remediation]
6. The Data Science Lifecycle
● Each stage builds on the previous stages
● Most of the effort goes into data collection
● The process is iterative
● Python is used throughout the entire workflow
8. ZeroFOX AI
[Diagram labels: Artificial Intelligence, Machine Learning, Deep Learning, NLP, CV]
Artificial Intelligence (AI): The simulation of intelligent behavior in machines
AI Techniques
● Machine Learning (ML): Study and use of algorithms and statistical models that learn from data
● Deep Learning: A technique within ML that uses “large” neural networks
9. ZeroFOX Data Science Architecture
● Tied into production data ingest
● Feedback loop from analysts
● Labeling is open to the entire company
● Architecture is optimized for quick iterations
11. Python Tooling Categories
● Data manipulation: Data structures and data transformations
● Data visualization: Understanding what the data is
● Modeling: Teaching machines to learn the underlying patterns in the data
● Deployment: Integrating with the platform and making models available to the end customer
12. Data Manipulation Tools
● Multi-dimensional arrays and matrices
● High level mathematical functions
● Fast, vectorized operations
● Multi-dimensional matrices wrapped in DataFrames
● Time series logic and operations
● Data analysis functions and tools
● CV and ML library
● Fast operations - focus on real-time video
● Low level operations
● PIL fork
● General image processing library
● High level operations
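The "fast, vectorized operations" bullet can be illustrated with a short sketch. It assumes the first group of bullets describes NumPy (the library name did not survive on the slide); the engagement counts and the 1.5 threshold are made up for illustration.

```python
import numpy as np

# Hypothetical data: daily engagement counts for three accounts over four days.
counts = np.array([
    [12, 15, 11, 14],
    [3, 2, 40, 4],
    [7, 7, 8, 6],
])

# Vectorized: each expression operates on the whole array at once,
# with no explicit Python loop over rows or columns.
mean = counts.mean(axis=1, keepdims=True)
std = counts.std(axis=1, keepdims=True)
z_scores = (counts - mean) / std

# Flag cells more than 1.5 standard deviations above their own row mean
# (the threshold is an arbitrary choice for this sketch).
spikes = z_scores > 1.5
print(np.argwhere(spikes).tolist())
```

Keeping the row statistics as column vectors (`keepdims=True`) lets broadcasting line them up against every day in the row, which is what removes the need for a loop.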
14. Data Visualization Tools
● Interactive computing via notebooks
● Kernels run code and return output
● Focus on scientific computing
● Plotting library
● Low level plotting interface
● Compatible with a number of GUI toolkits
● Built on top of matplotlib
● High level plotting interface
● Categorical variable support
● Framework for building data visualization apps
● Open source and enterprise versions
● Interactive charts
16. Modeling Tools
● Solves the labeling problem
● Enables active learning
● Programmatic workflow definitions
● Extremely flexible
prodigy
● Machine learning and data analysis library
● Built on top of NumPy, SciPy, LIBSVM, and matplotlib
● A number of other scikits are available
● High level deep learning library
● Serves as an interface to lower level backends
● TensorFlow supplies low level building blocks
● Pre-defined models
● Production-focused NLP framework
● Deep learning models powered by Thinc
● Define a pipeline that outputs annotated documents
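The idea of a pipeline that outputs annotated documents can be sketched in plain Python. This is not spaCy's API; the `Doc` class, the two components, and the sample text are hypothetical stand-ins that only show the shape of the pattern: each component receives a document, attaches annotations, and passes it on.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Doc:
    """Hypothetical container: raw text plus whatever the components attach."""
    text: str
    annotations: Dict[str, object] = field(default_factory=dict)


Component = Callable[[Doc], Doc]


def tokenize(doc: Doc) -> Doc:
    # Naive whitespace tokenizer, for illustration only.
    doc.annotations["tokens"] = doc.text.split()
    return doc


def flag_urls(doc: Doc) -> Doc:
    # Depends on the tokens added by the previous component.
    tokens = doc.annotations["tokens"]
    doc.annotations["urls"] = [
        t for t in tokens if t.startswith(("http://", "https://"))
    ]
    return doc


def run_pipeline(text: str, components: List[Component]) -> Doc:
    # Apply each component in order; the result is one annotated document.
    doc = Doc(text)
    for component in components:
        doc = component(doc)
    return doc


doc = run_pipeline("Claim your prize at http://example.test now",
                   [tokenize, flag_urls])
print(doc.annotations)
```

Because later components read what earlier ones wrote, the order of the list matters; swapping the two components here would raise a `KeyError`.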
17. ZeroFOX Data Science Architecture
[Architecture diagram; component labels: Prodigy, Scikit-learn, Keras + TensorFlow, spaCy]
18. Deployment
● Web server and framework focused on high performance
● Secondarily focused on ease of use
● Flask-like framework API
● Decent extension ecosystem
● Python 3.6+ (heavily relies on async/await)
● MVC web framework
● Focused on easing development of database-driven websites
● Large extension ecosystem
● CRUD interface for administrative tasks
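The reliance on async/await noted above can be shown with the standard library alone. This sketch is not tied to any particular web framework; `score_post` is a hypothetical handler, and the sleep stands in for waiting on a model service or database. It uses `asyncio.run`, which requires Python 3.7 or later.

```python
import asyncio
import time


async def score_post(post_id: int) -> dict:
    # Hypothetical handler: the sleep stands in for awaiting slow I/O.
    await asyncio.sleep(0.1)
    return {"post_id": post_id, "status": "scored"}


async def main() -> list:
    # All three handlers wait concurrently on a single event loop,
    # instead of one after another.
    return await asyncio.gather(*(score_post(i) for i in range(3)))


start = time.perf_counter()
results = asyncio.run(main())
elapsed = time.perf_counter() - start
print(results, f"{elapsed:.2f}s")
```

Since the waits overlap, the total time should be closer to one handler's delay than to the sum of all three, which is the property a high-performance async server builds on.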
21. Prodigy
● Created by Explosion.AI (Matthew Honnibal and Ines Montani)
○ Same company that develops spaCy and Thinc
● Designed to make annotating data simple but can do much more
● A commercial tool (Python package) that must be purchased
● Why Prodigy?
○ Solves the “hardest” problem in applied data science
○ Can programmatically define entire model workflow in a recipe
○ Out of the box support for spaCy
○ Supports computer vision annotation
○ Exports trained models as Python packages
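The notion of defining a workflow in a recipe, combined with active learning, can be sketched without Prodigy itself. Everything here is an assumption for illustration: `toy_model`, `prefer_uncertain`, `recipe_sketch`, and the dictionary keys are hypothetical and are not Prodigy's documented API. The sketch only shows the underlying idea: a function bundles a dataset name with a stream of examples, ordered so the ones the model is least sure about reach the annotator first.

```python
from typing import Callable, Dict, Iterable, Iterator, List


def toy_model(text: str) -> float:
    """Hypothetical scorer: a made-up probability that a post is malicious."""
    suspicious = {"free", "prize", "click"}
    hits = sum(word in suspicious for word in text.lower().split())
    return min(1.0, 0.2 + 0.3 * hits)


def prefer_uncertain(stream: Iterable[Dict],
                     score: Callable[[str], float]) -> Iterator[Dict]:
    """Yield examples so that scores nearest 0.5 (most uncertain) come first."""
    scored = [{**eg, "score": score(eg["text"])} for eg in stream]
    yield from sorted(scored, key=lambda eg: abs(eg["score"] - 0.5))


def recipe_sketch(dataset: str, examples: List[Dict]) -> Dict:
    """Bundle the workflow pieces into one dictionary (illustrative keys)."""
    return {
        "dataset": dataset,
        "stream": prefer_uncertain(examples, toy_model),
    }


examples = [
    {"text": "Team lunch on Friday"},
    {"text": "Click for a free prize"},
    {"text": "Win a prize today"},
]
components = recipe_sketch("demo_dataset", examples)
order = [eg["text"] for eg in components["stream"]]
print(order)
```

Sorting the whole list works for a toy stream; a real annotation stream is usually too large to sort up front, so a production version would need a streaming selection strategy instead.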