The Google Cloud Professional Data Engineer certification prepares machine learning engineers to run ML models in production, including DevOps tasks such as monitoring and scaling.
7. What’s Needed to Deploy ML to Production
● ML centric
○ Data collection
○ Data transformation
○ Feature engineering
○ Model building and evaluation
● Production centric
○ Data storage
○ Scalable transformation processes
○ Workflows
○ Integration and deployment
○ Monitoring
○ Security and Compliance
14. Who Benefits?
● Practitioners - salaries 15% higher than non-certified peers (IDC/Microsoft)
● Organizations - new hire productivity, hire advancement, streamlined hiring
● Google - supports efforts to expand market share by creating a pool of knowledgeable professionals
15. Google Cloud Professional Data Engineer Exam
○ Designing data processing and storage systems
○ Migrating data warehouse
○ Operationalizing storage, processing infrastructure, and pipeline
○ Operationalizing ML models
○ Pre-built models
○ ML architecture e.g. edge computing
○ Security and compliance
○ Scalability and portability
16. Designing Data Processing & Storage Systems
● Data modeling
● Latency, throughput, transactions
● Fault tolerance
● Distributed systems
● Batch and stream processing
● Job automation and orchestration
● Event processing
17. Operationalizing Processing and Storage
● Storage costs and performance
● Data cleansing
● Data lifecycle management
● Provisioning resources
● Monitoring and adjusting pipelines
● Effective use of managed services
18. Operationalizing Machine Learning Models
● Pre-built ML models as a service
● Ingesting data
● Training machine learning models
● Training and serving infrastructure
● Hardware accelerators
● ML terminology
19. Options for Operational ML
● Compute Engine
○ GPUs or TPUs
○ Deep Learning VM
○ c2-standard-60: 60 vCPUs, 240 GB memory, up to 257 TB persistent disk
● Kubernetes Engine
○ Supports GPUs and TPUs
○ Containers with TF, PyTorch, and R
○ Job and deployment APIs
● AI Platform
○ Serverless option
○ Train, evaluate, tune models
○ TensorFlow, scikit-learn, XGBoost
20. Study Strategy
● Follow Certification Exam Guide
○ High level domains & detailed tasks
○ https://cloud.google.com/certification/guides/data-engineer/
● Take Practice Exam
○ Good assessment but the actual test is more difficult
● Identify weakest areas
○ We often focus on some, not all domains in our work
○ You will be tested on all domains
● Perform tasks using Cloud Console and Cloud Shell
21. Exam Taking Strategy
● Timed test, know your remaining time
○ 50 multiple choice questions
○ 2 hours
○ Mark questions for review
● Read questions carefully
○ Identify key services and software
○ Identify technical requirements
● Focus on how to choose between likely options or near misses
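The time limit above translates into a simple per-question budget. The following is a minimal sketch of that arithmetic, using the 50 questions and 2 hours stated on the slide. The function name `minutes_per_question` and the 20-minute review reserve in the second call are illustrative assumptions, not part of the exam guide.

```python
def minutes_per_question(total_minutes: float, questions: int,
                         review_reserve: float = 0.0) -> float:
    """Average minutes available per question after setting aside review time.

    review_reserve is a hypothetical buffer kept for revisiting marked questions.
    """
    if questions <= 0:
        raise ValueError("questions must be positive")
    if not 0 <= review_reserve < total_minutes:
        raise ValueError("review_reserve must be between 0 and total_minutes")
    return (total_minutes - review_reserve) / questions


# 50 questions in 2 hours, no reserve
print(minutes_per_question(120, 50))                      # 2.4
# Same exam, keeping an assumed 20 minutes for marked questions
print(minutes_per_question(120, 50, review_reserve=20))   # 2.0
```

Checking the clock against this budget every ten questions or so is one way to act on "know your remaining time".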
26. Final Thoughts ...
● Certifications help define scope of knowledge needed for a particular role
● They are a measure of competence, not expertise
● Continuous learning
[Figure: Circle of Competence, contrasting "What you know" with "What you think you know"]
You’ll need to know how to choose a storage system based on the structure of the data, the volume of data, latency requirements, and query patterns. In general, BigQuery is the go-to service for data warehousing and, in some cases, ML. Bigtable is well suited to high-volume, low-latency use cases, such as IoT. Cloud Firestore is a document database and a managed-service substitute for MongoDB.
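The selection logic described above can be sketched as a small decision function. This is a rough illustration under simplifying assumptions, not an official decision tree: the `Workload` fields, the order of the checks, and the fallback message are all hypothetical, and only the three services named above are covered.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    """Hypothetical summary of a workload's storage requirements."""
    structure: str      # e.g. "relational", "wide_column", or "document"
    analytical: bool    # warehouse-style scans and aggregations
    high_volume: bool   # very large, continuously arriving data
    low_latency: bool   # reads and writes must return quickly


def suggest_storage(w: Workload) -> str:
    """Map requirements to one of the services mentioned in the text."""
    if w.analytical:
        # Data warehousing (and some ML) points to BigQuery.
        return "BigQuery"
    if w.structure == "document":
        # Document data, e.g. as a managed substitute for MongoDB.
        return "Cloud Firestore"
    if w.high_volume and w.low_latency:
        # High-volume, low-latency workloads such as IoT telemetry.
        return "Bigtable"
    return "no clear match; review the requirements"


iot = Workload(structure="wide_column", analytical=False,
               high_volume=True, low_latency=True)
print(suggest_storage(iot))  # Bigtable
```

Exam questions tend to embed these same signals (data structure, volume, latency, query pattern) in the scenario text, so practicing this kind of mapping helps with near-miss answer options.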