Do you know how to use StreamSets Data Collector with Google Cloud Platform (GCP)? In this session we'll explain how YaloChat designed and implemented a streaming architecture that is sustainable, operable and scalable. Discover how we deployed Data Collector to integrate GCP components such as Pub/Sub and BigQuery to achieve DataOps in the cloud.
10. https://www.yalochat.com/
DataOps
“DataOps is a methodology that spans people, processes, tools, and services to enable
enterprises to rapidly, repeatedly, and reliably deliver production data from a vast array
of enterprise data sources to a vast array of enterprise data consumers.”
Getting DataOps Right
DataOps Principles:
• Continually satisfy your customer
• Orchestrate
• Monitor quality and performance
https://www.dataopsmanifesto.org/
Some Steps to Implement DataOps
• Add data and logic tests
• Use a version control system
• Branch and merge
• Use multiple environments
• Reuse and containerize
• Parameterize your processing
https://www.datakitchen.io/content/DataKitchen_dataops_cookbook.pdf
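Two of these steps, data and logic tests plus parameterized processing, can be illustrated together. The Python below is a minimal sketch, not a StreamSets or DataKitchen API: it checks a batch of events against a null-ratio threshold that differs per environment. The `PipelineParams` class, the field names `user_id` and `ts`, and the threshold values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PipelineParams:
    """Hypothetical per-environment settings, kept outside the pipeline logic."""
    environment: str
    max_null_ratio: float  # tolerated share of records missing a required field


# One parameter set per environment; the values are illustrative only.
PARAMS = {
    "dev": PipelineParams("dev", max_null_ratio=0.5),
    "prod": PipelineParams("prod", max_null_ratio=0.0),
}


def null_ratio(records: list[dict], field: str) -> float:
    """Fraction of records in which `field` is missing or None."""
    if not records:
        return 0.0
    missing = sum(1 for r in records if r.get(field) is None)
    return missing / len(records)


def check_batch(records: list[dict], params: PipelineParams,
                required_fields: list[str]) -> list[str]:
    """Run data and logic tests; an empty result means the batch passes."""
    failures = []
    # Data test: required fields must be populated (threshold depends on environment).
    for field in required_fields:
        ratio = null_ratio(records, field)
        if ratio > params.max_null_ratio:
            failures.append(
                f"{field}: {ratio:.0%} null, {params.max_null_ratio:.0%} allowed "
                f"in {params.environment}"
            )
    # Logic test (hypothetical rule): numeric timestamps must not be negative.
    negative = [r for r in records
                if isinstance(r.get("ts"), (int, float)) and r["ts"] < 0]
    if negative:
        failures.append(f"ts: {len(negative)} record(s) with a negative timestamp")
    return failures


batch = [{"user_id": "u1", "ts": 10}, {"user_id": None, "ts": 11}]
print(check_batch(batch, PARAMS["dev"], ["user_id"]))   # []
print(check_batch(batch, PARAMS["prod"], ["user_id"]))  # ['user_id: 50% null, 0% allowed in prod']
```

The same test code runs in every environment; only the parameter set changes, which is what lets a branch be validated in dev before it is merged and promoted.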
Manage pipelines from a central repository
• View published pipelines, filter by type, and drill down into the pipeline configuration
• Inspect pipeline version history
• View and monitor the status of pipelines
Compute Engine is an infrastructure-as-a-service (IaaS) offering: we configure every aspect of the infrastructure and manage our own resources. The service is billed according to resource usage.
https://cloud.google.com/docs/
Cloud Storage is an object storage system for archiving unstructured data and large files at petabyte scale. It is fully managed and integrates easily with the other Google Cloud Platform services.
https://cloud.google.com/docs/
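When object storage (Cloud Storage) serves as a historical repository, the way objects are named determines how easily a time range can be reprocessed later. The slides do not describe YaloChat's layout, so the sketch below assumes a common date-partitioned convention; the `events/<type>/dt=.../hr=...` pattern and the `archive_object_name` helper are hypothetical.

```python
from datetime import datetime, timezone


def archive_object_name(event_type: str, event_time: datetime, batch_id: str) -> str:
    """Build a date-partitioned object name for one archived batch of events.

    The layout events/<type>/dt=YYYY-MM-DD/hr=HH/<batch>.json is an assumed
    convention: it lets a later job reload a single day or hour from the archive.
    """
    if event_time.tzinfo is None:
        # Refuse naive datetimes so every partition is unambiguously in UTC.
        raise ValueError("event_time must be timezone-aware")
    t = event_time.astimezone(timezone.utc)
    return f"events/{event_type}/dt={t:%Y-%m-%d}/hr={t:%H}/{batch_id}.json"


print(archive_object_name("message", datetime(2019, 5, 1, 13, 45, tzinfo=timezone.utc), "b-001"))
# events/message/dt=2019-05-01/hr=13/b-001.json
```

With the `google-cloud-storage` client library, such a name would be passed to `bucket.blob(name)` and the batch written with `upload_from_string()`; partitioning by UTC hour keeps names stable regardless of where the producer runs.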
BigQuery is an interactive database for analyzing large volumes of data with very fast response times. It manages infrastructure and resources automatically for fast, efficient operation, and it is billed for both usage and storage.
https://cloud.google.com/docs/
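Before events land in BigQuery, they must match the table schema. BigQuery load jobs accept newline-delimited JSON, so a minimal sketch of that last step projects each event onto the columns and emits one JSON object per line. The four-column `SCHEMA` and its field names are hypothetical, and this is not how Data Collector's BigQuery destination works internally.

```python
import json

# Hypothetical BigQuery table schema: column name -> expected Python type.
SCHEMA = {"event_id": str, "user_id": str, "event_type": str, "ts": int}


def to_row(event: dict) -> dict:
    """Project an event onto SCHEMA: unknown keys are dropped, missing ones become None."""
    row = {}
    for column, expected in SCHEMA.items():
        value = event.get(column)
        if value is not None and not isinstance(value, expected):
            raise TypeError(
                f"{column}: expected {expected.__name__}, got {type(value).__name__}"
            )
        row[column] = value
    return row


def to_ndjson(events: list[dict]) -> str:
    """Serialize events as newline-delimited JSON, one row per line."""
    return "\n".join(json.dumps(to_row(e), sort_keys=True) for e in events)


print(to_ndjson([{"event_id": "e1", "user_id": "u1", "event_type": "msg", "ts": 5, "extra": 1}]))
# {"event_id": "e1", "event_type": "msg", "ts": 5, "user_id": "u1"}
```

Rejecting type mismatches before loading surfaces schema drift at the pipeline, where it is cheaper to diagnose than in a failed load job.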
Cloud Pub/Sub brings the flexibility and reliability of enterprise message-oriented middleware to the cloud. It is a scalable, durable event ingestion and delivery system with low-latency messaging that helps developers integrate systems quickly.
https://cloud.google.com/docs/
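Pub/Sub can deliver messages by pull or by push. In a push delivery, each message arrives as a JSON envelope whose `data` field is base64-encoded, and because delivery is at-least-once, the same `messageId` can arrive more than once. The sketch below decodes such an envelope and drops repeats. It assumes publishers send JSON payloads; `decode_push` and `Deduplicator` are hypothetical helpers, not part of the Pub/Sub client library or of Data Collector.

```python
import base64
import json


def decode_push(body: str) -> tuple[str, dict, dict]:
    """Unpack a Pub/Sub push request body into (message_id, event, attributes)."""
    envelope = json.loads(body)
    message = envelope["message"]
    # In push deliveries the payload bytes arrive base64-encoded in "data".
    raw = base64.b64decode(message["data"])
    event = json.loads(raw.decode("utf-8"))  # assumption: publishers send JSON events
    return message["messageId"], event, message.get("attributes", {})


class Deduplicator:
    """Drop redeliveries: Pub/Sub delivers at least once, so a messageId can repeat.

    The in-memory set is unbounded; a real consumer would expire old IDs or use a store.
    """

    def __init__(self) -> None:
        self._seen: set[str] = set()

    def is_new(self, message_id: str) -> bool:
        if message_id in self._seen:
            return False
        self._seen.add(message_id)
        return True


body = json.dumps({
    "message": {
        "data": base64.b64encode(b'{"type": "message_received"}').decode("ascii"),
        "messageId": "42",
        "attributes": {"channel": "demo"},
    },
    "subscription": "projects/example/subscriptions/events",
})
dedup = Deduplicator()
message_id, event, attributes = decode_push(body)
print(event, dedup.is_new(message_id), dedup.is_new(message_id))
# {'type': 'message_received'} True False
```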
Small steps for a great result
Assessment
• Inventory of information sources
• Identify metrics
• Introduce BI terminology
Identified gaps
• Technical debt
• No defined processes
• Several separate efforts
• Data silos
Define action points
• Implement data governance
• Implement a DataOps framework
• Define the right tools
Align with the goals
• Prioritize according to business needs
• Think in terms of outcomes
• Small deliverables
Lessons
• Adopting DataOps represents a cultural change
• Think in terms of use cases to achieve outcomes
• Start developing data governance
• Start with small sprints and goals, then grow
• StreamSets supported us in achieving our goals
• Look to automate operations through the cloud
• I recommend starting with Data Collector (SDC) and evolving to Control Hub
Use Case
Implement an architecture that captures events in real time and holds them in a message queue. From the queue, the events must be sent both to a historical repository and to the data warehouse for further analysis.
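The data flow of this use case, one queue fanned out to two destinations, can be sketched in memory. Mapping the queue to Pub/Sub, the historical repository to Cloud Storage and the warehouse to BigQuery is an assumption based on the services presented earlier; in the architecture described in this session, Data Collector pipelines move the data, and `drain` and the stand-in lists below are hypothetical.

```python
import queue
from typing import Callable

Sink = Callable[[dict], None]


def drain(events: queue.Queue, sinks: list[Sink]) -> int:
    """Deliver every queued event to every sink (fan-out); return how many were delivered."""
    delivered = 0
    while True:
        try:
            event = events.get_nowait()
        except queue.Empty:
            return delivered
        for sink in sinks:
            sink(event)  # every destination receives the same event
        events.task_done()
        delivered += 1


# In-memory stand-ins (assumptions): the queue for Pub/Sub, the lists for the
# historical repository (for example Cloud Storage) and the warehouse (BigQuery).
events: queue.Queue = queue.Queue()
archive: list[dict] = []
warehouse: list[dict] = []

for i in range(3):
    events.put({"event_id": f"e{i}", "type": "message", "text": "hello"})

count = drain(events, [
    archive.append,  # historical repository keeps the full raw event
    # warehouse keeps only the modeled columns
    lambda e: warehouse.append({"event_id": e["event_id"], "type": e["type"]}),
])
print(count, len(archive), warehouse[0])
# 3 3 {'event_id': 'e0', 'type': 'message'}
```

Keeping the raw events in the historical repository, separate from the modeled warehouse rows, means the warehouse tables can be rebuilt from the archive if the model changes.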