SlideShare una empresa de Scribd logo
1 de 45
Modern Open Data Platform :
Cool Open Source Tools
Crafting your Dream Stack with the Open Data Platform
Playbook
Rahul Xavier Singh Anant Corporation
Data Engineer’s Lunch / Anant Webinar 11/07/2022
Playbook
Design
Framework
Approach
ETL / Reverse ETL
Customer Data Platforms
Components
DataOps
Agenda
We help platform owners
reach beyond their potential
to serve a global customer
base that demands
Everything, Now.
We design with our
Playbook, build with our
Framework, and manage
platforms with our Approach
so our clients
Think & Grow Big.
Customer Success
Challenge
Business
Platform
Playbook
Framework
Approach
Technology
Management
Solutions
[Data] Services Catalog
Fully Managed Service
Subscriptions
We offer Professional Services to engineer Solutions and
offer Managed Services to clients where it makes sense, after an
Assessment
7
Modern Technology is Disconnected
https://chiefmartec.com/2020/04/marketing-technology-landscape-2020-martech-5000/
Businesses want to :
- Create value
- Get the customer
- Deliver the value
- Get paid
8
Most Users Just Want / Need to …
FIND
DISCOVER
FILTER
ANALYZE
VISUALIZE
MEASURE
ACT
USE
SHARE
9
Business / Platform Dream
Enterprise
Consciousness :
- People
- Processes,
- Information
- Systems
Connected /
Synchronized.
Business has been chasing
this dream for a while. As
technologies improve, this
becomes more accessible. Image Source: Digital Business
Technology Platforms, Gartner 2016
10
Going Beyond “Reactive Manifesto” / 12 Factor
References: https://12factor.net/, https://www.reactivemanifesto.org/
- Current Business Information is
available to People in the swiftest
way possible within the bounds of
reasonable costs.
- Business Information is generally
available to the enterprise, siloed
only by security and governance.
- Data platforms make use of
appropriate resources for hot vs.
cold, raw vs. enhanced data.
- Data platforms are always
available, redundant, always
trying to achieve a RPO/RTO of
zero.
Project
Information
Client
Service
Information
Corporate
Guides
Collaborative
Documents
Assets
& Files
Corporate
Assets
Unified User Experience
Challenges of
Managing Data
Platforms in a
Growing Enterprise
Optimized Core enabled Business Modularity
This process needs
to be done in
sequence. Otherwise
we end up having to
redo the work.
Business
Silos
Standardized
Platform
Optimized
Core
Business
Modularity
Phases of Business Modularity
14
Generic Data Platform Operations
Modern
Open Data Platform
Design
Contexts
Responsibilities
Approach
Framework
Tools
17
So Many Different “Modern Stacks?”
Lots of “reference” architectures
available. They tend not to think about
the speed layer since they are focusing
on batch. What about SPEED?
18
How do you choose from the landscape?
Lots and lots of components in the
Data & AI Landscape. Which ones are
the right ones for your business?
19
Playbook for Modern Open Data Platform
Platform Design Evaluate Framework
Cloud
- Public
- Private
- Hybrid
Data
- Data:Object
- Data:Stream
- Data:Table
- Data:Index
- Processor:Batch
- Processor:Stream
DataOps
- ETL/ELT/EtLT
- Reverse ETL
- Orchestration
DevOps
- Infrastructure as
Code
- Systems
Automation
- Application CICD
Architecture (Design)
- Cloud
- Data
- DevOps
- DataOps
Engineering
- Configuration
- Scripting
- Programming
Operation
- Setup / Deploy
- Monitoring/Alerts
- Administration
User Experience
- No-Code/Low Code Apps/Form Builders
- Automatic API Generator/Platform
- Customer App/API Framework
Execute Approach
Discovery (Inventory)
- People
- Process
- Information (Objects)
- Systems (Apps)
Modern Enterprise Canvas
Workflow
Approval
Customer
Acquisition Customer
Payment
Customer
Information
Customer
Information
Customer
Information
Business
Information
Billing
Information
Zoho App
Creator
Unbounce
Zoho CRM Stripe
Zapier
Contexts
- People
- Process
- Information
- Systems
Responsibility Areas
- Products & Services
- Sales & Marketing
- Operations &
Infrastructure
- Research &
Development
- Finance &
Accounting
- Leadership &
Management
Modern Enterprise Canvas
Contexts
- People
- Process
- Information
- Systems
Responsibility Areas
- Customer
- Users
- Business
- Product Owners
- Engineering
- Developers
- Operations
- Administrators
Framework
Framework
Distributed
Realtime
Extendable / Open
Automated
Monitored / Managed
Public Cloud Native - Amazon
Public Cloud Native - Microsoft
Public Cloud Native - Google
Cool Tools:
Optimizing Distributed Data
with Cloud vs. Open Core with
Open Source Tools
Open Core Distributed Data Platforms
To create globally distributed and real time platforms, we
need to use distributed realtime technologies to build your
platform. Here are some. Which ones should you choose?
Open Core
Data Modernization / Automation / Integration
In addition to vastly scalable tools, there are also modern
innovations that can help teams automate and maximize
human capital by making data platform management easier.
Framework Components
● Major Components
○ Persistent Queues ( RAM/BUS)
○ Queue Processing & Compute ( CPU)
○ Persistent Storage (DISK/RAM)
○ Reporting Engine (Display)
○ Orchestration Framework (Motherboard)
○ Scheduler (Operating System)
● Strategies
○ Cloud Native on Google
○ Self-Managed Open Source
○ Self-Managed Commercial Source
○ Managed Commercial Source
Customers want options, so we decided to
create a Framework that can scale with
whatever Infrastructure and Software strategy
they want to use.
31
Framework
Approach
Approach
Setup
Training
Administration
Configuration
Knowledge
Approach
34
Sample STACK Outline
35
Framework
Platform
Component
s
Resources
Platform
Setup
Training
Administrati
on
Configuratio
n
Knowledge
● Components
○ Infrastructure
■ Source / Git
■ Github
■ Gitlab
■ Cloud / Public
■ AWS
■ Azure
■ GCP
■ DO
■ Orchestration
■ Terraform
■ Terraform / Atlanits
■ Configuration
■ Ansible
■ Ansible / AWX / Semaphore
○ Compute
■ Datastax / Spark
■ Datastax / Livy
■ Databricks
○ Data / Open Core
■ Datastax Enterprise
■ Cassandra
■ Search / Solr
■ Graph
■ Confluent Platform
○ Data / Cloud
■ Datastax / Astra
■ Confluent Cloud
○ Data / Open Source
■ Cassandra
■ Kafka
■ Elassandra
■ YugaByte
■ Scylla
■ Pulsar
○ Application
■ Airflow
■ Airbyte
■ Kafka Streams
■ Jupyter
■ Redash
■ Metabase
■ Superset
■ Zeppelin
Use Case:
Standard Data Fabric
37
How Distributed Data Helps Drive Enterprise
Consciousness
XDCR: Cross datacenter
replication is the
ultimate data fabric.
Resilience,
performance,
availability, and scale.
Made widely available
by Cassandra and
Couchbase
38
Modern Open Data Platform + Cool Database = Data Fabric
One cluster, many workloads.
With any other “Data Warehouse”,
this would be problematic. With
Cassandra, this is a core feature.
39
How YugaByteDB allows us to go further…
All the benefits of XDCR and ….
- More Data Density at High
Speed
- YCQL Queries to support
Non Relational / C* CQL
like queries.
- YSQL Queries to support
Relational / SQL Queries
- Transactions/Consistency
- …
40
Let’s Get Data into a Database - Easier Today
Open Source:
- Airbyte / RudderStack
makes ETL Easier and
are open source
- Kafka Connect / Pulsar
IO can convert ETL into
Streaming ETL
SaaS/PaaS:
- SaaS like Stitch/HevoData
- Supported versions of Airbyte/RudderStack
41
Once It’s There, Serve it , Do More Processing
Open Source:
- Flink / Spark / Kafka
Streams can be used
to save Analytics /
ML processed data.
- Hasura can help
serve data as
GraphQL, PostgREST
can expose REST
apis.
42
Open Source:
- Grouparoo / Airbyte ,
RudderStack are free.
Others are paid.
- You can always use
Kafka Connect /
Pulsar IO to send data
back also.
Let’s send it back via Reverse ETL!
Reverse ETL is the process of copying data from a warehouse into business applications like
CRM, analytics, and marketing automation software. You perform this process by using a
reverse ETL tool that integrates with your data source and your business SaaS tools.
- Segment Blog
https://segment.com/blog/reverse-
etl/
43
Let’s put it all together now - ONE DATA FABRIC
Cassandra isn’t the only database to
do XDCR that can enable multiple
workloads.
Yugabyte also offers a PostgreSQL
compliant Layer
44
Key Takeaways for Open Data Platforms
Don’t reinvent the wheel.
Prioritize DevOps / DataOps
Document the STACK
Identify the Objectives
- Identify the objectives so that you
know what success looks like.
- DevOps / DataOps combined with a
true agile approach allows you to
iterate your platform quickly.
- Put the data into a distributed data
store that supports SQL/CQL, and
possibly archive it into
Parquet/Iceberg (historical data)
- Get the data out to your Systems
using “Reverse ETL” tools.
Use open tools that are well
supported
45
Thank you and Dream Big.
Hire us
- Design Workshops
- Innovation Sprints
- Service Catalog
Anant.us
- Read our Playbook
- Join our Mailing List
- Read up on Data Platforms
- Watch our Videos
- Download Examples

Más contenido relacionado

La actualidad más candente

Data Mesh Part 4 Monolith to Mesh
Data Mesh Part 4 Monolith to MeshData Mesh Part 4 Monolith to Mesh
Data Mesh Part 4 Monolith to MeshJeffrey T. Pollock
 
Enterprise Architecture vs. Data Architecture
Enterprise Architecture vs. Data ArchitectureEnterprise Architecture vs. Data Architecture
Enterprise Architecture vs. Data ArchitectureDATAVERSITY
 
Lecture 6: Infrastructure & Tooling (Full Stack Deep Learning - Spring 2021)
Lecture 6: Infrastructure & Tooling (Full Stack Deep Learning - Spring 2021)Lecture 6: Infrastructure & Tooling (Full Stack Deep Learning - Spring 2021)
Lecture 6: Infrastructure & Tooling (Full Stack Deep Learning - Spring 2021)Sergey Karayev
 
Modernizing to a Cloud Data Architecture
Modernizing to a Cloud Data ArchitectureModernizing to a Cloud Data Architecture
Modernizing to a Cloud Data ArchitectureDatabricks
 
Apache Iceberg: An Architectural Look Under the Covers
Apache Iceberg: An Architectural Look Under the CoversApache Iceberg: An Architectural Look Under the Covers
Apache Iceberg: An Architectural Look Under the CoversScyllaDB
 
Introducing Databricks Delta
Introducing Databricks DeltaIntroducing Databricks Delta
Introducing Databricks DeltaDatabricks
 
Data Mesh in Practice: How Europe’s Leading Online Platform for Fashion Goes ...
Data Mesh in Practice: How Europe’s Leading Online Platform for Fashion Goes ...Data Mesh in Practice: How Europe’s Leading Online Platform for Fashion Goes ...
Data Mesh in Practice: How Europe’s Leading Online Platform for Fashion Goes ...Databricks
 
Data Observability Best Pracices
Data Observability Best PracicesData Observability Best Pracices
Data Observability Best PracicesAndy Petrella
 
A Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and HudiA Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and HudiDatabricks
 
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkRedis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkDatabricks
 
How to Build the Data Mesh Foundation: A Principled Approach | Zhamak Dehghan...
How to Build the Data Mesh Foundation: A Principled Approach | Zhamak Dehghan...How to Build the Data Mesh Foundation: A Principled Approach | Zhamak Dehghan...
How to Build the Data Mesh Foundation: A Principled Approach | Zhamak Dehghan...HostedbyConfluent
 
What is Apache Kafka and What is an Event Streaming Platform?
What is Apache Kafka and What is an Event Streaming Platform?What is Apache Kafka and What is an Event Streaming Platform?
What is Apache Kafka and What is an Event Streaming Platform?confluent
 
Data Lakehouse, Data Mesh, and Data Fabric (r2)
Data Lakehouse, Data Mesh, and Data Fabric (r2)Data Lakehouse, Data Mesh, and Data Fabric (r2)
Data Lakehouse, Data Mesh, and Data Fabric (r2)James Serra
 
Building an open data platform with apache iceberg
Building an open data platform with apache icebergBuilding an open data platform with apache iceberg
Building an open data platform with apache icebergAlluxio, Inc.
 
Room 1 - 4 - Phạm Tường Chiến & Trần Văn Thắng - Deliver managed Kubernetes C...
Room 1 - 4 - Phạm Tường Chiến & Trần Văn Thắng - Deliver managed Kubernetes C...Room 1 - 4 - Phạm Tường Chiến & Trần Văn Thắng - Deliver managed Kubernetes C...
Room 1 - 4 - Phạm Tường Chiến & Trần Văn Thắng - Deliver managed Kubernetes C...Vietnam Open Infrastructure User Group
 
Logical Data Fabric: Architectural Components
Logical Data Fabric: Architectural ComponentsLogical Data Fabric: Architectural Components
Logical Data Fabric: Architectural ComponentsDenodo
 
Data Architecture for Data Governance
Data Architecture for Data GovernanceData Architecture for Data Governance
Data Architecture for Data GovernanceDATAVERSITY
 
Data Lineage with Apache Airflow using Marquez
Data Lineage with Apache Airflow using Marquez Data Lineage with Apache Airflow using Marquez
Data Lineage with Apache Airflow using Marquez Willy Lulciuc
 
Enterprise Architecture vs. Data Architecture
Enterprise Architecture vs. Data ArchitectureEnterprise Architecture vs. Data Architecture
Enterprise Architecture vs. Data ArchitectureDATAVERSITY
 
Building a Data Strategy – Practical Steps for Aligning with Business Goals
Building a Data Strategy – Practical Steps for Aligning with Business GoalsBuilding a Data Strategy – Practical Steps for Aligning with Business Goals
Building a Data Strategy – Practical Steps for Aligning with Business GoalsDATAVERSITY
 

La actualidad más candente (20)

Data Mesh Part 4 Monolith to Mesh
Data Mesh Part 4 Monolith to MeshData Mesh Part 4 Monolith to Mesh
Data Mesh Part 4 Monolith to Mesh
 
Enterprise Architecture vs. Data Architecture
Enterprise Architecture vs. Data ArchitectureEnterprise Architecture vs. Data Architecture
Enterprise Architecture vs. Data Architecture
 
Lecture 6: Infrastructure & Tooling (Full Stack Deep Learning - Spring 2021)
Lecture 6: Infrastructure & Tooling (Full Stack Deep Learning - Spring 2021)Lecture 6: Infrastructure & Tooling (Full Stack Deep Learning - Spring 2021)
Lecture 6: Infrastructure & Tooling (Full Stack Deep Learning - Spring 2021)
 
Modernizing to a Cloud Data Architecture
Modernizing to a Cloud Data ArchitectureModernizing to a Cloud Data Architecture
Modernizing to a Cloud Data Architecture
 
Apache Iceberg: An Architectural Look Under the Covers
Apache Iceberg: An Architectural Look Under the CoversApache Iceberg: An Architectural Look Under the Covers
Apache Iceberg: An Architectural Look Under the Covers
 
Introducing Databricks Delta
Introducing Databricks DeltaIntroducing Databricks Delta
Introducing Databricks Delta
 
Data Mesh in Practice: How Europe’s Leading Online Platform for Fashion Goes ...
Data Mesh in Practice: How Europe’s Leading Online Platform for Fashion Goes ...Data Mesh in Practice: How Europe’s Leading Online Platform for Fashion Goes ...
Data Mesh in Practice: How Europe’s Leading Online Platform for Fashion Goes ...
 
Data Observability Best Pracices
Data Observability Best PracicesData Observability Best Pracices
Data Observability Best Pracices
 
A Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and HudiA Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and Hudi
 
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkRedis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
 
How to Build the Data Mesh Foundation: A Principled Approach | Zhamak Dehghan...
How to Build the Data Mesh Foundation: A Principled Approach | Zhamak Dehghan...How to Build the Data Mesh Foundation: A Principled Approach | Zhamak Dehghan...
How to Build the Data Mesh Foundation: A Principled Approach | Zhamak Dehghan...
 
What is Apache Kafka and What is an Event Streaming Platform?
What is Apache Kafka and What is an Event Streaming Platform?What is Apache Kafka and What is an Event Streaming Platform?
What is Apache Kafka and What is an Event Streaming Platform?
 
Data Lakehouse, Data Mesh, and Data Fabric (r2)
Data Lakehouse, Data Mesh, and Data Fabric (r2)Data Lakehouse, Data Mesh, and Data Fabric (r2)
Data Lakehouse, Data Mesh, and Data Fabric (r2)
 
Building an open data platform with apache iceberg
Building an open data platform with apache icebergBuilding an open data platform with apache iceberg
Building an open data platform with apache iceberg
 
Room 1 - 4 - Phạm Tường Chiến & Trần Văn Thắng - Deliver managed Kubernetes C...
Room 1 - 4 - Phạm Tường Chiến & Trần Văn Thắng - Deliver managed Kubernetes C...Room 1 - 4 - Phạm Tường Chiến & Trần Văn Thắng - Deliver managed Kubernetes C...
Room 1 - 4 - Phạm Tường Chiến & Trần Văn Thắng - Deliver managed Kubernetes C...
 
Logical Data Fabric: Architectural Components
Logical Data Fabric: Architectural ComponentsLogical Data Fabric: Architectural Components
Logical Data Fabric: Architectural Components
 
Data Architecture for Data Governance
Data Architecture for Data GovernanceData Architecture for Data Governance
Data Architecture for Data Governance
 
Data Lineage with Apache Airflow using Marquez
Data Lineage with Apache Airflow using Marquez Data Lineage with Apache Airflow using Marquez
Data Lineage with Apache Airflow using Marquez
 
Enterprise Architecture vs. Data Architecture
Enterprise Architecture vs. Data ArchitectureEnterprise Architecture vs. Data Architecture
Enterprise Architecture vs. Data Architecture
 
Building a Data Strategy – Practical Steps for Aligning with Business Goals
Building a Data Strategy – Practical Steps for Aligning with Business GoalsBuilding a Data Strategy – Practical Steps for Aligning with Business Goals
Building a Data Strategy – Practical Steps for Aligning with Business Goals
 

Similar a Data Engineer's Lunch #81: Reverse ETL Tools for Modern Data Platforms

Developing Enterprise Consciousness: Building Modern Open Data Platforms
Developing Enterprise Consciousness: Building Modern Open Data PlatformsDeveloping Enterprise Consciousness: Building Modern Open Data Platforms
Developing Enterprise Consciousness: Building Modern Open Data PlatformsScyllaDB
 
Data Engineer's Lunch #82: Automating Apache Cassandra Operations with Apache...
Data Engineer's Lunch #82: Automating Apache Cassandra Operations with Apache...Data Engineer's Lunch #82: Automating Apache Cassandra Operations with Apache...
Data Engineer's Lunch #82: Automating Apache Cassandra Operations with Apache...Anant Corporation
 
Data Engineer's Lunch #60: Series - Developing Enterprise Consciousness
Data Engineer's Lunch #60: Series - Developing Enterprise ConsciousnessData Engineer's Lunch #60: Series - Developing Enterprise Consciousness
Data Engineer's Lunch #60: Series - Developing Enterprise ConsciousnessAnant Corporation
 
Digital Reinvention by NRB
Digital Reinvention by NRBDigital Reinvention by NRB
Digital Reinvention by NRBWilliam Poos
 
ЯРОСЛАВ РАВЛІНКО «Data Science at scale. Next generation data processing plat...
ЯРОСЛАВ РАВЛІНКО «Data Science at scale. Next generation data processing plat...ЯРОСЛАВ РАВЛІНКО «Data Science at scale. Next generation data processing plat...
ЯРОСЛАВ РАВЛІНКО «Data Science at scale. Next generation data processing plat...UA DevOps Conference
 
Neo4j GraphTour New York_EY Presentation_Michael Moore
Neo4j GraphTour New York_EY Presentation_Michael MooreNeo4j GraphTour New York_EY Presentation_Michael Moore
Neo4j GraphTour New York_EY Presentation_Michael MooreNeo4j
 
Your Roadmap for An Enterprise Graph Strategy
Your Roadmap for An Enterprise Graph StrategyYour Roadmap for An Enterprise Graph Strategy
Your Roadmap for An Enterprise Graph StrategyNeo4j
 
Bigdata.sunil_6+yearsExp
Bigdata.sunil_6+yearsExpBigdata.sunil_6+yearsExp
Bigdata.sunil_6+yearsExpbigdata sunil
 
Red hat infrastructure for analytics
Red hat infrastructure for analyticsRed hat infrastructure for analytics
Red hat infrastructure for analyticsKyle Bader
 
Open Data Science Conference Big Data Infrastructure – Introduction to Hadoop...
Open Data Science Conference Big Data Infrastructure – Introduction to Hadoop...Open Data Science Conference Big Data Infrastructure – Introduction to Hadoop...
Open Data Science Conference Big Data Infrastructure – Introduction to Hadoop...DataKitchen
 
Your Roadmap for An Enterprise Graph Strategy
Your Roadmap for An Enterprise Graph StrategyYour Roadmap for An Enterprise Graph Strategy
Your Roadmap for An Enterprise Graph StrategyNeo4j
 
Informix warehouse and accelerator overview
Informix warehouse and accelerator overviewInformix warehouse and accelerator overview
Informix warehouse and accelerator overviewKeshav Murthy
 
Managing data analytics in a hybrid cloud
Managing data analytics in a hybrid cloudManaging data analytics in a hybrid cloud
Managing data analytics in a hybrid cloudKaran Singh
 
OFF SHORE RECRUITER TRAINING
OFF SHORE RECRUITER TRAININGOFF SHORE RECRUITER TRAINING
OFF SHORE RECRUITER TRAININGsatish_kumar646
 
Apache Flink Adoption at Shopify
Apache Flink Adoption at ShopifyApache Flink Adoption at Shopify
Apache Flink Adoption at ShopifyYaroslav Tkachenko
 
Cloud Computing Architecture Primer
Cloud Computing Architecture PrimerCloud Computing Architecture Primer
Cloud Computing Architecture PrimerIlham Ahmed
 
Running Data Platforms Like Products
Running Data Platforms Like ProductsRunning Data Platforms Like Products
Running Data Platforms Like ProductsVMware Tanzu
 
Powering Real-Time Big Data Analytics with a Next-Gen GPU Database
Powering Real-Time Big Data Analytics with a Next-Gen GPU DatabasePowering Real-Time Big Data Analytics with a Next-Gen GPU Database
Powering Real-Time Big Data Analytics with a Next-Gen GPU DatabaseKinetica
 

Similar a Data Engineer's Lunch #81: Reverse ETL Tools for Modern Data Platforms (20)

Developing Enterprise Consciousness: Building Modern Open Data Platforms
Developing Enterprise Consciousness: Building Modern Open Data PlatformsDeveloping Enterprise Consciousness: Building Modern Open Data Platforms
Developing Enterprise Consciousness: Building Modern Open Data Platforms
 
Data Engineer's Lunch #82: Automating Apache Cassandra Operations with Apache...
Data Engineer's Lunch #82: Automating Apache Cassandra Operations with Apache...Data Engineer's Lunch #82: Automating Apache Cassandra Operations with Apache...
Data Engineer's Lunch #82: Automating Apache Cassandra Operations with Apache...
 
Data Engineer's Lunch #60: Series - Developing Enterprise Consciousness
Data Engineer's Lunch #60: Series - Developing Enterprise ConsciousnessData Engineer's Lunch #60: Series - Developing Enterprise Consciousness
Data Engineer's Lunch #60: Series - Developing Enterprise Consciousness
 
Digital Reinvention by NRB
Digital Reinvention by NRBDigital Reinvention by NRB
Digital Reinvention by NRB
 
ЯРОСЛАВ РАВЛІНКО «Data Science at scale. Next generation data processing plat...
ЯРОСЛАВ РАВЛІНКО «Data Science at scale. Next generation data processing plat...ЯРОСЛАВ РАВЛІНКО «Data Science at scale. Next generation data processing plat...
ЯРОСЛАВ РАВЛІНКО «Data Science at scale. Next generation data processing plat...
 
Neo4j GraphTour New York_EY Presentation_Michael Moore
Neo4j GraphTour New York_EY Presentation_Michael MooreNeo4j GraphTour New York_EY Presentation_Michael Moore
Neo4j GraphTour New York_EY Presentation_Michael Moore
 
Sql 2016 2017 full
Sql 2016   2017 fullSql 2016   2017 full
Sql 2016 2017 full
 
Your Roadmap for An Enterprise Graph Strategy
Your Roadmap for An Enterprise Graph StrategyYour Roadmap for An Enterprise Graph Strategy
Your Roadmap for An Enterprise Graph Strategy
 
Bigdata.sunil_6+yearsExp
Bigdata.sunil_6+yearsExpBigdata.sunil_6+yearsExp
Bigdata.sunil_6+yearsExp
 
Sql 2017 net raf
Sql 2017  net rafSql 2017  net raf
Sql 2017 net raf
 
Red hat infrastructure for analytics
Red hat infrastructure for analyticsRed hat infrastructure for analytics
Red hat infrastructure for analytics
 
Open Data Science Conference Big Data Infrastructure – Introduction to Hadoop...
Open Data Science Conference Big Data Infrastructure – Introduction to Hadoop...Open Data Science Conference Big Data Infrastructure – Introduction to Hadoop...
Open Data Science Conference Big Data Infrastructure – Introduction to Hadoop...
 
Your Roadmap for An Enterprise Graph Strategy
Your Roadmap for An Enterprise Graph StrategyYour Roadmap for An Enterprise Graph Strategy
Your Roadmap for An Enterprise Graph Strategy
 
Informix warehouse and accelerator overview
Informix warehouse and accelerator overviewInformix warehouse and accelerator overview
Informix warehouse and accelerator overview
 
Managing data analytics in a hybrid cloud
Managing data analytics in a hybrid cloudManaging data analytics in a hybrid cloud
Managing data analytics in a hybrid cloud
 
OFF SHORE RECRUITER TRAINING
OFF SHORE RECRUITER TRAININGOFF SHORE RECRUITER TRAINING
OFF SHORE RECRUITER TRAINING
 
Apache Flink Adoption at Shopify
Apache Flink Adoption at ShopifyApache Flink Adoption at Shopify
Apache Flink Adoption at Shopify
 
Cloud Computing Architecture Primer
Cloud Computing Architecture PrimerCloud Computing Architecture Primer
Cloud Computing Architecture Primer
 
Running Data Platforms Like Products
Running Data Platforms Like ProductsRunning Data Platforms Like Products
Running Data Platforms Like Products
 
Powering Real-Time Big Data Analytics with a Next-Gen GPU Database
Powering Real-Time Big Data Analytics with a Next-Gen GPU DatabasePowering Real-Time Big Data Analytics with a Next-Gen GPU Database
Powering Real-Time Big Data Analytics with a Next-Gen GPU Database
 

Más de Anant Corporation

QLoRA Fine-Tuning on Cassandra Link Data Set (1/2) Cassandra Lunch 137
QLoRA Fine-Tuning on Cassandra Link Data Set (1/2) Cassandra Lunch 137QLoRA Fine-Tuning on Cassandra Link Data Set (1/2) Cassandra Lunch 137
QLoRA Fine-Tuning on Cassandra Link Data Set (1/2) Cassandra Lunch 137Anant Corporation
 
Kono.IntelCraft.Weekly.AI.LLM.Landscape.2024.02.28.pdf
Kono.IntelCraft.Weekly.AI.LLM.Landscape.2024.02.28.pdfKono.IntelCraft.Weekly.AI.LLM.Landscape.2024.02.28.pdf
Kono.IntelCraft.Weekly.AI.LLM.Landscape.2024.02.28.pdfAnant Corporation
 
Data Engineer's Lunch 96: Intro to Real Time Analytics Using Apache Pinot
Data Engineer's Lunch 96: Intro to Real Time Analytics Using Apache PinotData Engineer's Lunch 96: Intro to Real Time Analytics Using Apache Pinot
Data Engineer's Lunch 96: Intro to Real Time Analytics Using Apache PinotAnant Corporation
 
NoCode, Data & AI LLM Inside Bootcamp: Episode 6 - Design Patterns: Retrieval...
NoCode, Data & AI LLM Inside Bootcamp: Episode 6 - Design Patterns: Retrieval...NoCode, Data & AI LLM Inside Bootcamp: Episode 6 - Design Patterns: Retrieval...
NoCode, Data & AI LLM Inside Bootcamp: Episode 6 - Design Patterns: Retrieval...Anant Corporation
 
Automate your Job and Business with ChatGPT #3 - Fundamentals of LLM/GPT
Automate your Job and Business with ChatGPT #3 - Fundamentals of LLM/GPTAutomate your Job and Business with ChatGPT #3 - Fundamentals of LLM/GPT
Automate your Job and Business with ChatGPT #3 - Fundamentals of LLM/GPTAnant Corporation
 
Episode 2: The LLM / GPT / AI Prompt / Data Engineer Roadmap
Episode 2: The LLM / GPT / AI Prompt / Data Engineer RoadmapEpisode 2: The LLM / GPT / AI Prompt / Data Engineer Roadmap
Episode 2: The LLM / GPT / AI Prompt / Data Engineer RoadmapAnant Corporation
 
Machine Learning Orchestration with Airflow
Machine Learning Orchestration with AirflowMachine Learning Orchestration with Airflow
Machine Learning Orchestration with AirflowAnant Corporation
 
Cassandra Lunch 130: Recap of Cassandra Forward Talks
Cassandra Lunch 130: Recap of Cassandra Forward TalksCassandra Lunch 130: Recap of Cassandra Forward Talks
Cassandra Lunch 130: Recap of Cassandra Forward TalksAnant Corporation
 
Data Engineer's Lunch 90: Migrating SQL Data with Arcion
Data Engineer's Lunch 90: Migrating SQL Data with ArcionData Engineer's Lunch 90: Migrating SQL Data with Arcion
Data Engineer's Lunch 90: Migrating SQL Data with ArcionAnant Corporation
 
Data Engineer's Lunch 89: Machine Learning Orchestration with AirflowMachine ...
Data Engineer's Lunch 89: Machine Learning Orchestration with AirflowMachine ...Data Engineer's Lunch 89: Machine Learning Orchestration with AirflowMachine ...
Data Engineer's Lunch 89: Machine Learning Orchestration with AirflowMachine ...Anant Corporation
 
Cassandra Lunch 129: What’s New: Apache Cassandra 4.1+ Features & Future
Cassandra Lunch 129: What’s New:  Apache Cassandra 4.1+ Features & FutureCassandra Lunch 129: What’s New:  Apache Cassandra 4.1+ Features & Future
Cassandra Lunch 129: What’s New: Apache Cassandra 4.1+ Features & FutureAnant Corporation
 
Data Engineer's Lunch #86: Building Real-Time Applications at Scale: A Case S...
Data Engineer's Lunch #86: Building Real-Time Applications at Scale: A Case S...Data Engineer's Lunch #86: Building Real-Time Applications at Scale: A Case S...
Data Engineer's Lunch #86: Building Real-Time Applications at Scale: A Case S...Anant Corporation
 
Data Engineer's Lunch #85: Designing a Modern Data Stack
Data Engineer's Lunch #85: Designing a Modern Data StackData Engineer's Lunch #85: Designing a Modern Data Stack
Data Engineer's Lunch #85: Designing a Modern Data StackAnant Corporation
 
Data Engineer's Lunch #83: Strategies for Migration to Apache Iceberg
Data Engineer's Lunch #83: Strategies for Migration to Apache IcebergData Engineer's Lunch #83: Strategies for Migration to Apache Iceberg
Data Engineer's Lunch #83: Strategies for Migration to Apache IcebergAnant Corporation
 
Apache Cassandra Lunch 120: Apache Cassandra Monitoring Made Easy with AxonOps
Apache Cassandra Lunch 120: Apache Cassandra Monitoring Made Easy with AxonOpsApache Cassandra Lunch 120: Apache Cassandra Monitoring Made Easy with AxonOps
Apache Cassandra Lunch 120: Apache Cassandra Monitoring Made Easy with AxonOpsAnant Corporation
 
Apache Cassandra Lunch 119: Desktop GUI Tools for Apache Cassandra
Apache Cassandra Lunch 119: Desktop GUI Tools for Apache CassandraApache Cassandra Lunch 119: Desktop GUI Tools for Apache Cassandra
Apache Cassandra Lunch 119: Desktop GUI Tools for Apache CassandraAnant Corporation
 
Data Engineer’s Lunch #67: Machine Learning - Feature Selection
Data Engineer’s Lunch #67: Machine Learning - Feature SelectionData Engineer’s Lunch #67: Machine Learning - Feature Selection
Data Engineer’s Lunch #67: Machine Learning - Feature SelectionAnant Corporation
 
Data Engineer's Lunch #80: Apache Spark Resource Managers
Data Engineer's Lunch #80: Apache Spark Resource ManagersData Engineer's Lunch #80: Apache Spark Resource Managers
Data Engineer's Lunch #80: Apache Spark Resource ManagersAnant Corporation
 

Más de Anant Corporation (20)

QLoRA Fine-Tuning on Cassandra Link Data Set (1/2) Cassandra Lunch 137
QLoRA Fine-Tuning on Cassandra Link Data Set (1/2) Cassandra Lunch 137QLoRA Fine-Tuning on Cassandra Link Data Set (1/2) Cassandra Lunch 137
QLoRA Fine-Tuning on Cassandra Link Data Set (1/2) Cassandra Lunch 137
 
Kono.IntelCraft.Weekly.AI.LLM.Landscape.2024.02.28.pdf
Kono.IntelCraft.Weekly.AI.LLM.Landscape.2024.02.28.pdfKono.IntelCraft.Weekly.AI.LLM.Landscape.2024.02.28.pdf
Kono.IntelCraft.Weekly.AI.LLM.Landscape.2024.02.28.pdf
 
Data Engineer's Lunch 96: Intro to Real Time Analytics Using Apache Pinot
Data Engineer's Lunch 96: Intro to Real Time Analytics Using Apache PinotData Engineer's Lunch 96: Intro to Real Time Analytics Using Apache Pinot
Data Engineer's Lunch 96: Intro to Real Time Analytics Using Apache Pinot
 
NoCode, Data & AI LLM Inside Bootcamp: Episode 6 - Design Patterns: Retrieval...
NoCode, Data & AI LLM Inside Bootcamp: Episode 6 - Design Patterns: Retrieval...NoCode, Data & AI LLM Inside Bootcamp: Episode 6 - Design Patterns: Retrieval...
NoCode, Data & AI LLM Inside Bootcamp: Episode 6 - Design Patterns: Retrieval...
 
Automate your Job and Business with ChatGPT #3 - Fundamentals of LLM/GPT
Automate your Job and Business with ChatGPT #3 - Fundamentals of LLM/GPTAutomate your Job and Business with ChatGPT #3 - Fundamentals of LLM/GPT
Automate your Job and Business with ChatGPT #3 - Fundamentals of LLM/GPT
 
YugabyteDB Developer Tools
YugabyteDB Developer ToolsYugabyteDB Developer Tools
YugabyteDB Developer Tools
 
Episode 2: The LLM / GPT / AI Prompt / Data Engineer Roadmap
Episode 2: The LLM / GPT / AI Prompt / Data Engineer RoadmapEpisode 2: The LLM / GPT / AI Prompt / Data Engineer Roadmap
Episode 2: The LLM / GPT / AI Prompt / Data Engineer Roadmap
 
Machine Learning Orchestration with Airflow
Machine Learning Orchestration with AirflowMachine Learning Orchestration with Airflow
Machine Learning Orchestration with Airflow
 
Cassandra Lunch 130: Recap of Cassandra Forward Talks
Cassandra Lunch 130: Recap of Cassandra Forward TalksCassandra Lunch 130: Recap of Cassandra Forward Talks
Cassandra Lunch 130: Recap of Cassandra Forward Talks
 
Data Engineer's Lunch 90: Migrating SQL Data with Arcion
Data Engineer's Lunch 90: Migrating SQL Data with ArcionData Engineer's Lunch 90: Migrating SQL Data with Arcion
Data Engineer's Lunch 90: Migrating SQL Data with Arcion
 
Data Engineer's Lunch 89: Machine Learning Orchestration with AirflowMachine ...
Data Engineer's Lunch 89: Machine Learning Orchestration with AirflowMachine ...Data Engineer's Lunch 89: Machine Learning Orchestration with AirflowMachine ...
Data Engineer's Lunch 89: Machine Learning Orchestration with AirflowMachine ...
 
Cassandra Lunch 129: What’s New: Apache Cassandra 4.1+ Features & Future
Cassandra Lunch 129: What’s New:  Apache Cassandra 4.1+ Features & FutureCassandra Lunch 129: What’s New:  Apache Cassandra 4.1+ Features & Future
Cassandra Lunch 129: What’s New: Apache Cassandra 4.1+ Features & Future
 
Data Engineer's Lunch #86: Building Real-Time Applications at Scale: A Case S...
Data Engineer's Lunch #86: Building Real-Time Applications at Scale: A Case S...Data Engineer's Lunch #86: Building Real-Time Applications at Scale: A Case S...
Data Engineer's Lunch #86: Building Real-Time Applications at Scale: A Case S...
 
Data Engineer's Lunch #85: Designing a Modern Data Stack
Data Engineer's Lunch #85: Designing a Modern Data StackData Engineer's Lunch #85: Designing a Modern Data Stack
Data Engineer's Lunch #85: Designing a Modern Data Stack
 
CL 121
CL 121CL 121
CL 121
 
Data Engineer's Lunch #83: Strategies for Migration to Apache Iceberg
Data Engineer's Lunch #83: Strategies for Migration to Apache IcebergData Engineer's Lunch #83: Strategies for Migration to Apache Iceberg
Data Engineer's Lunch #83: Strategies for Migration to Apache Iceberg
 
Apache Cassandra Lunch 120: Apache Cassandra Monitoring Made Easy with AxonOps
Apache Cassandra Lunch 120: Apache Cassandra Monitoring Made Easy with AxonOpsApache Cassandra Lunch 120: Apache Cassandra Monitoring Made Easy with AxonOps
Apache Cassandra Lunch 120: Apache Cassandra Monitoring Made Easy with AxonOps
 
Apache Cassandra Lunch 119: Desktop GUI Tools for Apache Cassandra
Apache Cassandra Lunch 119: Desktop GUI Tools for Apache CassandraApache Cassandra Lunch 119: Desktop GUI Tools for Apache Cassandra
Apache Cassandra Lunch 119: Desktop GUI Tools for Apache Cassandra
 
Data Engineer’s Lunch #67: Machine Learning - Feature Selection
Data Engineer’s Lunch #67: Machine Learning - Feature SelectionData Engineer’s Lunch #67: Machine Learning - Feature Selection
Data Engineer’s Lunch #67: Machine Learning - Feature Selection
 
Data Engineer's Lunch #80: Apache Spark Resource Managers
Data Engineer's Lunch #80: Apache Spark Resource ManagersData Engineer's Lunch #80: Apache Spark Resource Managers
Data Engineer's Lunch #80: Apache Spark Resource Managers
 

Último

Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc
 
HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesHTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesBoston Institute of Analytics
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CVKhem
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingEdi Saputra
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsRoshan Dwivedi
 
Top 10 Most Downloaded Games on Play Store in 2024
Top 10 Most Downloaded Games on Play Store in 2024Top 10 Most Downloaded Games on Play Store in 2024
Top 10 Most Downloaded Games on Play Store in 2024SynarionITSolutions
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProduct Anonymous
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MIND CTI
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAndrey Devyatkin
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyKhushali Kathiriya
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...apidays
 

Último (20)

Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesHTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation Strategies
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
 
Top 10 Most Downloaded Games on Play Store in 2024
Top 10 Most Downloaded Games on Play Store in 2024Top 10 Most Downloaded Games on Play Store in 2024
Top 10 Most Downloaded Games on Play Store in 2024
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 

Data Engineer's Lunch #81: Reverse ETL Tools for Modern Data Platforms

  • 1. Modern Open Data Platform : Cool Open Source Tools Crafting your Dream Stack with the Open Data Platform Playbook Rahul Xavier Singh Anant Corporation Data Engineer’s Lunch / Anant Webinar 11/07/2022
  • 2. Playbook Design Framework Approach ETL / Reverse ETL Customer Data Platforms Components DataOps Agenda
  • 3. We help platform owners reach beyond their potential to serve a global customer base that demands Everything, Now.
  • 4. We design with our Playbook, build with our Framework, and manage platforms with our Approach so our clients Think & Grow Big.
  • 6. Challenge Business Platform Playbook Framework Approach Technology Management Solutions [Data] Services Catalog Fully Managed Service Subscriptions We offer Professional Services to engineer Solutions and offer Managed Services to clients where it makes sense, after an Assessment
  • 7. 7 Modern Technology is Disconnected https://chiefmartec.com/2020/04/marketing-technology-landscape-2020-martech-5000/ Businesses want to : - Create value - Get the customer - Deliver the value - Get paid
  • 8. 8 Most Users Just Want / Need to … FIND DISCOVER FILTER ANALYZE VISUALIZE MEASURE ACT USE SHARE
  • 9. 9 Business / Platform Dream Enterprise Consciousness : - People - Processes, - Information - Systems Connected / Synchronized. Business has been chasing this dream for a while. As technologies improve, this becomes more accessible. Image Source: Digital Business Technology Platforms, Gartner 2016
  • 10. 10 Going Beyond “Reactive Manifesto” / 12 Factor References: https://12factor.net/, https://www.reactivemanifesto.org/ - Current Business Information is available to People in the swiftest way possible within the bounds of reasonable costs. - Business Information is generally available to the enterprise, siloed only by security and governance. - Data platforms make use of appropriate resources for hot vs. cold, raw vs. enhanced data. - Data platforms are always available, redundant, always trying to achieve a RPO/RTO of zero. Project Information Client Service Information Corporate Guides Collaborative Documents Assets & Files Corporate Assets Unified User Experience
  • 11. Challenges of Managing Data Platforms in a Growing Enterprise
  • 12. Optimized Core enabled Business Modularity This process needs to be done in sequence. Otherwise we end up having to redo the work.
  • 17. 17 So Many Different “Modern Stacks?” Lots of “reference” architectures available. They tend not to think about the speed layer since they are focusing on batch. What about SPEED?
  • 18. 18 How do you choose from the landscape? Lots and lots of components in the Data & AI Landscape. Which ones are the right ones for your business?
  • 19. 19 Playbook for Modern Open Data Platform Platform Design Evaluate Framework Cloud - Public - Private - Hybrid Data - Data:Object - Data:Stream - Data:Table - Data:Index - Processor:Batch - Processor:Stream DataOps - ETL/ELT/EtLT - Reverse ETL - Orchestration DevOps - Infrastructure as Code - Systems Automation - Application CICD Architecture (Design) - Cloud - Data - DevOps - DataOps Engineering - Configuration - Scripting - Programming Operation - Setup / Deploy - Monitoring/Alerts - Administration User Experience - No-Code/Low Code Apps/Form Builders - Automatic API Generator/Platform - Customer App/API Framework Execute Approach Discovery (Inventory) - People - Process - Information (Objects) - Systems (Apps)
  • 20. Modern Enterprise Canvas Workflow Approval Customer Acquisition Customer Payment Customer Information Customer Information Customer Information Business Information Billing Information Zoho App Creator Unbounce Zoho CRM Stripe Zapier Contexts - People - Process - Information - Systems Responsibility Areas - Products & Services - Sales & Marketing - Operations & Infrastructure - Research & Development - Finance & Accounting - Leadership & Management
  • 21. Modern Enterprise Canvas Contexts - People - Process - Information - Systems Responsibility Areas - Customer - Users - Business - Product Owners - Engineering - Developers - Operations - Administrators
  • 25. Public Cloud Native - Microsoft
  • 27. Cool Tools: Optimizing Distributed Data with Cloud vs. Open Core with Open Source Tools
  • 28. Open Core Distributed Data Platforms To create globally distributed and real time platforms, we need to use distributed realtime technologies to build your platform. Here are some. Which ones should you choose?
  • 29. Open Core Data Modernization / Automation / Integration In addition to vastly scalable tools, there are also modern innovations that can help teams automate and maximize human capital by making data platform management easier.
  • 30. Framework Components ● Major Components ○ Persistent Queues ( RAM/BUS) ○ Queue Processing & Compute ( CPU) ○ Persistent Storage (DISK/RAM) ○ Reporting Engine (Display) ○ Orchestration Framework (Motherboard) ○ Scheduler (Operating System) ● Strategies ○ Cloud Native on Google ○ Self-Managed Open Source ○ Self-Managed Commercial Source ○ Managed Commercial Source Customers want options, so we decided to create a Framework that can scale with whatever Infrastructure and Software strategy they want to use.
  • 35. Sample STACK Outline 35 Framework Platform Component s Resources Platform Setup Training Administrati on Configuratio n Knowledge ● Components ○ Infrastructure ■ Source / Git ■ Github ■ Gitlab ■ Cloud / Public ■ AWS ■ Azure ■ GCP ■ DO ■ Orchestration ■ Terraform ■ Terraform / Atlanits ■ Configuration ■ Ansible ■ Ansible / AWX / Semaphore ○ Compute ■ Datastax / Spark ■ Datastax / Livy ■ Databricks ○ Data / Open Core ■ Datastax Enterprise ■ Cassandra ■ Search / Solr ■ Graph ■ Confluent Platform ○ Data / Cloud ■ Datastax / Astra ■ Confluent Cloud ○ Data / Open Source ■ Cassandra ■ Kafka ■ Elassandra ■ YugaByte ■ Scylla ■ Pulsar ○ Application ■ Airflow ■ Airbyte ■ Kafka Streams ■ Jupyter ■ Redash ■ Metabase ■ Superset ■ Zeppelin
  • 37. 37 How Distributed Data Helps Drive Enterprise Consciousness XDCR: Cross datacenter replication is the ultimate data fabric. Resilience, performance, availability, and scale. Made widely available by Cassandra and Couchbase
  • 38. 38 Modern Open Data Platform + Cool Database = Data Fabric One cluster, many workloads. With any other “Data Warehouse”, this would be problematic. With Cassandra, this is a core feature.
  • 39. 39 How YugaByteDB allows us to go further… All the benefits of XDCR and …. - More Data Density at High Speed - YCQL Queries to support Non Relational / C* CQL like queries. - YSQL Queries to support Relational / SQL Queries - Transactions/Consistency - …
  • 40. 40 Let’s Get Data into a Database - Easier Today Open Source: - Airbyte / RudderStack makes ETL Easier and are open source - Kafka Connect / Pulsar IO can convert ETL into Streaming ETL SaaS/PaaS: - SaaS like Stitch/HevoData - Supported versions of Airbyte/RudderStack
  • 41. 41 Once It’s There, Serve it , Do More Processing Open Source: - Flink / Spark / Kafka Streams can be used to save Analytics / ML processed data. - Hasura can help serve data as GraphQL, PostgREST can expose REST apis.
  • 42. 42 Open Source: - Grouparoo / Airbyte , RudderStack are free. Others are paid. - You can always use Kafka Connect / Pulsar IO to send data back also. Let’s send it back via Reverse ETL! Reverse ETL is the process of copying data from a warehouse into business applications like CRM, analytics, and marketing automation software. You perform this process by using a reverse ETL tool that integrates with your data source and your business SaaS tools. - Segment Blog https://segment.com/blog/reverse- etl/
  • 43. 43 Let’s put it all together now - ONE DATA FABRIC Cassandra isn’t the only database to do XDCR that can enable multiple workloads. Yugabyte also offers a PostgreSQL compliant Layer
  • 44. 44 Key Takeaways for Open Data Platforms Don’t reinvent the wheel. Prioritize DevOps / DataOps Document the STACK Identify the Objectives - Identify the objectives so that you know what success looks like. - DevOps / DataOps combined with a true agile approach allows you to iterate your platform quickly. - Put the data into a distributed data store that supports SQL/CQL, and possibly archive it into Parquet/Iceberg (historical data) - Get the data out to your Systems using “Reverse ETL” tools. Use open tools that are well supported
  • 45. 45 Thank you and Dream Big. Hire us - Design Workshops - Innovation Sprints - Service Catalog Anant.us - Read our Playbook - Join our Mailing List - Read up on Data Platforms - Watch our Videos - Download Examples

Notas del editor

  1. What makes a good story? Once you get good at it, presenting becomes easy. Shared stories with people we’ve bonded with (community for example). This format is not good for Metastories.
  2. What makes a good story? Once you get good at it, presenting becomes easy. Shared stories with people we’ve bonded with (community for example). This format is not good for Metastories.
  3. What makes a good story? Once you get good at it, presenting becomes easy. Shared stories with people we’ve bonded with (community for example). This format is not good for Metastories.
  4. What makes a good story? Once you get good at it, presenting becomes easy. Shared stories with people we’ve bonded with (community for example). This format is not good for Metastories.
  5. Challenge Currently the components are broken up in to different vendors and parts. Similar to building a computer every time for every client.
  6. What makes a good story? Once you get good at it, presenting becomes easy. Shared stories with people we’ve bonded with (community for example). This format is not good for Metastories.
  7. What makes a good story? Once you get good at it, presenting becomes easy. Shared stories with people we’ve bonded with (community for example). This format is not good for Metastories.
  8. What makes a good story? Once you get good at it, presenting becomes easy. Shared stories with people we’ve bonded with (community for example). This format is not good for Metastories.
  9. Challenge Currently the components are broken up in to different vendors and parts. Similar to building a computer every time for every client.
  10. Challenge Currently the components are broken up in to different vendors and parts. Similar to building a computer every time for every client.
  11. Challenge Currently the components are broken up in to different vendors and parts. Similar to building a computer every time for every client.
  12. Challenge Currently the components are broken up in to different vendors and parts. Similar to building a computer every time for every client.
  13. What makes a good story? Once you get good at it, presenting becomes easy. Shared stories with people we’ve bonded with (community for example). This format is not good for Metastories.
  14. What makes a good story? Once you get good at it, presenting becomes easy. Shared stories with people we’ve bonded with (community for example). This format is not good for Metastories.
  15. What makes a good story? Once you get good at it, presenting becomes easy. Shared stories with people we’ve bonded with (community for example). This format is not good for Metastories.
  16. What makes a good story? Once you get good at it, presenting becomes easy. Shared stories with people we’ve bonded with (community for example). This format is not good for Metastories.
  17. What makes a good story? Once you get good at it, presenting becomes easy. Shared stories with people we’ve bonded with (community for example). This format is not good for Metastories.
  18. What makes a good story? Once you get good at it, presenting becomes easy. Shared stories with people we’ve bonded with (community for example). This format is not good for Metastories.
  19. What makes a good story? Once you get good at it, presenting becomes easy. Shared stories with people we’ve bonded with (community for example). This format is not good for Metastories.
  20. What makes a good story? Once you get good at it, presenting becomes easy. Shared stories with people we’ve bonded with (community for example). This format is not good for Metastories.
  21. What makes a good story? Once you get good at it, presenting becomes easy. Shared stories with people we’ve bonded with (community for example). This format is not good for Metastories.