2. KnolX Etiquette
A lack of etiquette and manners is a huge turn-off.
Punctuality
Join the session 5 minutes before the start time. We start on time and conclude on time!
Feedback
Make sure to submit constructive feedback for every session; it is very helpful to the presenter.
Silent Mode
Keep your mobile devices on silent mode. Feel free to step out of the session if you need to take an urgent call.
Avoid Disturbance
Avoid unwanted chit-chat during the session.
3. Our Agenda
01. Introduction to GCP - A brief introduction to GCP.
02. GCS - Exploring Cloud Storage: working with objects, buckets, storage classes, etc.
03. Big Table & Big Query - What they are, key features, and use cases.
04. Google Dataflow - What it actually is, what it does, and its use cases.
05. Airflow - Cloud Composer, an orchestration service built on Apache Airflow.
06. Pub/Sub - An overview of Pub/Sub, its key features, and use cases.
5. Evolution of Cloud Computing
Cloud computing is, at its core, about renting computing services. Five technologies played a vital role in making cloud computing what it is today.
Distributed Systems
● A composition of multiple independent systems, all depicted as a single entity to the users.
● The purpose is to share resources and to use them effectively and efficiently.
Virtualization
● Refers to the process of creating a virtual layer over the hardware which allows the user to run multiple instances simultaneously on that hardware.
Web 2.0
● The interface through which the cloud computing services interact with the clients.
● Popular examples of Web 2.0 include Google Maps, Facebook, Twitter, etc.
Utility Computing
● A computing model that defines provisioning techniques for services, e.g. compute along with other major services such as storage and infrastructure.
Service Orientation
● Acts as a reference model for cloud computing.
● Supports low-cost, flexible, and evolvable applications.
● Quality of Service (QoS) and Software as a Service (SaaS) were also introduced in this model.
6. What is GCP
● Google Cloud is a suite of public cloud
computing services offered by Google.
● Google Cloud offers services for compute,
storage, networking, big data, machine
learning and IoT, as well as cloud
management, security and developer tools.
● Google Cloud provides a wide variety of
services for managing and getting value from
data at scale.
7. Why GCP
LEARN NOW
● Higher Productivity owing to Quick Access to
Innovation.
● Less Disruption When Users Adopt New
Functionality.
● Google Cloud Allows Quick Collaboration.
● Google’s Economies of Scale Let Customers Spend Less.
9. Google Cloud Storage
● Cloud Storage is GCP’s object storage service.
● A very popular, flexible, and inexpensive storage service.
● Stores large objects using a key-value approach.
● Rich support for accessing and modifying objects through the REST API.
● Stores all types of data.
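The key-value model above can be sketched with a toy in-memory "bucket". This is illustrative only, not the google-cloud-storage client API: it just shows that an object is an opaque blob of bytes addressed by a unique key inside a bucket.

```python
# Toy in-memory sketch of the bucket/object key-value model described above.
# NOT the google-cloud-storage API; it only illustrates that an object is an
# opaque blob of bytes addressed by a unique key in a bucket.

class Bucket:
    def __init__(self, name: str):
        self.name = name           # globally unique in real Cloud Storage
        self._objects = {}         # key -> bytes

    def upload(self, key: str, data: bytes) -> None:
        self._objects[key] = data  # overwriting a key replaces the object

    def download(self, key: str) -> bytes:
        return self._objects[key]

    def delete(self, key: str) -> None:
        del self._objects[key]

    def list_keys(self):
        return sorted(self._objects)

bucket = Bucket("demo-bucket")
bucket.upload("logs/2024/app.log", b"error: disk full")
bucket.upload("media/cat.png", b"\x89PNG...")
print(bucket.list_keys())  # keys look like paths but are just flat strings
print(bucket.download("logs/2024/app.log"))
```

Note that the "folder" structure in the keys is an illusion: Cloud Storage, like this sketch, has a flat namespace per bucket.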
10. Creating a Bucket
Name your Bucket
● Pick a globally unique name (the name is permanent).
● Don’t include any sensitive information in the name.
Choose where to store your data
● Sets the geographic placement of your data.
● Affects your costs, performance, and availability.
● Options available for Location Type: Region, Dual-region, and Multi-region.
Choose a default storage class for your data
● Sets costs for storage, retrieval, and operations.
● The choice should be based on retention period and access frequency.
● 4 options are available for storage class: Standard, Nearline, Coldline, and Archive.
Choose how to control access to objects
● Defines fine-grained control over your objects.
● Select whether or not your bucket enforces public access prevention, and select the access control model.
● Provides 2 types of access control to objects: Fine-grained and Uniform.
Advanced Setting (Optional)
● Choose how you’ll protect your data and configure the Protection tool.
● Select the data encryption method.
11. Objects and Buckets
● Objects are stored in buckets.
○ A bucket has a globally unique name.
○ Names use lowercase letters, numbers, hyphens, underscores, and periods.
○ Name length must be between 3 and 63 characters.
○ Buckets hold unbounded data.
○ Every bucket is associated with a project.
● A unique key is used to identify each object.
○ The key must be unique within the bucket.
● Max object size is 5 TB.
○ A bucket can store an unlimited number of objects.
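The naming rules above can be checked with a small validator. This sketch follows the slide's rules; real Cloud Storage naming has a few extra conditions (e.g. names containing dots have per-segment limits, and names may not begin with "goog"), so treat it as an approximation.

```python
import re

# Validator for the bucket-name rules listed above: 3-63 characters, built
# from lowercase letters, numbers, hyphens, underscores, and periods, and
# starting/ending with a letter or number. Real Cloud Storage adds further
# rules, so this is a sketch of the constraints, not the full specification.
NAME_RE = re.compile(r"^[a-z0-9][a-z0-9._-]{1,61}[a-z0-9]$")

def is_valid_bucket_name(name: str) -> bool:
    return bool(NAME_RE.fullmatch(name))

print(is_valid_bucket_name("my-data_bucket.2024"))  # True
print(is_valid_bucket_name("ab"))                   # False: under 3 characters
print(is_valid_bucket_name("MyBucket"))             # False: uppercase letters
```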
12. Storage Classes
● Data could be anything -
○ Media files and archives.
○ Application packages and logs.
○ Database and storage-device backups.
○ Long-term archives.
● Access patterns vary hugely across these.
● Storage classes help optimise your cost based on your access needs.
Storage Class      Name         Duration (Min.)
Standard           STANDARD     None
Nearline           NEARLINE     30 days
Coldline           COLDLINE     90 days
Archive            ARCHIVE      365 days
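One way to read the table: pick a class from how often you expect to touch the data. A hypothetical helper along those lines (the thresholds simply mirror each class's minimum storage duration; a real decision should also weigh per-class storage, retrieval, and operation prices):

```python
# Hypothetical helper that picks a storage class from expected access
# frequency, mirroring the minimum-duration table above. The thresholds are
# illustrative; real decisions should also compare per-class storage,
# retrieval, and operation prices.

MIN_DURATION_DAYS = {"STANDARD": 0, "NEARLINE": 30, "COLDLINE": 90, "ARCHIVE": 365}

def suggest_storage_class(days_between_accesses: float) -> str:
    if days_between_accesses < 30:
        return "STANDARD"   # frequently accessed ("hot") data
    if days_between_accesses < 90:
        return "NEARLINE"   # accessed roughly once a month
    if days_between_accesses < 365:
        return "COLDLINE"   # accessed roughly once a quarter
    return "ARCHIVE"        # long-term archives, accessed less than yearly

print(suggest_storage_class(1))    # STANDARD
print(suggest_storage_class(45))   # NEARLINE
print(suggest_storage_class(400))  # ARCHIVE
```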
13. Data Encryption
Cloud Storage encryption options:
Server Side
● Google-Managed
○ The default.
● CMEK - Customer-Managed
○ Keys are managed by the customer in Cloud KMS.
● Customer-Supplied
○ The cloud does not store the keys.
○ The customer is responsible for storing and supplying them.
Client Side
● GCP is not aware of the key being used.
● No involvement in encryption and decryption.
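For customer-supplied keys specifically, Cloud Storage keeps only a SHA-256 hash of the key (so it can recognise it later), while the key itself must accompany each request. A rough sketch of that handshake, as toy code rather than the actual implementation (the real service also uses the key for AES encryption of the data, which is omitted here):

```python
import hashlib

# Toy sketch of the customer-supplied key model: the server never stores the
# key, only a SHA-256 hash of it, so it can verify that the key presented
# with a later request matches the one used at upload time. Not real crypto
# code: the actual encryption of object data is omitted.

class Server:
    def __init__(self):
        self._key_hashes = {}  # object name -> hash of the customer's key

    def upload(self, name: str, key: bytes) -> None:
        self._key_hashes[name] = hashlib.sha256(key).hexdigest()

    def download_allowed(self, name: str, key: bytes) -> bool:
        return self._key_hashes[name] == hashlib.sha256(key).hexdigest()

server = Server()
customer_key = b"32-byte-key-held-only-by-customer"
server.upload("report.csv", customer_key)
print(server.download_allowed("report.csv", customer_key))  # True
print(server.download_allowed("report.csv", b"wrong-key"))  # False
```

Lose the key and the data is unrecoverable: that is the trade-off the slide's "customer itself is responsible" bullet refers to.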
15. Big Table
● A fully managed, scalable NoSQL database service.
● Very helpful for large analytical and operational workloads.
● Seamless scalability to match your storage needs, with zero downtime even when reconfigured.
● Easily connects to services like BigQuery or to the Apache ecosystem.
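Bigtable's data model, a sparse table where rows are sorted by key and each cell is addressed by (row key, column, timestamp), can be sketched in plain Python. This is illustrative only, not the google-cloud-bigtable client API:

```python
# Plain-Python sketch of Bigtable's sparse, sorted wide-column data model:
# cells are addressed by (row key, column, timestamp), and rows are kept in
# lexicographic key order, which is why scanning a key range is cheap.
# Not the google-cloud-bigtable client API.

from collections import defaultdict

class ToyBigtable:
    def __init__(self):
        # row key -> column -> list of (timestamp, value), newest first
        self._rows = defaultdict(lambda: defaultdict(list))

    def put(self, row_key, column, value, ts):
        cells = self._rows[row_key][column]
        cells.append((ts, value))
        cells.sort(reverse=True)  # newest first

    def get(self, row_key, column):
        return self._rows[row_key][column][0][1]  # latest cell wins

    def scan(self, start_key, end_key):
        # Range scans walk rows in sorted key order.
        return [k for k in sorted(self._rows) if start_key <= k < end_key]

t = ToyBigtable()
t.put("user#100", "stats:clicks", 5, ts=1)
t.put("user#100", "stats:clicks", 9, ts=2)
t.put("user#200", "stats:clicks", 3, ts=1)
print(t.get("user#100", "stats:clicks"))  # 9 (latest timestamp)
print(t.scan("user#100", "user#199"))     # ['user#100']
```

Because rows are sorted by key, row-key design (e.g. the "user#" prefix above) determines which scans are efficient.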
16. Key Features of Big Table
● HBase Migration - Offers Apache HBase to Cloud Bigtable migration.
● High throughput at low latency - Ideal for storing large amounts of data.
● Flexible and Automated Replication - Automatically replicates data where needed, with eventual consistency.
● Cluster Resizing - Seamless scaling up to millions of reads/writes per second.
● Bigtable with Dataproc, Spark & BigQuery - Bigtable, Dataproc, and BigQuery are better together.
17. Big Query
● A data warehouse to power your data-driven innovations.
● BigQuery is cost-effective, serverless, and multicloud.
● It is at the core of Google’s unified data cloud.
● Positioned as a stronger option than other cloud data warehouse alternatives.
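A warehouse query is just SQL over large tables. The shape of a typical analytical query can be shown with SQLite standing in for BigQuery; BigQuery itself uses GoogleSQL on a serverless engine, and the table and data below are made up for illustration:

```python
import sqlite3

# Sketch of a typical analytical (warehouse-style) query: aggregate a fact
# table by a dimension. SQLite stands in for BigQuery purely to show the
# query shape; BigQuery runs GoogleSQL serverlessly over far larger tables.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("emea", 120.0), ("emea", 80.0), ("apac", 50.0)],
)

rows = conn.execute(
    """
    SELECT region, SUM(amount) AS total, COUNT(*) AS orders
    FROM sales
    GROUP BY region
    ORDER BY total DESC
    """
).fetchall()
print(rows)  # [('emea', 200.0, 2), ('apac', 50.0, 1)]
```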
18. Key Features of Big Query
● ML and predictive modeling with BigQuery ML - Export BigQuery ML models for online prediction into Vertex AI or your own serving layer.
● Multicloud data analysis with BigQuery Omni - Use standard SQL and familiar BigQuery interfaces to quickly answer questions.
● Interactive data analysis with BigQuery BI Engine - Enables users to analyse large and complex datasets interactively with sub-second response.
● Geospatial analysis with BigQuery GIS - Uniquely combines the serverless architecture of BigQuery with native support for geospatial analysis.
20. Google Dataflow
● Unified stream and batch data processing that is serverless, fast, and cost-effective.
● A fully managed data processing service.
● Streaming data analytics with speed.
● Horizontal autoscaling of worker resources to maximise resource utilisation.
● OSS community-driven innovation via the Apache Beam SDK.
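"Unified" means the same element-wise transforms apply whether the source is a bounded batch or an unbounded stream. A minimal conceptual sketch with plain Python generators (not the Apache Beam SDK, where the analogous steps would be `beam.Map` and `beam.Filter`):

```python
# Conceptual sketch of unified batch/stream processing: the same chain of
# element-wise transforms works over any iterable, bounded (a list) or
# unbounded (a generator). Plain Python, not the Apache Beam SDK.

def parse(events):
    for line in events:
        user, amount = line.split(",")
        yield (user, int(amount))

def keep_large(records, threshold=100):
    for user, amount in records:
        if amount >= threshold:
            yield (user, amount)

def run(source):
    # The "pipeline" is just composed generators; swap `source` for a
    # streaming iterator and the transforms are unchanged.
    return list(keep_large(parse(source)))

batch = ["alice,250", "bob,40", "carol,900"]
print(run(batch))  # [('alice', 250), ('carol', 900)]
```

Dataflow adds what this sketch cannot: distributed workers, autoscaling, and windowing over unbounded streams.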
21. Dataflow Capabilities
❖ Vertical autoscaling
❖ Right fitting
❖ Smart diagnostics
❖ Streaming Engine
❖ Horizontal autoscaling
❖ Dataflow Shuffle
❖ Dataflow SQL
❖ Flexible Resource Scheduling (FlexRS)
❖ Dataflow templates
❖ Notebooks integration
❖ Real-time change data capture
❖ Inline monitoring
❖ Customer-managed encryption keys (CMEK)
❖ Dataflow VPC Service Controls
❖ Private IPs
23. Cloud Composer - Airflow
● A fully managed workflow orchestration service built on top of Apache Airflow.
● Apache Airflow is a platform created by the community to programmatically author, schedule, and monitor workflows.
● Frees you from lock-in and is easy to use.
● Author, schedule, and monitor pipelines that span hybrid and multi-cloud environments.
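Airflow models a workflow as a DAG of tasks and runs each task only after its upstream dependencies finish. That scheduling order can be sketched with a topological sort; this is plain Python, not the Airflow API, and the ETL task names are made up:

```python
# Sketch of the core Airflow idea: a workflow is a DAG of tasks, and a task
# runs only once all of its upstream dependencies complete. Kahn's algorithm
# yields one valid execution order. Plain Python, not the Airflow API.
from collections import deque

def execution_order(deps):
    # deps: task -> set of upstream tasks that must finish first
    remaining = {t: set(d) for t, d in deps.items()}
    ready = deque(sorted(t for t, d in remaining.items() if not d))
    order = []
    while ready:
        task = ready.popleft()
        order.append(task)
        for t, d in sorted(remaining.items()):  # unblock downstream tasks
            if task in d:
                d.remove(task)
                if not d:
                    ready.append(t)
    if len(order) != len(deps):
        raise ValueError("cycle detected: not a DAG")
    return order

# Hypothetical ETL: extract -> transform -> load, plus an audit after extract.
dag = {
    "extract": set(),
    "transform": {"extract"},
    "audit": {"extract"},
    "load": {"transform"},
}
print(execution_order(dag))  # ['extract', 'audit', 'transform', 'load']
```

In real Airflow the same dependencies would be declared with operators and `>>`, e.g. `extract >> [transform, audit]`.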
24. Key Features
● MULTI-CLOUD - Create workflows that connect data, processing, and services across clouds.
● HYBRID - Orchestrates workflows that cross between on-premises and the public cloud.
● INTEGRATED - Integrates with BigQuery, Dataflow, Dataproc, Datastore, Cloud Storage, Pub/Sub, and AI Platform.
● OPEN SOURCE - Gives users freedom from lock-in, plus portability.
25. Key Features
● PYTHON PROGRAMMING LANGUAGE - Dynamically author and schedule workflows with Cloud Composer.
● FULLY MANAGED - Lets you focus only on authoring, scheduling, and monitoring your workflows.
● RELIABILITY - Increase the reliability of your workflows through easy-to-use charts.
● NETWORKING AND SECURITY - Provides a number of configuration options during environment creation.
27. Google Pub/Sub
● Ingest events for streaming into BigQuery, data
lakes or operational databases.
● No-ops, secure, scalable messaging or queue
system.
● In-order and any-order at-least-once message
delivery with pull and push modes.
● Secure data with fine-grained access controls
and always-on encryption.
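The core loop behind these bullets: a topic fans each message out to every subscription, and a message is redelivered until the subscriber acknowledges it (at-least-once delivery). An in-memory sketch of those semantics, not the google-cloud-pubsub client API:

```python
# In-memory sketch of Pub/Sub semantics: each subscription receives its own
# copy of every message published to the topic (fan-out), and a message stays
# outstanding, eligible for redelivery, until the subscriber acks it. This
# illustrates the at-least-once model; it is not the google-cloud-pubsub API.
import itertools
from collections import deque

class Subscription:
    def __init__(self):
        self._queue = deque()

    def pull(self):
        msg_id, data = self._queue[0]  # delivered, but not yet acked
        return msg_id, data

    def ack(self, msg_id):
        if self._queue and self._queue[0][0] == msg_id:
            self._queue.popleft()      # only acking removes the message

class Topic:
    _ids = itertools.count(1)

    def __init__(self):
        self._subscriptions = []

    def subscribe(self):
        sub = Subscription()
        self._subscriptions.append(sub)
        return sub

    def publish(self, data):
        msg_id = next(self._ids)
        for sub in self._subscriptions:  # fan-out: every sub gets a copy
            sub._queue.append((msg_id, data))
        return msg_id

topic = Topic()
sub = topic.subscribe()
mid = topic.publish(b"order-created")
print(sub.pull())      # delivered once...
print(sub.pull())      # ...and again, until acked (at-least-once)
sub.ack(mid)
print(len(sub._queue)) # 0
```

Because delivery is at-least-once, real subscribers must handle duplicates; the exactly-once processing mentioned later is layered on top, e.g. by Dataflow.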
28. Key Features
● Third-party and OSS integrations - Integrates with Splunk and Datadog for logs, and with Striim and Informatica for data integration.
● Dead letter topics - Lets subscriber applications set aside messages they cannot process for offline examination.
● Filtering - Filters messages based on attributes, which can reduce delivery volume.
● Seek and replay - Rewind your backlog to any point in time or to a snapshot, making it easy to reprocess messages.
● Google Cloud-native integrations - Integrates with multiple services, such as Cloud Storage and Gmail update events.
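Filtering in particular is easy to picture: a subscription with a filter only receives messages whose attributes match it. A sketch of that predicate; real Pub/Sub filters are written in a small expression syntax over attributes (e.g. `attributes.region = "emea"`), for which a dict of required values stands in here:

```python
# Sketch of subscription filtering: a message carries string attributes, and
# a filtered subscription only delivers messages whose attributes satisfy the
# filter, cutting delivery volume. A dict of required key-value pairs stands
# in for Pub/Sub's actual filter expression syntax.

def matches(required: dict, attributes: dict) -> bool:
    return all(attributes.get(k) == v for k, v in required.items())

messages = [
    {"data": b"m1", "attributes": {"region": "emea", "type": "order"}},
    {"data": b"m2", "attributes": {"region": "apac", "type": "order"}},
    {"data": b"m3", "attributes": {"region": "emea", "type": "refund"}},
]
emea_orders = [m["data"] for m in messages
               if matches({"region": "emea", "type": "order"}, m["attributes"])]
print(emea_orders)  # [b'm1']
```

Filtering happens service-side, so the two non-matching messages would never be delivered (or billed for delivery) to this subscription.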
29. Key Features (continued)
● Open - Open APIs and client libraries in multiple languages support cross-cloud and hybrid deployments.
● No provisioning, auto-everything - Just set your quota, then publish and consume.
● Compliance and security - Offers fine-grained access controls and end-to-end encryption.
● At-least-once delivery - Synchronous, cross-zone message replication.
● Exactly-once processing - Dataflow supports reliable, expressive, exactly-once processing of Pub/Sub streams.