3. Knowledge Check ...
For now, ask yourself:
1. Why do we need data?
2. Why do we need to store it efficiently? How do we store it efficiently?
3. How and where do we persist data? Hint: maybe “Excel” 🤦♂️
4. What insights do we get after analyzing it?
4. Data Structure(s)
A data organization, management, and storage format that enables efficient
access and modification. 🤔 (the dry “Wikipedia” definition)
A way of organizing data so that it can be used efficiently. 😀
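To make “used efficiently” concrete, here is a small sketch (with hypothetical sample records) showing how the choice of structure changes lookup cost: the same data held in a list needs a linear scan, while a dict gives direct key lookup.

```python
# The same records organized two ways (sample data for illustration).
users_list = [("alice", 30), ("bob", 25), ("carol", 41)]
users_dict = {name: age for name, age in users_list}

def age_from_list(name):
    # O(n): scan every record until we find a match
    for n, age in users_list:
        if n == name:
            return age
    return None

def age_from_dict(name):
    # O(1) on average: the hash table locates the record directly
    return users_dict.get(name)

print(age_from_list("carol"))  # 41
print(age_from_dict("carol"))  # 41
```

Both return the same answer; the difference only matters as the data grows, which is exactly the Big Data situation discussed later.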
5. Data Type(s)
An attribute of data that indicates what type of data we are
storing or manipulating.
In other words, it tells us what kind of data we are dealing with. 😀
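A quick illustration of the idea, using a hypothetical record: every value carries a type, and the type determines what operations make sense on it.

```python
# A sample record: each field has its own data type.
record = {"name": "Ada", "age": 36, "height_m": 1.63, "active": True}

for key, value in record.items():
    print(key, type(value).__name__)
# name str, age int, height_m float, active bool

# The type changes the meaning of the same operator:
print(1 + 2)      # ints: arithmetic -> 3
print("1" + "2")  # strings: concatenation -> "12"
```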
6. Preparing the data … “correctly ✅” ...
This is where understanding the different types of data and data structures comes in handy.
There isn’t one single best way of storing data.
Every organization stores data differently.
7. Initially, develop a general idea of how all data is being:
● Generated
● Collected
● Stored
Only then can we:
● Find data that is “relevant”,
● Process it, and
● Analyze it to gain insights.
8. Why does data matter? 🤔
Data is often called the most valuable commodity in the world: it has value, or can have value.
We want to store data in such a way that it is easy to manipulate and to gain
insights from.
9. Once upon a time … 👴
Data was structured and stored across multiple tables managed by an RDBMS.
The computational power available to process data at the time was low.
Social networks, smartphones, IoT devices, video streaming platforms … these “data sources” were still in their early days.
10. Some years later … ⏩
As we become a more digital society, the amount of data being created and collected is growing and accelerating significantly.
Analyzing this ever-growing data becomes a challenge with traditional analytical tools.
“DATA IS EVERYWHERE”
AND IT IS MOSTLY UNSTRUCTURED
11. 90%
of the data in the world today has been created in the last two years. 😲 (IBM estimate, 2012)
12. Why AWS?
Amazon Web Services (AWS) provides a broad platform of managed services to help you build, secure, and seamlessly scale end-to-end big data applications quickly and with ease.
We need innovation to bridge the gap between the data being generated and the data that can be analyzed effectively. 💡
13. But wait, what is Big Data?
Data so large and complex that it exceeds the processing capacity of
conventional database systems.
3 V’s of Big Data:
● Volume: the size of the data we are dealing with.
● Variety: the fact that data comes from various sources and in different formats.
● Velocity: the speed at which data is being generated.
There can be more V’s (e.g. veracity, value).
So, any data that crashes Excel is “Big Data”. 😬
14. Data from where? 🤔
Ask yourself:
● Where does data come from?
● How is such a huge amount of data being generated?
● Is the data even relevant, and does it come from valid sources?
● Who is storing the data?
15. Data sources …
● IoT devices, sensors, CCTV
● Social networks and search engines
● Stock exchange data
● Online shopping, retail data
● Log files
● ERP and CRM systems
● Healthcare industry, insurance
● Airline data
● Financial data
● Geographical data
AND SO MUCH MORE!!!
16. Structured, Unstructured, and Semi-Structured Data
Structured data has a defined schema. This type of data is well organized.
e.g. relational data.
Unstructured data has no defined schema or structural properties. It makes
up the majority of data collected. e.g. audio/video, images, binary data.
Semi-structured data is somewhere in the middle: too unstructured for a
relational database, but with some organizational structure. e.g. XML or JSON data.
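A short sketch of what “some organizational structure” means in practice, using toy XML and JSON documents and only the Python standard library: there is no fixed relational schema, but the tags and keys let us navigate the data programmatically.

```python
import json
import xml.etree.ElementTree as ET

# Semi-structured XML (toy document): tagged, but no relational schema.
xml_doc = "<user><name>Ada</name><age>36</age></user>"
root = ET.fromstring(xml_doc)
print(root.find("name").text)  # Ada

# JSON is another common semi-structured format; the nested list here
# would not fit directly into a single flat relational row.
json_doc = '{"name": "Ada", "tags": ["math", "computing"]}'
record = json.loads(json_doc)
print(record["tags"][1])  # computing
```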
19. Data Lifecycle
Stages:
1. Data Ingestion
2. Data Staging
3. Data Cleansing
4. Data Analytics and Visualization
5. Data Archive
20. Data Ingestion: the movement of data from an external source to another location for analysis.
Data Staging: performing housekeeping tasks before making data available to users.
Data Cleansing: before data is analyzed, data cleansing detects, corrects, and removes inaccurate data and corrupted records or files.
Data Analytics and Visualization: the real value of data is extracted in this stage. Decision-makers use analytics and visualization tools to predict customer needs, improve operations, transform broken processes, and innovate to compete.
Data Archiving: the AWS Cloud facilitates data archiving, enabling IT departments to invest more time in the other stages of the data lifecycle.
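The cleansing stage is the easiest one to show in code. A minimal sketch, using hypothetical sensor readings: detect and drop records with missing or obviously corrupted values before they reach the analysis stage.

```python
# Toy raw data: a missing reading and a sensor-error sentinel value.
raw = [
    {"id": 1, "temp_c": 21.5},
    {"id": 2, "temp_c": None},    # missing reading
    {"id": 3, "temp_c": -999.0},  # corrupted record (error sentinel)
    {"id": 4, "temp_c": 22.1},
]

def is_valid(row):
    t = row["temp_c"]
    # Keep only readings in a plausible range (assumed bounds).
    return t is not None and -50.0 <= t <= 60.0

clean = [row for row in raw if is_valid(row)]
print([row["id"] for row in clean])  # [1, 4]
```

Real pipelines would also correct recoverable records rather than only dropping them, but the detect-then-filter pattern is the core idea.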
38. Data Integrity 🤔
Simply put, it means the accuracy, completeness, and quality of data as it is maintained over time and across formats.
39. Database Consistency
The database must remain in a consistent state after any transaction.
A consistent transaction will not violate the integrity constraints placed on the data by the database rules.
40. ETL (Extract-Transform-Load)
A way to integrate data into a single location. 😀
ETL is a recurring activity (daily, weekly, monthly) of a Data Warehouse system and needs to be agile, automated, and well documented.
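A toy end-to-end ETL run, assuming a CSV source and a SQLite table standing in for the warehouse (both hypothetical, standard library only): extract raw rows, transform and validate them, then load the survivors into the target store.

```python
import csv
import io
import sqlite3

# Extract: read raw rows from the (simulated) CSV source.
source = io.StringIO("name,amount\nalice,10\nbob,not_a_number\ncarol,5\n")
rows = list(csv.DictReader(source))

# Transform: coerce types and reject corrupted records.
def transform(row):
    try:
        return (row["name"], int(row["amount"]))
    except ValueError:
        return None  # "not_a_number" fails validation

cleaned = [r for r in (transform(row) for row in rows) if r is not None]

# Load: write the transformed rows into the warehouse table.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE sales (name TEXT, amount INTEGER)")
db.executemany("INSERT INTO sales VALUES (?, ?)", cleaned)
total = db.execute("SELECT SUM(amount) FROM sales").fetchone()[0]
print(total)  # 15
```

In the ELT variant on the next slide, the raw rows would be loaded first and the transform step would run inside the repository instead.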
42. ETL ... vs. ELT (Extract-Load-Transform)
ELT swaps the last two stages of the ETL process: after being extracted from source databases, data is loaded straight into a central repository, where all transformations occur.
46. Analyzing Data … (becoming “Sherlock” 🕵️♂️)
Understanding the real value contained within the data, so that with those insights we can make business decisions.
In short: extracting information from data to support decision making.
47. Visualizing Data ... 📈
The presentation of data in a pictorial or graphical format.
Because of the way the human brain processes information, using charts or graphs to visualize large amounts of complex data is easier than poring over spreadsheets or reports.
48. Common Data Visualization Ways
Source: https://morphocode.com/location-time-urban-data-visualization/
50. “In God we trust; all others must bring data.”
- W. Edwards Deming
51. AWS provides a host of services to address an organization’s data lifecycle and analytics requirements. 😌
Editor’s notes
Arrays, linked lists, stacks, and queues are some basic data structures. But when it comes to Big Data, we have others, discussed later. This is just the definition.
The types of data in the Big Data world: structured, unstructured, and semi-structured data. Discussed later.
Ask students: how do we store it efficiently? Different data structures and types need to be prepared differently. We cannot just squeeze schema-free data into a relational database. We must “prepare” the data correctly.
We must develop a general idea of how all data is generated, collected, and stored so that we can find the data that is relevant, process it, and analyze it to extract the hidden insights.
We process data to discover meaningful patterns, and with that information we make decisions that make our businesses more profitable and secure.
In the traditional architecture, data was mostly collected in a structured, tabular format and handled via RDBMS (Relational Database Management Systems). But now data is generated at an unimaginable rate, and much of it is unstructured.
The amount of data that one has to process has boomed to unimaginable levels in the past decade. It is important that organizations find ways to manage and analyze it so that they can act on the data and make important business decisions.
Estimated by IBM in 2012. It is 2021 now; think how much more data must have been generated since then. Link to the post: https://www.facebook.com/IBM/posts/90-of-the-data-in-the-world-today-has-been-created-in-the-last-two-years/293229680748471/
Analyzing large data sets requires significant compute capacity that can vary in size based on the amount of input data and the type of analysis. AWS provides the infrastructure and tools to tackle such large datasets with its pay-as-you-go cloud computing model.
Depending on the type of data or how it is structured (the data structure), we have various kinds of databases.