AWS Certified Data Analytics
(Amazon Web Services)
Data Structure & Types
Knowledge Check ...
For now, ask yourself:
1. Why do we need data?
2. Why do we need to store it efficiently? How do we store it efficiently?
3. How and where should we persist data? Hint: Maybe “Excel” 🤦‍♂️
4. What insights do we get after analyzing it?
Data Structure(s)
A data organization, management, and storage format that enables efficient
access and modification. 🤔 (boring “Wikipedia” definition)
A way of organizing data so that it can be used efficiently. 😀
Data Type(s)
An attribute of data that indicates what kind of value we are
storing or manipulating.
It tells us what kind of data we are dealing with. 😀
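To make the two definitions concrete, here is a small Python sketch. The customer records and field names are hypothetical. Each single value has a data type (int, str, float, bool), while the list and the dictionary are two data structures that organize the same records differently: finding a record in the list means scanning it, whereas the dictionary keyed by ID supports direct lookup.

```python
# Data types: each value carries a type that says what kind of data it is.
customer_id: int = 42
name: str = "Ada"
balance: float = 1250.75
is_active: bool = True

# Data structures: ways of organizing many values for efficient access.
# A list keeps records in order; finding one by id means scanning items (O(n)).
customers_list = [
    {"id": 7, "name": "Grace"},
    {"id": 42, "name": "Ada"},
]

# A dict keyed by id gives average O(1) lookup instead of a linear scan.
customers_by_id = {c["id"]: c for c in customers_list}


def find_in_list(records, target_id):
    """Linear search through a list of records; returns None if not found."""
    for record in records:
        if record["id"] == target_id:
            return record
    return None


print(type(customer_id).__name__, type(name).__name__,
      type(balance).__name__, type(is_active).__name__)  # int str float bool
print(find_in_list(customers_list, 42)["name"])  # Ada
print(customers_by_id[42]["name"])               # Ada
```

Both structures hold the same data; choosing between them depends on how the data will be accessed, which is the point of "organizing data so that it can be used efficiently".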
Preparing the data … “correctly ✅” ...
This is where understanding the different
types of data and data structures comes in
handy.
There isn’t one best way to store data.
Every organization stores data differently.
First, develop a general idea of how all data is being:
● Generated
● Collected
● Stored
Only then can we:
● Find data that is “relevant”,
● Process it, and
● Analyze it to gain insights.
Why does data matter? 🤔
Data is often called the world’s most valuable commodity: it has value, or can have value.
We want to store data in a way that makes it easier to manipulate and to gain
insights from.
Once upon a time … 👴
Data was structured, stored across multiple
tables, and managed by an RDBMS.
The computational power available to process the data
at that time was low.
Social networks, smartphones, IoT devices, and
video streaming platforms … these “data
sources” were still in their early days.
Some years later … ⏩
As we become a more digital
society, the amount of data being
created and collected is growing
at an accelerating rate.
Analyzing this ever-growing data
is a challenge for
traditional analytical tools.
“DATA IS
EVERYWHERE”
AND MOST OF IT IS UNSTRUCTURED
90%
of the world’s data was created in the preceding two years, according to an often-quoted estimate. 😲
Why AWS?
Amazon Web Services (AWS)
provides a broad platform of
managed services to help you
build, secure, and seamlessly
scale end-to-end big data
applications quickly and with
ease.
We require innovation to bridge the
gap between data being generated
and data that can be analyzed
effectively.💡
But wait, what is Big Data?
Data so large and complex that it exceeds the processing capacity of
conventional database systems.
The 3 V’s of Big Data:
● Volume: the size of the data we are dealing with.
● Variety: data comes from various sources and in different
formats.
● Velocity: the speed at which data is being generated.
Some definitions add more V’s (e.g. Veracity and Value).
So, any data that crashes Excel is “Big Data”. 😬
Data from where? 🤔
Ask yourself:
● Where does data come from?
● How is such a huge volume of data being
generated?
● Is the data even relevant, and does it come from
valid sources?
● Who is storing the data?
Data sources …
● IoT devices, sensors, CCTV
● Social networks and search engines
● Stock exchange data
● Online shopping, retail data
● Log files
● ERP and CRM systems
● Healthcare industry, insurance
● Airline data
● Financial data
● Geographical data
SO MUCH MORE!!!
Structured, Unstructured and Semi-Structured Data
Structured data has a defined schema. This type of data is well organized,
e.g. relational data.
Unstructured data has no defined schema or structural properties. It makes
up the majority of data collected, e.g. audio/video, images, binary data.
Semi-structured data sits somewhere in the middle. It does not fit a rigid
relational schema, but it has some organizational structure (such as tags or keys),
e.g. XML or JSON data.
Source:
https://www.researchgate.net/figure/Unstructured-semi-structured-and-structured-data_fig4_236860222
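A rough illustration of the three categories using only the Python standard library (all sample data is hypothetical). CSV rows share one fixed set of columns, JSON documents describe themselves with keys but can differ from one another, and a raw byte blob (here a truncated PNG signature as a stand-in for an image) has no fields we can query directly.

```python
import csv
import io
import json

# Structured: every row follows the same fixed schema (id, name, amount).
structured_text = "id,name,amount\n1,Alice,30.5\n2,Bob,12.0\n"
rows = list(csv.DictReader(io.StringIO(structured_text)))

# Semi-structured: self-describing keys, but fields may be nested or missing.
semi_structured_text = """
[
  {"id": 1, "name": "Alice", "tags": ["vip"]},
  {"id": 2, "name": "Bob", "address": {"city": "Lima"}}
]
"""
documents = json.loads(semi_structured_text)

# Unstructured: raw bytes (a truncated PNG header as a stand-in for an image).
unstructured_blob = b"\x89PNG\r\n\x1a\n..."

print([list(row.keys()) for row in rows])         # same columns in every row
print([sorted(doc.keys()) for doc in documents])  # keys differ per document
print(len(unstructured_blob), "bytes, no fields to query")
```

Note that CSV gives every value back as a string; even structured data usually needs type handling during cleansing.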
Data Lifecycle
Stages:
1. Data Ingestion
2. Data Staging
3. Data Cleansing
4. Data Analytics and Visualization
5. Data Archiving
Data Ingestion: The movement of data from an external source to
another location for analysis.
Data Staging: Performing housekeeping tasks before
making data available to users.
Data Cleansing: Before data is analyzed, data cleansing detects,
corrects, and removes inaccurate data and corrupted records or
files.
Data Analytics and Visualization: This is the stage where the real value of data is
extracted. Decision-makers use analytics
and visualization tools to predict customer needs, improve
operations, transform broken processes, and innovate to
compete.
Data Archiving: The AWS Cloud facilitates data archiving,
enabling IT departments to invest more time in other stages of
the data lifecycle.
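The five stages can be pictured as a chain of functions. This is a toy, in-memory Python sketch with hypothetical sensor readings and hypothetical function names; a real pipeline on AWS would hand each stage to managed services rather than local functions. Staging drops lines that cannot be parsed at all, cleansing drops or corrects records with bad values, analytics computes a per-sensor average, and archiving compresses the clean records for cheap long-term storage.

```python
import gzip
import json

# Hypothetical raw input, as it might arrive from an external source.
RAW_SOURCE = [
    '{"sensor": "s1", "temp_c": "21.5"}',
    '{"sensor": "s2", "temp_c": "bad"}',  # inaccurate value
    'not json at all',                    # corrupted record
    '{"sensor": "s1", "temp_c": "22.5"}',
]


def ingest(source):
    """Ingestion: move records from the external source into the pipeline."""
    return list(source)


def stage(raw_lines):
    """Staging: housekeeping such as parsing and skipping unparseable lines."""
    staged = []
    for line in raw_lines:
        try:
            staged.append(json.loads(line))
        except json.JSONDecodeError:
            continue  # not valid JSON: leave it out
    return staged


def cleanse(records):
    """Cleansing: correct types and remove inaccurate or incomplete records."""
    clean = []
    for rec in records:
        try:
            clean.append({"sensor": rec["sensor"], "temp_c": float(rec["temp_c"])})
        except (KeyError, TypeError, ValueError):
            continue  # missing field or a value that is not a number
    return clean


def analyze(records):
    """Analytics: average temperature per sensor."""
    totals = {}
    for rec in records:
        total, count = totals.get(rec["sensor"], (0.0, 0))
        totals[rec["sensor"]] = (total + rec["temp_c"], count + 1)
    return {sensor: total / count for sensor, (total, count) in totals.items()}


def archive(records):
    """Archiving: compress records for long-term storage."""
    return gzip.compress(json.dumps(records).encode("utf-8"))


clean_records = cleanse(stage(ingest(RAW_SOURCE)))
print(analyze(clean_records))  # {'s1': 22.0}
archived = archive(clean_records)
```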
AWS BIG DATA ECOSYSTEM
Data Stores (place to keep the data) 😀
Data Integrity 🤔
It refers to the accuracy,
completeness, and quality of data as it is
maintained over time and across
formats.
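One common way to check a single aspect of integrity, namely that data has not changed unexpectedly while stored or moved, is to record a checksum and verify it later. This Python sketch uses `hashlib.sha256` on hypothetical CSV bytes; a checksum only detects changes, it says nothing about whether the data was complete or accurate to begin with.

```python
import hashlib


def sha256_of(data: bytes) -> str:
    """Return the hex SHA-256 digest of the given bytes."""
    return hashlib.sha256(data).hexdigest()


def is_intact(data: bytes, expected_checksum: str) -> bool:
    """True if the data still matches the checksum recorded earlier."""
    return sha256_of(data) == expected_checksum


# Record a checksum when the data is first stored...
original = b"order_id,amount\n1001,250.00\n"
stored_checksum = sha256_of(original)

# ...and verify it later, e.g. after copying or transferring the data.
tampered = b"order_id,amount\n1001,2500.00\n"
print(is_intact(original, stored_checksum))  # True
print(is_intact(tampered, stored_checksum))  # False
```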
Database Consistency
The database must remain in a
consistent state after any
transaction.
A consistent transaction will not
violate integrity constraints placed
on the data by the database rules.
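A minimal illustration with Python's built-in `sqlite3` module: a hypothetical `accounts` table declares the rule "a balance may never go negative" as a `CHECK` constraint. A transfer runs as one transaction; if the debit would break the rule, the whole transaction is rolled back, including the credit that already ran, so the database stays in a consistent state.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Integrity constraint: a balance may never go negative.
conn.execute(
    "CREATE TABLE accounts ("
    " name TEXT PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move money in one transaction; roll back if any step violates a constraint."""
    try:
        with conn:  # commits on success, rolls back on exception
            # Credit first, so a failed debit shows the rollback undoing it.
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst))
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:
        return False


def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM accounts ORDER BY name"))


print(transfer(conn, "alice", "bob", 30))   # True
print(balances(conn))                       # {'alice': 70, 'bob': 80}
print(transfer(conn, "alice", "bob", 500))  # False: alice would go negative
print(balances(conn))                       # {'alice': 70, 'bob': 80} (unchanged)
```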
ETL (Extract-Transform-Load)
A way to integrate data from multiple sources into a single location. 😀
ETL is a recurring activity (daily, weekly, monthly) of a data warehouse system
and needs to be agile, automated, and well documented.
Source: https://www.altexsoft.com/blog/etl-vs-elt/
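A minimal ETL sketch in Python, assuming a hypothetical CSV export as the source and an in-memory SQLite database as a stand-in for the warehouse. The key point of ETL is ordering: rows are cleaned (country codes normalized, types cast, incomplete rows dropped) before anything is loaded, so the warehouse only ever receives transformed data.

```python
import csv
import io
import sqlite3

# Hypothetical source extract: raw CSV exported from an operational system.
SOURCE_CSV = """order_id,country,amount
1,us,10.50
2,US,4.50
3,de,
4,DE,20.00
"""


def extract(text):
    """Extract: read raw rows from the source."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform (before loading): normalize codes, cast types, drop incomplete rows."""
    out = []
    for row in rows:
        if not row["amount"]:
            continue  # incomplete record
        out.append((int(row["order_id"]), row["country"].upper(), float(row["amount"])))
    return out


def load(conn, rows):
    """Load: write the already-transformed rows into the warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders (order_id INTEGER PRIMARY KEY, country TEXT, amount REAL)"
    )
    with conn:
        conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)


warehouse = sqlite3.connect(":memory:")  # stand-in for a real data warehouse
load(warehouse, transform(extract(SOURCE_CSV)))
print(warehouse.execute(
    "SELECT country, SUM(amount) FROM orders GROUP BY country ORDER BY country"
).fetchall())  # [('DE', 20.0), ('US', 15.0)]
```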
ETL ... similar to ELT (Extract-Load-Transform)
ELT swaps the last two stages of the ETL process: after being extracted
from databases, data is loaded straight into a central repository, where all
transformations occur.
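A minimal ELT sketch with hypothetical order data, again using an in-memory SQLite database as a stand-in for the central repository. Here the raw rows are loaded untouched into a staging table, and the transformation runs afterwards as SQL inside the repository itself. A side effect of this ordering is that the raw data stays available for re-processing later.

```python
import sqlite3

warehouse = sqlite3.connect(":memory:")  # stand-in for a central repository

# Extract + Load: raw records go straight into a staging table, untransformed.
raw_rows = [("1", "us", "10.50"), ("2", "US", "4.50"), ("3", "de", ""), ("4", "DE", "20.00")]
warehouse.execute("CREATE TABLE raw_orders (order_id TEXT, country TEXT, amount TEXT)")
with warehouse:
    warehouse.executemany("INSERT INTO raw_orders VALUES (?, ?, ?)", raw_rows)

# Transform: runs inside the repository, using its own SQL engine.
warehouse.execute(
    """
    CREATE TABLE orders AS
    SELECT CAST(order_id AS INTEGER) AS order_id,
           UPPER(country)            AS country,
           CAST(amount AS REAL)      AS amount
    FROM raw_orders
    WHERE amount <> ''
    """
)

print(warehouse.execute(
    "SELECT country, SUM(amount) FROM orders GROUP BY country ORDER BY country"
).fetchall())  # [('DE', 20.0), ('US', 15.0)]
```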
Analyzing Data … (becoming “Sherlock” 🤔)
Understanding the real value contained
within the data, so that we can use those insights
to make business decisions.
In short: extracting information from data to
support decision-making.
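A tiny example of "extracting information to support decision-making", using hypothetical, already-cleansed sales records in plain Python. Aggregating revenue by region and comparing each region's first and last month turns raw rows into two facts a decision-maker could act on.

```python
from collections import defaultdict

# Hypothetical, already-cleansed sales records: (region, month, revenue).
sales = [
    ("north", "2024-01", 1200.0),
    ("north", "2024-02", 900.0),
    ("south", "2024-01", 700.0),
    ("south", "2024-02", 1100.0),
]

# Aggregate: total revenue per region.
revenue_by_region = defaultdict(float)
for region, _month, revenue in sales:
    revenue_by_region[region] += revenue


def change_first_to_last(records, region):
    """Relative revenue change from a region's earliest month to its latest month."""
    monthly = sorted((month, revenue) for reg, month, revenue in records if reg == region)
    first_revenue = monthly[0][1]
    last_revenue = monthly[-1][1]
    return (last_revenue - first_revenue) / first_revenue


print(dict(revenue_by_region))                         # {'north': 2100.0, 'south': 1800.0}
print(round(change_first_to_last(sales, "north"), 3))  # -0.25
print(round(change_first_to_last(sales, "south"), 3))  # 0.571
```

The totals alone suggest "north" is the stronger region; the trend shows it is shrinking while "south" grows, which is the kind of insight the raw rows hide.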
Visualizing Data ... 📈
The presentation of data in a pictorial or graphical
format.
Because of the way the human brain processes information,
using charts or graphs to visualize large
amounts of complex data is easier than poring
over spreadsheets or reports.
Common Data Visualization Methods
Source: https://morphocode.com/location-time-urban-data-visualization/
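Even without a charting library, the idea shows up in a few lines of Python. This sketch (the `bar_chart` helper and the revenue figures are hypothetical) renders a horizontal text bar chart, scaling every bar relative to the largest value so the differences are visible at a glance. In practice you would use a plotting library or a BI service instead.

```python
def bar_chart(data, width=40, symbol="#"):
    """Render a horizontal text bar chart, scaling bars to the largest value."""
    if not data:
        return ""
    max_value = max(data.values())
    label_width = max(len(label) for label in data)
    lines = []
    for label, value in data.items():
        length = round(value / max_value * width) if max_value else 0
        lines.append(f"{label.ljust(label_width)} | {symbol * length} {value:g}")
    return "\n".join(lines)


revenue = {"north": 2100.0, "south": 1800.0, "east": 600.0}
print(bar_chart(revenue, width=20))
# north | #################### 2100
# south | ################# 1800
# east  | ###### 600
```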
PUTTING IT ALL TOGETHER
“In God we trust; all others must bring data.”
- W. EDWARDS DEMING
AWS provides a host of
services to address an
organization’s data
lifecycle and analytics
requirements.😌

Más contenido relacionado

La actualidad más candente

Modern data warehouse
Modern data warehouseModern data warehouse
Modern data warehouseElena Lopez
 
Azure data lake for super developers radu vunvulea 2017 codecamp
Azure data lake for super developers radu vunvulea 2017 codecampAzure data lake for super developers radu vunvulea 2017 codecamp
Azure data lake for super developers radu vunvulea 2017 codecampRadu Vunvulea
 
Big Data with hadoop, Spark and BigQuery (Google cloud next Extended 2017 Kar...
Big Data with hadoop, Spark and BigQuery (Google cloud next Extended 2017 Kar...Big Data with hadoop, Spark and BigQuery (Google cloud next Extended 2017 Kar...
Big Data with hadoop, Spark and BigQuery (Google cloud next Extended 2017 Kar...Imam Raza
 
HP Discover: Real Time Insights from Big Data
HP Discover: Real Time Insights from Big DataHP Discover: Real Time Insights from Big Data
HP Discover: Real Time Insights from Big DataRob Winters
 
Big Data in the Cloud with Azure Marketplace Images
Big Data in the Cloud with Azure Marketplace ImagesBig Data in the Cloud with Azure Marketplace Images
Big Data in the Cloud with Azure Marketplace ImagesMark Kromer
 
Treasure Data From MySQL to Redshift
Treasure Data  From MySQL to RedshiftTreasure Data  From MySQL to Redshift
Treasure Data From MySQL to RedshiftTreasure Data, Inc.
 
Big Data Storage Challenges and Solutions
Big Data Storage Challenges and SolutionsBig Data Storage Challenges and Solutions
Big Data Storage Challenges and SolutionsWSO2
 
Billions of Rows, Millions of Insights, Right Now
Billions of Rows, Millions of Insights, Right NowBillions of Rows, Millions of Insights, Right Now
Billions of Rows, Millions of Insights, Right NowRob Winters
 
Building the Foundation for a Latency-Free Life
Building the Foundation for a Latency-Free LifeBuilding the Foundation for a Latency-Free Life
Building the Foundation for a Latency-Free LifeSingleStore
 
Managed Cluster Services
Managed Cluster ServicesManaged Cluster Services
Managed Cluster ServicesAdam Doyle
 
Webinar: Rearchitecting Storage for the Next Wave of Splunk Data Growth
Webinar: Rearchitecting Storage for the Next Wave of Splunk Data GrowthWebinar: Rearchitecting Storage for the Next Wave of Splunk Data Growth
Webinar: Rearchitecting Storage for the Next Wave of Splunk Data GrowthStorage Switzerland
 
Big data analytics
Big data analyticsBig data analytics
Big data analyticsiACT Global
 
DataStax Enterprise in Practice (Field Notes)
DataStax Enterprise in Practice (Field Notes)DataStax Enterprise in Practice (Field Notes)
DataStax Enterprise in Practice (Field Notes)DataStax
 
Big data, Cloud Computing and No SQL
Big data, Cloud Computing and No SQLBig data, Cloud Computing and No SQL
Big data, Cloud Computing and No SQLManu Cohen-Yashar
 
Big Data Day LA 2016/ NoSQL track - Architecting Real Life IoT Architecture, ...
Big Data Day LA 2016/ NoSQL track - Architecting Real Life IoT Architecture, ...Big Data Day LA 2016/ NoSQL track - Architecting Real Life IoT Architecture, ...
Big Data Day LA 2016/ NoSQL track - Architecting Real Life IoT Architecture, ...Data Con LA
 
How a Media Data Platform Drives Real-time Insights & Analytics using Apache ...
How a Media Data Platform Drives Real-time Insights & Analytics using Apache ...How a Media Data Platform Drives Real-time Insights & Analytics using Apache ...
How a Media Data Platform Drives Real-time Insights & Analytics using Apache ...Databricks
 
Дмитрий Лавриненко "Big & Fast Data for Identity & Telemetry services"
Дмитрий Лавриненко "Big & Fast Data for Identity & Telemetry services"Дмитрий Лавриненко "Big & Fast Data for Identity & Telemetry services"
Дмитрий Лавриненко "Big & Fast Data for Identity & Telemetry services"Fwdays
 
BigQuery for the Big Data win
BigQuery for the Big Data winBigQuery for the Big Data win
BigQuery for the Big Data winKen Taylor
 

La actualidad más candente (20)

Modern data warehouse
Modern data warehouseModern data warehouse
Modern data warehouse
 
Azure data lake for super developers radu vunvulea 2017 codecamp
Azure data lake for super developers radu vunvulea 2017 codecampAzure data lake for super developers radu vunvulea 2017 codecamp
Azure data lake for super developers radu vunvulea 2017 codecamp
 
Big Data with hadoop, Spark and BigQuery (Google cloud next Extended 2017 Kar...
Big Data with hadoop, Spark and BigQuery (Google cloud next Extended 2017 Kar...Big Data with hadoop, Spark and BigQuery (Google cloud next Extended 2017 Kar...
Big Data with hadoop, Spark and BigQuery (Google cloud next Extended 2017 Kar...
 
HP Discover: Real Time Insights from Big Data
HP Discover: Real Time Insights from Big DataHP Discover: Real Time Insights from Big Data
HP Discover: Real Time Insights from Big Data
 
Big Data in the Cloud with Azure Marketplace Images
Big Data in the Cloud with Azure Marketplace ImagesBig Data in the Cloud with Azure Marketplace Images
Big Data in the Cloud with Azure Marketplace Images
 
Treasure Data From MySQL to Redshift
Treasure Data  From MySQL to RedshiftTreasure Data  From MySQL to Redshift
Treasure Data From MySQL to Redshift
 
Big Data Storage Challenges and Solutions
Big Data Storage Challenges and SolutionsBig Data Storage Challenges and Solutions
Big Data Storage Challenges and Solutions
 
Billions of Rows, Millions of Insights, Right Now
Billions of Rows, Millions of Insights, Right NowBillions of Rows, Millions of Insights, Right Now
Billions of Rows, Millions of Insights, Right Now
 
Building the Foundation for a Latency-Free Life
Building the Foundation for a Latency-Free LifeBuilding the Foundation for a Latency-Free Life
Building the Foundation for a Latency-Free Life
 
Managed Cluster Services
Managed Cluster ServicesManaged Cluster Services
Managed Cluster Services
 
Webinar: Rearchitecting Storage for the Next Wave of Splunk Data Growth
Webinar: Rearchitecting Storage for the Next Wave of Splunk Data GrowthWebinar: Rearchitecting Storage for the Next Wave of Splunk Data Growth
Webinar: Rearchitecting Storage for the Next Wave of Splunk Data Growth
 
Big data analytics
Big data analyticsBig data analytics
Big data analytics
 
DataStax Enterprise in Practice (Field Notes)
DataStax Enterprise in Practice (Field Notes)DataStax Enterprise in Practice (Field Notes)
DataStax Enterprise in Practice (Field Notes)
 
Azure Document Db
Azure Document DbAzure Document Db
Azure Document Db
 
Big data, Cloud Computing and No SQL
Big data, Cloud Computing and No SQLBig data, Cloud Computing and No SQL
Big data, Cloud Computing and No SQL
 
Big data in Azure
Big data in AzureBig data in Azure
Big data in Azure
 
Big Data Day LA 2016/ NoSQL track - Architecting Real Life IoT Architecture, ...
Big Data Day LA 2016/ NoSQL track - Architecting Real Life IoT Architecture, ...Big Data Day LA 2016/ NoSQL track - Architecting Real Life IoT Architecture, ...
Big Data Day LA 2016/ NoSQL track - Architecting Real Life IoT Architecture, ...
 
How a Media Data Platform Drives Real-time Insights & Analytics using Apache ...
How a Media Data Platform Drives Real-time Insights & Analytics using Apache ...How a Media Data Platform Drives Real-time Insights & Analytics using Apache ...
How a Media Data Platform Drives Real-time Insights & Analytics using Apache ...
 
Дмитрий Лавриненко "Big & Fast Data for Identity & Telemetry services"
Дмитрий Лавриненко "Big & Fast Data for Identity & Telemetry services"Дмитрий Лавриненко "Big & Fast Data for Identity & Telemetry services"
Дмитрий Лавриненко "Big & Fast Data for Identity & Telemetry services"
 
BigQuery for the Big Data win
BigQuery for the Big Data winBigQuery for the Big Data win
BigQuery for the Big Data win
 

Similar a Data Structure and Types

Data Warehousing AWS 12345
Data Warehousing AWS 12345Data Warehousing AWS 12345
Data Warehousing AWS 12345AkhilSinghal21
 
Gerenral insurance Accounts IT and Investment
Gerenral insurance Accounts IT and InvestmentGerenral insurance Accounts IT and Investment
Gerenral insurance Accounts IT and Investmentvijayk23x
 
Information On Line Transaction Processing
Information On Line Transaction ProcessingInformation On Line Transaction Processing
Information On Line Transaction ProcessingStefanie Yang
 
Security issues in big data
Security issues in big data Security issues in big data
Security issues in big data Shallote Dsouza
 
Introduction To Data WareHouse
Introduction To Data WareHouseIntroduction To Data WareHouse
Introduction To Data WareHouseSriniRao31
 
Introduction of big data unit 1
Introduction of big data unit 1Introduction of big data unit 1
Introduction of big data unit 1RojaT4
 
Harness the power of data
Harness the power of dataHarness the power of data
Harness the power of dataHarsha MV
 
Questions On The And Football
Questions On The And FootballQuestions On The And Football
Questions On The And FootballAmanda Gray
 
Introduction of Data Science and Data Analytics
Introduction of Data Science and Data AnalyticsIntroduction of Data Science and Data Analytics
Introduction of Data Science and Data AnalyticsVrushaliSolanke
 
Data Mining @ BSU Malolos 2019
Data Mining @ BSU Malolos 2019Data Mining @ BSU Malolos 2019
Data Mining @ BSU Malolos 2019Edwin S. Garcia
 
Introduction to big data
Introduction to big dataIntroduction to big data
Introduction to big dataHari Priya
 
Week-1-Introduction to Data Mining.pptx
Week-1-Introduction to Data Mining.pptxWeek-1-Introduction to Data Mining.pptx
Week-1-Introduction to Data Mining.pptxTake1As
 
Why CxOs care about Data Governance; the roadblock to digital mastery
Why CxOs care about Data Governance; the roadblock to digital masteryWhy CxOs care about Data Governance; the roadblock to digital mastery
Why CxOs care about Data Governance; the roadblock to digital masteryCoert Du Plessis (杜康)
 
Top 60+ Data Warehouse Interview Questions and Answers.pdf
Top 60+ Data Warehouse Interview Questions and Answers.pdfTop 60+ Data Warehouse Interview Questions and Answers.pdf
Top 60+ Data Warehouse Interview Questions and Answers.pdfDatacademy.ai
 
The Emerging Role of the Data Lake
The Emerging Role of the Data LakeThe Emerging Role of the Data Lake
The Emerging Role of the Data LakeCaserta
 
Coping with Data Variety in the Big Data Era: The Semantic Computing Approach
Coping with Data Variety in the Big Data Era: The Semantic Computing ApproachCoping with Data Variety in the Big Data Era: The Semantic Computing Approach
Coping with Data Variety in the Big Data Era: The Semantic Computing ApproachAndre Freitas
 

Similar a Data Structure and Types (20)

Data Warehousing AWS 12345
Data Warehousing AWS 12345Data Warehousing AWS 12345
Data Warehousing AWS 12345
 
Gerenral insurance Accounts IT and Investment
Gerenral insurance Accounts IT and InvestmentGerenral insurance Accounts IT and Investment
Gerenral insurance Accounts IT and Investment
 
Information On Line Transaction Processing
Information On Line Transaction ProcessingInformation On Line Transaction Processing
Information On Line Transaction Processing
 
Security issues in big data
Security issues in big data Security issues in big data
Security issues in big data
 
Introduction To Data WareHouse
Introduction To Data WareHouseIntroduction To Data WareHouse
Introduction To Data WareHouse
 
Introduction of big data unit 1
Introduction of big data unit 1Introduction of big data unit 1
Introduction of big data unit 1
 
Harness the power of data
Harness the power of dataHarness the power of data
Harness the power of data
 
Questions On The And Football
Questions On The And FootballQuestions On The And Football
Questions On The And Football
 
Introduction of Data Science and Data Analytics
Introduction of Data Science and Data AnalyticsIntroduction of Data Science and Data Analytics
Introduction of Data Science and Data Analytics
 
Data Mining @ BSU Malolos 2019
Data Mining @ BSU Malolos 2019Data Mining @ BSU Malolos 2019
Data Mining @ BSU Malolos 2019
 
Introduction to big data
Introduction to big dataIntroduction to big data
Introduction to big data
 
Week-1-Introduction to Data Mining.pptx
Week-1-Introduction to Data Mining.pptxWeek-1-Introduction to Data Mining.pptx
Week-1-Introduction to Data Mining.pptx
 
Database
DatabaseDatabase
Database
 
Database
DatabaseDatabase
Database
 
Database Essay
Database EssayDatabase Essay
Database Essay
 
Why CxOs care about Data Governance; the roadblock to digital mastery
Why CxOs care about Data Governance; the roadblock to digital masteryWhy CxOs care about Data Governance; the roadblock to digital mastery
Why CxOs care about Data Governance; the roadblock to digital mastery
 
Top 60+ Data Warehouse Interview Questions and Answers.pdf
Top 60+ Data Warehouse Interview Questions and Answers.pdfTop 60+ Data Warehouse Interview Questions and Answers.pdf
Top 60+ Data Warehouse Interview Questions and Answers.pdf
 
The Emerging Role of the Data Lake
The Emerging Role of the Data LakeThe Emerging Role of the Data Lake
The Emerging Role of the Data Lake
 
Coping with Data Variety in the Big Data Era: The Semantic Computing Approach
Coping with Data Variety in the Big Data Era: The Semantic Computing ApproachCoping with Data Variety in the Big Data Era: The Semantic Computing Approach
Coping with Data Variety in the Big Data Era: The Semantic Computing Approach
 
Unit 2
Unit 2Unit 2
Unit 2
 

Último

THEORIES OF ORGANIZATION-PUBLIC ADMINISTRATION
THEORIES OF ORGANIZATION-PUBLIC ADMINISTRATIONTHEORIES OF ORGANIZATION-PUBLIC ADMINISTRATION
THEORIES OF ORGANIZATION-PUBLIC ADMINISTRATIONHumphrey A Beña
 
4.16.24 21st Century Movements for Black Lives.pptx
4.16.24 21st Century Movements for Black Lives.pptx4.16.24 21st Century Movements for Black Lives.pptx
4.16.24 21st Century Movements for Black Lives.pptxmary850239
 
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdf
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdfInclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdf
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdfTechSoup
 
TEACHER REFLECTION FORM (NEW SET........).docx
TEACHER REFLECTION FORM (NEW SET........).docxTEACHER REFLECTION FORM (NEW SET........).docx
TEACHER REFLECTION FORM (NEW SET........).docxruthvilladarez
 
Integumentary System SMP B. Pharm Sem I.ppt
Integumentary System SMP B. Pharm Sem I.pptIntegumentary System SMP B. Pharm Sem I.ppt
Integumentary System SMP B. Pharm Sem I.pptshraddhaparab530
 
The Contemporary World: The Globalization of World Politics
The Contemporary World: The Globalization of World PoliticsThe Contemporary World: The Globalization of World Politics
The Contemporary World: The Globalization of World PoliticsRommel Regala
 
Virtual-Orientation-on-the-Administration-of-NATG12-NATG6-and-ELLNA.pdf
Virtual-Orientation-on-the-Administration-of-NATG12-NATG6-and-ELLNA.pdfVirtual-Orientation-on-the-Administration-of-NATG12-NATG6-and-ELLNA.pdf
Virtual-Orientation-on-the-Administration-of-NATG12-NATG6-and-ELLNA.pdfErwinPantujan2
 
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...Postal Advocate Inc.
 
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptxINTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptxHumphrey A Beña
 
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)lakshayb543
 
Daily Lesson Plan in Mathematics Quarter 4
Daily Lesson Plan in Mathematics Quarter 4Daily Lesson Plan in Mathematics Quarter 4
Daily Lesson Plan in Mathematics Quarter 4JOYLYNSAMANIEGO
 
Expanded definition: technical and operational
Expanded definition: technical and operationalExpanded definition: technical and operational
Expanded definition: technical and operationalssuser3e220a
 
Influencing policy (training slides from Fast Track Impact)
Influencing policy (training slides from Fast Track Impact)Influencing policy (training slides from Fast Track Impact)
Influencing policy (training slides from Fast Track Impact)Mark Reed
 
How to Add Barcode on PDF Report in Odoo 17
How to Add Barcode on PDF Report in Odoo 17How to Add Barcode on PDF Report in Odoo 17
How to Add Barcode on PDF Report in Odoo 17Celine George
 
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...Nguyen Thanh Tu Collection
 
Measures of Position DECILES for ungrouped data
Measures of Position DECILES for ungrouped dataMeasures of Position DECILES for ungrouped data
Measures of Position DECILES for ungrouped dataBabyAnnMotar
 
Activity 2-unit 2-update 2024. English translation
Activity 2-unit 2-update 2024. English translationActivity 2-unit 2-update 2024. English translation
Activity 2-unit 2-update 2024. English translationRosabel UA
 
Presentation Activity 2. Unit 3 transv.pptx
Presentation Activity 2. Unit 3 transv.pptxPresentation Activity 2. Unit 3 transv.pptx
Presentation Activity 2. Unit 3 transv.pptxRosabel UA
 

Último (20)

THEORIES OF ORGANIZATION-PUBLIC ADMINISTRATION
THEORIES OF ORGANIZATION-PUBLIC ADMINISTRATIONTHEORIES OF ORGANIZATION-PUBLIC ADMINISTRATION
THEORIES OF ORGANIZATION-PUBLIC ADMINISTRATION
 
4.16.24 21st Century Movements for Black Lives.pptx
4.16.24 21st Century Movements for Black Lives.pptx4.16.24 21st Century Movements for Black Lives.pptx
4.16.24 21st Century Movements for Black Lives.pptx
 
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdf
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdfInclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdf
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdf
 
TEACHER REFLECTION FORM (NEW SET........).docx
TEACHER REFLECTION FORM (NEW SET........).docxTEACHER REFLECTION FORM (NEW SET........).docx
TEACHER REFLECTION FORM (NEW SET........).docx
 
Integumentary System SMP B. Pharm Sem I.ppt
Integumentary System SMP B. Pharm Sem I.pptIntegumentary System SMP B. Pharm Sem I.ppt
Integumentary System SMP B. Pharm Sem I.ppt
 
The Contemporary World: The Globalization of World Politics
The Contemporary World: The Globalization of World PoliticsThe Contemporary World: The Globalization of World Politics
The Contemporary World: The Globalization of World Politics
 
Virtual-Orientation-on-the-Administration-of-NATG12-NATG6-and-ELLNA.pdf
Virtual-Orientation-on-the-Administration-of-NATG12-NATG6-and-ELLNA.pdfVirtual-Orientation-on-the-Administration-of-NATG12-NATG6-and-ELLNA.pdf
Virtual-Orientation-on-the-Administration-of-NATG12-NATG6-and-ELLNA.pdf
 
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...
 
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptxINTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
 
Paradigm shift in nursing research by RS MEHTA
Paradigm shift in nursing research by RS MEHTAParadigm shift in nursing research by RS MEHTA
Paradigm shift in nursing research by RS MEHTA
 
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)
 
Daily Lesson Plan in Mathematics Quarter 4
Daily Lesson Plan in Mathematics Quarter 4Daily Lesson Plan in Mathematics Quarter 4
Daily Lesson Plan in Mathematics Quarter 4
 
Expanded definition: technical and operational
Expanded definition: technical and operationalExpanded definition: technical and operational
Expanded definition: technical and operational
 
Influencing policy (training slides from Fast Track Impact)
Influencing policy (training slides from Fast Track Impact)Influencing policy (training slides from Fast Track Impact)
Influencing policy (training slides from Fast Track Impact)
 
LEFT_ON_C'N_ PRELIMS_EL_DORADO_2024.pptx
LEFT_ON_C'N_ PRELIMS_EL_DORADO_2024.pptxLEFT_ON_C'N_ PRELIMS_EL_DORADO_2024.pptx
LEFT_ON_C'N_ PRELIMS_EL_DORADO_2024.pptx
 
How to Add Barcode on PDF Report in Odoo 17
How to Add Barcode on PDF Report in Odoo 17How to Add Barcode on PDF Report in Odoo 17
How to Add Barcode on PDF Report in Odoo 17
 
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...
 
Measures of Position DECILES for ungrouped data
Measures of Position DECILES for ungrouped dataMeasures of Position DECILES for ungrouped data
Measures of Position DECILES for ungrouped data
 
Activity 2-unit 2-update 2024. English translation
Activity 2-unit 2-update 2024. English translationActivity 2-unit 2-update 2024. English translation
Activity 2-unit 2-update 2024. English translation
 
Presentation Activity 2. Unit 3 transv.pptx
Presentation Activity 2. Unit 3 transv.pptxPresentation Activity 2. Unit 3 transv.pptx
Presentation Activity 2. Unit 3 transv.pptx
 

Data Structure and Types

  • 1. AWS Certified Data Analytics (Amazon Web Service)
  • 3. Knowledge Check ... For now, ask yourself, 1. Why do we need data ? 2. Why do we need to store it efficiently? How to store it efficiently? 3. How and where to persist data? Hint: Maybe “Excel” 🤦♂️ 4. What insights do we get after analyzing it?
  • 4. Data Structure(s) a data organization, management, and storage format that enables efficient access and modification. 🤔 (boring “Wikipedia” definition) a way of organizing the data so that it can be used efficiently. 😀
  • 5. Data Type(s) An attribute of data to indicate what type of data we are storing or manipulating. it tells us what kind of data we are dealing with.😀
  • 6. Preparing the data … “correctly ✅” ... This is where understanding the different types of data and data structures comes in handy. There isn’t one great way of storing. Every organization store data differently.
  • 7. Initially, develop a general idea of how all data is being ● Generated ● Collected ● Stored Then only, ● We can find data that is “relevant”, ● Process it, and ● Analyze to gain insights.
  • 8. Why does data matter? 🤔 Data is the most valuable commodity in the world. Data has value or can have value. We want to store data in such a way that it will be easier to manipulate and gain insights.
  • 9. Once upon a time … 👴 Data was structured and stored across multiple tables and managed by RDBMS. The computational power to process the data, at that time, was low. Social networks, smart phones and IOT devices, video streaming platforms … these “data sources” were still in their early days.
  • 10. Some years later … ⏩ As we become a more digital society, the amount of data being created and collected is growing and accelerating significantly. Analysis of this ever-growing data becomes a challenge with traditional analytical tools. “DATA IS EVERYWHERE” AND IT IS UNSTRUCTURED MOSTLY
  • 11. 90% of the data in the world today has been created in the last two years. 😲
  • 12. Why AWS ? Amazon Web Services (AWS) provides a broad platform of managed services to help you build, secure, and seamlessly scale end-to-end big data applications quickly and with ease. We require innovation to bridge the gap between data being generated and data that can be analyzed effectively.💡
  • 13. But wait, What is Big Data ? Data, so large and complex, that exceeds the processing capacity of conventional database systems. 3 V’s of Big Data: ● Volume: refers to size of data we are dealing with. ● Variety: refers to fact that data is coming from various sources and in different formats. ● Velocity: refers to the speed at which data is being generated. There can be more V’s. So, any data that crashes Excel is “Big Data”.😬
  • 14. Data from where ? 🤔 Ask yourself, ● Where does data come from ? ● How is such huge data being generated ? ● Is the data even relevant or from valid sources ? ● Who is storing the data ?
  • 15. Data sources … IOT devices, sensors, CCTV Social Networks and Search Engines Stock Exchange Data Online Shopping, Retail Data Log files ERPs, CRMs systems Healthcare Industry, Insurance Airlines Data Financial Data Geographical Data SO MUCH MORE!!!
  • 16. Structured , Unstructured and Semi Structured Data Structured data has a defined schema. This type of data is well organized. e.g. Relational Data. Unstructured Data has no defined schema or structural properties. It makes up the majority of data collected. e.g. Audio/Video, Images, Binary data. Semi Structured Data is somewhere in the middle. This data is too unstructured for relational data but has some organizational structure. e.g. XML data.
  • 18.
  • 19. Data LifeCycle Stages: 1. Data Ingestion 2. Data Staging 3. Data Cleansing 4. Data Analytics and Visualization 5. Data Archive
  • 20. Data Ingestion: The movement of data from an external source to another location for analysis. Data Staging: It involves performing housekeeping tasks prior to making data available to users. Data Cleansing: Before data is analyzed, data cleansing detects, corrects, and removes inaccurate data or corrupted records or files. Data Analytics and Visualization: The real value of data can be extracted in this stage. Decision-makers use analytics and visualization tools to predict customer needs, improve operations, transform broken processes, and innovate to compete. Data Archiving: The AWS Cloud facilitates data archiving, enabling IT departments to invest more time in other stages of the data lifecycle.
  • 21. AWS BIG DATA ECOSYSTEM
  • 22. Data Stores (place to keep the data) 😀
  • 38. Data Integrity 🤔 It simply means the accuracy, completeness, and quality of data as it’s maintained over time and across formats.
  • 39. Database Consistency The database must remain in a consistent state after any transaction. A consistent transaction will not violate integrity constraints placed on the data by the database rules.
  • 40. ETL (Extract-Transform-Load): a way to integrate data into a single location. 😀 ETL is a recurring activity (daily, weekly, monthly) of a data warehouse system and needs to be agile, automated, and well documented.
  • 42. ETL ... similar to ELT (Extract-Load-Transform). ELT inverts the last two stages of the ETL process: after being extracted from databases, data is loaded straight into a central repository where all transformations occur.
  • 46. Analyzing Data … (becoming “Sherlock” 🤔🤔♂️) Understanding the real value contained within the data so that, with those insights, we can make business decisions. In short: extracting information from data to support decision making.
  • 47. Visualizing Data ... 📈 The presentation of data in a pictorial or graphical format. Because of the way the human brain processes information, using charts or graphs to visualize large amounts of complex data is easier than poring over spreadsheets or reports.
  • 48. Common Data Visualization Ways Source: https://morphocode.com/location-time-urban-data-visualization/
  • 50. “In God we trust; all others must bring data.” - W. EDWARDS DEMING
  • 51. AWS provides a host of services to address an organization’s data lifecycle and analytics requirements.😌
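To make the distinction on slide 16 concrete, the minimal Python sketch below handles one toy dataset in each of the three shapes. The customer records, the field names, and the choice of CSV for structured data and raw bytes for unstructured data are assumptions made for illustration; only the XML example of semi-structured data comes from the slide.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Structured: every row follows the same fixed schema (columns), like a relational table.
csv_text = "id,name,city\n1,Asha,Pune\n2,Ben,Leeds\n"
rows = list(csv.DictReader(io.StringIO(csv_text)))

# Semi-structured: tags describe the data, but each record may carry different fields.
xml_text = """
<customers>
  <customer id="1"><name>Asha</name><city>Pune</city></customer>
  <customer id="2"><name>Ben</name><phone>555-0100</phone></customer>
</customers>
"""
root = ET.fromstring(xml_text.strip())
customers = [
    {"id": c.get("id"), **{child.tag: child.text for child in c}}
    for c in root.findall("customer")
]

# Unstructured: raw bytes with no schema; there are no fields to query directly.
blob = b"\x89PNG\r\n\x1a\n"  # e.g. the 8-byte signature at the start of a PNG image file

print(rows[0]["city"])        # Pune: every row has a "city" column
print(sorted(customers[1]))   # ['id', 'name', 'phone']: this record has no "city"
print(len(blob))              # 8
```

The CSV rows can go straight into a relational table, the XML records need a mapping step because their fields vary, and the bytes need separate processing (for example, image analysis) before anything can be queried.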
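The lifecycle stages on slides 19 and 20 can be sketched as a chain of plain Python functions. This is a toy, single-process sketch under assumed inputs: the sensor readings and the function names `ingest`, `stage`, `cleanse`, `analyze` and `archive` are hypothetical, and no AWS service is involved. The cleansing step shows the three actions named on slide 20: it detects, corrects, and removes bad records.

```python
import gzip
import json
from statistics import mean

def ingest():
    # Hypothetical external source: raw sensor readings arriving as JSON text lines.
    return [
        '{"sensor": "s1", "temp_c": "21.5"}',
        '',                                      # empty line from the transport
        '{"sensor": " S2 ", "temp_c": "19.0"}',  # messy sensor ID, fixable
        '{"sensor": "s3", "temp_c": "n/a"}',     # inaccurate value
        '{"sensor": "s4", "temp_c": ',           # corrupted (truncated) record
    ]

def stage(lines):
    # Housekeeping before the data is used: drop blank lines, number the rest.
    return [(i, line) for i, line in enumerate(lines) if line.strip()]

def cleanse(staged):
    clean = []
    for record_id, line in staged:
        try:
            record = json.loads(line)                  # detect corrupted records
            temp = float(record["temp_c"])             # detect inaccurate values
            sensor = record["sensor"].strip().lower()  # correct inconsistent IDs
        except (ValueError, KeyError):                 # JSONDecodeError is a ValueError
            continue                                   # remove what cannot be fixed
        clean.append({"id": record_id, "sensor": sensor, "temp_c": temp})
    return clean

def analyze(records):
    # Turn the cleansed data into a simple insight.
    return {"readings": len(records), "avg_temp_c": mean(r["temp_c"] for r in records)}

def archive(records):
    # Compress the cleansed data for cheaper long-term storage.
    return gzip.compress(json.dumps(records).encode("utf-8"))

clean = cleanse(stage(ingest()))
report = analyze(clean)
archived = archive(clean)
print(report)  # {'readings': 2, 'avg_temp_c': 20.25}
```

Only two of the four non-empty readings survive cleansing, so the average reflects trustworthy data rather than whatever arrived.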
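The consistency rule on slide 39 can be observed with Python's built-in `sqlite3` module. In this sketch the `accounts` table, the `transfer` helper, and the data are hypothetical. A `CHECK` constraint forbids negative balances, so a transfer that would break it fails inside its transaction, the partial update is rolled back, and the database stays in its previous consistent state.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE accounts (
        id      INTEGER PRIMARY KEY,
        owner   TEXT NOT NULL,
        balance INTEGER NOT NULL CHECK (balance >= 0)  -- integrity constraint
    )
""")
conn.executemany("INSERT INTO accounts VALUES (?, ?, ?)", [(1, "Asha", 100), (2, "Ben", 50)])
conn.commit()

def transfer(conn, src, dst, amount):
    """Move money between accounts as one transaction; return True if it committed."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            # Credit first, so a failed debit leaves a partial update to roll back.
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst))
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:
        return False

ok = transfer(conn, 1, 2, 30)    # allowed: Asha 70, Ben 80
bad = transfer(conn, 2, 1, 500)  # would make Ben's balance negative
balances = dict(conn.execute("SELECT owner, balance FROM accounts ORDER BY id"))
print(ok, bad, balances)  # True False {'Asha': 70, 'Ben': 80}
```

The second transfer had already credited Asha before the debit failed; the rollback undoes that credit, so no transaction leaves the data violating the rule.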
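Slides 40 and 42 contrast ETL and ELT. The sketch below runs the same hypothetical order export through both, with an in-memory SQLite database standing in for the central repository (an assumption for illustration, not a data warehouse product). ETL cleans the rows in Python before loading; ELT loads the raw text first and performs the same cleanup with SQL inside the repository.

```python
import csv
import io
import sqlite3

# Hypothetical source extract: order lines exported from an operational system.
SOURCE_CSV = "order_id,country,amount\n1, in ,120.50\n2,GB,80\n3,in,40.00\n"

def extract(text):
    return list(csv.DictReader(io.StringIO(text)))

# ETL: transform in the pipeline, then load only the cleaned result.
def transform(rows):
    return [(int(r["order_id"]), r["country"].strip().upper(), float(r["amount"])) for r in rows]

def etl(warehouse, text):
    warehouse.execute("CREATE TABLE orders (order_id INTEGER, country TEXT, amount REAL)")
    warehouse.executemany("INSERT INTO orders VALUES (?, ?, ?)", transform(extract(text)))

# ELT: load the raw data as-is, then transform inside the repository with SQL.
def elt(warehouse, text):
    warehouse.execute("CREATE TABLE raw_orders (order_id TEXT, country TEXT, amount TEXT)")
    warehouse.executemany(
        "INSERT INTO raw_orders VALUES (:order_id, :country, :amount)", extract(text))
    warehouse.execute("""
        CREATE TABLE orders AS
        SELECT CAST(order_id AS INTEGER) AS order_id,
               UPPER(TRIM(country))      AS country,
               CAST(amount AS REAL)      AS amount
        FROM raw_orders
    """)

totals = {}
for name, load in [("ETL", etl), ("ELT", elt)]:
    db = sqlite3.connect(":memory:")
    load(db, SOURCE_CSV)
    totals[name] = db.execute(
        "SELECT country, SUM(amount) FROM orders GROUP BY country ORDER BY country").fetchall()

print(totals["ETL"] == totals["ELT"], totals["ETL"])  # True [('GB', 80.0), ('IN', 160.5)]
```

Both routes end with the same `orders` table; the difference is where the transformation logic lives. ELT also keeps `raw_orders` in the repository, so the data can be re-transformed later without extracting it again.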

Editor’s Notes

  1. Arrays, linked lists, stacks, and queues are some of the basic data structures. When it comes to Big Data, there are others (discussed later). This slide just gives the definition.
  2. The types of data in the Big Data world are structured, unstructured, and semi-structured (discussed later).
  3. Ask students: how do we store data efficiently? Different data structures and types need to be prepared differently. We cannot simply force schema-free data into a relational database; we must “prepare” the data correctly.
  4. We must develop a general idea of how all data is generated, collected and stored so that we can find data that is relevant, process it, and analyze to extract the hidden insights.
  5. We process data to discover meaningful patterns, and we use that information to make decisions that make our businesses more profitable and secure.
  6. In the traditional architecture, data was mostly collected in a structured or tabular format and handled by an RDBMS (Relational Database Management System). Now, data is generated in unimaginable ways, and much of it is not structured.
  7. The amount of data that one has to process has boomed to unimaginable levels in the past decade. It’s important that organizations find ways to manage and analyze it so that they can act on the data and make important business decisions.
  8. Estimated by IBM in 2012. It is now 2021; think how much data must have been generated since then. Link to the post: https://www.facebook.com/IBM/posts/90-of-the-data-in-the-world-today-has-been-created-in-the-last-two-years/293229680748471/
  9. Analyzing large data sets requires significant compute capacity that can vary in size based on the amount of input data and the type of analysis. AWS provides the infrastructure and tools to tackle such large datasets with a pay-as-you-go cloud computing model.
  10. Depending on the type of data and how it is structured (the data structure), we have various kinds of databases.