Data Structure and Types


  1. AWS Certified Data Analytics (Amazon Web Services)
  2. Data Structure & Types
  3. Knowledge Check ... For now, ask yourself: 1. Why do we need data? 2. Why do we need to store it efficiently, and how do we store it efficiently? 3. How and where do we persist data? (Hint: maybe “Excel” 🤦‍♂️) 4. What insights do we get after analyzing it?
  4. Data Structure(s): a data organization, management, and storage format that enables efficient access and modification 🤔 (the boring “Wikipedia” definition). Put simply: a way of organizing data so that it can be used efficiently. 😀
  5. Data Type(s): an attribute of data that indicates what type of data we are storing or manipulating. It tells us what kind of data we are dealing with. 😀
  6. Preparing the data … “correctly ✅” ... This is where understanding the different types of data and data structures comes in handy. There isn’t one best way of storing data; every organization stores data differently.
  7. Initially, develop a general idea of how all data is being ● Generated ● Collected ● Stored. Only then can we ● find data that is “relevant”, ● process it, and ● analyze it to gain insights.
  8. Why does data matter? 🤔 Data is often described as the most valuable commodity in the world. Data has, or can have, value. We want to store data in a way that makes it easier to manipulate and to gain insights from.
  9. Once upon a time … 👴 Data was structured, stored across multiple tables, and managed by an RDBMS. The computational power available to process the data was low at that time. Social networks, smartphones, IoT devices, and video streaming platforms were still in their early days as “data sources”.
  10. Some years later … ⏩ As we become a more digital society, the amount of data being created and collected keeps growing, and the growth is accelerating. Analyzing this ever-growing data is a challenge for traditional analytical tools. “DATA IS EVERYWHERE”, AND MOST OF IT IS UNSTRUCTURED.
  11. 90% of the data in the world today has been created in the last two years (an IBM estimate from 2012). 😲
  12. Why AWS? Amazon Web Services (AWS) provides a broad platform of managed services to help you build, secure, and scale end-to-end big data applications quickly and easily. We need innovation to bridge the gap between the data being generated and the data that can be analyzed effectively. 💡
  13. But wait, what is Big Data? Data so large and complex that it exceeds the processing capacity of conventional database systems. The 3 V’s of Big Data: ● Volume: the size of the data we are dealing with. ● Variety: the fact that data comes from various sources and in different formats. ● Velocity: the speed at which data is being generated. There can be more V’s. So, half-jokingly, any data that crashes Excel is “Big Data”. 😬
  14. Data from where? 🤔 Ask yourself: ● Where does the data come from? ● How is such a huge amount of data being generated? ● Is the data even relevant, and does it come from valid sources? ● Who is storing the data?
  15. Data sources … ● IoT devices, sensors, CCTV ● Social networks and search engines ● Stock exchange data ● Online shopping and retail data ● Log files ● ERP and CRM systems ● Healthcare industry, insurance ● Airline data ● Financial data ● Geographical data ● So much more!
  16. Structured, Unstructured, and Semi-Structured Data. Structured data has a defined schema and is well organized, e.g. relational data. Unstructured data has no defined schema or structural properties; it makes up the majority of data collected, e.g. audio/video, images, binary data. Semi-structured data is somewhere in the middle: it is too unstructured for a relational database but still has some organizational structure, e.g. XML data.
  18. Data Lifecycle Stages: 1. Data Ingestion 2. Data Staging 3. Data Cleansing 4. Data Analytics and Visualization 5. Data Archiving
  19. Data Ingestion: the movement of data from an external source to another location for analysis. Data Staging: performing housekeeping tasks before making data available to users. Data Cleansing: detecting, correcting, and removing inaccurate data or corrupted records or files before the data is analyzed. Data Analytics and Visualization: the real value of data is extracted in this stage; decision-makers use analytics and visualization tools to predict customer needs, improve operations, transform broken processes, and innovate to compete. Data Archiving: the AWS Cloud facilitates data archiving, enabling IT departments to invest more time in other stages of the data lifecycle.
  21. Data Stores (places to keep the data) 😀
  22. Data Integrity 🤔 It simply means the accuracy, completeness, and quality of data as it is maintained over time and across formats.
  23. Database Consistency: the database must remain in a consistent state after any transaction. A consistent transaction will not violate integrity constraints placed on the data by the database rules.
  24. ETL (Extract-Transform-Load): a way to integrate data into a single location. 😀 ETL is a recurring activity (daily, weekly, monthly) of a data warehouse system and needs to be agile, automated, and well documented.
  26. ETL ... similar to ELT (Extract-Load-Transform). ELT inverts the last two stages of the ETL process: after being extracted from source databases, data is loaded straight into a central repository, where all transformations occur.
  27. Analyzing Data … (becoming “Sherlock” 🤔) Understanding the real value contained within the data, so that we can make business decisions with those insights. In short: extracting information from data to support decision making.
  28. Visualizing Data ... 📈 The presentation of data in a pictorial or graphical format. Because of the way the human brain processes information, using charts or graphs to visualize large amounts of complex data is easier than poring over spreadsheets or reports.
  29. Common Ways to Visualize Data
  31. “In God we trust; all others must bring data.” - W. Edwards Deming
  32. AWS provides a host of services to address an organization’s data lifecycle and analytics requirements. 😌
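
The ETL and ELT slides describe the same three steps in a different order. A minimal sketch of that difference, using Python's built-in sqlite3 module as a stand-in for the central repository. The sample records, table names, and the crude "starts with a digit" validity filter are all assumptions made for illustration; they are not taken from the slides or from any AWS service.

```python
import sqlite3

# Hypothetical raw records, as they might arrive from a source system.
RAW_ORDERS = [
    ("1001", " Alice ", "19.50"),
    ("1002", "BOB", "5.00"),
    ("1003", "carol", "not-a-number"),  # corrupted record
]


def clean(row):
    """Transform step in application code: normalize the name, parse numbers.

    Returns None for records that cannot be repaired.
    """
    order_id, customer, amount = row
    try:
        return int(order_id), customer.strip().lower(), float(amount)
    except ValueError:
        return None


def run_etl(conn):
    """ETL: transform first, then load only the clean rows."""
    conn.execute("CREATE TABLE orders_etl (id INTEGER, customer TEXT, amount REAL)")
    cleaned = [r for r in map(clean, RAW_ORDERS) if r is not None]
    conn.executemany("INSERT INTO orders_etl VALUES (?, ?, ?)", cleaned)


def run_elt(conn):
    """ELT: load the raw rows as-is, then transform inside the repository."""
    conn.execute("CREATE TABLE orders_raw (id TEXT, customer TEXT, amount TEXT)")
    conn.executemany("INSERT INTO orders_raw VALUES (?, ?, ?)", RAW_ORDERS)
    # Assumption: an amount is treated as valid if it starts with a digit.
    conn.execute(
        """
        CREATE TABLE orders_elt AS
        SELECT CAST(id AS INTEGER)   AS id,
               LOWER(TRIM(customer)) AS customer,
               CAST(amount AS REAL)  AS amount
        FROM orders_raw
        WHERE amount GLOB '[0-9]*'
        """
    )


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    run_etl(conn)
    run_elt(conn)
    print(conn.execute("SELECT * FROM orders_etl ORDER BY id").fetchall())
    print(conn.execute("SELECT * FROM orders_elt ORDER BY id").fetchall())
```

Both paths end with the same clean rows. The difference is where the work happens: in ETL the corrupted record never reaches the repository, while in ELT it stays in `orders_raw` and is filtered out by SQL afterwards.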

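The Database Consistency slide says a transaction must not leave the data in a state that violates its integrity constraints. A small sketch of that idea, again with Python's built-in sqlite3 module. The `accounts` table, the non-negative balance rule, and the `transfer` helper are hypothetical examples chosen for illustration, not something the slides define.

```python
import sqlite3


def make_db():
    """Create an in-memory database with one integrity constraint."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE accounts ("
        " name TEXT PRIMARY KEY,"
        " balance INTEGER NOT NULL CHECK (balance >= 0))"  # the database rule
    )
    conn.executemany(
        "INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)]
    )
    conn.commit()
    return conn


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst as one transaction; True on success."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected the debit, so the credit is undone too.
        return False


def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM accounts"))
```

A transfer that would push a balance below zero fails as a whole: the earlier credit is rolled back along with the rejected debit, so the table is left exactly as it was before the transaction started.
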
Editor's notes

  • Arrays, linked lists, stacks, and queues are some of the basic data structures. When it comes to Big Data, we have others, discussed later. This is just the definition.
  • The types of data in the Big Data world: structured, unstructured, and semi-structured data. Discussed later.
  • Ask students: how do we store it efficiently? Different data structures and types need to be prepared differently. We cannot simply fit schema-free data into a relational database. We must “prepare” the data correctly.

  • We must develop a general idea of how all data is generated, collected, and stored so that we can find data that is relevant, process it, and analyze it to extract the hidden insights.
  • We process the data to discover meaningful patterns in it, and with that information we make decisions that make our businesses more profitable and secure.
  • In the traditional architecture, data was mostly collected in a structured or tabular format and handled via an RDBMS (Relational Database Management System). Now data is generated in previously unimaginable ways, and much of it is not structured.
  • The amount of data that one has to process has boomed to unimaginable levels in the past decade. It’s important that organizations find ways to manage and analyze it so that they can act on the data and make important business decisions.
  • Estimated by IBM in 2012. It’s 2021; think how much data must have been generated in the years since. Link to the post:
  • Analyzing large data sets requires significant compute capacity that can vary in size based on the amount of input data and the type of analysis. AWS provides the infrastructure and tools to tackle such large datasets with a pay-as-you-go cloud computing model.
  • Depending upon the type of data or how it is structured (the data structure), we have various kinds of databases.
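
To make the distinction between structured and semi-structured data concrete, here is a small sketch using only the Python standard library. The sample CSV and XML snippets and both reader functions are made up for illustration. The point is that every structured row shares one schema, while semi-structured records are organized by tags but may carry different fields.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Structured (hypothetical sample): every row follows the same schema.
CSV_TEXT = "id,name,city\n1,Alice,Pune\n2,Bob,Delhi\n"

# Semi-structured (hypothetical sample): tags give some organization,
# but individual records may differ in shape.
XML_TEXT = """
<customers>
  <customer id="1"><name>Alice</name><city>Pune</city></customer>
  <customer id="2"><name>Bob</name><phone>555-0100</phone></customer>
</customers>
"""


def read_structured(text):
    """Return rows as dicts; the header row acts as the schema."""
    return list(csv.DictReader(io.StringIO(text)))


def read_semi_structured(text):
    """Return one dict per <customer>, with whatever fields it carries."""
    root = ET.fromstring(text.strip())
    records = []
    for node in root.findall("customer"):
        record = {"id": node.get("id")}
        record.update({child.tag: child.text for child in node})
        records.append(record)
    return records


if __name__ == "__main__":
    for row in read_structured(CSV_TEXT):
        print(sorted(row))   # same keys for every row
    for rec in read_semi_structured(XML_TEXT):
        print(sorted(rec))   # keys vary from record to record
```

The CSV rows would map directly onto a relational table. The XML records would first need a decision about the missing `city` and the extra `phone`, which is the kind of preparation the notes above refer to.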