Introduction to Basic Data Analytics Tools
Sa-ad Mahmud
What is Data Analytics?
Data analytics is the science of analyzing raw data in order to draw conclusions about that information.
Data Analytics Pipeline
Collect → Refine → Store → Analyze → Presentation
Data Acquisition
How To Collect Data!
★ REST API
★ From end users
★ Web scraping
★ Email and cloud storage
★ Client’s server
Tool for Data Acquisition: Requests
A library for making HTTP requests in Python.
Key Features:
• Keep-alive & Connection Pooling
• Sessions with Cookie Persistence
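A minimal sketch of the two features above, assuming a JSON API at a placeholder URL: a `requests.Session` keeps connections alive in a pool and carries cookies across calls. The pool size, the User-Agent string and the `make_session`/`fetch_json` helpers are hypothetical names for illustration, not part of Requests itself.

```python
import requests
from requests.adapters import HTTPAdapter


def make_session(pool_size: int = 10) -> requests.Session:
    """Build a Session that reuses pooled keep-alive connections and persists cookies."""
    session = requests.Session()
    # One adapter for both schemes; pool_maxsize caps how many connections are kept per host.
    adapter = HTTPAdapter(pool_connections=pool_size, pool_maxsize=pool_size)
    session.mount("https://", adapter)
    session.mount("http://", adapter)
    # Hypothetical client identifier; many APIs ask for a descriptive User-Agent.
    session.headers.update({"User-Agent": "basic-analytics-demo/0.1"})
    return session


def fetch_json(session: requests.Session, url: str, params=None, timeout: float = 10.0):
    """GET a JSON endpoint through the shared session and fail loudly on HTTP errors."""
    response = session.get(url, params=params, timeout=timeout)
    response.raise_for_status()  # raises requests.HTTPError for 4xx/5xx responses
    return response.json()
```

For example, `fetch_json(make_session(), "https://api.example.com/v1/sales", params={"page": 1})` (placeholder URL) would reuse the same pooled connection, and any cookies the server sets, for every page requested.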
Tool for Data Acquisition: BeautifulSoup
A library for parsing HTML and XML documents.
Key Features:
• Multiple parser support (e.g., lxml, html5lib, and others)
• Builds a parse tree that is easy to navigate
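A minimal parsing sketch on a hard-coded HTML snippet (in practice the HTML would come from a page fetched with Requests). The table id and column layout are hypothetical; Python's built-in `html.parser` is used so no extra parser needs installing, and `"lxml"` or `"html5lib"` can be passed instead if available.

```python
from bs4 import BeautifulSoup

# A small, made-up HTML fragment standing in for a scraped page.
HTML = """
<table id="prices">
  <tr><th>Product</th><th>Price</th></tr>
  <tr><td>Pen</td><td>1.50</td></tr>
  <tr><td>Notebook</td><td>3.25</td></tr>
</table>
"""


def parse_price_table(html: str) -> list[dict]:
    """Walk the parse tree and turn each data row of the table into a dict."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", id="prices")
    rows = []
    for tr in table.find_all("tr")[1:]:  # skip the header row
        name, price = (td.get_text(strip=True) for td in tr.find_all("td"))
        rows.append({"product": name, "price": float(price)})
    return rows
```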
Tools for Data Acquisition: Flask, Flask-RESTPlus and Swagger UI
Flask is a micro web framework written in Python.
Flask-RESTPlus is an extension for Flask that adds support for quickly building REST APIs. It automatically documents the APIs, and the documentation is viewable in Swagger UI.
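A small sketch of a data-collection endpoint written with plain Flask routes; the `/records` path, the in-memory `RECORDS` list and the `name`/`value` fields are assumptions for illustration. Flask-RESTPlus would wrap handlers like these in `Resource` classes and publish Swagger UI documentation for them. Note that Flask-RESTPlus is no longer maintained; Flask-RESTX is its community-maintained fork.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

# Hypothetical in-memory store standing in for a real database.
RECORDS: list[dict] = []


@app.route("/records", methods=["POST"])
def add_record():
    payload = request.get_json(silent=True)  # None if the body is not valid JSON
    # Reject bodies that are not JSON objects or lack the expected fields.
    if not isinstance(payload, dict) or "name" not in payload or "value" not in payload:
        return jsonify({"error": "expected JSON with 'name' and 'value'"}), 400
    RECORDS.append({"name": payload["name"], "value": payload["value"]})
    return jsonify({"count": len(RECORDS)}), 201


@app.route("/records", methods=["GET"])
def list_records():
    return jsonify(RECORDS)
```

Flask's built-in test client (`app.test_client()`) can exercise these routes without starting a server, which is a convenient way to check an ingestion endpoint before end users start sending data to it.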
Data Pre-Processing and Storage
How To Clean Data!
★ Remove duplicates
★ Validate
★ Handle missing data
★ Fix errors
★ Filter outliers
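The cleaning steps above can be sketched with Pandas on a small, made-up table of temperature readings. The column names, the set of valid cities and the 1.5 * IQR outlier rule are assumptions chosen for illustration; real projects pick validation rules and outlier thresholds to suit their data. Inconsistent spelling is fixed first so that the duplicate and validation checks compare like with like.

```python
import pandas as pd

# Hypothetical raw readings with typical problems: a duplicate row, messy spelling,
# a missing value, an unknown city and one implausible reading.
raw = pd.DataFrame({
    "city":   ["Dhaka", "Dhaka", " dhaka", "Sylhet", "Sylhet", "Khulna", "Khulna", "Khulna", "Unknown"],
    "day":    [1, 1, 2, 1, 2, 1, 2, 3, 1],
    "temp_c": [31.0, 31.0, 30.0, None, 29.0, 28.0, 27.0, 95.0, 26.0],
})

VALID_CITIES = {"Dhaka", "Sylhet", "Khulna"}  # assumption for the sketch


def clean(df: pd.DataFrame) -> pd.DataFrame:
    df = df.copy()
    # Fix errors: normalize whitespace and capitalization.
    df["city"] = df["city"].str.strip().str.title()
    # Remove duplicates: identical rows are kept once.
    df = df.drop_duplicates()
    # Validate: keep only rows whose city is in the known set.
    df = df[df["city"].isin(VALID_CITIES)].copy()
    # Handle missing data: fill gaps with that city's median reading.
    df["temp_c"] = df["temp_c"].fillna(df.groupby("city")["temp_c"].transform("median"))
    # Filter outliers: keep readings within 1.5 * IQR of the quartiles.
    q1, q3 = df["temp_c"].quantile([0.25, 0.75])
    iqr = q3 - q1
    df = df[df["temp_c"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)]
    return df.reset_index(drop=True)
```

Here `clean(raw)` drops the repeated Dhaka row, the "Unknown" city and the 95 °C reading, and fills Sylhet's missing value with 29.0.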
How To Store Data!
★ RDBMS
★ ORM
Tool for Data Manipulation: Pandas
A library for data manipulation and analysis.
Key Features:
• Loading data into in-memory data objects from different file formats.
• Data alignment and integrated handling of missing data.
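A short sketch of both features with hypothetical numbers: `read_csv` loads tabular text into a DataFrame (an `io.StringIO` object stands in for a real file), and arithmetic between two Series aligns them by index label, marking months missing on either side as NaN unless a `fill_value` is given.

```python
import io

import pandas as pd

# io.StringIO stands in for a file on disk; read_json, read_excel, etc. follow the same pattern.
CSV_TEXT = """month,sales
Jan,120
Feb,135
Mar,150
"""

sales = pd.read_csv(io.StringIO(CSV_TEXT), index_col="month")["sales"]

# Returns for a partly overlapping set of months (hypothetical numbers).
returns = pd.Series({"Feb": 5, "Mar": 10, "Apr": 2}, name="returns")

# Arithmetic aligns on index labels; Jan and Apr have no partner, so they become NaN.
net_aligned = sales - returns

# fill_value treats a label missing from one side as 0 instead of propagating NaN.
net_filled = sales.sub(returns, fill_value=0)
```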
Tool for Database Operations: SQLAlchemy
SQLAlchemy is a popular SQL toolkit and Object Relational Mapper.
Key Features:
• Function-based query construction.
• Multiple database support (e.g., SQLite, PostgreSQL, MySQL, Oracle, MS-SQL, Firebird, Sybase and others).
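A minimal ORM sketch against an in-memory SQLite database, assuming SQLAlchemy 1.4 or later; the `Sale` class, its table and its columns are hypothetical. The aggregate query is assembled from Python functions (`select`, `func.sum`, `group_by`) rather than written as a SQL string, and pointing `create_engine` at a different database URL (with that database's driver installed) is how the same code would target PostgreSQL, MySQL, and so on.

```python
from sqlalchemy import Column, Float, Integer, String, create_engine, func, select
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()


class Sale(Base):
    """ORM class mapped to a hypothetical 'sales' table."""
    __tablename__ = "sales"
    id = Column(Integer, primary_key=True)
    region = Column(String, nullable=False)
    amount = Column(Float, nullable=False)


# In-memory SQLite database; create_all issues the CREATE TABLE statement.
engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

with Session(engine) as session:
    session.add_all([
        Sale(region="North", amount=100.0),
        Sale(region="South", amount=80.0),
        Sale(region="North", amount=50.0),
    ])
    session.commit()

    # Equivalent to: SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region
    stmt = (
        select(Sale.region, func.sum(Sale.amount))
        .group_by(Sale.region)
        .order_by(Sale.region)
    )
    totals = [tuple(row) for row in session.execute(stmt)]
```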
Data Analysis
How To Analyze Data!
★ Five-number summary (minimum, 1st quartile, median, 3rd quartile, maximum)
★ Average
★ Standard Deviation
★ Ratio
★ Interval
★ Trends
★ Aggregate and group by
★ Regression
★ Clustering
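Several of the measures above can be computed in a few lines with Pandas and NumPy. This is a sketch on made-up data for two hypothetical stores; quartile conventions differ between tools, so Q1 and Q3 here follow the pandas default (linear interpolation) and may differ slightly from what R or Excel report.

```python
import numpy as np
import pandas as pd

# Hypothetical monthly figures for two stores.
df = pd.DataFrame({
    "store": ["A", "A", "A", "B", "B", "B"],
    "month": [1, 2, 3, 1, 2, 3],
    "sales": [10.0, 12.0, 14.0, 20.0, 22.0, 24.0],
    "visits": [100, 100, 140, 200, 200, 240],
})

# Five-number summary via quantiles (pandas interpolates linearly between data points).
q = df["sales"].quantile([0.0, 0.25, 0.5, 0.75, 1.0]).tolist()
five_number = dict(zip(["min", "q1", "median", "q3", "max"], q))

# Average and standard deviation (pandas .std() uses the sample, n - 1, formula).
mean, std = df["sales"].mean(), df["sales"].std()

# Ratio: sales per store visit.
df["sales_per_visit"] = df["sales"] / df["visits"]

# Aggregate and group by: mean and total sales per store.
by_store = df.groupby("store")["sales"].agg(["mean", "sum"])

# Trend / regression: least-squares line through store A's monthly sales.
store_a = df[df["store"] == "A"]
slope, intercept = np.polyfit(store_a["month"], store_a["sales"], deg=1)
```

For this data the five-number summary is 10, 12.5, 17, 21.5, 24, and store A's trend line has a slope of 2 (sales grow by about 2 units per month).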
Tools for Data Analysis: R and RStudio
R is a popular programming language for data analysis. RStudio is an IDE for R.
Figure: original classes vs. clusters found by k-means
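The figure compares known classes with the groups k-means finds. The algorithm itself is short; below is a from-scratch NumPy sketch of Lloyd's algorithm (random initialization, then alternating assignment and centroid-update steps). It is written in Python to match the other examples rather than in R, where the built-in `stats::kmeans()` function does this job; the function name and parameters are illustrative.

```python
import numpy as np


def kmeans(points: np.ndarray, k: int, n_iter: int = 100, seed: int = 0):
    """Lloyd's algorithm: alternate between assigning points and moving centroids."""
    rng = np.random.default_rng(seed)
    # Initialize centroids with k distinct points chosen at random.
    centroids = points[rng.choice(len(points), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        # Assignment step: index of the nearest centroid for every point.
        distances = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = distances.argmin(axis=1)
        # Update step: move each centroid to the mean of its points (keep it if the cluster is empty).
        new_centroids = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break  # centroids stopped moving: converged
        centroids = new_centroids
    return labels, centroids
```

k-means only finds a local optimum that depends on the starting centroids, so in practice it is usually run several times with different seeds and the result with the lowest within-cluster distance is kept.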
Data Presentation
How To Visualize Data!
★ Charts
○ Line
○ Bar
○ Pie
○ Scatter
★ Graphs
★ Maps
○ Bubble
○ Polygon
★ Dashboards
Tools for Data Visualization:
Plotly: An interactive graphing library.
Matplotlib: A plotting library for Python.
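A minimal Matplotlib sketch drawing two of the chart types listed earlier (line and bar) from hypothetical monthly figures. It selects the non-interactive Agg backend and writes the PNG to memory, so it runs without a display; `fig.savefig("sales.png")` would write a file instead.

```python
import io

import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

months = ["Jan", "Feb", "Mar", "Apr"]
sales = [120, 135, 150, 148]  # hypothetical values

# One figure with two side-by-side panels: a line chart and a bar chart.
fig, (ax_line, ax_bar) = plt.subplots(1, 2, figsize=(8, 3))
ax_line.plot(months, sales, marker="o")
ax_line.set_title("Monthly sales (line)")
ax_line.set_ylabel("Units")
ax_bar.bar(months, sales)
ax_bar.set_title("Monthly sales (bar)")
fig.tight_layout()

# Render to an in-memory PNG, then release the figure from pyplot's registry.
buffer = io.BytesIO()
fig.savefig(buffer, format="png")
plt.close(fig)
png_bytes = buffer.getvalue()
```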
Tool for Data Visualization: Apache Superset
A data visualization and data exploration platform.
Key Features:
• It can connect to any data source that SQLAlchemy supports, and it supports querying with SQL.
• Dashboards can be shared with other users.
• It comes with security features such as authentication, user management, and roles.
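Superset registers each database connection as a SQLAlchemy URI, which is why any database with a SQLAlchemy dialect (and its Python driver installed on the Superset server) can be connected. The sketch below uses SQLAlchemy's `make_url` to show the parts of such a URI; the host, credentials and database name are placeholders, and parsing the URI does not require the driver to be installed.

```python
from sqlalchemy.engine import make_url

# Placeholder connection string in the general form
# dialect+driver://username:password@host:port/database
uri = make_url("postgresql+psycopg2://analyst:secret@db.example.com:5432/sales")

# Mask the password before logging or displaying the URI.
masked = uri.render_as_string(hide_password=True)
```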
Other Notable Tools
★ Excel
★ Tableau Public
★ Grafana
★ Microsoft Power BI
★ And many more . . .
Challenges
1. Poor quality data
2. Data privacy and security
3. Weak infrastructure
4. Data from multiple sources
5. Scaling data analysis
Links
Flask: https://flask.palletsprojects.com/en/2.0.x/
Flask-RESTPlus: https://flask-restplus.readthedocs.io/en/stable/
SQLAlchemy: https://www.sqlalchemy.org/
R Programming Language: https://www.r-project.org/
k-means Clustering: https://en.wikipedia.org/wiki/K-means_clustering
Plotly: https://plotly.com/
Superset Docs: https://superset.apache.org/docs/intro
Presentation GitHub Link: https://github.com/saadrumon/basic-data-analytics-tools-presentation.git
Thank You
Any Questions?
“Information is the oil of the 21st century, and analytics is the combustion engine.”
- Peter Sondergaard
