Data Science
Harnessing Open Data for high impact solutions
About:Me
Mohd Izhar Firdaus Ismail
- Current: Solution Architect @ ABYRES Enterprise
Technologies Sdn Bhd
- Open Source Activist & (self-proclaimed) Hacker, Open Data
Advocate, Fedora Ambassador, Data Architect, Data Engineer,
Consultant, Python Programmer, Analyst, Trainer, and a bunch of
other hats ;-)
- Contributing to Open Source projects for over 8 years
- Over 6 years building systems related to data, content,
information and knowledge management
- http://linkedin.com/in/kagesenshi
Disclaimer:
Some people call me a data scientist,
But I don't consider myself one (yet)
(( it's a personal integrity thing: Machine Learning & Stats are not (yet) my strong points ))
But I do work a lot with data: designing applications, infrastructure,
algorithms, processes and pipelines for big data workloads, from data
acquisition to visualization
"Real" Data Scientists
are one heck of a super(wo)man
Infographic source: MarketingDistillery.com
Open Data Apps Around The World
What you can do with quality Open Data
(and a glimpse of what nice stuff other people have ^.^)
Data.gov (United States)
- One of the earliest government Open Data initiatives
- Over 159,576 datasets from agencies all over the US government (as of 14th Aug 2015)
- NGOs such as Code for America are building apps using its data
- Companies are leveraging the data for their own startups and businesses
Data.gov : Alternative Fuels Station Locator
Benefit / Impact:
Help individuals locate nearby alternative fuel stations (electric, hydrogen, biodiesel, etc.)
Data from:
US Department of Energy
Data.gov : Climate.com
Benefit / Impact:
Help farmers plan their farming activities based on weather conditions
Data from:
- National Weather Service
- US Geological Survey
- National Aeronautics and Space Administration
Data.gov : College Affordability and Transparency Center
Benefit / Impact:
Enable students to make informed decisions about where to further their studies, based on their budget
Data from:
Department of Education – National Center for Education Statistics
Data.gov.uk (United Kingdom)
- Ranked 1st in the international Open Data Barometer (World Wide Web Foundation, with the Open Data Institute (ODI))
- Over 22,946 datasets (as of 14th Aug 2015)
- 378 apps (as of 14th Aug 2015)
Data.gov.uk : CrimeInEngland.co.uk
Benefit / Impact:
Enable citizens to be more aware of the crime rate in their area and take the necessary measures
Data from:
UK Home Office
Data.gov.uk : WhereDoesMyMoneyGo.org
Benefit / Impact:
Better government transparency. Citizens are better informed about how tax money is spent.
Data from:
UK Her Majesty's Treasury (HM Treasury)
Getting Started
Some tips for beginners
The bulk of your data-related work will be cleaning data
- Excel to JSON/CSV
- PDF to JSON/CSV
- Unstructured to structured
- Joining multiple data sources into one, where the join key is not obvious
- Normalizing duplicates, errors, typos, language, etc
- Dealing with inconsistent schema of historical data
- Extracting more features of data points
- Enriching data with more useful information (e.g. longitude, latitude)
- Dealing with data that was poorly collected
- Dealing with aggregated data that is not quite useful
- Real-life data is a mess: SNAFU ;-)
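Two of the cleaning steps above, normalizing typos and joining sources where the join key is not obvious, can be sketched with nothing but the Python standard library. This is a minimal sketch, not a definitive method: the sample records, the coordinates, and the helper names `normalize` and `fuzzy_join` are hypothetical, and the 0.8 similarity cutoff is an assumption you would tune against real data.

```python
import difflib


def normalize(name):
    """Lowercase, trim, and collapse whitespace so near-identical keys compare equal."""
    return " ".join(name.lower().split())


def fuzzy_join(left, right, key, cutoff=0.8):
    """Join two lists of dicts on `key`, tolerating small typos in the key.

    Rows in `left` with no sufficiently similar key in `right` are kept
    with their own fields only, so nothing is silently dropped.
    """
    right_by_key = {normalize(row[key]): row for row in right}
    joined = []
    for row in left:
        # Ask difflib for the single closest key above the similarity cutoff.
        match = difflib.get_close_matches(
            normalize(row[key]), list(right_by_key), n=1, cutoff=cutoff
        )
        merged = dict(row)
        if match:
            extra = right_by_key[match[0]]
            merged.update({k: v for k, v in extra.items() if k != key})
        joined.append(merged)
    return joined


# Hypothetical sample data: the same agency spelled differently in two sources.
budgets = [
    {"agency": "Dept. of Energy ", "budget": 100},
    {"agency": "Home Office", "budget": 80},
]
locations = [
    {"agency": "dept of energy", "lat": 38.9, "long": -77.0},
]

result = fuzzy_join(budgets, locations, key="agency")
```

Fuzzy matching can pair up the wrong rows, so on real data it is worth reviewing the matched pairs by hand before trusting the joined result.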
Analytic Tools & Platform
Plenty of Open Source tools available
- Simple data work and analysis can be done without a complex Big Data
ecosystem. A ${YourFavouriteLanguage} program is usually more than
enough to transform, clean, and explore data to get initial insights and understanding
- I speak mostly in snake language, so naturally I prefer Python stuff ;-)
– Python is a strong language in scientific computing due to its history in mathematics, its
rich open source library ecosystem, and its simplicity for rapid experimentation
– Pandas, numpy, scipy, pymapreduce, xlrd, pyexcel, scikit, luigi, vaderSentiment, etc
- D3.js is highly recommended for developing data-driven visualizations for
the web
– Plenty of other JavaScript libraries help render beautiful diagrams
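As a rough illustration of how far a single script can go before any Big Data ecosystem is needed, the sketch below loads a small CSV, tidies two columns, and prints first-pass summaries. It uses only the Python standard library; the column names and the inline sample rows are made up for illustration and do not come from any real dataset.

```python
import csv
import io
import statistics
from collections import Counter

# Hypothetical sample data standing in for a downloaded open dataset.
RAW_CSV = """station,fuel_type,ports
A,electric,4
B,Electric ,2
C,hydrogen,1
D,biodiesel,
E,electric,6
"""


def load_rows(text):
    """Parse CSV text into dicts, tidying fuel_type and converting ports to int."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        row["fuel_type"] = row["fuel_type"].strip().lower()
        # Blank cells become None instead of crashing int().
        row["ports"] = int(row["ports"]) if row["ports"].strip() else None
        rows.append(row)
    return rows


rows = load_rows(RAW_CSV)

# How many stations per fuel type?
counts = Counter(row["fuel_type"] for row in rows)

# Typical number of ports, ignoring rows where it was not recorded.
ports = [row["ports"] for row in rows if row["ports"] is not None]
median_ports = statistics.median(ports)

print(counts.most_common())
print("median ports:", median_ports)
```

The same steps map directly onto pandas once the data outgrows plain lists and dicts.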
My Personal Favourites:
- "Small" data: IPython Notebook & Python libraries
- "Big data": Apache Zeppelin, PySpark & Python libs
- Hortonworks HDP Sandbox (Pig, Hive, Spark, and friends)
- Amazon EMR (large cluster to crunch your data)
Good luck!!
And most importantly,
Have Fun!!
Izhar Firdaus <izhar@abyres.net>
http://linkedin.com/in/kagesenshi
