Getting comfortable with data 
@ritvvijparrikh, Data Designer, pykih.com 
d|Bootcamp, http://delhi.dbootcamp.org, September 5, 2014
About me 
I help organisations make sense (visual or otherwise) of data. 
2005 University: Neural Net based Market Prediction Stock Market Data 
2006 Software Developer at Amdocs 
2011 
Design Lead 
Amdocs Sales Team to AT&T 
Product Manager at samhita.org 
Founded TracksGiving - analytics for charities 
2013 Founded pykih - data visualisation 
Telecom data for AT&T 
Donation data 
FirstPost 
Journalism++ Cologne / Datawrapper 
Microsoft 
visual.ly 
NarendraModi.in
How do developers look at data?
v/s 
How do journalists look at data?
Article in English
WHO does not marvel at the prospect of India going to the polls? Starting on April 7th, illiterate villagers and destitute slum-dwellers will have an equal say alongside Mumbai’s millionaires in picking their government. Almost 815m citizens are eligible to cast their ballots in nine phases of voting over five weeks—the largest collective democratic act in history.
But who does not also deplore the fecklessness and venality of India’s politicians? The country is teeming with problems, but a decade under a coalition led by the Congress party has left it rudderless. Growth…
English article on Genomics
Assumption: Unknown domain
Genomics is a discipline in genetics that applies recombinant DNA, DNA sequencing methods, and bioinformatics to sequence, assemble, and analyze the function and structure of genomes (the complete set of DNA within a single cell of an organism).[1][2] Advances in genomics have triggered a revolution in discovery-based research to understand even the most complex biological systems such as the brain.[3] The field includes efforts to determine the entire DNA sequence of organisms and fine-scale genetic mapping. The field also includes studies of intragenomic phenomena …
Conclusion 
We are comfortable with 
unknown material in English
Objective: Be comfortable with unknown data sets
Objectives 
Can we build Data Comprehension skills?
Objectives
• What is data made up of?
• Data File Formats
• Where is the story-worthy data?
• Data Types
• Properties of Data
• Insights / Recipes for stories
• Data Aggregation
• Basic Spreadsheet Functions
About me 
Data on Glass Manufacturing Factory Floor in German Language 
Unknown domain. Unknown language. 
We still modelled the data correctly.
Let’s dive into basics 
What is data made up of
Where does data come from?
Human actions / experiences -> Documented data -> Insights
Wind -> Documenting -> Sea travel
What am I doing -> Twitter -> Sentiment about Budget
Vote -> Election Commission -> Political change
State Dept. wires -> Wikileaks -> Backdoor foreign policy
What has changed?
Technology
Struggling with
Grasp
Story
What is data made up of?
• Domain: human context
• Meta data: how is it stored
Data Comprehension
• Domain: human context
• Meta data: how is it stored
• Grammar of the Data
Let’s dive into basics
Data Formats
How is the data stored?
A format is a pre-defined structure in which 1s and 0s are stored so that software can read them.
Data and formatting: designed for humans
Data only: designed for machines
Machine-readable data is for us
Our objective is to discover the story in the data. Formatting will unnecessarily get in the way.
Tabular v/s Document data
Designed for humans / Designed for machines
Tabular / Document
Scraping / API Integration
Designed for humans -> Scrape / API (developer) -> Designed for machines
New terms: PDF scraping. Web scraping. API integration.
Machine-readable tabular data formats
* separated values files
| (pipe) acts as a delimiter, allowing us to identify columns
New lines help identify rows
Extend this concept, and you get
• Comma Separated Value files
• Pipe Separated Value files
• Semicolon Separated Value files
• Tab Separated Value files …
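The delimiter idea above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original talk: the sample text, its column names and its values are made up, and read_separated_values is a hypothetical helper built on the standard csv module.

```python
import csv
import io

# Hypothetical pipe-separated sample; column names and values are made up.
SAMPLE = "state|year|schools\nGoa|2013|1500\nKerala|2013|12000\n"


def read_separated_values(text, delimiter):
    """Split lines into rows and the delimiter into columns.

    The first line is treated as the header row.
    """
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    return list(reader)


# The same function handles pipes, commas, semicolons or tabs.
rows = read_separated_values(SAMPLE, "|")
comma_rows = read_separated_values("a,b\n1,2\n", ",")
```

Note that every value comes back as a string; deciding which columns are numbers or dates is a separate step.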
FYI - Data Formats 
Let’s open a Government Data Set
Whom was this created for?
Designed for humans / Designed for machines
Tabular / Document
Horizontal
Recap
Machine-readable data is for us
Our objective is to discover the story in the data. Formatting will unnecessarily get in the way.
What we want is 
Vertical
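One possible reading of "horizontal" versus "vertical", assumed here rather than stated on the slide, is a table with one column per year versus a table with one row per observation. Under that assumption, a minimal sketch of the conversion looks like this; the table contents and the helper name to_vertical are made up for illustration.

```python
# Hypothetical "horizontal" table: one row per state, one column per year.
wide_rows = [
    {"state": "Goa", "2012": 10, "2013": 12},
    {"state": "Kerala", "2012": 20, "2013": 25},
]


def to_vertical(rows, id_column, variable_name, value_name):
    """Turn every non-id column into its own row (one observation per row)."""
    long_rows = []
    for row in rows:
        for column, value in row.items():
            if column == id_column:
                continue
            long_rows.append(
                {id_column: row[id_column], variable_name: column, value_name: value}
            )
    return long_rows


long_rows = to_vertical(wide_rows, "state", "year", "count")
```

In the vertical form, adding another year means adding rows, not columns, which is usually easier for software to filter and aggregate.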
Let’s dive into basics
Where to find story-worthy data
Where is all the story-worthy data sitting?
• data.gov.in
• RBI.org.in
• mospi.nic.in
• planningcommission.nic.in
• unicef.org/statistics
• indiabudget.nic.in
• ncrb.nic.in
• mha.nic.in
• dise.in
• World Bank
• Oxfam
• IMF
• World Health Organisation
• …
It could also be in…
• Tweets
• Stock Market
• Politician’s speeches
• Other news articles
• Wiki Leaks
• Police FIR reports
• Survey
• Blogs
• Cell phone tower logs
Let’s dive into basics
Grammar of the Data
Datatype
Journalist
• String
• Number
Developer
• String
• Number
• Decimal / Float / Scientific
• Boolean
• Date
• Date Time
• Time
Datatype
String: Hello
Number: 3
Float: 3.03
Boolean: Yes / No, True / False
Date: 3 Feb 2014
Date Time: 3 Feb 2014 1 am
Time: 1am
Blank / Empty / Null
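To make the datatype list concrete, here is a rough sketch of how a program might guess the type of a single spreadsheet cell. The function infer_type, the order of the checks and the single date pattern are assumptions made for illustration; real tools use far more elaborate rules.

```python
from datetime import datetime


def infer_type(cell):
    """Guess a datatype name for one cell of text (illustrative rules only)."""
    text = cell.strip()
    if text == "":
        return "Blank"
    if text.lower() in {"yes", "no", "true", "false"}:
        return "Boolean"
    try:
        int(text)
        return "Number"
    except ValueError:
        pass
    try:
        float(text)
        return "Float"
    except ValueError:
        pass
    try:
        # Only one date pattern is tried here, e.g. "3 Feb 2014".
        datetime.strptime(text, "%d %b %Y")
        return "Date"
    except ValueError:
        pass
    return "String"


guesses = [infer_type(c) for c in ["Hello", "3", "3.03", "Yes", "3 Feb 2014", ""]]
```

Anything the checks cannot recognise falls back to String, which is why badly formatted numbers and dates so often end up as text.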
Datatypes in Google Spreadsheet
Formatting
Things you do to make the data more human-readable.
Data -> Formatted data
3 -> 3%
3.03 -> $3.03
34950683 -> 3,49,50,683
34950683 -> 34.950683 Million
Rounding up -> 35 Million
Formatting in Google Spreadsheets
Formatting is for presentation purposes only
Stay away from tools that change the stored value instead of formatting it for presentation only, e.g. Round, Currency.
What if formatting is not used for presentation only?
Data (data type) -> Formatted data (data type)
3 (Number) -> 3% (String)
3.03 (Float) -> $3.03 (String)
34950683 (Number) -> 3,49,50,683 (String)
34950683 (Number) -> 34.950683 Million (String)
Rounding up (Number) -> 35 Million (String)
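The point that formatting turns a number into a string can be checked directly. The sketch below reuses the 34950683 figure from the slide; the helper in_millions and the variable names are hypothetical.

```python
population = 34950683  # the raw value stays a number


def in_millions(value):
    """Build a presentation-only label; the underlying number is untouched."""
    return f"{value / 1_000_000:.0f} Million"


label = in_millions(population)
doubled = population * 2  # arithmetic still works on the raw number

# If the formatted text is stored instead of the number, it is just a string.
formatted_cell = "34.950683 Million"
try:
    float(formatted_cell)
    parses_as_number = True
except ValueError:
    parses_as_number = False
```

Keeping the raw number and producing the label only at display time avoids having to undo the formatting later.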
Properties of Data
Quantitative
• Things you ADD, e.g. number of sandwiches
• Always a number
• Objective: ADD
Qualitative
• Things that tell you ATTRIBUTES, e.g. staleness of sandwich, veg or non-veg
• May or may not be a number, e.g. number of days ago it was manufactured
• Objective: Quality / Health
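A small sketch of how the two properties are typically used together: the quantitative column is added up, and the qualitative column decides which group each row belongs to. The sandwich rows and the helper total_by are made up for illustration.

```python
from collections import defaultdict

# Made-up rows: "count" is quantitative, "kind" is qualitative.
sandwiches = [
    {"kind": "veg", "count": 4},
    {"kind": "non-veg", "count": 3},
    {"kind": "veg", "count": 5},
]


def total_by(rows, attribute, quantity):
    """Add up the quantitative column within each value of the attribute."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[attribute]] += row[quantity]
    return dict(totals)


totals = total_by(sandwiches, "kind", "count")
```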
Properties of Data …
Geospatial
Terms
• Countries
• States / Regions
• Districts / Counties
• Taluka
• Cities
• Latitude Longitude
Need for standardisation
• India = Bharat = Republic of India = Hindustan
Standards
• ISO2 codes
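Standardisation can be as simple as a lookup from every known spelling to one code. The sketch below maps the four names from the slide to "IN", the two-letter ISO 3166-1 code for India; the ALIASES table and the to_iso2 helper are illustrative, and a real project would load a complete country table instead.

```python
# Small illustrative lookup; a real project would use a full ISO 3166-1 table.
ALIASES = {
    "india": "IN",
    "bharat": "IN",
    "republic of india": "IN",
    "hindustan": "IN",
}


def to_iso2(name):
    """Return the two-letter code for a known spelling, or None if unknown."""
    return ALIASES.get(name.strip().lower())


codes = [to_iso2(n) for n in ["India", "Bharat", "Republic of India", " hindustan "]]
```

Returning None for unknown names makes unmatched rows easy to find and fix by hand.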
Properties of Data …
Timeseries
Terms
• Year
• Month - Year
• Date
• Date / Time
• Time
• Day of the Week
• Hour
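All of the timeseries terms above can be derived from a single date-time value. This is a minimal sketch using Python's standard datetime module; the time_parts helper and the output key names are assumptions chosen for illustration.

```python
from datetime import datetime

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]


def time_parts(stamp):
    """Break one date-time into the granularities a story might need."""
    return {
        "year": stamp.year,
        "month_year": f"{stamp.year}-{stamp.month:02d}",
        "date": stamp.date().isoformat(),
        "day_of_week": DAY_NAMES[stamp.weekday()],  # weekday(): Monday is 0
        "hour": stamp.hour,
    }


# 3 Feb 2014, 1 am
parts = time_parts(datetime(2014, 2, 3, 1, 0))
```

Choosing the granularity changes the story: the same events can be counted per year, per weekday or per hour.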
Properties of Data …
Exercise
Sentiment: Qualitative
Number of tweets: Quantitative
Day: Timeseries
Properties of Data …
Health of economy: Qualitative
Size of economy: Quantitative
Countries: Geospatial
Years: Timeseries
Debt: Relational
Source: http://www.bbc.co.uk/news/business-15748696
Properties of Data …
Relational data
Joe - friends - Ram - friends - Zoe
India - exports - US
Goa, Pune, B’lore, Hubli
Properties of Data … 
Even Railway fares are a relationship.
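Relational data is often stored as a list of links between two things. The sketch below borrows the names from the slide, but the exact links between them are an assumption, and the neighbours helper is hypothetical.

```python
# Each relationship is a (source, relation, target) triple.
edges = [
    ("Joe", "friends", "Ram"),
    ("Ram", "friends", "Zoe"),
    ("India", "exports", "US"),
]


def neighbours(edge_list, node):
    """Everything directly connected to `node`, in either direction."""
    linked = set()
    for source, _relation, target in edge_list:
        if source == node:
            linked.add(target)
        elif target == node:
            linked.add(source)
    return linked


ram_links = neighbours(edges, "Ram")
```

A railway fare fits the same shape: two stations plus a value attached to the link between them.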
Properties of Data … 
Hierarchical data 
Source http://www.pykih.com/data-journalism/election-counting-day-app-for-firstpost
Properties of Data …
Hierarchical data is any data that has a tree structure
Journalist
• CEO - VP - Managers - …
• Prime Minister - Cabinet - …
• Country - State - City - Zipcode
Developer
• Product hierarchy
• Distribution of funds
• Flow of the Ganga into various tributaries
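A tree such as Country - State - City - Zipcode can be sketched as nested dictionaries. The tree below is hypothetical (the zip codes are placeholders), and count_leaves and paths are illustrative helpers rather than anything from the talk.

```python
# Hypothetical Country - State - City - Zipcode tree; zip codes are placeholders.
tree = {
    "India": {
        "Goa": {"Panaji": ["zip-1", "zip-2"]},
        "Karnataka": {"Hubli": ["zip-3"]},
    }
}


def count_leaves(node):
    """Count the items at the bottom level of the tree."""
    if isinstance(node, list):
        return len(node)
    return sum(count_leaves(child) for child in node.values())


def paths(node, prefix=()):
    """List every route from the root down to a leaf."""
    if isinstance(node, list):
        return [prefix + (leaf,) for leaf in node]
    result = []
    for name, child in node.items():
        result.extend(paths(child, prefix + (name,)))
    return result


all_paths = paths(tree)
```

Totals can be rolled up at any level of the tree, which is what makes hierarchical data useful for drill-down stories.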
Properties of Data … 
Unique Example 
Source: http://dadaviz.com/i/794
I have $10 to spend in a day 
Is it more? Is it less?
Data when compared makes sense
Everything is relative
Fact: December 2012, iPhone division revenue for the quarter was $24.4 B
Story: December 2012, Microsoft’s entire revenue for the same quarter was $20.9 B
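The comparison can be put into numbers with two lines of arithmetic. The two revenue figures are the ones quoted on the slide; the variable names are illustrative.

```python
iphone_revenue_bn = 24.4     # iPhone division, quarter, from the slide
microsoft_revenue_bn = 20.9  # all of Microsoft, same quarter, from the slide

# How many times larger, and by how much in absolute terms.
ratio = iphone_revenue_bn / microsoft_revenue_bn
difference_bn = iphone_revenue_bn - microsoft_revenue_bn

summary = f"{ratio:.2f}x, or ${difference_bn:.1f} B more"
```

The single figure is a fact; the ratio against a familiar reference is what a reader can actually judge.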
Comparisons must have a baseline 
What is the common denominator 
Source: http://www.statista.com/chart/2628/police-firearms-discharges/
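One common denominator is population: counts become rates per 100,000 people before they are compared. The sketch below uses made-up figures purely to show the arithmetic; they are not taken from the chart cited above, and rate_per_100k is a hypothetical helper.

```python
# Made-up figures, purely to show the arithmetic.
places = [
    {"name": "A", "incidents": 500, "population": 10_000_000},
    {"name": "B", "incidents": 200, "population": 1_000_000},
]


def rate_per_100k(count, population):
    """Express a raw count relative to a common base of 100,000 people."""
    return count * 100_000 / population


rates = {p["name"]: rate_per_100k(p["incidents"], p["population"]) for p in places}
```

Here place A has more incidents in absolute terms, yet place B has the higher rate once both are put on the same base.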
Let’s dive into basics
Recipes for stories
India gives maximum citizenship to people from ____?
I would assume it is Bangladesh or Nepal. But since Bangladesh’s base is higher, it should be Bangladesh.
Source: http://164.100.47.132/LssNew/psearch/QResult16.aspx?qref=1153
Answer: Pakistan
Source: http://factchecker.in/pakistanis-get-maximum-indian-citizenship/
What did we do? 
Hypothesis Testing 
Source: http://factchecker.in/category/fact-check/
Often you have data but no hypothesis… 
In such a case, you will explore 
the data set to find patterns and insights. 
Census Dashboard - http://www.pykih.com/data-journalism/india-census
Perspectives
Perspectives -> Stories
Two is better than one
If you plot crime in UP across the last 10 years, all you get is a LINE chart.
+ Political parties ruling UP in the same period
= Story
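Combining two data sets usually means joining them on a shared column, here the year. The sketch below uses placeholder values only: they are not real crime figures and "Party A" and "Party B" are not real parties; combine and average_by_label are hypothetical helpers.

```python
# Placeholder values only: not real crime figures, not real parties.
crime_by_year = {2010: 100, 2011: 120, 2012: 90}
ruling_party_by_year = {2010: "Party A", 2011: "Party A", 2012: "Party B"}


def combine(series, labels):
    """Attach the second data set to the first, matching on year."""
    return [
        {"year": year, "value": series[year], "label": labels.get(year)}
        for year in sorted(series)
    ]


def average_by_label(rows):
    """Average the values within each label of the second data set."""
    grouped = {}
    for row in rows:
        grouped.setdefault(row["label"], []).append(row["value"])
    return {label: sum(values) / len(values) for label, values in grouped.items()}


combined = combine(crime_by_year, ruling_party_by_year)
averages = average_by_label(combined)
```

The join only shows that two things happened in the same period; whether one explains the other still needs reporting.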
When you see Political Speeches as Speeches
When you see Political Speeches as Data
Data is simply documented human actions / experience. 
Focus on understanding the Grammar behind data.
Fun fact: The word pykih came to us 
in a CAPTCHA. That’s the day we 
decided that till we do good work it 
does not matter what we are called. 
We are at @pykih

Más contenido relacionado

Similar a Getting comfortable with Data

Getting to Know Your Data with R
Getting to Know Your Data with RGetting to Know Your Data with R
Getting to Know Your Data with RStephen Withington
 
BIG DATA | How to explain it & how to use it for your career?
BIG DATA | How to explain it & how to use it for your career?BIG DATA | How to explain it & how to use it for your career?
BIG DATA | How to explain it & how to use it for your career?Tuan Yang
 
Python 101 for Data Science to Absolute Beginners
Python 101 for Data Science to Absolute BeginnersPython 101 for Data Science to Absolute Beginners
Python 101 for Data Science to Absolute BeginnersSai Linn Thu
 
Data visualization for development
Data visualization for developmentData visualization for development
Data visualization for developmentSara-Jayne Terp
 
Data visualisation as a campaign tool for change
Data visualisation as a campaign tool for changeData visualisation as a campaign tool for change
Data visualisation as a campaign tool for changeLittle Web Giants
 
[Slides] What Do We Do with All This Big Data by Altimeter Group
[Slides] What Do We Do with All This Big Data by Altimeter Group[Slides] What Do We Do with All This Big Data by Altimeter Group
[Slides] What Do We Do with All This Big Data by Altimeter GroupAltimeter, a Prophet Company
 
Getting started in Data Science (April 2017, Los Angeles)
Getting started in Data Science (April 2017, Los Angeles)Getting started in Data Science (April 2017, Los Angeles)
Getting started in Data Science (April 2017, Los Angeles)Thinkful
 
Intro to Data Science
Intro to Data ScienceIntro to Data Science
Intro to Data ScienceTJ Stalcup
 
Noticing the Nuance: Designing intelligent systems that can understand semant...
Noticing the Nuance: Designing intelligent systems that can understand semant...Noticing the Nuance: Designing intelligent systems that can understand semant...
Noticing the Nuance: Designing intelligent systems that can understand semant...Elizabeth Murnane
 
Unit 1 (DSBDA) PD.pptx
Unit 1 (DSBDA)  PD.pptxUnit 1 (DSBDA)  PD.pptx
Unit 1 (DSBDA) PD.pptxSamiksha880257
 
Bringing a data mindset to your reporting - Brant Houston - Illinois NewsTrai...
Bringing a data mindset to your reporting - Brant Houston - Illinois NewsTrai...Bringing a data mindset to your reporting - Brant Houston - Illinois NewsTrai...
Bringing a data mindset to your reporting - Brant Houston - Illinois NewsTrai...News Leaders Association's NewsTrain
 
Rogier brussee what is data science, and what does it have to do with agri fo...
Rogier brussee what is data science, and what does it have to do with agri fo...Rogier brussee what is data science, and what does it have to do with agri fo...
Rogier brussee what is data science, and what does it have to do with agri fo...Rogier Brussee
 
Turning Data into Infographics: An Interactive Workshop for Problem Solvers
Turning Data into Infographics: An Interactive Workshop for Problem SolversTurning Data into Infographics: An Interactive Workshop for Problem Solvers
Turning Data into Infographics: An Interactive Workshop for Problem SolversUNCResearchHub
 
Getting Started in Data Science
Getting Started in Data ScienceGetting Started in Data Science
Getting Started in Data ScienceThinkful
 
What-Do-We-Do-with-All-This-Big-Data-Altimeter-Group
What-Do-We-Do-with-All-This-Big-Data-Altimeter-GroupWhat-Do-We-Do-with-All-This-Big-Data-Altimeter-Group
What-Do-We-Do-with-All-This-Big-Data-Altimeter-GroupSusan Etlinger
 
Spark Social Media
Spark Social Media Spark Social Media
Spark Social Media suresh sood
 

Similar a Getting comfortable with Data (20)

Getting to Know Your Data with R
Getting to Know Your Data with RGetting to Know Your Data with R
Getting to Know Your Data with R
 
BIG DATA | How to explain it & how to use it for your career?
BIG DATA | How to explain it & how to use it for your career?BIG DATA | How to explain it & how to use it for your career?
BIG DATA | How to explain it & how to use it for your career?
 
Python 101 for Data Science to Absolute Beginners
Python 101 for Data Science to Absolute BeginnersPython 101 for Data Science to Absolute Beginners
Python 101 for Data Science to Absolute Beginners
 
Data visualization for development
Data visualization for developmentData visualization for development
Data visualization for development
 
Data visualisation as a campaign tool for change
Data visualisation as a campaign tool for changeData visualisation as a campaign tool for change
Data visualisation as a campaign tool for change
 
[Slides] What Do We Do with All This Big Data by Altimeter Group
[Slides] What Do We Do with All This Big Data by Altimeter Group[Slides] What Do We Do with All This Big Data by Altimeter Group
[Slides] What Do We Do with All This Big Data by Altimeter Group
 
Big Data introduction - Café Numérique Bruxelles
Big Data introduction - Café Numérique BruxellesBig Data introduction - Café Numérique Bruxelles
Big Data introduction - Café Numérique Bruxelles
 
Internet of Things (2015)
Internet of Things (2015)Internet of Things (2015)
Internet of Things (2015)
 
Getting started in Data Science (April 2017, Los Angeles)
Getting started in Data Science (April 2017, Los Angeles)Getting started in Data Science (April 2017, Los Angeles)
Getting started in Data Science (April 2017, Los Angeles)
 
Intro to Data Science
Intro to Data ScienceIntro to Data Science
Intro to Data Science
 
Noticing the Nuance: Designing intelligent systems that can understand semant...
Noticing the Nuance: Designing intelligent systems that can understand semant...Noticing the Nuance: Designing intelligent systems that can understand semant...
Noticing the Nuance: Designing intelligent systems that can understand semant...
 
Final_Bigdata_pret
Final_Bigdata_pretFinal_Bigdata_pret
Final_Bigdata_pret
 
Unit 1 (DSBDA) PD.pptx
Unit 1 (DSBDA)  PD.pptxUnit 1 (DSBDA)  PD.pptx
Unit 1 (DSBDA) PD.pptx
 
Bringing a data mindset to your reporting - Brant Houston - Illinois NewsTrai...
Bringing a data mindset to your reporting - Brant Houston - Illinois NewsTrai...Bringing a data mindset to your reporting - Brant Houston - Illinois NewsTrai...
Bringing a data mindset to your reporting - Brant Houston - Illinois NewsTrai...
 
Rogier brussee what is data science, and what does it have to do with agri fo...
Rogier brussee what is data science, and what does it have to do with agri fo...Rogier brussee what is data science, and what does it have to do with agri fo...
Rogier brussee what is data science, and what does it have to do with agri fo...
 
Turning Data into Infographics: An Interactive Workshop for Problem Solvers
Turning Data into Infographics: An Interactive Workshop for Problem SolversTurning Data into Infographics: An Interactive Workshop for Problem Solvers
Turning Data into Infographics: An Interactive Workshop for Problem Solvers
 
Gettind data used
Gettind data usedGettind data used
Gettind data used
 
Getting Started in Data Science
Getting Started in Data ScienceGetting Started in Data Science
Getting Started in Data Science
 
What-Do-We-Do-with-All-This-Big-Data-Altimeter-Group
What-Do-We-Do-with-All-This-Big-Data-Altimeter-GroupWhat-Do-We-Do-with-All-This-Big-Data-Altimeter-Group
What-Do-We-Do-with-All-This-Big-Data-Altimeter-Group
 
Spark Social Media
Spark Social Media Spark Social Media
Spark Social Media
 

Más de Ritvvij Parrikh

DataMeet 4: Data cleaning & census data
DataMeet 4: Data cleaning & census dataDataMeet 4: Data cleaning & census data
DataMeet 4: Data cleaning & census dataRitvvij Parrikh
 
Visualizing Data Journalism (HasGeek Fifth Elephant)
Visualizing Data Journalism (HasGeek Fifth Elephant)Visualizing Data Journalism (HasGeek Fifth Elephant)
Visualizing Data Journalism (HasGeek Fifth Elephant)Ritvvij Parrikh
 
Offline Advertisements Analytics Dashboard
Offline Advertisements Analytics DashboardOffline Advertisements Analytics Dashboard
Offline Advertisements Analytics DashboardRitvvij Parrikh
 
Google Analytics Dashboard Design
Google Analytics Dashboard DesignGoogle Analytics Dashboard Design
Google Analytics Dashboard DesignRitvvij Parrikh
 
Google Analytics Dashboard designed as an Infographic
Google Analytics Dashboard designed as an InfographicGoogle Analytics Dashboard designed as an Infographic
Google Analytics Dashboard designed as an InfographicRitvvij Parrikh
 
JARVIS:BI for FMCG Sales Managers
JARVIS:BI for FMCG Sales ManagersJARVIS:BI for FMCG Sales Managers
JARVIS:BI for FMCG Sales ManagersRitvvij Parrikh
 
Payroll Giving Management with TracksGiving
Payroll Giving Management with TracksGivingPayroll Giving Management with TracksGiving
Payroll Giving Management with TracksGivingRitvvij Parrikh
 
9 ways how cause marketing can help you achieve your marketing objectives.
9 ways how cause marketing can help you achieve your marketing objectives.9 ways how cause marketing can help you achieve your marketing objectives.
9 ways how cause marketing can help you achieve your marketing objectives.Ritvvij Parrikh
 
How TracksGiving can help you implement your campaigning software up quicker ...
How TracksGiving can help you implement your campaigning software up quicker ...How TracksGiving can help you implement your campaigning software up quicker ...
How TracksGiving can help you implement your campaigning software up quicker ...Ritvvij Parrikh
 

Más de Ritvvij Parrikh (10)

DataMeet 4: Data cleaning & census data
DataMeet 4: Data cleaning & census dataDataMeet 4: Data cleaning & census data
DataMeet 4: Data cleaning & census data
 
Visualizing Data Journalism (HasGeek Fifth Elephant)
Visualizing Data Journalism (HasGeek Fifth Elephant)Visualizing Data Journalism (HasGeek Fifth Elephant)
Visualizing Data Journalism (HasGeek Fifth Elephant)
 
Offline Advertisements Analytics Dashboard
Offline Advertisements Analytics DashboardOffline Advertisements Analytics Dashboard
Offline Advertisements Analytics Dashboard
 
Google Analytics Dashboard Design
Google Analytics Dashboard DesignGoogle Analytics Dashboard Design
Google Analytics Dashboard Design
 
Dashboard fhub
Dashboard fhubDashboard fhub
Dashboard fhub
 
Google Analytics Dashboard designed as an Infographic
Google Analytics Dashboard designed as an InfographicGoogle Analytics Dashboard designed as an Infographic
Google Analytics Dashboard designed as an Infographic
 
JARVIS:BI for FMCG Sales Managers
JARVIS:BI for FMCG Sales ManagersJARVIS:BI for FMCG Sales Managers
JARVIS:BI for FMCG Sales Managers
 
Payroll Giving Management with TracksGiving
Payroll Giving Management with TracksGivingPayroll Giving Management with TracksGiving
Payroll Giving Management with TracksGiving
 
9 ways how cause marketing can help you achieve your marketing objectives.
9 ways how cause marketing can help you achieve your marketing objectives.9 ways how cause marketing can help you achieve your marketing objectives.
9 ways how cause marketing can help you achieve your marketing objectives.
 
How TracksGiving can help you implement your campaigning software up quicker ...
How TracksGiving can help you implement your campaigning software up quicker ...How TracksGiving can help you implement your campaigning software up quicker ...
How TracksGiving can help you implement your campaigning software up quicker ...
 

Último

Industrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfIndustrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfLars Albertsson
 
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service AmravatiVIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service AmravatiSuhani Kapoor
 
VidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxVidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxolyaivanovalion
 
Ravak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxRavak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxolyaivanovalion
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfLars Albertsson
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service BhilaiLow Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service BhilaiSuhani Kapoor
 
BigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxBigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxolyaivanovalion
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfMarinCaroMartnezBerg
 
Introduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxIntroduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxfirstjob4
 
Ukraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICSUkraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICSAishani27
 
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAl Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAroojKhan71
 
Brighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingBrighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingNeil Barnes
 
Generative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusGenerative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusTimothy Spann
 
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...Suhani Kapoor
 
Week-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionWeek-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionfulawalesam
 
Halmar dropshipping via API with DroFx
Halmar  dropshipping  via API with DroFxHalmar  dropshipping  via API with DroFx
Halmar dropshipping via API with DroFxolyaivanovalion
 
BabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxBabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxolyaivanovalion
 

Último (20)

Industrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfIndustrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdf
 
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service AmravatiVIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
 
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in  KishangarhDelhi 99530 vip 56974 Genuine Escort Service Call Girls in  Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
 
VidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxVidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptx
 
Ravak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxRavak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptx
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdf
 

Getting comfortable with Data

  • 1. Getting comfortable with data @ritvvijparrikh, Data Designer, pykih.com d|Bootcamp, http://delhi.dbootcamp.org, September 5, 2014
  • 2. About me I help organisations make sense (visual or otherwise) of data. 2005 University: Neural Net based Market Prediction Stock Market Data 2006 Software Developer at Amdocs 2011 Design Lead Amdocs Sales Team to AT&T Product Manager at samhita.org Founded TracksGiving - analytics for charities 2013 Founded pykih - data visualisation Telecom data for AT&T Donation data FirstPost Journalism++ Cologne / Datawrapper Microsoft visual.ly NarendraModi.in
  • 3. How do developers look at data? v/s How do journalists look at data?
  • 4. How do developers look at data? v/s How do journalists look at data?
  • 5. Article in English WHO does not marvel at the prospect of India going to the polls? Starting on April 7th, illiterate villagers and destitute slum-dwellers will have an equal say alongside Mumbai’s millionaires in picking their government. Almost 815m citizens are eligible to cast their ballots in nine phases of voting over five weeks—the largest collective democratic act in history. ! But who does not also deplore the fecklessness and venality of India’s politicians? The country is teeming with problems, but a decade under a coalition led by the Congress party has left it rudderless. Growth…
  • 6. English article on Genomics Assumption: Unknown domain Genomics is a discipline in genetics that applies recombinant DNA, DNA sequencing methods, and bioinformatics to sequence, assemble, and analyze the function and structure of genomes (the complete set of DNA within a single cell of an organism).[1][2] Advances in genomics have triggered a revolution in discovery-based research to understand even the most complex biological systems such as brain.[3] The field includes efforts to determine the entire DNA sequence of organisms and fine-scale genetic mapping. The field also includes studies of intragenomic phenomena …
  • 7. Conclusion We are comfortable with unknown material in English
  • 8. Objective: Be comfortable with unknown data sets
  • 9. Objectives Can we build Data Comprehension skills?
  • 10. Objectives • What is data made up of? • Data File Formats • Where is the story-worthy data? • Data Types • Properties of Data • Insights / Recipes for stories • Data Aggregation • Basic Spreadsheet Functions
  • 11. About me Data on Glass Manufacturing Factory Floor in German Language Unknown domain. Unknown language. We still modelled the data correctly.
  • 12. Let’s dive into basics What is data made up of
  • 13. Where does data come from? Human actions / experiences: Wind
  • 14. Where does data come from? Human actions / experiences → Documented data: Wind → Documenting
  • 15. Where does data come from? Human actions / experiences → Documented data → Insights: Wind → Documenting → Sea Travel
  • 16. Where does data come from? Human actions / experiences → Documented data → Insights: Wind → Documenting → Sea Travel; What am I doing → Twitter → Sentiment about Budget
  • 17. Where does data come from? Human actions / experiences → Documented data → Insights: Wind → Documenting → Sea Travel; What am I doing → Twitter → Sentiment about Budget; Vote → Election Commission → Political Change
  • 18. Where does data come from? Human actions / experiences → Documented data → Insights: Wind → Documenting → Sea Travel; What am I doing → Twitter → Sentiment about Budget; Vote → Election Commission → Political Change; State Dept. Wires → Wikileaks → Backdoor Foreign Policy
  • 19. What has changed? Human actions / experiences → Documented data → Insights: Wind → Documenting → Sea Travel; What am I doing → Twitter → Sentiment about Budget; Vote → Election Commission → Political Change; State Dept. Wires → Wikileaks → Backdoor Foreign Policy
  • 20. Technology. Human actions / experiences → Documented data → Insights: Wind → Documenting → Sea Travel; What am I doing → Twitter → Sentiment about Budget; Vote → Election Commission → Political Change; State Dept. Wires → Wikileaks → Backdoor Foreign Policy
  • 21. Struggling with: Grasp. Human actions / experiences → Documented data → Insights: Wind → Documenting → Sea Travel; What am I doing → Twitter → Sentiment about Budget; Vote → Election Commission → Political Change; State Dept. Wires → Wikileaks → Backdoor Foreign Policy
  • 22. Struggling with: Story, Grasp. Human actions / experiences → Documented data → Insights: Wind → Documenting → Sea Travel; What am I doing → Twitter → Sentiment about Budget; Vote → Election Commission → Political Change; State Dept. Wires → Wikileaks → Backdoor Foreign Policy
  • 23. What is data made up of? Human actions / experiences → Documented data → Insights: Wind → Documenting → Sea Travel. Domain: human context. Metadata: how it is stored.
  • 24. Data Comprehension. Human actions / experiences → Documented data → Insights: Wind → Documenting → Sea Travel. Domain: human context. Metadata: how it is stored. Grammar of the Data.
  • 25. Let’s dive into basics: Data Formats. Human actions / experiences → Documented data → Insights
  • 26. How is the data stored? A format is a predefined structure in which 1s and 0s are stored so that software can read them.
  • 27. How is the data stored? Data: designed for machines. Data and formatting: designed for humans.
  • 28. Machine-readable data is for us. Data: designed for machines. Data and formatting: designed for humans. Our objective is to discover the story in the data. Formatting will unnecessarily get in the way.
  • 29. Tabular v/s Document data: Designed for Humans; Designed for Machine; Tabular; Document
  • 30. Scraping / API Integration: Designed for Humans; Designed for Machine; Tabular; Scrape / API; Document. New Terms: PDF Scraping. Web Scraping. API Integration. Developer
  • 31. Machine-readable Tabular data formats: Designed for Humans; Designed for Machine; Tabular; Scrape / API; Document
  • 32. *-separated values files: | (pipe) acts as a delimiter, allowing us to identify columns; new lines help identify rows. Extend this concept, and you get: • Comma Separated Value files • Pipe Separated Value files • Semicolon Separated Value files • Tab Separated Value files …
  • 33. FYI - Data Formats: Designed for Humans; Designed for Machine; Tabular; Scrape / API; Document
  • 34. Let’s open a Government Data Set
  • 35. Whom was this created for? Designed for Machine; Document; Designed for Humans; Tabular
  • 36. Whom was this created for? Designed for Humans; Designed for Machine; Tabular; Document; Horizontal
  • 37. Recap: Machine-readable data is for us. Data: designed for machines. Data and formatting: designed for humans. Our objective is to discover the story in the data. Formatting will unnecessarily get in the way.
  • 38. What we want is Vertical
  • 39. Let’s dive into basics: Where to find story-worthy data. Human actions / experiences → Documented data → Insights
  • 40. Where is all the story-worthy data sitting? • data.gov.in • RBI.org.in • mospi.nic.in • planningcommission.nic.in • unicef.org/statistics • indiabudget.nic.in • ncrb.nic.in • mha.nic.in • dise.in • World Bank • Oxfam • IMF • World Health Organisation • …
  • 41. It could also be in… • data.gov.in • RBI.org.in • mospi.nic.in • planningcommission.nic.in • unicef.org/statistics • indiabudget.nic.in • ncrb.nic.in • mha.nic.in • dise.in • World Bank • Oxfam • IMF • World Health Organisation • … • Tweets • Stock Market • Politicians’ speeches • Other news articles • WikiLeaks • Police FIR reports • Survey • Blogs • Cell phone tower logs
  • 42. Let’s dive into basics: Grammar of the Data. Human actions / experiences → Documented data → Insights
  • 43. Datatype. Journalist: • String • Number
  • 44. Datatype. Journalist: • String • Number. Developer: • String • Number • Decimal / Float / Scientific • Boolean • Date • Date Time • Time
  • 45. Datatype: String: Hello; Number: 3; Float: 3.03; Boolean: Yes / No, True / False; Date: 3 Feb 2014; Date Time: 3 Feb 2014 1 am; Time: 1 am; Blank / Empty / Null
  • 46. Datatypes in Google Spreadsheet
  • 47. Formatting: Things you do to make the data more human-readable.
  • 48. Formatting: Things you do to make the data more human-readable. Data → Formatted Data: 3 → 3%; 3.03 → $3.03; 34950683 → 3,49,50,683; 34950683 → 34.950683 Million; Rounding Up → 35 Million
  • 49. Formatting in Google Spreadsheets
  • 50. Formatting is for presentation purposes only. Stay away from tools that change the underlying data instead of only its presentation, e.g. Round, Currency.
  • 51. What if Formatting is not used for presentation? Things you do to make the data more human-readable. Data (Data type) → Formatted Data (Data type): 3 (Number) → 3% (String); 3.03 (Float) → $3.03 (String); 34950683 (Number) → 3,49,50,683 (String); 34950683 (Number) → 34.950683 Million (String); Rounding Up (Number) → 35 Million (String)
  • 52. Properties of Data. Quantitative: • things you ADD, e.g. number of sandwiches. Qualitative: • things that tell you ATTRIBUTES, e.g. staleness of sandwich, veg or non-veg
  • 53. Properties of Data … Quantitative: • e.g. number of sandwiches • Always a number. Qualitative: • e.g. staleness of sandwich, veg or non-veg • May or may not be a number, e.g. number of days ago it was manufactured
  • 54. Properties of Data … Quantitative: • e.g. number of sandwiches • Always a number • Objective: ADD. Qualitative: • e.g. staleness of sandwich, veg or non-veg • May or may not be a number, e.g. number of days ago it was manufactured • Objective: Quality / Health
  • 55. Properties of Data … Geospatial. Terms: • Countries • States / Regions • Districts / Counties • Taluka • Cities • Latitude Longitude. Need for Standardisation: • India = Bharat = Republic of India = Hindustan. Standards: • ISO2 Codes
  • 56. Properties of Data … Timeseries. Terms: • Year • Month - Year • Date • Date / Time • Time • Day of the Week • Hour
  • 57. Properties of Data … Exercise: Sentiment: Qualitative; Number of tweets: Quantitative; Day: Timeseries
  • 58. Properties of Data … Source: http://www.bbc.co.uk/news/business-15748696
  • 59. Properties of Data … Health of Economy: Qualitative; Size of Economy: Quantitative; Countries: Geospatial; Years: Timeseries. Source: http://www.bbc.co.uk/news/business-15748696
  • 60. Properties of Data … Health of Economy: Qualitative; Size of Economy: Quantitative; Countries: Geospatial; Years: Timeseries; Debt: Relational
  • 61. Properties of Data … Relational data friends friends Joe Ram Zoe exports India exports US Goa Pune B’lore Hubli
  • 62. Properties of Data … Relational data
  • 63. Properties of Data … Even Railway fares are a relationship.
  • 64. Properties of Data … Hierarchical data Source http://www.pykih.com/data-journalism/election-counting-day-app-for-firstpost
  • 65. Properties of Data … Hierarchical data is any data that has a tree. Journalist: • CEO - VP - Managers - …. • Prime Minister - Cabinet - … • Country - State - City - Zipcode
  • 66. Properties of Data … Hierarchical data is any data that has a tree. Journalist: • CEO - VP - Managers - …. • Prime Minister - Cabinet - … Developer: • Product Hierarchy • Distribution of funds • Flow of Ganga into various tributaries
  • 67. Properties of Data … Unique Example Source: http://dadaviz.com/i/794
  • 68. I have $10 to spend in a day Is it more? Is it less?
  • 69. Data when compared makes sense. Everything is relative. Fact: December 2012: iPhone division revenue for the quarter was $24.4 B.
  • 70. Data when compared makes sense. Everything is relative. Fact: December 2012: iPhone division revenue for the quarter was $24.4 B. Story: December 2012: Microsoft’s entire revenue for the same quarter was $20.9 B.
  • 71. Comparisons must have a baseline. What is the common denominator? Source: http://www.statista.com/chart/2628/police-firearms-discharges/
  • 72. Let’s dive into basics: Recipes for stories. Human actions / experiences → Documented data → Insights
  • 73. India gives maximum citizenship to people from ____? I would assume it is Bangladesh or Nepal. But since Bangladesh’s base is higher… it should be Bangladesh.
  • 74. India gives maximum citizenship to people from ____? Source: http://164.100.47.132/LssNew/psearch/QResult16.aspx?qref=1153
  • 75. India gives maximum citizenship to people from ____? Pakistan
  • 76. India gives maximum citizenship to people from ____? Source: http://factchecker.in/pakistanis-get-maximum-indian-citizenship/
  • 77. What did we do? Hypothesis Testing Source: http://factchecker.in/category/fact-check/
  • 78. Often you have data but no hypothesis… In such a case, you will explore the data set to find patterns and insights. Census Dashboard - http://www.pykih.com/data-journalism/india-census
  • 81. Two is better than one
  • 82. Two is better than one. If you plot crime in UP across the last 10 years, all you get is a LINE chart.
  • 83. Two is better than one. If you plot crime in UP across the last 10 years, all you get is a LINE chart. + Political parties ruling UP in the same period = Story
  • 84. When you see Political Speeches as Speeches
  • 85. When you see Political Speeches as Data
  • 86. Data is simply documented human actions / experience. Focus on understanding the Grammar behind data.
  • 87. Fun fact: The word pykih came to us in a CAPTCHA. That’s the day we decided that as long as we do good work, it does not matter what we are called. We are at @pykih
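
The ideas on slides 32 and 51 (delimiters identify columns, new lines identify rows, and formatted values turn into strings) can be sketched in a few lines of Python. This is only an illustrative sketch using the standard library: the sample text and the helper names `read_delimited` and `to_number` are assumptions made for this example and do not come from the talk. The two values in the sample are the formatted figures shown on slide 48.

```python
import csv
import io
import re

# Hypothetical pipe-separated sample; the values reuse the formatted
# figures from slide 48, the column names are made up for illustration.
SAMPLE = "label|value\nlarge|3,49,50,683\nprice|$3.03\n"


def read_delimited(text, delimiter="|"):
    """New lines identify rows; the delimiter identifies columns."""
    return list(csv.DictReader(io.StringIO(text), delimiter=delimiter))


def to_number(formatted):
    """Strip presentation-only formatting (commas, $, spaces) from a string.

    Returns an int or float, or None for blank values and for text that
    is still not numeric after cleaning (for example "35 Million").
    """
    cleaned = re.sub(r"[,$\s]", "", formatted)
    if cleaned == "":
        return None
    try:
        return float(cleaned) if "." in cleaned else int(cleaned)
    except ValueError:
        return None


rows = read_delimited(SAMPLE)
# Every value arrives as a string; convert before doing any arithmetic.
numbers = [to_number(row["value"]) for row in rows]
```

Changing `delimiter` to `","`, `";"` or `"\t"` reads the other separated-value variants the slide lists. Values such as "35 Million" come back as `None` rather than being guessed, which keeps the formatting problem visible instead of hiding it.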