Udit Poddar, Data Scientist at SocialCops, gave this presentation on "Data-Driven Decision Making in Indian Agriculture: the Present and the Future" at the Fifth Elephant, 2016.
12. Input Survey
Data and challenges
• Data Acquisition: data lying in hard-to-access government servers
• Data Curation: a single file for each geography and crop
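Because the survey data arrives as a single file for each geography and crop, a typical first curation step is to merge those files into one long table. The sketch below is a minimal illustration and does not reproduce the presenter's pipeline. It assumes a hypothetical naming scheme `<geography>__<crop>.csv` with a header row in each file. The `consolidate` and `demo` helpers and the toy values are made up for the example.

```python
import csv
import tempfile
from pathlib import Path


def consolidate(folder: str) -> list[dict[str, str]]:
    """Merge per-geography, per-crop CSV files into one list of rows.

    Assumes a hypothetical naming scheme "<geography>__<crop>.csv"; the
    geography and crop are read from the file name and added to each row.
    """
    rows: list[dict[str, str]] = []
    for path in sorted(Path(folder).glob("*.csv")):
        geography, sep, crop = path.stem.partition("__")
        if not sep:
            continue  # skip files that do not follow the assumed scheme
        with path.open(newline="", encoding="utf-8") as fh:
            for record in csv.DictReader(fh):
                record["geography"] = geography
                record["crop"] = crop
                rows.append(record)
    return rows


def demo() -> list[dict[str, str]]:
    """Build two toy files (made-up values) and merge them."""
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, "DistrictA__rice.csv").write_text(
            "size_class,holdings\nsmall,10\n", encoding="utf-8")
        Path(tmp, "DistrictA__wheat.csv").write_text(
            "size_class,holdings\nsmall,7\n", encoding="utf-8")
        return consolidate(tmp)


print(demo())
```

Keeping geography and crop as ordinary columns means that thousands of small files collapse into one table that can be filtered or grouped. The file-name convention carries that information without needing a separate lookup.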
13. Challenges and Solutions
Data Curation
Available as PDF files or as multiple, poorly formatted Excel files
• India’s population to cross China’s by 2022
• Population to grow to 1.7 billion by 2050
• PDF parsing using image recognition
• Data cleaning using pattern recognition
• Scalable scripts to automate processing of thousands of files
Integrated Village-wise Reports of disease outbreak in India (bi-weekly)
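The "data cleaning using pattern recognition" step can be illustrated with regular expressions applied to numeric cells after they have been pulled out of PDFs or messy Excel sheets. The image-recognition (OCR) stage is not shown. This is a sketch only. The patterns (placeholder dashes, "NA", trailing footnote markers, comma and lakh-style grouping) are assumptions about what such files contain, and `clean_number` and `clean_column` are hypothetical helpers.

```python
import re
from typing import Optional

# Illustrative patterns (assumptions, not the presenter's actual rules) for
# numeric cells pulled out of PDFs or messy Excel sheets.
MISSING = re.compile(r"^\s*(?:-+|na|n/a|nil)?\s*$", re.IGNORECASE)  # blanks, dashes, NA
FOOTNOTE = re.compile(r"[*#]+$")            # trailing footnote markers such as "120*"
SEPARATORS = re.compile(r"[,\s]")           # commas (incl. lakh grouping) and spaces
NUMBER = re.compile(r"^-?\d+(?:\.\d+)?$")   # what a clean value must look like


def clean_number(raw: str) -> Optional[float]:
    """Normalise one extracted cell to a float; return None for missing values."""
    if MISSING.match(raw):
        return None
    text = FOOTNOTE.sub("", raw.strip())
    text = SEPARATORS.sub("", text)
    if not NUMBER.match(text):
        # Flag anything unexpected for manual review instead of guessing.
        raise ValueError(f"unrecognised cell value: {raw!r}")
    return float(text)


def clean_column(cells: list[str]) -> list[Optional[float]]:
    """Apply clean_number to every cell of one extracted column."""
    return [clean_number(cell) for cell in cells]


print(clean_column(["1,23,456", "12 345*", "-", "NA", "7.5"]))
# → [123456.0, 12345.0, None, None, 7.5]
```

Raising on unrecognised values, rather than silently dropping them, keeps the script safe to run unattended across many files. Rejected cells can then be collected and reviewed in one batch.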
14. Challenges and Solutions
Data Stitching
• Phonetic Matching
• Fuzzy Logic Matching
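The slides do not name the phonetic algorithm that was used. As a stand-in, here is a sketch of American Soundex, a classic phonetic code that maps similar-sounding names to the same four-character key. Soundex was designed for English surnames, so it is only a rough fit for romanised Indian village names. It is best used to shortlist candidate pairs rather than to confirm matches. The `phonetic_candidates` helper and the example names are hypothetical.

```python
# American Soundex letter groups; vowels and h, w, y get no digit.
SOUNDEX_CODES = {
    letter: digit
    for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                           ("l", "4"), ("mn", "5"), ("r", "6"))
    for letter in letters
}


def soundex(name: str) -> str:
    """Return the 4-character American Soundex key of a romanised name."""
    letters = [ch for ch in name.lower() if "a" <= ch <= "z"]
    if not letters:
        return ""
    key = [letters[0].upper()]
    prev = SOUNDEX_CODES.get(letters[0], "")
    for ch in letters[1:]:
        code = SOUNDEX_CODES.get(ch, "")
        if code and code != prev:
            key.append(code)
        # h and w do not separate letters with the same code; vowels do.
        if ch not in "hw":
            prev = code
    return ("".join(key) + "000")[:4]


def phonetic_candidates(name: str, candidates: list[str]) -> list[str]:
    """Names from `candidates` whose Soundex key equals that of `name`."""
    target = soundex(name)
    return [c for c in candidates if soundex(c) == target]


print(soundex("Rampur"), soundex("Raampur"), soundex("Ramapur"))  # → R516 R516 R516
```

The three spellings above are plausible transliteration variants. All of them collapse to the same key, which is the property that makes phonetic codes useful for stitching datasets spelled by different agencies.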
15%
The match rate we got the first time we matched Census village names to DISE (District Information System for Education) village names
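The 15% first-pass figure shows how rarely village names agree verbatim across datasets. The sketch below uses fuzzy string matching with Python's standard-library `difflib` similarity ratio, as a stand-in for whatever fuzzy-logic matcher the presenters used. It compares an exact-match rate with a fuzzy-match rate. Several pieces are hypothetical toy choices: `normalise`, `match_rates`, the 0.85 cutoff, and the village names. The resulting rates say nothing about the real Census and DISE data.

```python
import difflib
import re


def normalise(name: str) -> str:
    """Lower-case, replace punctuation with spaces, collapse whitespace."""
    return " ".join(re.sub(r"[^a-z0-9]+", " ", name.lower()).split())


def match_rates(source: list[str], target: list[str],
                cutoff: float = 0.85) -> tuple[float, float]:
    """Share of `source` names found in `target` by exact vs fuzzy matching.

    Exact: identical raw strings. Fuzzy: identical after normalisation, or a
    difflib similarity ratio of at least `cutoff` (0.85 is an arbitrary choice).
    """
    raw_targets = set(target)
    norm_targets = [normalise(t) for t in target]
    norm_set = set(norm_targets)
    exact = fuzzy = 0
    for name in source:
        if name in raw_targets:
            exact += 1
        key = normalise(name)
        if key in norm_set or difflib.get_close_matches(
                key, norm_targets, n=1, cutoff=cutoff):
            fuzzy += 1
    total = len(source) or 1  # avoid dividing by zero on empty input
    return exact / total, fuzzy / total


# Toy lists standing in for two datasets (made-up names).
census = ["Rampur", "Nawada Kalan", "Sultanpur", "Bhagwanpur"]
dise = ["RAMPUR", "Nawada Kalaan", "Sultanpur", "Kishanpur"]
print(match_rates(census, dise))  # → (0.25, 0.75)
```

Normalising case and punctuation first catches the cheap mismatches, and the similarity cutoff handles small spelling differences. A name with no close counterpart, such as "Bhagwanpur" here, stays unmatched. In practice the cutoff trades false matches against missed ones and would need tuning on labelled pairs.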