A novel approach to big data veracity using crowd-sourcing techniques
1. BIG DATA and VERACITY:
A novel approach to data veracity using crowd-sourcing techniques
Samarth Bhargav, Bhoomika Agarwal, Abhiram Ravikumar and Vrishabh DN
April 18, 2014
Presented at BMS Institute of Technology, Bangalore
2. Introduction
Big Data
● What is Big Data?
● The 3 traditional V’s
o Volume
o Velocity
o Variety
● The fourth V: Veracity
● Crowdsourcing
[Figure: the four Vs: Volume, Velocity, Variety, Veracity]
3. The 4 Vs of Big Data
Source: http://well-managed-business-intelligence.blogspot.in/2012/06/big-data-fourth.html
5. reCAPTCHA
● Digitizing one word at a time
● Puts the roughly 10 seconds a human spends solving a CAPTCHA to productive use
● Digitizing old books is a herculean task for computers
● An efficient alternative to OCR
● Workflow: entry, multiple checks, verification, upload
● 20 years of The New York Times archive was digitized in just a couple of months
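The "multiple checks, verification" step can be pictured as a simple agreement rule: a word is accepted only once enough independent human answers match. The sketch below is a minimal illustration under that assumption; the function name `verify_word` and the thresholds `min_votes` and `min_agreement` are hypothetical and are not taken from reCAPTCHA itself.

```python
from collections import Counter


def verify_word(answers, min_votes=3, min_agreement=0.75):
    """Return the agreed transcription, or None if there is no consensus.

    Both thresholds are illustrative assumptions, not reCAPTCHA's real values.
    """
    # Normalize so that case and stray whitespace do not split the vote.
    normalized = [a.strip().lower() for a in answers if a.strip()]
    if len(normalized) < min_votes:
        return None  # not enough independent checks yet
    word, count = Counter(normalized).most_common(1)[0]
    if count / len(normalized) >= min_agreement:
        return word  # verified: ready for upload
    return None  # send the word back for more checks


accepted = verify_word(["Veracity", "veracity ", "veracity", "voracity"])
rejected = verify_word(["morning", "mourning", "moming"])
print(accepted, rejected)
```

A word that fails the rule is not discarded; in this sketch it simply stays in the pool until more answers arrive.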
6. Google Maps
● “Enrich Google Maps with your local knowledge”
● The Google Map Maker project
● Data used by Google Maps and Google Earth
● Projects like Photo Sphere and Street View rely heavily on contributions from the public
● Workflow
○ add/edit places
○ verified by a moderator
○ cross-referenced and updated
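The add/edit, moderate, cross-reference workflow can be read as a small state machine. The following is a rough sketch under that reading; `PlaceEdit`, `Status`, `moderate` and `publish` are hypothetical names, the coordinates are made up, and nothing here reflects Google's actual Map Maker implementation.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    SUBMITTED = "submitted"
    APPROVED = "approved"
    REJECTED = "rejected"
    PUBLISHED = "published"


@dataclass
class PlaceEdit:
    name: str
    lat: float
    lon: float
    status: Status = Status.SUBMITTED


def moderate(edit, approve):
    """Moderator step: approve or reject a submitted edit."""
    if edit.status is not Status.SUBMITTED:
        raise ValueError("only submitted edits can be moderated")
    edit.status = Status.APPROVED if approve else Status.REJECTED
    return edit


def publish(edit, known_places):
    """Cross-reference by normalized name, then update or add the place."""
    if edit.status is not Status.APPROVED:
        raise ValueError("only approved edits can be published")
    key = edit.name.strip().lower()
    known_places[key] = {"name": edit.name, "lat": edit.lat, "lon": edit.lon}
    edit.status = Status.PUBLISHED
    return edit


# Illustrative data only: the place and its coordinates are invented.
places = {"city library": {"name": "City Library", "lat": 12.0, "lon": 77.0}}
edit = PlaceEdit("City Library", 12.5, 77.5)
publish(moderate(edit, approve=True), places)
print(places["city library"], edit.status)
```

Keeping moderation and publishing as separate functions means an unreviewed edit can never reach the published map in this sketch.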
7. WIKIPEDIA
● Termed as the “mother of all encyclopedias”
● Hosts an immense pool of data, multilingual in nature and entirely community driven
● Run by donations from all over the world (crowdfunding)
● Dynamic and constantly updated, thus scores big over traditional encyclopedias
● Unbiased and high-quality information
● Data verification and validation done instantly by both experts and the general public
8. DUOLINGO
● Learn a language and translate the Web
● Entirely free and crowd-driven
● Co-founded by Luis von Ahn, creator of the ESP Game and reCAPTCHA
● Workflow
o website to be translated is uploaded
o broken into parts & given to students
o students translate the document while learning
o translated document returned to owner
● Win-win situation for both students and companies
● Popular on both web and mobile platforms
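The workflow above (split the text, hand the parts to students, return the assembled translation) might be sketched as follows. This is only an illustration: the helper names are hypothetical, sentences are split with a naive regular expression, and upper-casing stands in for the students' actual translations.

```python
import re


def split_into_sentences(text):
    """Break a document into sentence-sized parts (naive punctuation split)."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def assign_round_robin(sentences, students):
    """Hand out (index, sentence) pairs to students in turn."""
    assignments = {student: [] for student in students}
    for index, sentence in enumerate(sentences):
        student = students[index % len(students)]
        assignments[student].append((index, sentence))
    return assignments


def reassemble(translated_parts):
    """Put translated (index, text) pairs back into document order."""
    return " ".join(text for _, text in sorted(translated_parts))


document = "Big data is large. Veracity matters! Can crowds help?"
assignments = assign_round_robin(split_into_sentences(document), ["asha", "ben"])

# Placeholder "translation": upper-casing stands in for real student work.
translated = [
    (index, sentence.upper())
    for work in assignments.values()
    for index, sentence in work
]
result = reassemble(translated)
print(result)
```

Carrying the original index with each part is what lets the pieces come back in any order and still be returned to the owner as one document.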
9. Amazon Mechanical Turk
● Uses human intelligence for tasks that computers handle poorly (“artificial artificial intelligence”)
● HITs (Human Intelligence Tasks) can supply human-labeled data for machine learning
● Workflow
o Requester places task on the site or through API
o Provider picks a suitable task
o Payments made through Amazon gift certificates
● Advantages include
o Quality assurance
o Scalability options
o Lower cost
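The requester/provider workflow can be modelled as a tiny in-memory task board. The sketch below is purely illustrative: `Hit` and `Marketplace` are hypothetical classes, rewards are plain integers in cents, and none of this corresponds to the real Mechanical Turk API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Hit:
    hit_id: str
    description: str
    reward_cents: int
    assigned_to: Optional[str] = None
    result: Optional[str] = None


class Marketplace:
    """Toy task board: requesters post HITs, providers accept and submit."""

    def __init__(self):
        self.hits = {}

    def post(self, hit):
        self.hits[hit.hit_id] = hit

    def open_hits(self):
        return [h for h in self.hits.values() if h.assigned_to is None]

    def accept(self, hit_id, provider):
        hit = self.hits[hit_id]
        if hit.assigned_to is not None:
            raise ValueError("HIT already taken")
        hit.assigned_to = provider
        return hit

    def submit(self, hit_id, provider, result):
        hit = self.hits[hit_id]
        if hit.assigned_to != provider:
            raise ValueError("HIT is not assigned to this provider")
        hit.result = result
        return hit.reward_cents  # amount owed to the provider


market = Marketplace()
market.post(Hit("h1", "Label the animal in image 17", reward_cents=5))
market.post(Hit("h2", "Transcribe a receipt", reward_cents=12))

# The provider picks the best-paying open task.
chosen = max(market.open_hits(), key=lambda h: h.reward_cents)
market.accept(chosen.hit_id, "provider-42")
earned = market.submit(chosen.hit_id, "provider-42", "Total: 18.40")
print(chosen.hit_id, earned, len(market.open_hits()))
```

Posting many small HITs in parallel is where the scalability and lower-cost advantages listed above would come from in such a model.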
10. Analysis
● Handling data IS important
● Google Flu Trends
● Kickstarter and CosmoQuest
● Considerable scope and wide-ranging opportunities
11. Repercussions
● Senator Kennedy’s story
● FCRA (Fair Credit Reporting Act)
● Crowds are often unaware that their data is being acquired
● Confidential data and security leaks must be addressed with care
12. Conclusion
Crowdsourcing model   Volume      Velocity    Variety     Veracity
Google Maps           terabytes   high        low         medium
Duolingo              terabytes   medium      high        high
reCAPTCHA             petabytes   very high   very high   very high
Amazon Turk           petabytes   medium      very high   high
Wikipedia             petabytes   medium      high        very high
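To compare the models in the table programmatically, the qualitative ratings can be mapped to an ordinal scale. The sketch below uses the table's values as given; the numeric scale in `LEVELS` and the helper `rank_by` are assumptions made only for illustration.

```python
# Ordinal scale for the qualitative ratings (an assumption for illustration).
LEVELS = {"low": 1, "medium": 2, "high": 3, "very high": 4}

# Ratings copied from the table above.
MODELS = {
    "Google Maps": {"volume": "terabytes", "velocity": "high", "variety": "low", "veracity": "medium"},
    "Duolingo": {"volume": "terabytes", "velocity": "medium", "variety": "high", "veracity": "high"},
    "reCAPTCHA": {"volume": "petabytes", "velocity": "very high", "variety": "very high", "veracity": "very high"},
    "Amazon Turk": {"volume": "petabytes", "velocity": "medium", "variety": "very high", "veracity": "high"},
    "Wikipedia": {"volume": "petabytes", "velocity": "medium", "variety": "high", "veracity": "very high"},
}


def rank_by(dimension):
    """Return model names ordered from highest to lowest rating."""
    return sorted(MODELS, key=lambda name: LEVELS[MODELS[name][dimension]], reverse=True)


veracity_ranking = rank_by("veracity")
print(veracity_ranking)
```

Volume is left out of the ranking because the table gives it as an order of magnitude rather than on the low-to-very-high scale.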