Agenda
• What is Big data?
• Some BIG facts
• Objective
• Sources
• 3 V’s of Big data
• 3 + 1 V’s of Big data
• Technologies
• Opportunities
• Major Players
• Questions
• Conclusion
What is Big data?

Data

Big Data
Some BIG facts
• 90% of the data in the world today has been created in the
last two years alone
• IDC forecast: the global universe of data will double
every two years, reaching 40,000 exabytes (40 trillion GB)
by 2020
• The Large Hadron Collider near Geneva, Switzerland, will
produce about 15 petabytes of data per year.
• Ancestry.com, the genealogy site, stores around 2.5
petabytes of data.
• The Internet Archive stores around 2 petabytes of data, and
is growing at a rate of 20 terabytes per month.
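
The IDC figure can be checked with a little arithmetic. The sketch below assumes decimal (SI) units, where 1 exabyte = 10^9 GB, and reads the forecast as strict doubling every two years. The helper name `projected_exabytes` is made up for illustration and is not part of the IDC report.

```python
# Assumption: SI (decimal) units, 1 exabyte = 10**9 gigabytes.
GB_PER_EB = 10**9

def exabytes_to_gigabytes(exabytes):
    """Convert exabytes to gigabytes using decimal units."""
    return exabytes * GB_PER_EB

def projected_exabytes(size_2020_eb, year):
    """Hypothetical helper: scale the 2020 figure, assuming the volume
    doubles every two years (and halves going back in time)."""
    return size_2020_eb * 2 ** ((year - 2020) / 2)

total_gb = exabytes_to_gigabytes(40_000)
print(f"{total_gb:,} GB")                # 40,000,000,000,000 GB, i.e. 40 trillion GB
print(projected_exabytes(40_000, 2018))  # 20000.0 (one doubling period earlier)
```

Under these assumptions, 40,000 exabytes and 40 trillion GB are the same quantity, and the implied 2018 volume is half the 2020 figure.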
Some BIG facts – What happens every day?
• The New York Stock Exchange generates about one
terabyte of new trade data
• Zynga processes 1 petabyte of content
• 30 billion pieces of content are added to Facebook
• 2 billion videos are watched on YouTube
• 2.5 quintillion bytes of data are created
Some BIG facts – What happens every minute?

Courtesy: http://practicalanalytics.files.wordpress.com
Big data – Objective

Effectively store, manage and analyze all
the data to create meaningful information
out of it
Big data – Sources
Big data – 3 V’s of Big data

Courtesy: bigdatablog.emc.com
Big data – 3 + 1 V’s of Big data

Courtesy: http://www.datasciencecentral.com/
Big data - Volume

Volumes are in:
• Terabytes
• Petabytes
• Exabytes
• Zettabytes

Courtesy: http://www.datasciencecentral.com/
Big data - Volume

Name                 Value
1 Gigabyte (GB)      1,073,741,824 bytes
1 Terabyte (TB)      1024 GB
1 Petabyte (PB)      1,048,576 GB
1 Exabyte (EB)       1,073,741,824 GB
1 Zettabyte (ZB)     1,099,511,627,776 GB
1 Yottabyte (YB)     1,125,899,906,842,624 GB

Courtesy: http://www.datasciencecentral.com/
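
The values on the Volume slide follow one rule: each unit is 1024 times the previous one. The short sketch below reproduces them under that assumption (binary multiples). The names `UNITS` and `size_in_gb` are illustrative only.

```python
# Assumption: binary multiples, as on the slide (1 TB = 1024 GB).
UNITS = ["GB", "TB", "PB", "EB", "ZB", "YB"]

def size_in_gb(unit):
    """Return how many gigabytes one `unit` holds (a factor of 1024 per step up)."""
    return 1024 ** UNITS.index(unit)

# One gigabyte expressed in bytes (three steps of 1024 above a single byte).
bytes_per_gb = 1024 ** 3

for unit in UNITS[1:]:
    print(f"1 {unit} = {size_in_gb(unit):,} GB")
```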
Big data - Velocity

• Live Stream
• Real time
• Batch

Courtesy: http://www.datasciencecentral.com/
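
The difference between batch and stream processing can be sketched in a few lines. This is a toy, single-machine illustration with made-up sample readings, not how any particular streaming product works: the batch function needs the whole data set up front, while the streaming function updates its answer as each value arrives.

```python
def batch_average(readings):
    """Batch: all data is collected first, then processed in one pass."""
    return sum(readings) / len(readings)

def streaming_average(readings):
    """Stream: update the result as each value arrives, yielding a running mean."""
    count, total = 0, 0.0
    for value in readings:
        count += 1
        total += value
        yield total / count

data = [10, 20, 30, 40]  # made-up sample readings
print(batch_average(data))            # 25.0
print(list(streaming_average(data)))  # [10.0, 15.0, 20.0, 25.0]
```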
Big data - Variety

• Structured (Tables)
• Unstructured (Tweets, SMSes)
• Semi-structured (Logfiles, RFID)

Courtesy: http://www.datasciencecentral.com/
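
The three kinds of data call for different handling. The sketch below uses only the Python standard library, and all sample values are invented for illustration: a CSV table is read by column name, a log line is split with a loose pattern, and a tweet is treated as free text from which only hashtags are pulled.

```python
import csv
import io
import re

# Structured: fixed columns, e.g. a table exported as CSV.
table = io.StringIO("id,name,amount\n1,alice,30\n2,bob,45\n")
rows = list(csv.DictReader(table))

# Semi-structured: a log line has a loose pattern that a regex can pull apart.
log_line = "2014-03-01 12:00:05 ERROR disk full on node7"
match = re.match(r"(\S+) (\S+) (\w+) (.*)", log_line)
date, time, level, message = match.groups()

# Unstructured: free text has no schema; here we only extract hashtags.
tweet = "Loving the #bigdata talk today! #hadoop"
hashtags = re.findall(r"#(\w+)", tweet)

print(rows[1]["name"], level, hashtags)  # bob ERROR ['bigdata', 'hadoop']
```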
Big data - Veracity

• Veracity (uncertain or imprecise data) is often
overlooked
• It is now considered as important as the
3 V’s of Big Data
• Effort to clean up data is rarely
given importance
• Poor data quality costs the U.S.
economy around $3.1 trillion a
year

Source: McKinsey, Gartner, Twitter, Cisco, EMC, SAS, IBM, MEPTEC, QAS
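
One way to make veracity visible is to attach a simple quality score to each data set. The sketch below is a hypothetical scoring rule, not an established metric: it reports the share of records in which every required field is present and non-empty. The function name, field names, and sample records are all assumptions for illustration.

```python
def veracity_score(records, required_fields):
    """Hypothetical score: share of records where every required field
    is present and non-empty (1.0 means all records are complete)."""
    if not records:
        return 0.0
    complete = sum(
        1 for record in records
        if all(record.get(field) not in (None, "") for field in required_fields)
    )
    return complete / len(records)

records = [
    {"user": "alice", "city": "Chennai"},
    {"user": "bob", "city": ""},        # empty value
    {"user": None, "city": "Geneva"},   # missing value
    {"user": "carol", "city": "Paris"},
]
print(veracity_score(records, ["user", "city"]))  # 0.5
```

A real score would likely also weigh accuracy, freshness, and source reliability; completeness is just the easiest part to measure.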
Big data Technologies
Technologies & Solution providers:
• Storage (Microsoft SQL Server, Apache Hadoop, MongoDB)
• Processing (MapReduce, Impala)
• Analytics (SAS, R, Business Intelligence)
• Integration (Flume, Sqoop)
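
To give a feel for the MapReduce processing model, here is a word count written as map, shuffle, and reduce steps. This is a single-process imitation in plain Python, not the Hadoop API; the function names and sample documents are illustrative assumptions.

```python
from collections import defaultdict

def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]

def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    """Reduce: sum the values for each key."""
    return {key: sum(values) for key, values in grouped.items()}

documents = ["big data is big", "data is everywhere"]
pairs = [pair for doc in documents for pair in map_phase(doc)]
counts = reduce_phase(shuffle(pairs))
print(counts)  # {'big': 2, 'data': 2, 'is': 2, 'everywhere': 1}
```

In a real cluster the map and reduce steps run in parallel on many machines, and the framework performs the shuffle between them.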
Big data - Opportunities
• Storage
• Processing
• Analytics
• Integration
• Solution
Big data – Major Players
Big data – Questions?
Big data – Thank you !!!

Welcome to big data

Editor's Notes

  1. Data veracity, meaning uncertain or imprecise data, is often overlooked yet may be as important as the 3 V's of Big Data: Volume, Velocity and Variety. Traditional data warehouse / business intelligence (DW/BI) architecture assumes certain and precise data, which is achieved by spending unreasonably large amounts of human capital on data preparation, ETL/ELT and master data management. Yet the big data revolution forces us to rethink the traditional DW/BI architecture to accept massive amounts of both structured and unstructured data at great velocity. By definition, unstructured data contains a significant amount of uncertain and imprecise data. For example, social media data is inherently uncertain. Considering the variety and velocity of big data, an organization can no longer commit time and resources to traditional ETL/ELT and data preparation to clean up the data and make it certain and precise for analysis. While there are tools to help automate data preparation and cleansing, they are still in the pre-industrial age. As a result, organizations must now analyze both structured and unstructured data that is uncertain and imprecise. The level of uncertainty and imprecision varies on a case-by-case basis yet must be factored in. It may be prudent to assign a Data Veracity score and ranking to specific data sets to avoid making decisions based on analysis of uncertain and imprecise data.