SlideShare una empresa de Scribd logo
1 de 15
Research Issues in the Big Data
and its Challenges
Dr.A.Kathirvel, Professor & Head
Vivekanandha College of Engg for Women
(Autonomous)
23.09.2013
Big Data Every Where!
• Lots of data is being collected
and warehoused
– Web data, e-commerce
– purchases at department/
grocery stores
– Bank/Credit Card
transactions
– Social Network
How much data?
• Google processes 20 PB a day (2008)
• Wayback Machine has 3 PB + 100 TB/month (3/2009)
• Facebook has 2.5 PB of user data + 15 TB/day (4/2009)
• eBay has 6.5 PB of user data + 50 TB/day (5/2009)
• CERN’s Large Hydron Collider (LHC) generates 15 PB a
year
640K ought to be
enough for anybody.
Explosion in Quantity of Data
Explosion in Quantity of Data
Our Data-driven World
• Science
– Data bases from astronomy, genomics, environmental data, transportation
data, …
• Humanities and Social Sciences
– Scanned books, historical documents, social interactions data, new technology
like GPS …
• Business & Commerce
– Corporate sales, stock market transactions, census, airline traffic, …
• Entertainment
– Internet images, Hollywood movies, MP3 files, …
• Medicine
– MRI & CT scans, patient records, …
Big Data Characteristics
How big is the Big Data?
- What is big today maybe not big tomorrow
Big Data Vectors (4Vs)
- Any data that can challenge our current technology in
some manner can consider as Big Data
- Volume
- Communication
- Speed of Generating
- Meaningful Analysis
"Big Data are high-volume, high-velocity, high-variety, and/or high-value
information assets that require new forms of processing to enable enhanced
decision making, insight discovery and process optimization”
Gartner 2012
Big Data Characteristics
Big Data Vectors (4Vs)
- High-volume
amount of data
- High-velocity
Speed rate in collecting or acquiring or generating or processing of data
- High-variety
different data type such as audio, video, image data (mostly unstructured data)
- High-value
cost
Cost Problem (example)
Cost of processing 1 Petabyte of data with 1000 node?
1 PB = 1015
B = 1 million gigabytes = 1 thousand terabytes
- 9 hours for each node to process 500GB at rate of 15MB/S
- 15*60*60*9 = 486000MB ~ 500 GB
- 1000 * 9 * 0.34$ = 3060$ for single run
- 1 PB = 1000000 / 500 = 2000 * 9 =
18000 h /24 = 750 Day
- The cost for 1000 cloud node each
processing 1PB
2000 * 3060$ = 6,120,000$
Importance of Big Data
- Government
In 2012, the Obama administration announced the Big Data Research
and Development Initiative
84 different big data programs spread across six departments
- Private Sector
- Walmart handles more than 1 million customer transactions every hour,
which is imported into databases estimated to contain more than
2.5 petabytes of data
- Facebook handles 40 billion photos from its user base.
- Falcon Credit Card Fraud Detection System protects 2.1 billion active
accounts world-wide
- Science
- Large Synoptic Survey Telescope will generate
140 Terabyte of data every 5 days.
- Large Hardon Colider 13 Petabyte data produced in 2010
- Medical computation like decoding human Genome
- Social science revolution
- New way of science (Microscope example)
Importance of Big Data
• Job
- The U.S. could face a shortage by 2018 of 140,000 to 190,000 people with "deep
analytical talent" and of 1.5 million people capable of analyzing data in ways that
enable business decisions. (McKinsey & Co)
- Big Data industry is worth more than $100 billion
growing at almost 10% a year (roughly twice as fast as the software business)
 Technology Player in this field
 Oracle
 Exadata
 Microsoft
 HDInsight Server
 IBM
 Netezza
Some Challenges in Big Data
 Big Data Integration is Multidisciplinary
Less than 10% of Big Data world are genuinely relational
Meaningful data integration in the real, messy, schema-less and
complex Big Data world of database and semantic web using
multidisciplinary and multi-technology methode
 The Billion Triple Challenge
Web of data contain 31 billion RDf triples, that 446million of them are
RDF links, 13 Billion government data, 6 Billion geographic data, 4.6
Billion Publication and Media data, 3 Billion life science data
BTC 2011, Sindice 2011
 The Linked Open Data Ripper
Mapping, Ranking, Visualization, Key Matching, Snappiness
 Demonstrate the Value of Semantics: let data integration drive DBMS
technology
Large volumes of heterogeneous data, like link data and RDF
Implementation of Big Data
Platforms for Large-scale Data Analysis
• Parallel DBMS technologies
– Proposed in late eighties
– Matured over the last two decades
– Multi-billion dollar industry: Proprietary DBMS Engines intended as
Data Warehousing solutions for very large enterprises
• Map Reduce
– pioneered by Google
– popularized by Yahoo! (Hadoop)
Implementation of Big Data
MapReduce
• Overview:
– Data-parallel programming model
– An associated parallel and
distributed
implementation for commodity clusters
• Pioneered by Google
– Processes 20 PB of data per day
• Popularized by open-source Hadoop
– Used by Yahoo!, Facebook,
Amazon, and the list is growing …
Parallel DBMS technologies
 Popularly used for more than two decades
 Research Projects: Gamma, Grace, …
 Commercial: Multi-billion dollar
industry but access to only a privileged
few
 Relational Data Model
 Indexing
 Familiar SQL interface
 Advanced query optimization
 Well understood and studied
Conclusion
2013 2020
x50
• As of 2009, the entire World Wide Web was estimated to contain close to 500
exabytes. This is a half zettabyte.
• The total amount of global data is expected to grow to 2.7 zettabytes during 2013.
This is 48% up from 2012.
• The term big data used by different vendors this may refer to the technology
which includes tools and processes that an organization requires to handle the
large amounts of data and storage facilities.
• Though the potential of analytics and Big Data is clear, one of the challenges
noticed is a significant shortage of data scientists with deep analytical training in
data discovery, predictive modeling, open source statistical solutions, visualization
skills and business acumen to be able to frame and interpret analyses.
Toney Hey, Stwart Tansley and Kristin Tolle, “The
Fourth Paradigm Data-Intensive Scientific Discovery”,
Microsoft Press, 2009.
Book Referred

Más contenido relacionado

La actualidad más candente

Big data lecture notes
Big data lecture notesBig data lecture notes
Big data lecture notesMohit Saini
 
Big data PPT prepared by Hritika Raj (Shivalik college of engg.)
Big data PPT prepared by Hritika Raj (Shivalik college of engg.)Big data PPT prepared by Hritika Raj (Shivalik college of engg.)
Big data PPT prepared by Hritika Raj (Shivalik college of engg.)Hritika Raj
 
Big data ppt
Big data pptBig data ppt
Big data pptYash Raj
 
Big data
Big dataBig data
Big datahsn99
 
Big Data Evolution
Big Data EvolutionBig Data Evolution
Big Data Evolutionitnewsafrica
 
Big data privacy issues in public social media
Big data privacy issues in public social mediaBig data privacy issues in public social media
Big data privacy issues in public social mediaSupriya Radhakrishna
 
Big data (word file)
Big data  (word file)Big data  (word file)
Big data (word file)Shahbaz Anjam
 
Latest Update Bigdata in indonesia
Latest Update Bigdata in indonesiaLatest Update Bigdata in indonesia
Latest Update Bigdata in indonesiaHeru Sutadi
 
The Pros and Cons of Big Data in an ePatient World
The Pros and Cons of Big Data in an ePatient WorldThe Pros and Cons of Big Data in an ePatient World
The Pros and Cons of Big Data in an ePatient WorldPYA, P.C.
 
Chapter 4 what is data and data types
Chapter 4  what is data and data typesChapter 4  what is data and data types
Chapter 4 what is data and data typesPro Guide
 
Big Data & Analytics (Conceptual and Practical Introduction)
Big Data & Analytics (Conceptual and Practical Introduction)Big Data & Analytics (Conceptual and Practical Introduction)
Big Data & Analytics (Conceptual and Practical Introduction)Yaman Hajja, Ph.D.
 
Team 2 Big Data Presentation
Team 2 Big Data PresentationTeam 2 Big Data Presentation
Team 2 Big Data PresentationMatthew Urdan
 

La actualidad más candente (20)

Big data lecture notes
Big data lecture notesBig data lecture notes
Big data lecture notes
 
Big data PPT prepared by Hritika Raj (Shivalik college of engg.)
Big data PPT prepared by Hritika Raj (Shivalik college of engg.)Big data PPT prepared by Hritika Raj (Shivalik college of engg.)
Big data PPT prepared by Hritika Raj (Shivalik college of engg.)
 
Big data ppt
Big data pptBig data ppt
Big data ppt
 
Big data
Big dataBig data
Big data
 
A Big Data Concept
A Big Data ConceptA Big Data Concept
A Big Data Concept
 
Big Data Evolution
Big Data EvolutionBig Data Evolution
Big Data Evolution
 
Big data
Big dataBig data
Big data
 
Big data privacy issues in public social media
Big data privacy issues in public social mediaBig data privacy issues in public social media
Big data privacy issues in public social media
 
Big data (word file)
Big data  (word file)Big data  (word file)
Big data (word file)
 
Big data ppt
Big data pptBig data ppt
Big data ppt
 
Latest Update Bigdata in indonesia
Latest Update Bigdata in indonesiaLatest Update Bigdata in indonesia
Latest Update Bigdata in indonesia
 
Big data
Big dataBig data
Big data
 
Big data
Big dataBig data
Big data
 
Big data
Big dataBig data
Big data
 
The Pros and Cons of Big Data in an ePatient World
The Pros and Cons of Big Data in an ePatient WorldThe Pros and Cons of Big Data in an ePatient World
The Pros and Cons of Big Data in an ePatient World
 
Presentation on Big Data
Presentation on Big DataPresentation on Big Data
Presentation on Big Data
 
Big Data Presentation
Big  Data PresentationBig  Data Presentation
Big Data Presentation
 
Chapter 4 what is data and data types
Chapter 4  what is data and data typesChapter 4  what is data and data types
Chapter 4 what is data and data types
 
Big Data & Analytics (Conceptual and Practical Introduction)
Big Data & Analytics (Conceptual and Practical Introduction)Big Data & Analytics (Conceptual and Practical Introduction)
Big Data & Analytics (Conceptual and Practical Introduction)
 
Team 2 Big Data Presentation
Team 2 Big Data PresentationTeam 2 Big Data Presentation
Team 2 Big Data Presentation
 

Similar a Research issues in the big data and its Challenges

Big data and Internet
Big data and InternetBig data and Internet
Big data and InternetSanoj Kumar
 
Big Data By Vijay Bhaskar Semwal
Big Data By Vijay Bhaskar SemwalBig Data By Vijay Bhaskar Semwal
Big Data By Vijay Bhaskar SemwalIIIT Allahabad
 
ppt final.pptx
ppt final.pptxppt final.pptx
ppt final.pptxkalai75
 
Bigdatappt 140225061440-phpapp01
Bigdatappt 140225061440-phpapp01Bigdatappt 140225061440-phpapp01
Bigdatappt 140225061440-phpapp01nayanbhatia2
 
BIGDATAPrepared ByMuhammad Abrar UddinIntrodu.docx
BIGDATAPrepared ByMuhammad Abrar UddinIntrodu.docxBIGDATAPrepared ByMuhammad Abrar UddinIntrodu.docx
BIGDATAPrepared ByMuhammad Abrar UddinIntrodu.docxtangyechloe
 
big-data-8722-m8RQ3h1.pptx
big-data-8722-m8RQ3h1.pptxbig-data-8722-m8RQ3h1.pptx
big-data-8722-m8RQ3h1.pptxVaishnavGhadge1
 
Geospatial Intelligence Middle East 2013_Big Data_Steven Ramage
Geospatial Intelligence Middle East 2013_Big Data_Steven RamageGeospatial Intelligence Middle East 2013_Big Data_Steven Ramage
Geospatial Intelligence Middle East 2013_Big Data_Steven RamageSteven Ramage
 
Special issues on big data
Special issues on big dataSpecial issues on big data
Special issues on big dataVedanand Singh
 
In memory big data management and processing
In memory big data management and processingIn memory big data management and processing
In memory big data management and processingPranav Gontalwar
 

Similar a Research issues in the big data and its Challenges (20)

Big data and Internet
Big data and InternetBig data and Internet
Big data and Internet
 
Big Data By Vijay Bhaskar Semwal
Big Data By Vijay Bhaskar SemwalBig Data By Vijay Bhaskar Semwal
Big Data By Vijay Bhaskar Semwal
 
ppt final.pptx
ppt final.pptxppt final.pptx
ppt final.pptx
 
Bigdatappt 140225061440-phpapp01
Bigdatappt 140225061440-phpapp01Bigdatappt 140225061440-phpapp01
Bigdatappt 140225061440-phpapp01
 
Big data
Big dataBig data
Big data
 
Big data ppt
Big  data pptBig  data ppt
Big data ppt
 
Big Data ppt
Big Data pptBig Data ppt
Big Data ppt
 
Big data ankita1
Big data ankita1Big data ankita1
Big data ankita1
 
Kartikey tripathi
Kartikey tripathiKartikey tripathi
Kartikey tripathi
 
Data mining with big data
Data mining with big dataData mining with big data
Data mining with big data
 
Big_Data_ppt[1] (1).pptx
Big_Data_ppt[1] (1).pptxBig_Data_ppt[1] (1).pptx
Big_Data_ppt[1] (1).pptx
 
Bigdata " new level"
Bigdata " new level"Bigdata " new level"
Bigdata " new level"
 
BIGDATAPrepared ByMuhammad Abrar UddinIntrodu.docx
BIGDATAPrepared ByMuhammad Abrar UddinIntrodu.docxBIGDATAPrepared ByMuhammad Abrar UddinIntrodu.docx
BIGDATAPrepared ByMuhammad Abrar UddinIntrodu.docx
 
Our big data
Our big dataOur big data
Our big data
 
Big Data et eGovernment
Big Data et eGovernmentBig Data et eGovernment
Big Data et eGovernment
 
big-data-8722-m8RQ3h1.pptx
big-data-8722-m8RQ3h1.pptxbig-data-8722-m8RQ3h1.pptx
big-data-8722-m8RQ3h1.pptx
 
Big Data - Gerami
Big Data - GeramiBig Data - Gerami
Big Data - Gerami
 
Geospatial Intelligence Middle East 2013_Big Data_Steven Ramage
Geospatial Intelligence Middle East 2013_Big Data_Steven RamageGeospatial Intelligence Middle East 2013_Big Data_Steven Ramage
Geospatial Intelligence Middle East 2013_Big Data_Steven Ramage
 
Special issues on big data
Special issues on big dataSpecial issues on big data
Special issues on big data
 
In memory big data management and processing
In memory big data management and processingIn memory big data management and processing
In memory big data management and processing
 

Más de Kathirvel Ayyaswamy

22cs201 COMPUTER ORGANIZATION AND ARCHITECTURE
22cs201 COMPUTER ORGANIZATION AND ARCHITECTURE22cs201 COMPUTER ORGANIZATION AND ARCHITECTURE
22cs201 COMPUTER ORGANIZATION AND ARCHITECTUREKathirvel Ayyaswamy
 
20CS2021-Distributed Computing module 2
20CS2021-Distributed Computing module 220CS2021-Distributed Computing module 2
20CS2021-Distributed Computing module 2Kathirvel Ayyaswamy
 
Recent Trends in IoT and Sustainability
Recent Trends in IoT and SustainabilityRecent Trends in IoT and Sustainability
Recent Trends in IoT and SustainabilityKathirvel Ayyaswamy
 
18CS2005 Cryptography and Network Security
18CS2005 Cryptography and Network Security18CS2005 Cryptography and Network Security
18CS2005 Cryptography and Network SecurityKathirvel Ayyaswamy
 
18CS2005 Cryptography and Network Security
18CS2005 Cryptography and Network Security18CS2005 Cryptography and Network Security
18CS2005 Cryptography and Network SecurityKathirvel Ayyaswamy
 
18CS2005 Cryptography and Network Security
18CS2005 Cryptography and Network Security18CS2005 Cryptography and Network Security
18CS2005 Cryptography and Network SecurityKathirvel Ayyaswamy
 
18CS2005 Cryptography and Network Security
18CS2005 Cryptography and Network Security 18CS2005 Cryptography and Network Security
18CS2005 Cryptography and Network Security Kathirvel Ayyaswamy
 
18CS2005 Cryptography and Network Security
18CS2005 Cryptography and Network Security18CS2005 Cryptography and Network Security
18CS2005 Cryptography and Network SecurityKathirvel Ayyaswamy
 
18CS2005 Cryptography and Network Security
18CS2005 Cryptography and Network Security18CS2005 Cryptography and Network Security
18CS2005 Cryptography and Network SecurityKathirvel Ayyaswamy
 
20CS024 Ethics in Information Technology
20CS024 Ethics in Information Technology20CS024 Ethics in Information Technology
20CS024 Ethics in Information TechnologyKathirvel Ayyaswamy
 

Más de Kathirvel Ayyaswamy (20)

22CS201 COA
22CS201 COA22CS201 COA
22CS201 COA
 
22cs201 COMPUTER ORGANIZATION AND ARCHITECTURE
22cs201 COMPUTER ORGANIZATION AND ARCHITECTURE22cs201 COMPUTER ORGANIZATION AND ARCHITECTURE
22cs201 COMPUTER ORGANIZATION AND ARCHITECTURE
 
22CS201 COA
22CS201 COA22CS201 COA
22CS201 COA
 
18CS3040_Distributed Systems
18CS3040_Distributed Systems18CS3040_Distributed Systems
18CS3040_Distributed Systems
 
20CS2021-Distributed Computing module 2
20CS2021-Distributed Computing module 220CS2021-Distributed Computing module 2
20CS2021-Distributed Computing module 2
 
18CS3040 Distributed System
18CS3040 Distributed System	18CS3040 Distributed System
18CS3040 Distributed System
 
20CS2021 Distributed Computing
20CS2021 Distributed Computing 20CS2021 Distributed Computing
20CS2021 Distributed Computing
 
20CS2021 DISTRIBUTED COMPUTING
20CS2021 DISTRIBUTED COMPUTING20CS2021 DISTRIBUTED COMPUTING
20CS2021 DISTRIBUTED COMPUTING
 
18CS3040 DISTRIBUTED SYSTEMS
18CS3040 DISTRIBUTED SYSTEMS18CS3040 DISTRIBUTED SYSTEMS
18CS3040 DISTRIBUTED SYSTEMS
 
Recent Trends in IoT and Sustainability
Recent Trends in IoT and SustainabilityRecent Trends in IoT and Sustainability
Recent Trends in IoT and Sustainability
 
20CS2008 Computer Networks
20CS2008 Computer Networks 20CS2008 Computer Networks
20CS2008 Computer Networks
 
18CS2005 Cryptography and Network Security
18CS2005 Cryptography and Network Security18CS2005 Cryptography and Network Security
18CS2005 Cryptography and Network Security
 
18CS2005 Cryptography and Network Security
18CS2005 Cryptography and Network Security18CS2005 Cryptography and Network Security
18CS2005 Cryptography and Network Security
 
18CS2005 Cryptography and Network Security
18CS2005 Cryptography and Network Security18CS2005 Cryptography and Network Security
18CS2005 Cryptography and Network Security
 
18CS2005 Cryptography and Network Security
18CS2005 Cryptography and Network Security 18CS2005 Cryptography and Network Security
18CS2005 Cryptography and Network Security
 
18CS2005 Cryptography and Network Security
18CS2005 Cryptography and Network Security18CS2005 Cryptography and Network Security
18CS2005 Cryptography and Network Security
 
18CS2005 Cryptography and Network Security
18CS2005 Cryptography and Network Security18CS2005 Cryptography and Network Security
18CS2005 Cryptography and Network Security
 
20CS2008 Computer Networks
20CS2008 Computer Networks20CS2008 Computer Networks
20CS2008 Computer Networks
 
20CS2008 Computer Networks
20CS2008 Computer Networks 20CS2008 Computer Networks
20CS2008 Computer Networks
 
20CS024 Ethics in Information Technology
20CS024 Ethics in Information Technology20CS024 Ethics in Information Technology
20CS024 Ethics in Information Technology
 

Último

Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104misteraugie
 
Z Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot GraphZ Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot GraphThiyagu K
 
Web & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdfWeb & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdfJayanti Pande
 
mini mental status format.docx
mini    mental       status     format.docxmini    mental       status     format.docx
mini mental status format.docxPoojaSen20
 
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdfssuser54595a
 
Grant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingGrant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingTechSoup
 
Interactive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationInteractive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationnomboosow
 
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions  for the students and aspirants of Chemistry12th.pptxOrganic Name Reactions  for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions for the students and aspirants of Chemistry12th.pptxVS Mahajan Coaching Centre
 
Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3JemimahLaneBuaron
 
Measures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeMeasures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeThiyagu K
 
Beyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactBeyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactPECB
 
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxPOINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxSayali Powar
 
Separation of Lanthanides/ Lanthanides and Actinides
Separation of Lanthanides/ Lanthanides and ActinidesSeparation of Lanthanides/ Lanthanides and Actinides
Separation of Lanthanides/ Lanthanides and ActinidesFatimaKhan178732
 
Sanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfSanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfsanyamsingh5019
 
Paris 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityParis 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityGeoBlogs
 
Hybridoma Technology ( Production , Purification , and Application )
Hybridoma Technology  ( Production , Purification , and Application  ) Hybridoma Technology  ( Production , Purification , and Application  )
Hybridoma Technology ( Production , Purification , and Application ) Sakshi Ghasle
 

Último (20)

Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104
 
Mattingly "AI & Prompt Design: The Basics of Prompt Design"
Mattingly "AI & Prompt Design: The Basics of Prompt Design"Mattingly "AI & Prompt Design: The Basics of Prompt Design"
Mattingly "AI & Prompt Design: The Basics of Prompt Design"
 
Z Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot GraphZ Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot Graph
 
Web & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdfWeb & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdf
 
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
 
TataKelola dan KamSiber Kecerdasan Buatan v022.pdf
TataKelola dan KamSiber Kecerdasan Buatan v022.pdfTataKelola dan KamSiber Kecerdasan Buatan v022.pdf
TataKelola dan KamSiber Kecerdasan Buatan v022.pdf
 
mini mental status format.docx
mini    mental       status     format.docxmini    mental       status     format.docx
mini mental status format.docx
 
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
 
Grant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingGrant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy Consulting
 
Interactive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationInteractive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communication
 
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions  for the students and aspirants of Chemistry12th.pptxOrganic Name Reactions  for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
 
Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3
 
Measures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeMeasures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and Mode
 
Beyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactBeyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global Impact
 
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxPOINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
 
Separation of Lanthanides/ Lanthanides and Actinides
Separation of Lanthanides/ Lanthanides and ActinidesSeparation of Lanthanides/ Lanthanides and Actinides
Separation of Lanthanides/ Lanthanides and Actinides
 
Sanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfSanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdf
 
Paris 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityParis 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activity
 
Código Creativo y Arte de Software | Unidad 1
Código Creativo y Arte de Software | Unidad 1Código Creativo y Arte de Software | Unidad 1
Código Creativo y Arte de Software | Unidad 1
 
Hybridoma Technology ( Production , Purification , and Application )
Hybridoma Technology  ( Production , Purification , and Application  ) Hybridoma Technology  ( Production , Purification , and Application  )
Hybridoma Technology ( Production , Purification , and Application )
 

Research issues in the big data and its Challenges

  • 1. Research Issues in the Big Data and its Challenges Dr.A.Kathirvel, Professor & Head Vivekanandha College of Engg for Women (Autonomous) 23.09.2013
  • 2. Big Data Every Where! • Lots of data is being collected and warehoused – Web data, e-commerce – purchases at department/ grocery stores – Bank/Credit Card transactions – Social Network
  • 3. How much data? • Google processes 20 PB a day (2008) • Wayback Machine has 3 PB + 100 TB/month (3/2009) • Facebook has 2.5 PB of user data + 15 TB/day (4/2009) • eBay has 6.5 PB of user data + 50 TB/day (5/2009) • CERN’s Large Hydron Collider (LHC) generates 15 PB a year 640K ought to be enough for anybody.
  • 5. Explosion in Quantity of Data Our Data-driven World • Science – Data bases from astronomy, genomics, environmental data, transportation data, … • Humanities and Social Sciences – Scanned books, historical documents, social interactions data, new technology like GPS … • Business & Commerce – Corporate sales, stock market transactions, census, airline traffic, … • Entertainment – Internet images, Hollywood movies, MP3 files, … • Medicine – MRI & CT scans, patient records, …
  • 6. Big Data Characteristics How big is the Big Data? - What is big today maybe not big tomorrow Big Data Vectors (4Vs) - Any data that can challenge our current technology in some manner can consider as Big Data - Volume - Communication - Speed of Generating - Meaningful Analysis "Big Data are high-volume, high-velocity, high-variety, and/or high-value information assets that require new forms of processing to enable enhanced decision making, insight discovery and process optimization” Gartner 2012
  • 7. Big Data Characteristics Big Data Vectors (4Vs) - High-volume amount of data - High-velocity Speed rate in collecting or acquiring or generating or processing of data - High-variety different data type such as audio, video, image data (mostly unstructured data) - High-value cost
  • 8. Cost Problem (example) Cost of processing 1 Petabyte of data with 1000 node? 1 PB = 1015 B = 1 million gigabytes = 1 thousand terabytes - 9 hours for each node to process 500GB at rate of 15MB/S - 15*60*60*9 = 486000MB ~ 500 GB - 1000 * 9 * 0.34$ = 3060$ for single run - 1 PB = 1000000 / 500 = 2000 * 9 = 18000 h /24 = 750 Day - The cost for 1000 cloud node each processing 1PB 2000 * 3060$ = 6,120,000$
  • 9. Importance of Big Data - Government In 2012, the Obama administration announced the Big Data Research and Development Initiative 84 different big data programs spread across six departments - Private Sector - Walmart handles more than 1 million customer transactions every hour, which is imported into databases estimated to contain more than 2.5 petabytes of data - Facebook handles 40 billion photos from its user base. - Falcon Credit Card Fraud Detection System protects 2.1 billion active accounts world-wide - Science - Large Synoptic Survey Telescope will generate 140 Terabyte of data every 5 days. - Large Hardon Colider 13 Petabyte data produced in 2010 - Medical computation like decoding human Genome - Social science revolution - New way of science (Microscope example)
  • 10. Importance of Big Data • Job - The U.S. could face a shortage by 2018 of 140,000 to 190,000 people with "deep analytical talent" and of 1.5 million people capable of analyzing data in ways that enable business decisions. (McKinsey & Co) - Big Data industry is worth more than $100 billion growing at almost 10% a year (roughly twice as fast as the software business)  Technology Player in this field  Oracle  Exadata  Microsoft  HDInsight Server  IBM  Netezza
  • 11. Some Challenges in Big Data  Big Data Integration is Multidisciplinary Less than 10% of Big Data world are genuinely relational Meaningful data integration in the real, messy, schema-less and complex Big Data world of database and semantic web using multidisciplinary and multi-technology methode  The Billion Triple Challenge Web of data contain 31 billion RDf triples, that 446million of them are RDF links, 13 Billion government data, 6 Billion geographic data, 4.6 Billion Publication and Media data, 3 Billion life science data BTC 2011, Sindice 2011  The Linked Open Data Ripper Mapping, Ranking, Visualization, Key Matching, Snappiness  Demonstrate the Value of Semantics: let data integration drive DBMS technology Large volumes of heterogeneous data, like link data and RDF
  • 12. Implementation of Big Data Platforms for Large-scale Data Analysis • Parallel DBMS technologies – Proposed in late eighties – Matured over the last two decades – Multi-billion dollar industry: Proprietary DBMS Engines intended as Data Warehousing solutions for very large enterprises • Map Reduce – pioneered by Google – popularized by Yahoo! (Hadoop)
  • 13. Implementation of Big Data MapReduce • Overview: – Data-parallel programming model – An associated parallel and distributed implementation for commodity clusters • Pioneered by Google – Processes 20 PB of data per day • Popularized by open-source Hadoop – Used by Yahoo!, Facebook, Amazon, and the list is growing … Parallel DBMS technologies  Popularly used for more than two decades  Research Projects: Gamma, Grace, …  Commercial: Multi-billion dollar industry but access to only a privileged few  Relational Data Model  Indexing  Familiar SQL interface  Advanced query optimization  Well understood and studied
  • 14. Conclusion 2013 2020 x50 • As of 2009, the entire World Wide Web was estimated to contain close to 500 exabytes. This is a half zettabyte. • The total amount of global data is expected to grow to 2.7 zettabytes during 2013. This is 48% up from 2012. • The term big data used by different vendors this may refer to the technology which includes tools and processes that an organization requires to handle the large amounts of data and storage facilities. • Though the potential of analytics and Big Data is clear, one of the challenges noticed is a significant shortage of data scientists with deep analytical training in data discovery, predictive modeling, open source statistical solutions, visualization skills and business acumen to be able to frame and interpret analyses.
  • 15. Toney Hey, Stwart Tansley and Kristin Tolle, “The Fourth Paradigm Data-Intensive Scientific Discovery”, Microsoft Press, 2009. Book Referred