SlideShare una empresa de Scribd logo
1 de 43
Big Data and Me Bhupesh Bansal Feb 3, 2012
Relational Model Architecture Reference :  http:// www.slideshare.net / adorepump / voldemort-nosql
Linkedin 2006 Reference :  http://www.slideshare.net/linkedin/linked-in-javaone-2008-tech-session-comm
Relational model ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Specialized Systems Architecture Reference :  http:// www.slideshare.net / adorepump / voldemort-nosql
Linkedin 2007 Reference :  http://www.slideshare.net/linkedin/linked-in-javaone-2008-tech-session-comm
Specialized systems ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Batch Driven Architecture Reference :   http:// www.slideshare.net / bhupeshbansal /hadoop-user-group-jan2010
Motivation I : Big Data  02/06/12 Reference :  algo2.iti.kit.edu/.../fopraext/index.html
Motivation II: Data Driven Features
Motivation III: Makes Money  02/06/12 Proprietary & Confidential
Motivation IV: Big Data is cool 02/06/12
Reference : http:// www.slideshare.net / BenSiscovick /the-business-of-big-data-ia-ventures-8577588
Big Data Challenges ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Why is this hard ? ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Some good news !! ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
What works !! ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
What doesn’t works !! ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Open source ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Open source: Storage ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Open source: Publish/Subscribe ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Open source: Real time analysis ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Open source: Search ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Open source: Batch computation ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Open source: Other ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
My personal picks !! ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Jeff Dean’s Thoughts ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],Reference  : http:// www.slideshare.net / xlight /google-designs-lessons-and-advice-from-building-large-distributed-systems
How Voldemort was born ? Reference : 1)  http://www.slideshare.net/bhupeshbansal/hadoop-user-group-jan2010 2)  http://www.slideshare.net/adorepump/voldemort-nosql
Why NoSQL ? ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Inspired By Amazon Dynamo & Memcached ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
ACID Vs CAP ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],Proprietary & Confidential 02/06/12
Consistent Hashing ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],Proprietary & Confidential 02/06/12
R+W > N  ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],Proprietary & Confidential 02/06/12
Versioning & Conflict Resolution ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Vector Clock ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],Proprietary & Confidential 02/06/12
Client API ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Voldemort Physical Deployment
 
Read-only storage engine ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],Reference :   http:// www.slideshare.net / bhupeshbansal /hadoop-user-group-jan2010
What do we use Hadoop/Voldemort for ? Proprietary & Confidential 02/06/12
Batch Driven Architecture Reference :   http:// www.slideshare.net / bhupeshbansal /hadoop-user-group-jan2010
Data Flow Driven Architecture Reference : http:// sna-projects.com /blog/2011/08/ kafka /
Questions

Más contenido relacionado

La actualidad más candente

Introduction to Data Analyst Training
Introduction to Data Analyst TrainingIntroduction to Data Analyst Training
Introduction to Data Analyst Training
Cloudera, Inc.
 
From HadoopDB to Hadapt: A Case Study of Transitioning a VLDB paper into Real...
From HadoopDB to Hadapt: A Case Study of Transitioning a VLDB paper into Real...From HadoopDB to Hadapt: A Case Study of Transitioning a VLDB paper into Real...
From HadoopDB to Hadapt: A Case Study of Transitioning a VLDB paper into Real...
Daniel Abadi
 
Dynamo and BigTable in light of the CAP theorem
Dynamo and BigTable in light of the CAP theoremDynamo and BigTable in light of the CAP theorem
Dynamo and BigTable in light of the CAP theorem
Grisha Weintraub
 

La actualidad más candente (20)

File Context
File ContextFile Context
File Context
 
Introduction to Data Analyst Training
Introduction to Data Analyst TrainingIntroduction to Data Analyst Training
Introduction to Data Analyst Training
 
Next-Gen Decision Making in Under 2ms
Next-Gen Decision Making in Under 2msNext-Gen Decision Making in Under 2ms
Next-Gen Decision Making in Under 2ms
 
From HadoopDB to Hadapt: A Case Study of Transitioning a VLDB paper into Real...
From HadoopDB to Hadapt: A Case Study of Transitioning a VLDB paper into Real...From HadoopDB to Hadapt: A Case Study of Transitioning a VLDB paper into Real...
From HadoopDB to Hadapt: A Case Study of Transitioning a VLDB paper into Real...
 
Apache Spark 2.4 Bridges the Gap Between Big Data and Deep Learning
Apache Spark 2.4 Bridges the Gap Between Big Data and Deep LearningApache Spark 2.4 Bridges the Gap Between Big Data and Deep Learning
Apache Spark 2.4 Bridges the Gap Between Big Data and Deep Learning
 
Hadoop and Graph Data Management: Challenges and Opportunities
Hadoop and Graph Data Management: Challenges and OpportunitiesHadoop and Graph Data Management: Challenges and Opportunities
Hadoop and Graph Data Management: Challenges and Opportunities
 
Hadoop and Voldemort @ LinkedIn
Hadoop and Voldemort @ LinkedInHadoop and Voldemort @ LinkedIn
Hadoop and Voldemort @ LinkedIn
 
Why Spark Is the Next Top (Compute) Model
Why Spark Is the Next Top (Compute) ModelWhy Spark Is the Next Top (Compute) Model
Why Spark Is the Next Top (Compute) Model
 
Comparing Accumulo, Cassandra, and HBase
Comparing Accumulo, Cassandra, and HBaseComparing Accumulo, Cassandra, and HBase
Comparing Accumulo, Cassandra, and HBase
 
Daniel Abadi: VLDB 2009 Panel
Daniel Abadi: VLDB 2009 PanelDaniel Abadi: VLDB 2009 Panel
Daniel Abadi: VLDB 2009 Panel
 
Meetup ml spark_ppt
Meetup ml spark_pptMeetup ml spark_ppt
Meetup ml spark_ppt
 
Front Range PHP NoSQL Databases
Front Range PHP NoSQL DatabasesFront Range PHP NoSQL Databases
Front Range PHP NoSQL Databases
 
UnConference for Georgia Southern Computer Science March 31, 2015
UnConference for Georgia Southern Computer Science March 31, 2015UnConference for Georgia Southern Computer Science March 31, 2015
UnConference for Georgia Southern Computer Science March 31, 2015
 
Dynamo and BigTable in light of the CAP theorem
Dynamo and BigTable in light of the CAP theoremDynamo and BigTable in light of the CAP theorem
Dynamo and BigTable in light of the CAP theorem
 
Introduction to Hadoop and MapReduce
Introduction to Hadoop and MapReduceIntroduction to Hadoop and MapReduce
Introduction to Hadoop and MapReduce
 
Have your cake and eat it too
Have your cake and eat it tooHave your cake and eat it too
Have your cake and eat it too
 
Allyourbase
AllyourbaseAllyourbase
Allyourbase
 
Presentation on Hadoop Technology
Presentation on Hadoop TechnologyPresentation on Hadoop Technology
Presentation on Hadoop Technology
 
Speed it up and Spark it up at Intel
Speed it up and Spark it up at IntelSpeed it up and Spark it up at Intel
Speed it up and Spark it up at Intel
 
Clustering van IT-componenten
Clustering van IT-componentenClustering van IT-componenten
Clustering van IT-componenten
 

Similar a Bhupeshbansal bigdata

Scalable Web Architectures: Common Patterns and Approaches - Web 2.0 Expo NYC
Scalable Web Architectures: Common Patterns and Approaches - Web 2.0 Expo NYCScalable Web Architectures: Common Patterns and Approaches - Web 2.0 Expo NYC
Scalable Web Architectures: Common Patterns and Approaches - Web 2.0 Expo NYC
Cal Henderson
 
Distributed Systems: scalability and high availability
Distributed Systems: scalability and high availabilityDistributed Systems: scalability and high availability
Distributed Systems: scalability and high availability
Renato Lucindo
 
Web20expo Scalable Web Arch
Web20expo Scalable Web ArchWeb20expo Scalable Web Arch
Web20expo Scalable Web Arch
guest18a0f1
 

Similar a Bhupeshbansal bigdata (20)

Voldemort & Hadoop @ Linkedin, Hadoop User Group Jan 2010
Voldemort & Hadoop @ Linkedin, Hadoop User Group Jan 2010Voldemort & Hadoop @ Linkedin, Hadoop User Group Jan 2010
Voldemort & Hadoop @ Linkedin, Hadoop User Group Jan 2010
 
Scalable Web Architectures: Common Patterns and Approaches - Web 2.0 Expo NYC
Scalable Web Architectures: Common Patterns and Approaches - Web 2.0 Expo NYCScalable Web Architectures: Common Patterns and Approaches - Web 2.0 Expo NYC
Scalable Web Architectures: Common Patterns and Approaches - Web 2.0 Expo NYC
 
Distributed Systems: scalability and high availability
Distributed Systems: scalability and high availabilityDistributed Systems: scalability and high availability
Distributed Systems: scalability and high availability
 
Big data concepts
Big data conceptsBig data concepts
Big data concepts
 
Schemaless Databases
Schemaless DatabasesSchemaless Databases
Schemaless Databases
 
Web20expo Scalable Web Arch
Web20expo Scalable Web ArchWeb20expo Scalable Web Arch
Web20expo Scalable Web Arch
 
Web20expo Scalable Web Arch
Web20expo Scalable Web ArchWeb20expo Scalable Web Arch
Web20expo Scalable Web Arch
 
Web20expo Scalable Web Arch
Web20expo Scalable Web ArchWeb20expo Scalable Web Arch
Web20expo Scalable Web Arch
 
NoSQL Introduction, Theory, Implementations
NoSQL Introduction, Theory, ImplementationsNoSQL Introduction, Theory, Implementations
NoSQL Introduction, Theory, Implementations
 
Java One 2017: Open Source Big Data in the Cloud: Hadoop, M/R, Hive, Spark an...
Java One 2017: Open Source Big Data in the Cloud: Hadoop, M/R, Hive, Spark an...Java One 2017: Open Source Big Data in the Cloud: Hadoop, M/R, Hive, Spark an...
Java One 2017: Open Source Big Data in the Cloud: Hadoop, M/R, Hive, Spark an...
 
Building a high-performance data lake analytics engine at Alibaba Cloud with ...
Building a high-performance data lake analytics engine at Alibaba Cloud with ...Building a high-performance data lake analytics engine at Alibaba Cloud with ...
Building a high-performance data lake analytics engine at Alibaba Cloud with ...
 
The Open Source and Cloud Part of Oracle Big Data Cloud Service for Beginners
The Open Source and Cloud Part of Oracle Big Data Cloud Service for BeginnersThe Open Source and Cloud Part of Oracle Big Data Cloud Service for Beginners
The Open Source and Cloud Part of Oracle Big Data Cloud Service for Beginners
 
How can Hadoop & SAP be integrated
How can Hadoop & SAP be integratedHow can Hadoop & SAP be integrated
How can Hadoop & SAP be integrated
 
عصر کلان داده، چرا و چگونه؟
عصر کلان داده، چرا و چگونه؟عصر کلان داده، چرا و چگونه؟
عصر کلان داده، چرا و چگونه؟
 
Hadoop basics
Hadoop basicsHadoop basics
Hadoop basics
 
Compressed Introduction to Hadoop, SQL-on-Hadoop and NoSQL
Compressed Introduction to Hadoop, SQL-on-Hadoop and NoSQLCompressed Introduction to Hadoop, SQL-on-Hadoop and NoSQL
Compressed Introduction to Hadoop, SQL-on-Hadoop and NoSQL
 
Agile data warehousing
Agile data warehousingAgile data warehousing
Agile data warehousing
 
No sql
No sqlNo sql
No sql
 
Nonrelational Databases
Nonrelational DatabasesNonrelational Databases
Nonrelational Databases
 
Need for Time series Database
Need for Time series DatabaseNeed for Time series Database
Need for Time series Database
 

Último

IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
Enterprise Knowledge
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
Joaquim Jorge
 

Último (20)

2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Tech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfTech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdf
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 

Bhupeshbansal bigdata

Notas del editor

  1. Example: member data--does not make sense to repeatedly join positions, emails, groups, etc. Explain about joins How to better model in java? Json like data model
  2. Example: member data--does not make sense to repeatedly join positions, emails, groups, etc. Explain about joins How to better model in java? Json like data model
  3. Statistical learning as the ultimate agile development tool (Peter Norvig), “business logic” through data rather than code
  4. No Joins Across data domains due to APIs Within data domains due to performance Natural operation: getAll(id…) Latency: if you want to call 30 services on your main pages, they better be quick (30 * 20ms = 600ms)
  5. - Strong Consistency: all clients see the same view, even in presence of updates - High Availability: all clients can find some replica of the data, even in the presence of failures Partition-tolerance: the system properties hold even when the system is partitioned high availability : Mantra for websites Better to deal with inconsistencies, because their primary need is to scale well to allow for a smooth user experience.
  6. Hashing .. Why do we need it ?? Basic problem : Clients need to know which data is where ?? Many ways of solving it Central configuration Hashing Linear hashing works : issue is when cluster is dynamic ?? KeyHash –node IDmapping change for a lot of entries When you add new slots Consistent hashing : preserves key –Node mapping for most of the keys and only change the minimal amount needed How to do it ?? Number of partitions ---------------------------- Arbitrary , each node is allocated many partitions (better load balancing and fault tolerance) Few hundreds to few thousands .. Key  partition mapping is fixed and only ownership of partitions can change
  7. Give example of read and writes with vector clocks Pros and cons vs paxos and 2pc User can supply strategy for handling cases where v1 and v2 are not comparable.
  8. Fancy way of doing Optimistic locking
  9. Very simple APIS NO Range Scans .. . No iterator on KeySet / Entry SET : Very hard to fix performance Have plans to provide such an iterator
  10. Explain about partitions Make things fast by removing slow things, not by tuning HTTP client not performant Separate caching layer
  11. Transfer time: 30 minutes Can max out a gb network, so be careful
  12. Example: member data--does not make sense to repeatedly join positions, emails, groups, etc. Explain about joins How to better model in java? Json like data model
  13. Questions, comments, etc