Solr
The Search First NoSQL Database
Who Am I?
• Mark Miller: Cloudera employee, Lucene PMC member, Apache member
• Started playing with Lucene in 2006
• Lucene committer since 2008
• Solr committer since 2009
My Dog
Big Data is getting Bigger
• The total Big Data market reached $11.4 billion in 2012
• The Big Data market is projected to reach $18.1 billion in
2013, an annual growth of 61%
• On pace to exceed $47 billion by 2017.
3 basic needs
• Storage
• Processing
• Search
Two Standouts in
the Big Data Market
•Hadoop
•NoSQL
Ultimately, the NoSQL market is largely up for
grabs. Each NoSQL database has its related
strengths and weaknesses, and no one NoSQL
database currently “does it all.” Big Data
practitioners must take a number of factors into
consideration when selecting a NoSQL database
to facilitate large-scale transactional workloads,
including scalability, performance, security, and
ease-of-development.
Big Data Vendor Revenue and Market Forecast
(Wikibon)
RDBMS
• The classic way to store your data.
• ACID is great, transactions are cool, SQL is well
known and understood.
• Scaling is *hard*, but possible (see Facebook’s
MySQL cluster)
• ‘impedance mismatch’ sucks
Search
• Search has been moving from an expensive,
complicated option to an affordable, easier-to-adopt
necessity.
• Lots of data begs for the ability to process it, store it,
and search it.
Enterprise Search
Engines
• Verity - acquired by Autonomy in 2005
• FAST - acquired by Microsoft in 2008
• Endeca - acquired by Oracle in 2011
• Autonomy - acquired by HP in 2011
• Vivisimo - acquired by IBM in 2012
NoSQL
• Not Only SQL rather than ‘No SQL’
• Except that makes little sense...
• “when ‘NoSQL’ is applied to a database, it refers to
an ill-defined set of mostly open-source databases,
mostly developed in the early 21st century, and
mostly not using SQL.” - NoSQL Distilled
NoSQL
• Key-Value
• Columnar
• Document
• Graph
In the beginning..
• BerkeleyDB (1991?)
• Lotus Notes (1989?)
• Bayou (1996?)
In the beginning of
the modern era...
• BigTable (Google) (started in 2004, paper in 2006)
• Dynamo (Amazon) (paper in 2007)
Derivatives
• Dynamo: Cassandra, CouchDB, Voldemort, Riak
• BigTable: Cassandra, HBase, Redis, HyperTable,
Accumulo
Also...
• AppEngine storage built on BigTable
• DynamoDB - based on the principles of Dynamo
When it comes to NoSQL,
Open Source rules the
roost.
• I won’t be talking about any solution that is not
based on Open Source - only because those
solutions are not popular.
• “there’s a notion that NoSQL is an open-source
phenomenon.” - NoSQL Distilled
The 2013 Future of Open
Source Survey Results
Black Duck and North Bridge
What’s Popular?
• NoSQL database proliferation - NoSQL databases are
a dime a dozen. Why?
• Which solutions should we look at?
indeed.com
• Indeed.com is an employment-related metasearch
engine for job listings
• Indeed is the #1 job site worldwide, with over 100
million unique visitors per month. Indeed is available
in more than 50 countries and 26 languages,
covering 94% of global GDP.
http://db-engines.com
• DB-Engines is an initiative to collect and present
information on database management systems
(DBMS). In addition to established relational DBMS,
systems and concepts of the growing NoSQL area
are emphasized.
• The DB-Engines Ranking is a list of DBMS ranked by
their current popularity. The list is updated monthly.
Popular Search Job
Trends
Popular Search
Solutions (DB-Engines)
Popular NoSQL Job
Trends
Let’s get some
context
Compare to Java
Add in Oracle...
NoSQL Database
Types
• Key-Value
• Column Family
• Document
• Graph
I’m going to ignore
Graph...everyone
else seems to...
Popular NoSQL
Document Stores
(DB-Engines)
Key-Value Stores
Columnar Stores
The Full Popularity
Contest
In case you forgot,
Oracle is in the
NoSQL game...
• Oracle NoSQL
CAP Theorem
The CAP theorem, also known as Brewer's theorem,
states that it is impossible for a distributed computer
system to simultaneously provide all three of the
following guarantees:
• Consistency (all nodes see the same data at the
same time)
• Availability (a guarantee that every request
receives a response about whether it was
successful or failed)
• Partition tolerance (the system continues to
operate despite arbitrary message loss or failure of
part of the system)
CAP
Architectures
• For NoSQL, it generally boils down to AP or CP. CA
does not support partition tolerance.
• You have to trade off consistency versus availability.
• AP favors availability over consistency - this is the
eventually consistent architecture.
• CP favors consistency over availability.
• Of course, there is a continuum between AP and CP.
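One common way to picture the continuum between AP and CP is quorum tuning as used in Dynamo-style stores: with N replicas, a write waits for W acknowledgements and a read consults R replicas. The sketch below is illustrative only; the function name and the N/W/R parameters are assumptions, not something taken from the slides. It checks by brute force whether every read set must overlap every write set, which holds exactly when R + W > N.

```python
from itertools import combinations


def read_overlaps_latest_write(n: int, w: int, r: int) -> bool:
    """Return True if every R-replica read set intersects every W-replica write set."""
    replicas = range(n)
    return all(
        set(write_set) & set(read_set)
        for write_set in combinations(replicas, w)
        for read_set in combinations(replicas, r)
    )


# Leaning CP: overlapping quorums, so a read always reaches a replica with the latest write.
print(read_overlaps_latest_write(n=3, w=2, r=2))  # True
# Leaning AP: single-replica reads and writes stay available but may return stale data.
print(read_overlaps_latest_write(n=3, w=1, r=1))  # False
```

Raising W and R moves a system toward the CP end; lowering them moves it toward AP.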
Key Design
Decisions
• Data Model - how is the data stored/accessed
• Distribution Model - how is the data distributed
• Conflict Resolution - how is it ensured that the same
update ‘wins’ on each node.
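For the same update to win on each node, the resolution rule has to be deterministic and independent of the order in which updates arrive. A minimal sketch, assuming a last-write-wins rule with a node id as tie-breaker (the `Update` type and its fields are hypothetical, chosen only for illustration):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Update:
    value: str
    timestamp: int  # logical or wall-clock time attached by the writer
    node_id: str    # hypothetical tie-breaker so equal timestamps still order


def resolve(updates):
    """Pick the winner by (timestamp, node_id); arrival order does not matter."""
    return max(updates, key=lambda u: (u.timestamp, u.node_id))


a = Update("blue", timestamp=10, node_id="n1")
b = Update("green", timestamp=10, node_id="n2")
c = Update("red", timestamp=7, node_id="n3")

# Two nodes receive the same updates in different orders and still agree.
print(resolve([a, b, c]).value)  # green
print(resolve([c, b, a]).value)  # green
```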
Data Model
• key -> value (opaque)
• key -> document
• column oriented
Distributed Model
• Roughly, how is data distributed across the cluster?
• Sharding, replication, etc
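As a rough illustration of sharding plus replication, the sketch below routes a document id to a shard with a stable hash and then lists the replicas that would hold a copy. The function names, the node naming scheme, and the choice of MD5 as the hash are all assumptions made for the example, not how any particular database does it.

```python
import hashlib


def shard_for(doc_id: str, num_shards: int) -> int:
    """Map a document id to a shard with a stable hash (not Python's salted hash())."""
    digest = hashlib.md5(doc_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def placement(doc_id: str, num_shards: int, replicas_per_shard: int):
    """Return hypothetical node names holding copies of the document."""
    shard = shard_for(doc_id, num_shards)
    return [f"shard{shard}-replica{i}" for i in range(replicas_per_shard)]


print(placement("doc-42", num_shards=4, replicas_per_shard=2))
```

Sharding decides which slice of the cluster owns a document; replication decides how many copies of that slice exist.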
Data Versioning and
Consistency
• Essentially, how is data kept consistent across nodes?
• Sequential consistency—ensuring that all nodes
apply operations in the same order.
• Update consistency and read consistency.
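A small sketch of why applying operations in the same order matters: replicas that replay an identical ordered log converge, while a replica that sees the same operations in a different order can end up somewhere else. The log format and `apply_log` helper are assumptions made for the example.

```python
def apply_log(log):
    """Apply (op, key, value) entries in order and return the resulting state."""
    state = {}
    for op, key, value in log:
        if op == "set":
            state[key] = value
        elif op == "delete":
            state.pop(key, None)
    return state


log = [("set", "x", 1), ("set", "x", 2), ("delete", "y", None), ("set", "y", 3)]

# Replicas that apply the same ordered log converge on the same state.
replica_a = apply_log(log)
replica_b = apply_log(log)
print(replica_a == replica_b)  # True

# A replica that applies the same operations in a different order can diverge.
reordered = [log[1], log[0], log[3], log[2]]
print(apply_log(reordered))  # {'x': 1}
```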
MongoDB
• Data Model - BSON, a binary JSON format
• Distributed Model - sharded asynchronous master/
slave replication.
• Data Versioning and Consistency - Master / Slave, per
table write lock
MongoDB Search
• Built-in text search. I think of it like RDBMS built-in
full text search - major feature gaps compared with dedicated
full text search engines, and likely major
performance gaps.
• Common to sit a search engine next to MongoDB
Cassandra
• Data Model - column based, like BigTable
• Distributed Updates - similar to Dynamo, consistent
hashing, master-master
• Data Versioning and Consistency - timestamps
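Consistent hashing, as mentioned above, can be sketched with a small hash ring. This is a generic illustration, not Cassandra's actual partitioner: the `HashRing` class, the node names, the use of MD5, and the number of virtual nodes are all assumptions. The point it shows is that adding a node only moves keys onto the new node; everything else stays where it was.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:8], "big")


class HashRing:
    """Minimal consistent-hash ring; node names are placeholders."""

    def __init__(self, nodes, vnodes=64):
        # Each node gets several virtual points on the ring to even out load.
        self._points = sorted(
            (_hash(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        self._hashes = [h for h, _ in self._points]

    def node_for(self, key: str) -> str:
        # Walk clockwise to the first point at or after the key's hash, wrapping around.
        idx = bisect.bisect_left(self._hashes, _hash(key)) % len(self._points)
        return self._points[idx][1]


keys = [f"row-{i}" for i in range(1000)]
before = HashRing(["node-a", "node-b", "node-c"])
after = HashRing(["node-a", "node-b", "node-c", "node-d"])

moved = [k for k in keys if before.node_for(k) != after.node_for(k)]
# Every key that moved now lives on the new node; the rest stay put.
print(all(after.node_for(k) == "node-d" for k in moved))  # True
print(f"{len(moved)} of {len(keys)} keys moved")
```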
Cassandra Search
• Lucandra
• Solandra
• DataStax Enterprise Search (Solr fields must be
strings)
HBase
• Data Model - Column Store
• Distribution Model - regions served by region
servers.
• Versioning and Consistency - strongly consistent
HBase Search
• HBasene (dead?)
• HBASE-SEARCH, HBASE-3529 (dead?)
• Solbase
• Lily
Riak
• Riak is a NoSQL database implementing the
principles from Amazon's Dynamo paper
• Data Model - stores key/value pairs in a high level
namespace called a bucket.
• Data Versioning and Consistency - Riak uses a data
structure called a vector clock to reason about
causality and staleness of stored values. (Can also
use timestamps). Last write wins, or client resolves
conflict.
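To make the vector clock idea concrete, here is a minimal sketch of how two clocks can be compared to decide whether one value supersedes another or whether the writes were concurrent and need resolving. The dictionary representation and the function names are assumptions for illustration, not Riak's internal format.

```python
def descends(a: dict, b: dict) -> bool:
    """True if clock a has seen everything clock b has (a >= b on every node)."""
    return all(a.get(node, 0) >= count for node, count in b.items())


def compare(a: dict, b: dict) -> str:
    a_ge_b, b_ge_a = descends(a, b), descends(b, a)
    if a_ge_b and b_ge_a:
        return "equal"
    if a_ge_b:
        return "a is newer"
    if b_ge_a:
        return "b is newer"
    return "concurrent"  # neither write saw the other, so a conflict must be resolved


v1 = {"node1": 2, "node2": 1}
v2 = {"node1": 1, "node2": 1}
v3 = {"node1": 1, "node2": 2}

print(compare(v1, v2))  # a is newer
print(compare(v1, v3))  # concurrent
```

In the concurrent case the store can fall back to last write wins or hand both values to the client, matching the two options listed above.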
Riak Search
• Riak Search - custom search engine, Solr-like API
• Yokozuna
Yokozuna Author Enumerates
Common Reasons Custom Search
has Failed
• Pretends to be lucene/solr
• Lack of analyzer/language/features
• Bad performance/resource usage for certain queries
• Basho is not in the business of search
CouchDB
• CouchDB’s data format is JSON stored as documents
(self-contained records with no intrinsic
relationships), grouped into “database” namespaces.
• Conflicts are left to the application to resolve at write
time. CouchDB arbitrarily, but deterministically,
determines a winner and tracks a conflict. The client
must then resolve the conflict.
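"Arbitrarily, but deterministically" means every node applies the same rule to the same set of conflicting revisions and so arrives at the same winner without coordinating. The sketch below is a simplified illustration of that idea, assuming revision ids of the form `N-hash`; the exact rule used here (higher generation wins, ties broken by string comparison of the hash) is an assumption for the example and glosses over details such as deleted revisions.

```python
def pick_winner(revisions):
    """Choose a winner among conflicting revision ids of the form 'N-hash'.

    Assumed rule for illustration: the higher generation N wins; ties are broken
    by comparing the hash part as a string, so every node picks the same revision.
    """
    def sort_key(rev):
        generation, _, digest = rev.partition("-")
        return (int(generation), digest)

    winner = max(revisions, key=sort_key)
    losers = sorted(r for r in revisions if r != winner)
    return winner, losers


winner, conflicts = pick_winner(["2-7051cbe5", "2-b91bb807"])
print(winner)     # 2-b91bb807
print(conflicts)  # ['2-7051cbe5']
```

The losing revisions are kept and flagged as conflicts so the client can merge or discard them later.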
CouchDB Search
• CouchDB-Lucene
• Seems people usually just sit a search engine next to
CouchDB
• Redis is an open-source, networked, in-memory, key-
value data store with optional durability.
• Memcached is a general-purpose distributed memory
caching system
• Redis-Search
Adding Search to
NoSQL
• Hard to do without a lot of compromise
• Build your own, or use Lucene or Lucene based
solution
• Nothing has yet set the world on fire...
Adding NoSQL to
Search
• Search solutions are generally already a Document
based NoSQL solution.
• Seems a lot easier to do than the reverse
• Nothing has yet set the world on fire...
Solr NoSQL
Features
• Realtime-Get
• Update Durability
• Atomic Compare and Set
• Versioning and optimistic locking
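Versioning with optimistic locking can be sketched with a tiny in-memory store. This is not a Solr client: `TinyStore`, `VersionConflict`, and the counter-based version numbers are hypothetical. The rules it encodes (an expected version greater than 1 must match exactly, 1 means the document must exist, a negative value means it must not exist, 0 means no check) follow the convention commonly described for Solr's `_version_` field; treat that mapping as an assumption and check the reference guide before relying on it.

```python
import itertools


class VersionConflict(Exception):
    """Raised when the caller's expected version does not match the stored one."""


class TinyStore:
    """In-memory stand-in for a versioned document store (not a Solr client)."""

    def __init__(self):
        self._docs = {}
        self._clock = itertools.count(100)  # arbitrary increasing version numbers

    def get(self, doc_id):
        """Return the latest stored copy, including its _version_, or None."""
        return self._docs.get(doc_id)

    def put(self, doc, expected_version=0):
        current = self._docs.get(doc["id"])
        if expected_version > 1 and (
            current is None or current["_version_"] != expected_version
        ):
            raise VersionConflict("document changed since it was read")
        if expected_version == 1 and current is None:
            raise VersionConflict("document must already exist")
        if expected_version < 0 and current is not None:
            raise VersionConflict("document must not exist yet")
        stored = dict(doc, _version_=next(self._clock))
        self._docs[doc["id"]] = stored
        return stored["_version_"]


store = TinyStore()
v1 = store.put({"id": "book1", "stock": 5}, expected_version=-1)  # create only
store.put({"id": "book1", "stock": 4}, expected_version=v1)       # compare and set succeeds
try:
    store.put({"id": "book1", "stock": 3}, expected_version=v1)   # stale version
except VersionConflict as err:
    print("conflict:", err)  # conflict: document changed since it was read
```

A client that hits the conflict re-reads the document, reapplies its change, and retries with the fresh version.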
Schemaless?
• NoSQL databases are generally ‘schemaless’
• In some ways convenient, in other ways not.
• Implicit schema moves to application code.
• Can’t optimize based on types.
• Note: some are calling ‘guessed’ schemas
schemaless.
SolrCloud
• Most similar to the MongoDB architecture
• A CP system, though currently eventually consistent.
• The architecture supports adding strong consistency
options.
SolrCloud
• The length of time an inconsistency is present is
called the inconsistency window.
• SolrCloud has a very small inconsistency window.
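As a worked illustration of the inconsistency window, the sketch below measures the time between a write being accepted and the slowest replica applying it, and checks whether a read served by a lagging replica would be stale. The function names and the timings are made up for the example and are not measurements of SolrCloud.

```python
def inconsistency_window(write_time, replica_apply_times):
    """Time between a write being accepted and the slowest replica applying it."""
    return max(replica_apply_times) - write_time


def is_stale(read_time, replica_apply_time):
    """A read served by a replica before it applied the write sees the old value."""
    return read_time < replica_apply_time


# Hypothetical timings in milliseconds: the leader applies at 0, replicas slightly later.
apply_times = [0.0, 3.5, 12.0]
print(inconsistency_window(0.0, apply_times))             # 12.0
print(is_stale(read_time=5.0, replica_apply_time=12.0))   # True
print(is_stale(read_time=20.0, replica_apply_time=12.0))  # False
```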
Data Model
• key -> document
• Optionally, column oriented
Contact Info
• @heismark
• markrmiller@gmail.com

Más contenido relacionado

La actualidad más candente

CosmosDB for DBAs & Developers
CosmosDB for DBAs & DevelopersCosmosDB for DBAs & Developers
CosmosDB for DBAs & DevelopersNiko Neugebauer
 
Scaling MySQL using Fabric
Scaling MySQL using FabricScaling MySQL using Fabric
Scaling MySQL using FabricKarthik .P.R
 
MySQL HA Percona cluster @ MySQL meetup Mumbai
MySQL HA Percona cluster @ MySQL meetup MumbaiMySQL HA Percona cluster @ MySQL meetup Mumbai
MySQL HA Percona cluster @ MySQL meetup MumbaiRemote MySQL DBA
 
Scaling with Riak at Showyou
Scaling with Riak at ShowyouScaling with Riak at Showyou
Scaling with Riak at ShowyouJohn Muellerleile
 
Application Development with Apache Cassandra as a Service
Application Development with Apache Cassandra as a ServiceApplication Development with Apache Cassandra as a Service
Application Development with Apache Cassandra as a ServiceWSO2
 
Rolling With Riak
Rolling With RiakRolling With Riak
Rolling With RiakJohn Lynch
 
Cassandra @ Sony: The good, the bad, and the ugly part 1
Cassandra @ Sony: The good, the bad, and the ugly part 1Cassandra @ Sony: The good, the bad, and the ugly part 1
Cassandra @ Sony: The good, the bad, and the ugly part 1DataStax Academy
 
Case study of Rujhaan.com (A social news app )
Case study of Rujhaan.com (A social news app )Case study of Rujhaan.com (A social news app )
Case study of Rujhaan.com (A social news app )Rahul Jain
 
Searching The Enterprise Data Lake With Solr - Watch Us Do It!: Presented by...
Searching The Enterprise Data Lake With Solr  - Watch Us Do It!: Presented by...Searching The Enterprise Data Lake With Solr  - Watch Us Do It!: Presented by...
Searching The Enterprise Data Lake With Solr - Watch Us Do It!: Presented by...Lucidworks
 
HBaseCon 2015 General Session: Zen - A Graph Data Model on HBase
HBaseCon 2015 General Session: Zen - A Graph Data Model on HBaseHBaseCon 2015 General Session: Zen - A Graph Data Model on HBase
HBaseCon 2015 General Session: Zen - A Graph Data Model on HBaseHBaseCon
 
Solr + Hadoop = Big Data Search
Solr + Hadoop = Big Data SearchSolr + Hadoop = Big Data Search
Solr + Hadoop = Big Data SearchMark Miller
 
keyvi the key value index @ Cliqz
keyvi the key value index @ Cliqzkeyvi the key value index @ Cliqz
keyvi the key value index @ CliqzHendrik Muhs
 
Tech Spark Presentation
Tech Spark PresentationTech Spark Presentation
Tech Spark PresentationStephen Borg
 
Modern MySQL Monitoring and Dashboards.
Modern MySQL Monitoring and Dashboards.Modern MySQL Monitoring and Dashboards.
Modern MySQL Monitoring and Dashboards.Mydbops
 
Thug feb 23 2015 Chen Zhang
Thug feb 23 2015 Chen ZhangThug feb 23 2015 Chen Zhang
Thug feb 23 2015 Chen ZhangChen Zhang
 
Hadoop for the Absolute Beginner
Hadoop for the Absolute BeginnerHadoop for the Absolute Beginner
Hadoop for the Absolute BeginnerIke Ellis
 
Spark as a Platform to Support Multi-Tenancy and Many Kinds of Data Applicati...
Spark as a Platform to Support Multi-Tenancy and Many Kinds of Data Applicati...Spark as a Platform to Support Multi-Tenancy and Many Kinds of Data Applicati...
Spark as a Platform to Support Multi-Tenancy and Many Kinds of Data Applicati...Spark Summit
 

La actualidad más candente (20)

CosmosDB for DBAs & Developers
CosmosDB for DBAs & DevelopersCosmosDB for DBAs & Developers
CosmosDB for DBAs & Developers
 
Scaling MySQL using Fabric
Scaling MySQL using FabricScaling MySQL using Fabric
Scaling MySQL using Fabric
 
MySQL HA Percona cluster @ MySQL meetup Mumbai
MySQL HA Percona cluster @ MySQL meetup MumbaiMySQL HA Percona cluster @ MySQL meetup Mumbai
MySQL HA Percona cluster @ MySQL meetup Mumbai
 
Scaling with Riak at Showyou
Scaling with Riak at ShowyouScaling with Riak at Showyou
Scaling with Riak at Showyou
 
Application Development with Apache Cassandra as a Service
Application Development with Apache Cassandra as a ServiceApplication Development with Apache Cassandra as a Service
Application Development with Apache Cassandra as a Service
 
Rolling With Riak
Rolling With RiakRolling With Riak
Rolling With Riak
 
Cassandra @ Sony: The good, the bad, and the ugly part 1
Cassandra @ Sony: The good, the bad, and the ugly part 1Cassandra @ Sony: The good, the bad, and the ugly part 1
Cassandra @ Sony: The good, the bad, and the ugly part 1
 
Rails on HBase
Rails on HBaseRails on HBase
Rails on HBase
 
Case study of Rujhaan.com (A social news app )
Case study of Rujhaan.com (A social news app )Case study of Rujhaan.com (A social news app )
Case study of Rujhaan.com (A social news app )
 
How and when to use NoSQL
How and when to use NoSQLHow and when to use NoSQL
How and when to use NoSQL
 
Searching The Enterprise Data Lake With Solr - Watch Us Do It!: Presented by...
Searching The Enterprise Data Lake With Solr  - Watch Us Do It!: Presented by...Searching The Enterprise Data Lake With Solr  - Watch Us Do It!: Presented by...
Searching The Enterprise Data Lake With Solr - Watch Us Do It!: Presented by...
 
HBaseCon 2015 General Session: Zen - A Graph Data Model on HBase
HBaseCon 2015 General Session: Zen - A Graph Data Model on HBaseHBaseCon 2015 General Session: Zen - A Graph Data Model on HBase
HBaseCon 2015 General Session: Zen - A Graph Data Model on HBase
 
Solr + Hadoop = Big Data Search
Solr + Hadoop = Big Data SearchSolr + Hadoop = Big Data Search
Solr + Hadoop = Big Data Search
 
keyvi the key value index @ Cliqz
keyvi the key value index @ Cliqzkeyvi the key value index @ Cliqz
keyvi the key value index @ Cliqz
 
Tech Spark Presentation
Tech Spark PresentationTech Spark Presentation
Tech Spark Presentation
 
Modern MySQL Monitoring and Dashboards.
Modern MySQL Monitoring and Dashboards.Modern MySQL Monitoring and Dashboards.
Modern MySQL Monitoring and Dashboards.
 
Thug feb 23 2015 Chen Zhang
Thug feb 23 2015 Chen ZhangThug feb 23 2015 Chen Zhang
Thug feb 23 2015 Chen Zhang
 
Apache Spark in Industry
Apache Spark in IndustryApache Spark in Industry
Apache Spark in Industry
 
Hadoop for the Absolute Beginner
Hadoop for the Absolute BeginnerHadoop for the Absolute Beginner
Hadoop for the Absolute Beginner
 
Spark as a Platform to Support Multi-Tenancy and Many Kinds of Data Applicati...
Spark as a Platform to Support Multi-Tenancy and Many Kinds of Data Applicati...Spark as a Platform to Support Multi-Tenancy and Many Kinds of Data Applicati...
Spark as a Platform to Support Multi-Tenancy and Many Kinds of Data Applicati...
 

Destacado

Solr Application Development Tutorial
Solr Application Development TutorialSolr Application Development Tutorial
Solr Application Development TutorialErik Hatcher
 
Hadoop operations-2015-hadoop-summit-san-jose-v5
Hadoop operations-2015-hadoop-summit-san-jose-v5Hadoop operations-2015-hadoop-summit-san-jose-v5
Hadoop operations-2015-hadoop-summit-san-jose-v5Chris Nauroth
 
The Many Facets of Apache Solr - Yonik Seeley
The Many Facets of Apache Solr - Yonik SeeleyThe Many Facets of Apache Solr - Yonik Seeley
The Many Facets of Apache Solr - Yonik Seeleylucenerevolution
 
Faceted Search with Lucene
Faceted Search with LuceneFaceted Search with Lucene
Faceted Search with Lucenelucenerevolution
 
Solr on HDFS - Past, Present, and Future: Presented by Mark Miller, Cloudera
Solr on HDFS - Past, Present, and Future: Presented by Mark Miller, ClouderaSolr on HDFS - Past, Present, and Future: Presented by Mark Miller, Cloudera
Solr on HDFS - Past, Present, and Future: Presented by Mark Miller, ClouderaLucidworks
 
The history of Prometheus at SoundCloud
The history of Prometheus at SoundCloudThe history of Prometheus at SoundCloud
The history of Prometheus at SoundCloudTobias Schmidt
 
Rebuilding Solr 6 examples - layer by layer (LuceneSolrRevolution 2016)
Rebuilding Solr 6 examples - layer by layer (LuceneSolrRevolution 2016)Rebuilding Solr 6 examples - layer by layer (LuceneSolrRevolution 2016)
Rebuilding Solr 6 examples - layer by layer (LuceneSolrRevolution 2016)Alexandre Rafalovitch
 
Introduction to Apache Solr.
Introduction to Apache Solr.Introduction to Apache Solr.
Introduction to Apache Solr.ashish0x90
 
Big data lecture notes
Big data lecture notesBig data lecture notes
Big data lecture notesMohit Saini
 
Data Modeling for NoSQL
Data Modeling for NoSQLData Modeling for NoSQL
Data Modeling for NoSQLTony Tam
 
Apache Solr crash course
Apache Solr crash courseApache Solr crash course
Apache Solr crash courseTommaso Teofili
 
Scaling search with Solr Cloud
Scaling search with Solr CloudScaling search with Solr Cloud
Scaling search with Solr CloudCominvent AS
 
SolrCloud Failover and Testing
SolrCloud Failover and TestingSolrCloud Failover and Testing
SolrCloud Failover and TestingMark Miller
 
Data Modeling PPT
Data Modeling PPTData Modeling PPT
Data Modeling PPTTrinath
 
Gartner Data and Analytics Summit: Bringing Self-Service BI & SQL Analytics ...
 Gartner Data and Analytics Summit: Bringing Self-Service BI & SQL Analytics ... Gartner Data and Analytics Summit: Bringing Self-Service BI & SQL Analytics ...
Gartner Data and Analytics Summit: Bringing Self-Service BI & SQL Analytics ...Cloudera, Inc.
 
Scaling SolrCloud to a Large Number of Collections - Fifth Elephant 2014
Scaling SolrCloud to a Large Number of Collections - Fifth Elephant 2014Scaling SolrCloud to a Large Number of Collections - Fifth Elephant 2014
Scaling SolrCloud to a Large Number of Collections - Fifth Elephant 2014Shalin Shekhar Mangar
 

Destacado (20)

Solr Application Development Tutorial
Solr Application Development TutorialSolr Application Development Tutorial
Solr Application Development Tutorial
 
Hadoop operations-2015-hadoop-summit-san-jose-v5
Hadoop operations-2015-hadoop-summit-san-jose-v5Hadoop operations-2015-hadoop-summit-san-jose-v5
Hadoop operations-2015-hadoop-summit-san-jose-v5
 
The Many Facets of Apache Solr - Yonik Seeley
The Many Facets of Apache Solr - Yonik SeeleyThe Many Facets of Apache Solr - Yonik Seeley
The Many Facets of Apache Solr - Yonik Seeley
 
Faceted Search with Lucene
Faceted Search with LuceneFaceted Search with Lucene
Faceted Search with Lucene
 
Scaling Solr with Solr Cloud
Scaling Solr with Solr CloudScaling Solr with Solr Cloud
Scaling Solr with Solr Cloud
 
Solr on HDFS - Past, Present, and Future: Presented by Mark Miller, Cloudera
Solr on HDFS - Past, Present, and Future: Presented by Mark Miller, ClouderaSolr on HDFS - Past, Present, and Future: Presented by Mark Miller, Cloudera
Solr on HDFS - Past, Present, and Future: Presented by Mark Miller, Cloudera
 
The history of Prometheus at SoundCloud
The history of Prometheus at SoundCloudThe history of Prometheus at SoundCloud
The history of Prometheus at SoundCloud
 
Rebuilding Solr 6 examples - layer by layer (LuceneSolrRevolution 2016)
Rebuilding Solr 6 examples - layer by layer (LuceneSolrRevolution 2016)Rebuilding Solr 6 examples - layer by layer (LuceneSolrRevolution 2016)
Rebuilding Solr 6 examples - layer by layer (LuceneSolrRevolution 2016)
 
Introduction to Apache Solr.
Introduction to Apache Solr.Introduction to Apache Solr.
Introduction to Apache Solr.
 
Big data lecture notes
Big data lecture notesBig data lecture notes
Big data lecture notes
 
Data Modeling for NoSQL
Data Modeling for NoSQLData Modeling for NoSQL
Data Modeling for NoSQL
 
Nosql data models
Nosql data modelsNosql data models
Nosql data models
 
Apache Solr crash course
Apache Solr crash courseApache Solr crash course
Apache Solr crash course
 
Scaling search with Solr Cloud
Scaling search with Solr CloudScaling search with Solr Cloud
Scaling search with Solr Cloud
 
Data models
Data modelsData models
Data models
 
SolrCloud Failover and Testing
SolrCloud Failover and TestingSolrCloud Failover and Testing
SolrCloud Failover and Testing
 
Data Modeling PPT
Data Modeling PPTData Modeling PPT
Data Modeling PPT
 
Gartner Data and Analytics Summit: Bringing Self-Service BI & SQL Analytics ...
 Gartner Data and Analytics Summit: Bringing Self-Service BI & SQL Analytics ... Gartner Data and Analytics Summit: Bringing Self-Service BI & SQL Analytics ...
Gartner Data and Analytics Summit: Bringing Self-Service BI & SQL Analytics ...
 
Different data models
Different data modelsDifferent data models
Different data models
 
Scaling SolrCloud to a Large Number of Collections - Fifth Elephant 2014
Scaling SolrCloud to a Large Number of Collections - Fifth Elephant 2014Scaling SolrCloud to a Large Number of Collections - Fifth Elephant 2014
Scaling SolrCloud to a Large Number of Collections - Fifth Elephant 2014
 

Similar a Solr cloud the 'search first' nosql database extended deep dive

UNIT I Introduction to NoSQL.pptx
UNIT I Introduction to NoSQL.pptxUNIT I Introduction to NoSQL.pptx
UNIT I Introduction to NoSQL.pptxRahul Borate
 
UNIT I Introduction to NoSQL.pptx
UNIT I Introduction to NoSQL.pptxUNIT I Introduction to NoSQL.pptx
UNIT I Introduction to NoSQL.pptxRahul Borate
 
No SQL- The Future Of Data Storage
No SQL- The Future Of Data StorageNo SQL- The Future Of Data Storage
No SQL- The Future Of Data StorageBethmi Gunasekara
 
NoSQL Intro with cassandra
NoSQL Intro with cassandraNoSQL Intro with cassandra
NoSQL Intro with cassandraBrian Enochson
 
NoSql - mayank singh
NoSql - mayank singhNoSql - mayank singh
NoSql - mayank singhMayank Singh
 
Relational and non relational database 7
Relational and non relational database 7Relational and non relational database 7
Relational and non relational database 7abdulrahmanhelan
 
NoSql Data Management
NoSql Data ManagementNoSql Data Management
NoSql Data Managementsameerfaizan
 
Chapter1: NoSQL: It’s about making intelligent choices
Chapter1: NoSQL: It’s about making intelligent choicesChapter1: NoSQL: It’s about making intelligent choices
Chapter1: NoSQL: It’s about making intelligent choicesMaynooth University
 
Sql vs NoSQL
Sql vs NoSQLSql vs NoSQL
Sql vs NoSQLRTigger
 
NoSQLDatabases
NoSQLDatabasesNoSQLDatabases
NoSQLDatabasesAdi Challa
 
Is NoSQL The Future of Data Storage?
Is NoSQL The Future of Data Storage?Is NoSQL The Future of Data Storage?
Is NoSQL The Future of Data Storage?Saltmarch Media
 
Nosql databases for the .net developer
Nosql databases for the .net developerNosql databases for the .net developer
Nosql databases for the .net developerJesus Rodriguez
 

Similar a Solr cloud the 'search first' nosql database extended deep dive (20)

UNIT I Introduction to NoSQL.pptx
UNIT I Introduction to NoSQL.pptxUNIT I Introduction to NoSQL.pptx
UNIT I Introduction to NoSQL.pptx
 
UNIT I Introduction to NoSQL.pptx
UNIT I Introduction to NoSQL.pptxUNIT I Introduction to NoSQL.pptx
UNIT I Introduction to NoSQL.pptx
 
No SQL- The Future Of Data Storage
No SQL- The Future Of Data StorageNo SQL- The Future Of Data Storage
No SQL- The Future Of Data Storage
 
NOsql Presentation.pdf
NOsql Presentation.pdfNOsql Presentation.pdf
NOsql Presentation.pdf
 
NoSQL.pptx
NoSQL.pptxNoSQL.pptx
NoSQL.pptx
 
NoSQL Intro with cassandra
NoSQL Intro with cassandraNoSQL Intro with cassandra
NoSQL Intro with cassandra
 
NoSQL and MongoDB
NoSQL and MongoDBNoSQL and MongoDB
NoSQL and MongoDB
 
No SQL
No SQLNo SQL
No SQL
 
Drop acid
Drop acidDrop acid
Drop acid
 
NoSql - mayank singh
NoSql - mayank singhNoSql - mayank singh
NoSql - mayank singh
 
Relational and non relational database 7
Relational and non relational database 7Relational and non relational database 7
Relational and non relational database 7
 
NoSql Data Management
NoSql Data ManagementNoSql Data Management
NoSql Data Management
 
Chapter1: NoSQL: It’s about making intelligent choices
Chapter1: NoSQL: It’s about making intelligent choicesChapter1: NoSQL: It’s about making intelligent choices
Chapter1: NoSQL: It’s about making intelligent choices
 
MongoDB
MongoDBMongoDB
MongoDB
 
Introduction to NoSQL
Introduction to NoSQLIntroduction to NoSQL
Introduction to NoSQL
 
Sql vs NoSQL
Sql vs NoSQLSql vs NoSQL
Sql vs NoSQL
 
NoSQLDatabases
NoSQLDatabasesNoSQLDatabases
NoSQLDatabases
 
Is NoSQL The Future of Data Storage?
Is NoSQL The Future of Data Storage?Is NoSQL The Future of Data Storage?
Is NoSQL The Future of Data Storage?
 
Revision
RevisionRevision
Revision
 
Nosql databases for the .net developer
Nosql databases for the .net developerNosql databases for the .net developer
Nosql databases for the .net developer
 

Más de lucenerevolution

Text Classification Powered by Apache Mahout and Lucene
Text Classification Powered by Apache Mahout and LuceneText Classification Powered by Apache Mahout and Lucene
Text Classification Powered by Apache Mahout and Lucenelucenerevolution
 
State of the Art Logging. Kibana4Solr is Here!
State of the Art Logging. Kibana4Solr is Here! State of the Art Logging. Kibana4Solr is Here!
State of the Art Logging. Kibana4Solr is Here! lucenerevolution
 
Building Client-side Search Applications with Solr
Building Client-side Search Applications with SolrBuilding Client-side Search Applications with Solr
Building Client-side Search Applications with Solrlucenerevolution
 
Integrate Solr with real-time stream processing applications
Integrate Solr with real-time stream processing applicationsIntegrate Solr with real-time stream processing applications
Integrate Solr with real-time stream processing applicationslucenerevolution
 
Scaling Solr with SolrCloud
Scaling Solr with SolrCloudScaling Solr with SolrCloud
Scaling Solr with SolrCloudlucenerevolution
 
Administering and Monitoring SolrCloud Clusters
Administering and Monitoring SolrCloud ClustersAdministering and Monitoring SolrCloud Clusters
Administering and Monitoring SolrCloud Clusterslucenerevolution
 
Implementing a Custom Search Syntax using Solr, Lucene, and Parboiled
Implementing a Custom Search Syntax using Solr, Lucene, and ParboiledImplementing a Custom Search Syntax using Solr, Lucene, and Parboiled
Implementing a Custom Search Syntax using Solr, Lucene, and Parboiledlucenerevolution
 
Using Solr to Search and Analyze Logs
Using Solr to Search and Analyze Logs Using Solr to Search and Analyze Logs
Using Solr to Search and Analyze Logs lucenerevolution
 
Enhancing relevancy through personalization & semantic search
Enhancing relevancy through personalization & semantic searchEnhancing relevancy through personalization & semantic search
Enhancing relevancy through personalization & semantic searchlucenerevolution
 
Real-time Inverted Search in the Cloud Using Lucene and Storm
Real-time Inverted Search in the Cloud Using Lucene and StormReal-time Inverted Search in the Cloud Using Lucene and Storm
Real-time Inverted Search in the Cloud Using Lucene and Stormlucenerevolution
 
Solr's Admin UI - Where does the data come from?
Solr's Admin UI - Where does the data come from?Solr's Admin UI - Where does the data come from?
Solr's Admin UI - Where does the data come from?lucenerevolution
 
Schemaless Solr and the Solr Schema REST API
Schemaless Solr and the Solr Schema REST APISchemaless Solr and the Solr Schema REST API
Schemaless Solr and the Solr Schema REST APIlucenerevolution
 
High Performance JSON Search and Relational Faceted Browsing with Lucene
High Performance JSON Search and Relational Faceted Browsing with LuceneHigh Performance JSON Search and Relational Faceted Browsing with Lucene
High Performance JSON Search and Relational Faceted Browsing with Lucenelucenerevolution
 
Text Classification with Lucene/Solr, Apache Hadoop and LibSVM
Text Classification with Lucene/Solr, Apache Hadoop and LibSVMText Classification with Lucene/Solr, Apache Hadoop and LibSVM
Text Classification with Lucene/Solr, Apache Hadoop and LibSVMlucenerevolution
 
Recent Additions to Lucene Arsenal
Recent Additions to Lucene ArsenalRecent Additions to Lucene Arsenal
Recent Additions to Lucene Arsenallucenerevolution
 
Turning search upside down
Turning search upside downTurning search upside down
Turning search upside downlucenerevolution
 
Spellchecking in Trovit: Implementing a Contextual Multi-language Spellchecke...
Spellchecking in Trovit: Implementing a Contextual Multi-language Spellchecke...Spellchecking in Trovit: Implementing a Contextual Multi-language Spellchecke...
Spellchecking in Trovit: Implementing a Contextual Multi-language Spellchecke...lucenerevolution
 
Shrinking the haystack wes caldwell - final
Shrinking the haystack   wes caldwell - finalShrinking the haystack   wes caldwell - final
Shrinking the haystack wes caldwell - finallucenerevolution
 
The First Class Integration of Solr with Hadoop
The First Class Integration of Solr with HadoopThe First Class Integration of Solr with Hadoop
The First Class Integration of Solr with Hadooplucenerevolution
 

Más de lucenerevolution (20)

Text Classification Powered by Apache Mahout and Lucene
Text Classification Powered by Apache Mahout and LuceneText Classification Powered by Apache Mahout and Lucene
Text Classification Powered by Apache Mahout and Lucene
 
State of the Art Logging. Kibana4Solr is Here!
State of the Art Logging. Kibana4Solr is Here! State of the Art Logging. Kibana4Solr is Here!
State of the Art Logging. Kibana4Solr is Here!
 
Search at Twitter
Search at TwitterSearch at Twitter
Search at Twitter
 
Building Client-side Search Applications with Solr
Building Client-side Search Applications with SolrBuilding Client-side Search Applications with Solr
Building Client-side Search Applications with Solr
 
Integrate Solr with real-time stream processing applications
Solr cloud the 'search first' nosql database extended deep dive

  • 1. Solr The Search First NoSQL Database
  • 2. Who Am I? • Mark Miller: Cloudera employee, Lucene PMC member, Apache member • Started playing with Lucene in 2006 • Lucene committer since 2008 • Solr committer since 2009
  • 4. Big Data is getting Bigger • The total Big Data market reached $11.4 billion in 2012 • The Big Data market is projected to reach $18.1 billion in 2013, an annual growth of 61% • On pace to exceed $47 billion by 2017.
  • 5. 3 basic needs • Storage • Processing • Search
  • 6. Two Standouts in the Big Data Market •Hadoop •NoSQL
  • 7. Ultimately, the NoSQL market is largely up for grabs. Each NoSQL database has its related strengths and weaknesses, and no one NoSQL database currently “does it all.” Big Data practitioners must take a number of factors into consideration when selecting a NoSQL database to facilitate large-scale transactional workloads, including scalability, performance, security, and ease-of-development. Big Data Vendor Revenue and Market Forecast (Wikibon)
  • 8. RDBMS • The classic way to store your data. • ACID is great, transactions are cool, SQL is well known and understood. • Scaling is *hard*, but possible (see Facebook’s MySQL cluster) • ‘impedance mismatch’ sucks
  • 9. Search • Search has been moving from an expensive, complicated option to an affordable, easier-to-adopt necessity. • Lots of data begs for the ability to process it, store it, and search it.
  • 10. Enterprise Search Engines • Verity - acquired by Autonomy in 2005 • FAST - acquired by Microsoft in 2008 • Endeca - acquired by Oracle in 2011 • Autonomy - acquired by HP in 2011 • Vivisimo - acquired by IBM in 2012
  • 11. NoSQL • Not Only SQL rather than ‘No SQL’ • Except that makes little sense... • “when ‘NoSQL’ is applied to a database, it refers to an ill-defined set of mostly open-source databases, mostly developed in the early 21st century, and mostly not using SQL.” - NoSQL Distilled
  • 13. In the beginning.. • BerkeleyDB (1991?) • Lotus Notes (1989?) • Bayou (1996?)
  • 14. In the beginning of the modern era... • BigTable (Google) (started in 2004, paper in 2006) • Dynamo (Amazon) (paper in 2007)
  • 15. Derivatives • Dynamo: Cassandra, CouchDB, Voldemort, Riak • BigTable: Cassandra, HBase, Redis, HyperTable, Accumulo
  • 16. Also... • AppEngine storage built on BigTable • DynamoDB - based on the principles of Dynamo
  • 17. When it comes to NoSQL, Open Source rules the roost. • I won’t be talking about any solution that is not based on Open Source - only because those solutions are not popular. • “there’s a notion that NoSQL is an open-source phenomenon.” - NoSQL Distilled
  • 18. The 2013 Future of Open Source Survey Results Black Duck and North Bridge
  • 19. What’s Popular? • NoSQL database proliferation - NoSQL databases are a dime a dozen. Why? • Which solutions should we look at?
  • 20. indeed.com • Indeed.com is an employment-related metasearch engine for job listings • Indeed is the #1 job site worldwide, with over 100 million unique visitors per month. Indeed is available in more than 50 countries and 26 languages, covering 94% of global GDP.
  • 21. http://db-engines.com • DB-Engines is an initiative to collect and present information on database management systems (DBMS). In addition to established relational DBMS, systems and concepts of the growing NoSQL area are emphasized. • The DB-Engines Ranking is a list of DBMS ranked by their current popularity. The list is updated monthly.
  • 28. NoSQL Database Types • Key-Value • Column Family • Document • Graph
  • 29. I’m going to ignore Graph...everyone else seems to...
  • 35. In case you forgot, Oracle is in the NoSQL game... • Oracle NoSQL
  • 36. CAP Theorem The CAP theorem, also known as Brewer's theorem, states that it is impossible for a distributed computer system to simultaneously provide all three of the following guarantees: • Consistency (all nodes see the same data at the same time) • Availability (a guarantee that every request receives a response about whether it was successful or failed) • Partition tolerance (the system continues to operate despite arbitrary message loss or failure of part of the system)
  • 37. CAP
  • 38. Architectures • For NoSQL, it generally boils down to AP or CP. CA does not support partition tolerance. • You have to trade off consistency versus availability. • AP favors availability over consistency - this is the eventually consistent architecture. • CP favors consistency over availability. • Of course, there is a continuum between AP and CP.
  • 39. Key Design Decisions • Data Model - how is the data stored/accessed • Distribution Model - how is the data distributed • Conflict Resolution - how is it ensured that the same update ‘wins’ on each node.
  • 40. Data Model • key -> value (opaque) • key -> document • column oriented
  • 41. Distributed Model • Roughly, how is data distributed across the cluster? • Sharding, replication, etc
  • 42. Data Versioning and Consistency • Essentially, how is data kept consistent across nodes? • Sequential consistency—ensuring that all nodes apply operations in the same order. • Update consistency and read consistency.
  • 43. MongoDB • Data Model - BSON, a binary JSON format • Distributed Model - sharded, asynchronous master/slave replication. • Data Versioning and Consistency - Master / Slave, per table write lock
  • 44. MongoDB Search • Built-in text search. I think of it like an RDBMS’s built-in full-text search - major feature gaps compared with dedicated full-text search engines, and likely major performance gaps. • Common to sit a search engine next to MongoDB
  • 45. Cassandra • Data Model - column based, like BigTable • Distributed Updates - similar to Dynamo, consistent hashing, master-master • Data Versioning and Consistency - timestamps
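The consistent hashing mentioned on this slide can be illustrated with a small sketch. This is a generic hash ring with virtual nodes, not the implementation used by Dynamo or Cassandra; the class name `HashRing`, the `vnodes` parameter, and the use of MD5 are assumptions made only for the example.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    """Map a string to a position on the ring (here, a 128-bit integer)."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Toy consistent-hash ring; each node owns several points on the ring."""

    def __init__(self, nodes, vnodes=64):
        self._vnodes = vnodes
        self._ring = []  # sorted list of (position, node)
        for node in nodes:
            self.add(node)

    def add(self, node):
        # Place `vnodes` virtual points for this node on the ring.
        for i in range(self._vnodes):
            self._ring.append((_hash(f"{node}#{i}"), node))
        self._ring.sort()

    def remove(self, node):
        # Drop only this node's points; all other points stay where they were.
        self._ring = [(pos, n) for pos, n in self._ring if n != node]

    def node_for(self, key):
        # A key belongs to the first point clockwise from its own position.
        positions = [pos for pos, _ in self._ring]
        idx = bisect.bisect_right(positions, _hash(key)) % len(self._ring)
        return self._ring[idx][1]


ring = HashRing(["node-a", "node-b", "node-c"])
keys = [f"row-{i}" for i in range(200)]
before = {k: ring.node_for(k) for k in keys}
ring.remove("node-c")
after = {k: ring.node_for(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]
print(f"{len(moved)} of {len(keys)} keys changed owner")
```

The point of the technique is visible in the last lines: when a node leaves, only the keys it owned are reassigned, while every other key keeps its owner.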
  • 46. Cassandra Search • Lucandra • Solandra • DataStax Enterprise Search (Solr fields must be strings)
  • 47. HBase • Data Model - Column Store • Distribution Model - regions served by region servers. • Versioning and Consistency - strongly consistent
  • 48. HBase Search • HBasene (dead?) • HBASE-SEARCH, HBASE-3529 (dead?) • Solbase • Lily
  • 49. • Riak is a NoSQL database implementing the principles from Amazon's Dynamo paper • Data Model - stores key/value pairs in a high level namespace called a bucket. • Data Versioning and Consistency - Riak uses a data structure called a vector clock to reason about causality and staleness of stored values. (Can also use timestamps). Last write wins, or client resolves conflict.
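How a vector clock lets a store "reason about causality and staleness" can be sketched in a few lines. This is a generic illustration, not Riak's implementation; the function names `increment`, `descends`, and `compare` and the node names are hypothetical.

```python
def increment(clock, node):
    """Return a copy of `clock` with `node`'s counter advanced by one."""
    updated = dict(clock)
    updated[node] = updated.get(node, 0) + 1
    return updated


def descends(a, b):
    """True if clock `a` has seen every update that clock `b` has seen."""
    return all(a.get(node, 0) >= count for node, count in b.items())


def compare(a, b):
    """Classify two clocks: one supersedes the other, or they conflict."""
    if descends(a, b) and descends(b, a):
        return "equal"
    if descends(a, b):
        return "a is newer"
    if descends(b, a):
        return "b is newer"
    return "conflict"  # concurrent writes: neither has seen the other


base = increment({}, "node-x")         # first write, handled by node-x
left = increment(base, "node-x")       # a later write via node-x
right = increment(base, "node-y")      # a concurrent write via node-y

print(compare(left, base))   # left has seen everything base has
print(compare(left, right))  # neither descends from the other
```

When the result is a conflict, the store has to pick a policy, which is where the slide's "last write wins, or client resolves conflict" comes in.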
  • 50. Riak Search • Riak Search - custom search engine, Solr-like API • Yokozuna
  • 51. Yokozuna Author Enumerates Common Reasons Custom Search has Failed • Pretends to be lucene/solr • Lack of analyzer/language/features • Bad performance/resource usage for certain queries • Basho is not in the business of search
  • 52. • CouchDB’s data format is JSON stored as documents (self-contained records with no intrinsic relationships), grouped into “database” namespaces. • Conflicts are left to the application to resolve at write time. CouchDB arbitrarily, but deterministically, determines a winner and tracks a conflict. The client must then resolve the conflict.
  • 53. CouchDB Search • CouchDB-Lucene • Seems people usually just sit a search engine next to CouchDB
  • 54. • Redis is an open-source, networked, in-memory, key-value data store with optional durability. • Memcached is a general-purpose distributed memory caching system • Redis-Search
  • 55. Adding Search to NoSQL • Hard to do without a lot of compromise • Build your own, or use Lucene or Lucene based solution • Nothing has yet set the world on fire...
  • 56. Adding NoSQL to Search • Search solutions are generally already a Document based NoSQL solution. • Seems a lot easier to do than the reverse • Nothing has yet set the world on fire...
  • 57. Solr NoSQL Features • Realtime-Get • Update Durability • Atomic Compare and Set • Versioning and optimistic locking
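Versioning with optimistic locking and atomic compare-and-set can be illustrated with a toy in-memory store. This sketch does not call Solr; the class `VersionedStore`, the `expected_version` parameter, and the exception are hypothetical. The meaning given to the expected version (negative: document must not exist, 1: document must exist, greater than 1: version must match exactly, 0: no check) is modeled on the convention I understand Solr to use for its `_version_` field, and should be checked against the Solr documentation.

```python
import itertools


class VersionConflict(Exception):
    """Raised when the caller's expected version does not match the store."""


class VersionedStore:
    """Toy document store with optimistic concurrency control."""

    def __init__(self):
        self._docs = {}                       # doc id -> (version, document)
        self._counter = itertools.count(100)  # assigned versions are always > 1

    def get(self, doc_id):
        """Return (version, document), or None if the id is unknown."""
        return self._docs.get(doc_id)

    def put(self, doc_id, doc, expected_version=0):
        current = self._docs.get(doc_id)
        if expected_version < 0 and current is not None:
            raise VersionConflict("document already exists")
        if expected_version == 1 and current is None:
            raise VersionConflict("document does not exist")
        if expected_version > 1 and (current is None or current[0] != expected_version):
            raise VersionConflict("document was modified by someone else")
        version = next(self._counter)
        self._docs[doc_id] = (version, dict(doc))
        return version


store = VersionedStore()
v1 = store.put("doc-1", {"stock": 10}, expected_version=-1)  # create only
v2 = store.put("doc-1", {"stock": 9}, expected_version=v1)   # compare-and-set

try:
    # A second writer still holding the stale version v1 is rejected.
    store.put("doc-1", {"stock": 8}, expected_version=v1)
    stale_write_rejected = False
except VersionConflict:
    stale_write_rejected = True
```

The rejected writer would normally re-read the document, reapply its change, and retry with the fresh version.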
  • 58. Schemaless? • NoSQL databases are generally ‘schemaless’ • In some ways, convenient, in other ways not. • Implicit schema moves to application code. • Can’t optimize based on types. • Note: some are calling ‘guessed’ schemas schemaless.
  • 59. SolrCloud • Most similar to the MongoDB architecture • A CP system, though currently, eventually consistent. • The architecture supports adding strong consistency options.
  • 60. SolrCloud • The length of time an inconsistency is present is called the inconsistency window. • SolrCloud has a very small inconsistency window.
  • 61. Data Model • key -> document • Optionally, column oriented
  • 63. Contact Info • @heismark • markrmiller@gmail.com