SlideShare una empresa de Scribd logo
1 de 29
Descargar para leer sin conexión
Adventures in Discoverability with C* and Solr

Patricia Gorla, Systems Engineer
@patriciagorla
@o19s

#CASSANDRAEU

CASSANDRASUMMITEU
About Me
• Solr
• Cassandra
• Information retrieval

Paul Hostetler - phostetler.com

#CASSANDRAEU

CASSANDRASUMMITEU
How Do I Find What I’m Looking For?
Simple

Complex

Aristotle’s birthplace?

All ancient Greek philosophers?

Coordinates of Stagira?

All cities within 100km of Stagira

#CASSANDRAEU

CASSANDRASUMMITEU
How Do I Find What I’m Looking For?
Simple

Complex

Aristotle’s birthplace?
select birthPlace
where name = “Aristotle”;

All ancient Greek philosophers?

Coordinates of Stagira?
select coord
where name = “Stagira”;

All cities within 100km of Stagira

#CASSANDRAEU

CASSANDRASUMMITEU
How Do I Find What I’m Looking For?
Simple
Aristotle’s birthplace?
select birthPlace
where name = “Aristotle”;

Complex
All ancient Greek philosophers?
create index on tag;
select *
where tag = “Greek philosophy”;

Coordinates of Stagira?
select coord
where name = “Stagira”;

#CASSANDRAEU

All cities within 100km of Stagira
???

CASSANDRASUMMITEU
How Do I Find What I’m Looking For?
Simple
Aristotle’s birthplace?
q=Aristotle&fl=birthPlace

All ancient Greek philosophers?
q=Greek philosophy

Coordinates of Stagira?
q=Stagira&fl=point

All cities within 100km of Stagira
q=*:*&fq={!geofilt pt=40.530,
23.752 sfield=point d=100}

#CASSANDRAEU

CASSANDRASUMMITEU
Approaches to Search
• Google Site Search
• MySQL ‘like’ statements

Seth Casteel - littlefriendsphoto.com

#CASSANDRAEU

CASSANDRASUMMITEU
Approaches to Search

• Full-text search
• Ranking (Scoring)
• Tokenization
• Stemming
• Faceting

#CASSANDRAEU

CASSANDRASUMMITEU
Approaches to Search

• Full-text search
• Ranking (Scoring)
• Tokenization
• Stemming
• Faceting

#CASSANDRAEU

and many more!

CASSANDRASUMMITEU
Inverted Index
[1] Pleasure in the job puts perfection in the work.
[2] Education is the best provision for the journey to old age.
[3] If some animals are good at hunting and others are suitable
for hunting, then the gods must clearly smile on hunting.
[4] It is the mark of an educated mind to be able to entertain a
thought without absorbing it.

Term

Freq

Documents

education

[2] [4]

hunting

3

[3]

perfection
#CASSANDRAEU

2
1

[1]
CASSANDRASUMMITEU
Defining Field Types
<fieldType name="text_general" class="solr.TextField" >
<analyzer type="index">
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.StopFilterFactory" ignoreCase="true />
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.EnglishPossessiveFilterFactory"/>
<filter class="solr.PorterStemFilterFactory"/>
</analyzer>
<analyzer type="query">
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"/>
<filter class="solr.StopFilterFactory” ignoreCase="true"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.PorterStemFilterFactory"/>
</analyzer>
</fieldType>
#CASSANDRAEU

CASSANDRASUMMITEU
Index-side Analysis

Punctuation
Stop Words
Lowercase
Stemming

#CASSANDRAEU

Hope is a waking dream.
Hope is a waking dream
Hope
waking dream
hope
waking dream
hope
wake dream

CASSANDRASUMMITEU
Query-side Analysis

Punctuation
Stop Words
Lowercase
Stemming
Synonyms
#CASSANDRAEU

Hope is a waking dream.
Hope is a waking dream
Hope
waking dream
hope
waking dream
hope
wake dream
hope
wake dream
desire awake wish
CASSANDRASUMMITEU
Faceting
facet_fields: {
tags: [hunting: 1, education: 2, work: 2],
locations: [Stagira: 5, Chalcis: 3]
}

#CASSANDRAEU

CASSANDRASUMMITEU
Approaches to Distribution
• Pay Google more $$
• MySQL ‘like’ shards
• Master/Slave replication
• SolrCloud

#CASSANDRAEU

CASSANDRASUMMITEU
“Distributed search is hard.”

#CASSANDRAEU

CASSANDRASUMMITEU
Solr + Cassandra: Datastax Enterprise
• Full-text search
• Tokenization
• Stemming
• Date ranges
• Aggregation
• High Availability
• Distributed Nature

#CASSANDRAEU

CASSANDRASUMMITEU
Examining DBpedia.org Datasets
Person
<http://xmlns.com/foaf/0.1/name> "Aristotle"@en .
<http://www.w3.org/1999/02/22-rdf-syntax-ns#type> Person .
<http://purl.org/dc/elements/1.1/description> "Greek philosopher"@en .
<http://dbpedia.org/ontology/birthPlace> Stagira
<http://dbpedia.org/ontology/deathPlace> Chalcis .

Place
<http://dbpedia.org/resource/Stagira> <http://www.opengis.net/gml/_Feature> .
<http://dbpedia.org/resource/Stagira#lat> "40.5916667" .
<http://dbpedia.org/resource/Stagira#long> "23.7947222" .
<http://dbpedia.org/resource/Stagira#point> "40.591667 23.7947222"@en .

#CASSANDRAEU

CASSANDRASUMMITEU
“Love is a single soul
inhabiting two bodies.”
#CASSANDRAEU

CASSANDRASUMMITEU
Querying for data
curl http://localhost:8983/solr/solr.person/q=Aristotle

#CASSANDRAEU

CASSANDRASUMMITEU
Filtering by location
curl http://localhost:8983/solr/solr.location/select?q=*:*
&spatial=true&fq={!geofilt pt=40.53027,23.7525 sfield=point
d=100}

#CASSANDRAEU

CASSANDRASUMMITEU
“There is no great genius without a
mixture of madness.”
#CASSANDRAEU

CASSANDRASUMMITEU
Schema.xml
Unified Schema
<fields>
<field name="id" type="string" />
<field name="name" type="text" />
<dynamicField name="*Date" type="date" />
<dynamicField name="*Place" type="text" />
<dynamicField name="*Point" type="location" />
<dynamicField name="*_tag"

type="text" />

</fields>

#CASSANDRAEU

CASSANDRASUMMITEU
Upload to DSE, Create Core
curl http://localhost:8983/solr/resource/solr.location/schema.xml 
--data-binary @solr/location_schema.xml 
-H 'Content-type:text/xml; charset=utf-8 '

curl http://localhost:8983/solr/resource/solr.location/solrconfig.xml 
--data-binary @solr/location_solrconfig.xml 
-H 'Content-type:text/xml; charset=utf-8 '
curl http://localhost:8983/solr/admin/cores?action=CREATE&name=solr.location

#CASSANDRAEU

CASSANDRASUMMITEU
Cassandra Schema
Location Schema

gc_grace_seconds=864000 AND

cqlsh:solr> DESC COLUMNFAMILY location;

read_repair_chance=0.100000 AND

CREATE TABLE location (

replicate_on_write='true' AND

id text PRIMARY KEY,
"_docBoost" text,
"_dynFld" text,
location text,

populate_io_cache_on_flush='false' AND
compaction={'class':
'SizeTieredCompactionStrategy'} AND
compression={'sstable_compression':
'SnappyCompressor'};

name text,
solr_query text,
tags text
) WITH COMPACT STORAGE AND
bloom_filter_fp_chance=0.010000 AND
caching='KEYS_ONLY' AND
comment='' AND

CREATE INDEX
solr_location__docBoost_index ON location
("_docBoost");
...
CREATE INDEX
solr_location_solr_query_index ON
location (solr_query);

dclocal_read_repair_chance=0.000000 AND
#CASSANDRAEU

CASSANDRASUMMITEU
What Changes
Solr

Cassandra

• No multiValued fields

• No composite columns

• No JOIN*

• No counter columns

#CASSANDRAEU

CASSANDRASUMMITEU
Bringing it All Together
•

Fault tolerant, available
search

#CASSANDRAEU

CASSANDRASUMMITEU
“Thank you.”

#CASSANDRAEU

CASSANDRASUMMITEU
THANK YOU

pgorla@opensourceconnections.com
@patriciagorla
@o19s
All information, including slides, are on http://github.com/pgorla/million-books
#CASSANDRAEU

CASSANDRASUMMITEU

Más contenido relacionado

Más de DataStax Academy

Cassandra on Docker @ Walmart Labs
Cassandra on Docker @ Walmart LabsCassandra on Docker @ Walmart Labs
Cassandra on Docker @ Walmart Labs
DataStax Academy
 
Cassandra Adoption on Cisco UCS & Open stack
Cassandra Adoption on Cisco UCS & Open stackCassandra Adoption on Cisco UCS & Open stack
Cassandra Adoption on Cisco UCS & Open stack
DataStax Academy
 
Cassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & Python
Cassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & PythonCassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & Python
Cassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & Python
DataStax Academy
 
Standing Up Your First Cluster
Standing Up Your First ClusterStanding Up Your First Cluster
Standing Up Your First Cluster
DataStax Academy
 
Real Time Analytics with Dse
Real Time Analytics with DseReal Time Analytics with Dse
Real Time Analytics with Dse
DataStax Academy
 
Introduction to Data Modeling with Apache Cassandra
Introduction to Data Modeling with Apache CassandraIntroduction to Data Modeling with Apache Cassandra
Introduction to Data Modeling with Apache Cassandra
DataStax Academy
 
Enabling Search in your Cassandra Application with DataStax Enterprise
Enabling Search in your Cassandra Application with DataStax EnterpriseEnabling Search in your Cassandra Application with DataStax Enterprise
Enabling Search in your Cassandra Application with DataStax Enterprise
DataStax Academy
 
Advanced Data Modeling with Apache Cassandra
Advanced Data Modeling with Apache CassandraAdvanced Data Modeling with Apache Cassandra
Advanced Data Modeling with Apache Cassandra
DataStax Academy
 

Más de DataStax Academy (20)

Forrester CXNYC 2017 - Delivering great real-time cx is a true craft
Forrester CXNYC 2017 - Delivering great real-time cx is a true craftForrester CXNYC 2017 - Delivering great real-time cx is a true craft
Forrester CXNYC 2017 - Delivering great real-time cx is a true craft
 
Introduction to DataStax Enterprise Graph Database
Introduction to DataStax Enterprise Graph DatabaseIntroduction to DataStax Enterprise Graph Database
Introduction to DataStax Enterprise Graph Database
 
Introduction to DataStax Enterprise Advanced Replication with Apache Cassandra
Introduction to DataStax Enterprise Advanced Replication with Apache CassandraIntroduction to DataStax Enterprise Advanced Replication with Apache Cassandra
Introduction to DataStax Enterprise Advanced Replication with Apache Cassandra
 
Cassandra on Docker @ Walmart Labs
Cassandra on Docker @ Walmart LabsCassandra on Docker @ Walmart Labs
Cassandra on Docker @ Walmart Labs
 
Cassandra 3.0 Data Modeling
Cassandra 3.0 Data ModelingCassandra 3.0 Data Modeling
Cassandra 3.0 Data Modeling
 
Cassandra Adoption on Cisco UCS & Open stack
Cassandra Adoption on Cisco UCS & Open stackCassandra Adoption on Cisco UCS & Open stack
Cassandra Adoption on Cisco UCS & Open stack
 
Data Modeling for Apache Cassandra
Data Modeling for Apache CassandraData Modeling for Apache Cassandra
Data Modeling for Apache Cassandra
 
Coursera Cassandra Driver
Coursera Cassandra DriverCoursera Cassandra Driver
Coursera Cassandra Driver
 
Production Ready Cassandra
Production Ready CassandraProduction Ready Cassandra
Production Ready Cassandra
 
Cassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & Python
Cassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & PythonCassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & Python
Cassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & Python
 
Cassandra @ Sony: The good, the bad, and the ugly part 1
Cassandra @ Sony: The good, the bad, and the ugly part 1Cassandra @ Sony: The good, the bad, and the ugly part 1
Cassandra @ Sony: The good, the bad, and the ugly part 1
 
Cassandra @ Sony: The good, the bad, and the ugly part 2
Cassandra @ Sony: The good, the bad, and the ugly part 2Cassandra @ Sony: The good, the bad, and the ugly part 2
Cassandra @ Sony: The good, the bad, and the ugly part 2
 
Standing Up Your First Cluster
Standing Up Your First ClusterStanding Up Your First Cluster
Standing Up Your First Cluster
 
Real Time Analytics with Dse
Real Time Analytics with DseReal Time Analytics with Dse
Real Time Analytics with Dse
 
Introduction to Data Modeling with Apache Cassandra
Introduction to Data Modeling with Apache CassandraIntroduction to Data Modeling with Apache Cassandra
Introduction to Data Modeling with Apache Cassandra
 
Cassandra Core Concepts
Cassandra Core ConceptsCassandra Core Concepts
Cassandra Core Concepts
 
Enabling Search in your Cassandra Application with DataStax Enterprise
Enabling Search in your Cassandra Application with DataStax EnterpriseEnabling Search in your Cassandra Application with DataStax Enterprise
Enabling Search in your Cassandra Application with DataStax Enterprise
 
Bad Habits Die Hard
Bad Habits Die Hard Bad Habits Die Hard
Bad Habits Die Hard
 
Advanced Data Modeling with Apache Cassandra
Advanced Data Modeling with Apache CassandraAdvanced Data Modeling with Apache Cassandra
Advanced Data Modeling with Apache Cassandra
 
Advanced Cassandra
Advanced CassandraAdvanced Cassandra
Advanced Cassandra
 

Último

Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
WSO2
 

Último (20)

AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot ModelNavi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
Ransomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfRansomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdf
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectors
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 

C* Summit EU 2013: One Million Books: Adventures in Discoverability with Cassandra and Solr

  • 1. Adventures in Discoverability with C* and Solr Patricia Gorla, Systems Engineer @patriciagorla @o19s #CASSANDRAEU CASSANDRASUMMITEU
  • 2. About Me • Solr • Cassandra • Information retrieval Paul Hostetler - phostetler.com #CASSANDRAEU CASSANDRASUMMITEU
  • 3. How Do I Find What I’m Looking For? Simple Complex Aristotle’s birthplace? All ancient Greek philosophers? Coordinates of Stagira? All cities within 100km of Stagira #CASSANDRAEU CASSANDRASUMMITEU
  • 4. How Do I Find What I’m Looking For? Simple Complex Aristotle’s birthplace? select birthPlace where name = “Aristotle”; All ancient Greek philosophers? Coordinates of Stagira? select coord where name = “Stagira”; All cities within 100km of Stagira #CASSANDRAEU CASSANDRASUMMITEU
  • 5. How Do I Find What I’m Looking For? Simple Aristotle’s birthplace? select birthPlace where name = “Aristotle”; Complex All ancient Greek philosophers? create index on tag; select * where tag = “Greek philosophy”; Coordinates of Stagira? select coord where name = “Stagira”; #CASSANDRAEU All cities within 100km of Stagira ??? CASSANDRASUMMITEU
  • 6. How Do I Find What I’m Looking For? Simple Aristotle’s birthplace? q=Aristotle&fl=birthPlace All ancient Greek philosophers? q=Greek philosophy Coordinates of Stagira? q=Stagira&fl=point All cities within 100km of Stagira q=*:*&fq={!geofilt pt=40.530, 23.752 sfield=point d=100} #CASSANDRAEU CASSANDRASUMMITEU
  • 7. Approaches to Search • Google Site Search • MySQL ‘like’ statements Seth Casteel - littlefriendsphoto.com #CASSANDRAEU CASSANDRASUMMITEU
  • 8. Approaches to Search • Full-text search • Ranking (Scoring) • Tokenization • Stemming • Faceting #CASSANDRAEU CASSANDRASUMMITEU
  • 9. Approaches to Search • Full-text search • Ranking (Scoring) • Tokenization • Stemming • Faceting #CASSANDRAEU and many more! CASSANDRASUMMITEU
  • 10. Inverted Index [1] Pleasure in the job puts perfection in the work. [2] Education is the best provision for the journey to old age. [3] If some animals are good at hunting and others are suitable for hunting, then the gods must clearly smile on hunting. [4] It is the mark of an educated mind to be able to entertain a thought without absorbing it. Term Freq Documents education [2] [4] hunting 3 [3] perfection #CASSANDRAEU 2 1 [1] CASSANDRASUMMITEU
  • 11. Defining Field Types <fieldType name="text_general" class="solr.TextField" > <analyzer type="index"> <tokenizer class="solr.StandardTokenizerFactory"/> <filter class="solr.StopFilterFactory" ignoreCase="true /> <filter class="solr.LowerCaseFilterFactory"/> <filter class="solr.EnglishPossessiveFilterFactory"/> <filter class="solr.PorterStemFilterFactory"/> </analyzer> <analyzer type="query"> <tokenizer class="solr.StandardTokenizerFactory"/> <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"/> <filter class="solr.StopFilterFactory” ignoreCase="true"/> <filter class="solr.LowerCaseFilterFactory"/> <filter class="solr.PorterStemFilterFactory"/> </analyzer> </fieldType> #CASSANDRAEU CASSANDRASUMMITEU
  • 12. Index-side Analysis Punctuation Stop Words Lowercase Stemming #CASSANDRAEU Hope is a waking dream. Hope is a waking dream Hope waking dream hope waking dream hope wake dream CASSANDRASUMMITEU
  • 13. Query-side Analysis Punctuation Stop Words Lowercase Stemming Synonyms #CASSANDRAEU Hope is a waking dream. Hope is a waking dream Hope waking dream hope waking dream hope wake dream hope wake dream desire awake wish CASSANDRASUMMITEU
  • 14. Faceting facet_fields: { tags: [hunting: 1, education: 2, work: 2], locations: [Stagira: 5, Chalcis: 3] } #CASSANDRAEU CASSANDRASUMMITEU
  • 15. Approaches to Distribution • Pay Google more $$ • MySQL ‘like’ shards • Master/Slave replication • SolrCloud #CASSANDRAEU CASSANDRASUMMITEU
  • 16. “Distributed search is hard.” #CASSANDRAEU CASSANDRASUMMITEU
  • 17. Solr + Cassandra: Datastax Enterprise • Full-text search • Tokenization • Stemming • Date ranges • Aggregation • High Availability • Distributed Nature #CASSANDRAEU CASSANDRASUMMITEU
  • 18. Examining DBpedia.org Datasets Person <http://xmlns.com/foaf/0.1/name> "Aristotle"@en . <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> Person . <http://purl.org/dc/elements/1.1/description> "Greek philosopher"@en . <http://dbpedia.org/ontology/birthPlace> Stagira <http://dbpedia.org/ontology/deathPlace> Chalcis . Place <http://dbpedia.org/resource/Stagira> <http://www.opengis.net/gml/_Feature> . <http://dbpedia.org/resource/Stagira#lat> "40.5916667" . <http://dbpedia.org/resource/Stagira#long> "23.7947222" . <http://dbpedia.org/resource/Stagira#point> "40.591667 23.7947222"@en . #CASSANDRAEU CASSANDRASUMMITEU
  • 19. “Love is a single soul inhabiting two bodies.” #CASSANDRAEU CASSANDRASUMMITEU
  • 20. Querying for data curl http://localhost:8983/solr/solr.person/q=Aristotle #CASSANDRAEU CASSANDRASUMMITEU
  • 21. Filtering by location curl http://localhost:8983/solr/solr.location/select?q=*:* &spatial=true&fq={!geofilt pt=40.53027,23.7525 sfield=point d=100} #CASSANDRAEU CASSANDRASUMMITEU
  • 22. “There is no great genius without a mixture of madness.” #CASSANDRAEU CASSANDRASUMMITEU
  • 23. Schema.xml Unified Schema <fields> <field name="id" type="string" /> <field name="name" type="text" /> <dynamicField name="*Date" type="date" /> <dynamicField name="*Place" type="text" /> <dynamicField name="*Point" type="location" /> <dynamicField name="*_tag" type="text" /> </fields> #CASSANDRAEU CASSANDRASUMMITEU
  • 24. Upload to DSE, Create Core curl http://localhost:8983/solr/resource/solr.location/schema.xml --data-binary @solr/location_schema.xml -H 'Content-type:text/xml; charset=utf-8 ' curl http://localhost:8983/solr/resource/solr.location/solrconfig.xml --data-binary @solr/location_solrconfig.xml -H 'Content-type:text/xml; charset=utf-8 ' curl http://localhost:8983/solr/admin/cores?action=CREATE&name=solr.location #CASSANDRAEU CASSANDRASUMMITEU
  • 25. Cassandra Schema Location Schema gc_grace_seconds=864000 AND cqlsh:solr> DESC COLUMNFAMILY location; read_repair_chance=0.100000 AND CREATE TABLE location ( replicate_on_write='true' AND id text PRIMARY KEY, "_docBoost" text, "_dynFld" text, location text, populate_io_cache_on_flush='false' AND compaction={'class': 'SizeTieredCompactionStrategy'} AND compression={'sstable_compression': 'SnappyCompressor'}; name text, solr_query text, tags text ) WITH COMPACT STORAGE AND bloom_filter_fp_chance=0.010000 AND caching='KEYS_ONLY' AND comment='' AND CREATE INDEX solr_location__docBoost_index ON location ("_docBoost"); ... CREATE INDEX solr_location_solr_query_index ON location (solr_query); dclocal_read_repair_chance=0.000000 AND #CASSANDRAEU CASSANDRASUMMITEU
  • 26. What Changes Solr Cassandra • No multiValued fields • No composite columns • No JOIN* • No counter columns #CASSANDRAEU CASSANDRASUMMITEU
  • 27. Bringing it All Together • Fault tolerant, available search #CASSANDRAEU CASSANDRASUMMITEU
  • 29. THANK YOU pgorla@opensourceconnections.com @patriciagorla @o19s All information, including slides, are on http://github.com/pgorla/million-books #CASSANDRAEU CASSANDRASUMMITEU