SlideShare a Scribd company logo
1 of 21
Download to read offline
Oracle Open World
Data-and-Compute-Intensive
processing
Use Case: Lucene Domain Index
Marcelo F. Ochoa
Fac. Cs. Exactas - UNICEN - Tandil - Argentina
Agenda
Data-and-Compute Intensive Search
What is Lucene?
What is Lucene Domain Index?
Performance
Application integration
Demo
Future plans
Data-and-Compute Intensive Search
Data-and-Compute Intensive Search
Strategies
Middle-tier-based search engines
Google Appliance
SES
Nutch
Solr
Database-embedded search engines
Oracle Text
Lucene Domain Index (Lucene OJVM)
Middle-tier-based Search Engines
Benefits
Simple (crawler mode)
Medium complexity (Solr
WS)
Out off the box solution
(crawler)
No application code is
necessary to integrate
(crawler)
Medium out off the box
solution (Solr WS)
Drawbacks
Updated are slow (usually
monthly, weekly)
A lot a wasted traffic
You can not index pages
based on database
information which requires
login (crawler mode)
Indexing tables requires
triggers, batch process or a
persistent layer to transfer
modifications
Database-embedded Search Engines
Benefits
Fastest update
No extra coding its
necessary, SQL access
Ready to use to any
language, PHP, Phyton, .Net
You can index tables
Changes are automatically
notified
No network traffic
No network marshalling
Drawbacks
A little slow down of Java
execution compared to a Sun
JDK JVM
What is Lucene?
Open Source Information Retrieval (IR) Library with extensible
APIs
Top level Apache project
Is the core component of Apache Solr and Nutch projects
100% Java
Around 800 classes
47.000 lines of code
33.000 lines of test
78.000 lines at contrib area
Can search and Index any textual data
Can scales to millions of pages or records
Provides fuzzy search, proximity search, range queries, ...
Wildcards: single and multiple characters, anywhere in the search
words
What is Lucene Domain Index?
An Embebed version of Lucene IR library running inside Oracle
OJVM
37 new Java Classes and a new PLSQL Object Type
A new domain index for Oracle Databases using Data Cartridge
API (ODCI)
A new Store implementation for Lucene (OJVMDirectory) which
replaces a traditional filesystem by Secure BLOB
Two new SQL operators lcontains() and lscore()
An orthogonal/uptodate Lucene solution for any programming
language, especially Java, Ruby, Python, PHP and .Net, currently
latest production version - 2.3.2.
Benefits
Benefits added to Oracle Applications
No network round trip for indexing Oracle tables
A fault tolerant, transactional and scalable storage for Lucene
inverted index
Small Lucene index structure
Support for IOT
Support for indexing joined tables using default User Data Store
Support for indexing virtual columns
Support for order by, filter by and in-line pagination operations at
Index level layer
Support padding/formatting for Text/Date/Time/Number
But more important than above is
Easily to adapt for a new functionality
Performance Test Suite
Corpus: XML Spanish WikiPedia dump:
Total documents: 1.056.163 - 2,67 Gb
Average size per document: 2.533 bytes
Lucene Index size:
10 BLOB/files
808Mb total.
5 fields (title,revisionDate,comment,text)
Table structure (XMLDB):
pages
title:VARCHAR2
id:NUMBER
pages_revisions
id:NUMBER
revisionDate:TIMESTAMP
comment:VARCHAR2
text:CLOB
1
n
..
Middle-tier-based approach
Requires transfer all database table data to the middle tier
A middle tier application performs this query:
SELECT /*+ DYNAMIC_SAMPLING(0) RULE NOCACHE(PAGES) */ PAGES.rowid,
extractValue(object_value,'/page/title') "title",extractValue(object_value,
'/page/revision/comment') "comment",extract(object_value,
'/page/revision/text/text()') "text",extractValue(object_value,
'/page/revision/timestamp') "revisionDate"
FROM
ESWIKI.PAGES where PAGES.rowid in
(select rid from (select rowid rid,rownum rn from ENWIKI.PAGES) where rn>=1 and rn<=300)
For 300 rows SQLTrace reports:
bytes sent via SQL*Net to client 3.245.358
bytes received via SQL*Net from client 1.785.912
SQL*Net roundtrips to/from client 2.383
Total indexing time for 33.912 rows 824 seconds
SQL
Application
External
Indexer
Database-embedded approach
Index Definition, Lucene Domain Index syntax:
SQL> ALTER SESSION SET sql_trace=true;
SQL> ALTER SESSION SET EVENTS '10046 trace name context forever, level 8';
SQL> create index pages_lidx_all on pages p (value(p))
indextype is Lucene.LuceneIndex
parameters('PopulateIndex:false;DefaultColumn:text;
SyncMode:Deferred;LogLevel:WARNING;
Analyzer:org.apache.lucene.analysis.SpanishWikipediaAnalyzer;
ExtraCols:extractValue(object_value,''/page/title'') "title",
extractValue(object_value,''/page/revision/comment'') "comment",
extract(object_value,''/page/revision/text/text()'') "text",
extractValue(object_value,''/page/revision/timestamp'') "revisionDate";
FormatCols:revisionDate(day);IncludeMasterColumn:false;
LobStorageParameters:PCTVERSION 0 ENABLE STORAGE IN ROW
CHUNK 32768 CACHE READS FILESYSTEM_LIKE_LOGGING');
After creating an Index is necessary to submit changes for indexing
This can be done using:
DECLARE
ridlist sys.ODCIRidList;
BEGIN
select rid BULK COLLECT INTO ridlist
from (select rowid rid,rownum rn from pages) where rn>=1 and rn<=300;
LuceneDomainIndex.enqueueChange(USER||'.PAGES_LIDX_ALL',ridlist,'insert');
END;
For 300 rows SQLTrace reports:
bytes sent via SQL*Net to client 1.301
bytes received via SQL*Net from client 1.354
SQL*Net roundtrips to/from client 4
Total indexing time for 33.912 rows 346 seconds
Database-embedded approach
Jav
aJD
BC
Cal
ls
SQ
Stored Procedure Call
Application
Client
CPU / Network usage during indexing
External indexing (824 s.)
Integrated indexing (346 s.)
User
Sys
CPU Load
Nice
User
Sys
CPU Load
Nice
Receiver
Transmiter
Network
Server side
Receiver
Transmiter
Network
Client side
Client side
Server side
CPU/IO for Database-embedded indexing
This information was taken with SQL_TRACE=true indexing 3.000
rows inside OJVM.
Most time is spent in full table scan
In addition, middle-tier indexing of 3.000 rows would require sending
34Mb of data over the network.
58% 43,2%
Application integration: Fuzzy searches
Old implementation (without Lucene):
select p.id,p.first_Name,p.last_Name,p.nationality,p.sex,p.type_Document,
p.number_Document,p.civil_State,p.date_Birth,p.mail,g.organization_id ,
fnmatchperson(p.first_name, p.last_name, 'John Doe') as suma
from person p left join (select * from guest where organization_id = 67) g on g.person_id = p.id
where p.state = 1 and fnmatchperson(p.first_name, p.last_name, 'John Doe') >= 50 order by suma desc
Lucene implementation:
select /*+ DOMAIN_INDEX_SORT */
p.id,p.first_Name,p.last_Name,p.nationality,p.sex,p.type_Document,
p.number_Document,p.civil_State,p.date_Birth,p.mail,g.organization_id ,
lscore(1) as suma from person p left join
(select * from guest where organization_id = 67) g on g.person_id = p.id
where p.state = 1 and lcontains(p.first_name, 'rownum:[1 TO 20] AND John~ Doe~',1) > 0
where "John Doe " is searched as "John~ Doe~ " to provide partial
match.
~ Lucene operator uses Levenshtein Distance, or Edit Distance
algorithm. http://en.wikipedia.org/wiki/Levenshtein_distance
Execution plan for both queries
fnMatch solution
Lucene solution
Key points
Only one extra index is required:
create index person_lidx on person(first_name)
indextype is lucene.LuceneIndex
parameters('SyncMode:OnLine;LogLevel:ALL;
AutoTuneMemory:true;IncludeMasterColumn:false;DefaultOperator:OR;
DefaultColumn:name_str;Analyzer:org.apache.lucene.analysis.
SimpleAnalyzer;
ExtraCols:first_name||'' ''||last_name "name_str"');
Simple to adapt, only one class was modified to provide partial
match.
String split[] = firstLastName.split(" ");
sql3 = "";
for(int i = 0; i < split.length; i++){
sql3 += /*"'" +*/ split[i].toLowerCase().trim() + "~ ";
}
sql3 = sql3.substring(0, sql3.length()-1);//le quita la coma
/*sql += ", fnmatchperson(p.first_name, p.last_name, " + sql3 + ") as suma ";*/
/*sql3 = " and fnmatchperson(p.first_name, p.last_name, " + sql3 + ") >= 50 " ;*/
sql3 = "and lcontains(p.first_name, 'rownum:[1 TO 20] AND "+sql3 +"',1) > 0 ";
Key points, cont.
Less network traffic
In the above example, around of 20% of the rows are discarded
by the filter operation
"GUEST"."PERSON_ID"(+)="P"."ID" AND ORGANIZATION_ID"(+)=67"
"P"."STATE"=1
In Solr implementation a new row on person table imply:
N bytes of SQLNet +283 bytes for HTTP Post method
Faster updates
Compared to Solr approach we send 283 bytes less which
means faster operations.
Compared to middle tier approach, once a new row is added to
the table it is ready to be included in the next query in the
example shown this is critical constraint
Minimal application code impact
only a new index
only a rewrite where condition is needed to replace fnMatch
Future plans
Add Faceted search, may be using ODCI aggregate functions or
Pipeline Tables
Strong committing to latest Lucene production release, once 2.4
version will be released, we will test inside OJVM
Add ODCI Extensible Optimizer Interface to better dialogue with
the Oracle SQL Engine
A slave session which collects query from different parallel
session to reduce memory foot print and to provides highest hit
ratios
A JMX interface to monitor Lucene Domain Index using Sun's
JMX console
Useful links
Lucene Project:
http://lucene.apache.org/java/docs/index.html
Lucene Oracle Integration
http://docs.google.com/Doc?id=ddgw7sjp_54fgj9kg
Forum, Peer to Peer Support
http://sourceforge.net/forum/forum.php?forum_id=187896
Download Binary Distribution (10g/11g)
http://sourceforge.net/project/showfiles.php?group_id=56183&package_id=255524
CVS Access
cvs -d:pserver:anonymous@dbprism.cvs.sourceforge.net:/cvsroot/dbprism login
cvs -z3 -d:pserver:anonymous@dbprism.cvs.sourceforge.net:/cvsroot/dbprism co -P ojvm
http://dbprism.cvs.sourceforge.net/dbprism/ojvm/
Q & A
Thank you

More Related Content

What's hot

Data Migration in Database
Data Migration in DatabaseData Migration in Database
Data Migration in DatabaseJingun Jung
 
High Performance JSON Search and Relational Faceted Browsing with Lucene
High Performance JSON Search and Relational Faceted Browsing with LuceneHigh Performance JSON Search and Relational Faceted Browsing with Lucene
High Performance JSON Search and Relational Faceted Browsing with Lucenelucenerevolution
 
Data Warehouse and Business Intelligence - Recipe 2
Data Warehouse and Business Intelligence - Recipe 2Data Warehouse and Business Intelligence - Recipe 2
Data Warehouse and Business Intelligence - Recipe 2Massimo Cenci
 
Case_Study_-_Advanced_Oracle_PLSQL
Case_Study_-_Advanced_Oracle_PLSQLCase_Study_-_Advanced_Oracle_PLSQL
Case_Study_-_Advanced_Oracle_PLSQLZiemowit Jankowski
 
Recipe 14 - Build a Staging Area for an Oracle Data Warehouse (2)
Recipe 14 - Build a Staging Area for an Oracle Data Warehouse (2)Recipe 14 - Build a Staging Area for an Oracle Data Warehouse (2)
Recipe 14 - Build a Staging Area for an Oracle Data Warehouse (2)Massimo Cenci
 
ata Warehouse and Business Intelligence - Recipe 7 - A messaging system for O...
ata Warehouse and Business Intelligence - Recipe 7 - A messaging system for O...ata Warehouse and Business Intelligence - Recipe 7 - A messaging system for O...
ata Warehouse and Business Intelligence - Recipe 7 - A messaging system for O...Massimo Cenci
 
Recipe 5 of Data Warehouse and Business Intelligence - The null values manage...
Recipe 5 of Data Warehouse and Business Intelligence - The null values manage...Recipe 5 of Data Warehouse and Business Intelligence - The null values manage...
Recipe 5 of Data Warehouse and Business Intelligence - The null values manage...Massimo Cenci
 
Oracle DBA interview_questions
Oracle DBA interview_questionsOracle DBA interview_questions
Oracle DBA interview_questionsNaveen P
 
Tools, not only for Oracle RAC
Tools, not only for Oracle RACTools, not only for Oracle RAC
Tools, not only for Oracle RACMarkus Flechtner
 
Oracle 12c Multi Process Multi Threaded
Oracle 12c Multi Process Multi ThreadedOracle 12c Multi Process Multi Threaded
Oracle 12c Multi Process Multi ThreadedMarkus Flechtner
 
Recipes 8 of Data Warehouse and Business Intelligence - Naming convention tec...
Recipes 8 of Data Warehouse and Business Intelligence - Naming convention tec...Recipes 8 of Data Warehouse and Business Intelligence - Naming convention tec...
Recipes 8 of Data Warehouse and Business Intelligence - Naming convention tec...Massimo Cenci
 
Recipes 9 of Data Warehouse and Business Intelligence - Techniques to control...
Recipes 9 of Data Warehouse and Business Intelligence - Techniques to control...Recipes 9 of Data Warehouse and Business Intelligence - Techniques to control...
Recipes 9 of Data Warehouse and Business Intelligence - Techniques to control...Massimo Cenci
 
Recipes 6 of Data Warehouse and Business Intelligence - Naming convention tec...
Recipes 6 of Data Warehouse and Business Intelligence - Naming convention tec...Recipes 6 of Data Warehouse and Business Intelligence - Naming convention tec...
Recipes 6 of Data Warehouse and Business Intelligence - Naming convention tec...Massimo Cenci
 
Data Warehouse and Business Intelligence - Recipe 4 - Staging area - how to v...
Data Warehouse and Business Intelligence - Recipe 4 - Staging area - how to v...Data Warehouse and Business Intelligence - Recipe 4 - Staging area - how to v...
Data Warehouse and Business Intelligence - Recipe 4 - Staging area - how to v...Massimo Cenci
 
Whitepaper To Study Filestream Option In Sql Server
Whitepaper To Study Filestream Option In Sql ServerWhitepaper To Study Filestream Option In Sql Server
Whitepaper To Study Filestream Option In Sql ServerShahzad
 
Data Warehouse and Business Intelligence - Recipe 1
Data Warehouse and Business Intelligence - Recipe 1Data Warehouse and Business Intelligence - Recipe 1
Data Warehouse and Business Intelligence - Recipe 1Massimo Cenci
 
A Deep Dive into Structured Streaming in Apache Spark
A Deep Dive into Structured Streaming in Apache Spark A Deep Dive into Structured Streaming in Apache Spark
A Deep Dive into Structured Streaming in Apache Spark Anyscale
 
Recipe 16 of Data Warehouse and Business Intelligence - The monitoring of the...
Recipe 16 of Data Warehouse and Business Intelligence - The monitoring of the...Recipe 16 of Data Warehouse and Business Intelligence - The monitoring of the...
Recipe 16 of Data Warehouse and Business Intelligence - The monitoring of the...Massimo Cenci
 
Making Structured Streaming Ready for Production
Making Structured Streaming Ready for ProductionMaking Structured Streaming Ready for Production
Making Structured Streaming Ready for ProductionDatabricks
 
Leo's notes - Oracle DBA 2 Days
Leo's notes - Oracle DBA 2 DaysLeo's notes - Oracle DBA 2 Days
Leo's notes - Oracle DBA 2 DaysLéopold Gault
 

What's hot (20)

Data Migration in Database
Data Migration in DatabaseData Migration in Database
Data Migration in Database
 
High Performance JSON Search and Relational Faceted Browsing with Lucene
High Performance JSON Search and Relational Faceted Browsing with LuceneHigh Performance JSON Search and Relational Faceted Browsing with Lucene
High Performance JSON Search and Relational Faceted Browsing with Lucene
 
Data Warehouse and Business Intelligence - Recipe 2
Data Warehouse and Business Intelligence - Recipe 2Data Warehouse and Business Intelligence - Recipe 2
Data Warehouse and Business Intelligence - Recipe 2
 
Case_Study_-_Advanced_Oracle_PLSQL
Case_Study_-_Advanced_Oracle_PLSQLCase_Study_-_Advanced_Oracle_PLSQL
Case_Study_-_Advanced_Oracle_PLSQL
 
Recipe 14 - Build a Staging Area for an Oracle Data Warehouse (2)
Recipe 14 - Build a Staging Area for an Oracle Data Warehouse (2)Recipe 14 - Build a Staging Area for an Oracle Data Warehouse (2)
Recipe 14 - Build a Staging Area for an Oracle Data Warehouse (2)
 
ata Warehouse and Business Intelligence - Recipe 7 - A messaging system for O...
ata Warehouse and Business Intelligence - Recipe 7 - A messaging system for O...ata Warehouse and Business Intelligence - Recipe 7 - A messaging system for O...
ata Warehouse and Business Intelligence - Recipe 7 - A messaging system for O...
 
Recipe 5 of Data Warehouse and Business Intelligence - The null values manage...
Recipe 5 of Data Warehouse and Business Intelligence - The null values manage...Recipe 5 of Data Warehouse and Business Intelligence - The null values manage...
Recipe 5 of Data Warehouse and Business Intelligence - The null values manage...
 
Oracle DBA interview_questions
Oracle DBA interview_questionsOracle DBA interview_questions
Oracle DBA interview_questions
 
Tools, not only for Oracle RAC
Tools, not only for Oracle RACTools, not only for Oracle RAC
Tools, not only for Oracle RAC
 
Oracle 12c Multi Process Multi Threaded
Oracle 12c Multi Process Multi ThreadedOracle 12c Multi Process Multi Threaded
Oracle 12c Multi Process Multi Threaded
 
Recipes 8 of Data Warehouse and Business Intelligence - Naming convention tec...
Recipes 8 of Data Warehouse and Business Intelligence - Naming convention tec...Recipes 8 of Data Warehouse and Business Intelligence - Naming convention tec...
Recipes 8 of Data Warehouse and Business Intelligence - Naming convention tec...
 
Recipes 9 of Data Warehouse and Business Intelligence - Techniques to control...
Recipes 9 of Data Warehouse and Business Intelligence - Techniques to control...Recipes 9 of Data Warehouse and Business Intelligence - Techniques to control...
Recipes 9 of Data Warehouse and Business Intelligence - Techniques to control...
 
Recipes 6 of Data Warehouse and Business Intelligence - Naming convention tec...
Recipes 6 of Data Warehouse and Business Intelligence - Naming convention tec...Recipes 6 of Data Warehouse and Business Intelligence - Naming convention tec...
Recipes 6 of Data Warehouse and Business Intelligence - Naming convention tec...
 
Data Warehouse and Business Intelligence - Recipe 4 - Staging area - how to v...
Data Warehouse and Business Intelligence - Recipe 4 - Staging area - how to v...Data Warehouse and Business Intelligence - Recipe 4 - Staging area - how to v...
Data Warehouse and Business Intelligence - Recipe 4 - Staging area - how to v...
 
Whitepaper To Study Filestream Option In Sql Server
Whitepaper To Study Filestream Option In Sql ServerWhitepaper To Study Filestream Option In Sql Server
Whitepaper To Study Filestream Option In Sql Server
 
Data Warehouse and Business Intelligence - Recipe 1
Data Warehouse and Business Intelligence - Recipe 1Data Warehouse and Business Intelligence - Recipe 1
Data Warehouse and Business Intelligence - Recipe 1
 
A Deep Dive into Structured Streaming in Apache Spark
A Deep Dive into Structured Streaming in Apache Spark A Deep Dive into Structured Streaming in Apache Spark
A Deep Dive into Structured Streaming in Apache Spark
 
Recipe 16 of Data Warehouse and Business Intelligence - The monitoring of the...
Recipe 16 of Data Warehouse and Business Intelligence - The monitoring of the...Recipe 16 of Data Warehouse and Business Intelligence - The monitoring of the...
Recipe 16 of Data Warehouse and Business Intelligence - The monitoring of the...
 
Making Structured Streaming Ready for Production
Making Structured Streaming Ready for ProductionMaking Structured Streaming Ready for Production
Making Structured Streaming Ready for Production
 
Leo's notes - Oracle DBA 2 Days
Leo's notes - Oracle DBA 2 DaysLeo's notes - Oracle DBA 2 Days
Leo's notes - Oracle DBA 2 Days
 

Viewers also liked

Oracle Cloud Café - Cloud backup et Disaster recovery
Oracle Cloud Café - Cloud backup et Disaster recoveryOracle Cloud Café - Cloud backup et Disaster recovery
Oracle Cloud Café - Cloud backup et Disaster recoverySorathaya Sirimanotham
 
Backup & recovery with rman
Backup & recovery with rmanBackup & recovery with rman
Backup & recovery with rmanitsabidhussain
 
10 Problems with your RMAN backup script
10 Problems with your RMAN backup script10 Problems with your RMAN backup script
10 Problems with your RMAN backup scriptYury Velikanov
 
Introduction to Oracle RMAN, backup and recovery tool.
Introduction to Oracle RMAN, backup and recovery tool.Introduction to Oracle RMAN, backup and recovery tool.
Introduction to Oracle RMAN, backup and recovery tool.guest5ac6fb
 
Oracle Cloud Storage Service & Oracle Database Backup Cloud Service
Oracle Cloud Storage Service & Oracle Database Backup Cloud ServiceOracle Cloud Storage Service & Oracle Database Backup Cloud Service
Oracle Cloud Storage Service & Oracle Database Backup Cloud ServiceJean-Philippe PINTE
 
Manage and Monitor Oracle Applications in the Cloud
Manage and Monitor Oracle Applications in the CloudManage and Monitor Oracle Applications in the Cloud
Manage and Monitor Oracle Applications in the CloudBob Rhubart
 
Rman Presentation
Rman PresentationRman Presentation
Rman PresentationRick van Ek
 
Oracle Application Performance Monitoring Cloud Service 소개
Oracle Application Performance Monitoring Cloud Service 소개Oracle Application Performance Monitoring Cloud Service 소개
Oracle Application Performance Monitoring Cloud Service 소개Mee Nam Lee
 
Oracle vm Disaster Recovery Solutions
Oracle vm Disaster Recovery SolutionsOracle vm Disaster Recovery Solutions
Oracle vm Disaster Recovery SolutionsJohan Louwers
 
Virtual Compute Appliance Oracle IaaS
Virtual Compute Appliance Oracle IaaS Virtual Compute Appliance Oracle IaaS
Virtual Compute Appliance Oracle IaaS Fran Navarro
 
Oracle Compute Cloud Service vs. Amazon Web Services EC2 : A Hands-On Review
Oracle Compute Cloud Service vs. Amazon Web Services EC2 : A Hands-On ReviewOracle Compute Cloud Service vs. Amazon Web Services EC2 : A Hands-On Review
Oracle Compute Cloud Service vs. Amazon Web Services EC2 : A Hands-On ReviewRevelation Technologies
 
Cloud watch on hrms solutions
Cloud watch on hrms solutionsCloud watch on hrms solutions
Cloud watch on hrms solutionsCapgemini
 
Application Performance Cloud Service
Application Performance Cloud ServiceApplication Performance Cloud Service
Application Performance Cloud ServiceMee Nam Lee
 

Viewers also liked (14)

Oracle cloud portal
Oracle cloud portalOracle cloud portal
Oracle cloud portal
 
Oracle Cloud Café - Cloud backup et Disaster recovery
Oracle Cloud Café - Cloud backup et Disaster recoveryOracle Cloud Café - Cloud backup et Disaster recovery
Oracle Cloud Café - Cloud backup et Disaster recovery
 
Backup & recovery with rman
Backup & recovery with rmanBackup & recovery with rman
Backup & recovery with rman
 
10 Problems with your RMAN backup script
10 Problems with your RMAN backup script10 Problems with your RMAN backup script
10 Problems with your RMAN backup script
 
Introduction to Oracle RMAN, backup and recovery tool.
Introduction to Oracle RMAN, backup and recovery tool.Introduction to Oracle RMAN, backup and recovery tool.
Introduction to Oracle RMAN, backup and recovery tool.
 
Oracle Cloud Storage Service & Oracle Database Backup Cloud Service
Oracle Cloud Storage Service & Oracle Database Backup Cloud ServiceOracle Cloud Storage Service & Oracle Database Backup Cloud Service
Oracle Cloud Storage Service & Oracle Database Backup Cloud Service
 
Manage and Monitor Oracle Applications in the Cloud
Manage and Monitor Oracle Applications in the CloudManage and Monitor Oracle Applications in the Cloud
Manage and Monitor Oracle Applications in the Cloud
 
Rman Presentation
Rman PresentationRman Presentation
Rman Presentation
 
Oracle Application Performance Monitoring Cloud Service 소개
Oracle Application Performance Monitoring Cloud Service 소개Oracle Application Performance Monitoring Cloud Service 소개
Oracle Application Performance Monitoring Cloud Service 소개
 
Oracle vm Disaster Recovery Solutions
Oracle vm Disaster Recovery SolutionsOracle vm Disaster Recovery Solutions
Oracle vm Disaster Recovery Solutions
 
Virtual Compute Appliance Oracle IaaS
Virtual Compute Appliance Oracle IaaS Virtual Compute Appliance Oracle IaaS
Virtual Compute Appliance Oracle IaaS
 
Oracle Compute Cloud Service vs. Amazon Web Services EC2 : A Hands-On Review
Oracle Compute Cloud Service vs. Amazon Web Services EC2 : A Hands-On ReviewOracle Compute Cloud Service vs. Amazon Web Services EC2 : A Hands-On Review
Oracle Compute Cloud Service vs. Amazon Web Services EC2 : A Hands-On Review
 
Cloud watch on hrms solutions
Cloud watch on hrms solutionsCloud watch on hrms solutions
Cloud watch on hrms solutions
 
Application Performance Cloud Service
Application Performance Cloud ServiceApplication Performance Cloud Service
Application Performance Cloud Service
 

Similar to Data-and-Compute-Intensive processing Use Case: Lucene Domain Index

ZFConf 2011: Что такое Sphinx, зачем он вообще нужен и как его использовать с...
ZFConf 2011: Что такое Sphinx, зачем он вообще нужен и как его использовать с...ZFConf 2011: Что такое Sphinx, зачем он вообще нужен и как его использовать с...
ZFConf 2011: Что такое Sphinx, зачем он вообще нужен и как его использовать с...ZFConf Conference
 
InfluxDB IOx Tech Talks: Query Engine Design and the Rust-Based DataFusion in...
InfluxDB IOx Tech Talks: Query Engine Design and the Rust-Based DataFusion in...InfluxDB IOx Tech Talks: Query Engine Design and the Rust-Based DataFusion in...
InfluxDB IOx Tech Talks: Query Engine Design and the Rust-Based DataFusion in...InfluxData
 
Sedna XML Database: Executor Internals
Sedna XML Database: Executor InternalsSedna XML Database: Executor Internals
Sedna XML Database: Executor InternalsIvan Shcheklein
 
04 data accesstechnologies
04 data accesstechnologies04 data accesstechnologies
04 data accesstechnologiesBat Programmer
 
Towards sql for streams
Towards sql for streamsTowards sql for streams
Towards sql for streamsRadu Tudoran
 
HandlerSocket plugin for MySQL (English)
HandlerSocket plugin for MySQL (English)HandlerSocket plugin for MySQL (English)
HandlerSocket plugin for MySQL (English)akirahiguchi
 
The life of a query (oracle edition)
The life of a query (oracle edition)The life of a query (oracle edition)
The life of a query (oracle edition)maclean liu
 
Migration to ClickHouse. Practical guide, by Alexander Zaitsev
Migration to ClickHouse. Practical guide, by Alexander ZaitsevMigration to ClickHouse. Practical guide, by Alexander Zaitsev
Migration to ClickHouse. Practical guide, by Alexander ZaitsevAltinity Ltd
 
Movile Internet Movel SA: A Change of Seasons: A big move to Apache Cassandra
Movile Internet Movel SA: A Change of Seasons: A big move to Apache CassandraMovile Internet Movel SA: A Change of Seasons: A big move to Apache Cassandra
Movile Internet Movel SA: A Change of Seasons: A big move to Apache CassandraDataStax Academy
 
Cassandra Summit 2015 - A Change of Seasons
Cassandra Summit 2015 - A Change of SeasonsCassandra Summit 2015 - A Change of Seasons
Cassandra Summit 2015 - A Change of SeasonsEiti Kimura
 
Informix Data Streaming Overview
Informix Data Streaming OverviewInformix Data Streaming Overview
Informix Data Streaming OverviewBrian Hughes
 
Scaling out SSIS with Parallelism, Diving Deep Into The Dataflow Engine
Scaling out SSIS with Parallelism, Diving Deep Into The Dataflow EngineScaling out SSIS with Parallelism, Diving Deep Into The Dataflow Engine
Scaling out SSIS with Parallelism, Diving Deep Into The Dataflow EngineChris Adkin
 
Ms sql server architecture
Ms sql server architectureMs sql server architecture
Ms sql server architectureAjeet Singh
 
Wieldy remote apis with Kekkonen - ClojureD 2016
Wieldy remote apis with Kekkonen - ClojureD 2016Wieldy remote apis with Kekkonen - ClojureD 2016
Wieldy remote apis with Kekkonen - ClojureD 2016Metosin Oy
 
SPL_ALL_EN.pptx
SPL_ALL_EN.pptxSPL_ALL_EN.pptx
SPL_ALL_EN.pptx政宏 张
 
Strata+Hadoop 2015 NYC End User Panel on Real-Time Data Analytics
Strata+Hadoop 2015 NYC End User Panel on Real-Time Data AnalyticsStrata+Hadoop 2015 NYC End User Panel on Real-Time Data Analytics
Strata+Hadoop 2015 NYC End User Panel on Real-Time Data AnalyticsSingleStore
 
Stephan Ewen - Experiences running Flink at Very Large Scale
Stephan Ewen -  Experiences running Flink at Very Large ScaleStephan Ewen -  Experiences running Flink at Very Large Scale
Stephan Ewen - Experiences running Flink at Very Large ScaleVerverica
 
Azure Data Lake Analytics Deep Dive
Azure Data Lake Analytics Deep DiveAzure Data Lake Analytics Deep Dive
Azure Data Lake Analytics Deep DiveIlyas F ☁☁☁
 
Easy, scalable, fault tolerant stream processing with structured streaming - ...
Easy, scalable, fault tolerant stream processing with structured streaming - ...Easy, scalable, fault tolerant stream processing with structured streaming - ...
Easy, scalable, fault tolerant stream processing with structured streaming - ...Anyscale
 

Similar to Data-and-Compute-Intensive processing Use Case: Lucene Domain Index (20)

ZFConf 2011: Что такое Sphinx, зачем он вообще нужен и как его использовать с...
ZFConf 2011: Что такое Sphinx, зачем он вообще нужен и как его использовать с...ZFConf 2011: Что такое Sphinx, зачем он вообще нужен и как его использовать с...
ZFConf 2011: Что такое Sphinx, зачем он вообще нужен и как его использовать с...
 
InfluxDB IOx Tech Talks: Query Engine Design and the Rust-Based DataFusion in...
InfluxDB IOx Tech Talks: Query Engine Design and the Rust-Based DataFusion in...InfluxDB IOx Tech Talks: Query Engine Design and the Rust-Based DataFusion in...
InfluxDB IOx Tech Talks: Query Engine Design and the Rust-Based DataFusion in...
 
Sedna XML Database: Executor Internals
Sedna XML Database: Executor InternalsSedna XML Database: Executor Internals
Sedna XML Database: Executor Internals
 
04 data accesstechnologies
04 data accesstechnologies04 data accesstechnologies
04 data accesstechnologies
 
Towards sql for streams
Towards sql for streamsTowards sql for streams
Towards sql for streams
 
HandlerSocket plugin for MySQL (English)
HandlerSocket plugin for MySQL (English)HandlerSocket plugin for MySQL (English)
HandlerSocket plugin for MySQL (English)
 
The life of a query (oracle edition)
The life of a query (oracle edition)The life of a query (oracle edition)
The life of a query (oracle edition)
 
Migration to ClickHouse. Practical guide, by Alexander Zaitsev
Migration to ClickHouse. Practical guide, by Alexander ZaitsevMigration to ClickHouse. Practical guide, by Alexander Zaitsev
Migration to ClickHouse. Practical guide, by Alexander Zaitsev
 
Sql lite android
Sql lite androidSql lite android
Sql lite android
 
Movile Internet Movel SA: A Change of Seasons: A big move to Apache Cassandra
Movile Internet Movel SA: A Change of Seasons: A big move to Apache CassandraMovile Internet Movel SA: A Change of Seasons: A big move to Apache Cassandra
Movile Internet Movel SA: A Change of Seasons: A big move to Apache Cassandra
 
Cassandra Summit 2015 - A Change of Seasons
Cassandra Summit 2015 - A Change of SeasonsCassandra Summit 2015 - A Change of Seasons
Cassandra Summit 2015 - A Change of Seasons
 
Informix Data Streaming Overview
Informix Data Streaming OverviewInformix Data Streaming Overview
Informix Data Streaming Overview
 
Scaling out SSIS with Parallelism, Diving Deep Into The Dataflow Engine
Scaling out SSIS with Parallelism, Diving Deep Into The Dataflow EngineScaling out SSIS with Parallelism, Diving Deep Into The Dataflow Engine
Scaling out SSIS with Parallelism, Diving Deep Into The Dataflow Engine
 
Ms sql server architecture
Ms sql server architectureMs sql server architecture
Ms sql server architecture
 
Wieldy remote apis with Kekkonen - ClojureD 2016
Wieldy remote apis with Kekkonen - ClojureD 2016Wieldy remote apis with Kekkonen - ClojureD 2016
Wieldy remote apis with Kekkonen - ClojureD 2016
 
SPL_ALL_EN.pptx
SPL_ALL_EN.pptxSPL_ALL_EN.pptx
SPL_ALL_EN.pptx
 
Strata+Hadoop 2015 NYC End User Panel on Real-Time Data Analytics
Strata+Hadoop 2015 NYC End User Panel on Real-Time Data AnalyticsStrata+Hadoop 2015 NYC End User Panel on Real-Time Data Analytics
Strata+Hadoop 2015 NYC End User Panel on Real-Time Data Analytics
 
Stephan Ewen - Experiences running Flink at Very Large Scale
Stephan Ewen -  Experiences running Flink at Very Large ScaleStephan Ewen -  Experiences running Flink at Very Large Scale
Stephan Ewen - Experiences running Flink at Very Large Scale
 
Azure Data Lake Analytics Deep Dive
Azure Data Lake Analytics Deep DiveAzure Data Lake Analytics Deep Dive
Azure Data Lake Analytics Deep Dive
 
Easy, scalable, fault tolerant stream processing with structured streaming - ...
Easy, scalable, fault tolerant stream processing with structured streaming - ...Easy, scalable, fault tolerant stream processing with structured streaming - ...
Easy, scalable, fault tolerant stream processing with structured streaming - ...
 

Recently uploaded

Not a Kubernetes fan? The state of PaaS in 2024
Not a Kubernetes fan? The state of PaaS in 2024Not a Kubernetes fan? The state of PaaS in 2024
Not a Kubernetes fan? The state of PaaS in 2024Anthony Dahanne
 
Introduction to Firebase Workshop Slides
Introduction to Firebase Workshop SlidesIntroduction to Firebase Workshop Slides
Introduction to Firebase Workshop Slidesvaideheekore1
 
Large Language Models for Test Case Evolution and Repair
Large Language Models for Test Case Evolution and RepairLarge Language Models for Test Case Evolution and Repair
Large Language Models for Test Case Evolution and RepairLionel Briand
 
SpotFlow: Tracking Method Calls and States at Runtime
SpotFlow: Tracking Method Calls and States at RuntimeSpotFlow: Tracking Method Calls and States at Runtime
SpotFlow: Tracking Method Calls and States at Runtimeandrehoraa
 
Powering Real-Time Decisions with Continuous Data Streams
Powering Real-Time Decisions with Continuous Data StreamsPowering Real-Time Decisions with Continuous Data Streams
Powering Real-Time Decisions with Continuous Data StreamsSafe Software
 
Understanding Flamingo - DeepMind's VLM Architecture
Understanding Flamingo - DeepMind's VLM ArchitectureUnderstanding Flamingo - DeepMind's VLM Architecture
Understanding Flamingo - DeepMind's VLM Architecturerahul_net
 
Patterns for automating API delivery. API conference
Patterns for automating API delivery. API conferencePatterns for automating API delivery. API conference
Patterns for automating API delivery. API conferencessuser9e7c64
 
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...confluent
 
A healthy diet for your Java application Devoxx France.pdf
A healthy diet for your Java application Devoxx France.pdfA healthy diet for your Java application Devoxx France.pdf
A healthy diet for your Java application Devoxx France.pdfMarharyta Nedzelska
 
VK Business Profile - provides IT solutions and Web Development
VK Business Profile - provides IT solutions and Web DevelopmentVK Business Profile - provides IT solutions and Web Development
VK Business Profile - provides IT solutions and Web Developmentvyaparkranti
 
Real-time Tracking and Monitoring with Cargo Cloud Solutions.pptx
Real-time Tracking and Monitoring with Cargo Cloud Solutions.pptxReal-time Tracking and Monitoring with Cargo Cloud Solutions.pptx
Real-time Tracking and Monitoring with Cargo Cloud Solutions.pptxRTS corp
 
JavaLand 2024 - Going serverless with Quarkus GraalVM native images and AWS L...
JavaLand 2024 - Going serverless with Quarkus GraalVM native images and AWS L...JavaLand 2024 - Going serverless with Quarkus GraalVM native images and AWS L...
JavaLand 2024 - Going serverless with Quarkus GraalVM native images and AWS L...Bert Jan Schrijver
 
Comparing Linux OS Image Update Models - EOSS 2024.pdf
Comparing Linux OS Image Update Models - EOSS 2024.pdfComparing Linux OS Image Update Models - EOSS 2024.pdf
Comparing Linux OS Image Update Models - EOSS 2024.pdfDrew Moseley
 
Machine Learning Software Engineering Patterns and Their Engineering
Machine Learning Software Engineering Patterns and Their EngineeringMachine Learning Software Engineering Patterns and Their Engineering
Machine Learning Software Engineering Patterns and Their EngineeringHironori Washizaki
 
Strategies for using alternative queries to mitigate zero results
Strategies for using alternative queries to mitigate zero resultsStrategies for using alternative queries to mitigate zero results
Strategies for using alternative queries to mitigate zero resultsJean Silva
 
Odoo 14 - eLearning Module In Odoo 14 Enterprise
Odoo 14 - eLearning Module In Odoo 14 EnterpriseOdoo 14 - eLearning Module In Odoo 14 Enterprise
Odoo 14 - eLearning Module In Odoo 14 Enterprisepreethippts
 
Tech Tuesday Slides - Introduction to Project Management with OnePlan's Work ...
Tech Tuesday Slides - Introduction to Project Management with OnePlan's Work ...Tech Tuesday Slides - Introduction to Project Management with OnePlan's Work ...
Tech Tuesday Slides - Introduction to Project Management with OnePlan's Work ...OnePlan Solutions
 
SensoDat: Simulation-based Sensor Dataset of Self-driving Cars
SensoDat: Simulation-based Sensor Dataset of Self-driving CarsSensoDat: Simulation-based Sensor Dataset of Self-driving Cars
SensoDat: Simulation-based Sensor Dataset of Self-driving CarsChristian Birchler
 
SAM Training Session - How to use EXCEL ?
SAM Training Session - How to use EXCEL ?SAM Training Session - How to use EXCEL ?
SAM Training Session - How to use EXCEL ?Alexandre Beguel
 
Precise and Complete Requirements? An Elusive Goal
Precise and Complete Requirements? An Elusive GoalPrecise and Complete Requirements? An Elusive Goal
Precise and Complete Requirements? An Elusive GoalLionel Briand
 

Recently uploaded (20)

Not a Kubernetes fan? The state of PaaS in 2024
Not a Kubernetes fan? The state of PaaS in 2024Not a Kubernetes fan? The state of PaaS in 2024
Not a Kubernetes fan? The state of PaaS in 2024
 
Introduction to Firebase Workshop Slides
Introduction to Firebase Workshop SlidesIntroduction to Firebase Workshop Slides
Introduction to Firebase Workshop Slides
 
Large Language Models for Test Case Evolution and Repair
Large Language Models for Test Case Evolution and RepairLarge Language Models for Test Case Evolution and Repair
Large Language Models for Test Case Evolution and Repair
 
SpotFlow: Tracking Method Calls and States at Runtime
SpotFlow: Tracking Method Calls and States at RuntimeSpotFlow: Tracking Method Calls and States at Runtime
SpotFlow: Tracking Method Calls and States at Runtime
 
Powering Real-Time Decisions with Continuous Data Streams
Powering Real-Time Decisions with Continuous Data StreamsPowering Real-Time Decisions with Continuous Data Streams
Powering Real-Time Decisions with Continuous Data Streams
 
Understanding Flamingo - DeepMind's VLM Architecture
Understanding Flamingo - DeepMind's VLM ArchitectureUnderstanding Flamingo - DeepMind's VLM Architecture
Understanding Flamingo - DeepMind's VLM Architecture
 
Patterns for automating API delivery. API conference
Patterns for automating API delivery. API conferencePatterns for automating API delivery. API conference
Patterns for automating API delivery. API conference
 
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
 
A healthy diet for your Java application Devoxx France.pdf
A healthy diet for your Java application Devoxx France.pdfA healthy diet for your Java application Devoxx France.pdf
A healthy diet for your Java application Devoxx France.pdf
 
VK Business Profile - provides IT solutions and Web Development
VK Business Profile - provides IT solutions and Web DevelopmentVK Business Profile - provides IT solutions and Web Development
VK Business Profile - provides IT solutions and Web Development
 
Real-time Tracking and Monitoring with Cargo Cloud Solutions.pptx
Real-time Tracking and Monitoring with Cargo Cloud Solutions.pptxReal-time Tracking and Monitoring with Cargo Cloud Solutions.pptx
Real-time Tracking and Monitoring with Cargo Cloud Solutions.pptx
 
JavaLand 2024 - Going serverless with Quarkus GraalVM native images and AWS L...
JavaLand 2024 - Going serverless with Quarkus GraalVM native images and AWS L...JavaLand 2024 - Going serverless with Quarkus GraalVM native images and AWS L...
JavaLand 2024 - Going serverless with Quarkus GraalVM native images and AWS L...
 
Comparing Linux OS Image Update Models - EOSS 2024.pdf
Comparing Linux OS Image Update Models - EOSS 2024.pdfComparing Linux OS Image Update Models - EOSS 2024.pdf
Comparing Linux OS Image Update Models - EOSS 2024.pdf
 
Machine Learning Software Engineering Patterns and Their Engineering
Machine Learning Software Engineering Patterns and Their EngineeringMachine Learning Software Engineering Patterns and Their Engineering
Machine Learning Software Engineering Patterns and Their Engineering
 
Strategies for using alternative queries to mitigate zero results
Strategies for using alternative queries to mitigate zero resultsStrategies for using alternative queries to mitigate zero results
Strategies for using alternative queries to mitigate zero results
 
Odoo 14 - eLearning Module In Odoo 14 Enterprise
Odoo 14 - eLearning Module In Odoo 14 EnterpriseOdoo 14 - eLearning Module In Odoo 14 Enterprise
Odoo 14 - eLearning Module In Odoo 14 Enterprise
 
Tech Tuesday Slides - Introduction to Project Management with OnePlan's Work ...
Tech Tuesday Slides - Introduction to Project Management with OnePlan's Work ...Tech Tuesday Slides - Introduction to Project Management with OnePlan's Work ...
Tech Tuesday Slides - Introduction to Project Management with OnePlan's Work ...
 
SensoDat: Simulation-based Sensor Dataset of Self-driving Cars
SensoDat: Simulation-based Sensor Dataset of Self-driving CarsSensoDat: Simulation-based Sensor Dataset of Self-driving Cars
SensoDat: Simulation-based Sensor Dataset of Self-driving Cars
 
SAM Training Session - How to use EXCEL ?
SAM Training Session - How to use EXCEL ?SAM Training Session - How to use EXCEL ?
SAM Training Session - How to use EXCEL ?
 
Precise and Complete Requirements? An Elusive Goal
Precise and Complete Requirements? An Elusive GoalPrecise and Complete Requirements? An Elusive Goal
Precise and Complete Requirements? An Elusive Goal
 

Data-and-Compute-Intensive processing Use Case: Lucene Domain Index

  • 1. Oracle Open World Data-and-Compute-Intensive processing Use Case: Lucene Domain Index Marcelo F. Ochoa Fac. Cs. Exactas - UNICEN - Tandil - Argentina
  • 2. Agenda Data-and-Compute Intensive Search What is Lucene? What is Lucene Domain Index? Performance Application integration Demo Future plans
  • 3. Data-and-Compute Intensive Search Data-and-Compute Intensive Search Strategies Middle-tier-based search engines Google Appliance SES Nutch Solr Database-embedded search engines Oracle Text Lucene Domain Index (Lucene OJVM)
  • 4. Middle-tier-based Search Engines Benefits Simple (crawler mode) Medium complexity (Solr WS) Out off the box solution (crawler) No application code is necessary to integrate (crawler) Medium out off the box solution (Solr WS) Drawbacks Updated are slow (usually monthly, weekly) A lot a wasted traffic You can not index pages based on database information which requires login (crawler mode) Indexing tables requires triggers, batch process or a persistent layer to transfer modifications
  • 5. Database-embedded Search Engines Benefits Fastest update No extra coding its necessary, SQL access Ready to use to any language, PHP, Phyton, .Net You can index tables Changes are automatically notified No network traffic No network marshalling Drawbacks A little slow down of Java execution compared to a Sun JDK JVM
  • 6. What is Lucene? Open Source Information Retrieval (IR) Library with extensible APIs Top level Apache project Is the core component of Apache Solr and Nutch projects 100% Java Around 800 classes 47.000 lines of code 33.000 lines of test 78.000 lines at contrib area Can search and Index any textual data Can scales to millions of pages or records Provides fuzzy search, proximity search, range queries, ... Wildcards: single and multiple characters, anywhere in the search words
  • 7. What is Lucene Domain Index? An Embebed version of Lucene IR library running inside Oracle OJVM 37 new Java Classes and a new PLSQL Object Type A new domain index for Oracle Databases using Data Cartridge API (ODCI) A new Store implementation for Lucene (OJVMDirectory) which replaces a traditional filesystem by Secure BLOB Two new SQL operators lcontains() and lscore() An orthogonal/uptodate Lucene solution for any programming language, especially Java, Ruby, Python, PHP and .Net, currently latest production version - 2.3.2.
  • 8. Benefits Benefits added to Oracle Applications No network round trip for indexing Oracle tables A fault tolerant, transactional and scalable storage for Lucene inverted index Small Lucene index structure Support for IOT Support for indexing joined tables using default User Data Store Support for indexing virtual columns Support for order by, filter by and in-line pagination operations at Index level layer Support padding/formatting for Text/Date/Time/Number But more important than above is Easily to adapt for a new functionality
  • 9. Performance Test Suite Corpus: XML Spanish WikiPedia dump: Total documents: 1.056.163 - 2,67 Gb Average size per document: 2.533 bytes Lucene Index size: 10 BLOB/files 808Mb total. 5 fields (title,revisionDate,comment,text) Table structure (XMLDB): pages title:VARCHAR2 id:NUMBER pages_revisions id:NUMBER revisionDate:TIMESTAMP comment:VARCHAR2 text:CLOB 1 n ..
  • 10. Middle-tier-based approach Requires transfer all database table data to the middle tier A middle tier application performs this query: SELECT /*+ DYNAMIC_SAMPLING(0) RULE NOCACHE(PAGES) */ PAGES.rowid, extractValue(object_value,'/page/title') "title",extractValue(object_value, '/page/revision/comment') "comment",extract(object_value, '/page/revision/text/text()') "text",extractValue(object_value, '/page/revision/timestamp') "revisionDate" FROM ESWIKI.PAGES where PAGES.rowid in (select rid from (select rowid rid,rownum rn from ENWIKI.PAGES) where rn>=1 and rn<=300) For 300 rows SQLTrace reports: bytes sent via SQL*Net to client 3.245.358 bytes received via SQL*Net from client 1.785.912 SQL*Net roundtrips to/from client 2.383 Total indexing time for 33.912 rows 824 seconds SQL Application External Indexer
  • 11. Database-embedded approach Index Definition, Lucene Domain Index syntax: SQL> ALTER SESSION SET sql_trace=true; SQL> ALTER SESSION SET EVENTS '10046 trace name context forever, level 8'; SQL> create index pages_lidx_all on pages p (value(p)) indextype is Lucene.LuceneIndex parameters('PopulateIndex:false;DefaultColumn:text; SyncMode:Deferred;LogLevel:WARNING; Analyzer:org.apache.lucene.analysis.SpanishWikipediaAnalyzer; ExtraCols:extractValue(object_value,''/page/title'') "title", extractValue(object_value,''/page/revision/comment'') "comment", extract(object_value,''/page/revision/text/text()'') "text", extractValue(object_value,''/page/revision/timestamp'') "revisionDate"; FormatCols:revisionDate(day);IncludeMasterColumn:false; LobStorageParameters:PCTVERSION 0 ENABLE STORAGE IN ROW CHUNK 32768 CACHE READS FILESYSTEM_LIKE_LOGGING');
  • 12. After creating an Index is necessary to submit changes for indexing This can be done using: DECLARE ridlist sys.ODCIRidList; BEGIN select rid BULK COLLECT INTO ridlist from (select rowid rid,rownum rn from pages) where rn>=1 and rn<=300; LuceneDomainIndex.enqueueChange(USER||'.PAGES_LIDX_ALL',ridlist,'insert'); END; For 300 rows SQLTrace reports: bytes sent via SQL*Net to client 1.301 bytes received via SQL*Net from client 1.354 SQL*Net roundtrips to/from client 4 Total indexing time for 33.912 rows 346 seconds Database-embedded approach Jav aJD BC Cal ls SQ Stored Procedure Call Application Client
  • 13. CPU / Network usage during indexing External indexing (824 s.) Integrated indexing (346 s.) User Sys CPU Load Nice User Sys CPU Load Nice Receiver Transmiter Network Server side Receiver Transmiter Network Client side Client side Server side
  • 14. CPU/IO for Database-embedded indexing This information was taken with SQL_TRACE=true indexing 3.000 rows inside OJVM. Most time is spent in full table scan In addition, middle-tier indexing of 3.000 rows would require sending 34Mb of data over the network. 58% 43,2%
  • 15. Application integration: Fuzzy searches Old implementation (without Lucene): select p.id,p.first_Name,p.last_Name,p.nationality,p.sex,p.type_Document, p.number_Document,p.civil_State,p.date_Birth,p.mail,g.organization_id , fnmatchperson(p.first_name, p.last_name, 'John Doe') as suma from person p left join (select * from guest where organization_id = 67) g on g.person_id = p.id where p.state = 1 and fnmatchperson(p.first_name, p.last_name, 'John Doe') >= 50 order by suma desc Lucene implementation: select /*+ DOMAIN_INDEX_SORT */ p.id,p.first_Name,p.last_Name,p.nationality,p.sex,p.type_Document, p.number_Document,p.civil_State,p.date_Birth,p.mail,g.organization_id , lscore(1) as suma from person p left join (select * from guest where organization_id = 67) g on g.person_id = p.id where p.state = 1 and lcontains(p.first_name, 'rownum:[1 TO 20] AND John~ Doe~',1) > 0 where "John Doe " is searched as "John~ Doe~ " to provide partial match. ~ Lucene operator uses Levenshtein Distance, or Edit Distance algorithm. http://en.wikipedia.org/wiki/Levenshtein_distance
  • 16. Execution plan for both queries fnMatch solution Lucene solution
  • 17. Key points Only one extra index is required: create index person_lidx on person(first_name) indextype is lucene.LuceneIndex parameters('SyncMode:OnLine;LogLevel:ALL; AutoTuneMemory:true;IncludeMasterColumn:false;DefaultOperator:OR; DefaultColumn:name_str;Analyzer:org.apache.lucene.analysis. SimpleAnalyzer; ExtraCols:first_name||'' ''||last_name "name_str"'); Simple to adapt, only one class was modified to provide partial match. String split[] = firstLastName.split(" "); sql3 = ""; for(int i = 0; i < split.length; i++){ sql3 += /*"'" +*/ split[i].toLowerCase().trim() + "~ "; } sql3 = sql3.substring(0, sql3.length()-1);//le quita la coma /*sql += ", fnmatchperson(p.first_name, p.last_name, " + sql3 + ") as suma ";*/ /*sql3 = " and fnmatchperson(p.first_name, p.last_name, " + sql3 + ") >= 50 " ;*/ sql3 = "and lcontains(p.first_name, 'rownum:[1 TO 20] AND "+sql3 +"',1) > 0 ";
  • 18. Key points, cont. Less network traffic In the above example, around of 20% of the rows are discarded by the filter operation "GUEST"."PERSON_ID"(+)="P"."ID" AND ORGANIZATION_ID"(+)=67" "P"."STATE"=1 In Solr implementation a new row on person table imply: N bytes of SQLNet +283 bytes for HTTP Post method Faster updates Compared to Solr approach we send 283 bytes less which means faster operations. Compared to middle tier approach, once a new row is added to the table it is ready to be included in the next query in the example shown this is critical constraint Minimal application code impact only a new index only a rewrite where condition is needed to replace fnMatch
  • 19. Future plans Add Faceted search, may be using ODCI aggregate functions or Pipeline Tables Strong committing to latest Lucene production release, once 2.4 version will be released, we will test inside OJVM Add ODCI Extensible Optimizer Interface to better dialogue with the Oracle SQL Engine A slave session which collects query from different parallel session to reduce memory foot print and to provides highest hit ratios A JMX interface to monitor Lucene Domain Index using Sun's JMX console
  • 20. Useful links Lucene Project: http://lucene.apache.org/java/docs/index.html Lucene Oracle Integration http://docs.google.com/Doc?id=ddgw7sjp_54fgj9kg Forum, Peer to Peer Support http://sourceforge.net/forum/forum.php?forum_id=187896 Download Binary Distribution (10g/11g) http://sourceforge.net/project/showfiles.php?group_id=56183&package_id=255524 CVS Access cvs -d:pserver:anonymous@dbprism.cvs.sourceforge.net:/cvsroot/dbprism login cvs -z3 -d:pserver:anonymous@dbprism.cvs.sourceforge.net:/cvsroot/dbprism co -P ojvm http://dbprism.cvs.sourceforge.net/dbprism/ojvm/
  • 21. Q & A Thank you