Boston May 7-10 2012


         Integrating Lucene search
         engine into a transactional
               XML database

                                              Petr Pleshachkov, EMC
                                    petr.pleshachkov@emc.com, May 9, 2012




© Copyright 2012 EMC Corporation. All rights reserved.                      1
My Background
• Petr Pleshachkov, Principal Software Engineer
• xDB/xPlore team in Rotterdam
        –  My site: EMC Netherlands
        –  Other xPlore/xDB sites: Pleasanton (California),
           Shanghai (China), and Grenoble (France)

• Areas of expertise:
        –  Semistructured data management
        –  Databases: transaction management, query
           optimization, full-text search

• Academia & Research:
        –  PhD in Computer Science, ISP RAS


Agenda
• Overview of EMC Documentum xDB/xPlore
• Integration of Lucene into xDB
• xDB transaction model & Lucene transaction management
• Performance analysis
• Future optimizations




Introducing Documentum xPlore
   •  EMC Documentum is a leading
      supplier of Enterprise Content
      Management software
   •  xPlore provides ‘Integrated
      Search’ for Documentum
            –  but is built as a standalone search
               engine to replace FAST Instream
            –  Widely deployed across
               Documentum environments
               worldwide (70+ countries)
   •  The xPlore search engine is built on
      EMC xDB, Lucene, and leading
      content extraction and linguistic
      analysis software

Key values which xDB brings for xPlore
Why build a search engine over an XML database?

• Flexible, hierarchical query & data models
• Joins
• High-throughput, low-latency indexing
        –  See documents within seconds after saving
• Leverage B-tree indexes when appropriate
        –  Lucene doesn’t fit all uses
• Rich, innovative query language
• Enterprise, single unified database

Documentum xDB
• Formerly the X-Hive database
        –  100% Java
        –  XML stored in persistent DOM format
            ▪  Each XML node can be located through a 64-bit identifier
            ▪  Structure mapped to pages
            ▪  Easy to operate on GB-sized XML files

• Full transactional database
• Query language: XQuery
• Indexing & optimization
        –  Palette of index options the optimizer can pick from
        –  At its simplest: indexLookup(key) -> node id

• Backup/restore, scalability, multi-node architecture
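
The "indexLookup(key) -> node id" idea can be illustrated with a minimal sketch. It assumes a sorted in-memory map as a stand-in for a persistent B-tree; the class name `ValueIndexSketch` and its methods are hypothetical and are not part of the xDB API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.TreeMap;

// Hypothetical stand-in for a value index: a sorted map from key to 64-bit node ids.
class ValueIndexSketch {
    private final TreeMap<String, List<Long>> entries = new TreeMap<>();

    // Record that the node with this id carries the given key value.
    void put(String key, long nodeId) {
        entries.computeIfAbsent(key, k -> new ArrayList<>()).add(nodeId);
    }

    // indexLookup(key) -> node ids: all nodes indexed under the key, in insertion order.
    List<Long> indexLookup(String key) {
        List<Long> ids = entries.get(key);
        if (ids == null) {
            return Collections.emptyList();
        }
        return Collections.unmodifiableList(ids);
    }

    public static void main(String[] args) {
        ValueIndexSketch index = new ValueIndexSketch();
        index.put("BRUTUS", 16L);
        index.put("CASSIUS", 32L);
        index.put("BRUTUS", 48L);
        System.out.println(index.indexLookup("BRUTUS"));
    }
}
```

A query optimizer that has such an index available can replace a full document scan with a single lookup followed by direct access to the returned nodes.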



xDB Data Storage Model
An XML document can be thought of as a collection of elements and
attributes (or ‘XML nodes’).

This node structure can be represented as a tree (the DOM model):

    A
      B
      C
        D
        E

Database page:  A B C D E




Libraries & Indexes

[Diagram: library A contains sub-libraries B and C. Legend: xDB library, xDB index, xDB XML file.]

The scope of an index covers all XML files in all sub-libraries.

Lucene Integration
• Both value and full-text queries supported
        –  XML SubPaths mapped into Lucene fields
        –  Tokenized and value-based indexes available
• Composite key queries supported
        –  A Lucene index is much more flexible than B-tree
           composite indexes




Multipath Index Definition
    <PLAY>
     <ACT>
      <SCENE>
       <SPEECH>
        <SPEAKER>BRUTUS</SPEAKER>
        <LINE>I am not gamesome: I do lack some part</LINE>
       </SPEECH>
       <SPEECH>
        <SPEAKER>CASSIUS</SPEAKER>
        <LINE>Then, Brutus, I have much mistook your passion;</LINE>
        <LINE>By means whereof this breast of mine hath buried</LINE>
        <LINE>Thoughts of great value, worthy cogitations.</LINE>
       </SPEECH>
      </SCENE>
     </ACT>
    </PLAY>

    INDEX ROOT PATH: //SPEECH
    SubPath1: (/SPEAKER, VALUE_COMPARISON)
    SubPath2: (//LINE, FULL_TEXT_SEARCH)
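
As a rough illustration of how such a definition could turn XML into index entries, the sketch below produces one entry per SPEECH element, with an untokenized "/SPEAKER" field and a tokenized "//LINE" field. It uses only the JDK's DOM parser. The class name, the field names, and the lowercase split-on-non-alphanumerics tokenizer are assumptions made for the example, not the xDB implementation.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

class MultipathIndexSketch {
    // Returns one field map per SPEECH element (index root path //SPEECH).
    static List<Map<String, List<String>>> indexSpeeches(String xml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
        List<Map<String, List<String>>> entries = new ArrayList<>();
        NodeList speeches = doc.getElementsByTagName("SPEECH");
        for (int i = 0; i < speeches.getLength(); i++) {
            Element speech = (Element) speeches.item(i);
            List<String> speakers = new ArrayList<>();
            List<String> lineTokens = new ArrayList<>();
            // SubPath1 (/SPEAKER, VALUE_COMPARISON): direct children, whole value as one term.
            for (Node c = speech.getFirstChild(); c != null; c = c.getNextSibling()) {
                if (c instanceof Element && "SPEAKER".equals(c.getNodeName())) {
                    speakers.add(c.getTextContent().trim());
                }
            }
            // SubPath2 (//LINE, FULL_TEXT_SEARCH): all descendants, text split into tokens.
            NodeList lines = speech.getElementsByTagName("LINE");
            for (int j = 0; j < lines.getLength(); j++) {
                String text = lines.item(j).getTextContent().toLowerCase(Locale.ROOT);
                for (String token : text.split("[^\\p{L}\\p{N}]+")) {
                    if (!token.isEmpty()) {
                        lineTokens.add(token);
                    }
                }
            }
            Map<String, List<String>> fields = new LinkedHashMap<>();
            fields.put("/SPEAKER", speakers);
            fields.put("//LINE", lineTokens);
            entries.add(fields);
        }
        return entries;
    }
}
```

Applied to the sample document, this yields two entries, one per SPEECH. The value field keeps "BRUTUS" intact for equality comparison, while the full-text field holds individual words such as "lack".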


Lucene Query Mapping
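 for $SPEECH score $s in collection('col1')//SPEECH[SPEAKER = 'BRUTUS'
                      and //LINE contains text 'lack']
 order by $s
 return $SPEECH


 // Equivalent Lucene query (Lucene 3.x style API): both clauses are required.
 TermQuery termQuery1 = new TermQuery(new Term("/speaker_field", "BRUTUS"));
 TermQuery termQuery2 = new TermQuery(new Term("//line_field", "lack"));

 BooleanQuery query = new BooleanQuery();
 query.add(termQuery1, BooleanClause.Occur.MUST);
 query.add(termQuery2, BooleanClause.Occur.MUST);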
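
What a conjunction of two required term clauses computes can be sketched without Lucene: each term query returns the set of matching entries, and two MUST clauses intersect those sets. The class below is a hypothetical in-memory inverted index written for illustration. It ignores scoring, which the XQuery above relies on for ordering.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeSet;

class ConjunctiveQuerySketch {
    // field -> term -> sorted ids of the index entries containing that term
    private final Map<String, Map<String, TreeSet<Integer>>> postings = new HashMap<>();

    void add(int entryId, String field, String term) {
        postings.computeIfAbsent(field, f -> new HashMap<>())
                .computeIfAbsent(term, t -> new TreeSet<>())
                .add(entryId);
    }

    // Counterpart of a single term query: all entries with this term in this field.
    TreeSet<Integer> termQuery(String field, String term) {
        TreeSet<Integer> result = new TreeSet<>();
        Map<String, TreeSet<Integer>> byTerm = postings.get(field);
        if (byTerm != null && byTerm.containsKey(term)) {
            result.addAll(byTerm.get(term));
        }
        return result;
    }

    // Two required (MUST) clauses: an entry matches only if it is in both result sets.
    TreeSet<Integer> must(TreeSet<Integer> left, TreeSet<Integer> right) {
        TreeSet<Integer> result = new TreeSet<>(left);
        result.retainAll(right);
        return result;
    }

    public static void main(String[] args) {
        ConjunctiveQuerySketch index = new ConjunctiveQuerySketch();
        index.add(1, "/speaker_field", "BRUTUS");
        index.add(1, "//line_field", "lack");
        index.add(2, "/speaker_field", "CASSIUS");
        index.add(2, "//line_field", "passion");
        System.out.println(index.must(
                index.termQuery("/speaker_field", "BRUTUS"),
                index.termQuery("//line_field", "lack")));
    }
}
```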




Lucene SubIndexes
• Each user transaction creates a separate
   Lucene subIndex
• A transaction performs all of its updates in
   its own subIndex
• The delete operation does not physically
   touch subIndexes created by other
   transactions
• A pair (minLSN, maxLSN) is associated with
   each subIndex and is used to construct a
   global index snapshot
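
One way to picture snapshot construction from the (minLSN, maxLSN) pairs is sketched below. The visibility rule is an assumption made for the example, since the slides do not spell it out: a reader whose snapshot was taken at LSN s sees exactly the subIndexes whose maxLSN is not greater than s. All names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class SubIndexSnapshotSketch {
    static final class SubIndex {
        final String name;
        final long minLsn;
        final long maxLsn;

        SubIndex(String name, long minLsn, long maxLsn) {
            this.name = name;
            this.minLsn = minLsn;
            this.maxLsn = maxLsn;
        }
    }

    // Assumed rule: a subIndex is visible once all of its changes precede the snapshot LSN.
    static List<String> snapshot(List<SubIndex> all, long snapshotLsn) {
        List<String> visible = new ArrayList<>();
        for (SubIndex subIndex : all) {
            if (subIndex.maxLsn <= snapshotLsn) {
                visible.add(subIndex.name);
            }
        }
        return visible;
    }
}
```

Under this rule, a subIndex written by a transaction that finishes after the reader started never enters the reader's view, so no locking between readers and writers is needed for the index itself.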
Blacklists
• The delete operation of a transaction:
        –  Physically deletes the document from the
           transaction’s own subIndex
        –  Adds a pair (subIndexMinLSN, NODE_ID) to
           the blacklist structure
• The persistent blacklist structure is
   represented as an xDB B-tree index with key =
   subIndexMinLSN, value = NODE_ID
• A periodic merge operation merges small
   subIndexes into a bigger one and physically
   deletes the blacklisted documents.
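
The blacklist mechanism can be sketched as follows, assuming an in-memory sorted map in place of the persistent xDB B-tree. The class and method names are hypothetical. Deletes against another transaction's subIndex are only recorded; the merge step is where documents actually disappear.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

class BlacklistSketch {
    // key = subIndexMinLSN, value = NODE_IDs deleted from that subIndex
    private final TreeMap<Long, Set<Long>> blacklist = new TreeMap<>();

    // Logical delete: the foreign subIndex is not touched, only the blacklist grows.
    void recordDelete(long subIndexMinLsn, long nodeId) {
        blacklist.computeIfAbsent(subIndexMinLsn, k -> new HashSet<>()).add(nodeId);
    }

    // Search-time filter: a hit is dropped if it has been blacklisted.
    boolean isVisible(long subIndexMinLsn, long nodeId) {
        Set<Long> deleted = blacklist.get(subIndexMinLsn);
        return deleted == null || !deleted.contains(nodeId);
    }

    // Merge: copy surviving node ids into one list and discard the consumed blacklist entries.
    List<Long> merge(Map<Long, List<Long>> subIndexes) {
        List<Long> merged = new ArrayList<>();
        for (Map.Entry<Long, List<Long>> subIndex : subIndexes.entrySet()) {
            for (long nodeId : subIndex.getValue()) {
                if (isVisible(subIndex.getKey(), nodeId)) {
                    merged.add(nodeId);
                }
            }
            blacklist.remove(subIndex.getKey());
        }
        return merged;
    }
}
```

The design keeps deletes cheap and local: no transaction ever rewrites index files it does not own, and the cost of physical removal is paid once, in the background merge.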


xDB transaction management
• ARIES-based ACID transactions
        –  Every page has a Log Sequence Number
           (pageLSN)
        –  Buffer manager tracks dirty pages using RecLSNs
        –  Log ALL updates on a per-page basis, including
           updates performed during rollbacks
        –  An asynchronous thread periodically runs the
           checkpoint procedure
        –  The recovery procedure:
                 ▪  Repeat history: redo all the updates since the
                    last successful checkpoint
                 ▪  Undo incomplete transactions
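
The two recovery steps can be illustrated with a deliberately simplified sketch. It is not the xDB recovery code: it replays the whole log instead of starting at a checkpoint, treats a page as a single string value, and does not write compensation log records during undo. All class and field names are hypothetical.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class AriesRecoverySketch {
    enum Kind { UPDATE, COMMIT }

    static final class LogRecord {
        final long lsn;
        final int txId;
        final Kind kind;
        final int pageId;
        final String before;
        final String after;

        private LogRecord(long lsn, int txId, Kind kind, int pageId, String before, String after) {
            this.lsn = lsn; this.txId = txId; this.kind = kind;
            this.pageId = pageId; this.before = before; this.after = after;
        }

        static LogRecord update(long lsn, int txId, int pageId, String before, String after) {
            return new LogRecord(lsn, txId, Kind.UPDATE, pageId, before, after);
        }

        static LogRecord commit(long lsn, int txId) {
            return new LogRecord(lsn, txId, Kind.COMMIT, -1, null, null);
        }
    }

    static final class Page {
        String value;
        long pageLsn; // LSN of the last update already reflected in this page
    }

    static void recover(List<LogRecord> log, Map<Integer, Page> pages) {
        Set<Integer> committed = new HashSet<>();
        // Redo pass: repeat history, skipping updates the page already contains.
        for (LogRecord r : log) {
            if (r.kind == Kind.COMMIT) {
                committed.add(r.txId);
                continue;
            }
            Page page = pages.computeIfAbsent(r.pageId, id -> new Page());
            if (page.pageLsn < r.lsn) {
                page.value = r.after;
                page.pageLsn = r.lsn;
            }
        }
        // Undo pass: walk backwards and restore before-images of uncommitted transactions.
        for (int i = log.size() - 1; i >= 0; i--) {
            LogRecord r = log.get(i);
            if (r.kind == Kind.UPDATE && !committed.contains(r.txId)) {
                pages.get(r.pageId).value = r.before;
            }
        }
    }
}
```

The pageLSN comparison is what makes redo idempotent: a page that was flushed after an update is left alone, while a stale page is brought forward.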



xDB transaction isolation
• READ_WRITE transactions follow the two-phase
   locking rule:
        –  Expanding phase: locks are acquired and no locks are
           released
        –  Shrinking phase: locks are released and no locks are
           acquired

• READ_ONLY transactions do not acquire any
   locks!
        –  The data snapshot taken at the moment the
           transaction starts is used
        –  Using log records, recent changes are undone
           at the page level
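
How a lock-free reader could obtain its snapshot from log records is sketched below. This is a simplified model under stated assumptions: a page is reduced to a single string value, the per-page log is held in a list ordered by LSN, and every record stores a before-image. The names are hypothetical.

```java
import java.util.List;

class SnapshotReadSketch {
    static final class Update {
        final long lsn;
        final String before;
        final String after;

        Update(long lsn, String before, String after) {
            this.lsn = lsn;
            this.before = before;
            this.after = after;
        }
    }

    // Rebuild the page value as of snapshotLsn by undoing newer updates, newest first.
    static String readAsOf(String currentValue, List<Update> pageLog, long snapshotLsn) {
        String value = currentValue;
        for (int i = pageLog.size() - 1; i >= 0; i--) {
            Update update = pageLog.get(i);
            if (update.lsn <= snapshotLsn) {
                break; // everything older was already visible when the reader started
            }
            value = update.before;
        }
        return value;
    }
}
```

The reader works on a private copy of the value, so writers holding locks on the page are never blocked by it.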

How to integrate Lucene into a transactional xDB database?
• Old solution (xDB 10.1/10.2 releases)
        –  All Lucene files are stored in a separate directory
        –  A new transaction model for Lucene indexes is
           implemented
        –  Lucene does not use the xDB buffer pool
        –  Backup/restore and replication do not use xDB
           mechanisms

• New solution (xDB 10.3)
        –  All Lucene files are stored in an xDB data segment
        –  The xDB transaction model is used, since all the
           updates go through xDB data pages
        –  Backup/restore and replication are supported
           automatically


Lucene Index Access Model
• New LIDirectoryImpl class is implemented (extends the
   Lucene Directory class)
• LIDirectory class stores all files in xDB blob objects
• LIIndexInput class extends BufferedIndexInput
        –  void readInternal(byte[] b, int offset, int len)
                 ▪  Reads data from the blob
                 ▪  The blob object is buffered at the xDB buffer
                    manager level

• LIIndexOutput class extends BufferedIndexOutput
        –  void flushBuffer(byte[] b, int offset, int len)
                 ▪  Writes Lucene data to the blob object
                 ▪  The operation is logged automatically at the buffer
                    manager level
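
The division of labour between the buffered Lucene classes and the blob can be sketched without the Lucene or xDB libraries. In the stand-in below, `Blob`, `BlobOutput`, and `BlobInput` are hypothetical classes: the blob is a growable byte array, and only the method names readInternal and flushBuffer are borrowed from the slide. Logging and page buffering, which xDB performs underneath the blob, are not modelled.

```java
import java.io.ByteArrayOutputStream;

class BlobFileSketch {
    // Hypothetical stand-in for an xDB blob object.
    static final class Blob {
        private final ByteArrayOutputStream bytes = new ByteArrayOutputStream();

        void append(byte[] b, int offset, int len) {
            bytes.write(b, offset, len);
        }

        byte[] snapshot() {
            return bytes.toByteArray();
        }
    }

    // Collects bytes in a small buffer and hands full buffers to flushBuffer.
    static final class BlobOutput {
        private final Blob blob;
        private final byte[] buffer;
        private int used;

        BlobOutput(Blob blob, int bufferSize) {
            this.blob = blob;
            this.buffer = new byte[bufferSize];
        }

        void writeByte(byte value) {
            if (used == buffer.length) {
                flush();
            }
            buffer[used++] = value;
        }

        void flush() {
            flushBuffer(buffer, 0, used);
            used = 0;
        }

        // The only place where index data reaches the blob.
        void flushBuffer(byte[] b, int offset, int len) {
            blob.append(b, offset, len);
        }
    }

    // Reads sequentially from the blob content as it was when the input was opened.
    static final class BlobInput {
        private final byte[] content;
        private int position;

        BlobInput(Blob blob) {
            this.content = blob.snapshot();
        }

        void readInternal(byte[] b, int offset, int len) {
            if (position + len > content.length) {
                throw new IllegalStateException("read past end of blob");
            }
            System.arraycopy(content, position, b, offset, len);
            position += len;
        }
    }
}
```

Because every write funnels through one method that targets database storage, the index files inherit whatever the storage layer provides, which is the point the slide makes about logging and buffering.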


Lucene Index Access Model (cont’d)

[Diagram, top to bottom]

        Queries                          Indexer

        IndexReader                      IndexWriter

                      LIDirectoryImpl

        LIIndexInput                     LIIndexOutput
        (readInternal)                   (flushBuffer)

                       Lucene caches

                    buffered data pages

                    Lucene blob objects



Lucene SubIndex Storage Model
[Diagram, top to bottom: a directory page with the LIDirectoryStore; two
LiFileEntryStore entries; two BlobStore pages, each holding a blob tail;
six blob pages.]




Lucene Index Master Record (MIR)
[Diagram: master record entries SI_1, SI_2, SI_3, …, SI_N; directory objects; blob objects]

•  Tracks information about all subIndexes and their state
•  Represented as a concurrent B-tree index
•  Used for Lucene index view construction
•  Updated concurrently by ingest transactions and
   merging/cleaning tasks
•  Asynchronous tasks periodically merge subIndexes
   into bigger ones
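
A master record of this kind can be pictured with the sketch below. It assumes a `ConcurrentSkipListMap` as a stand-in for the concurrent B-tree and a document-count threshold as the merge criterion. Both are choices made for the example, as are all names. The merge shown here is not atomic for concurrent readers; a real system would presumably perform it inside a transaction.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentSkipListMap;

class MasterRecordSketch {
    static final class Entry {
        final long minLsn;
        final long maxLsn;
        final int docCount;

        Entry(long minLsn, long maxLsn, int docCount) {
            this.minLsn = minLsn;
            this.maxLsn = maxLsn;
            this.docCount = docCount;
        }
    }

    // Sorted, thread-safe map keyed by minLsn; one entry per subIndex.
    private final ConcurrentSkipListMap<Long, Entry> entries = new ConcurrentSkipListMap<>();

    // Called by an ingest transaction when its subIndex becomes part of the index.
    void register(Entry entry) {
        entries.put(entry.minLsn, entry);
    }

    Entry get(long minLsn) {
        return entries.get(minLsn);
    }

    int size() {
        return entries.size();
    }

    // Background task: replace all subIndexes below the threshold with one merged entry.
    void mergeSmall(int threshold) {
        List<Entry> small = new ArrayList<>();
        for (Entry entry : entries.values()) {
            if (entry.docCount < threshold) {
                small.add(entry);
            }
        }
        if (small.size() < 2) {
            return; // nothing worth merging
        }
        long min = Long.MAX_VALUE;
        long max = Long.MIN_VALUE;
        int docs = 0;
        for (Entry entry : small) {
            min = Math.min(min, entry.minLsn);
            max = Math.max(max, entry.maxLsn);
            docs += entry.docCount;
            entries.remove(entry.minLsn);
        }
        entries.put(min, new Entry(min, max, docs));
    }
}
```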


Ingest performance analysis
(in seconds; bar chart values, series assigned in legend order)

                         xDB 10.3 (pre-release)      xDB 10.2
 Ingest 10000 docs                 180.956             205.068
 Ingest 50000 docs                1009.459            1015.937
 Ingest 100000 docs               2149.636            2526.601




Query performance analysis
(response time in ms; bar chart values, series assigned in legend order)

                                                  xDB 10.3 (pre-release)    xDB 10.2
 Q1 series: queries with range and 3
            value-comparison conditions                     7.088              7.713
 Q2 series: queries with full-text and 2
            value-comparison conditions                    10.08              14.013




Future optimizations
• Reduce number of separate subIndexes
• Final/NonFinal merge optimizations
• Advanced buffer management techniques
• Concurrent Lucene MultiPath Index




EMC #1 Open XML Database (OEM)EMC #1 Open XML Database (OEM)
EMC #1 Open XML Database (OEM)
 
Hadoop World 2011: How Hadoop Revolutionized Business Intelligence and Advanc...
Hadoop World 2011: How Hadoop Revolutionized Business Intelligence and Advanc...Hadoop World 2011: How Hadoop Revolutionized Business Intelligence and Advanc...
Hadoop World 2011: How Hadoop Revolutionized Business Intelligence and Advanc...
 
Data Transformation using Semantic Web Standards
Data Transformation using Semantic Web StandardsData Transformation using Semantic Web Standards
Data Transformation using Semantic Web Standards
 
Greenplum Database on HDFS
Greenplum Database on HDFSGreenplum Database on HDFS
Greenplum Database on HDFS
 
Validation of Europeana data: application profile, OWL ontology, or else?
Validation of Europeana data: application profile, OWL ontology, or else?Validation of Europeana data: application profile, OWL ontology, or else?
Validation of Europeana data: application profile, OWL ontology, or else?
 
Ebs architecture con9036_pdf_9036_0001
Ebs architecture con9036_pdf_9036_0001Ebs architecture con9036_pdf_9036_0001
Ebs architecture con9036_pdf_9036_0001
 
Catmandu / LibreCat Project
Catmandu / LibreCat ProjectCatmandu / LibreCat Project
Catmandu / LibreCat Project
 
Big data processing using HPCC Systems Above and Beyond Hadoop
Big data processing using HPCC Systems Above and Beyond HadoopBig data processing using HPCC Systems Above and Beyond Hadoop
Big data processing using HPCC Systems Above and Beyond Hadoop
 
Crx 2.2 Deep-Dive
Crx 2.2 Deep-DiveCrx 2.2 Deep-Dive
Crx 2.2 Deep-Dive
 
7. emc isilon hdfs enterprise storage for hadoop
7. emc isilon hdfs   enterprise storage for hadoop7. emc isilon hdfs   enterprise storage for hadoop
7. emc isilon hdfs enterprise storage for hadoop
 
IBM Connections and Desktop Single Sign-On using Microsoft Active Directory, ...
IBM Connections and Desktop Single Sign-On using Microsoft Active Directory, ...IBM Connections and Desktop Single Sign-On using Microsoft Active Directory, ...
IBM Connections and Desktop Single Sign-On using Microsoft Active Directory, ...
 
Business Intelligence and Data Analytics Revolutionized with Apache Hadoop
Business Intelligence and Data Analytics Revolutionized with Apache HadoopBusiness Intelligence and Data Analytics Revolutionized with Apache Hadoop
Business Intelligence and Data Analytics Revolutionized with Apache Hadoop
 
How Apache Hadoop is Revolutionizing Business Intelligence and Data Analytics...
How Apache Hadoop is Revolutionizing Business Intelligence and Data Analytics...How Apache Hadoop is Revolutionizing Business Intelligence and Data Analytics...
How Apache Hadoop is Revolutionizing Business Intelligence and Data Analytics...
 
The Synergy Between the Object Database, Graph Database, Cloud Computing and ...
The Synergy Between the Object Database, Graph Database, Cloud Computing and ...The Synergy Between the Object Database, Graph Database, Cloud Computing and ...
The Synergy Between the Object Database, Graph Database, Cloud Computing and ...
 
Meetup Oracle Database MAD_BCN: 1.1 Servicios de Oracle Database en la nube
 Meetup Oracle Database MAD_BCN: 1.1 Servicios de Oracle Database en la nube Meetup Oracle Database MAD_BCN: 1.1 Servicios de Oracle Database en la nube
Meetup Oracle Database MAD_BCN: 1.1 Servicios de Oracle Database en la nube
 

Más de lucenerevolution

Text Classification Powered by Apache Mahout and Lucene
Text Classification Powered by Apache Mahout and LuceneText Classification Powered by Apache Mahout and Lucene
Text Classification Powered by Apache Mahout and Lucenelucenerevolution
 
State of the Art Logging. Kibana4Solr is Here!
State of the Art Logging. Kibana4Solr is Here! State of the Art Logging. Kibana4Solr is Here!
State of the Art Logging. Kibana4Solr is Here! lucenerevolution
 
Building Client-side Search Applications with Solr
Building Client-side Search Applications with SolrBuilding Client-side Search Applications with Solr
Building Client-side Search Applications with Solrlucenerevolution
 
Integrate Solr with real-time stream processing applications
Integrate Solr with real-time stream processing applicationsIntegrate Solr with real-time stream processing applications
Integrate Solr with real-time stream processing applicationslucenerevolution
 
Scaling Solr with SolrCloud
Scaling Solr with SolrCloudScaling Solr with SolrCloud
Scaling Solr with SolrCloudlucenerevolution
 
Administering and Monitoring SolrCloud Clusters
Administering and Monitoring SolrCloud ClustersAdministering and Monitoring SolrCloud Clusters
Administering and Monitoring SolrCloud Clusterslucenerevolution
 
Implementing a Custom Search Syntax using Solr, Lucene, and Parboiled
Implementing a Custom Search Syntax using Solr, Lucene, and ParboiledImplementing a Custom Search Syntax using Solr, Lucene, and Parboiled
Implementing a Custom Search Syntax using Solr, Lucene, and Parboiledlucenerevolution
 
Using Solr to Search and Analyze Logs
Using Solr to Search and Analyze Logs Using Solr to Search and Analyze Logs
Using Solr to Search and Analyze Logs lucenerevolution
 
Enhancing relevancy through personalization & semantic search
Enhancing relevancy through personalization & semantic searchEnhancing relevancy through personalization & semantic search
Enhancing relevancy through personalization & semantic searchlucenerevolution
 
Real-time Inverted Search in the Cloud Using Lucene and Storm
Real-time Inverted Search in the Cloud Using Lucene and StormReal-time Inverted Search in the Cloud Using Lucene and Storm
Real-time Inverted Search in the Cloud Using Lucene and Stormlucenerevolution
 
Solr's Admin UI - Where does the data come from?
Solr's Admin UI - Where does the data come from?Solr's Admin UI - Where does the data come from?
Solr's Admin UI - Where does the data come from?lucenerevolution
 
Schemaless Solr and the Solr Schema REST API
Schemaless Solr and the Solr Schema REST APISchemaless Solr and the Solr Schema REST API
Schemaless Solr and the Solr Schema REST APIlucenerevolution
 
High Performance JSON Search and Relational Faceted Browsing with Lucene
High Performance JSON Search and Relational Faceted Browsing with LuceneHigh Performance JSON Search and Relational Faceted Browsing with Lucene
High Performance JSON Search and Relational Faceted Browsing with Lucenelucenerevolution
 
Text Classification with Lucene/Solr, Apache Hadoop and LibSVM
Text Classification with Lucene/Solr, Apache Hadoop and LibSVMText Classification with Lucene/Solr, Apache Hadoop and LibSVM
Text Classification with Lucene/Solr, Apache Hadoop and LibSVMlucenerevolution
 
Faceted Search with Lucene
Faceted Search with LuceneFaceted Search with Lucene
Faceted Search with Lucenelucenerevolution
 
Recent Additions to Lucene Arsenal
Recent Additions to Lucene ArsenalRecent Additions to Lucene Arsenal
Recent Additions to Lucene Arsenallucenerevolution
 
Turning search upside down
Turning search upside downTurning search upside down
Turning search upside downlucenerevolution
 
Spellchecking in Trovit: Implementing a Contextual Multi-language Spellchecke...
Spellchecking in Trovit: Implementing a Contextual Multi-language Spellchecke...Spellchecking in Trovit: Implementing a Contextual Multi-language Spellchecke...
Spellchecking in Trovit: Implementing a Contextual Multi-language Spellchecke...lucenerevolution
 
Shrinking the haystack wes caldwell - final
Shrinking the haystack   wes caldwell - finalShrinking the haystack   wes caldwell - final
Shrinking the haystack wes caldwell - finallucenerevolution
 

Más de lucenerevolution (20)

Text Classification Powered by Apache Mahout and Lucene
Text Classification Powered by Apache Mahout and LuceneText Classification Powered by Apache Mahout and Lucene
Text Classification Powered by Apache Mahout and Lucene
 
State of the Art Logging. Kibana4Solr is Here!
State of the Art Logging. Kibana4Solr is Here! State of the Art Logging. Kibana4Solr is Here!
State of the Art Logging. Kibana4Solr is Here!
 
Search at Twitter
Search at TwitterSearch at Twitter
Search at Twitter
 
Building Client-side Search Applications with Solr
Building Client-side Search Applications with SolrBuilding Client-side Search Applications with Solr
Building Client-side Search Applications with Solr
 
Integrate Solr with real-time stream processing applications
Integrate Solr with real-time stream processing applicationsIntegrate Solr with real-time stream processing applications
Integrate Solr with real-time stream processing applications
 
Scaling Solr with SolrCloud
Scaling Solr with SolrCloudScaling Solr with SolrCloud
Scaling Solr with SolrCloud
 
Administering and Monitoring SolrCloud Clusters
Administering and Monitoring SolrCloud ClustersAdministering and Monitoring SolrCloud Clusters
Administering and Monitoring SolrCloud Clusters
 
Implementing a Custom Search Syntax using Solr, Lucene, and Parboiled
Implementing a Custom Search Syntax using Solr, Lucene, and ParboiledImplementing a Custom Search Syntax using Solr, Lucene, and Parboiled
Implementing a Custom Search Syntax using Solr, Lucene, and Parboiled
 
Using Solr to Search and Analyze Logs
Using Solr to Search and Analyze Logs Using Solr to Search and Analyze Logs
Using Solr to Search and Analyze Logs
 
Enhancing relevancy through personalization & semantic search
Enhancing relevancy through personalization & semantic searchEnhancing relevancy through personalization & semantic search
Enhancing relevancy through personalization & semantic search
 
Real-time Inverted Search in the Cloud Using Lucene and Storm
Real-time Inverted Search in the Cloud Using Lucene and StormReal-time Inverted Search in the Cloud Using Lucene and Storm
Real-time Inverted Search in the Cloud Using Lucene and Storm
 
Solr's Admin UI - Where does the data come from?
Solr's Admin UI - Where does the data come from?Solr's Admin UI - Where does the data come from?
Solr's Admin UI - Where does the data come from?
 
Schemaless Solr and the Solr Schema REST API
Schemaless Solr and the Solr Schema REST APISchemaless Solr and the Solr Schema REST API
Schemaless Solr and the Solr Schema REST API
 
High Performance JSON Search and Relational Faceted Browsing with Lucene
High Performance JSON Search and Relational Faceted Browsing with LuceneHigh Performance JSON Search and Relational Faceted Browsing with Lucene
High Performance JSON Search and Relational Faceted Browsing with Lucene
 
Text Classification with Lucene/Solr, Apache Hadoop and LibSVM
Text Classification with Lucene/Solr, Apache Hadoop and LibSVMText Classification with Lucene/Solr, Apache Hadoop and LibSVM
Text Classification with Lucene/Solr, Apache Hadoop and LibSVM
 
Faceted Search with Lucene
Faceted Search with LuceneFaceted Search with Lucene
Faceted Search with Lucene
 
Recent Additions to Lucene Arsenal
Recent Additions to Lucene ArsenalRecent Additions to Lucene Arsenal
Recent Additions to Lucene Arsenal
 
Turning search upside down
Turning search upside downTurning search upside down
Turning search upside down
 
Spellchecking in Trovit: Implementing a Contextual Multi-language Spellchecke...
Spellchecking in Trovit: Implementing a Contextual Multi-language Spellchecke...Spellchecking in Trovit: Implementing a Contextual Multi-language Spellchecke...
Spellchecking in Trovit: Implementing a Contextual Multi-language Spellchecke...
 
Shrinking the haystack wes caldwell - final
Shrinking the haystack   wes caldwell - finalShrinking the haystack   wes caldwell - final
Shrinking the haystack wes caldwell - final
 

Último

Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfLoriGlavin3
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESmohitsingh558521
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxLoriGlavin3
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxLoriGlavin3
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxLoriGlavin3
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionDilum Bandara
 

Último (20)

Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An Introduction
 

Integrating Lucene into a Transactional XML Database

  • 1. Boston, May 7-10, 2012
      Integrating Lucene search engine into a transactional XML database
      Petr Pleshachkov, EMC (petr.pleshachkov@emc.com), May 9, 2012
      © Copyright 2012 EMC Corporation. All rights reserved.
  • 2. My Background
      • Petr Pleshachkov, Principal Software Engineer
      • xDB/xPlore team in Rotterdam
          – My site: EMC Netherlands
          – Other xPlore/xDB sites: Pleasanton (California), Shanghai (China), and Grenoble (France)
      • Areas of expertise:
          – Semistructured data management
          – Databases: transaction management, query optimization, full-text search
      • Academia & Research:
          – PhD in Computer Science, ISP RAS
  • 3. Agenda
      • Overview of EMC Documentum xDB/xPlore
      • Integration of Lucene into xDB
      • xDB transaction model & Lucene transaction management
      • Performance analysis
      • Future optimizations
  • 4. Introducing Documentum xPlore
      • EMC Documentum is a leading supplier of Enterprise Content Management software
      • xPlore provides "Integrated Search" for Documentum
          – but is built as a standalone search engine to replace FAST Instream
          – Widely deployed across Documentum environments worldwide (70+ countries)
      • The xPlore search engine is built over EMC xDB, Lucene, and leading content extraction and linguistic analysis software
  • 5. Key value that xDB brings to xPlore
      Why build a search engine over an XML database?
      • Flexible, hierarchical query and data models
      • Joins
      • High-throughput, low-latency indexing
          – Documents become visible within seconds of saving
      • Leverage B-tree indexes when appropriate
          – Lucene doesn't fit all uses
      • Rich, innovative query language
      • Enterprise, single unified database
  • 6. Documentum xDB
      • Formerly the XHive database
          – 100% Java
          – XML stored in a persistent DOM format
              ▪ Each XML node can be located through a 64-bit identifier
              ▪ Structure mapped to pages
              ▪ Easy to operate on gigabyte-sized XML files
      • Fully transactional database
      • Query language: XQuery
      • Indexing & optimization
          – Palette of index options the optimizer can pick from
          – At its simplest: indexLookup(key) -> node id
      • Backup/restore, scalability, multi-node architecture
  • 7. xDB Data Storage Model
      An XML document can be thought of as a collection of elements and attributes (or "XML nodes"). This node structure can be represented as a tree - DOM model.
      [Diagram: a tree of nodes A, B, C, D, E, stored on a database page]
  • 8. Libraries & Indexes
      [Diagram: library A with sub-libraries B and C; legend: xDB library, xDB index, xDB XML file]
      The scope of an index covers all XML files in all sub-libraries.
  • 9. Lucene Integration
      • Both value and full-text queries are supported
          – XML SubPaths are mapped into Lucene fields
          – Tokenized and value-based indexes are available
      • Composite key queries are supported
          – A Lucene index is much more flexible than B-tree composite indexes
  • 10. Multipath Index Definition
      <PLAY>
        <ACT>
          <SCENE>
            <SPEECH>
              <SPEAKER>BRUTUS</SPEAKER>
              <LINE>I am not gamesome: I do lack some part</LINE>
            </SPEECH>
            <SPEECH>
              <SPEAKER>CASSIUS</SPEAKER>
              <LINE>Then, Brutus, I have much mistook your passion;</LINE>
              <LINE>By means whereof this breast of mine hath buried</LINE>
              <LINE>Thoughts of great value, worthy cogitations.</LINE>
            </SPEECH>
          </SCENE>
        </ACT>
      </PLAY>

      INDEX ROOT PATH: //SPEECH
      SubPath1: (/SPEAKER, VALUE_COMPARISON)
      SubPath2: (//LINE, FULL_TEXT_SEARCH)
  • 11. Lucene Query Mapping
      for $SPEECH score $s in collection('col1')//SPEECH[SPEAKER='BRUTUS' and //LINE contains text 'lack']
      order by $s
      return $SPEECH

      is mapped to:

      BooleanQuery(TermQuery1, TermQuery2, BooleanClause.Occur.MUST)
      TermQuery1 = TermQuery(new Term('/speaker_field', 'BRUTUS'))
      TermQuery2 = TermQuery(new Term('//line_field', 'lack'))
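
The mapping on slides 10 and 11 can be illustrated without the Lucene library. The following is a minimal Java sketch that assumes a toy in-memory inverted index: the class `ToySpeechIndex`, its methods, and the lower-casing tokenizer are hypothetical stand-ins, and only the field names `/speaker_field` and `//line_field` come from the slide. Each SPEECH element becomes one document, and two required (MUST) term clauses are evaluated as an intersection of postings.

```java
import java.util.*;

final class ToySpeechIndex {
    // field name -> term -> ids of the SPEECH documents containing that term
    private final Map<String, Map<String, SortedSet<Integer>>> postings = new HashMap<>();
    private int nextDocId = 0;

    /** Indexes one SPEECH element: SPEAKER as a single value, each LINE tokenized. */
    int addSpeech(String speaker, List<String> lines) {
        int docId = nextDocId++;
        post("/speaker_field", speaker, docId);       // VALUE_COMPARISON: whole value is one term
        for (String line : lines) {
            for (String token : line.toLowerCase().split("[^a-z]+")) {
                if (!token.isEmpty()) {
                    post("//line_field", token, docId); // FULL_TEXT_SEARCH: one term per token
                }
            }
        }
        return docId;
    }

    private void post(String field, String term, int docId) {
        postings.computeIfAbsent(field, f -> new HashMap<>())
                .computeIfAbsent(term, t -> new TreeSet<>())
                .add(docId);
    }

    private SortedSet<Integer> lookup(String field, String term) {
        Map<String, SortedSet<Integer>> byTerm = postings.get(field);
        if (byTerm == null || !byTerm.containsKey(term)) {
            return new TreeSet<Integer>();
        }
        return byTerm.get(term);
    }

    /** Both clauses are required (MUST), so the result is the intersection of the postings. */
    SortedSet<Integer> mustMust(String field1, String term1, String field2, String term2) {
        SortedSet<Integer> result = new TreeSet<>(lookup(field1, term1));
        result.retainAll(lookup(field2, term2));
        return result;
    }
}
```

Calling `mustMust("/speaker_field", "BRUTUS", "//line_field", "lack")` on an index holding the two speeches from slide 10 returns only the id of the BRUTUS speech. Scoring, which the XQuery `score` clause relies on, is left out of this sketch.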
  • 12. Lucene SubIndexes
      • Each user transaction creates a separate Lucene subIndex
      • A transaction performs all its updates in its own index
      • The delete operation does not physically touch subIndexes created by other transactions
      • A pair (minLSN, maxLSN) is associated with each subIndex and is used to construct a global index snapshot
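
The slide does not spell out how the (minLSN, maxLSN) pairs are turned into a snapshot. As a rough Java sketch, assume that a subIndex becomes visible to a reader once all of its updates precede the reader's snapshot LSN; this visibility rule and the names `SubIndexSnapshots`, `SubIndex`, and `snapshot` are assumptions made for illustration, not xDB's actual API.

```java
import java.util.*;

final class SubIndexSnapshots {
    /** One per-transaction subIndex together with the LSN range of the updates it holds. */
    static final class SubIndex {
        final String name;
        final long minLsn;
        final long maxLsn;

        SubIndex(String name, long minLsn, long maxLsn) {
            this.name = name;
            this.minLsn = minLsn;
            this.maxLsn = maxLsn;
        }
    }

    /** Assumed rule: a subIndex is visible when its newest update is not after the snapshot LSN. */
    static List<SubIndex> snapshot(Collection<SubIndex> all, long snapshotLsn) {
        List<SubIndex> visible = new ArrayList<>();
        for (SubIndex si : all) {
            if (si.maxLsn <= snapshotLsn) {
                visible.add(si);
            }
        }
        // Order by minLSN so the view lists subIndexes in the order they were started.
        visible.sort(Comparator.comparingLong((SubIndex si) -> si.minLsn));
        return visible;
    }
}
```

A reader would then search only the subIndexes returned by `snapshot`, so work done by transactions that started later stays invisible to it.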
  • 13. Blacklists
      • A transaction's delete operation:
          – Physically deletes the document from the transaction's own subIndex
          – Adds a pair (subIndexMinLSN, NODE_ID) to the blacklist structure
      • The persistent blacklist structure is represented as an xDB B-tree index with key = subIndexMinLSN, value = NODE_ID
      • Periodically, a merge operation merges small subIndexes into a bigger one and physically deletes documents
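
The blacklist idea can be sketched in plain Java. The sketch below assumes one reading of the slide: a delete is physical when the document lives in the deleting transaction's own subIndex and is recorded in the blacklist otherwise. Sorted maps stand in for the Lucene subIndexes and for the persistent B-tree, and every name (`BlacklistSketch`, `delete`, `liveNodes`, `mergeAll`) is hypothetical.

```java
import java.util.*;

final class BlacklistSketch {
    /** Stand-in for the Lucene subIndexes: subIndexMinLSN -> node ids indexed there. */
    private final SortedMap<Long, Set<Long>> subIndexes = new TreeMap<>();
    /** Stand-in for the persistent B-tree: key = subIndexMinLSN, value = blacklisted NODE_IDs. */
    private final SortedMap<Long, Set<Long>> blacklist = new TreeMap<>();

    void addSubIndex(long minLsn, Long... nodeIds) {
        subIndexes.put(minLsn, new TreeSet<>(Arrays.asList(nodeIds)));
    }

    /** Physical delete in the caller's own subIndex, blacklist entry for any other subIndex. */
    void delete(long ownMinLsn, long targetMinLsn, long nodeId) {
        if (ownMinLsn == targetMinLsn) {
            subIndexes.get(ownMinLsn).remove(nodeId);
        } else {
            blacklist.computeIfAbsent(targetMinLsn, k -> new TreeSet<>()).add(nodeId);
        }
    }

    /** Search-time view: every indexed node that is not blacklisted for its subIndex. */
    Set<Long> liveNodes() {
        Set<Long> live = new TreeSet<>();
        for (Map.Entry<Long, Set<Long>> e : subIndexes.entrySet()) {
            Set<Long> dead = blacklist.getOrDefault(e.getKey(), Collections.<Long>emptySet());
            for (Long id : e.getValue()) {
                if (!dead.contains(id)) {
                    live.add(id);
                }
            }
        }
        return live;
    }

    /** Merge: folds all subIndexes into one and drops blacklisted nodes for good. */
    void mergeAll() {
        if (subIndexes.isEmpty()) {
            return;
        }
        Set<Long> merged = liveNodes();
        long mergedMinLsn = subIndexes.firstKey();
        subIndexes.clear();
        blacklist.clear();
        subIndexes.put(mergedMinLsn, merged);
    }
}
```

The point of the design is that a delete never rewrites another transaction's subIndex; the cost of physically removing documents is deferred to the merge.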
  • 14. xDB transaction management
      • ARIES-based ACID transactions
          – Every page has a Log Sequence Number (pageLSN)
          – The buffer manager tracks dirty pages using RecLSNs
          – ALL updates are logged on a per-page basis, including updates performed during rollbacks
          – An asynchronous thread periodically runs the checkpoint procedure
          – The recovery procedure:
              ▪ Repeat history: redo all the updates since the last successful checkpoint
              ▪ Undo incomplete transactions
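
The two recovery steps can be sketched as a toy log replay in Java. This is a heavily simplified illustration, not xDB's recovery code: page contents are single strings, the log is assumed to start at the last checkpoint, and the compensation log records that ARIES writes during undo are omitted. The names `AriesSketch`, `LogRecord`, `Page`, and `recover` are hypothetical.

```java
import java.util.*;

final class AriesSketch {
    /** One log record: a page update with before/after images, or a commit marker. */
    static final class LogRecord {
        final long lsn;
        final int txId;
        final int pageId;
        final String before;
        final String after;
        final boolean commit;

        private LogRecord(long lsn, int txId, int pageId, String before, String after, boolean commit) {
            this.lsn = lsn;
            this.txId = txId;
            this.pageId = pageId;
            this.before = before;
            this.after = after;
            this.commit = commit;
        }

        static LogRecord update(long lsn, int txId, int pageId, String before, String after) {
            return new LogRecord(lsn, txId, pageId, before, after, false);
        }

        static LogRecord commit(long lsn, int txId) {
            return new LogRecord(lsn, txId, -1, null, null, true);
        }
    }

    static final class Page {
        String value;
        long pageLsn;

        Page(String value, long pageLsn) {
            this.value = value;
            this.pageLsn = pageLsn;
        }
    }

    static void recover(Map<Integer, Page> pages, List<LogRecord> logSinceCheckpoint) {
        Set<Integer> committed = new HashSet<>();
        // Redo pass: repeat history, skipping updates the page already contains.
        for (LogRecord r : logSinceCheckpoint) {
            if (r.commit) {
                committed.add(r.txId);
                continue;
            }
            Page p = pages.get(r.pageId);
            if (r.lsn > p.pageLsn) {
                p.value = r.after;
                p.pageLsn = r.lsn;
            }
        }
        // Undo pass: walk the log backwards and restore before-images of incomplete transactions.
        for (int i = logSinceCheckpoint.size() - 1; i >= 0; i--) {
            LogRecord r = logSinceCheckpoint.get(i);
            if (!r.commit && !committed.contains(r.txId)) {
                pages.get(r.pageId).value = r.before;
            }
        }
    }
}
```

Comparing each record's LSN with the pageLSN is what makes the redo pass safe to run again after a crash during recovery.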
  • 15. xDB transaction isolation
      • READ_WRITE transactions follow the two-phase locking rule:
          – Expanding phase: locks are acquired and no locks are released
          – Shrinking phase: locks are released and no locks are acquired
      • READ_ONLY transactions do not acquire any locks!
          – The data snapshot at the moment the transaction starts is used
          – Using log records, recent changes are undone at the page level
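
The lock-free read path can be pictured as follows. This Java sketch assumes that each page has a list of logged changes in ascending LSN order, each carrying a before-image; a reader rebuilds the page as of its snapshot LSN by undoing newer changes, newest first. The names `SnapshotReadSketch`, `Change`, and `readAsOf` are hypothetical, and a string stands in for real page contents.

```java
import java.util.*;

final class SnapshotReadSketch {
    /** A logged page change: the LSN of the update and the page content before it. */
    static final class Change {
        final long lsn;
        final String before;

        Change(long lsn, String before) {
            this.lsn = lsn;
            this.before = before;
        }
    }

    /**
     * Returns the page content as of snapshotLsn. pageLog must be in ascending LSN order.
     * Changes newer than the snapshot are undone from newest to oldest.
     */
    static String readAsOf(String currentValue, List<Change> pageLog, long snapshotLsn) {
        String value = currentValue;
        for (int i = pageLog.size() - 1; i >= 0; i--) {
            Change c = pageLog.get(i);
            if (c.lsn <= snapshotLsn) {
                break; // everything older was already visible when the reader started
            }
            value = c.before;
        }
        return value;
    }
}
```

Because the reader works on its own reconstructed copy, writers holding locks on the current page version never block it.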
  • 16. How to integrate Lucene into the transactional xDB database?
      • Old solution (xDB 10.1/10.2 releases)
          – All Lucene files are stored in a separate directory
          – A new transaction model for Lucene indexes is implemented
          – Lucene does not use the xDB buffer pool
          – Backup/restore and replication do not use xDB mechanisms
      • New solution (xDB 10.3)
          – All Lucene files are stored in an xDB data segment
          – The xDB transaction model is used, since all updates go through xDB data pages
          – Backup/restore and replication are supported automatically
  • 17. Lucene Index Access Model
      • A new LIDirectoryImpl class is implemented (extends the Directory class)
      • The LIDirectory class stores all files in xDB blob objects
      • The LIIndexInput class extends BufferedIndexInput
          – void readInternal(byte[] b, int offset, int len)
              ▪ Reads data from the blob
              ▪ The blob object is buffered at the xDB buffer management level
      • The LIIndexOutput class extends BufferedIndexOutput
          – void flushBuffer(byte[] b, int offset, int len)
              ▪ Writes Lucene data to the blob object
              ▪ The operation is logged automatically at the buffer manager level
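
The read side of this design can be sketched without depending on Lucene. In the Java sketch below, `BufferedInput` is a hypothetical stand-in for Lucene's BufferedIndexInput and a byte array stands in for an xDB blob; only the `readInternal(byte[] b, int offset, int len)` signature is taken from the slide. The base class serves single bytes from a small buffer and calls `readInternal` only when the buffer runs dry, which is where the real LIIndexInput would read from the blob.

```java
final class BlobInputSketch {
    /** Stand-in for a buffered index input: hands out bytes from a buffer refilled on demand. */
    abstract static class BufferedInput {
        private final byte[] buffer;
        private long bufferStart = 0;   // position in the file of buffer[0]
        private int bufferLength = 0;   // number of valid bytes in the buffer
        private int bufferPosition = 0; // next byte to hand out

        BufferedInput(int bufferSize) {
            this.buffer = new byte[bufferSize];
        }

        abstract long length();

        /** Fills b[offset..offset+len) with the next len bytes of the underlying storage. */
        abstract void readInternal(byte[] b, int offset, int len);

        byte readByte() {
            if (bufferPosition >= bufferLength) {
                refill();
            }
            return buffer[bufferPosition++];
        }

        private void refill() {
            long start = bufferStart + bufferPosition;
            int len = (int) Math.min(buffer.length, length() - start);
            if (len <= 0) {
                throw new IllegalStateException("read past end of blob");
            }
            readInternal(buffer, 0, len);
            bufferStart = start;
            bufferLength = len;
            bufferPosition = 0;
        }
    }

    /** Reads from an in-memory byte array that plays the role of an xDB blob object. */
    static final class BlobInput extends BufferedInput {
        private final byte[] blob;
        private int filePointer = 0;
        int refills = 0; // counts calls to readInternal, for illustration only

        BlobInput(byte[] blob, int bufferSize) {
            super(bufferSize);
            this.blob = blob;
        }

        @Override
        long length() {
            return blob.length;
        }

        @Override
        void readInternal(byte[] b, int offset, int len) {
            System.arraycopy(blob, filePointer, b, offset, len);
            filePointer += len;
            refills++;
        }
    }
}
```

Reading a 10-byte blob through a 4-byte buffer triggers three calls to `readInternal`. The write side (flushBuffer) would mirror this, with the buffer manager logging the page updates as the slide describes.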
  • 18. Lucene Index Access Model (cont'd)
      [Diagram labels: Queries, Indexer, IndexReader, IndexWriter, LIDirectoryImpl, LIIndexInput (readInternal), LIIndexOutput (flushBuffer), Lucene caches, buffered data pages, Lucene blob objects]
  • 19. Lucene SubIndex Storage Model
      [Diagram labels: directory page (LIDirectoryStore), LiFileEntryStore entries, BlobStore pages with blob tails, blob pages]
  • 20. Lucene Index Master Record (MIR)
      • Tracks information about all subIndexes and their state
      • Represented as a concurrent B-tree index
      • Used for Lucene index view construction
      • Updated concurrently by ingest transactions and merging/cleaning tasks
      • Asynchronous tasks periodically merge subIndexes into bigger ones
      [Diagram: subIndexes SI_1, SI_2, SI_3, … SI_N, each pointing to a directory object and its blob objects]
  • 21. Ingest performance analysis (in seconds)
      [Bar chart: xDB 10.3 (pre-release) vs. xDB 10.2. Bar labels per group: ingest 10000 docs: 180.956 and 205.068; ingest 50000 docs: 1009.459 and 1015.937; ingest 100000 docs: 2526.601 and 2149.636]
  • 22. Query performance analysis (response time in ms)
      [Bar chart: xDB 10.3 (pre-release) vs. xDB 10.2. Q1 series: queries with range and 3 value comparison conditions; Q2 series: queries with full-text and 2 value-comparison conditions. Bar labels: 14.013, 10.08, 7.713, 7.088]
  • 23. Future optimizations
      • Reduce the number of separate subIndexes
      • Final/NonFinal merge optimizations
      • Advanced buffer management techniques
      • Concurrent Lucene MultiPath Index