MontySolr:
                         Embedding CPython in Solr
                                   Roman Chyla, CERN
                            roman.chyla@cern.ch, May 26, 2011




Thursday, May 26, 2011
Why should I care?
       - Our challenge is to connect Python and Java
       - Without compromises
       - We created the MontySolr extension
             -   Robust, tested (will be used by our system)
             -   But works for any Python application (e.g. Django)
             -   And for any C/C++ app that Python understands!
             -   Open source (GPL v2)
       - Try it out!
             - https://github.com/romanchyla/montysolr



Outline

       ‣ Context
       - The Challenge
       - Key components
             - Available technologies
             - Our approach
             - Problems solved
       - Evaluation
       - Wrap-up



CERN
       - European Organization for Nuclear Research
             - Switzerland, Geneva
       - The largest laboratory for High Energy Physics
       - Home to the Large Hadron Collider
       - 40-50K HEP scientists worldwide




SPIRES
       - Stanford Linear Accelerator Center - SLAC
       - High-Energy Physics Literature Database
       - Started December 1991
             - The first web server outside Europe/CERN
             - The first database on the web




Invenio
       - Integrated digital library software behind INSPIRE
       - Used by very large institutional repositories
             - http://repositories.webometrics.info/toprep_inst.asp
       - Customizable virtual collections
       - Flexible management of metadata
             - 3 000 authors per article
       - Powerful search engine
             - Incl. citation map analysis
       - Written in Python (since 2001)
             - 290 000 lines of code

Outline

       - Context
       ‣ The Challenge
       - Key components
             - Available technologies
             - Our approach
             - Problems solved
       - Evaluation
       - Wrap-up



The Challenge
       - HEP scientific community
             - Searches are metadata-oriented
       - However, full texts are changing the situation
       - And we want to provide even better service
             - Bigger volumes of data
             - NLP processing
             - Semantic search




The Challenge

       [Diagram] Query: supersymmetry AND author:ellis
             - Invenio passes on: fulltext:supersymmetry
             - Invenio gets back: IDs: 1;2;3;9.... (1-6M IDs)

       1. Only IDs, no score = no ranking
       2. Score merging difficult (if available)
       3. Push IDs? (e.g. faceting)

What is the “best” solution?
       - We love Python...
       - ...and our applications are written in Python...

       - But what if Solr is the master search engine?
       - Merge results inside Solr?
             - Typical size: 1-10 mil. IDs
             - Expected latency: 1-2 s.
       - What we want to achieve:
             - Fast transfer of hits from Invenio to Solr
             - Leverage the power of both (no compromises)
             - Developer-friendly integration, simplicity
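
The merge problem can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `merge_hits`, the sample IDs and the scores are all made up, and real hit sets would hold millions of IDs. One side returns bare IDs, the other returns IDs with scores, so the ranking can only come from the side that has scores.

```python
from typing import Dict, Iterable, List, Tuple


def merge_hits(metadata_ids: Iterable[int],
               fulltext_scores: Dict[int, float]) -> List[Tuple[int, float]]:
    """AND-merge two result sets and rank by the only score available.

    metadata_ids:    bare IDs, no score (hypothetical metadata search result)
    fulltext_scores: ID -> relevance score (hypothetical fulltext result)
    """
    wanted = set(metadata_ids)
    merged = [(doc_id, score)
              for doc_id, score in fulltext_scores.items()
              if doc_id in wanted]
    # highest score first, ties broken by ID to keep the order stable
    merged.sort(key=lambda pair: (-pair[1], pair[0]))
    return merged


metadata_hits = [1, 2, 3, 9]                        # e.g. author:ellis
fulltext_hits = {2: 0.7, 3: 1.4, 5: 2.0, 9: 0.7}    # e.g. fulltext:supersymmetry
ranked = merge_hits(metadata_hits, fulltext_hits)
print(ranked)
```

The intersection itself is cheap; the cost in practice is moving millions of IDs between the two systems, which is why the transfer has to be fast.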
Outline

       - Context
       - The Challenge
       ‣ Key components
             - Available technologies
             - Our approach
             - Evaluation
       - Demonstration
       - Wrap-up



To embed Solr (in Java app)

       - Your app simulates a Java web container?
             - use EmbeddedSolrServer
       - It knows nothing about Java servlets?
             - use the DirectSolrConnection class
       - Maybe we are too lazy?
             - Embed the web container (in my case Jetty)
             - Seemed strange (web server inside a web server)
             - ... but it worked well



To use Solr in non-Java app
       - Solr is already usable via HTTP requests, but we
         need something else here...
       - Remote objects/calls?
             - Pyro, execnet, CORBA, SOAP...
             - or simply pipes?
       - Access Python from Java?
             - Jython
             - JEPP
       - Access Java from Python?
             - JPype
             - JCC
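
For comparison, the plain HTTP route mentioned at the top of this slide looks roughly like this from Python. It is a minimal sketch: the base URL (`http://localhost:8983/solr`) and the helper name `solr_select_url` are assumptions, and the code only builds the request URL, it does not contact a server.

```python
from urllib.parse import urlencode


def solr_select_url(base_url: str, query: str, rows: int = 10) -> str:
    """Build a URL for Solr's /select handler (hypothetical helper)."""
    params = {
        "q": query,     # the query string
        "fl": "id",     # return only the id field
        "rows": rows,   # number of hits to return
        "wt": "json",   # response format
    }
    return base_url.rstrip("/") + "/select?" + urlencode(params)


url = solr_select_url("http://localhost:8983/solr", "fulltext:supersymmetry")
print(url)
```

Each request and response crosses the network as text, which is fine for a page of results but not for shipping millions of IDs between the two systems.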
Jython?
       - Implementation of Python in 100% Java
       - Both Java and Python code
       - Truly multithreaded



       - C modules will not work
             - but see http://bit.ly/iTRYbb
       - Slower than CPython



JEPP - Java Embedded Python
       - Python code runs inside Python interpreter
       - Embeds CPython interpreter via Java Native Interface (JNI) in Java
       - http://jepp.sourceforge.net/
             - recently updated (27-Jan)
             - but JCC is more active




JCC
       - Embeds JVM in Python
       - C++ code generator
       - C++ object interface wraps a Java library
       - C++ wrappers conform to Python's C type system
       - result: complete Python extension module



To use Solr in non-Java app

                                 Jython   JCC   JEPP

         Python C modules                  ✓      ✓
         Speed                             ✓      ?
         No code changes                   ✓      ✓
         Access from Python        ✓       ✓
         Access from Java          ✓      ...     ✓
The first try

       [Diagram: Invenio, Solr, JCC]

The devil is in the details...

GIL - Global Interpreter Lock
       - Unfortunately, a Python web app is not like Java...
       - We can have 200 threads, but only 4 will run at a time...

Fortunately, a solution exists
       - JCC can embed Python inside Java
             - Special thanks to Andi Vajda! (JCC creator)
       - We write 'empty' classes in Java ...
       - ... and implement them in Python

       [Diagram: Python w/ Java inside, Java w/ Python inside]

The second try

       [Diagram: Invenio frontend, XML, Solr w/ Invenio (backend), JCC]

Implementing the bridge
       - Special Java class
       - With method pythonExtension()
       - Native method pythonDecRef()
             - JCC provides its implementation
       - And a number of other native methods
             - These will be implemented using Python
       - Like writing JNI Java/C code but without compilation...




MontySolr extension
       - JCC has great potential, but also added complexity...
       - So the MontySolr project was born
             - Modules must be built in shared mode
             - JCC dynamic library loaded and started from the main thread
             - Simple mechanism of the Python bridge and message
             - Configurable handlers on the Python side
             - Secured dereferencing of the native objects
             - Threading on the Java side
             - Multiprocessing on the Python side
             - Easy ant targets (compilation) ...
Hello World - Java part
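      // Java half of the bridge; the native methods are implemented in Python
      public class MontySolrBridge extends BasicBridge implements PythonBridge {
          // handle of the Python object that implements this class
          private long pythonObject;

          public void pythonExtension(long pythonObject) {
              this.pythonObject = pythonObject;
          }
          public long pythonExtension() {
              return this.pythonObject;
          }
          // drop the reference to the Python object when Java collects the bridge
          public void finalize() throws Throwable {
              pythonDecRef();
          }
          public native void pythonDecRef();  // JCC provides its implementation

          public void sendMessage(PythonMessage message) {
              PythonVM vm = PythonVM.get();
              vm.acquireThreadState();
              try {
                  receive_message(message);
              } finally {
                  vm.releaseThreadState();  // release even if the call fails
              }
          }
          // implemented on the Python side
          public native void receive_message(PythonMessage message);
      }
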
Hello World - Python part

      from montysolr import MontySolrBridge

      class SimpleBridge(MontySolrBridge):

          def __init__(self):
              super(SimpleBridge, self).__init__()

          # called from Java through the native receive_message() method
          def receive_message(self, message):
              query = message.getParam('query')
              message.setResults('Hello world!')
              print('Python received from Java: %s' % query)




Example - running MontySolr
       - Java side
             - JRE (32/64 bit)
             - Standard Solr/Lucene jars
             - JCC dynamic library
       - Python side
             - Python interpreter (32/64 bit)
             - 4 Python modules (jcc, solr, lucene, montysolr)
       - In the main thread
             - First we load JCC
             - Then start Python interpreter ...
             - ... load Python handlers

Solr as search service

       [Diagram: Invenio frontend, XML, Solr w/ Invenio (backend), JCC]

Example

       [Diagram: query refersto:author:ellis, Solr, MyCustom Handler]

Example - Solr custom handler

      // build a message for the Python handler 'Invenio:perform_search'
      PythonMessage msg = MontySolrVM.INSTANCE
          .createMessage("perform_search")
          .setSender("Invenio")
          .setParam("query", "refersto:author:ellis");

      // hand the message over to the Python side
      MontySolrVM.INSTANCE.sendMessage(msg);

      Object result = msg.getResults();
      if (result != null) {
          int[] hits = (int[]) result;  // hits set by the Python handler
          // ... use hits
      }

Example - JNI connection

       [Diagram: query refersto:author:ellis, Solr, MyCustom Handler, Python Bridge, Invenio wrappers]

Example - Python side

                   # handler is made 'visible' at startup
                   SolrpieTarget('Invenio:perform_search',
                        perform_search)

                   # search time - called from Java
                   def perform_search(message):
                       query = message.getParam("query")
                       hits = call_real_search(query)
                       # cast Python list into Java array
                       message.setResults(JArray_ints(hits))
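
The handler lookup above can be imitated in pure Python to show the idea of configurable handlers keyed by 'sender:recipient'. This is a sketch only: `Message` and `Dispatcher` are hypothetical stand-ins, not the MontySolr classes, and the hard-coded hit list replaces a real search.

```python
class Message:
    """Pure-Python stand-in for the Java PythonMessage (hypothetical)."""

    def __init__(self, sender, recipient, **params):
        self.sender = sender
        self.recipient = recipient
        self._params = params
        self._results = None

    def getParam(self, name):
        return self._params.get(name)

    def setResults(self, value):
        self._results = value

    def getResults(self):
        return self._results


class Dispatcher:
    """Maps 'sender:recipient' keys to Python callables (hypothetical)."""

    def __init__(self):
        self._targets = {}

    def register(self, key, func):
        self._targets[key] = func

    def dispatch(self, message):
        key = "%s:%s" % (message.sender, message.recipient)
        handler = self._targets.get(key)
        if handler is None:
            raise KeyError("no handler registered for %r" % key)
        handler(message)


def perform_search(message):
    query = message.getParam("query")
    # a real handler would call the search engine here
    message.setResults([1, 2, 3, 9] if query else [])


dispatcher = Dispatcher()
dispatcher.register("Invenio:perform_search", perform_search)

msg = Message("Invenio", "perform_search", query="refersto:author:ellis")
dispatcher.dispatch(msg)
print(msg.getResults())
```

Keeping the registry on the Python side means new handlers can be added without touching or recompiling the Java code.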




Example

       [Diagram: query refersto:author:ellis, Solr, MyCustom Handler, Python Bridge, Invenio wrappers, Invenio (x4)]

Solr as search service

       [Diagram: Apache webserver, Invenio, XML, Solr w/ Invenio (backend), Invenio, JCC]

Outline

       - Context
       - The Challenge
       - Key components
             - Available technologies
             - Our approach
             - Problems solved
       ‣ Evaluation
       - Wrap-up



Memory and garbage collection

Comparing speed and load...

The effect of cache

Robust?
       - Extensive siege tests show very good performance and stability under high load
             - 100-200 users, complex searches
             - 50 concurrent users, citation analysis
             - JCC incurs a small overhead
       - We detected no memory leaks
             - The same as dbpedia.org
       - But watch out for errors in C
             - An error in a C module brings down the whole JVM
             - (errors in a pure Python module can be handled)
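
The last point, that errors in pure Python can be handled, can be sketched with a small decorator. This is an assumption-laden illustration, not MontySolr code: `safe_handler` is a hypothetical name and a plain dict stands in for the message object. A crash inside a C extension cannot be caught this way.

```python
import functools
import traceback


def safe_handler(func):
    """Wrap a handler so that a Python exception never escapes to the caller."""
    @functools.wraps(func)
    def wrapper(message):
        try:
            func(message)
        except Exception:
            # report the failure through the message instead of raising
            message["error"] = traceback.format_exc()
            message["results"] = None
    return wrapper


@safe_handler
def broken_search(message):
    raise ValueError("bad query: %s" % message["query"])


msg = {"query": "refersto:author:ellis"}   # dict used as a stand-in message
broken_search(msg)
print(msg["error"].splitlines()[-1])
```

The Java side can then check for an error entry and return a normal error response, rather than having the exception cross the native boundary.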


Easy to develop/maintain?
       - Added complexity
             - Java in the toolbox
             - Need to compile C++ extensions
             - Python/OS version dependencies


       - For this we get
             -   Easy integration with Invenio
             -   The best of two applications
             -   A lot of features for free
             -   And we can control Solr from Python!


Outline

       - Context
       - The Challenge
       - Key components
             - Available technologies
             - Our approach
             - Problems solved
       - Evaluation
       ‣ Wrap-up



Wrap-up
       - Our challenge was to connect two different languages/systems
       - And we wanted to get the best of the two...
             - So we had to plug Python into Solr
             - And now our Solr knows citation analysis!
       - We created the MontySolr extension
             -   Robust, tested (will be used by INSPIRE)
             -   Works for any Python application (e.g. Django)
             -   And for any C/C++ app that Python understands!
             -   Free software license
       - Try it out! Help us make it better!
             - https://github.com/romanchyla/montysolr
Questions?
       - MontySolr
             - https://github.com/romanchyla/montysolr

       - Roman Chyla
             -   Fellow, CERN Scientific Information Service
             -   roman.chyla@cern.ch
             -   @rchyla
             -   https://svnweb.cern.ch/trac/rcarepo




Additional information




Links
       - Invenio platform
             - http://invenio-software.org/
       - INSPIRE Digital library
             - http://inspirebeta.net/
       - Diagrams of JCC and JEPP
             - Andreas Schreiber: Mixing Java and Python
             - http://www.slideshare.net/onyame/mixing-python-and-java
       - On Jython C Extension API
             - http://stackoverflow.com/questions/3097466/using-numpy-and-cpython-with-jython
       - Demo of a running service:
             - http://insdev01.cern.ch

#1 - How to embed Solr (standard)
       - solr.client.solrj.embedded.EmbeddedSolrServer




#2 - How to embed Solr (simplified)
       - solr.servlet.DirectSolrConnection
       - like previous, but simpler
       - all the queries are sent as strings, everything is just a string
       - very flexible and probably suitable for quick integration




#3 - Example of a Solr custom handler




#4 - Example Python handler




                                     59
Thursday, May 26, 2011

Más contenido relacionado

Similar a Cpython embedded in solr - By Roman Chyla

Keynote: from publisher to platform, How The Guardian Embraced the Internet ...
 Keynote: from publisher to platform, How The Guardian Embraced the Internet ... Keynote: from publisher to platform, How The Guardian Embraced the Internet ...
Keynote: from publisher to platform, How The Guardian Embraced the Internet ...lucenerevolution
 
2011 11 pre_cs50_accelerating_sciencegrid_ianstokesrees
2011 11 pre_cs50_accelerating_sciencegrid_ianstokesrees2011 11 pre_cs50_accelerating_sciencegrid_ianstokesrees
2011 11 pre_cs50_accelerating_sciencegrid_ianstokesreesBoston Consulting Group
 
Slides, SURF presentation, nj&ct, draft5, 14 mar2011
Slides, SURF presentation, nj&ct, draft5, 14 mar2011Slides, SURF presentation, nj&ct, draft5, 14 mar2011
Slides, SURF presentation, nj&ct, draft5, 14 mar2011Nick Jankowski
 
Metaheuristic Optimization: Algorithm Analysis and Open Problems
Metaheuristic Optimization: Algorithm Analysis and Open ProblemsMetaheuristic Optimization: Algorithm Analysis and Open Problems
Metaheuristic Optimization: Algorithm Analysis and Open ProblemsXin-She Yang
 
Jankowski, KCL CeRch presentation, enhanced scholarly publications, with note...
Jankowski, KCL CeRch presentation, enhanced scholarly publications, with note...Jankowski, KCL CeRch presentation, enhanced scholarly publications, with note...
Jankowski, KCL CeRch presentation, enhanced scholarly publications, with note...Nick Jankowski
 
Chap1 intro to-accelerators_final
Chap1 intro to-accelerators_finalChap1 intro to-accelerators_final
Chap1 intro to-accelerators_finalSanjay Dubey
 
Citing and reading behaviours in high energy physics.
Citing and reading behaviours in high energy physics.Citing and reading behaviours in high energy physics.
Citing and reading behaviours in high energy physics.Proyecto CeVALE2
 
KCL CeRch presentation, enhanced scholarly publications, 27march2012
KCL CeRch presentation, enhanced scholarly publications, 27march2012KCL CeRch presentation, enhanced scholarly publications, 27march2012
KCL CeRch presentation, enhanced scholarly publications, 27march2012Nick Jankowski
 
Philosophy of the Web - Alexandre Monnin
Philosophy of the Web - Alexandre MonninPhilosophy of the Web - Alexandre Monnin
Philosophy of the Web - Alexandre Monninwebscience-montpellier
 
Large hadron collider
Large hadron colliderLarge hadron collider
Large hadron collideranoop kp
 
Science Commons Open Notebook Science Talk
Science Commons Open Notebook Science TalkScience Commons Open Notebook Science Talk
Science Commons Open Notebook Science TalkJean-Claude Bradley
 

CPython embedded in Solr - By Roman Chyla

  • 1. MontySolr: Embedding CPython in Solr - Roman Chyla, CERN - roman.chyla@cern.ch, May 26, 2011
  • 2. Why should I care? - Our challenge is to connect Python and Java - Without compromises - We created the MontySolr extension - Robust, tested (will be used by our system) - But works for any Python application (e.g. Django) - And for any C/C++ app that Python understands! - Open source (GPL v2) - Try it out! - https://github.com/romanchyla/montysolr
  • 3. Outline ‣ Context - The Challenge - Key components - Available technologies - Our approach - Problems solved - Evaluation - Wrap-up
  • 4. CERN - European Organization for Nuclear Research - Geneva, Switzerland - The largest laboratory for High Energy Physics - Home to the Large Hadron Collider - 40-50K HEP scientists worldwide
  • 12. SPIRES - Stanford Linear Accelerator Center (SLAC) - High-Energy Physics Literature Database - Started December 1991 - The first website outside Europe/CERN - The first database on the web
  • 16. Invenio - Integrated digital library software behind INSPIRE - Used by very large institutional repositories - http://repositories.webometrics.info/toprep_inst.asp - Customizable virtual collections - Flexible management of metadata - 3 000 authors per article - Powerful search engine - Incl. citation map analysis - Written in Python (since 2001) - 290 000 lines of code
  • 17. Outline - Context ‣ The Challenge - Key components - Available technologies - Our approach - Problems solved - Evaluation - Wrap-up
  • 18. The Challenge - HEP scientific community - Searches are metadata-oriented - However, fulltexts are changing the situation - And we want to provide even better service - Bigger volumes of data - NLP processing - Semantic search
  • 29. The Challenge - Query: supersymmetry AND author:ellis - Invenio - fulltext:supersymmetry - 1-6M IDs (IDs: 1;2;3;9....) - 1. only IDs, no score = no ranking - 2. score merging difficult (if available) - 3. push IDs? (e.g. faceting)
  • 30. What is the “best” solution? - We love Python... - ...and our applications are written in Python... - But what if Solr is the master search engine? - Merge results inside Solr? - Typical size: 1-10 million IDs - Expected latency: 1-2 s - What we want to achieve: - Fast transfer of hits from Invenio to Solr - Leverage the power of both (no compromises) - Developer-friendly integration, simplicity
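The merging problem on the two slides above can be illustrated with a small sketch. It assumes one engine returns scored hits as `(id, score)` pairs while the other returns only a list of IDs, so the ID-only list can act as a boolean filter and nothing more. The function name `merge_hits` and the sample data are invented for illustration and do not come from MontySolr or Invenio.

```python
def merge_hits(scored_hits, allowed_ids):
    """Keep the scored hits whose ID also appears in allowed_ids.

    scored_hits: iterable of (doc_id, score) pairs from the engine that ranks.
    allowed_ids: iterable of doc IDs from the engine that returns IDs only.
    Returns the surviving pairs, best score first (ties broken by ID).
    """
    allowed = set(allowed_ids)  # set lookup keeps the filter linear in the hit count
    kept = [(doc_id, score) for doc_id, score in scored_hits if doc_id in allowed]
    kept.sort(key=lambda pair: (-pair[1], pair[0]))
    return kept


# Hypothetical data: scores from the fulltext side, bare IDs from the metadata side.
fulltext_hits = [(1, 0.9), (2, 0.4), (3, 0.7), (9, 0.1)]
metadata_ids = [2, 3, 9, 42]
merged = merge_hits(fulltext_hits, metadata_ids)
```

The sketch also shows why ID-only results are limiting: document 42 matched on the metadata side but carries no score, so it cannot be ranked against the others.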
  • 31. Outline - Context - The Challenge ‣ Key components - Available technologies - Our approach - Evaluation - Demonstration - Wrap-up
  • 32. To embed Solr (in a Java app) - Your app simulates a Java web container? - use EmbeddedSolrServer - It knows nothing about Java servlets? - use the DirectConnect class - Maybe we are too lazy? - Embed the web container (in my case Jetty) - Seemed strange (webserver inside webserver) - ... but it worked well
  • 37. To use Solr in a non-Java app - Solr is already usable via HTTP requests, but we need something else here... - Remote objects/calls? - Pyro, execnet, CORBA, SOAP... - or simply pipes? - Access Python from Java? - Jython - JEPP - Access Java from Python? - JPype - JCC
  • 38. Jython? - Implementation of Python in 100% Java - Both Java and Python code - Truly multithreaded - C modules will not work - but see http://bit.ly/iTRYbb - Slower than CPython
  • 41. JEPP - Java Embedded Python - Python code runs inside the Python interpreter - Embeds the CPython interpreter in Java via the Java Native Interface (JNI) - http://jepp.sourceforge.net/ - recently updated (27-Jan) - but JCC is more active
  • 42. JEPP - Java Embedded Python
  • 43. JCC - Embeds JVM in Python - C++ code generator - C++ object interface wraps a Java library - C++ wrappers conform to Python's C type system - result: complete Python extension module
  • 44. JCC
  • 47. To use Solr in a non-Java app - Columns: Jython, JCC, JEPP - Python C modules: ✓ ✓ - Speed: ✓ ? - No code changes: ✓ ✓ - Access from Python: ✓ ✓ - Access from Java: ✓ ... ✓
  • 48. The first try - Invenio - Solr - JCC
  • 49. The devil is in the details...
  • 50. GIL - Global Interpreter Lock - Unfortunately, a Python webapp is not like Java...
  • 51. GIL - Global Interpreter Lock - We can have 200 threads, but only 4 will run at a time...
  • 52. GIL - Global Interpreter Lock
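The effect of the GIL described on these slides can be observed with a small experiment. The sketch below is a generic illustration, not taken from the talk: it runs the same CPU-bound function sequentially and in threads. On a standard CPython build with the GIL, only one thread executes Python bytecode at a time, so the threaded run is usually no faster for pure-Python work. The function names and workload sizes are arbitrary choices.

```python
import threading
import time


def count_primes(limit):
    """Count primes below limit by trial division (deliberately CPU-bound)."""
    count = 0
    for n in range(2, limit):
        is_prime = True
        d = 2
        while d * d <= n:
            if n % d == 0:
                is_prime = False
                break
            d += 1
        if is_prime:
            count += 1
    return count


def run_sequential(limits):
    """Process every workload one after another in the calling thread."""
    return [count_primes(limit) for limit in limits]


def run_in_threads(limits):
    """Process every workload in its own thread and collect results in order."""
    results = [None] * len(limits)

    def worker(index, limit):
        results[index] = count_primes(limit)

    threads = [threading.Thread(target=worker, args=(i, limit))
               for i, limit in enumerate(limits)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


def timed(func, limits):
    """Return (result, elapsed seconds) for one call of func(limits)."""
    start = time.time()
    result = func(limits)
    return result, time.time() - start


if __name__ == '__main__':
    work = [20000] * 4
    _, seq_time = timed(run_sequential, work)
    _, thr_time = timed(run_in_threads, work)
    print('sequential: %.2fs  threads: %.2fs' % (seq_time, thr_time))
```

Both runs return identical results; only the timings differ, and they depend on the machine and interpreter build. Threads still help when the work releases the GIL, for example while waiting on I/O.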
  • 53. Fortunately a solution exists - JCC can embed Python inside Java - Special thanks to Andi Vajda! (JCC creator) - We write ‘empty’ classes in Java ... - ... and implement them in Python - Python w/ Java inside - Java w/ Python inside
  • 54. The second try - Solr w/ Invenio (backend) - Invenio frontend - XML - JCC
  • 55. Implementing the bridge - Special Java class - With method pythonExtension() - Native method pythonDecRef() - JCC provides its implementation - And a number of other native methods - These will be implemented in Python - Like writing JNI Java/C code but without compilation...
  • 56. MontySolr extension - JCC has great potential, but also adds complexity... - So the MontySolr project was born - Modules must be built in shared mode - JCC dynamic library loaded and started from the main thread - Simple mechanism of the Python bridge and message - Configurable handlers on the Python side - Secured dereferencing of the native objects - Threading on the Java side - Multiprocessing on the Python side - Easy ant targets (compilation) ...
  • 57. Hello World - Java part
      public class MontySolrBridge extends BasicBridge implements PythonBridge {
          // Handle to the Python object that implements the native methods
          private long pythonObject;

          public void pythonExtension(long pythonObject) {
              this.pythonObject = pythonObject;
          }

          public long pythonExtension() {
              return this.pythonObject;
          }

          // Release the Python reference when the Java object is collected
          public void finalize() throws Throwable {
              pythonDecRef();
          }

          public native void pythonDecRef();

          public void sendMessage(PythonMessage message) {
              PythonVM vm = PythonVM.get();
              vm.acquireThreadState();
              try {
                  receive_message(message);
              } finally {
                  // Always release, even if the Python side throws
                  vm.releaseThreadState();
              }
          }

          // Implemented on the Python side
          public native void receive_message(PythonMessage message);
      }
  • 58. Hello World - Python part
      from montysolr import MontySolrBridge

      class SimpleBridge(MontySolrBridge):
          def __init__(self):
              super(SimpleBridge, self).__init__()

          # Called from Java through the native receive_message() method
          def receive_message(self, message):
              query = message.getParam('query')
              message.setResults('Hello world!')
              print('Python received from Java: %s' % query)
  • 59. Example - running MontySolr - Java side - JRE (32/64 bit) - Standard Solr/Lucene jars - JCC dynamic library - Python side - Python interpreter (32/64 bit) - 4 Python modules (jcc, solr, lucene, montysolr) - In the main thread - First we load JCC - Then start Python interpreter ... - ... load Python handlers
  • 60. Solr as search service - Solr w/ Invenio (backend) - Invenio frontend - XML - JCC
  • 61. Example - Solr - MyCustom Handler
  • 62. Example - refersto:author:ellis - Solr - MyCustom Handler
  • 63. Example - Solr custom handler
      PythonMessage msg = MontySolrVM.INSTANCE
          .createMessage("perform_search")
          .setSender("Invenio")
          .setParam("query", "refersto:author:ellis");
      // Hand the message to the Python side and wait for it to be processed
      MontySolrVM.INSTANCE.sendMessage(msg);
      Object result = msg.getResults();
      if (result != null) {
          int[] hits = (int[]) result;
      }
  • 64. Example - JNI connection - refersto:author:ellis - Solr - MyCustom Handler - Python Bridge
  • 65. Example - JNI connection - refersto:author:ellis - Solr - MyCustom Handler - Python Bridge - Invenio wrappers
  • 66. Example - Python side
      # handler is made 'visible' at startup
      SolrpieTarget('Invenio:perform_search', perform_search)

      # search time - called from Java
      def perform_search(message):
          query = message.getParam("query")
          hits = call_real_search(query)
          # cast Python list into Java array
          message.setResults(JArray_ints(hits))
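The registration-then-dispatch pattern on this slide can be sketched in plain Python without JCC. This is only an illustration under stated assumptions: `FakeMessage`, `HandlerRegistry` and the hard-coded hit list are hypothetical stand-ins and are not part of MontySolr; only the `'Invenio:perform_search'` key and the `getParam`/`setResults` accessors come from the slides.

```python
class FakeMessage:
    """Hypothetical stand-in for the message object passed from Java."""

    def __init__(self, sender, recipient, **params):
        self.sender = sender
        self.recipient = recipient
        self._params = dict(params)
        self._results = None

    def getParam(self, name):
        return self._params.get(name)

    def setResults(self, results):
        self._results = results

    def getResults(self):
        return self._results


class HandlerRegistry:
    """Hypothetical registry mapping 'sender:recipient' keys to handlers."""

    def __init__(self):
        self._targets = {}

    def register(self, key, func):
        # Done once at startup, so the handler becomes 'visible'
        self._targets[key] = func

    def dispatch(self, message):
        # Done at search time: look up the handler by sender and recipient
        key = "%s:%s" % (message.sender, message.recipient)
        handler = self._targets.get(key)
        if handler is None:
            return False
        handler(message)
        return True


def perform_search(message):
    query = message.getParam("query")
    # Placeholder for the real search: any non-empty query "matches" records 1..3
    message.setResults([1, 2, 3] if query else [])


registry = HandlerRegistry()
registry.register("Invenio:perform_search", perform_search)

msg = FakeMessage("Invenio", "perform_search", query="refersto:author:ellis")
handled = registry.dispatch(msg)
```

A message whose key has no registered handler is simply reported as unhandled, which leaves the caller free to treat a missing result as "no answer", much as the Java side checks for a null result.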
  • 67. Example (diagram: query refersto:author:ellis, Solr, MyCustom Handler, Python Bridge, Invenio wrappers, Invenio)
  • 68. Example - Java side again
      PythonMessage msg = MontySolrVM.INSTANCE
          .createMessage("perform_search")
          .setSender("Invenio")
          .setParam("query", "refersto:author:ellis");

      MontySolrVM.INSTANCE.sendMessage(msg);

      // The Python handler has filled in the results by the time the call returns
      Object result = msg.getResults();
      if (result != null) {
          int[] hits = (int[]) result;
      }
  • 69. Solr as search service (diagram: Solr w/ Invenio as backend, Apache webserver, Invenio; labels: XML, JCC)
  • 70. Outline
      - Context
      - The Challenge
      - Key components
          - Available technologies
          - Our approach
          - Problems solved
      ‣ Evaluation
      - Wrap-up
  • 71. Memory and garbage collection
  • 72. Comparing speed and load...
  • 73. The effect of cache
  • 74. Robust?
      - Extensive siege tests show very good performance and stability under high load
          - 100-200 users, complex searches
          - 50 concurrent users, citation analysis
      - JCC incurs a small overhead
      - We detected no memory leaks
      - The same as dbpedia.org
      - But watch out for errors in C
          - An error in a C module brings down the whole JVM
          - (errors in a pure Python module can be handled)
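The last point, that errors in pure Python modules can be handled, can be sketched as a wrapper that records the exception on the message instead of letting it propagate to the caller. This is a hypothetical illustration, not MontySolr code: plain dicts stand in for messages, and `safe_handler`, `broken_search` and `working_search` are made-up names.

```python
import traceback


def safe_handler(func):
    """Wrap a handler so a Python exception is recorded, not propagated."""
    def wrapper(message):
        try:
            func(message)
        except Exception:
            # Keep the traceback for diagnostics and signal "no results"
            message["error"] = traceback.format_exc()
            message["results"] = None
    return wrapper


@safe_handler
def broken_search(message):
    # Simulates a failure inside a pure Python module
    raise ValueError("index not loaded")


@safe_handler
def working_search(message):
    message["results"] = [1, 2, 3]


failed = {"query": "refersto:author:ellis"}
broken_search(failed)

ok = {"query": "refersto:author:ellis"}
working_search(ok)
```

A `try`/`except` of this kind only catches Python-level exceptions. A crash inside a C extension (for example a segmentation fault) never reaches it, which is why the slide warns that such an error takes the whole JVM down.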
  • 75. Easy to develop/maintain?
      - Added complexity
          - Java in the toolbox
          - Need to compile C++ extensions
          - Python/OS version dependencies
      - For this we get
          - Easy integration with Invenio
          - The best of two applications
          - A lot of features for free
          - And we can control Solr from Python!
  • 76. Outline
      - Context
      - The Challenge
      - Key components
          - Available technologies
          - Our approach
          - Problems solved
      - Evaluation
      ‣ Wrap-up
  • 77. Wrap-up
      - Our challenge was to connect two different languages/systems
          - And we wanted to get the best of the two...
          - So we had to plug Python into Solr
          - And now our Solr knows citation analysis!
      - We created the MontySolr extension
          - Robust, tested (will be used by INSPIRE)
          - Works for any Python application (e.g. Django)
          - And for any C/C++ app that Python understands!
          - Free software license
      - Try it out! Help us make it better!
          - https://github.com/romanchyla/montysolr
  • 78. Questions?
      - MontySolr
          - https://github.com/romanchyla/montysolr
      - Roman Chyla
          - Fellow, CERN Scientific Information Service
          - roman.chyla@cern.ch
          - @rchyla
          - https://svnweb.cern.ch/trac/rcarepo
  • 79. Additional information
  • 80. Links
      - Invenio platform
          - http://invenio-software.org/
      - INSPIRE Digital library
          - http://inspirebeta.net/
      - Diagrams of JCC and JEPP
          - Andreas Schreiber: Mixing Java and Python
          - http://www.slideshare.net/onyame/mixing-python-and-java
      - On Jython C Extension API
          - http://stackoverflow.com/questions/3097466/using-numpy-and-cpython-with-jython
      - Demo of a running service:
          - http://insdev01.cern.ch
  • 81. #1 - How to embed Solr (standard)
      - solr.client.solrj.embedded.EmbeddedSolrServer
  • 82. #2 - How to embed Solr (simplified)
      - solr.servlet.DirectSolrConnection
          - like previous, but simpler
          - all the queries are sent as strings, everything is just a string
          - very flexible and probably suitable for quick integration
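Since everything in this approach travels as a string, a request is just a path plus URL-encoded parameters. The sketch below only shows composing and decoding such a string with the Python standard library. The helper names `build_request` and `parse_request`, the `/select` path and the `rows` parameter are assumptions for illustration, and how the string is handed to Solr is deliberately left out.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_request(path, **params):
    """Compose a path-plus-parameters string such as '/select?q=...'."""
    return path + "?" + urlencode(params)


def parse_request(request):
    """Split such a string back into its path and a dict of parameter values."""
    parts = urlsplit(request)
    values = {key: vals[0] for key, vals in parse_qs(parts.query).items()}
    return parts.path, values


request = build_request("/select", q="refersto:author:ellis", rows=10)
path, params = parse_request(request)
```

The round trip also shows the cost of the string-only interface: typed values such as `rows=10` come back as the string `"10"` and have to be converted again by whoever consumes them.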
  • 84. #3 - Example of a Solr custom handler
  • 85. #4 - Example Python handler