This document summarizes a presentation about MontySolr, an extension that allows embedding CPython in Solr. It was created by Roman Chyla of CERN to connect Python and Java applications without compromises. MontySolr uses JCC to embed a Python interpreter in Java, allowing Python code to interface with Solr. This provides a robust, tested integration that works for any Python or C/C++ application and leverages the strengths of both Solr and Invenio.
CPython embedded in Solr - By Roman Chyla
1. MontySolr:
Embedding CPython in Solr
Roman Chyla, CERN
roman.chyla@cern.ch, May 26, 2011
Thursday, May 26, 2011
2. Why should I care?
- Our challenge is to connect Python and Java
- Without compromises
- We created MontySolr extension
- Robust, tested (will be used by our system)
- But works for any Python application (eg. Django)
- And for any C/C++ app that Python understands!
- Open source (GPL v2)
- Try it out!
- https://github.com/romanchyla/montysolr
3. Outline
‣ Context
- The Challenge
- Key components
- Available technologies
- Our approach
- Problems solved
- Evaluation
- Wrap-up
4. CERN
- European Organization for Nuclear Research
- Switzerland, Geneva
- The largest laboratory for High Energy Physics
- Home to the Large Hadron Collider
- 40-50K HEP scientists worldwide
12. SPIRES
- Stanford Linear Accelerator Center - SLAC
- High-Energy Physics Literature Database
- Started December 1991
- The first web server outside Europe/CERN
- The first database on the web
16. Invenio
- Integrated digital library software behind INSPIRE
- Used by very large institutional repositories
- http://repositories.webometrics.info/toprep_inst.asp
- Customizable virtual collections
- Flexible management of metadata
- 3 000 authors per article
- Powerful search engine
- Incl. citation map analysis
- Written in Python (since 2001)
- 290 000 lines of code
17. Outline
- Context
‣ The Challenge
- Key components
- Available technologies
- Our approach
- Problems solved
- Evaluation
- Wrap-up
18. The Challenge
- HEP scientific community
- Searches metadata oriented
- However fulltexts are changing the situation
- And we want to provide even better service
- Bigger volumes of data
- NLP processing
- Semantic search
29. The Challenge
Query: supersymmetry AND author:ellis
Invenio: fulltext:supersymmetry
IDs: 1;2;3;9.... (1-6M IDs)
- 1. only IDs, no score = no ranking
- 2. score merging difficult (if available)
- 3. push IDs? (eg. faceting)
30. What is the “best” solution?
- We love Python...
- ...and our applications are written in Python...
- But what if Solr is the master search engine?
- Merge results inside Solr?
- Typical size: 1-10 mil. IDs
- Expected latency: 1-2 s.
- What we want to achieve:
- Fast transfer of hits from Invenio to Solr
- Leverage the power of both (no compromises)
- Developer-friendly integration, simplicity
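The merging problem behind these goals can be sketched in plain Python (written for Python 3). This is only an illustration under assumptions: the hit lists are tiny made-up samples and `merge_hits` is a hypothetical helper, not part of Invenio or Solr. It shows that an AND across two engines comes down to intersecting ID sets, and that ranking is only possible when the fulltext side also hands over scores.

```python
def merge_hits(metadata_ids, fulltext_hits):
    """Intersect metadata hits with fulltext hits (hypothetical helper).

    metadata_ids  -- iterable of record IDs from the metadata search
    fulltext_hits -- dict {record ID: score}, or a plain iterable of IDs
                     when the fulltext engine returns no scores
    """
    wanted = set(metadata_ids)
    if isinstance(fulltext_hits, dict):
        # Scores available: keep common IDs and rank by descending score.
        common = [(rec_id, score) for rec_id, score in fulltext_hits.items()
                  if rec_id in wanted]
        common.sort(key=lambda pair: (-pair[1], pair[0]))
        return [rec_id for rec_id, _ in common]
    # Only IDs available: the intersection still works, but there is
    # nothing to rank by, so fall back to ascending ID order.
    return sorted(wanted.intersection(fulltext_hits))


# Made-up sample data, for illustration only.
author_ellis = [1, 2, 3, 9]
fulltext_scored = {2: 0.4, 3: 0.9, 7: 0.8, 9: 0.1}

ranked = merge_hits(author_ellis, fulltext_scored)
unranked = merge_hits(author_ellis, list(fulltext_scored))
print(ranked)    # ordered by fulltext score
print(unranked)  # ordered by ID only
```

With 1-10 million IDs a real integration would not shuffle Python sets around like this; the sketch only makes the "no score = no ranking" point concrete.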
31. Outline
- Context
- The Challenge
‣ Key components
- Available technologies
- Our approach
- Evaluation
- Demonstration
- Wrap-up
32. To embed Solr (in Java app)
- Your app simulates Java web container?
- use EmbeddedSolrServer
- It knows nothing about Java servlets?
- use DirectSolrConnection class
- Maybe we are too lazy?
- Embed the web container (in my case Jetty)
- Seemed strange (webserver inside webserver)
- ... but it worked well
37. To use Solr in non-Java app
- Solr is already usable via HTTP requests, but we need something else here...
- Remote objects/calls?
- Pyro, execnet, CORBA, SOAP...
- or simply pipes?
- Access Python from Java?
- Jython
- JEPP
- Access Java from Python?
- JPype
- JCC
38. Jython?
- Implementation of Python in 100% Java
- Both Java and Python code
- Truly multithreaded
- C modules will not work
- but see http://bit.ly/iTRYbb
- Slower than CPython
41. JEPP - Java Embedded Python
- Python code runs inside Python interpreter
- Embeds CPython interpreter via Java Native Interface (JNI) in Java
- http://jepp.sourceforge.net/
- recently updated (27-Jan)
- but JCC is more active
43. JCC
- Embeds JVM in Python
- C++ code generator
- C++ object interface wraps a Java library
- C++ wrappers conform to Python's C type system
- result: complete Python extension module
47. To use Solr in non-Java app
Jython JCC JEPP
Python C modules: ✓ ✓
Speed: ✓ ?
No code changes: ✓ ✓
Access from Python: ✓ ✓
Access from Java: ✓ ... ✓
48. The first try
Invenio
Solr
JCC
49. The devil is in the details...
50. GIL - Global Interpreter Lock
Unfortunately, a Python webapp is not like Java...
We can have 200 threads, but only 4 will run at a time...
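A small experiment makes the effect of the GIL visible. The sketch below is written for Python 3 and assumes a standard CPython build with the GIL enabled; `count_down`, `run_serial` and `run_threaded` are illustrative names, not part of any library discussed here. It runs the same CPU-bound job four times, first one after another and then in four threads. On such a build the two timings usually come out close, because only one thread executes Python bytecode at any moment. No expected timings are given, since they depend on the machine.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def count_down(n):
    """CPU-bound work: pure Python bytecode, so it needs the GIL to run."""
    while n > 0:
        n -= 1
    return n


def run_serial(jobs, n):
    """Run the job `jobs` times in the calling thread and time it."""
    start = time.perf_counter()
    results = [count_down(n) for _ in range(jobs)]
    return results, time.perf_counter() - start


def run_threaded(jobs, n):
    """Run the job in `jobs` threads at once and time it."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        results = list(pool.map(count_down, [n] * jobs))
    return results, time.perf_counter() - start


serial_results, serial_time = run_serial(4, 200000)
threaded_results, threaded_time = run_threaded(4, 200000)
print("serial:   %.3f s" % serial_time)
print("threaded: %.3f s" % threaded_time)
```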
53. Fortunately, a solution exists
- JCC can embed Python inside Java
- Special thanks to Andi Vajda! (JCC creator)
- We write ‘empty’ classes in Java ...
- ... and implement them in Python
Python /w Java inside
Java /w Python inside
54. The second try
Invenio frontend
XML
Solr /w Invenio (backend)
JCC
55. Implementing the bridge
- Special Java class
- With method pythonExtension()
- Native method pythonDecRef()
- JCC provides its implementation
- And a number of other native methods
- These will be implemented using Python
- Like writing JNI Java/C code but without compilation...
56. MontySolr extension
- JCC has great potential, but also added complexity...
- So the MontySolr project was born
- Modules must be built in shared mode
- JCC dynamic library loaded and started from the main thread
- Simple mechanism of the Python bridge and message
- Configurable handlers on the Python side
- Secured dereferencing of the native objects
- Threading on the Java side
- Multiprocessing on the Python side
- Easy ant targets (compilation) ...
57. Hello World - Java part
public class MontySolrBridge extends BasicBridge implements PythonBridge {

    // Pointer to the Python object that implements this class (set by JCC)
    private long pythonObject;

    public void pythonExtension(long pythonObject) {
        this.pythonObject = pythonObject;
    }

    public long pythonExtension() {
        return this.pythonObject;
    }

    // Drop the reference to the Python object when this one is collected
    public void finalize() throws Throwable {
        pythonDecRef();
    }

    public native void pythonDecRef();

    public void sendMessage(PythonMessage message) {
        PythonVM vm = PythonVM.get();
        vm.acquireThreadState();
        try {
            receive_message(message);
        } finally {
            // Release the thread state even if the handler fails
            vm.releaseThreadState();
        }
    }

    // Implemented on the Python side
    public native void receive_message(PythonMessage message);
}
58. Hello World - Python part
from montysolr import MontySolrBridge

class SimpleBridge(MontySolrBridge):
    def __init__(self):
        super(SimpleBridge, self).__init__()

    def receive_message(self, message):
        query = message.getParam('query')
        message.setResults('Hello world!')
        print('Python received from Java: %s' % query)
59. Example - running MontySolr
- Java side
- JRE (32/64 bit)
- Standard Solr/Lucene jars
- JCC dynamic library
- Python side
- Python interpreter (32/64 bit)
- 4 Python modules (jcc, solr, lucene, montysolr)
- In the main thread
- First we load JCC
- Then start Python interpreter ...
- ... load Python handlers
60. Solr as search service
Invenio frontend
XML
Solr /w Invenio (backend)
JCC
62. Example
refersto:author:ellis
Solr
MyCustom
Handler
63. Example - Solr custom handler
PythonMessage msg = MontySolrVM.INSTANCE
    .createMessage("perform_search")
    .setSender("Invenio")
    .setParam("query", "refersto:author:ellis");
// Hand the message over to the Python handler
MontySolrVM.INSTANCE.sendMessage(msg);
Object result = msg.getResults();
if (result != null) {
    int[] hits = (int[]) result;
}
65. Example - JNI connection
refersto:author:ellis
Solr
MyCustom Handler
Python Bridge
Invenio wrappers
66. Example - Python side
# search time - called from Java
def perform_search(message):
    query = message.getParam("query")
    hits = call_real_search(query)
    # cast Python list into Java array
    message.setResults(JArray_ints(hits))

# handler is made 'visible' at startup
# (registered after the definition so that the name exists)
SolrpieTarget('Invenio:perform_search', perform_search)
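To try the dispatch idea without a JVM, here is a pure-Python stand-in. Everything in it is an assumption made for illustration: `FakeMessage`, `HANDLERS`, `register` and `dispatch` are hypothetical names that only mimic the `getParam`/`setResults` calls and the 'sender:recipient' key seen on this slide. They are not the MontySolr API, and the search itself is faked with a made-up index.

```python
class FakeMessage(object):
    """Hypothetical stand-in for the Java PythonMessage object."""

    def __init__(self, sender, recipient, **params):
        self.sender = sender
        self.recipient = recipient
        self._params = dict(params)
        self._results = None

    def getParam(self, name):
        return self._params.get(name)

    def setResults(self, results):
        self._results = results

    def getResults(self):
        return self._results


# Registry of handlers keyed by 'sender:recipient' (assumed key format).
HANDLERS = {}


def register(key, func):
    """Make a handler 'visible' under the given key."""
    HANDLERS[key] = func


def dispatch(message):
    """Find the handler for a message and call it; unknown keys are ignored."""
    handler = HANDLERS.get("%s:%s" % (message.sender, message.recipient))
    if handler is not None:
        handler(message)
    return message


def perform_search(message):
    # Pretend search: a made-up index instead of a real Invenio call.
    fake_index = {"refersto:author:ellis": [1, 2, 3, 9]}
    message.setResults(fake_index.get(message.getParam("query"), []))


register("Invenio:perform_search", perform_search)

msg = dispatch(FakeMessage("Invenio", "perform_search",
                           query="refersto:author:ellis"))
print(msg.getResults())
```

Keeping the registry a plain dictionary means a handler can be swapped by registering another function under the same key, which is one plausible reading of "configurable handlers on the Python side".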
67. Example
refersto:author:ellis
Solr
MyCustom Handler
Python Bridge
Invenio wrappers
Invenio
Invenio
Invenio
Invenio
68. Example - Java side again
PythonMessage msg = MontySolrVM.INSTANCE
    .createMessage("perform_search")
    .setSender("Invenio")
    .setParam("query", "refersto:author:ellis");
MontySolrVM.INSTANCE.sendMessage(msg);
// Back on the Java side: read what the Python handler stored
Object result = msg.getResults();
if (result != null) {
    int[] hits = (int[]) result;
}
69. Solr as search service
Apache webserver
XML
Solr /w Invenio (backend)
Invenio
Invenio
JCC
70. Outline
- Context
- The Challenge
- Key components
- Available technologies
- Our approach
- Problems solved
‣ Evaluation
- Wrap-up
74. Robust?
- Extensive siege tests show very good performance and stability under high load
- 100-200 users, complex searches
- 50 concurrent users, citation analysis
- JCC incurs small overhead
- We detected no memory leaks
- The same as dbpedia.org
- But watch out for errors in C
- An error in a C module brings down the whole JVM
- (errors in a pure Python module can be handled)
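The difference between the two kinds of error can be illustrated with a short sketch (Python 3). It is an assumption-laden example: `call_safely` and `broken_handler` are hypothetical, and a plain dictionary stands in for the message object. An exception raised in pure Python can be caught and reported back to the caller; a crash inside a C extension (for example a segmentation fault) never reaches the `except` clause, which is why it takes the whole process down.

```python
import traceback


def call_safely(handler, message):
    """Run a handler; on failure store the traceback instead of raising."""
    try:
        handler(message)
    except Exception:
        # Only Python-level exceptions arrive here. A crash in C code
        # kills the process before this line can run.
        message["error"] = traceback.format_exc()
    return message


def broken_handler(message):
    # Deliberately failing handler, for illustration only.
    raise ValueError("bad query: %r" % message["query"])


outcome = call_safely(broken_handler, {"query": "refersto:"})
print("error" in outcome)
```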
75. Easy to develop/maintain?
- Added complexity
- Java in the toolbox
- Need to compile C++ extensions
- Python/OS version dependencies
- For this we get
- Easy integration with Invenio
- The best of two applications
- A lot of features for free
- And we can control Solr from Python!
76. Outline
- Context
- The Challenge
- Key components
- Available technologies
- Our approach
- Problems solved
- Evaluation
‣ Wrap-up
77. Wrap-up
- Our challenge was to connect two different languages/systems
- And we wanted to get the best of the two...
- So we had to plug Python into Solr
- And now our Solr knows citation analysis!
- We created MontySolr extension
- Robust, tested (will be used by INSPIRE)
- Works for any Python application (eg. Django)
- And for any C/C++ app that Python understands!
- Free software license
- Try it out! Help us make it better!
- https://github.com/romanchyla/montysolr
78. Questions?
- MontySolr
- https://github.com/romanchyla/montysolr
- Roman Chyla
- Fellow, CERN Scientific Information Service
- roman.chyla@cern.ch
- @rchyla
- https://svnweb.cern.ch/trac/rcarepo
80. Links
- Invenio platform
- http://invenio-software.org/
- INSPIRE Digital library
- http://inspirebeta.net/
- Diagrams of JCC and JEPP
- Andreas Schreiber : Mixing Java and Python
- http://www.slideshare.net/onyame/mixing-python-and-java
- On Jython C Extension API
- http://stackoverflow.com/questions/3097466/using-numpy-and-cpython-with-jython
- Demo of a running service:
- http://insdev01.cern.ch
81. #1 - How to embed Solr (standard)
- solr.client.solrj.embedded.EmbeddedSolrServer
82. #2 - How to embed Solr (simplified)
- solr.servlet.DirectSolrConnection
- like previous, but simpler
- all the queries are sent as strings, everything is just a string
- very flexible and probably suitable for quick integration
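Since everything passed to the connection is a string, a request boils down to a path plus URL-encoded parameters. The Python 3 sketch below shows one way such a string could be put together; `build_request` is a hypothetical helper, and `/select` with the `q`, `rows` and `fl` parameters is used only as a familiar Solr example. How the resulting string is handed to DirectSolrConnection is not shown here.

```python
from urllib.parse import urlencode


def build_request(path, **params):
    """Return 'path?params' with parameters sorted for stable output."""
    return "%s?%s" % (path, urlencode(sorted(params.items())))


request = build_request("/select", q="author:ellis", rows=10, fl="id")
print(request)
```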
84. #3 - Example of a Solr custom handler
85. #4 - Example Python handler