Apache Lucene is the de-facto standard open source library for Java developers to implement full-text-search capabilities.
While it’s thriving in its field, it is rarely mentioned in the scope of Java EE development.
In this talk we will see for which features many developers love Lucene, make some concrete examples of common problems it elegantly solves, and see some best practices about using it in a Java EE stack.
Finally we'll see how some popular OSS projects such as Hibernate ORM (JPA provider), WildFly (Java EE runtime) and Infinispan (in-memory datagrid, JCache implementor) actually provide great Lucene integration capabilities.
6. AGENDA
What is and how can it help you
Integrations with a JPA application via
How does this all relate with and
Lucene index management
Plans and wishlist for the future
Apache Lucene
Hibernate Search
Infinispan WildFly
10. LET'S REFRESH SOME HISTORY ON THE
WIKIPEDIA
Select * from WikipediaPages p where p.content LIKE ?;
Select * from WikipediaPages p where p.title LIKE ?;
Select * from WikipediaPages p where
(lowercase(p.content) LIKE %:1% OR
lowercase(p.content) LIKE %:2% OR
lowercase(p.content) LIKE %:3% OR
...);
11.
12.
13.
14. AM I CHEATING?
I'm quoting some very successfull web companies.
How many can you list which do not provide an effective
search engine?
Why is that?
15. REQUIREMENTS FOR A SEARCH ENGINE
Need to guess what you want w/o you typing all of the
content
We all hate forms
We want the results in the blink of an eye
We want the right result on top: Relevance
16.
17. SOME MORE THINGS TO CONSIDER:
Approximate word matches
Stemming / Language specific analysis
Typos
Synonyms, Abbreviations, Technical Language
specializations
18. BASICS: KEYWORD EXTRACTION
On how to improve running by Scott
1. Tokenization & Analysis:
how
improv
run
scott
2. Scoring
19. APACHE LUCENE
Open source Apache™ top level project
Primarily Java, ported to many other languages and
platforms
Extremely popular, it's everywhere!
High pace of improvement, excellent team
Most impressive testing
20. AS A JAVAEE DEVELOPER:
You are familiar with JPA
But Lucene is much better than a relational database to
address this problem
Easy integration with the platform is a requirement
21.
22. LET'S INTRODUCE APACHE LUCENE VIA
HIBERNATE SEARCH
Deeply but transparently integrated with Hibernate's
23. EntityManager
Internally uses advanced Apache Lucene features, but
protects your deadlines from the lower level details
Gets great performance out of it
Simple annotations, yet many flexible override options
24. Does not prevent you to perform any form of advanced /
native Lucene query
25. Transparent index state synchronization
Transaction integrations
Options to rebuild the index efficiently
Failover and clustering integration points
Flexible Error handling
32. INDEX FIELDS FOR ACTOR
id name
1 Harrison Ford
2 Kirsten Dunst
INDEX FIELDS FOR DVD
id title actors.name *
1 Melancholia {Kirsten Dunst, Charlotte Gainsbourg,
Kiefer Sutherland}
2 The Force
Awakens
{Harrison Ford, Mark Hamill, Carrie
Fisher}
33. RUN A LUCENE QUERY BUT GET JPA
MANAGED RESULTS
String[] productFields = { "title", "actors.name" };
org.apache.lucene.search.Query luceneQuery = // ...
FullTextEntityManager ftEm =
Search.getFullTextEntityManager( entityManager );
FullTextQuery query = // extends javax.persistence.Query
ftEm.createFullTextQuery( luceneQuery, DVD.class );
List dvds = // Managed entities!
query.setMaxResults(100).getResultList();
int totalNbrOfResults = query.getResultSize();
45. Indexes need to be stored, updated and read from.
You can have many indexes, managed independently
An index can be written by an exclusive writer only
Backends can be configured differently per index
Index storage - the Directory - can also be configured per
index
49. WHAT IS INFINISPAN?
In Memory Key/Value Store
ASL v2 License
Scalable
JTA Transactions
Persistence (File/JDBC/LevelDB/...)
Local/Clustered
Embedded/Server
...
50. INFINISPAN / JAVAEE?
JavaEE: JCache implementation
A core component of WildFly
"Embedded" mode does not depend on WildFly
Hibernate 2n level cache
51. INFINISPAN / APACHE LUCENE?
Lucene integrations for Querying the datagrid!
Lucene integrations to store the index!
Hibernate Search integrations!
52. HIBERNATE SEARCH & INFINISPAN
QUERY
Same underlying technology
Same API to learn
Same indexing configuration options
53.
54. Same annotations, not an entity:
@Indexed
@AnalyzerDef(name = "lowercaseKeyword",
tokenizer = @TokenizerDef(factory = KeywordTokenizerFactory.class),
filters = {@TokenFilterDef(factory = LowerCaseFilterFactory.class)}
)
@SerializeWith(CountryExternalizer.class)
public class Country {
@Field(store = Store.YES)
@Analyzer(definition = "lowercaseKeyword")
private String name;
55. Store the Java POJO classes in the cache directly:
Country uk = ...
cache.put("UK", uk );
70. QUERY - INFINISPAN INDEXMANAGER
Configuration configuration = new ConfigurationBuilder()
.indexing().index(Index.LOCAL)
.addProperty("default.worker.execution", "async")
.addProperty("default.indexmanager",
"org.infinispan.query.indexmanager.InfinispanIndexManager"
.build();
EmbeddedCacheManager cm = new DefaultCacheManager(configuration);
Cache<Integer, DaySummary> cache = cm.getCache();
71.
72. THE INFINISPAN LUCENE DIRECTORY
Storing the Apache Lucene index in an high-performance in
memory data grid
73. You don't need Hibernate Search to cluster your existing
Lucene application, but you'll need some external
coordination to guarantee the single IndexWriter.
74. INFINISPAN QUERY AND THE LUCENE
DIRECTORY IN ACTION
Weather Demo by Gustavo Nalle Fernandes
DEMO
Indexed
NOAA.gov data from 1901 to 2014
~10M summaries
Yearly country max recorded temperature by month
Cache<Integer, DaySummary>
75. WHAT'S ON THE HORIZON FOR
INFINISPAN
Improvements in indexing performance
Hadoop and Spark integration experiments
Combining indexed & non-indexed query capabilities,
including remote queries
76. WHAT'S COMING FOR HIBERNATE
SEARCH
Upgrading to Lucene 5
Experiment integrations with REST based Lucene servers
(Solr, ElasticSearch)
Improved backends to simplify clustering setup
GSOC: generic JPA support and improved developer
tooling
A lot more! See also the roadmap
77. THANK YOU!
Some references:
, the super simple
, the
The website, the by
The website
Our team's blog
(requires Chrome)
Apache Lucene website
Hibernate Search website Hibernate
Search JPA demo WildFly integration tests
Infinispan Weather Demo
@gustavonalle
WildFly
in.relation.to
Export these slides to PDF