Java Search Engine Framework
Solutions
•  Regular expressions (can be slow and memory-hungry)
•  Lucene (full-text search engine library)
•  Solr (standalone full-text search server)
•  SolrJ (Java client for Solr)
Regular expression
•  (what it is) a sequence of symbols (that is, a string) that identifies a set of strings
•  (what it does) defines a function that takes a string as input and returns a yes/no value, depending on whether or not the string follows a given pattern.
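The yes/no behavior described above can be sketched with java.util.regex. This is only a minimal illustration: the pattern "eur.usd" and the names RegexPredicateDemo and isPair are assumptions chosen for the example, not part of the original slides.

```java
import java.util.regex.Pattern;

final class RegexPredicateDemo {
    // Hypothetical pattern: "eur", then any single character, then "usd".
    private static final Pattern PAIR = Pattern.compile("eur.usd");

    // Returns true when the whole input follows the pattern, false otherwise.
    static boolean isPair(String input) {
        return PAIR.matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(isPair("eur/usd")); // true
        System.out.println(isPair("eurusd"));  // false: no character between "eur" and "usd"
    }
}
```

Compiling the pattern once and reusing it avoids recompiling the expression on every call.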
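Regular expression (example)
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// "." matches any single character; "*" would only repeat the preceding "r"
Pattern p = Pattern.compile("eur.usd");
Matcher m = p.matcher(
    "In quel ramo del lago di eUr&uSd".toLowerCase()
);
if (m.find()) {
    // found! But where in the string? m.start() and m.end() give the match boundaries
}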
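The question of where the match sits in the string can be answered with Matcher.start() and Matcher.end(). The sketch below is one possible way to do it. The class and method names are hypothetical, and it uses the pattern "eur.usd" because in "eur*usd" the "*" only repeats the preceding "r" and so cannot match the "&" in the sample text.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class RegexLocateDemo {
    // Returns {start, end} of the first case-insensitive match, or null when nothing matches.
    static int[] locate(String regex, String text) {
        Matcher m = Pattern.compile(regex, Pattern.CASE_INSENSITIVE).matcher(text);
        return m.find() ? new int[] {m.start(), m.end()} : null;
    }

    public static void main(String[] args) {
        String text = "In quel ramo del lago di eUr&uSd";
        int[] hit = locate("eur.usd", text);
        if (hit != null) {
            // end is exclusive, so substring(start, end) is the matched text
            System.out.println(hit[0] + "-" + hit[1] + ": " + text.substring(hit[0], hit[1]));
        }
        System.out.println(locate("eur*usd", text) == null); // true: this pattern finds nothing
    }
}
```

Pattern.CASE_INSENSITIVE is used here instead of lowercasing the input, so the reported offsets refer to the original text.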
Lucene
•  Lucene is a high-performance, full-featured text
search engine library written entirely in Java. It
is a technology suitable for nearly any
application that requires full-text search,
especially cross-platform.
•  Apache Software Foundation
•  Stable release 4.3.0 / May 6, 2013
•  Development status Active
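To give an intuition for what a full-text index does that a regular-expression scan does not, here is a toy inverted index in plain Java. It is a conceptual sketch only and does not reflect Lucene's actual data structures or API; the class TinyInvertedIndex and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class TinyInvertedIndex {
    // term -> ids of the documents that contain it
    private final Map<String, Set<Integer>> postings = new HashMap<>();
    private final List<String> docs = new ArrayList<>();

    // Splits the text into lowercase word tokens and records each one; returns the new document id.
    int add(String text) {
        int id = docs.size();
        docs.add(text);
        for (String token : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!token.isEmpty()) {
                postings.computeIfAbsent(token, k -> new TreeSet<>()).add(id);
            }
        }
        return id;
    }

    // Looks the term up directly instead of scanning every document.
    Set<Integer> search(String term) {
        return postings.getOrDefault(term.toLowerCase(Locale.ROOT), Collections.<Integer>emptySet());
    }

    public static void main(String[] args) {
        TinyInvertedIndex idx = new TinyInvertedIndex();
        idx.add("EUR USD rate rises");
        idx.add("usd weakens");
        System.out.println(idx.search("usd")); // [0, 1]
    }
}
```

The work of tokenizing is paid once at indexing time, so a query becomes a map lookup rather than a pass over all the text.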
Lucene (example)
// Shared state used by the indexing and search examples
Analyzer analyzer = null;
Directory index = null;
IndexWriterConfig config = null;
IndexWriter w = null;

// analyzer = new StandardAnalyzer(Version.LUCENE_43);  // alternative: tokenizes and lowercases the text
analyzer = new KeywordAnalyzer();   // treats the whole input as a single token
index = new RAMDirectory();         // in-memory index
config = new IndexWriterConfig(Version.LUCENE_43, analyzer);
w = new IndexWriter(index, config);
Lucene (example 2)
// Adds one document with three stored fields; StringField values are indexed as a single token
private void addDoc(long time, String value, String flag) throws Exception {
    Document doc = new Document();
    doc.add(new StringField("time", String.valueOf(time), Field.Store.YES));
    doc.add(new StringField("value", value, Field.Store.YES));
    doc.add(new StringField("flag", flag, Field.Store.YES));
    w.addDocument(doc);
}
→ w.commit(); // run at the end of the batch
Lucene (example 3)
IndexSearcher searcher = new IndexSearcher(DirectoryReader.open(index));
// Option 1: a parser that searches several default fields
MultiFieldQueryParser multiFieldParser = new MultiFieldQueryParser(
    Version.LUCENE_43,
    new String[] {"time", "value", "flag"},
    analyzer);
// Option 2: a parser with a single default field
QueryParser queryParser = new QueryParser(
    Version.LUCENE_43,
    "value",
    analyzer);
// Field names are case-sensitive: the field was indexed as "value"
TopDocs hits = searcher.search(queryParser.parse("value:(+eurusd)"), 50);
System.out.println(hits.totalHits);
for (ScoreDoc scoreDoc : hits.scoreDocs) {
    Document doc = searcher.doc(scoreDoc.doc);
    System.out.println(doc.toString());
}
Solr
•  Solr is written in Java and runs as a standalone full-text search
server within a servlet container such as Jetty.
•  Solr uses the Lucene Java search library at its core for full-text
indexing and search, and has REST-like HTTP/XML and JSON APIs
that make it easy to use from virtually any programming language.
Solr's powerful external configuration allows it to be tailored to
almost any type of application without Java coding, and it has an
extensive plugin architecture when more advanced customization is
required.
•  Apache Software Foundation
•  Stable release 4.3.0 / May 6, 2013
•  Development status Active
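Because Solr is reachable over plain HTTP, a query is ultimately just a URL. The sketch below builds one using only the JDK. The base URL is the one used in the SolrJ slides; the "select" handler path directly under that base, and the class name SolrSelectUrl, are assumptions (in a multi-core setup the core name is normally part of the path). The parameters q, rows and wt are standard Solr query parameters.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

final class SolrSelectUrl {
    // Builds a select URL asking for JSON output; the query string is URL-encoded.
    static String build(String baseUrl, String query, int rows) {
        String base = baseUrl.endsWith("/") ? baseUrl : baseUrl + "/";
        try {
            return base + "select?q=" + URLEncoder.encode(query, "UTF-8")
                    + "&rows=" + rows + "&wt=json";
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on the JVM, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(build("http://localhost:8983/solr/", "name:doc1", 10));
        // http://localhost:8983/solr/select?q=name%3Adoc1&rows=10&wt=json
    }
}
```

Any HTTP client, in any language, can then fetch that URL and parse the JSON response.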
SolrJ
•  SolrJ is a Java client for accessing Solr.
•  It offers a Java interface to add, update, and query the Solr index.
•  Version: released together with Solr (the examples here use the 4.x API, e.g. HttpSolrServer)
SolrJ (example)
SolrServer server = new HttpSolrServer("http://localhost:8983/solr/");
server.deleteByQuery("*:*"); // CAUTION: deletes everything!

// Build two documents
SolrInputDocument doc1 = new SolrInputDocument();
doc1.addField("id", 23425);
doc1.addField("name", "doc1");
doc1.addField("price", 100980);
SolrInputDocument doc2 = new SolrInputDocument();
doc2.addField("id", 63432);
doc2.addField("name", "doc2");
doc2.addField("price", 205345);

// Send them in one batch, then commit
Collection<SolrInputDocument> docs = new ArrayList<SolrInputDocument>();
docs.add(doc1);
docs.add(doc2);
server.add(docs);
server.commit();

// Both clauses are required: name ends in "c1" and price equals 100980
SolrQuery query = new SolrQuery();
query.setQuery("+name:*c1 +price:100980");
QueryResponse rsp = server.query(query);
SolrJ (example, continued)
// Read the results as generic documents
SolrDocumentList docsr = rsp.getResults();
for (SolrDocument document : docsr) {
    Object id = document.getFieldValue("id");
    System.out.println(id);
}
// Or map the results onto annotated beans
List<Product> products = rsp.getBeans(Product.class);
for (Product product : products) {
    String id = product.getId();
    System.out.println(id);
}
SolrJ (Product class)
import org.apache.solr.client.solrj.beans.Field;

public class Product {
    private String id;

    public String getId() {
        return id;
    }

    // Binds the Solr field "id" to this property when getBeans() is used
    @Field("id")
    public void setId(String id) {
        this.id = id;
    }

    // …the same for the price and name attributes.
}
SolrJ (file indexing)
public static void indexPdfWithSolrJ(String fileName, String solrId) throws Exception {
    String urlString = "http://localhost:8983/solr";
    SolrServer solr = new HttpSolrServer(urlString);
    // Send the file to the extracting request handler
    ContentStreamUpdateRequest up = new ContentStreamUpdateRequest("/update/extract");
    up.addFile(new File(fileName), "application/pdf");
    up.setParam("literal.id", solrId); // use solrId as the document id
    up.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true);
    solr.request(up);
    // Query everything back to check the result
    QueryResponse rsp = solr.query(new SolrQuery("*:*"));
    System.out.println(rsp);
}
References
•  Lucene & Solr http://lucene.apache.org/solr/
•  SolrJ http://wiki.apache.org/solr/Solrj
•  Tika http://tika.apache.org/
