How we use Instaparse
Clojure meetup Helsinki, 21/06/2022
Julien Bille
Agenda
● Who am I?
● What is the problem?
● What is Instaparse?
● How did we use it?
Who am I?
{:name "Julien Bille"
:from "France"
:living-in "The Netherlands"
:working-for "IPRally"
:job-title "Fullstack Developer"
}
IPRally
A search engine for patents,
powered by unique AI knowledge
graph technology.
What is the problem?
Add boolean query support to our product:
- Validate the syntax
- Be compatible with the most popular syntaxes (there is no standard)
- Query our database
- Prevent malicious queries
An example:
(TACD:( ("electric bike" OR "electric $w4 bike" OR electricbike) AND (photovoltaic
OR solar) AND cell* ) AND AUTHORITY:( US OR JP OR EP OR WO ) AND
PBD_Y:[2000 to *]) NOT ( ALL_AN:Gazelle )
Boolean expression UI
What is Instaparse?
Instaparse is a Clojure library that aims to be the simplest
way to build context-free grammar parsers in Clojure.
First version released in 2013
https://github.com/Engelberg/instaparse
CLJC compatible (usable from both Clojure and ClojureScript)
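Context-free grammar?
In practical terms, we want:
(defn text->tree [text-input rules]) ; parse the query text into a tree using the grammar rules
(defn tree->sql [tree])              ; turn that tree into a database query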
An example
(require '[instaparse.core :as insta])

(def as-and-bs
  (insta/parser
    "S = AB*
     AB = A B
     A = 'a'+
     B = 'b'+"))
(as-and-bs "aaaaabbbaaaabb")
[:S
[:AB [:A "a" "a" "a" "a" "a"] [:B "b" "b" "b"]]
[:AB [:A "a" "a" "a" "a"] [:B "b" "b"]]]
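Because the result is ordinary Clojure data (nested vectors whose first element is a keyword tag), plain destructuring is enough to consume it. A small sketch, assuming the tree above is copied in as a literal; ab-counts is a hypothetical helper, not part of Instaparse:

```clojure
(ns example.as-and-bs)

;; The tree returned by (as-and-bs "aaaaabbbaaaabb"), copied as a literal.
(def tree
  [:S
   [:AB [:A "a" "a" "a" "a" "a"] [:B "b" "b" "b"]]
   [:AB [:A "a" "a" "a" "a"] [:B "b" "b"]]])

(defn ab-counts
  "Hypothetical helper: for each :AB node, return [number-of-a number-of-b]."
  [[_tag & abs]]
  (for [[_ab [_a & a-chars] [_b & b-chars]] abs]
    [(count a-chars) (count b-chars)]))

(ab-counts tree)
;; => ([5 3] [4 2])
```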
A solution
(TACD:( ("electric bike" OR "electric $w4 bike" OR electricbike) AND (photovoltaic OR
solar) AND cell* ) AND AUTHORITY:( US OR JP OR EP OR WO ) AND PBD_Y:[2000 to *])
NOT ( ALL_AN:Gazelle )
Split into smaller pieces:
● word: photovoltaic
● quote-word: “electric bike”
● field-word: T:elect*
● operation: AND, OR, NOT
● list: (a b c)
● field-list: TA:(a b c)
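Before writing a grammar, one way to sanity-check that these six pieces cover a real query is a throwaway regex tokenizer. This is a hypothetical illustration using only re-seq, not Instaparse and not the approach from the talk. Field prefixes come out as separate tokens, so T:elect* yields a :field token followed by a :word, and lists appear only as bracket tokens, because a regular expression cannot track arbitrarily nested parentheses; that nesting is what a context-free grammar adds.

```clojure
(ns example.tokens
  (:require [clojure.string :as str]))

;; Hypothetical token pattern. Alternatives are tried left to right at each
;; position: a quoted phrase, a field prefix such as TA:, a bracket, or any
;; other run of non-space characters.
(def token-re #"\"[^\"]*\"|[A-Z_]+:|[()\[\]]|[^\s()\[\]\"]+")

(defn classify
  "Map a raw token to one of the piece kinds listed above."
  [tok]
  (cond
    (str/starts-with? tok "\"")  :quote-word
    (str/ends-with? tok ":")     :field
    (#{"(" ")" "[" "]"} tok)     :paren
    (#{"AND" "OR" "NOT"} tok)    :op
    :else                        :word))

(defn tokenize
  "Split a query string into [kind text] pairs."
  [s]
  (mapv (juxt classify identity) (re-seq token-re s)))

(tokenize "TA:(\"electric bike\" OR solar) AND cell*")
;; => [[:field "TA:"] [:paren "("] [:quote-word "\"electric bike\""]
;;     [:op "OR"] [:word "solar"] [:paren ")"] [:op "AND"] [:word "cell*"]]
```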
A solution
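(require '[instaparse.core :as insta])

;; <...> around a right-hand-side element hides the matched text from the tree;
;; <name> on the left splices a rule's children into its parent instead of wrapping them.
;; Double quotes inside the grammar must be escaped (\") because the grammar is a Clojure string.
(def boolean-parser
  (insta/parser
    "S = (exp )*
     exp = word | op | quote-word | list | fields-list
     word = fields?#'[a-zA-Z0-9_*/$#?-]+'
     op = <space>'OR'<space> | <space>'AND'<space> | <space>'NOT'<space>
     quote-word = fields? <'\"'> (word<space> )* <'\"'>
     list = <lparen> (exp )* <rparen>
     fields-list = fields list
     fields = field <':'><space>
     field = 'T' | 'TA' | 'A'
     <lparen> = <'('><space>
     <rparen> = <space><')'>
     <space> = <#'[ ]*'>"))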
Examples
(boolean-parser "electric")
[:S [:exp [:word "electric"]]]
(boolean-parser "(electric)")
[:S [:exp [:list [:exp [:word "electric"]]]]]
(boolean-parser "(electric OR bike)")
[:S [:exp [:list
[:exp [:word "electric"]]
[:exp [:op "OR"]]
[:exp [:word "bike"]]]]]
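Walking these trees needs nothing Instaparse-specific. A sketch built on clojure.core/tree-seq, assuming the tree above is passed in as data; the helper name nodes-tagged is hypothetical. It lists, for example, the search terms or operators a query contains:

```clojure
(ns example.collect)

;; Tree printed above for (boolean-parser "(electric OR bike)").
(def tree
  [:S [:exp [:list
             [:exp [:word "electric"]]
             [:exp [:op "OR"]]
             [:exp [:word "bike"]]]]])

(defn nodes-tagged
  "Every node in a hiccup-style parse tree whose tag is `tag`, depth first."
  [tag tree]
  (->> tree
       (tree-seq vector? rest)   ; a node is [:tag & children]
       (filter #(and (vector? %) (= tag (first %))))))

;; `last` skips an optional [:fields ...] child and returns the text.
(map last (nodes-tagged :word tree))
;; => ("electric" "bike")
```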
Examples
(boolean-parser "(electric")
Parse error at line 1, column 10:
(boolean-parser "T:electric")
[:S [:exp [:word [:fields [:field "T"]] "electric"]]]
(boolean-parser "TA:electric")
[:S [:exp [:word [:fields [:field "TA"]] "electric"]]]
Examples
(boolean-parser "TAC:electric")
Parse error at line 1, column 4:
TAC:electric
^
Expected one of:
"A"
"TA"
"T"
"("
"""
"NOT"
"AND"
"OR"
#"[a-zA-Z0-9_*/$#?-]+"
Examples
(boolean-parser "elec*")
[:S [:exp [:word "elec*"]]]
(boolean-parser "(TA:(\"electric bike\" OR \"electric $w4 bike\"))")
[:S [:exp [:list [:exp [:fields-list
                        [:fields [:field "TA"]]
                        [:list
                         [:exp [:quote-word [:word "electric"] [:word "bike"]]]
                         [:exp [:op "OR"]]
                         [:exp [:quote-word [:word "electric"] [:word "$w4"] [:word "bike"]]]]]]]]]
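One goal from the problem statement was to prevent malicious queries. The grammar already rejects input it cannot parse, but a syntactically valid query can still be nested very deeply. A hypothetical extra guard, not shown in the talk: measure how deeply :list nodes nest and refuse queries above a chosen limit. The names list-depth and check-depth and the limit values are assumptions:

```clojure
(ns example.depth)

;; Tree printed above for
;; (boolean-parser "(TA:(\"electric bike\" OR \"electric $w4 bike\"))").
(def tree
  [:S [:exp [:list [:exp [:fields-list
                          [:fields [:field "TA"]]
                          [:list
                           [:exp [:quote-word [:word "electric"] [:word "bike"]]]
                           [:exp [:op "OR"]]
                           [:exp [:quote-word [:word "electric"] [:word "$w4"] [:word "bike"]]]]]]]]]])

(defn list-depth
  "Deepest nesting of :list nodes in a hiccup-style parse tree."
  [node]
  (if (vector? node)
    (let [[tag & children] node
          deepest (reduce max 0 (map list-depth children))]
      (if (= tag :list) (inc deepest) deepest))
    0))

(defn check-depth
  "Hypothetical guard: reject queries nested deeper than max-depth."
  [tree max-depth]
  (let [d (list-depth tree)]
    (if (<= d max-depth)
      {:ok tree}
      {:error (str "Query nested " d " levels deep; the limit is " max-depth)})))

(list-depth tree)
;; => 2
```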
Lessons learned
1. Leave room for human typos
2. Recursion comes for free
3. Don't make the grammar too complicated: use insta/transform, clojure.walk or
clojure.zip on the resulting tree instead
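As a sketch of the clojure.walk route mentioned in point 3, clojure.walk/postwalk can rewrite :quote-word nodes bottom-up into the same shape that insta/transform produces in the transform example. The function body is an assumption (the talk does not show its version), and rewrite-quote-word is a hypothetical name:

```clojure
(ns example.walk
  (:require [clojure.walk :as walk]))

(defn rewrite-quote-word
  "Hypothetical: turn [:quote-word [:word \"electric\"] [:word \"bike\"]] into
  [:quote-word {:words (\"electric\" \"bike\")}]. A leading [:fields ...]
  child, if any, is dropped. Other nodes pass through unchanged."
  [node]
  (if (and (vector? node) (= :quote-word (first node)))
    [:quote-word {:words (for [child (rest node)
                               :when (= :word (first child))]
                           (last child))}]
    node))

(def tree
  [:exp [:quote-word [:word "electric"] [:word "bike"]]])

;; postwalk rewrites children before their parents, so each :quote-word
;; node is seen with its :word children already in place.
(walk/postwalk rewrite-quote-word tree)
;; => [:exp [:quote-word {:words ("electric" "bike")}]]
```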
Transform example
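;; transform-quote-word is not shown in the talk; this definition is a
;; hypothetical reconstruction that produces the output below.
(defn transform-quote-word [& children]
  [:quote-word {:words (map last children)}])

(->> (boolean-parser "(TA:(\"electric bike\" OR \"electric $w4 bike\"))")
     (insta/transform {:quote-word transform-quote-word}))
[:S [:exp [:list [:exp [:fields-list
                        [:fields [:field "TA"]]
                        [:list
                         [:exp [:quote-word {:words ("electric" "bike")}]]
                         [:exp [:op "OR"]]
                         [:exp [:quote-word {:words ("electric" "$w4" "bike")}]]]]]]]]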
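The talk stops at the tree, so the second function sketched at the start, tree->sql, is left open. A minimal sketch under loudly stated assumptions: a hypothetical schema (a full_text column plus one column per field), SQL LIKE matching, and the untransformed tree from boolean-parser. User text only ever goes into the :params vector, never into the SQL string, which is one way to address the "prevent malicious query" goal at the database layer. It expects an explicit operator between terms, reads the binary NOT as AND NOT, and passes truncation (*) and proximity operators such as $w4 through untranslated:

```clojure
(ns example.sql
  (:require [clojure.string :as str]))

;; Hypothetical schema: the default column and the field-to-column mapping
;; are assumptions for illustration, not IPRally's database.
(def default-column "full_text")
(def field->column {"T" "title" "TA" "title_abstract" "A" "abstract"})

(defn- field-column
  "Column for a [:fields [:field \"TA\"]] node."
  [[_fields [_field field-name]]]
  (field->column field-name))

(defn- term
  "A single LIKE condition. The user's text only goes into :params."
  [column text]
  {:sql (str column " LIKE ?") :params [(str "%" text "%")]})

(defn- node->sql [column [tag & args]]
  (case tag
    (:S :list)   (let [parts (map #(node->sql column %) args)
                       sql   (str/join " " (map :sql parts))]
                   {:sql    (if (= tag :list) (str "(" sql ")") sql)
                    :params (vec (mapcat :params parts))})
    :exp         (node->sql column (first args))
    ;; A binary NOT ("a NOT b") is read as AND NOT.
    :op          {:sql (get {"NOT" "AND NOT"} (first args) (first args)) :params []}
    :word        (if (vector? (first args))       ; [:word [:fields ...] "text"]
                   (term (field-column (first args)) (last args))
                   (term column (first args)))
    :quote-word  (let [[col words] (if (= :fields (ffirst args))
                                     [(field-column (first args)) (rest args)]
                                     [column args])]
                   (term col (str/join " " (map last words))))
    :fields-list (node->sql (field-column (first args)) (second args))))

(defn tree->sql
  "Turn a boolean-parser tree into {:sql \"...\" :params [...]}."
  [tree]
  (node->sql default-column tree))

(tree->sql [:S [:exp [:list [:exp [:word "electric"]]
                            [:exp [:op "OR"]]
                            [:exp [:word "bike"]]]]])
;; => {:sql "(full_text LIKE ? OR full_text LIKE ?)", :params ["%electric%" "%bike%"]}
```

The {:sql :params} pair suits any driver that supports ? placeholders, such as a JDBC prepared statement.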
Questions?