This document outlines a natural language search solution. It identifies key elements in queries and converts them into connected expressions to query a MongoDB database. The solution includes a tokenizer to identify operands and operators. An expression parser uses the stream of tokens to build the equivalent MongoDB query. It supports various operators and integrates external knowledge bases to improve data intelligence. The search API acts as an endpoint for the natural language querying modules. The presentation concludes with an overview of QBurst's MongoDB expertise.
Running Natural Language Queries on MongoDB
Speakers
Deepak Krishnan | Consultant - Data Scientist
❏ Expert on various Big Data and Machine Learning initiatives
❏ Experienced in schema design for Big Data storage systems
Praveen Rajasekhar | Director - Business Development
❏ <bio to be updated>
❏ <bio to be updated>
Solution
❏ Identify key operands & operators within the natural language query
❏ Convert them into a series of connected expressions
❏ Dynamically build a query that runs against a MongoDB instance
❏ Aggregate search results
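The four steps above can be sketched as one small pipeline. This is a toy illustration, not the actual implementation: the stop-word list, the `skills` field, and the helper names `identify_tokens` and `build_filter` are all assumptions.

```python
# High-level sketch of the solution pipeline, with toy stand-ins
# for each stage; all names and fields here are illustrative.
def identify_tokens(query):
    """Step 1: pick out operands and operators from the query text."""
    operators = {"or", "and", "not"}
    return [(w, "OP" if w in operators else "OPERAND")
            for w in query.lower().split() if w not in {"show", "me"}]

def build_filter(tokens):
    """Steps 2-3: connect the expressions and build a MongoDB filter."""
    operands = [w for w, kind in tokens if kind == "OPERAND"]
    if any(kind == "OP" and w == "or" for w, kind in tokens):
        return {"$or": [{"skills": w} for w in operands]}
    return {"skills": {"$all": operands}}

mongo_filter = build_filter(identify_tokens("Show me Java or PHP openings"))
# Step 4 would run collection.find(mongo_filter) via a driver
# such as PyMongo and aggregate the results for display.
print(mongo_filter)
```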
Tokenizer
❏ Acts as an FSA with access to the inverted index
❏ Emits annotations whenever its buffer matches an operator or an entry in the inverted index
❏ Identifies common data types such as date, time, etc.
❏ Emits the matched expressions as a sequential stream of annotations
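A minimal sketch of such a tokenizer, assuming a toy operator table and inverted index; the `Annotation` fields and the `tokenize` function are illustrative, not the actual implementation.

```python
# Toy tokenizer: scans the query and emits an annotation whenever the
# current word matches an operator or an inverted-index entry
# (approximating the FSA behaviour described above).
from dataclasses import dataclass

# Assumed operator table and inverted index, for illustration only.
OPERATORS = {"or": "OR_OPERATOR", "and": "AND_OPERATOR", "not": "NOT_OPERATOR"}
INVERTED_INDEX = {"java": "skills", "php": "skills", "openings": "collection"}

@dataclass
class Annotation:
    kind: str        # e.g. OPERAND, OR_OPERATOR
    text: str        # the matched surface form
    field: str = ""  # index field for operands

def tokenize(query: str):
    """Emit annotations as a sequential stream."""
    annotations = []
    for word in query.lower().split():
        if word in OPERATORS:
            annotations.append(Annotation(OPERATORS[word], word))
        elif word in INVERTED_INDEX:
            annotations.append(Annotation("OPERAND", word, INVERTED_INDEX[word]))
        # unmatched words (stop words, etc.) are skipped
    return annotations

stream = tokenize("Show me Java or PHP openings")
print([(a.kind, a.text) for a in stream])
```

The resulting stream of annotations is what feeds the expression parser.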
Expression Parser
❏ Generated using a parser generator
❏ Supports conjunction, disjunction, and negation operators
❏ Responsible for taking in a stream of annotations and reducing it
❏ Creates the equivalent MongoDB query during the reduction process
Expression Parser
Example: Show me Java or PHP openings
This will be reduced by the rule
EXPR OR_OPERATOR EXPR
whose reduction converts it into an $or query in MongoDB
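The reduction step can be sketched as follows. The `skills` field and the `reduce_or` helper are assumptions for illustration; the real parser is generated from a grammar rather than hand-written.

```python
# Toy reduction: collapse the pattern EXPR OR_OPERATOR EXPR into a
# single MongoDB `$or` expression.
def to_expr(operand):
    """Wrap an operand as a MongoDB equality expression
    (assuming operands map to a hypothetical `skills` field)."""
    return {"skills": operand}

def reduce_or(stream):
    """Reduce a flat EXPR (OR_OPERATOR EXPR)* stream to one query."""
    exprs = [to_expr(tok) for tok in stream if tok != "OR_OPERATOR"]
    return exprs[0] if len(exprs) == 1 else {"$or": exprs}

query = reduce_or(["Java", "OR_OPERATOR", "PHP"])
print(query)  # {'$or': [{'skills': 'Java'}, {'skills': 'PHP'}]}
```

In a real deployment this filter would then be passed to `collection.find(query)` through a driver such as PyMongo.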
External Knowledge Bases
❏ Integrated into the expression parser for data intelligence
❏ Uses NLP date parsers and knowledge bases such as ConceptNet
❏ Improves the intelligence of the parsed expressions
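As a rough illustration of the date-parser integration, here is a hand-rolled mapper from relative date phrases to MongoDB range filters. The production system uses dedicated NLP date parsers; `parse_date_phrase` and the `posted_on` field are assumptions made for this sketch.

```python
# Toy date-phrase normalizer: maps a relative phrase to a
# {field: {$gte, $lt}} range filter on a hypothetical `posted_on` field.
from datetime import date, timedelta

def parse_date_phrase(phrase, today=None):
    today = today or date.today()
    if phrase == "last week":
        start = today - timedelta(days=today.weekday() + 7)  # previous Monday
        end = start + timedelta(days=7)
    elif phrase == "yesterday":
        start = today - timedelta(days=1)
        end = today
    else:
        raise ValueError(f"unsupported phrase: {phrase}")
    return {"posted_on": {"$gte": start.isoformat(), "$lt": end.isoformat()}}

print(parse_date_phrase("yesterday", today=date(2024, 5, 10)))
# {'posted_on': {'$gte': '2024-05-09', '$lt': '2024-05-10'}}
```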
Summary
Search API
❏ Serves as the entry point to the natural language querying modules
❏ Acts as a RESTful API endpoint to which clients can connect via HTTP
Tokenizer
❏ Passes the stream of tokens to an expression parser
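A minimal sketch of such an endpoint using Python's standard library. `build_mongo_query` is a placeholder for the real tokenizer and expression-parser pipeline; the route, parameter name `q`, and response shape are assumptions.

```python
# Toy RESTful search endpoint: accepts a natural-language query over
# HTTP and returns the derived MongoDB filter as JSON.
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import urlparse, parse_qs

def build_mongo_query(text):
    """Placeholder for the tokenizer + expression-parser pipeline."""
    terms = [w for w in text.lower().split() if w not in {"show", "me", "or"}]
    if len(terms) > 1:
        return {"$or": [{"skills": t} for t in terms]}
    return {"skills": terms[0]} if terms else {}

class SearchHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # e.g. GET /search?q=Show+me+Java+or+PHP+openings
        params = parse_qs(urlparse(self.path).query)
        query_text = params.get("q", [""])[0]
        body = json.dumps({"filter": build_mongo_query(query_text)}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body)

# To serve: HTTPServer(("localhost", 8080), SearchHandler).serve_forever()
```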
Summary
Expression Parser
❏ Uses the series of tokens to make transitions in a finite state machine
❏ Ingestion of the tokens into the expression parser is based on a sliding window model where the window size is dynamic
The modern-day software ecosystem is moving towards a better user experience, which has become critical for user retention.
Organizations have realized that the more a user interacts with their application, the higher the chances that the user will appreciate the business value of the application.
One of the most sought-after features of any user-centric application is the "Search" functionality.
Our solution is designed to identify key operands and operators within a natural language query and convert them into a series of connected expressions, which are used to dynamically build a query that runs against a MongoDB instance.
The final search results are aggregated and presented as standard search results.
The tokenizer is by itself an FSA that has access to the inverted index.
The tokenizer emits annotations whenever its buffer matches an operator or an entry in the inverted index.
The tokenizer can also identify common data types such as date, time, etc.
The tokenizer emits the matched expressions as a sequential stream of annotations.
The expression parser is itself generated by a parser generator.
It supports conjunction, disjunction, and negation operators, each with an associated precedence.
The expression parser is responsible for taking in a stream of annotations and reducing it.
During the reduction process, the expression parser creates the equivalent MongoDB query.
The process is similar to parsing an expression using a CFG and then, at each step, reducing the expression to a value (which in this case is the MongoDB query).
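The CFG-style reduction with operator precedence (NOT binding tighter than AND, which binds tighter than OR) can be sketched roughly as follows. The grammar, the `reduce_query` helper, and the `skills` field are assumptions for illustration.

```python
# Toy precedence-aware reduction of a token list to a MongoDB query
# value: split on the lowest-precedence operator first (OR), then AND,
# then apply NOT, reducing each sub-expression recursively.
def reduce_query(tokens):
    if "or" in tokens:
        i = tokens.index("or")
        return {"$or": [reduce_query(tokens[:i]), reduce_query(tokens[i + 1:])]}
    if "and" in tokens:
        i = tokens.index("and")
        return {"$and": [reduce_query(tokens[:i]), reduce_query(tokens[i + 1:])]}
    if tokens[0] == "not":
        return {"skills": {"$ne": tokens[1]}}
    return {"skills": tokens[0]}

print(reduce_query(["java", "and", "not", "php"]))
# {'$and': [{'skills': 'java'}, {'skills': {'$ne': 'php'}}]}
```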
-There are some external knowledge bases and parsers integrated into the Expression parser to make it more intelligent
-We use custom NLP date parsers, knowledge bases such as conceptnet to increase the intelligence
The expression parser is itself a state machine that uses the series of tokens to make transitions in a finite state machine.
Some tokens cause the finite state machine to reach a final state, in which case expressions are built and stored.
The ingestion of tokens into the expression parser is based on a sliding window model where the window size is dynamic.
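The dynamic sliding-window ingestion can be sketched as follows: the window over the token stream grows until a known expression is matched (a final state of the FSM), at which point the expression is stored and the window resets past the match. The phrase set and the `ingest` helper are toy assumptions.

```python
# Toy sliding-window ingestion: grow a variable-size window over the
# token stream, keep the longest window matching a known multi-word
# phrase (final state), store it, and restart past the match.
def ingest(tokens, phrases):
    expressions, start = [], 0
    while start < len(tokens):
        match, match_end = None, start
        window_end = start + 1
        while window_end <= len(tokens):
            window = " ".join(tokens[start:window_end])
            if window in phrases:      # FSM reached a final state
                match, match_end = window, window_end
            window_end += 1            # grow the window dynamically
        if match:
            expressions.append(match)
            start = match_end          # reset the window past the match
        else:
            start += 1
    return expressions

phrases = {"new york", "java developer"}
print(ingest("openings for java developer in new york".split(), phrases))
# ['java developer', 'new york']
```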