1. RV College of
Engineering
Go, change the world
Dr. G. Shobha
Professor, CSE Department
RV College of Engineering, Bengaluru - 59
Natural Language to SQL Query conversion using
Machine Learning Techniques on HPCC Systems
Platform
PRESENTATION CONTENTS
• Introduction and Motivation
• Components involved in NLP for NL to SQL Conversion
• Rule Based Architecture for NL to SQL conversion
• Machine Learning Based Architecture to Enrich NL for SQL Conversion
• HPCC Systems Architecture
• Results & Conclusions
Introduction and Motivation
Key Factors of NL to SQL
• Databases serve as the forefront for most systems today.
• Structured query language (SQL) is used to access and manipulate the
data stored in a relational database.
• Most end users have limited knowledge of SQL and thus face
difficulties in accessing such data.
• Access to the data is critical.
• Without a natural language interface, users must learn the query
language and understand its syntax.
Components Involved in NLP for NL to SQL
Components of NLP
NLP: the part of computer science and artificial intelligence that
deals with human languages.
Rule Based Architecture for NL to SQL Conversion
Preprocessor
• Tokenizes the natural language input.
• Removes redundant tokens.
• The output of the preprocessor is duplicated
and supplied to two major components:
- Entity Recognizer
- Intent Recognizer
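The preprocessing step can be sketched roughly as follows. This is only an illustration under stated assumptions: the slides do not list the redundant tokens, so `REDUNDANT_TOKENS` is a hypothetical stop list, and `entity_recognizer` and `intent_recognizer` are placeholder callables for the two downstream components.

```python
import re

# Hypothetical list of redundant tokens; the slides do not specify one.
REDUNDANT_TOKENS = {"the", "a", "an", "of", "me", "please"}


def preprocess(query):
    """Tokenize the natural language input and drop redundant tokens."""
    tokens = re.findall(r"[A-Za-z0-9']+", query.lower())
    return [t for t in tokens if t not in REDUNDANT_TOKENS]


def run_pipeline(query, entity_recognizer, intent_recognizer):
    """Duplicate the preprocessor output and hand one copy to each component."""
    tokens = preprocess(query)
    # Separate copies, so neither component can modify the other's input.
    return entity_recognizer(list(tokens)), intent_recognizer(list(tokens))


print(preprocess("Show me the names of customers"))
```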
Entity Recognizer
Consists of three parts:
• an entity extractor
• a classifier
• a filter
Entity Extractor
• Uses part-of-speech tagging and a date parser to extract important
keywords from the sentence.
• These keywords are strong candidates for relation names, attribute names or data values.
• They are then fed into a classifier along with the user-defined schema
mappings of relation names and attribute names.
Classifier
• The classifier uses various checks (direct match, concatenation, n-gram,
hypernyms, synonyms) to sort the keywords into relation names,
attribute names and residual keywords.
Filter
• The residual keywords are filtered to extract the words that form the data items of
the SQL query.
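A minimal sketch of the classification step is shown below. It covers only the direct, concatenation and synonym checks; the n-gram and hypernym checks are left out. The schema mapping, the synonym table (a stand-in for a lexical resource such as WordNet) and all function names are assumptions made for illustration, not the presenters' implementation.

```python
# Hypothetical user-defined schema mappings (keyword -> schema name).
SCHEMA = {
    "relations": {"customers": "t_cstmrs", "products": "t_prds"},
    "attributes": {"fullname": "FullName", "gender": "Gender",
                   "listprice": "ListPrice"},
}
# Hypothetical synonym table standing in for a WordNet lookup.
SYNONYMS = {"price": "listprice", "name": "fullname", "clients": "customers"}


def lookup(term):
    """Direct check, falling back to a synonym check."""
    term = SYNONYMS.get(term, term)
    if term in SCHEMA["relations"]:
        return "relation", SCHEMA["relations"][term]
    if term in SCHEMA["attributes"]:
        return "attribute", SCHEMA["attributes"][term]
    return None


def classify(keywords):
    """Split keywords into relation names, attribute names and residual words."""
    relations, attributes, residual = [], [], []
    i = 0
    while i < len(keywords):
        hit, step = None, 1
        # Concatenation check: "full" + "name" -> "fullname".
        if i + 1 < len(keywords):
            hit = lookup(keywords[i] + keywords[i + 1])
            if hit:
                step = 2
        if hit is None:
            hit = lookup(keywords[i])
        if hit is None:
            residual.append(keywords[i])
        elif hit[0] == "relation":
            relations.append(hit[1])
        else:
            attributes.append(hit[1])
        i += step
    return relations, attributes, residual


print(classify(["full", "name", "customers", "germany"]))
```

The residual list is what the filter stage would then inspect for data items.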
Intent Recognizer
• Creates a template of the SQL
query by performing checks for each SQL
clause.
• Various techniques, such as context
identification, distance metrics, keyword
spotting and grammar rules, are applied to
check for the existence of a particular clause.
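Of the techniques listed, keyword spotting is the simplest to illustrate. The sketch below builds a query template with placeholders for the parts the entity recognizer would fill in. The trigger words in `CLAUSE_KEYWORDS` and the `<...>` placeholder notation are assumptions; context identification, distance metrics and grammar rules are not shown.

```python
# Hypothetical trigger words per clause; the slides do not list them.
CLAUSE_KEYWORDS = {
    "SUM": {"total", "sum", "subtotal"},
    "COUNT": {"count", "many"},
    "ORDER BY": {"sorted", "ordered", "ascending", "descending"},
}


def build_template(tokens, has_data_values):
    """Return an SQL template based on keyword spotting over the tokens."""
    token_set = set(tokens)
    aggregate = next(
        (fn for fn in ("SUM", "COUNT") if token_set & CLAUSE_KEYWORDS[fn]),
        None,
    )
    if aggregate:
        parts = ["SELECT " + aggregate + "(<attribute>)"]
    else:
        parts = ["SELECT <attributes>"]
    parts.append("FROM <relations>")
    # A WHERE clause is needed only when data values were recognized.
    if has_data_values:
        parts.append("WHERE <conditions>")
    if token_set & CLAUSE_KEYWORDS["ORDER BY"]:
        parts.append("ORDER BY <attribute>")
    return " ".join(parts)


print(build_template(["show", "subtotal", "orders", "helmet"], True))
```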
Challenges faced
• Dependence on a specific schema
• Identification of partial or implied data values
• Identification of descriptive values
Go-to solution: machine learning techniques for NL to SQL
Technologies Involved in Machine Learning for NLP to SQL
Feedforward neural networks
Recurrent Neural Networks (RNNs)
• Networks with feedback loops (recurrent edges)
• The output at the current time step depends on the current input as well
as the previous state (via the recurrent edges)
Training RNNs
• Problem: plain RNNs cannot capture long-term dependencies because gradients vanish or explode during backpropagation
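For reference, the recurrence and the source of the gradient problem can be written out as below. This is the standard textbook formulation of a simple (Elman-style) RNN, not notation taken from the slides; the symbols $W_{xh}$, $W_{hh}$, $W_{hy}$, $b_h$ and $b_y$ are assumed names for the weights and biases.

```latex
% State update and output at time step t
h_t = \tanh\left(W_{xh}\, x_t + W_{hh}\, h_{t-1} + b_h\right), \qquad
y_t = W_{hy}\, h_t + b_y

% Gradient of a late state h_T with respect to an early state h_k
% (factors ordered from t = T down to t = k+1)
\frac{\partial h_T}{\partial h_k}
  = \prod_{t=k+1}^{T} \frac{\partial h_t}{\partial h_{t-1}}
  = \prod_{t=k+1}^{T} \operatorname{diag}\left(1 - h_t^{2}\right) W_{hh}
```

The product contains $T-k$ factors, so its norm tends to shrink towards zero or grow without bound as the gap $T-k$ increases, which is why long-range dependencies are hard to learn.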
Technologies Involved in ML for NLP to SQL
Go-to solution: Long Short Term Memory (LSTM) model
• A type of RNN architecture that addresses the vanishing/exploding gradient problem and allows learning of
long-term dependencies
• Has recently risen to prominence with state-of-the-art performance in speech recognition, language modeling, translation
and image captioning
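The commonly used LSTM cell equations are given below for reference. They follow the usual textbook form; the slides do not say which LSTM variant was used, so the symbols here are assumptions ($\sigma$ is the logistic sigmoid, $\odot$ the element-wise product, and $[h_{t-1}, x_t]$ the concatenation of the previous state and the current input).

```latex
f_t = \sigma\left(W_f\,[h_{t-1}, x_t] + b_f\right)          % forget gate
i_t = \sigma\left(W_i\,[h_{t-1}, x_t] + b_i\right)          % input gate
o_t = \sigma\left(W_o\,[h_{t-1}, x_t] + b_o\right)          % output gate
\tilde{c}_t = \tanh\left(W_c\,[h_{t-1}, x_t] + b_c\right)   % candidate cell state
c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t             % cell state update
h_t = o_t \odot \tanh\left(c_t\right)                       % hidden state
```

Because $c_t$ is updated additively, gradients can flow across many time steps without being multiplied by the recurrent weight matrix at every step.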
Data Set Extraction
• Data is extracted from the RDBMS.
• The Apache Commons CSV library is used to extract the dataset in the
form of a CSV file.
• Attributes that contain descriptive values (e.g. Experience,
Description) are also provided as input.
• Three separate components work synchronously to extract
maximum latent information from the dataset, which can either
be used to enrich the natural language or be stored for use during
conversion.
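The extraction step can be sketched as follows. The slides use the Apache Commons CSV library (Java); this sketch swaps in Python's standard `csv` and `sqlite3` modules purely for illustration. The in-memory table, its single sample row and the `DESCRIPTIVE_ATTRIBUTES` list are hypothetical.

```python
import csv
import io
import sqlite3


def export_table_to_csv(conn, table, out):
    """Write every row of `table` to `out` as CSV, with a header row."""
    # The table name is interpolated directly, so pass trusted names only.
    cursor = conn.execute('SELECT * FROM "' + table + '"')
    writer = csv.writer(out)
    writer.writerow([column[0] for column in cursor.description])
    writer.writerows(cursor)


# Hypothetical sample data standing in for the real RDBMS.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t_prds (ProductName TEXT, Color TEXT, Description TEXT)")
conn.execute(
    "INSERT INTO t_prds VALUES (?, ?, ?)",
    ("road tire tube", "black", "sample description text"),
)

buffer = io.StringIO()
export_table_to_csv(conn, "t_prds", buffer)

# Attributes with descriptive values are supplied alongside the CSV file.
DESCRIPTIVE_ATTRIBUTES = ["Description"]

print(buffer.getvalue())
```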
Partial and Implied Values
• Pre-processing techniques
• Embedding Layer
• Long Short Term Memory
• Classification of Inputs
Machine Learning for Implied Data Values
Proposed Model – Implied Data Values
Classification of Inputs
• The input natural language query is tokenized and
split into different sequences.
• Sequences of 1 word (1-gram) up to sequences of n
words (n-gram, where n is determined by the number
of tokens) are considered for prediction.
• Only the longest sequences and their classifications are
kept (i.e., sub-sequences are ignored).
The final, high-confidence classifications given by the
LSTM model can be used in multiple ways, two of
which are outlined below:
• Enrich the natural language query
• Store the data values and attribute names
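The sequence enumeration and the longest-sequence rule can be sketched as below. The trained LSTM is replaced by `stub_predict`, a hypothetical lookup table with made-up confidence scores, and the 0.9 threshold is likewise an assumption. Note that this sketch skips any sequence that overlaps an accepted one, which is slightly stricter than ignoring pure sub-sequences.

```python
def ngrams(tokens):
    """Yield (start, end, phrase) for every contiguous sequence, longest first."""
    n_tokens = len(tokens)
    for size in range(n_tokens, 0, -1):
        for start in range(n_tokens - size + 1):
            end = start + size
            yield start, end, " ".join(tokens[start:end])


def classify_longest(tokens, predict, threshold=0.9):
    """Keep high-confidence classifications of the longest sequences only."""
    taken = set()      # token positions already covered by an accepted sequence
    results = []
    for start, end, phrase in ngrams(tokens):
        span = set(range(start, end))
        if span & taken:
            continue   # overlaps a longer accepted sequence
        label, confidence = predict(phrase)
        if label is not None and confidence >= threshold:
            results.append((phrase, label))
            taken |= span
    return results


# Hypothetical stand-in for the trained LSTM classifier.
KNOWN = {
    "new south wales": ("StateProvince", 0.97),
    "wales": ("CountryRegion", 0.91),
    "australia": ("CountryRegion", 0.95),
}


def stub_predict(phrase):
    return KNOWN.get(phrase, (None, 0.0))


tokens = "get the orders from new south wales australia".split()
print(classify_longest(tokens, stub_predict))
```

The resulting (value, attribute) pairs can then be used either to enrich the query text or be stored for the conversion step.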
Elasticsearch – Descriptive Values
Elasticsearch
Stop Analyzer: discards the stop words.
Ex:
Input: Get the doctors with masters degree
Analyzer: Get doctors masters degree
English Language Analyzer:
reduces each word of the input query to its
root word.
Ex:
Input: Show all products which are red bikes.
Analyzer: Show all product which red bike
Components of Elasticsearch
1. Analyzers
• The extracted CSV file is used to create an index in
Elasticsearch.
• Elasticsearch's Bulk API provides the functions
needed to create and store large amounts of data
in a single request.
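As a rough illustration of the indexing step, the sketch below turns CSV text into the newline-delimited JSON body that the Bulk API accepts: one action line followed by one document line per row, with a trailing newline. The index name `products` and the sample rows are hypothetical, and the HTTP call itself (a POST to the `_bulk` endpoint) is left out.

```python
import csv
import io
import json


def bulk_payload(csv_text, index_name):
    """Build a Bulk API request body that indexes one document per CSV row."""
    lines = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        lines.append(json.dumps({"index": {"_index": index_name}}))  # action
        lines.append(json.dumps(row))                                # document
    # The bulk body must end with a newline.
    return "\n".join(lines) + "\n"


# Hypothetical extract of the CSV file produced earlier.
sample_csv = (
    "ProductName,Description\n"
    "wheel,sample description one\n"
    "frame,sample description two\n"
)
payload = bulk_payload(sample_csv, "products")
print(payload)
```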
Proposed Model – Descriptive Values
Components of Elasticsearch
2. Searching through multiple attributes
Multiple columns can be searched in Elasticsearch
by using the "multi_match" keyword:
{
  "query": {
    "multi_match": {
      "query": "<input query>",
      "fields": [<list of descriptive column names>]
    }
  }
}
3. Generation of suitable fieldname-value pairs in the
WHERE clause
WHERE fieldname1 = value1 AND fieldname2 =
value2 AND ... fieldnameN = valueN
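The two steps above can be sketched in a few lines. The helper names are hypothetical, and how a search hit is mapped to a field-value pair is not described on the slide, so `where_clause` simply takes the pairs as input.

```python
import json


def multi_match_query(user_query, descriptive_columns):
    """Build a multi_match request body over the descriptive columns."""
    return {
        "query": {
            "multi_match": {
                "query": user_query,
                "fields": list(descriptive_columns),
            }
        }
    }


def where_clause(pairs):
    """Join (fieldname, value) pairs into a WHERE clause with AND."""
    def quote(value):
        # Double any single quote so the literal stays valid SQL.
        return "'" + str(value).replace("'", "''") + "'"

    conditions = [field + " = " + quote(value) for field, value in pairs]
    return "WHERE " + " AND ".join(conditions)


body = multi_match_query("mountain wheel for entry-level rider", ["Description"])
print(json.dumps(body, indent=2))
print(where_clause([("Color", "red"), ("Size", "M")]))
```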
HPCC Systems Platform
Go-to solution: a synchronous combination of the hybrid machine learning model,
Elasticsearch, WordNet and the HPCC Systems Platform
Key Factors of HPCC Systems Platform
• Highly integrated system environment
- capabilities from raw data processing to high-performance
queries and data analysis using a common language
• Optimized cluster approach
- provides high performance at a much lower system
cost than other system alternatives
• Stable and reliable processing environment, proven in
production applications for varied organizations over a
15-year period
• Innovative data-centric programming language (ECL)
• High level of fault resilience and capabilities
• Suitable for a wide range of data-intensive applications
Results

Input natural language query: show all unmarried customers who are men
Enriched natural language query: show all single Gender 'male' customers
Output SQL query: SELECT * FROM t_cstmrs WHERE LOWER( MaritalStatus ) = 'single' AND LOWER( Gender ) = 'male'

Input natural language query: Names of customers who have graduated and from germany or france
Enriched natural language query: FullName Names of customers who have Education 'graduate degree' and from CountryRegion 'germany' or CountryRegion 'france'
Output SQL query: SELECT t_cstmrs.FullName FROM t_cstmrs INNER JOIN t_ggrphy ON t_ggrphy.GeographyKey = t_cstmrs.GeographyKey WHERE ( LOWER( t_ggrphy.CountryRegion ) = 'germany' OR LOWER( t_ggrphy.CountryRegion ) = 'france' ) AND ( LOWER( t_cstmrs.Education ) = 'graduate degree' )

Input natural language query: get the price of red or dark helmet
Enriched natural language query: get the price of Color 'red' or Color 'black' ProductSubCategoryName 'helmet'
Output SQL query: SELECT ListPrice , Color FROM t_prdsubcat INNER JOIN t_prds ON t_prdsubcat.ProductSubCategoryKey = t_prds.ProductSubCategoryKey WHERE LOWER( Color ) = 'red' OR LOWER( Color ) = 'black'

Input natural language query: how much does tire tube cost
Enriched natural language query: how much does ProductName 'road tire tube' cost
Output SQL query: SELECT ListPrice , ProductName FROM t_prds WHERE LOWER( ProductName ) = 'road tire tube'

Input natural language query: get the orders from new south wales australia
Enriched natural language query: get the orders from StateProvince 'new south wales' CountryRegion 'australia'
Output SQL query: SELECT t_saldtls.OrderQuantity, t_ggrphy.CountryRegion, t_ t_cstmrs.FullName , t_ggrphy.StateProvince FROM t_ggrphy INNER JOIN t_cstmrs ON t_cstmrs.GeographyKey = t_ggrphy.GeographyKey INNER JOIN t_saldtls ON t_cstmrs.CustomerKey = t_saldtls.CustomerKey WHERE LOWER( t_cstmrs.StateProvince) = 'new south wales' AND LOWER( t_ggrphy.CountryRegion ) = 'australia'

Input natural language query: show subtotal of orders for helmet
Enriched natural language query: show subtotal of orders for ProductSubCategoryName 'helmet'
Output SQL query: SELECT SUM( t_saldtls.SalesOrderint ) FROM t_prds INNER JOIN t_saldtls ON t_prds.ProductKey = t_saldtls.ProductKey WHERE LOWER( t_prds.ProductName ) = 'helmet'
Results – Descriptive values

Input natural language query: Select an item with mountain wheel for entry-level rider.
Output SQL query: SELECT * FROM t_prds WHERE t_prds.Description = 'Replacement mountain wheel for entry-level rider.'

Input natural language query: Name the items which have pioneering frame technology as the HQ steel frame.
Output SQL query: SELECT t_prds.ProductName FROM t_prds WHERE t_prds.Description = 'The same pioneering frame technology is used to give you the highest value as the HQ steel frame.'
Conclusion
• Partial and implied data values in the natural language queries are identified by a trained hybrid
ML model.
• WordNet is also used as a safety net to understand implied data values where the vocabulary of
the input relational database is not expressive.
• Descriptive values are identified with the help of Elasticsearch.
• The accuracy of the system is 91.7% on the IMDb database.