Querying your database in natural language
PyData – Silicon Valley 2014
Daniel F. Moisset – dmoisset@machinalis.com
Data is everywhere
Collecting data is not the problem; what to do with it is
Any operation starts with selecting/filtering data
Search: a classical approach
Used by:
● Google
● Wikipedia
● Lucene/Solr
Performance can be improved:
● Stemming/synonyms
● Sorting data by relevance
Limits of keyword-based approaches
Query Languages
● SQL
● Many NoSQL approaches
● SPARQL
● MQL
Allow complex, accurate queries:
SELECT array_agg(players), player_teams
FROM (
SELECT DISTINCT t1.t1player AS players, t1.player_teams
FROM (
SELECT
p.playerid AS t1id,
concat(p.playerid,':', p.playername, ' ') AS t1player,
array_agg(pl.teamid ORDER BY pl.teamid) AS player_teams
FROM player p
LEFT JOIN plays pl ON p.playerid = pl.playerid
GROUP BY p.playerid, p.playername
) t1
INNER JOIN (
SELECT
p.playerid AS t2id,
array_agg(pl.teamid ORDER BY pl.teamid) AS player_teams
FROM player p
LEFT JOIN plays pl ON p.playerid = pl.playerid
GROUP BY p.playerid, p.playername
) t2 ON t1.player_teams=t2.player_teams AND t1.t1id <> t2.t2id
) innerQuery
GROUP BY player_teams
Natural Language Queries
Getting popular:
● Wolfram Alpha
● Apple Siri
● Google Now
Pros and cons:
● Very accessible, trivial learning curve
● Still weak in coverage: most applications have a list of “sample questions”
Outline of this talk: the Quepy approach
● Overview of our solution
● Simple example
● DSL
● Parser
● Question templates
● Quepy applications
● Benefits
● Limitations
Quepy
● Open source (BSD license): https://github.com/machinalis/quepy
● Status: usable, 2 demos available (DBpedia + Freebase); online demo at http://quepy.machinalis.com/
● Complete documentation: http://quepy.readthedocs.org/en/latest/
● You're welcome to get involved!
Overview of the approach
● Parsing
● Match + intermediate representation
● Query generation & DSL

“What is the airspeed velocity of an unladen swallow?”

What|what|WP is|be|VBZ the|the|DT
airspeed|airspeed|NN velocity|velocity|NN
of|of|IN an|an|DT unladen|unladen|JJ
swallow|swallow|NN

SELECT DISTINCT ?x1 WHERE {
  ?x0 kingdom "Animal".
  ?x0 name "unladen swallow".
  ?x0 airspeed ?x1.
}
Parsing
● Not done at the character level but at the word level
● Word = Token + Lemma + POS
  “is” → is|be|VBZ (VBZ means “verb, 3rd person singular, present tense”)
  “swallows” → swallows|swallow|NNS (NNS means “noun, plural”)
● NLTK is smart enough to know that “swallows” here means the bird (noun) and not the action (verb)
● Question rule = “regular expressions” over words:

    Token("what") + Lemma("be") + Question(Pos("DT")) + Plus(Pos("NN"))

The word “what”, followed by any form of the verb “to be”, optionally followed by a determiner (articles, “all”, “every”), followed by one or more nouns
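The idea of “regular expressions over words” can be sketched in plain Python. The following is a hypothetical mini-matcher written for this document to illustrate the mechanism; Quepy's actual implementation differs (it overloads `+` and `|` on pattern objects):

```python
from collections import namedtuple

# A parsed word, as in the slides: surface token, lemma, POS tag.
Word = namedtuple("Word", "token lemma pos")

# Each primitive returns a matcher: given the word list and a start
# position, it returns the positions where matching can continue.
def Token(t):
    return lambda ws, i: [i + 1] if i < len(ws) and ws[i].token == t else []

def Lemma(l):
    return lambda ws, i: [i + 1] if i < len(ws) and ws[i].lemma == l else []

def Pos(p):
    return lambda ws, i: [i + 1] if i < len(ws) and ws[i].pos == p else []

def Question(m):
    # Zero or one occurrence of m.
    return lambda ws, i: [i] + m(ws, i)

def Plus(m):
    # One or more occurrences of m.
    def match(ws, i):
        out = []
        for j in m(ws, i):
            out.append(j)
            out.extend(match(ws, j))
        return out
    return match

def Seq(*ms):
    # Sequencing; stands in for Quepy's overloaded "+" operator.
    def match(ws, i):
        positions = [i]
        for m in ms:
            positions = [k for j in positions for k in m(ws, j)]
        return positions
    return match

rule = Seq(Lemma("what"), Lemma("be"), Question(Pos("DT")), Plus(Pos("NN")))

words = [Word("What", "what", "WP"), Word("is", "be", "VBZ"),
         Word("the", "the", "DT"),
         Word("airspeed", "airspeed", "NN"),
         Word("velocity", "velocity", "NN")]

# The rule accepts the question iff some match consumes every word.
print(len(words) in rule(words, 0))  # True
```

Matching happens on the tagged triples, never on raw characters, which is why `Lemma("be")` catches “is”, “was”, and “are” alike.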
Intermediate representation
● Graph-like, with some known values and some holes (x0, x1, …). Always has a “root” (house-shaped in the picture)
● Similar to knowledge databases
● Easy to build from Python code
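As a rough picture of such a graph with holes, here is a plain-Python sketch using hypothetical structures (not Quepy's internal classes):

```python
# Holes are '?'-prefixed placeholder names; known values are plain strings.
x0, x1 = "?x0", "?x1"

# The unladen-swallow question as a small graph of
# (subject, relation, object) triples, with x1 as the root:
# the value the question asks for.
triples = [
    (x0, "kingdom", "Animal"),
    (x0, "name", "unladen swallow"),
    (x0, "airspeed", x1),
]
root = x1
```

The known values pin the graph to entities in the database; the query generator's job is then to ask the database to fill in the holes.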
Code generator
● Built-in for MQL
● Built-in for SPARQL
● Possible approaches for SQL and other languages
● DSL-guided
● Outputs the query string (Quepy does not connect to a database)
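A minimal sketch of that last step, turning triples into a SPARQL string. `to_sparql` is a hypothetical helper written for this document, not Quepy's actual generator:

```python
def to_sparql(triples, root):
    """Render (subject, relation, object) triples as a SPARQL SELECT string.

    Terms starting with '?' are variables; other objects become
    quoted literals.
    """
    def term(t):
        return t if t.startswith("?") else '"%s"' % t
    body = "\n".join("  %s %s %s." % (s, r, term(o)) for s, r, o in triples)
    return "SELECT DISTINCT %s WHERE {\n%s\n}" % (root, body)

triples = [
    ("?x0", "kingdom", "Animal"),
    ("?x0", "name", "unladen swallow"),
    ("?x0", "airspeed", "?x1"),
]
print(to_sparql(triples, "?x1"))
```

This reproduces the query shown in the overview slide; since the output is just a string, running it against an endpoint is left to the caller.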
Code examples
DSL

class DefinitionOf(FixedRelation):
    relation = "/common/topic/description"
    reverse = True

class IsMovie(FixedType):
    fixedtype = "/film/film"

class IsPerformance(FixedType):
    fixedtype = "/film/performance"

class PerformanceOfActor(FixedRelation):
    relation = "/film/performance/actor"

class HasPerformance(FixedRelation):
    relation = "/film/film/starring"

class NameOf(FixedRelation):
    relation = "/type/object/name"
    reverse = True
DSL

Given a thing x0, its definition:

    DefinitionOf(x0)

Given an actor x2, the names of movies where x2 acts:

    performances = IsPerformance() + PerformanceOfActor(x2)
    movies = IsMovie() + HasPerformance(performances)
    x3 = NameOf(movies)
Parsing: Particles and templates

class WhatIs(QuestionTemplate):
    regex = (Lemma("what") + Lemma("be") +
             Question(Pos("DT")) + Thing() + Question(Pos(".")))

    def interpret(self, match):
        label = DefinitionOf(match.thing)
        return label

class Thing(Particle):
    regex = Question(Pos("JJ")) + Plus(Pos("NN") | Pos("NNP") | Pos("NNS"))

    def interpret(self, match):
        return HasKeyword(match.words.tokens)
Parsing: “movies starring <actor>”
● More DSL:

class IsPerson(FixedType):
    fixedtype = "/people/person"
    fixedtyperelation = "/type/object/type"

class IsActor(FixedType):
    fixedtype = "Actor"
    fixedtyperelation = "/people/person/profession"
Parsing: A more complex particle
● And then a new Particle:

class Actor(Particle):
    regex = Plus(Pos("NN") | Pos("NNS") | Pos("NNP") | Pos("NNPS"))

    def interpret(self, match):
        name = match.words.tokens
        return IsPerson() + IsActor() + HasKeyword(name)
Parsing: A more complex template

class ActedOnQuestion(QuestionTemplate):
    acted_on = (Lemma("appear") | Lemma("act") | Lemma("star"))
    movie = (Lemma("movie") | Lemma("movies") | Lemma("film"))
    regex = ((Question(Lemma("list")) + movie + Lemma("with") + Actor()) |
             (Question(Pos("IN")) + (Lemma("what") | Lemma("which")) +
              movie + Lemma("do") + Actor() + acted_on + Question(Pos("."))) |
             (Question(Lemma("list")) + movie + Lemma("star") + Actor()))

Matches, for example:
“list movies with Harrison Ford”
“list films starring Harrison Ford”
“In which film does Harrison Ford appear?”
Parsing: A more complex template

class ActedOnQuestion(QuestionTemplate):
    # ...

    def interpret(self, match):
        performance = IsPerformance() + PerformanceOfActor(match.actor)
        movie = IsMovie() + HasPerformance(performance)
        movie_name = NameOf(movie)
        return movie_name
Apps: gluing it all together
● You build a Python package with quepy startapp myapp
● There you add the DSL and question templates
● Then configure it by editing myapp/settings.py (output query language, data encoding)

You can then use it with:

    app = quepy.install("myapp")
    question = "What is love?"
    target, query, metadata = app.get_query(question)
    db.execute(query)
The good things
● Effort to add question templates is small (minutes to hours), and the benefit is linear with respect to effort
● Good for industry applications
● Low specialization required to extend
● Human work is very parallelizable
● Easy to get many people to work on questions
● Better for domain-specific databases
Limitations
● Better for domain-specific databases
● It won't scale to massive numbers of question templates (they start to overlap/contradict each other)
● Hard to add computation (compare: Wolfram Alpha) or deduction (though deduction can be added in the database)
● Not very fast (an implementation issue, not a design issue)
● Requires a structured database
Future directions
● Testing this with other databases
● Improving performance
● Collecting uncovered questions and adding machine learning to learn new patterns
Q & A
You can also reach me at:
dmoisset@machinalis.com
Twitter: @dmoisset
http://machinalis.com/
Thanks!


Editor's notes

  1. Hello everyone, my name is Daniel Moisset. I work at Machinalis, a company based in Argentina which builds data processing solutions for other companies. I&amp;apos;m not a native English speaker, so please just wave a bit if I&amp;apos;m not speaking clearly or just not making any sense. The topic I want to introduce today is about the use of natural language to query databases and a tool that implements a possible approach to solve this Issue Let me start by trying to show you why this problem is relevant. .
  2. The problem I&amp;apos;ll discuss today is not about how to get your data. If you&amp;apos;re here, chances are you have more data that you can handle. The big problem today is to put to work all the data that comes from different sources and is piling up in some database. And of course, the first step at least of that problem is getting the data you want, that is, making &amp;quot;queries&amp;quot;. Of course you&amp;apos;ll want to do more than queries later, but selecting the information you want is typically the first step
  3. A typical approach for large bodies of text-based data is the keyword-based approach. The basic idea is that the user provides a list of keywords, and the items that contain those keywords are retrieved. There are a lot of well-known tricks to improve this, like detecting the relevance of documents with respect to user keywords, or preprocessing the input and the index so documents can be found through a similar word rather than an exact keyword match. This approach has proven very successful in many different contexts, with Google as a leading example of a large database that probably all of us query frequently using keyword-based queries, and many tools exist to build search bars into your software. It works so well that you might wonder if there's any significant improvement to make by trying a different approach.
  4. Keyword-based lookups are really good when you know what you're looking for: typically the name of the entity you're interested in, or some entity uniquely related to it. It's very simple to get information about Albert Einstein, or to figure out who proposed the Theory of Relativity even if I don't remember Albert Einstein's name.
  5. However, it's not easy to Google "What's the name of that place in California with a lot of movie studios?" "The one with the big white sign on the hill?". None of the keywords I used to formulate that question are very good, and other similar formulations will not help us. It's not a problem of having the data (even if I have a database containing records about movie studios and their locations), but a problem of how you interact with the database. Another problem with keyword-based lookups is that they depend heavily on data which is mainly textual. That works fine for the web, but if I have a database with flight schedules for many airlines, a keyword-based search provides a very limited interface for making queries. Even with a database with a lot of text, like the schedule for this conference, it's not easy to answer questions like "Which PyData speakers are affiliated with the sponsors?" (without doing it manually).
  6. The solution we have for this problem, which may be summarized as "finding data by the stuff related to it", is query languages. We have many of those, depending on how we want to structure our data. All of these allow us to write very accurate and very complicated queries. And by "us" I mean the people in this room, who are developers and data scientists. That is the weakness of this approach: it's not an interface that you can provide to end users. There's a lot of data that needs to be made available to people who can't or won't learn a complex language to access the information. Not because they're stupid, but because their field of expertise is another one.
  7. That leaves us with a need to query structured, possibly non-textual, related information in a way that does not require much expertise from the person making the queries. And a straightforward way to meet that need is allowing the data to be queried in the language the user already knows. Which brings us to the motivation for this talk. Natural language is becoming a popular way to make queries and/or enter commands. It provides a very user-friendly experience, even when most current tools are somewhat limited in the coverage they can provide. By "coverage" here I mean how many of the relevant questions are actually understood by the computer. Currently, successful applications like the ones I show here have a guide for the user describing which forms of questions are "valid".
  8. After this introduction and the motivation for the problem, let me outline where I'm trying to get to during this talk: some very smart people who work with me studied different approaches to a solution and came up with a tool called Quepy which implements one of them. Of course it's not the only possible approach, but it has several nice properties that are valuable to us in an industrial context. I'll describe the approach in general and give a quick overview of how to code a simple Quepy app. Then I'll discuss what we most like about Quepy, and the limits to the scope of the problem it solves.
  9. Just in case you're eager to see the code instead of listening to me, all of it is available online, so I'll leave this slide up for 10 seconds so you can get a picture, and then move on.
  10. At its core, the Quepy approach is not unlike a compiler. The input is a string with a question, which is sent through a parser that builds a data structure called an "intermediate representation". That representation is then converted to a database query, which is the output from Quepy. The parsing is guided by rules provided by the application writer, which describe what kinds of questions are valid.
  11. The conversion is guided by some declarative information about the structure of the database that the application writer must define. We call this definition the "DSL", for Domain Specific Language. As you might have noted from this description, what we built is not a universal solution that you can throw at your database, but something that requires programming customization, both in how it interacts with the user and in how it interacts with your database.
  12. Let's take a deeper look at the parser. The first step of the parser provided by Quepy is splitting the text into parts, a process also known as tokenization. Once this is done you have a sequence of word objects, each containing information on one word: the token, which is the original word as it appears in the text; the lemma, which is the root word for the token (the base verb "speak" for a word like "speaking"); and a part-of-speech tag, which indicates whether the word is a noun, an adjective, a verb, etc. This list of words is then matched against a set of question templates. Each question template defines a pattern, which is something that looks like a regular expression, where patterns can describe property matches over the token, lemma, and/or part of speech.
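The word objects and property matching described here can be sketched in plain Python. The `Word` class and `matches` helper below are illustrative stand-ins, not Quepy's actual classes, and the lemmas and POS tags are hand-coded rather than produced by a real tagger:

```python
from dataclasses import dataclass

@dataclass
class Word:
    token: str  # surface form, as it appears in the question
    lemma: str  # root form (e.g. "speak" for "speaking")
    pos: str    # part-of-speech tag (WP = wh-pronoun, VBP = verb, NNS = plural noun)

def matches(words, pattern):
    """Match a word sequence against per-word property constraints,
    the way a question template constrains token, lemma and/or POS."""
    return len(words) == len(pattern) and all(
        all(getattr(w, key) == value for key, value in constraints.items())
        for w, constraints in zip(words, pattern))

words = [Word("What", "what", "WP"),
         Word("are", "be", "VBP"),
         Word("bananas", "banana", "NNS")]

# A pattern can mix lemma and POS constraints, as Quepy templates do:
pattern = [{"lemma": "what"}, {"lemma": "be"}, {"pos": "NNS"}]
print(matches(words, pattern))  # True
```

Matching on lemmas rather than tokens is what lets one pattern cover "What is", "What are", "What was", and so on.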
  13. Let's assume a valid match on the question template. In that case, the question template provides a little piece of code that builds the intermediate representation. The intermediate representation of a query is a small graph, where vertices are entities in the database, edges are relations between entities, and both vertices and edges can be labeled or left open. There's one special vertex called the "head", which is always open, and indicates what the value of the "answer" is. This is an abstract, backend-independent representation of the query, although it is intended mainly for use with knowledge databases, which usually have this graph structure and allow finding matching subgraphs. Quepy provides a way to build these trees from Python code in a way that's much more natural than just describing the structure top-down. Trees are built by composing tree parts that have some meaningful semantics in your domain. Those components, along with the mapping of those semantics to your database schema, form what we call the DSL.
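A toy version of this composition, under the assumption that the IR is a list of (subject, relation, object) edges with one open head variable. The names `has_keyword` and `fixed_relation` echo the DSL concepts mentioned in this talk but are illustrative, not Quepy's internals:

```python
import itertools

_ids = itertools.count(1)

def fresh():
    """Allocate a new open vertex (a query variable)."""
    return "?x%d" % next(_ids)

def has_keyword(name):
    """'The object with this primary key': returns (head, edges)."""
    v = fresh()
    return (v, [(v, "key", name)])

def fixed_relation(relation, reverse=False):
    """Combinator: follow `relation` from an expression's head to a new head.
    reverse=True fixes the left side of the relation, as DefinitionOf does."""
    def combine(expr):
        head, edges = expr
        new = fresh()
        edge = (head, relation, new) if reverse else (new, relation, head)
        return (new, edges + [edge])
    return combine

definition_of = fixed_relation("/common/topic/description", reverse=True)
head, edges = definition_of(has_keyword("banana"))
print(head)   # ?x2
print(edges)  # [('?x1', 'key', 'banana'), ('?x1', '/common/topic/description', '?x2')]
```

The point of the combinator style is that no query string is built yet; composing expressions only grows the graph, exactly as the talk says.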
  14. From the internal representation tree and the DSL information it is possible to automatically build a query string that can be sent to your database. At this time, we have built query generators for SPARQL, which is the de facto standard for knowledge databases, and MQL, the Metaweb Query Language (used by Google's Freebase). It might be possible to build custom generators for other languages, or use some kind of adapter (I know there are SPARQL endpoints that you can put in front of a SQL database, for example). The DSL information needed here is somewhat schema-specific but is very simple to define, in a declarative way.
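A rough sketch of that generation step, serializing a triple-based IR into a SPARQL string. This is a simplification of what a real generator emits; prefixes, URIs and typed literals would need proper handling:

```python
def to_sparql(head, triples):
    """Render (subject, relation, object) triples as a SPARQL SELECT query."""
    lines = []
    for s, r, o in triples:
        # quote constants; leave variables (starting with '?') bare
        obj = o if o.startswith("?") else '"%s"' % o
        lines.append("  %s %s %s ." % (s, r, obj))
    return "SELECT %s WHERE {\n%s\n}" % (head, "\n".join(lines))

query = to_sparql("?x2", [("?x1", "key", "banana"),
                          ("?x1", "/common/topic/description", "?x2")])
print(query)
```

Because the IR is backend-independent, swapping this function for an MQL serializer is what lets the same question templates target Freebase instead.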
  15. Let me show you some code examples, making queries on Freebase with a couple of sample question templates. We want to answer "What are bananas?" and "In which movies did Harrison Ford appear?". We will be doing this on Freebase, but don't worry, there's no need for you to know the Freebase schema to understand this talk. We'll cover the information we need as we go. I'm going to show you some complete code, but this is not a tutorial, so I'm not going to go line by line explaining what everything does. The code I'm showing has the purpose of displaying what the different parts are that you'll need to put together, and how much (or how little) work is needed to build each.
  16. To build this example, the easiest way is to start with the DSL. We'll start by defining some simple concepts that look naturally related to the queries we want to make. Let's take a look at the `DefinitionOf` class. What we're saying here is how to get the definition of something. In Freebase, entities are related to their definitions by the "/common/topic/description" attribute (this is why we say that this is a `FixedRelation`; in Freebase, attributes are also represented as relations). The "reverse = True" indicates that we actually fix the left side of the relation to a known value, and want to learn about the right side. Without it, this would be the opposite query: given a definition, find the object.
  17. This is all the DSL we need to answer "What are bananas?". The other query we wanted to make is rather more complex. Our database has movies, where each movie can have many related entities called "performances". Each performance relates to an actor, a character, etc. So we define some basic relations to identify the type of some entities using `FixedType`. `IsMovie` describes entities having the Freebase type "/film/film", and `IsPerformance` helps us recognize these "performance" objects. To link both types of entities, `PerformanceOfActor` queries which performances have a given actor, and `HasPerformance` allows us to query which movie has a given performance. Finally, in Freebase movies are complex objects, but when we show a result to the user we want to show a movie name, so `NameOf` gets the "/type/object/name" attribute of a movie, which is the movie title.
  18. The intermediate representation of queries is built from instances of these objects. For example, given an actor "a", this expression gives the movies with "a" (slide). Note that the operations at the bottom are abstract operations between queries which build a larger query; none of this touches the database, it just builds a tree.
  19. Let's now see how to code the parser for the queries mentioned before. For each kind of question we can build a "question template". The first thing that a question template specifies is how to match the questions. The matching has to be flexible enough to capture variants of the question like "what is X", "what are X", "what is an X", "what is X?", which you can see we write in the regex here: we have a "what"-like word, followed by some form of the verb "to be", optionally followed by a "determiner" (a word like "a", "an", "the"), followed by a thing, which is what we want to look up, followed by a question mark. Note that I said "a thing" without being too explicit about what that means. Quepy allows you to define "particles", which are pieces of the question that you want to capture and that follow a particular pattern.
  20. Note that at the bottom I have defined what a Thing is; the definition consists of a regular expression plus an intermediate representation. In this case, a thing is an optional adjective followed by one or more nouns. The semantics of a thing are given by the interpret method, where HasKeyword is a Quepy builtin with essentially the semantics of "the object with this primary key". It's shown in the slides as a dashed line. Our question template regex refers to Thing(), so in its interpret method it will have access to the already built graph for the matched thing. So if we ask "What is a banana?", you'll end up with a valid match that builds the graph on the right, which corresponds to the appropriate query.
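The shape of that template can be approximated with a stdlib regular expression. Quepy actually matches over lemma/POS sequences rather than raw strings, and its interpret method returns a DSL expression rather than a dict, so treat this only as an analogy:

```python
import re

WHAT_IS = re.compile(
    r"^what\s+(?:is|are)\s+"   # a "what"-like word plus a form of "to be"
    r"(?:(?:a|an|the)\s+)?"    # optional determiner
    r"(?P<thing>[\w ]+?)"      # the Thing particle: what we want to look up
    r"\s*\??$",                # optional question mark
    re.IGNORECASE)

def interpret(question):
    """HasKeyword-like semantics: look the matched thing up by its key."""
    m = WHAT_IS.match(question)
    if m is None:
        return None
    return {"keyword": m.group("thing"), "relation": "/common/topic/description"}

print(interpret("What is a banana?"))
```

Note how one pattern already covers "what is X", "what are X", "what is an X", and the variants with or without the question mark.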
  21. Let's work on the more complex example. The first thing we'll require is some additional DSL to write the "Actor" particle. In Freebase there's no actor type, but there's a "person" type and an "actor" profession. That allows us to define "IsPerson" (objects with the person type) and "IsActor" (objects with the actor profession).
  22. This allows us to define the Actor particle, which matches a sequence of nouns and represents an object that is a person, works as an actor, and has as its identifier the name in the match.
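A toy stand-in for that particle: it uses a capitalized-word run instead of Quepy's noun-sequence match, and generic relation names (`type`, `profession`) in place of the real Freebase properties, so every name here is illustrative:

```python
import re

# A run of capitalized words, a crude proxy for a proper-noun sequence
ACTOR = re.compile(r"(?:[A-Z][a-z]+ ?)+")

def interpret_actor(question):
    """Return IR edges for the first name-like run in the question:
    a person, working as an actor, identified by the matched name."""
    m = ACTOR.search(question)
    if m is None:
        return None
    name = m.group(0).strip()
    return [("?a", "key", name),
            ("?a", "type", "person"),
            ("?a", "profession", "actor")]

print(interpret_actor("list movies with Harrison Ford"))
```

The three edges together encode exactly the constraints the slide describes: person type, actor profession, and the matched name as identifier.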
  23. The regex for this question is more complex because we allow several different forms like the ones shown at the bottom. We allow several synonymous verbs to be used, like "star" vs "act" vs "appear". We also allow synonyms like "film" and "movie". Note that it's clearer to write this by defining intermediate regular expressions, but no Particle definition is needed if you don't want to capture the word used. There are possibly more ways to ask this question, but once you figure those out it's pretty easy to add them to the pattern. The pattern you see here is a simplified version of the pattern you'll find in the demo in the GitHub repo; I simplified it to make it shorter to read.
  24. Once you've captured the actor, you just need to define, using the DSL, how to answer the query. Note that the definition here is very readable: we find performance objects referring to the matched actor, then we find movies with those performances, and then we find the names of those movies. Again, I described this sequentially, but you're actually describing declaratively how to build a query.
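Read as data, the chain just described might look like the triples below. The relation names are hypothetical placeholders; in Quepy you compose DSL objects (PerformanceOfActor, HasPerformance, NameOf) rather than writing triples by hand:

```python
def movies_of_actor(actor_var):
    """Performances of the actor -> movies having those performances -> their names."""
    return [
        ("?perf", "performance_actor", actor_var),  # PerformanceOfActor
        ("?movie", "has_performance", "?perf"),     # HasPerformance
        ("?movie", "name", "?title"),               # NameOf: ?title is the answer head
    ]

print(movies_of_actor("?a"))
```

Each line adds one edge and moves the head one hop, which is why the declarative definition reads so much like the sequential description above.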
  25. Quepy also provides some tools to help you with the boilerplate, which are not very interesting to describe, but I just wanted you to know that they are there. There's the concept of a Quepy app, which is a Python module where you fill out the DSL, question templates, settings like whether you want SPARQL or MQL, etc. Once you have that, you can import that Python module with quepy.install and get the query for a natural language question, ready to send to your database.
  26. As you have seen, the approach we've used for the problem is very simple, but it has some good properties I'd like to highlight. The first one, which is very important for us as a company that needs to build products based on this tool, is that you can add effort incrementally and get results that benefit the application, so it's very low risk. This is different from machine learning or statistical approaches, where you can use a lot of project time building a model and you might end up hitting gold, or you might end up with something that adds zero visible results to a product. So, as much as we love machine learning where I work, we refrained from using it, getting something that's not state-of-the-art in terms of coverage, but is a very safe approach, which is great value when interacting with customers.
  27. Another good part about this is that extending or improving it requires work that can be done by a developer who doesn't need a strong linguistic specialization. So it's easy to get a large team working on improving an application. And many people can work at the same time, because question templates are really modular and not an opaque construct like machine learning models. This approach works well on domain-specific databases, where there's a limited amount of relationships relevant within the data. For very general databases like Freebase and DBpedia, if you want to answer general questions, you will find that users start making up questions that fall outside your question templates.
  28. And that's also one of the weaknesses of this. If you have a general database, you'll have an explosion in the amount of relevant queries and templates, which starts to produce problems between contradicting rules. Note that the limit here is not the amount of entities in your dataset, but the amount of relationships between them. The way this idea works also makes it a bit hard to integrate computation or deduction. The latter can be partly solved by using knowledge databases that have some deduction built in and apply it when they get a query, so it's something that you can work around.
  29. Something that's a limitation of the implementation, but could be improved, is the performance of the conversion. What we have is something that works for us in contexts where we don't have many queries in a short time, but it would need some improvements if you want to provide a service available to a wide public. The last point that can be a limitation is the need for a structured database, which is something one doesn't always have access to. We actually built Quepy as a component of a larger project, but we're also working on the other side of this problem with a tool called iepy.
  30. So that's all I have. I'll take a few questions, and of course you can get in touch with me later today or online for more information about this and other related work. Thanks for listening, and thanks to the people organizing this great conference.