Extracting Relevant Questions to an RDF Dataset Using Formal Concept Analysis
1. Extracting Relevant Questions to an RDF Dataset Using Formal Concept Analysis Mathieu d’Aquin and Enrico Motta Knowledge Media Institute The Open University, Milton Keynes, UK
10. Summarizing an RDF dataset with questions
We would like to give an entry point to a dataset by showing the questions it is good at answering, in a way that can be navigated.
Example: "Who are the people Tom knows?" over Tom Heath's FOAF profile.
11. A question
A list of characteristics of objects (clauses), based on the relationships between objects:
- things that are people, i.e. instances of <Person>
- related to <tom> through the relation <knows>
For which the answer is a set of objects: all the objects that satisfy the clauses of the question.
12. Formal concept analysis
Lattice of concepts: sets of objects (extension) with common properties (intension).
Formal context: objects with binary attributes.
Example from: http://en.wikipedia.org/wiki/Formal_concept_analysis
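As an illustration of the FCA machinery the approach builds on, here is a minimal sketch (not the authors' implementation) that enumerates the formal concepts of a tiny binary context, using a fragment of the Wikipedia example cited above. It closes every subset of objects, which is only viable for toy contexts.

```python
from itertools import combinations

# Toy formal context: each object is mapped to the set of binary attributes it has.
context = {
    "1": {"odd", "square"},
    "2": {"even", "prime"},
    "3": {"odd", "prime"},
    "4": {"even", "composite", "square"},
}

all_attributes = set().union(*context.values())

def intent(objects):
    """Attributes shared by every object in the given set (the ' operator on objects)."""
    return set.intersection(*(context[o] for o in objects)) if objects else set(all_attributes)

def extent(attributes):
    """Objects that have every attribute in the given set (the ' operator on attributes)."""
    return {o for o, attrs in context.items() if attributes <= attrs}

# A formal concept is a pair (extent, intent) closed under both operators:
# closing every subset of objects enumerates all concepts of the lattice.
concepts = set()
objs = list(context)
for r in range(len(objs) + 1):
    for subset in combinations(objs, r):
        b = intent(set(subset))   # attributes common to the subset
        a = extent(b)             # all objects that share those attributes
        concepts.add((frozenset(a), frozenset(b)))

for a, b in sorted(concepts, key=lambda c: (len(c[0]), sorted(c[1]))):
    print(sorted(a), "<->", sorted(b))
```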
13. RDF instances as individuals in a formal context
Represent the relations of objects as binary attributes:
RDF: tom a Person. tom knows enrico. jeff knows tom.
FCA: tom: {Class:-Person, knows:-enrico, jeff-:knows}
Include implicit information based on the ontology:
tom: {Class:-Person, Class:-Agent, Class:-Thing, knows:-enrico, knows:-Person, knows:-Agent, knows:-Thing, jeff-:knows, Person-:knows, Agent-:knows, Thing-:knows}
16. A concept in the lattice is a question
Intension = clauses of the question; extension = answers: all the objects of the extension satisfy the clauses of the question.
Different areas of the lattice focus on different topics (e.g. "What are tom's current projects?").
Questions are organized in a hierarchy: "What are the people?" generalizes "What are the people that tom knows?"
Example: {Class:-Person, tom-:knows} reads as "What are the (Person) that (tom knows)?"
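A minimal sketch of reading a lattice concept as a question: the attributes of the intent become the question's clauses, and the extent is its answer set. It assumes the Class:-X / rel:-obj / subj-:rel attribute naming used above; the wording produced is purely illustrative.

```python
def as_question(intent_):
    """Render an intent such as {'Class:-Person', 'tom-:knows'} as a question string."""
    types, conditions = [], []
    for attr in sorted(intent_):
        if attr.startswith("Class:-"):
            types.append(attr[len("Class:-"):])       # e.g. Person
        elif "-:" in attr:
            subj, rel = attr.split("-:")
            conditions.append(f"{subj} {rel}")        # incoming edge, e.g. "tom knows"
        elif ":-" in attr:
            rel, obj = attr.split(":-")
            conditions.append(f"{rel} {obj}")         # outgoing edge, e.g. "knows enrico"
    what = " and ".join(types) or "things"
    cond = " and ".join(conditions)
    return f"What are the ({what}) that ({cond})?" if cond else f"What are the ({what})?"

print(as_question({"Class:-Person", "tom-:knows"}))
# -> What are the (Person) that (tom knows)?
```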
17. But…
The RDF-to-formal-context translation can generate a lot of attributes, and so a lot of questions, ranging from the uninterestingly general ("What are the Things?") to ones that might be interesting only in very specific cases ("What are the Indian restaurants located in San Diego that have been rated OK and are called 'Chez Bob'?").
We need to extract a list of questions to serve as an entry point.
18. How to measure the interestingness of a question: metrics
Inspired by ontology summarization:
- Coverage: if providing a list of questions, the questions should cover the entire lattice (i.e., at least one question per branch)
- Level: questions that are too general or too specific are not useful
- Density: the number of clauses has an impact (avoid questions that are too complex as well as too simple)
Inspired by FCA:
- Support: the cardinality of the extent, i.e. the number of answers
- Intensional stability: how much a concept depends on particular elements of its extension
- Extensional stability: how much a concept depends on particular elements of its intension
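A minimal sketch of two of the FCA-inspired measures, support and intensional stability, using the same formal-context dictionary shape as the earlier sketches; intensional stability here is the standard FCA stability index (the fraction of subsets of the extent whose common attributes equal the intent), and extensional stability is the dual computation over subsets of the intent. The enumeration is exponential in the extent size, so it is only suitable for small examples.

```python
from itertools import combinations

def support(extent_):
    """Number of answers to the question, i.e. the cardinality of the extent."""
    return len(extent_)

def intensional_stability(extent_, intent_, context):
    """Fraction of subsets of the extent whose common attributes equal the intent."""
    all_attrs = set().union(*context.values())
    objs = list(extent_)
    hits = 0
    for r in range(len(objs) + 1):
        for subset in combinations(objs, r):
            common = set.intersection(*(context[o] for o in subset)) if subset else set(all_attrs)
            if common == set(intent_):
                hits += 1
    return hits / 2 ** len(objs)
```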
19. Experiment: finding the relevant metrics
4 datasets in different domains; 12 evaluators providing questions of interest for these datasets.
Obtained 44 questions, out of which 27 are valid (no overlap).
Some are too complicated for our model (they include disjunction, negation, or aggregation functions): "What is the highest point in Florida?"
A large proportion do not comply with the initial instructions (questions should be self-contained and answered by a list of objects): "How high is mountain x?", "What are the restaurant in a given city?"
21. Evaluation
Algorithm to generate a set of questions from the lattice of an RDF dataset that (a) cover the entire lattice and (b) are believed to be interesting according to a given measure.
Datasets from data.open.ac.uk: 614 course descriptions and 1,706 video podcasts.
Metrics used: random, closeness to the middle level, density close to 2, support, extensional stability, and Aggregated = 1/3 level + 1/3 density + 1/3 stability.
6 users scored the resulting sets of questions for interestingness (6 metrics on 2 datasets: 12 sets in total).
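A minimal sketch of the aggregated measure named above (Aggregated = 1/3 level + 1/3 density + 1/3 stability). The normalisations of the individual components are assumptions made here for illustration, not the exact ones used in the experiment: level is scored by closeness to the middle level of the lattice, density by closeness to 2 clauses, and stability can be the intensional stability from the earlier sketch.

```python
def level_score(level, max_level):
    """1.0 at the middle level of the lattice, decreasing towards top and bottom (assumed normalisation)."""
    middle = max_level / 2
    return 1.0 - abs(level - middle) / middle if middle else 0.0

def density_score(n_clauses, target=2):
    """1.0 when the question has the target number of clauses (assumed normalisation)."""
    return 1.0 / (1.0 + abs(n_clauses - target))

def aggregated(level, max_level, n_clauses, stability):
    """Equal-weight combination of the three components, as on the slide."""
    return (level_score(level, max_level) + density_score(n_clauses) + stability) / 3.0

print(aggregated(level=3, max_level=6, n_clauses=2, stability=0.75))  # -> 0.9166...
```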
27. Conclusion
The technique presented provides both a summary and an exploration mechanism over RDF data, using the underlying ontology and formal concept analysis.
It provides an interface for documenting the dataset by examples rather than by specification, and it favors serendipity in the exploration of the dataset, without the need for prior, specialized knowledge.
The current interface (beta) is available in an online demo.
We need to improve the question generation and navigation mechanisms.
Ongoing experiment: including information gathered through links to external datasets, to generate unanticipated questions.
Use cases in research projects in the Arts and Humanities.
28. Thank you! More info
Demo: http://lucero-project.info/lb/2011/06/what-to-ask-linked-data/
data.open.ac.uk (for some of the datasets used)
@mdaquin – m.daquin@open.ac.uk