PhD Viva - Disambiguating Identity Web References using Social Data

Disambiguating Identity Web References using Social Data
  Matthew Rowe
  Organisations, Information and Knowledge Group
  Department of Computer Science
  University of Sheffield
Outline
  Problem Setting
  Research Questions
  Claims of the Thesis
  State of the Art
  Requirements for Disambiguation and Seed Data
  Disambiguating Identity Web References
  Leveraging Seed Data from the Social Web
  Generating Metadata Models
  Disambiguation Techniques
  Evaluation
  Conclusions
  Dissemination and Impact
Personal Information on the Web
  Personal information on the Web is disseminated:
    Voluntarily
    Involuntarily
  Increase in personal information:
    Identity Theft
    Lateral Surveillance
  Web users must discover their identity web references
  2 stage process:
    Finding
    Disambiguating
  Disambiguation = reduction of web reference ambiguity
  My thesis addresses disambiguation
Ambiguity!
Matthew Rowe: Composer
Matthew Rowe: Cyclist
Matthew Rowe: Gardener
Matthew Rowe: Song Writer
Matthew Rowe: PhD Student
Problem Setting
  Performing disambiguation manually:
    Time consuming
    Laborious
    Handle masses of information
    Repeated often
    The Web keeps changing
  Solution = automated techniques
    Alleviate the need for humans
    Need background knowledge
    Who am I searching for?
    What makes them unique?
Research Questions
  How can identity web references be disambiguated automatically?
  Alleviate human processing:
    Can automated techniques replace humans?
  Supervision:
    Can automated techniques function independently?
  Seed Data:
    How can this be gathered inexpensively?
  Interpretation:
    How can automated techniques interpret information?
Claims of the Thesis
  Automated disambiguation techniques are able to replace human processing
    Retrieve and process information at large scale
    With high accuracy
  Data found on Social Web platforms is representative of real identity information
    Platforms allow users to build a digital identity
  Social data provides the background knowledge required by automated disambiguation techniques
    Overcoming the burden of seed data generation
State of the Art
  Disambiguation techniques are divisible into 2 types:
  Seeded techniques
    E.g. [Bekkerman and McCallum, 2005], Commercial Services
    Pros:
      Disambiguate web references for a single person
    Cons:
      Require seed data
      No explanation of how seed data is acquired
  Unseeded techniques
    E.g. [Song et al, 2007]
    Pros:
      Require no background knowledge
    Cons:
      Group web references into clusters
      Need to choose the correct cluster
Requirements
  Requirements for Seeded Disambiguation:
    Bootstrap the disambiguation process with minimal supervision
    Achieve disambiguation accuracy comparable to human processing
    Cope with web resources not containing seed data features
    Disambiguation must be effective for all individuals
  Requirements for Seed Data:
    Produce seed data with minimal cost
    Generate reliable seed data
Disambiguating Identity Web References
Harnessing the Social Web
  WWW has evolved into a web of participation
  Digital identity is important on the Social Web
  Digital identity is fragmented across the Social Web
  Data Portability from Social Web platforms is limited
  http://www.economist.com/business/displaystory.cfm?story_id=10880936
Data found on Social Web platforms is representative of real identity information
User Study
  Data found on Social Web platforms is representative of real identity information
  50 participants from the University of Sheffield
  Consisted of 3 stages; each participant:
    List real world social network
    Extract digital social network
    Compare networks
  Relevance: 0.23
  Coverage: 0.77
  Updates previous findings [Subrahmanyam et al, 2008]
  M Rowe. The Credibility of Digital Identity Information on the Social Web: A User Study. In proceedings of 4th Workshop on Information Credibility on the Web, World Wide Web Conference 2010. Raleigh, USA. (2010)
Disambiguating Identity Web References
Leveraging Seed Data from the Social Web
  3. Seed Data:
    How can this be gathered inexpensively?
Leveraging Seed Data from the Social Web
  Use Semantics!
  M Rowe and F Ciravegna. Getting to Me - Exporting Semantic Social Network Information from Facebook. In proceedings of Social Data on the Web Workshop, ISWC 2008, Karlsruhe, Germany. (2008)
  http://www.dcs.shef.ac.uk/~mrowe/foafgenerator.html
Leveraging Seed Data from the Social Web
  Link things together!
Leveraging Seed Data from the Social Web
  Blocking Step
    Only compare people with the same name
  Compare values of Inverse Functional Properties
    E.g. Homepage/Email
  Compare Geo URIs
    E.g. Matching locations
  Compare Geo data
    Using Linked Data sources
  M Rowe. Interlinking Distributed Social Graphs. In proceedings of Linked Data on the Web Workshop, World Wide Web Conference, Madrid, Spain. (2009)
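The blocking and matching steps on the interlinking slide can be sketched roughly as follows. This is a minimal illustration, not the code from the cited paper: the dictionary layout, the `interlink` function, and the `IFP_KEYS` tuple are hypothetical names, and the geo comparisons are left out.

```python
from collections import defaultdict

# Hypothetical property names; the slide gives homepage and email as examples
# of inverse functional properties.
IFP_KEYS = ("homepage", "email")


def interlink(profiles_a, profiles_b):
    """Return (id_a, id_b) pairs that share a name and at least one IFP value."""
    # Blocking step: index the second graph by normalised name so that only
    # people with the same name are ever compared.
    by_name = defaultdict(list)
    for person in profiles_b:
        by_name[person["name"].strip().lower()].append(person)

    links = []
    for a in profiles_a:
        for b in by_name.get(a["name"].strip().lower(), []):
            # Matching step: an equal, non-empty IFP value is taken to
            # identify the same person.
            if any(a.get(key) and a.get(key) == b.get(key) for key in IFP_KEYS):
                links.append((a["id"], b["id"]))
    return links


# Made-up profiles exported from two platforms.
facebook = [
    {"id": "fb:1", "name": "Alice Example", "email": "alice@example.org"},
    {"id": "fb:2", "name": "Bob Example", "homepage": "http://example.org/bob"},
]
twitter = [
    {"id": "tw:9", "name": "alice example", "email": "alice@example.org"},
    {"id": "tw:8", "name": "Bob Example", "homepage": "http://example.org/other-bob"},
]
links = interlink(facebook, twitter)
```

Here the two Alice profiles are linked through the shared email address, while the two Bob profiles share a name but no inverse functional property value and stay unlinked.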
Leveraging Seed Data from the Social Web
  Allows remote resource information to change
  Automated techniques:
    Follow the links
    Retrieve the instance information
Disambiguating Identity Web References
Generating Metadata Models
  Input to disambiguation techniques is a set of web resources
  Web resources come in many flavours:
    Data models
    XHTML documents containing embedded semantics
    HTML documents
  4. Interpretation:
    How can automated techniques interpret information?
  Solution = Semantic Web technologies!
    Convert web resources to RDF
    Metadata descriptions = ontology concepts
    Information is:
      Consistent
      Interpretable
Generating RDF Models from XHTML Documents
  http://events.linkeddata.org/ldow2009/
Generating RDF Models from HTML Documents
  Rise in use of lowercase semantics!
  However only 2.6% of web documents contain semantics [Mika et al, 2009]
  Majority of the web is HTML
    Bad for machines
    Must extract person information
    Then build an RDF model
  Person information is structured
    for legibility
    for segmentation
    i.e. logical distinction between elements
Generating RDF Models from HTML Documents
  HTML is often poorly structured
  Need a Document Object Model
  Therefore Tidy it!
Generating RDF Models from HTML Documents
  Identify document segments for extraction
  1 window = Info about 1 person
  Get XPath expression to the window
Generating RDF Models from HTML Documents
  Extract information using a Hidden Markov Model
  E.g. name, email, www, location
  Train model parameters: transition probs, emission probs, start probs
  Use Viterbi algorithm to label tokens with states
  Returns most likely state sequence
Generating RDF Models from HTML Documents
  M Rowe. Data.dcs: Converting Legacy Data into Linked Data. In proceedings of Linked Data on the Web Workshop, World Wide Web Conference 2010. Raleigh, USA. (2010)
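The Viterbi labelling step named on the Hidden Markov Model slide can be illustrated with a small sketch. The states, token classes, and all probabilities below are invented for the example; the slides do not give the trained parameters or the token features actually used.

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most likely state sequence for the observed tokens."""
    # best[t][s] = (probability of the best path ending in s at step t, previous state)
    best = [{s: (start_p[s] * emit_p[s][observations[0]], None) for s in states}]
    for obs in observations[1:]:
        column = {}
        for s in states:
            column[s] = max(
                (best[-1][prev][0] * trans_p[prev][s] * emit_p[s][obs], prev)
                for prev in states
            )
        best.append(column)

    # Backtrack from the most probable final state.
    state = max(states, key=lambda s: best[-1][s][0])
    path = [state]
    for column in reversed(best[1:]):
        state = column[state][1]
        path.append(state)
    return list(reversed(path))


# Hypothetical toy model: three states and three coarse token classes.
states = ["name", "email", "other"]
start_p = {"name": 0.6, "email": 0.1, "other": 0.3}
trans_p = {
    "name": {"name": 0.5, "email": 0.3, "other": 0.2},
    "email": {"name": 0.1, "email": 0.1, "other": 0.8},
    "other": {"name": 0.2, "email": 0.2, "other": 0.6},
}
emit_p = {
    "name": {"capitalised": 0.8, "has_at": 0.05, "lowercase": 0.15},
    "email": {"capitalised": 0.05, "has_at": 0.9, "lowercase": 0.05},
    "other": {"capitalised": 0.2, "has_at": 0.05, "lowercase": 0.75},
}
tokens = ["capitalised", "capitalised", "has_at", "lowercase"]
labels = viterbi(tokens, states, start_p, trans_p, emit_p)
```

With these made-up parameters the four tokens are labelled name, name, email, other. A longer sequence would normally use log probabilities to avoid underflow.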
Disambiguating Identity Web References
Disambiguation 1: Inference Rules
  1. Extract instances from Seed Data
  2. For each instance, build a rule:
    Build a skeleton rule
    Add triples to the rule
    Create a new rule if a triple’s predicate is Inverse Functional
  3. Apply the rules to the web resources
Disambiguation 1: Inference Rules
  PREFIX foaf: <http://xmlns.com/foaf/0.1/>
  CONSTRUCT { <http://www.dcs.shef.ac.uk/~mrowe/foaf.rdf#me> foaf:page ?url }
  WHERE {
    # the page ?url mentions a person with the same name as the seed person
    <http://www.dcs.shef.ac.uk/~mrowe/foaf.rdf#me> foaf:name ?n .
    ?url foaf:topic ?p .
    ?p foaf:name ?n .
    # the page also mentions a person with the same name as a known contact
    <http://www.dcs.shef.ac.uk/~mrowe/foaf.rdf#me> foaf:knows ?q .
    ?q foaf:name ?m .
    ?url foaf:topic ?r .
    ?r foaf:name ?m
  }
Disambiguation 1: Inference Rules
  PREFIX foaf: <http://xmlns.com/foaf/0.1/>
  CONSTRUCT { <http://www.dcs.shef.ac.uk/~mrowe/foaf.rdf#me> foaf:page ?url }
  WHERE {
    # the page ?url mentions a person with the same name as the seed person
    <http://www.dcs.shef.ac.uk/~mrowe/foaf.rdf#me> foaf:name ?n .
    ?url foaf:topic ?p .
    ?p foaf:name ?n .
    # the page also mentions a person with the same homepage as a known contact
    <http://www.dcs.shef.ac.uk/~mrowe/foaf.rdf#me> foaf:knows ?q .
    ?q foaf:homepage ?h .
    ?url foaf:topic ?r .
    ?r foaf:homepage ?h
  }
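As a rough illustration of how such rules might be generated from seed data, the sketch below fills a fixed template shaped like the two queries on the slides. The `build_rule` function and its `shared_property` parameter are hypothetical; the slides do not show the actual rule-building code, and a real generator would presumably derive the triples from the seed graph rather than from a template.

```python
FOAF = "http://xmlns.com/foaf/0.1/"


def build_rule(person_uri, shared_property):
    """Return a SPARQL CONSTRUCT rule shaped like the queries on the slides.

    shared_property is the FOAF property (for example "name" or "homepage")
    that must match between a known contact and a person the page mentions.
    """
    me = f"<{person_uri}>"
    lines = [
        f"PREFIX foaf: <{FOAF}>",
        f"CONSTRUCT {{ {me} foaf:page ?url }}",
        "WHERE {",
        # The page mentions someone with the same name as the seed person.
        f"  {me} foaf:name ?n .",
        "  ?url foaf:topic ?p .",
        "  ?p foaf:name ?n .",
        # The page also mentions someone matching one of the person's contacts.
        f"  {me} foaf:knows ?q .",
        f"  ?q foaf:{shared_property} ?v .",
        "  ?url foaf:topic ?r .",
        f"  ?r foaf:{shared_property} ?v",
        "}",
    ]
    return "\n".join(lines)


rule = build_rule("http://www.dcs.shef.ac.uk/~mrowe/foaf.rdf#me", "homepage")
```

Passing "name" instead of "homepage" yields the first query's pattern; the resulting string could then be handed to any SPARQL engine that supports CONSTRUCT.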
Disambiguation 1: Inference Rules
  Advantages:
    Highly precise
    Applies graph patterns
  Disadvantages:
    Does not learn from past decisions (supervised)
    Strict matching: lack of generalisation
  M Rowe. Inferring Web Citations using Social Data and SPARQL Rules. In proceedings of Linking of User Profiles and Applications in the Social Semantic Web, Extended Semantic Web Conference 2010. Heraklion, Crete. (2010)
Disambiguation 2: Random Walks
  Seed data and web resources are RDF
  RDF has a graph structure:
    <subject, predicate, object>
    <source_node, edge, target_node>
  Graph-based disambiguation techniques:
    E.g. [Jiang et al, 2009]
    Build a graph-space
    Partition data points in the graph-space
  Requires methods to:
    Compile a graph-space
    Compare nodes
    Cluster nodes
Disambiguation 2: Random Walks
  Link the social graph with the web resources
  Via common resources/literals
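A small sketch of how triples from the seed data and from web resources could be compiled into one graph space, joined wherever they share a resource or literal. The triples and the `triples_to_graph` helper are made up for illustration, and treating edges as undirected is an assumption the slides do not confirm.

```python
from collections import defaultdict


def triples_to_graph(triples):
    """Build an undirected adjacency map from (subject, predicate, object) triples."""
    adjacency = defaultdict(set)
    for subject, _predicate, obj in triples:
        # Each triple becomes an edge between its subject and object nodes.
        adjacency[subject].add(obj)
        adjacency[obj].add(subject)
    return adjacency


# Hypothetical triples: two from the social graph, two from a web resource.
triples = [
    ("seed:me", "foaf:name", "Matthew Rowe"),
    ("seed:me", "foaf:knows", "seed:colleague"),
    ("web:page1", "foaf:topic", "web:person1"),
    ("web:person1", "foaf:name", "Matthew Rowe"),
]
graph = triples_to_graph(triples)
```

The shared literal "Matthew Rowe" becomes a single node adjacent to both `seed:me` and `web:person1`, which is what connects the social graph to the web resource.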
Disambiguation 2: Random Walks
  Graph space may contain islands of nodes
  These inhibit transitions through the graph space
  Get the component containing the social graph
Disambiguation 2: Random Walks
  Perform Random Walks through the graph
  Derive Adjacency Matrix
  Derive Diagonal Degree Matrix
  Compute Transition Probability Matrix
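The two slides above can be sketched together: first keep only the connected component that contains the social graph, then derive the transition probabilities from the adjacency and degree information. The graph, node names, and helper functions are hypothetical, and the row-normalised form P = D^-1 A is the standard random-walk construction, assumed here rather than taken from the slides.

```python
def component_of(adjacency, start):
    """Return the set of nodes reachable from start (its connected component)."""
    seen, stack = {start}, [start]
    while stack:
        node = stack.pop()
        for neighbour in adjacency[node]:
            if neighbour not in seen:
                seen.add(neighbour)
                stack.append(neighbour)
    return seen


def transition_matrix(adjacency, nodes):
    """Return P = D^-1 A as nested lists, rows and columns ordered as in nodes."""
    matrix = []
    for i in nodes:
        # Degree of node i: the corresponding diagonal entry of D.
        degree = sum(1 for j in nodes if j in adjacency[i])
        # Row i of A divided by the degree of i.
        matrix.append([1.0 / degree if j in adjacency[i] else 0.0 for j in nodes])
    return matrix


# Hypothetical graph space with one island that the walk could never reach.
adjacency = {
    "me": {"name", "friend"},
    "name": {"me", "page"},
    "friend": {"me"},
    "page": {"name"},
    "island_a": {"island_b"},
    "island_b": {"island_a"},
}
nodes = sorted(component_of(adjacency, "me"))
P = transition_matrix(adjacency, nodes)
```

Each row of `P` sums to 1, and the two island nodes are dropped before the matrix is built.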
Disambiguation 2: Random Walks
  Measure Distances:
    Commute Time distance
      Leave node i : reach node j : return to node i
    Optimum Transitions
      Move through the graph until probability peaks
Disambiguation 2: Random Walks
  Group web resources with social graph
  Via agglomerative clustering
    Every point starts in its own cluster
    Merge clusters until none can be merged
Disambiguation 2: Random Walks
  Advantages:
    Semi-supervised
    Exploits the graph structure of RDF
  Disadvantages:
    Computationally heavy (Matrix powers!)
    Relies on tuning clustering threshold
  M Rowe. Applying Semantic Social Graphs to Disambiguate Identity References. In proceedings of European Semantic Web Conference 2009, Heraklion, Crete. (2009)
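The commute time distance (leave node i, reach node j, return to node i) can be illustrated on a tiny graph. The slides mention matrix powers; the sketch below swaps in a simple fixed-point iteration on the hitting-time equations, which is easier to show in a few lines. The function names and the `sweeps` parameter are hypothetical.

```python
def hitting_time(P, source, target, sweeps=10000):
    """Expected number of steps for a walk from source to first reach target."""
    n = len(P)
    h = [0.0] * n
    for _ in range(sweeps):
        for k in range(n):
            if k != target:
                # h[k] = 1 + sum_l P[k][l] * h[l], with h[target] fixed at 0.
                h[k] = 1.0 + sum(P[k][l] * h[l] for l in range(n) if l != target)
    return h[source]


def commute_time(P, i, j):
    """Expected steps to leave node i, reach node j, and return to node i."""
    return hitting_time(P, i, j) + hitting_time(P, j, i)


# Transition matrix of a three-node path graph 0 - 1 - 2.
P = [
    [0.0, 1.0, 0.0],
    [0.5, 0.0, 0.5],
    [0.0, 1.0, 0.0],
]
distance_far = commute_time(P, 0, 2)
distance_near = commute_time(P, 0, 1)
```

On this path graph the commute time between the two end nodes comes out at 8 steps and between adjacent nodes at 4, so nodes that are further apart in the graph are also further apart under this distance.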
Disambiguation 3: Self-training
  Classic ML scenario:
    Lots of unlabelled data
    Limited labelled data
  Disambiguating identity web references is just the same!
    Possible web citations = large
    Social data = small
  Semi-supervised learning is a solution
    Train a classifier
    Using labelled and unlabelled data!
  Classification task is binary
    Does this web resource refer to person X or not?
Disambiguation 3: Self-training
  Positive training data = seed data
  Generate negative training data:
    Via Rocchio classification:
      Build centroid vectors: positive set and negative set
      Negative set = unlabelled data
      Compare possible web citations with vectors
      Choose strongest negatives
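A minimal sketch of the negative-generation idea: build one centroid from the positive (seed) vectors and one from the unlabelled vectors, then keep the unlabelled items that sit closest to the unlabelled centroid relative to the positive one. Cosine similarity over term-count vectors, the margin score, and all names and data below are assumptions; the slides do not specify the exact Rocchio variant.

```python
import math
from collections import Counter


def centroid(vectors):
    """Average a list of sparse term-count vectors."""
    total = Counter()
    for vector in vectors:
        total.update(vector)
    return {term: weight / len(vectors) for term, weight in total.items()}


def cosine(u, v):
    """Cosine similarity between two sparse vectors held as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm = math.sqrt(sum(w * w for w in u.values())) * math.sqrt(sum(w * w for w in v.values()))
    return dot / norm if norm else 0.0


def strongest_negatives(positives, unlabelled, k):
    """Return the k unlabelled keys that look least like the positive set."""
    positive_centroid = centroid(positives)
    negative_centroid = centroid(list(unlabelled.values()))
    scored = []
    for key, vector in unlabelled.items():
        margin = cosine(vector, negative_centroid) - cosine(vector, positive_centroid)
        if margin > 0:  # closer to the unlabelled centroid than to the positives
            scored.append((margin, key))
    return [key for _margin, key in sorted(scored, reverse=True)[:k]]


# Made-up term counts for the seed data and three candidate web resources.
positives = [{"sheffield": 2, "semantic": 1}, {"sheffield": 1, "web": 1}]
unlabelled = {
    "page_a": {"sheffield": 1, "semantic": 1},
    "page_b": {"cycling": 3, "race": 1},
    "page_c": {"garden": 2, "cycling": 1},
}
negatives = strongest_negatives(positives, unlabelled, k=2)
```

In this toy data `page_a` resembles the seed data and is never picked as a negative, while the cycling and gardening pages are.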
Disambiguation 3: Self-training
  Begin Self-training:
    1. Train the Classifier
    2. Classify the web resources
    3. Rank classifications
    4. Enlarge training sets
    5. Repeat steps 1-4
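The loop above can be sketched with a deliberately simple stand-in classifier. The slides report Perceptron, SVM and Naïve Bayes classifiers; the scorer below is only a toy based on Jaccard similarity between feature sets, and `self_train`, `per_round` and the sample data are hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity between two feature sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def score(item, positives, negatives):
    """Toy classifier: best similarity to a positive minus best similarity to a negative."""
    return max(jaccard(item, p) for p in positives) - max(jaccard(item, n) for n in negatives)


def self_train(positives, negatives, unlabelled, per_round=1, max_rounds=10):
    """Label the unlabelled pool, feeding the most confident decisions back as training data."""
    positives, negatives = list(positives), list(negatives)
    pool = dict(unlabelled)
    labels = {}
    for _ in range(max_rounds):
        if not pool:
            break
        # Steps 1-2: "train" (the toy scorer just reads the current sets) and classify.
        scores = {key: score(features, positives, negatives) for key, features in pool.items()}
        # Step 3: rank classifications by confidence.
        ranked = sorted(scores, key=lambda key: abs(scores[key]), reverse=True)
        # Step 4: enlarge the training sets with the most confident decisions.
        for key in ranked[:per_round]:
            features = pool.pop(key)
            labels[key] = scores[key] > 0
            (positives if labels[key] else negatives).append(features)
    return labels


# Made-up feature sets: one seed positive, one negative, three candidates.
positives = [{"sheffield", "semantic web", "fabio"}]
negatives = [{"cycling", "race", "team"}]
unlabelled = {
    "page1": {"sheffield", "semantic web", "phd"},
    "page2": {"phd", "thesis", "viva"},
    "page3": {"cycling", "tour"},
}
labels = self_train(positives, negatives, unlabelled)
```

In this example `page2` shares nothing with the original seed data and only becomes classifiable after `page1` has been added to the positive set, which is the effect self-training relies on. The same mechanism is why early mistakes can reinforce themselves.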
Disambiguation 3: Self-training
  Training/Testing data is RDF
    Convert to a machine learning dataset
    Features = RDF instances
  Vary the feature similarity measure:
    Jaccard Similarity
    Inverse Functional Property Matching
    RDF Entailment
  Tested three different classifiers:
    Perceptron
    Support Vector Machine
    Naïve Bayes
Disambiguation 3: Self-training
  Advantages:
    Directly learn from disambiguation decisions
    Utilise abundance of unlabelled data
  Disadvantages:
    Requires reliable negatives
    Mistakes can reinforce themselves
  M Rowe and F Ciravegna. Harnessing the Social Web: The Science of Identity Disambiguation. In proceedings of Web Science Conference 2010. Raleigh, USA. (2010)
Evaluation
  Measures:
    Precision, Recall, F-Measure
  Dataset:
    50 participants from the Semantic Web and Web 2.0 communities
    ~17300 web resources: 346 web resources for each participant
  Baselines:
    Baseline 1: Person name as positive classification
    Baseline 2: Hierarchical Clustering using Person Names [Malin, 2005]
    Baseline 3: Human Processing
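For reference, the three measures can be computed from the set of web resources a technique attributes to a person and the set that truly refer to that person. The slide only says "F-Measure"; the balanced F1 form is assumed here, and the function name and page identifiers are made up.

```python
def precision_recall_f1(predicted, actual):
    """Return (precision, recall, F1) for a predicted set against the true set."""
    true_positives = len(predicted & actual)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(actual) if actual else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # Balanced F-measure: harmonic mean of precision and recall.
    return precision, recall, 2 * precision * recall / (precision + recall)


# Hypothetical output for one participant: 4 pages returned, 3 truly theirs.
predicted = {"page1", "page2", "page3", "page4"}
actual = {"page1", "page2", "page5"}
precision, recall, f1 = precision_recall_f1(predicted, actual)
```

Two of the four returned pages are correct (precision 0.5) and two of the three true pages are found (recall 2/3), giving an F1 of 4/7.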
Evaluation: Inference Rules
  High precision
    Better than humans
    Precise graph pattern matching
  Low recall
    Rules are strict
    No room for variability
    Hard to generalise
    No learning from disambiguation decisions
Evaluation: Random Walks
  High recall
    Higher than humans
    Incorporates unlabelled data into random walks
    Uses features not in the seed data
  Precision
    Lower than humans and rules
    Ambiguous name literals lead to false positives
Evaluation: Self-training
  High Recall
    SVM + Entailment classifies 91% of references
  High F-Measure
    Higher than humans
    Perceptron + Entailment and SVM + Entailment
Conclusions: Research Questions
  Alleviate human processing:
    Can automated techniques replace humans?
      Performance is comparable to humans
      Suited to low web presence
  Supervision:
    Can automated techniques function independently?
      Inference Rules: Induce rules from seed data
      Random Walks: Graph space built from models
      Self-training: Learn + retrain a classifier
  Seed Data:
    How can this be gathered inexpensively?
      Utilise Social Web platforms
      Digital identities are similar to real world identities
  Interpretation:
    How can automated techniques interpret information?
      Solution = Semantic Web technologies
      Convert web resources into metadata models
Conclusions: Claims
  Automated disambiguation techniques are able to replace human processing
    Techniques are comparable to humans
    Overcome manual processing
  Data found on Social Web platforms is representative of real identity information
    77% of a real world social network is covered online
  Social data provides the background knowledge required by automated disambiguation techniques
    Techniques function using social data
    Biographical and social network information enables disambiguation
Dissemination and Impact
  Published 21 peer-reviewed publications
    Paper in the Journal of Web Semantics (impact: 3.5)
  Presented work at many international conferences
  Program committee member for 5 international workshops
  Invited Expert for the World Wide Web Consortium’s Social Web Incubator Group
  Listed as one of top 100 visionaries “discussing the future of the web”
    http://www.semanticweb.com/semanticweb100/
  Linked Data service for the DCS
    Best Poster at the Extended Semantic Web Conference 2010
    http://data.dcs.shef.ac.uk
  Tools widely used by the Semantic Web community
    FOAF Generator
    Social Identity Schema Mapping (SISM) Vocabulary
Questions?
  Twitter: @mattroweshow
  Web: http://www.dcs.shef.ac.uk/~mrowe
  Email: m.rowe@dcs.shef.ac.uk
  For a condensed version of my thesis:
    M Rowe and F Ciravegna. Disambiguating Identity Web References using Web 2.0 Data and Semantics. In Press for special issue on "Web 2.0" in the Journal of Web Semantics. (2010)
