Unsupervised Learning of Social Networks
from a Multiple-Source News Corpus

Hristo Tanev
European Commission
Joint Research Centre
hristo.tanev@jrc.it
Introduction
Social networks provide an intuitive
picture of inferred relationships between
entities, such as people and organizations.
Social network analysis uses social
networks to identify underlying groups,
communication patterns, and other
information.
Manual construction of a social network is
a very laborious task. Algorithms for
automatic detection of relations may be
used to save time and human effort.
Introduction
We present an unsupervised methodology
for automatic learning of social networks.
We use a multiple-source, syntactically
parsed news corpus.
In order to overcome the efficiency
problems which emerge from using
syntactic information on real-world data,
we put forward an efficient graph
matching algorithm.
Related work
Learning social networks from
Friend-Of-A-Friend links (Mika 2005)
or statistical co-occurrences
Disadvantage: cannot detect the
type of the relation
Related work
Support Vector Machines (SVM)
provide a more accurate means of
relation extraction (Zelenko et al.
2003)
Disadvantages:
• require a sufficient amount of annotated
  data
• each pair of named entities should be
  evaluated separately, which slows down
  the relation extraction
Related work
(Romano et al. 2006) propose a generic
unsupervised method for learning
syntactic patterns for relation extraction.
Disadvantages:
• they use the Web as a training corpus, which
  makes the learning very slow
• they match each pattern against each
  sentence, which is not efficient when matching
  many templates against a big corpus
Unsupervised learning of social
          networks
Our algorithm is unsupervised: it takes as
input a small number (one, two, or a few) of two-slot
seed syntactic templates which express a certain
semantic relation.
The algorithm uses news clusters to learn new
syntactic patterns expressing the same semantic
relation.
Once the patterns are learned, we apply a novel,
efficient pattern-matching methodology to
extract related person names from the text.
Extracted relations are aggregated into a social
network.
EMM news clusters
European Media Monitor downloads
news from different sources around
the clock.
Every day 4000-5000 English-language
news articles are downloaded.
The news articles are grouped into
topic clusters.
Parsing the corpus
The training and test corpora
consist of English-language news
articles from 200 sources.
Articles are parsed with a full
dependency parser, MiniPar.
                  meet
           subj /  |  \ obj
            Bush   in   Blair
                   |
                 March
Learning patterns
Manually provide a very small
number of seed syntactic templates
which express the main relation.
For example, for the relation “X
supports Y” we use the syntactic
patterns:
   X subj support obj Y
   X subj praise obj Y
Learning patterns
Match these templates against the
news clusters in the corpus. Each
pair of person names which fill the
slots X and Y is called an anchor
pair.
From “Bush praised the Prime
Minister Hamid Karzai”, the
algorithm will extract the anchor
pair (X:Bush; Y:Hamid Karzai)
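The matching step above can be sketched in a few lines. This is only an illustration under assumptions: the `Node` class and `match_template` function are hypothetical names, the parse is built by hand rather than produced by MiniPar, and person-name recognition is skipped.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One node of a dependency tree (hypothetical structure, not MiniPar's)."""
    word: str                                      # lemma, e.g. "praise"
    children: list = field(default_factory=list)   # list of (label, Node)


def match_template(root, verb, subj_label="subj", obj_label="obj"):
    """Return (X, Y) anchor pairs for the template 'X subj <verb> obj Y'."""
    pairs = []
    stack = [root]
    while stack:
        node = stack.pop()
        if node.word == verb:
            subjects = [c.word for lbl, c in node.children if lbl == subj_label]
            objects = [c.word for lbl, c in node.children if lbl == obj_label]
            pairs.extend((x, y) for x in subjects for y in objects)
        # Keep walking so that verbs in embedded clauses are also checked.
        stack.extend(child for _, child in node.children)
    return pairs


# Hand-built parse of "Bush praised the Prime Minister Hamid Karzai".
tree = Node("praise", [("subj", Node("Bush")), ("obj", Node("Hamid Karzai"))])
anchor_pairs = match_template(tree, "praise")
print(anchor_pairs)
```

Each seed template is tried against every parsed sentence of a cluster, and the union of the returned pairs forms the anchor set for that cluster.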
Learning patterns
Normalize the anchor pairs using
the information in the EMM
database.
After this step, the example anchor
pair will become (X:George W.
Bush; Y:Hamid Karzai).
Learning patterns
For each extracted anchor pair,
search the same cluster for all the
sentences where both names of the
anchor pair occur.
The assumption is that the same
relation will hold between the same
pairs of names in the whole news
cluster, since all articles in it have
the same topic.
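A minimal sketch of this search, assuming a cluster is available as a plain list of sentence strings and that names can be found by substring lookup. Both are simplifications: a real system would match name variants, and `sentences_with_pair` is a hypothetical helper. The second and third sentences are made up for illustration.

```python
def sentences_with_pair(cluster_sentences, anchor_pair):
    """Return the sentences of one cluster that mention both names of the pair."""
    x, y = anchor_pair
    return [s for s in cluster_sentences if x in s and y in s]


cluster = [
    "Bush praised the Prime Minister Hamid Karzai.",
    "Hamid Karzai thanked Bush for the support.",   # illustrative sentence
    "The summit took place in Kabul.",              # illustrative sentence
]
matches = sentences_with_pair(cluster, ("Bush", "Hamid Karzai"))
print(matches)
```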
Learning patterns
From all the sentences in which at least
one anchor pair appears, learn syntactic
patterns using our pattern-learning
algorithm, which is similar to the General
Structure Learning (GSL) algorithm
described in (Szpektor et al. 2006).
Example: X subj-agree-with Y
Each pattern receives as its score the
number of different anchor pairs which
support it.
Learning patterns
Pattern selection and filtering
• Filter out all templates which appear for
  fewer than 2 anchor pairs.
• Remove generic patterns like “X say Y”,
  “X have Y”, “X is Y”, etc. using a
  predefined template list.
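Scoring and filtering can be sketched as follows. The input format (a list of `(pattern, anchor_pair)` observations) and the name `select_patterns` are assumptions made for illustration, and the observations are invented; only the threshold of 2 and the three generic patterns come from the slides.

```python
from collections import defaultdict

# Generic patterns named on the slide; the real predefined list is longer ("etc.").
GENERIC_PATTERNS = {"X say Y", "X have Y", "X is Y"}


def select_patterns(observations, min_pairs=2, stop_list=GENERIC_PATTERNS):
    """Score each pattern by its number of distinct anchor pairs, then filter."""
    support = defaultdict(set)
    for pattern, pair in observations:
        support[pattern].add(pair)          # a set ignores repeated pairs
    scores = {p: len(pairs) for p, pairs in support.items()}
    return {p: s for p, s in scores.items()
            if s >= min_pairs and p not in stop_list}


# Invented observations, for illustration only.
observations = [
    ("X subj-agree-with Y", ("A", "B")),
    ("X subj-agree-with Y", ("A", "C")),
    ("X subj-agree-with Y", ("A", "B")),   # repeated pair, counted once
    ("X say Y", ("A", "B")),
    ("X say Y", ("C", "D")),               # enough support, but generic
    ("X subj-back obj Y", ("A", "B")),     # only one anchor pair
]
selected = select_patterns(observations)
print(selected)
```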
Syntactic Network model
Example sentences:
“Prodi met President Bush in September”
“Berlusconi met President Chirac”
Syntactic Network model
Adding syntactic templates
Efficiency
The worst-case time complexity of building
SyntNet is O(|w| log |w|), where |w| is the
number of words in the parsed corpus.
The worst-case time complexity of the syntactic
matching algorithm is bounded by
O((|s| + |t|) log MaxArcO), where |s| is the number of
sentences in the corpus, |t| is the number of
templates, and MaxArcO is the maximum
number of occurrences of a SyntNet arc, i.e. the
size of the largest index set of a SyntNet arc.
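The slides do not spell out the SyntNet data structure, so the following is not the author's algorithm. It is a generic sketch of one way an index set per arc can make matching cheap: an inverted index from dependency arcs to sorted sentence ids, with binary search for membership (the source of a logarithmic factor in the index-set size). Representing an arc as a `(head, label)` tuple and all function names are assumptions.

```python
from bisect import bisect_left
from collections import defaultdict


def build_arc_index(parsed_sentences):
    """Map each arc to the sorted list of ids of the sentences containing it."""
    index = defaultdict(list)
    for sent_id, arcs in enumerate(parsed_sentences):
        for arc in set(arcs):
            index[arc].append(sent_id)   # ids arrive in increasing order
    return index


def contains(sorted_ids, sent_id):
    """Binary-search membership test on a sorted id list."""
    i = bisect_left(sorted_ids, sent_id)
    return i < len(sorted_ids) and sorted_ids[i] == sent_id


def sentences_matching(index, template_arcs):
    """Ids of the sentences in which every arc of the template occurs."""
    id_lists = sorted((index.get(a, []) for a in template_arcs), key=len)
    if not id_lists:
        return []
    shortest, rest = id_lists[0], id_lists[1:]
    return [sid for sid in shortest if all(contains(ids, sid) for ids in rest)]


# Arcs of "Prodi met President Bush in September" and
# "Berlusconi met President Chirac", written by hand.
corpus = [
    [("meet", "subj"), ("meet", "obj"), ("meet", "in")],
    [("meet", "subj"), ("meet", "obj")],
]
arc_index = build_arc_index(corpus)
hits = sentences_matching(arc_index, [("meet", "subj"), ("meet", "obj")])
print(hits)
```

Starting from the shortest id list keeps the number of binary searches small when one arc of the template is rare.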
Evaluation schema

To evaluate our algorithm, we learned syntactic
patterns for “meeting” and “support”
relationships between people.
We evaluate how well the algorithm captures
relationships between the top 33 VIPs from our
database.
We do not evaluate how well it captures individual
relation mentions.
If a specific relation (e.g. “meeting”) holds
between a pair of people X and Y, it is sufficient
that the algorithm finds at least one mention of
this relation between X and Y.
Experiments
For paraphrase learning we used a training
corpus of 98'000 English-language news articles
clustered in 22'000 EMM topic clusters published
in the period 01/May/2006 – 03/Oct/2006.
For testing the method, we used 125'000
English-language news articles published in the
period 03/Oct/2006 – 31/Oct/2006.
Reading the test corpus and the templates into
memory and building SyntNet+ took 9 min and
3 sec. Matching the 101 syntactic templates
against the test corpus of about 1'080'000
parsed sentences took only 45 seconds.
We normalized extracted names using the EMM
database.
Relationship extraction evaluation on the top
         33 VIP from the EMM DB

           Precision  Recall  F1
meeting    0.61       0.56    0.58
support    0.57       0.10    0.17
overall    0.60       0.32    0.42
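As a consistency check, the F1 column can be recomputed from the reported precision and recall using the standard harmonic-mean definition, F1 = 2PR / (P + R). The helper name `f1` is an assumption; the numbers are those in the table.

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall) as reported in the table above.
reported = {
    "meeting": (0.61, 0.56),
    "support": (0.57, 0.10),
    "overall": (0.60, 0.32),
}
f1_scores = {name: round(f1(p, r), 2) for name, (p, r) in reported.items()}
print(f1_scores)
```

The recomputed values agree with the table to two decimal places.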
Using the social network view
We ran the PageRank algorithm on
the automatically extracted
“meeting” network and found the top
5 ranked people.
We compared this ranking with a
simple frequency-based ranking
of people.
Comparing two people ranking schemas

PageRank         Frequency
C. Rice          G.W. Bush
G.W. Bush        T. Blair
V. Putin         C. Rice
E. Olmert        N. al-Maliki
T. Blair         S. Hussein
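How a PageRank-style ranking can be obtained from a “meeting” network is sketched below with plain power iteration. The slides do not give the damping factor, the number of iterations, or how edges were weighted, so `damping=0.85`, `iterations=100`, and the treatment of meetings as unweighted, undirected edges are assumptions. The toy graph uses placeholder node names and is not the extracted network.

```python
def pagerank(edges, damping=0.85, iterations=100):
    """Power-iteration PageRank on an undirected, unweighted graph."""
    nodes = sorted({n for edge in edges for n in edge})
    neighbours = {n: set() for n in nodes}
    for a, b in edges:                 # "meeting" is symmetric
        neighbours[a].add(b)
        neighbours[b].add(a)
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        new_rank = {}
        for n in nodes:
            # Each neighbour passes on an equal share of its current rank.
            incoming = sum(rank[m] / len(neighbours[m]) for m in neighbours[n])
            new_rank[n] = (1 - damping) / len(nodes) + damping * incoming
        rank = new_rank
    return rank


# Toy network with placeholder names: A met B, C and D; B met C.
toy_edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C")]
ranks = pagerank(toy_edges)
ranking = sorted(ranks, key=ranks.get, reverse=True)
print(ranking[0], ranking[-1])
```

Unlike a frequency count, the score of a node depends on how well connected its neighbours are, which is one reason the two columns above can differ.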
Conclusions and future work
We presented an unsupervised method for
learning social networks from news clusters.
We presented a very efficient syntactic
pattern-matching algorithm.
Automatically learned social networks can
be used for some analyst tasks.
In future work we will try to cover
more types of relations.
We will also consider learning and using more
abstract patterns.
THANK YOU!

  • 2. Introduction: Social networks provide an intuitive picture of inferred relationships between entities, such as people and organizations. Social network analysis uses social networks to identify underlying groups, communication patterns, and other information. Manual construction of a social network is a very laborious task. Algorithms for automatic detection of relations may be used to save time and human effort.
  • 3. Introduction: We present an unsupervised methodology for automatic learning of social networks. We use a multiple-source, syntactically parsed news corpus. To overcome the efficiency problems that emerge from using syntactic information on real-world data, we put forward an efficient graph matching algorithm.
  • 4. Related work: Learning social networks from Friend-Of-A-Friend links (Mika 2005) or statistical co-occurrences. Disadvantage: these approaches cannot detect the type of the relation.
  • 5. Related work: Support Vector Machines (SVM) provide more accurate means for relation extraction (Zelenko et al. 2003). Disadvantages: they require a sufficient amount of annotated data; each pair of named entities must be evaluated separately, which slows down relation extraction.
  • 6. Related work: (Romano et al. 2006) propose a generic unsupervised method for learning syntactic patterns for relation extraction. Disadvantages: they use the Web as a training corpus, which makes learning very slow; they match each pattern against each sentence, which is inefficient when matching many templates against a big corpus.
  • 7. Unsupervised learning of social networks: Our algorithm is unsupervised: it takes as input one, two, or some other small number of two-slot seed syntactic templates that express a certain semantic relation. The algorithm uses news clusters to learn new syntactic patterns expressing the same semantic relation. Once the patterns are learned, we apply a novel, efficient pattern-matching methodology to extract related person names from the text. The extracted relations are aggregated into a social network.
  • 8. EMM news clusters: European Media Monitor downloads news from different sources around the clock. Every day, 4000-5000 English-language news articles are downloaded. The news articles are grouped into topic clusters.
  • 9. Parsing the corpus: The training and the test corpus consist of English-language news articles from 200 sources. Articles are parsed with a full dependency parser, MiniPar. [Figure: example dependency tree headed by "meet", with arcs subj (Bush), obj (Blair), and in (March)]
  • 10. Learning patterns: Manually provide a very small number of seed syntactic templates that express the main relation. For example, for the relation “X supports Y” we use the syntactic patterns “X subj support obj Y” and “X subj praise obj Y”.
  • 11. Learning patterns: Match these templates against the news clusters in the corpus. Each pair of person names that fills the slots X and Y is called an anchor pair. From “Bush praised the Prime Minister Hamid Karzai”, the algorithm will extract the anchor pair (X:Bush; Y:Hamid Karzai).
  • 12. Learning patterns: Normalize the anchor pairs using the information in the EMM database. After this step, the example anchor pair will become (X:George W. Bush; Y:Hamid Karzai).
  • 13. Learning patterns: For each extracted anchor pair, search the same cluster for all sentences in which both names of the anchor pair occur. The assumption is that the same relation will hold between the same pair of names throughout the news cluster, since all articles in it have the same topic.
  • 14. Learning patterns: From all the sentences in which at least one anchor pair appears, learn syntactic patterns using our pattern-learning algorithm, which is similar to the General Structure Learning (GSL) algorithm described in (Szpektor et al. 2006). Example: X subj-agree-with Y. Each pattern receives as its score the number of different anchor pairs that support it.
  • 15. Learning patterns: Pattern selection and filtering. Filter out all templates that appear for fewer than 2 anchor pairs. Remove generic patterns like “X say Y”, “X have Y”, “X is Y”, etc., using a predefined template list.
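
The learning steps on slides 10-15 can be illustrated with a small sketch. It is not the authors' GSL-like learner: it assumes that a parsed sentence is a flat list of (head, relation, dependent) triples (not MiniPar's real output format), that person names are already recognized and normalized, and that only patterns of the form "X subj VERB obj Y" are learned. The names `verb_frames`, `learn_patterns`, and the toy clusters are hypothetical.

```python
from collections import defaultdict

SEED_VERBS = {"support", "praise"}      # seed templates for "X supports Y"
GENERIC_VERBS = {"say", "have", "be"}   # predefined list of generic patterns
MIN_ANCHOR_PAIRS = 2                    # minimum support required to keep a pattern


def verb_frames(sentence):
    """Yield (verb, subject, object) for verbs that have both a subj and an obj arc."""
    subj, obj = {}, {}
    for head, rel, dep in sentence:
        if rel == "subj":
            subj[head] = dep
        elif rel == "obj":
            obj[head] = dep
    for verb in subj.keys() & obj.keys():
        yield verb, subj[verb], obj[verb]


def learn_patterns(clusters, persons):
    """Return {verb: score}, where score = number of distinct supporting anchor pairs."""
    support = defaultdict(set)
    for cluster in clusters:
        # Match the seed templates to collect the anchor pairs of this cluster.
        anchors = {
            (x, y)
            for sentence in cluster
            for verb, x, y in verb_frames(sentence)
            if verb in SEED_VERBS and x in persons and y in persons
        }
        # Any frame in the same cluster that links an anchor pair is a candidate pattern.
        for sentence in cluster:
            for verb, x, y in verb_frames(sentence):
                if (x, y) in anchors:
                    support[verb].add((x, y))
    # Keep patterns with enough distinct anchor pairs that are not generic.
    return {
        verb: len(pairs)
        for verb, pairs in support.items()
        if len(pairs) >= MIN_ANCHOR_PAIRS and verb not in GENERIC_VERBS
    }


def frame(verb, x, y):
    """Build a toy parsed sentence with a single subj/obj frame."""
    return [(verb, "subj", x), (verb, "obj", y)]


# Invented toy clusters, for illustration only.
clusters = [
    [frame("praise", "Bush", "Karzai"), frame("back", "Bush", "Karzai"),
     frame("say", "Bush", "Karzai")],
    [frame("support", "Blair", "Prodi"), frame("back", "Blair", "Prodi"),
     frame("say", "Blair", "Prodi"), frame("endorse", "Blair", "Prodi")],
]
patterns = learn_patterns(clusters, {"Bush", "Karzai", "Blair", "Prodi"})
print(patterns)
```

In this toy run only "back" survives: "say" is supported by two anchor pairs but is on the generic list, while "endorse" and the seed verbs are each supported by a single anchor pair.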
  • 16. Syntactic Network model: example sentences “Prodi met President Bush in September” and “Berlusconi met President Chirac”.
  • 19. Efficiency: The worst-case time complexity of building SyntNet is O(|w| log |w|), where |w| is the number of words in the parsed corpus. The worst-case time complexity of the syntactic matching algorithm is bounded by O((|s| + |t|) log MaxArcO), where |s| is the number of sentences in the corpus, |t| is the number of templates, and MaxArcO is the maximum number of occurrences of a SyntNet arc, i.e. the size of the largest index set of a SyntNet arc.
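
The transcript does not describe SyntNet's internals, so the following is only a rough analogy: an ordinary inverted index from (head, relation) arcs to sorted lists of sentence ids, queried with binary search. It shows why indexing arcs avoids matching every template against every sentence, but it is not the SyntNet data structure and makes no claim to the bounds above. The function names and the triple format are assumptions.

```python
from bisect import bisect_left
from collections import defaultdict


def build_arc_index(sentences):
    """Map each (head, relation) arc to the sorted list of ids of sentences containing it."""
    index = defaultdict(list)
    for sent_id, arcs in enumerate(sentences):
        # Deduplicate per sentence so every id appears once; ids arrive in increasing order.
        for key in {(head, rel) for head, rel, _dep in arcs}:
            index[key].append(sent_id)
    return index


def contains(sorted_ids, value):
    """Binary search for value in a sorted list."""
    i = bisect_left(sorted_ids, value)
    return i < len(sorted_ids) and sorted_ids[i] == value


def match_template(index, template_arcs):
    """Return ids of sentences that contain every (head, relation) arc of the template."""
    postings = sorted((index.get(arc, []) for arc in template_arcs), key=len)
    if not postings:
        return []
    shortest, rest = postings[0], postings[1:]
    # Scan the shortest index set and probe the others with binary search.
    return [sid for sid in shortest if all(contains(p, sid) for p in rest)]


sentences = [
    [("meet", "subj", "Prodi"), ("meet", "obj", "Bush"), ("meet", "in", "September")],
    [("say", "subj", "Bush")],
    [("meet", "subj", "Berlusconi"), ("meet", "obj", "Chirac")],
    [("meet", "subj", "Blair")],
]
index = build_arc_index(sentences)
# Template "X subj meet obj Y", written as the two arcs it requires.
matches = match_template(index, [("meet", "subj"), ("meet", "obj")])
print(matches)
```

The slot fillers X and Y would then be read from the matching sentences only, rather than from the whole corpus.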
  • 20. Evaluation schema: To evaluate our algorithm, we learned syntactic patterns for “meeting” and “support” relationships between people. We evaluate how well the algorithm captures relationships between the top 33 VIPs in our database. We do not evaluate how well it captures individual relation mentions: if a specific relation (e.g. “meeting”) holds between a pair of people X and Y, it is sufficient that the algorithm finds at least one mention of this relation between X and Y.
  • 21. Experiments: For paraphrase learning we used a training corpus of 98'000 English-language news articles, clustered in 22'000 EMM topic clusters and published in the period 01/May/2006 – 03/Oct/2006. For testing the method, we used 125'000 English-language news articles published in the period 03/Oct/2006 – 31/Oct/2006. Reading the test corpus and the templates into memory and building SyntNet+ took 9 min 3 sec. It took only 45 seconds to match the 101 syntactic templates against the test corpus of about 1'080'000 parsed sentences. We normalized the extracted names using the EMM database.
  • 22. Relationship extraction evaluation on the top 33 VIP from the EMM DB
        Relation   Precision   Recall   F1
        meeting    0.61        0.56     0.58
        support    0.57        0.10     0.17
        overall    0.60        0.32     0.42
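
As a quick consistency check, the F1 values on slide 22 can be recomputed from the reported precision and recall, assuming the standard balanced F1 (the harmonic mean of the two). The dictionary below only restates the slide's numbers.

```python
def f1(precision, recall):
    """Balanced F1: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) as given on the slide.
reported = {
    "meeting": (0.61, 0.56, 0.58),
    "support": (0.57, 0.10, 0.17),
    "overall": (0.60, 0.32, 0.42),
}

recomputed = {name: round(f1(p, r), 2) for name, (p, r, _) in reported.items()}
for name, (_, _, reported_f1) in reported.items():
    print(name, recomputed[name], reported_f1)
```

All three recomputed values agree with the reported ones to two decimals. The low F1 for “support” comes almost entirely from its recall of 0.10.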
  • 23. Using the social network view: We ran the PageRank algorithm on the automatically extracted “meeting” network and found the top 5 ranked people. We compared this ranking with a simple frequency-based ranking of people.
  • 24. Comparing two people ranking schemas
        PageRank     Frequency
        C. Rice      G.W. Bush
        G.W. Bush    T. Blair
        V. Putin     C. Rice
        E. Olmert    N. al-Maliki
        T. Blair     S. Hussein
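
The difference between the two ranking schemas can be reproduced on a toy example. The sketch below is a plain power-iteration PageRank over an undirected “meeting” graph, compared with a ranking by mention count. It assumes that “frequency” means how often a person is mentioned and that the damping factor is 0.85; the slides state neither. The people A-D, the meetings, and the mentions are invented placeholders, not data from the experiments.

```python
from collections import Counter, defaultdict


def pagerank(edges, damping=0.85, iterations=50):
    """Power-iteration PageRank on an undirected graph given as a list of pairs."""
    neighbours = defaultdict(set)
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)
    nodes = sorted(neighbours)
    rank = {node: 1.0 / len(nodes) for node in nodes}
    for _ in range(iterations):
        # Each node shares its current rank equally among its neighbours.
        rank = {
            node: (1 - damping) / len(nodes)
            + damping * sum(rank[m] / len(neighbours[m]) for m in neighbours[node])
            for node in nodes
        }
    return rank


# Toy data: A met everyone, but B is mentioned most often.
meetings = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C")]
mentions = ["B", "B", "B", "B", "A", "A", "C", "D"]

ranks = pagerank(meetings)
pagerank_order = sorted(ranks, key=ranks.get, reverse=True)
frequency_order = [name for name, _count in Counter(mentions).most_common()]
print(pagerank_order[0], frequency_order[0])
```

Here the best-connected person (A) tops the PageRank list, while the most-mentioned person (B) tops the frequency list. This is the kind of divergence the comparison on slide 24 shows.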
  • 25. Conclusions and future work: We presented an unsupervised method for learning social networks from news clusters. We presented a very efficient syntactic pattern matching algorithm. Automatically learned social networks can be used for some analyst tasks. In future work we will try to cover more types of relations, and we are considering learning and using more abstract patterns.