SlideShare a Scribd company logo
1 of 1
Impulse Technologies
                                      Beacons U to World of technology
        044-42133143, 98401 03301,9841091117 ieeeprojects@yahoo.com www.impulse.net.in
      Efficient and Effective Duplicate Detection in Hierarchical Data
   Abstract
          Although there is a long line of work on identifying duplicates in relational
   data, only a few solutions focus on duplicate detection in more complex
   hierarchical structures, like XML data. In this paper, we present a novel method for
   XML duplicate detection, called XMLDup. XMLDup uses a Bayesian network to
   determine the probability of two XML elements being duplicates, considering not
   only the information within the elements, but also the way that information is
   structured. In addition, to improve the efficiency of the network evaluation, a novel
   pruning strategy, capable of significant gains over the unoptimized version of the
   algorithm, is presented. Through experiments, we show that our algorithm is able
   to achieve high precision and recall scores in several datasets. XMLDup is also
   able to outperform another state of the art duplicate detection solution, both in
   terms of efficiency and of effectiveness. Finally, we also study how important the
   structure of elements is in the duplicate detection process. We observe that, not
   only structure can clearly influence the outcome, but also that, by ensuring a
   structure that is adequate to the characteristics of the data, we can actually improve
   the quality of the results.




  Your Own Ideas or Any project from any company can be Implemented
at Better price (All Projects can be done in Java or DotNet whichever the student wants)
                                                                                          1

More Related Content

What's hot

Occt a one class clustering tree for implementing one-to-man data linkage
Occt a one class clustering tree for implementing one-to-man data linkageOcct a one class clustering tree for implementing one-to-man data linkage
Occt a one class clustering tree for implementing one-to-man data linkagePapitha Velumani
 
Master Thesis Abstract
Master Thesis AbstractMaster Thesis Abstract
Master Thesis AbstractBruno Dzogovic
 
Metabolomics Society meeting 2011 - presentatie Kees
Metabolomics Society meeting 2011 - presentatie KeesMetabolomics Society meeting 2011 - presentatie Kees
Metabolomics Society meeting 2011 - presentatie Keesthehyve
 
MULTILABEL CLASSIFICATION VIA CO-EVOLUTIONARY MULTILABEL HYPERNETWORK
MULTILABEL CLASSIFICATION VIA CO-EVOLUTIONARY MULTILABEL HYPERNETWORKMULTILABEL CLASSIFICATION VIA CO-EVOLUTIONARY MULTILABEL HYPERNETWORK
MULTILABEL CLASSIFICATION VIA CO-EVOLUTIONARY MULTILABEL HYPERNETWORKNexgen Technology
 
Ijricit 01-002 enhanced replica detection in short time for large data sets
Ijricit 01-002 enhanced replica detection in  short time for large data setsIjricit 01-002 enhanced replica detection in  short time for large data sets
Ijricit 01-002 enhanced replica detection in short time for large data setsIjripublishers Ijri
 
IEEE 2014 JAVA DATA MINING PROJECTS Active learning of constraints for semi s...
IEEE 2014 JAVA DATA MINING PROJECTS Active learning of constraints for semi s...IEEE 2014 JAVA DATA MINING PROJECTS Active learning of constraints for semi s...
IEEE 2014 JAVA DATA MINING PROJECTS Active learning of constraints for semi s...IEEEFINALYEARSTUDENTPROJECTS
 

What's hot (10)

Occt a one class clustering tree for implementing one-to-man data linkage
Occt a one class clustering tree for implementing one-to-man data linkageOcct a one class clustering tree for implementing one-to-man data linkage
Occt a one class clustering tree for implementing one-to-man data linkage
 
Master Thesis Abstract
Master Thesis AbstractMaster Thesis Abstract
Master Thesis Abstract
 
Bi4101343346
Bi4101343346Bi4101343346
Bi4101343346
 
Metabolomics Society meeting 2011 - presentatie Kees
Metabolomics Society meeting 2011 - presentatie KeesMetabolomics Society meeting 2011 - presentatie Kees
Metabolomics Society meeting 2011 - presentatie Kees
 
Meta-Learning Presentation
Meta-Learning PresentationMeta-Learning Presentation
Meta-Learning Presentation
 
Spe165 t
Spe165 tSpe165 t
Spe165 t
 
Research Proposal
Research ProposalResearch Proposal
Research Proposal
 
MULTILABEL CLASSIFICATION VIA CO-EVOLUTIONARY MULTILABEL HYPERNETWORK
MULTILABEL CLASSIFICATION VIA CO-EVOLUTIONARY MULTILABEL HYPERNETWORKMULTILABEL CLASSIFICATION VIA CO-EVOLUTIONARY MULTILABEL HYPERNETWORK
MULTILABEL CLASSIFICATION VIA CO-EVOLUTIONARY MULTILABEL HYPERNETWORK
 
Ijricit 01-002 enhanced replica detection in short time for large data sets
Ijricit 01-002 enhanced replica detection in  short time for large data setsIjricit 01-002 enhanced replica detection in  short time for large data sets
Ijricit 01-002 enhanced replica detection in short time for large data sets
 
IEEE 2014 JAVA DATA MINING PROJECTS Active learning of constraints for semi s...
IEEE 2014 JAVA DATA MINING PROJECTS Active learning of constraints for semi s...IEEE 2014 JAVA DATA MINING PROJECTS Active learning of constraints for semi s...
IEEE 2014 JAVA DATA MINING PROJECTS Active learning of constraints for semi s...
 

Viewers also liked

Metodología
MetodologíaMetodología
MetodologíaZevas AJa
 
ההיררכיות מתפרקות | מתוך מדד המותגים של גלובס 2014
ההיררכיות מתפרקות | מתוך מדד המותגים של גלובס 2014ההיררכיות מתפרקות | מתוך מדד המותגים של גלובס 2014
ההיררכיות מתפרקות | מתוך מדד המותגים של גלובס 2014Adi Yoffe
 
Flipside's PEPCON presentation - Migrating Content into the Digital Platform
Flipside's PEPCON presentation - Migrating Content into the Digital PlatformFlipside's PEPCON presentation - Migrating Content into the Digital Platform
Flipside's PEPCON presentation - Migrating Content into the Digital PlatformHoney de Peralta
 
Sri Lanka: The Land of Serendip
Sri Lanka:  The Land of SerendipSri Lanka:  The Land of Serendip
Sri Lanka: The Land of SerendipMarian Jensen
 
Ancillary film magazine
Ancillary film magazineAncillary film magazine
Ancillary film magazinehannahodlin
 
Wallace and Gromit Exhibition
Wallace and Gromit ExhibitionWallace and Gromit Exhibition
Wallace and Gromit ExhibitionJade Delaney
 
Mon portfolio - JB THOMAS
Mon portfolio - JB THOMASMon portfolio - JB THOMAS
Mon portfolio - JB THOMASjbthomas38
 
Deutsche sprache
Deutsche  spracheDeutsche  sprache
Deutsche spracheAnddel
 
Ibogaine
IbogaineIbogaine
IbogaineiVeada
 
サイコム・ブレインズ株式会社 国内におけるグローバル化支援&異文化マネジメント
サイコム・ブレインズ株式会社 国内におけるグローバル化支援&異文化マネジメントサイコム・ブレインズ株式会社 国内におけるグローバル化支援&異文化マネジメント
サイコム・ブレインズ株式会社 国内におけるグローバル化支援&異文化マネジメントCicom Brains Inc.
 
Draft film poster 2
Draft film poster 2Draft film poster 2
Draft film poster 2hannahodlin
 
Film noir target audience
Film noir target audienceFilm noir target audience
Film noir target audienceSFDobson94
 
Film production risk assessment
Film production risk assessmentFilm production risk assessment
Film production risk assessmenthannahodlin
 

Viewers also liked (18)

Favourites 6 vo
Favourites 6 voFavourites 6 vo
Favourites 6 vo
 
Metodología
MetodologíaMetodología
Metodología
 
ההיררכיות מתפרקות | מתוך מדד המותגים של גלובס 2014
ההיררכיות מתפרקות | מתוך מדד המותגים של גלובס 2014ההיררכיות מתפרקות | מתוך מדד המותגים של גלובס 2014
ההיררכיות מתפרקות | מתוך מדד המותגים של גלובס 2014
 
Flipside's PEPCON presentation - Migrating Content into the Digital Platform
Flipside's PEPCON presentation - Migrating Content into the Digital PlatformFlipside's PEPCON presentation - Migrating Content into the Digital Platform
Flipside's PEPCON presentation - Migrating Content into the Digital Platform
 
Sri Lanka: The Land of Serendip
Sri Lanka:  The Land of SerendipSri Lanka:  The Land of Serendip
Sri Lanka: The Land of Serendip
 
Drogas
DrogasDrogas
Drogas
 
Ancillary film magazine
Ancillary film magazineAncillary film magazine
Ancillary film magazine
 
Wallace and Gromit Exhibition
Wallace and Gromit ExhibitionWallace and Gromit Exhibition
Wallace and Gromit Exhibition
 
Mon portfolio - JB THOMAS
Mon portfolio - JB THOMASMon portfolio - JB THOMAS
Mon portfolio - JB THOMAS
 
Deutsche sprache
Deutsche  spracheDeutsche  sprache
Deutsche sprache
 
Ibogaine
IbogaineIbogaine
Ibogaine
 
サイコム・ブレインズ株式会社 国内におけるグローバル化支援&異文化マネジメント
サイコム・ブレインズ株式会社 国内におけるグローバル化支援&異文化マネジメントサイコム・ブレインズ株式会社 国内におけるグローバル化支援&異文化マネジメント
サイコム・ブレインズ株式会社 国内におけるグローバル化支援&異文化マネジメント
 
Documento
DocumentoDocumento
Documento
 
Manusia purba
Manusia purbaManusia purba
Manusia purba
 
Draft film poster 2
Draft film poster 2Draft film poster 2
Draft film poster 2
 
Film noir target audience
Film noir target audienceFilm noir target audience
Film noir target audience
 
Articulo triboniano caceres
Articulo triboniano caceresArticulo triboniano caceres
Articulo triboniano caceres
 
Film production risk assessment
Film production risk assessmentFilm production risk assessment
Film production risk assessment
 

Similar to 3

RELATIONAL STORAGE FOR XML RULES
RELATIONAL STORAGE FOR XML RULESRELATIONAL STORAGE FOR XML RULES
RELATIONAL STORAGE FOR XML RULESijwscjournal
 
RELATIONAL STORAGE FOR XML RULES
RELATIONAL STORAGE FOR XML RULESRELATIONAL STORAGE FOR XML RULES
RELATIONAL STORAGE FOR XML RULESijwscjournal
 
Identifying and classifying unknown Network Disruption
Identifying and classifying unknown Network DisruptionIdentifying and classifying unknown Network Disruption
Identifying and classifying unknown Network Disruptionjagan477830
 
Effective Data Retrieval in XML using TreeMatch Algorithm
Effective Data Retrieval in XML using TreeMatch AlgorithmEffective Data Retrieval in XML using TreeMatch Algorithm
Effective Data Retrieval in XML using TreeMatch AlgorithmIRJET Journal
 
Zhao huang deep sim deep learning code functional similarity
Zhao huang deep sim   deep learning code functional similarityZhao huang deep sim   deep learning code functional similarity
Zhao huang deep sim deep learning code functional similarityitrejos
 
Query optimization to improve performance of the code execution
Query optimization to improve performance of the code executionQuery optimization to improve performance of the code execution
Query optimization to improve performance of the code executionAlexander Decker
 
11.query optimization to improve performance of the code execution
11.query optimization to improve performance of the code execution11.query optimization to improve performance of the code execution
11.query optimization to improve performance of the code executionAlexander Decker
 
Dotnet a graph-based consensus maximization approach for combining multiple ...
Dotnet  a graph-based consensus maximization approach for combining multiple ...Dotnet  a graph-based consensus maximization approach for combining multiple ...
Dotnet a graph-based consensus maximization approach for combining multiple ...Ecwaytech
 
A graph based consensus maximization approach for combining multiple supervis...
A graph based consensus maximization approach for combining multiple supervis...A graph based consensus maximization approach for combining multiple supervis...
A graph based consensus maximization approach for combining multiple supervis...Ecway2004
 
A graph based consensus maximization approach for combining multiple supervis...
A graph based consensus maximization approach for combining multiple supervis...A graph based consensus maximization approach for combining multiple supervis...
A graph based consensus maximization approach for combining multiple supervis...ecwayprojects
 
A graph based consensus maximization approach for combining multiple supervis...
A graph based consensus maximization approach for combining multiple supervis...A graph based consensus maximization approach for combining multiple supervis...
A graph based consensus maximization approach for combining multiple supervis...Ecwaytechnoz
 
A graph based consensus maximization approach for combining multiple supervis...
A graph based consensus maximization approach for combining multiple supervis...A graph based consensus maximization approach for combining multiple supervis...
A graph based consensus maximization approach for combining multiple supervis...Ecwaytech
 
A graph based consensus maximization approach for combining multiple supervis...
A graph based consensus maximization approach for combining multiple supervis...A graph based consensus maximization approach for combining multiple supervis...
A graph based consensus maximization approach for combining multiple supervis...Ecwayt
 
A graph based consensus maximization approach for combining multiple supervis...
A graph based consensus maximization approach for combining multiple supervis...A graph based consensus maximization approach for combining multiple supervis...
A graph based consensus maximization approach for combining multiple supervis...Ecwaytechnoz
 
A graph based consensus maximization approach for combining multiple supervis...
A graph based consensus maximization approach for combining multiple supervis...A graph based consensus maximization approach for combining multiple supervis...
A graph based consensus maximization approach for combining multiple supervis...Ecwayt
 

Similar to 3 (20)

K04302082087
K04302082087K04302082087
K04302082087
 
RELATIONAL STORAGE FOR XML RULES
RELATIONAL STORAGE FOR XML RULESRELATIONAL STORAGE FOR XML RULES
RELATIONAL STORAGE FOR XML RULES
 
RELATIONAL STORAGE FOR XML RULES
RELATIONAL STORAGE FOR XML RULESRELATIONAL STORAGE FOR XML RULES
RELATIONAL STORAGE FOR XML RULES
 
Identifying and classifying unknown Network Disruption
Identifying and classifying unknown Network DisruptionIdentifying and classifying unknown Network Disruption
Identifying and classifying unknown Network Disruption
 
Marvin_Capstone
Marvin_CapstoneMarvin_Capstone
Marvin_Capstone
 
Effective Data Retrieval in XML using TreeMatch Algorithm
Effective Data Retrieval in XML using TreeMatch AlgorithmEffective Data Retrieval in XML using TreeMatch Algorithm
Effective Data Retrieval in XML using TreeMatch Algorithm
 
Zhao huang deep sim deep learning code functional similarity
Zhao huang deep sim   deep learning code functional similarityZhao huang deep sim   deep learning code functional similarity
Zhao huang deep sim deep learning code functional similarity
 
Query optimization to improve performance of the code execution
Query optimization to improve performance of the code executionQuery optimization to improve performance of the code execution
Query optimization to improve performance of the code execution
 
11.query optimization to improve performance of the code execution
11.query optimization to improve performance of the code execution11.query optimization to improve performance of the code execution
11.query optimization to improve performance of the code execution
 
2
22
2
 
2
22
2
 
Final proj 2 (1)
Final proj 2 (1)Final proj 2 (1)
Final proj 2 (1)
 
Dotnet a graph-based consensus maximization approach for combining multiple ...
Dotnet  a graph-based consensus maximization approach for combining multiple ...Dotnet  a graph-based consensus maximization approach for combining multiple ...
Dotnet a graph-based consensus maximization approach for combining multiple ...
 
A graph based consensus maximization approach for combining multiple supervis...
A graph based consensus maximization approach for combining multiple supervis...A graph based consensus maximization approach for combining multiple supervis...
A graph based consensus maximization approach for combining multiple supervis...
 
A graph based consensus maximization approach for combining multiple supervis...
A graph based consensus maximization approach for combining multiple supervis...A graph based consensus maximization approach for combining multiple supervis...
A graph based consensus maximization approach for combining multiple supervis...
 
A graph based consensus maximization approach for combining multiple supervis...
A graph based consensus maximization approach for combining multiple supervis...A graph based consensus maximization approach for combining multiple supervis...
A graph based consensus maximization approach for combining multiple supervis...
 
A graph based consensus maximization approach for combining multiple supervis...
A graph based consensus maximization approach for combining multiple supervis...A graph based consensus maximization approach for combining multiple supervis...
A graph based consensus maximization approach for combining multiple supervis...
 
A graph based consensus maximization approach for combining multiple supervis...
A graph based consensus maximization approach for combining multiple supervis...A graph based consensus maximization approach for combining multiple supervis...
A graph based consensus maximization approach for combining multiple supervis...
 
A graph based consensus maximization approach for combining multiple supervis...
A graph based consensus maximization approach for combining multiple supervis...A graph based consensus maximization approach for combining multiple supervis...
A graph based consensus maximization approach for combining multiple supervis...
 
A graph based consensus maximization approach for combining multiple supervis...
A graph based consensus maximization approach for combining multiple supervis...A graph based consensus maximization approach for combining multiple supervis...
A graph based consensus maximization approach for combining multiple supervis...
 

More from IMPULSE_TECHNOLOGY (20)

17
1717
17
 
16
1616
16
 
15
1515
15
 
25
2525
25
 
24
2424
24
 
23
2323
23
 
22
2222
22
 
21
2121
21
 
20
2020
20
 
19
1919
19
 
18
1818
18
 
16
1616
16
 
15
1515
15
 
14
1414
14
 
13
1313
13
 
12
1212
12
 
11
1111
11
 
10
1010
10
 
9
99
9
 
8
88
8
 

Recently uploaded

Barangay Council for the Protection of Children (BCPC) Orientation.pptx
Barangay Council for the Protection of Children (BCPC) Orientation.pptxBarangay Council for the Protection of Children (BCPC) Orientation.pptx
Barangay Council for the Protection of Children (BCPC) Orientation.pptxCarlos105
 
Student Profile Sample - We help schools to connect the data they have, with ...
Student Profile Sample - We help schools to connect the data they have, with ...Student Profile Sample - We help schools to connect the data they have, with ...
Student Profile Sample - We help schools to connect the data they have, with ...Seán Kennedy
 
Concurrency Control in Database Management system
Concurrency Control in Database Management systemConcurrency Control in Database Management system
Concurrency Control in Database Management systemChristalin Nelson
 
4.16.24 21st Century Movements for Black Lives.pptx
4.16.24 21st Century Movements for Black Lives.pptx4.16.24 21st Century Movements for Black Lives.pptx
4.16.24 21st Century Movements for Black Lives.pptxmary850239
 
How to do quick user assign in kanban in Odoo 17 ERP
How to do quick user assign in kanban in Odoo 17 ERPHow to do quick user assign in kanban in Odoo 17 ERP
How to do quick user assign in kanban in Odoo 17 ERPCeline George
 
Culture Uniformity or Diversity IN SOCIOLOGY.pptx
Culture Uniformity or Diversity IN SOCIOLOGY.pptxCulture Uniformity or Diversity IN SOCIOLOGY.pptx
Culture Uniformity or Diversity IN SOCIOLOGY.pptxPoojaSen20
 
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
4.18.24 Movement Legacies, Reflection, and Review.pptx
4.18.24 Movement Legacies, Reflection, and Review.pptx4.18.24 Movement Legacies, Reflection, and Review.pptx
4.18.24 Movement Legacies, Reflection, and Review.pptxmary850239
 
ENGLISH6-Q4-W3.pptxqurter our high choom
ENGLISH6-Q4-W3.pptxqurter our high choomENGLISH6-Q4-W3.pptxqurter our high choom
ENGLISH6-Q4-W3.pptxqurter our high choomnelietumpap1
 
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptxECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptxiammrhaywood
 
AMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdf
AMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdfAMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdf
AMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdfphamnguyenenglishnb
 
FILIPINO PSYCHology sikolohiyang pilipino
FILIPINO PSYCHology sikolohiyang pilipinoFILIPINO PSYCHology sikolohiyang pilipino
FILIPINO PSYCHology sikolohiyang pilipinojohnmickonozaleda
 
Proudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptxProudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptxthorishapillay1
 
ACC 2024 Chronicles. Cardiology. Exam.pdf
ACC 2024 Chronicles. Cardiology. Exam.pdfACC 2024 Chronicles. Cardiology. Exam.pdf
ACC 2024 Chronicles. Cardiology. Exam.pdfSpandanaRallapalli
 
Judging the Relevance and worth of ideas part 2.pptx
Judging the Relevance  and worth of ideas part 2.pptxJudging the Relevance  and worth of ideas part 2.pptx
Judging the Relevance and worth of ideas part 2.pptxSherlyMaeNeri
 
Influencing policy (training slides from Fast Track Impact)
Influencing policy (training slides from Fast Track Impact)Influencing policy (training slides from Fast Track Impact)
Influencing policy (training slides from Fast Track Impact)Mark Reed
 
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTSGRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTSJoshuaGantuangco2
 

Recently uploaded (20)

Barangay Council for the Protection of Children (BCPC) Orientation.pptx
Barangay Council for the Protection of Children (BCPC) Orientation.pptxBarangay Council for the Protection of Children (BCPC) Orientation.pptx
Barangay Council for the Protection of Children (BCPC) Orientation.pptx
 
FINALS_OF_LEFT_ON_C'N_EL_DORADO_2024.pptx
FINALS_OF_LEFT_ON_C'N_EL_DORADO_2024.pptxFINALS_OF_LEFT_ON_C'N_EL_DORADO_2024.pptx
FINALS_OF_LEFT_ON_C'N_EL_DORADO_2024.pptx
 
YOUVE_GOT_EMAIL_PRELIMS_EL_DORADO_2024.pptx
YOUVE_GOT_EMAIL_PRELIMS_EL_DORADO_2024.pptxYOUVE_GOT_EMAIL_PRELIMS_EL_DORADO_2024.pptx
YOUVE_GOT_EMAIL_PRELIMS_EL_DORADO_2024.pptx
 
Student Profile Sample - We help schools to connect the data they have, with ...
Student Profile Sample - We help schools to connect the data they have, with ...Student Profile Sample - We help schools to connect the data they have, with ...
Student Profile Sample - We help schools to connect the data they have, with ...
 
Concurrency Control in Database Management system
Concurrency Control in Database Management systemConcurrency Control in Database Management system
Concurrency Control in Database Management system
 
Raw materials used in Herbal Cosmetics.pptx
Raw materials used in Herbal Cosmetics.pptxRaw materials used in Herbal Cosmetics.pptx
Raw materials used in Herbal Cosmetics.pptx
 
4.16.24 21st Century Movements for Black Lives.pptx
4.16.24 21st Century Movements for Black Lives.pptx4.16.24 21st Century Movements for Black Lives.pptx
4.16.24 21st Century Movements for Black Lives.pptx
 
How to do quick user assign in kanban in Odoo 17 ERP
How to do quick user assign in kanban in Odoo 17 ERPHow to do quick user assign in kanban in Odoo 17 ERP
How to do quick user assign in kanban in Odoo 17 ERP
 
Culture Uniformity or Diversity IN SOCIOLOGY.pptx
Culture Uniformity or Diversity IN SOCIOLOGY.pptxCulture Uniformity or Diversity IN SOCIOLOGY.pptx
Culture Uniformity or Diversity IN SOCIOLOGY.pptx
 
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
 
4.18.24 Movement Legacies, Reflection, and Review.pptx
4.18.24 Movement Legacies, Reflection, and Review.pptx4.18.24 Movement Legacies, Reflection, and Review.pptx
4.18.24 Movement Legacies, Reflection, and Review.pptx
 
ENGLISH6-Q4-W3.pptxqurter our high choom
ENGLISH6-Q4-W3.pptxqurter our high choomENGLISH6-Q4-W3.pptxqurter our high choom
ENGLISH6-Q4-W3.pptxqurter our high choom
 
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptxECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
 
AMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdf
AMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdfAMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdf
AMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdf
 
FILIPINO PSYCHology sikolohiyang pilipino
FILIPINO PSYCHology sikolohiyang pilipinoFILIPINO PSYCHology sikolohiyang pilipino
FILIPINO PSYCHology sikolohiyang pilipino
 
Proudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptxProudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptx
 
ACC 2024 Chronicles. Cardiology. Exam.pdf
ACC 2024 Chronicles. Cardiology. Exam.pdfACC 2024 Chronicles. Cardiology. Exam.pdf
ACC 2024 Chronicles. Cardiology. Exam.pdf
 
Judging the Relevance and worth of ideas part 2.pptx
Judging the Relevance  and worth of ideas part 2.pptxJudging the Relevance  and worth of ideas part 2.pptx
Judging the Relevance and worth of ideas part 2.pptx
 
Influencing policy (training slides from Fast Track Impact)
Influencing policy (training slides from Fast Track Impact)Influencing policy (training slides from Fast Track Impact)
Influencing policy (training slides from Fast Track Impact)
 
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTSGRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
 

3

  • 1. Impulse Technologies Beacons U to World of technology 044-42133143, 98401 03301,9841091117 ieeeprojects@yahoo.com www.impulse.net.in Efficient and Effective Duplicate Detection in Hierarchical Data Abstract Although there is a long line of work on identifying duplicates in relational data, only a few solutions focus on duplicate detection in more complex hierarchical structures, like XML data. In this paper, we present a novel method for XML duplicate detection, called XMLDup. XMLDup uses a Bayesian network to determine the probability of two XML elements being duplicates, considering not only the information within the elements, but also the way that information is structured. In addition, to improve the efficiency of the network evaluation, a novel pruning strategy, capable of significant gains over the unoptimized version of the algorithm, is presented. Through experiments, we show that our algorithm is able to achieve high precision and recall scores in several datasets. XMLDup is also able to outperform another state of the art duplicate detection solution, both in terms of efficiency and of effectiveness. Finally, we also study how important the structure of elements is in the duplicate detection process. We observe that, not only structure can clearly influence the outcome, but also that, by ensuring a structure that is adequate to the characteristics of the data, we can actually improve the quality of the results. Your Own Ideas or Any project from any company can be Implemented at Better price (All Projects can be done in Java or DotNet whichever the student wants) 1