NAME: DEEPALI RAIKAR
ROLL NO: 11150157
MSC.IT (PART – I)
SIGNATURE FILES
A “SIGNATURE FILE” is typically treated as just a “BAG OF WORDS”. Signature files are a technique used for “Document Retrieval”. The main idea behind signature files is to provide a quick link to the documents that match the queries submitted by the user. This is done by creating a signature for each document.
A signature is created as an “abstraction” of a document, so the signatures together act as a compressed version of the database. All the signatures that represent the documents are kept in a file called the “SIGNATURE FILE”. The signatures are stored in the form of “HASH TABLES” to make retrieving the documents easy.
Characteristics of signature files
  • Word-oriented index structure
  • Low overhead
  • Suitable for text that is not very large
  • Suitable for conventional databases
  • For most applications, inverted files outperform signature files
There are various types of signatures, namely:
  • Word signatures: a fixed-length bit-string representation of a word
  • Document signatures
  • Query signatures
How word signatures are generated
Word signatures are generated using “TRIPLETS” of the word:
  • Each word is divided into overlapping triplets of characters.
  • Each triplet is given a numeric value.
  • The number is used as the input to the hash function.
  • The hash function produces a number that represents the bit position of the triplet in the word signature.
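Example of a word signature
111000111001 is the signature created for the word “SIGNATURE”.

Triplet:         RE*  *SI  SIG  IGN  GNA  NAT  ATU  TUR  URE
Numeric value:    12    3    7    3    2    9    1   12    8

Final word signature generated using the hash function: 111000111001
The distinct values listed (1, 2, 3, 7, 8, 9 and 12) are exactly the positions of the 1 bits in the 12-bit signature, counting from the left.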
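The triplet steps and the example above can be sketched in Python. This is a minimal sketch under stated assumptions, not the hash function used on the slide (which is not given): the function names, the use of "*" padding on both ends of the word, and CRC32 as the hash are all illustrative choices. The table SLIDE_POSITIONS simply copies the numbers from the slide so that the 12-bit result can be reproduced.

```python
import zlib

def triplets(word, pad="*"):
    """Return the overlapping character triplets of a word padded on both ends."""
    padded = pad + word.upper() + pad
    return [padded[i:i + 3] for i in range(len(padded) - 2)]

def bits_to_signature(positions, width):
    """Set the given 1-based bit positions (counted from the left) in a bit string."""
    bits = ["0"] * width
    for p in positions:
        bits[p - 1] = "1"
    return "".join(bits)

def word_signature(word, width=12):
    """Hash each triplet to a bit position and set that bit.

    CRC32 is an assumption standing in for the unspecified hash function;
    it turns the triplet's bytes into a number, which is reduced modulo width.
    """
    positions = [zlib.crc32(t.encode("utf-8")) % width + 1 for t in triplets(word)]
    return bits_to_signature(positions, width)

# Numbers copied from the slide's example for the word SIGNATURE.
SLIDE_POSITIONS = {"RE*": 12, "*SI": 3, "SIG": 7, "IGN": 3, "GNA": 2,
                   "NAT": 9, "ATU": 1, "TUR": 12, "URE": 8}

slide_signature = bits_to_signature(
    [SLIDE_POSITIONS[t] for t in triplets("SIGNATURE")], 12)

print(triplets("SIGNATURE"))
print(slide_signature)  # 111000111001
```

Two triplets may hash to the same position (on the slide, *SI and IGN both give 3), so a signature can have fewer 1 bits than the word has triplets.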
Document signature
A document signature can be created using two methods:
  • Concatenation of word signatures
  • Superimposed coding
Characteristics of document signatures
  • The length can vary.
  • A fixed number of bits may precede.
  • Fixing the length of the document signature is possible.
  • The length can be set to that of the longest document in the collection.
  • For shorter documents, extra “0” bits can be added.
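The two construction methods can be sketched as follows. This is an illustrative sketch, not the deck's implementation: the function names are hypothetical, padding with trailing zeros is an assumption (the slide does not say where the extra zeros go), and the query check shown is the usual bitwise test for superimposed codes, where every 1 bit of the query signature must also be set in the document signature.

```python
def concatenate(word_signatures):
    """Document signature as the word signatures joined end to end."""
    return "".join(word_signatures)

def superimpose(word_signatures):
    """Document signature as the bitwise OR of equal-length word signatures."""
    width = len(word_signatures[0])
    combined = 0
    for sig in word_signatures:
        combined |= int(sig, 2)
    return format(combined, "0{}b".format(width))

def pad_to(signature, length):
    """Pad a shorter signature with trailing zeros (placement is an assumption)."""
    return signature.ljust(length, "0")

def may_contain(document_signature, query_signature):
    """True if every bit set in the query is also set in the document."""
    doc = int(document_signature, 2)
    query = int(query_signature, 2)
    return (doc & query) == query

doc = superimpose(["111000111001", "000100000010"])
print(doc)                                # 111100111011
print(may_contain(doc, "111000111001"))   # True
print(concatenate(["101", "011"]))        # 101011
print(pad_to("101", 6))                   # 101000
```

With superimposed coding a match only says the document may contain the word, because bits set by different words can combine to cover a query signature, so matches are normally verified against the document text.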
Example of signature file
Which is better: inverted file or signature file?
Inverted files
  • Accurate
  • Easy to maintain
  • Slow retrieval
Inverted files are the most popular storage structure for “INFORMATION RETRIEVAL”.