3. A signature file is essentially a “bag of words”
Signature files are a technique for document retrieval. The main idea is to give quick access to the documents that match a user's query without scanning the full text of every document. This is done by creating a signature for each document and searching the much smaller signatures instead.
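4. A signature is an abstraction of a document: a compact, lossy bit-string representation of the document's content (not of the whole database). The signatures of all documents are kept in a file called the signature file. Signatures are produced with hash functions, which makes it fast to identify candidate documents. Because different words can hash to the same bits, a signature match may be a false drop (a document that matches the query signature but does not actually contain the query words), so candidates are checked against the actual text.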
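The filtering idea behind signature files can be sketched as follows. This is a minimal illustration under stated assumptions, not the method from these slides: each word sets a single bit chosen by `zlib.crc32` (a stand-in hash), the width `F = 16` is arbitrary, and the names `word_bit`, `signature` and `search` are hypothetical. A document is a candidate only if its signature contains every bit of the query signature; candidates are then checked against the text to remove false drops (matches caused purely by hash collisions).

```python
import zlib

F = 16  # hypothetical signature width in bits


def word_bit(word: str) -> int:
    """Map a word to one bit position in [0, F) using a stand-in hash (CRC-32)."""
    return zlib.crc32(word.lower().encode()) % F


def signature(words) -> int:
    """OR the bits of all words into one integer used as a bit string."""
    sig = 0
    for w in words:
        sig |= 1 << word_bit(w)
    return sig


def search(docs, query_words):
    """Scan the signature file, then verify candidates against the full text."""
    sig_file = [signature(text.split()) for text in docs]  # one signature per document
    q = signature(query_words)
    # Filter step: every bit set in the query signature must be set in the document's.
    candidates = [i for i, s in enumerate(sig_file) if s & q == q]
    # Verification step: drop false drops caused by hash collisions.
    hits = [
        i for i in candidates
        if all(w.lower() in docs[i].lower().split() for w in query_words)
    ]
    return candidates, hits


docs = ["signature files index text", "inverted files are popular", "hash functions set bits"]
print(search(docs, ["files"]))
```

Only the cheap bitwise test runs over the whole signature file; the expensive text check is limited to the candidates.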
5. Characteristics of signature files
A signature file is a word-oriented index structure with low space overhead. It is suitable for texts that are not very large and for conventional databases. For most applications, however, inverted files outperform signature files.
6. There are three types of signatures: word signatures, document signatures and query signatures. A word signature is a fixed-length bit-string representation of a single word.
7. How word signatures are generated
Word signatures are generated from the character triplets of a word. The word is divided into overlapping triplets of characters, and each triplet is given a numeric value. That number is used as the input to a hash function, which produces the bit position of the triplet in the word signature; that bit is set to 1.
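The triplet procedure can be sketched in Python. The specific choices here are assumptions for illustration only: `*` pads the word boundaries, a triplet's numeric value is its base-27 code (`*` = 0, A = 1 … Z = 26, letters only), and the hash is `value % F + 1`, giving bit positions 1 to F counted from the left. The helper names `triplets`, `triplet_value`, `bit_position` and `word_signature` are hypothetical, and this hash will not reproduce the bit positions in the SIGNATURE example.

```python
F = 12  # hypothetical word-signature width in bits


def triplets(word):
    """Split a word into overlapping character triplets, padding boundaries with '*'."""
    padded = "*" + word.upper() + "*"
    return [padded[i:i + 3] for i in range(len(padded) - 2)]


def triplet_value(tri):
    """Numeric value of a triplet as a base-27 number ('*' = 0, A = 1 ... Z = 26)."""
    value = 0
    for ch in tri:
        digit = 0 if ch == "*" else ord(ch) - ord("A") + 1
        value = value * 27 + digit
    return value


def bit_position(value, f=F):
    """Toy hash: map a numeric value to a 1-based bit position in 1..f."""
    return value % f + 1


def word_signature(word, f=F):
    """Set the bit chosen for each triplet; colliding triplets share a bit."""
    bits = ["0"] * f
    for tri in triplets(word):
        bits[bit_position(triplet_value(tri), f) - 1] = "1"
    return "".join(bits)


print(triplets("signature"))
print(word_signature("signature"))
```

Because several triplets can hash to the same position, a word signature usually has fewer 1 bits than the word has triplets.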
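8. Example of a word signature
The word “SIGNATURE” is divided into nine overlapping triplets, with * marking the word boundaries: *SI, SIG, IGN, GNA, NAT, ATU, TUR, URE, RE*. Hashing the triplets gives the bit positions 12, 3, 7, 3, 2, 9, 1, 12 and 8. Setting those bits (numbered 1 to 12 from the left) in a 12-bit string gives the final word signature 111000111001. Positions 3 and 12 each occur twice, so two pairs of triplets collide and only seven distinct bits are set.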
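The arithmetic of the SIGNATURE example can be checked with a few lines of Python. The function name `signature_from_positions` is hypothetical; it simply sets each 1-based bit position, counted from the left, in a 12-bit string.

```python
def signature_from_positions(positions, width=12):
    """Set each 1-based bit position (counted from the left) to 1."""
    bits = ["0"] * width
    for p in positions:
        bits[p - 1] = "1"  # setting an already-set bit (a collision) changes nothing
    return "".join(bits)


positions = [12, 3, 7, 3, 2, 9, 1, 12, 8]  # bit positions from the example
print(signature_from_positions(positions))  # 111000111001
print(len(set(positions)))                  # 7 distinct bits
```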
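9. Document signatures
A document signature can be created in two ways: by concatenating the word signatures of the document's words, or by superimposed coding, which combines the word signatures with a bitwise OR. Characteristics of document signatures: with concatenation, the length varies with the number of words in the document. A fixed number of bits may precede the signature. Fixing the length of the document signature is also possible: the length can be set to that of the longest document in the collection, and extra 0 bits are added to the signatures of shorter documents.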
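The two construction methods can be compared with a small sketch. The 4-bit word signatures below are made-up example values, and the function names `concatenate` and `superimpose` are hypothetical. Concatenation pads shorter documents with 0 bits up to the length of the longest document; superimposed coding ORs the word signatures, so its result always has the width of a single word signature.

```python
def concatenate(word_sigs, total_len=None):
    """Concatenate word signatures; optionally pad with 0 bits to a fixed length."""
    sig = "".join(word_sigs)
    if total_len is not None:
        sig = sig.ljust(total_len, "0")  # extra 0s for shorter documents
    return sig


def superimpose(word_sigs, width):
    """Superimposed coding: bitwise OR of all word signatures."""
    result = 0
    for s in word_sigs:
        result |= int(s, 2)
    return format(result, f"0{width}b")


# Hypothetical 4-bit word signatures for two small documents.
doc1 = ["1100", "0011"]
doc2 = ["1010"]
longest = max(len(d) for d in (doc1, doc2)) * 4  # bits needed by the longest document

print(concatenate(doc1, longest), concatenate(doc2, longest))  # 11000011 10100000
print(superimpose(doc1, 4), superimpose(doc2, 4))              # 1111 1010
```

Concatenation keeps every word signature separately at the cost of length, while superimposed coding stays compact but sets more bits as documents grow, which raises the chance of false drops.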
11. Which is better, an inverted file or a signature file?
Inverted files: accurate (they return no false drops), easy to maintain, slow retrieval. Inverted files are the most popular storage structure for information retrieval.