Set-theoretic models represent documents as sets of words or phrases. Similarities are usually derived from set-theoretic operations on those sets.
Algebraic models represent documents and queries usually as vectors, matrices, or tuples. The similarity of the query vector and document vector is represented as a scalar value
Probabilistic models treat the process of document retrieval as a probabilistic inference. Similarities are computed as probabilities that a document is relevant for a given query.
In fuzzy-set theory, an element has a varying degree of membership, say dA, to a given set A instead of the traditional membership choice (is an element/is not an element).
A statistical language model assigns a probability to a sequence of m words by means of a probability distribution.
Latent semantic indexing (LSI) is an indexing and retrieval method that uses a mathematical technique called Singular value decomposition (SVD) to identify patterns in the relationships between the terms and concepts contained in an unstructured collection of text.