Editing Information retrieval (section)

=== First dimension: mathematical basis ===
* ''Set-theoretic'' models represent documents as [[Set (mathematics)|set]]s of words or phrases. Similarities are usually derived from set-theoretic operations on those sets. Common models are:
** [[Standard Boolean model]]
** [[Extended Boolean model]]
** [[Fuzzy retrieval]]
* ''Algebraic models'' represent documents and queries usually as vectors, matrices, or tuples. The similarity of the query vector and document vector is represented as a scalar value.
** [[Vector space model]]
** [[Generalized vector space model]]
** [[Topic-based vector space model|(Enhanced) Topic-based Vector Space Model]]
** [[Extended Boolean model]]
** [[Latent semantic indexing]] a.k.a. [[latent semantic analysis]]
* ''Probabilistic models'' treat the process of document retrieval as a probabilistic inference. Similarities are computed as probabilities that a document is relevant for a given query. Probabilistic theorems like [[Bayes' theorem]] are often used in these models.
** [[Binary Independence Model]]
** [[Probabilistic relevance model]] on which is based the [[Probabilistic relevance model (BM25)|okapi (BM25)]] relevance function
** [[Uncertain inference]]
** [[Language model]]s
** [[Divergence-from-randomness model]]
** [[Latent Dirichlet allocation]]
* ''Feature-based retrieval models'' view documents as vectors of values of ''feature functions'' (or just ''features'') and seek the best way to combine these features into a single relevance score, typically by [[learning to rank]] methods. Feature functions are arbitrary functions of document and query, and as such can easily incorporate almost any other retrieval model as just another feature.