Models
=======

.. automodule:: replay.models

RePlay Recommenders
___________________

.. csv-table::
   :header: "Algorithm", "Implementation"
   :widths: 10, 10

   "Popular Recommender", "PySpark"
   "Popular By Users", "PySpark"
   "Wilson Recommender", "PySpark"
   "Random Recommender", "PySpark"
   "K-Nearest Neighbours", "PySpark"
   "Alternating Least Squares", "PySpark"
   "SLIM", "PySpark"
   "Word2Vec Recommender", "PySpark"
   "Association Rules Item-to-Item Recommender", "PySpark"
   "Cluster Recommender", "PySpark"
   "Neural Matrix Factorization", "Python CPU/GPU"
   "MultVAE", "Python CPU/GPU"
   "ADMM SLIM", "Python CPU"
   "Implicit Wrapper", "Python CPU"
   "LightFM Wrapper", "Python CPU"

To get more info on how to choose a base model, please see this :doc:`page `.

Recommender interface
____________________________

.. autoclass:: replay.models.Recommender
   :members:

.. autoclass:: replay.models.base_rec.BaseRecommender
   :members: optimize
   :noindex:

Distributed models
__________________

Models with both training and inference implemented in pyspark.

Popular Recommender
```````````````````

.. autoclass:: replay.models.PopRec

User Popular Recommender
````````````````````````

.. autoclass:: replay.models.UserPopRec

Wilson Recommender
``````````````````

The confidence interval for a binomial distribution can be calculated as:

.. math::
    WilsonScore = \frac{\widehat{p} + \frac{z_{\alpha/2}^2}{2n} \pm z_{\alpha/2}
    \sqrt{\frac{\widehat{p}(1 - \widehat{p}) + \frac{z_{\alpha/2}^2}{4n}}{n}}}
    {1 + \frac{z_{\alpha/2}^2}{n}}

where :math:`\widehat{p}` is the observed fraction of positive ratings and
:math:`z_{\alpha/2}` is the :math:`1 - \frac{\alpha}{2}` quantile of the standard
normal distribution.

.. autoclass:: replay.models.Wilson

Random Recommender
``````````````````

.. autoclass:: replay.models.RandomRec
   :special-members: __init__

K Nearest Neighbours
````````````````````

.. autoclass:: replay.models.KNN
   :special-members: __init__

.. _als-rec:

Alternating Least Squares
`````````````````````````

.. autoclass:: replay.models.ALSWrap
   :special-members: __init__

SLIM
````

The SLIM Recommender learns an item similarity matrix :math:`W` and uses it to produce
recommendations. The loss function is:

.. math::
    L = \frac{1}{2}||A - A W||^2_F + \frac{\beta}{2} ||W||_F^2 + \lambda ||W||_1

where :math:`W` is the item similarity matrix and :math:`A` is the interaction matrix.

Finding :math:`W` can be split into solving a separate linear regression with ElasticNet
regularization for each item. Thus each similarity vector :math:`w_j` is optimized with

.. math::
    l = \frac{1}{2}||a_j - A w_j||^2_2 + \frac{\beta}{2} ||w_j||_2^2 + \lambda ||w_j||_1

To exclude the trivial solution, we add the constraints :math:`w_{jj} = 0` and
:math:`w_{ij} \ge 0`.
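The per-item problem above is a standard ElasticNet regression with a non-negativity
constraint. Below is a minimal scikit-learn sketch of this step (illustrative only, not
the RePlay implementation; the function name and the hyperparameter mapping are
assumptions):

.. code-block:: python

    # Illustrative sketch of the per-item ElasticNet step; not RePlay's SLIM code.
    import numpy as np
    from sklearn.linear_model import ElasticNet


    def fit_slim_vector(A: np.ndarray, j: int, beta: float, lmbda: float) -> np.ndarray:
        """Fit one similarity vector w_j with w_jj = 0 and w_ij >= 0."""
        target = A[:, j].copy()
        # Zero out the j-th column so the item cannot be used to predict itself.
        A_masked = A.copy()
        A_masked[:, j] = 0.0
        # sklearn minimizes ||y - Xw||^2 / (2 * n_samples) + alpha * l1_ratio * ||w||_1
        # + 0.5 * alpha * (1 - l1_ratio) * ||w||_2^2, so up to the 1/n_samples factor
        # alpha corresponds to (lmbda + beta) and l1_ratio to lmbda / (lmbda + beta).
        model = ElasticNet(
            alpha=lmbda + beta,
            l1_ratio=lmbda / (lmbda + beta),
            positive=True,          # enforce w_ij >= 0
            fit_intercept=False,
        )
        model.fit(A_masked, target)
        w_j = model.coef_
        w_j[j] = 0.0  # enforce w_jj = 0 explicitly
        return w_j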
.. autoclass:: replay.models.SLIM
   :special-members: __init__

Word2Vec Recommender
````````````````````

.. autoclass:: replay.models.Word2VecRec
   :special-members: __init__

Association Rules Item-to-Item Recommender
``````````````````````````````````````````

.. autoclass:: replay.models.AssociationRulesItemRec
   :special-members: __init__
   :members: get_nearest_items

Cluster Recommender
```````````````````

.. autoclass:: replay.models.ClusterRec
   :special-members: __init__

Neural models with distributed inference
________________________________________

Models implemented in pytorch with distributed inference in pyspark.

Neural Matrix Factorization
```````````````````````````

.. autoclass:: replay.models.NeuroMF
   :special-members: __init__

Mult-VAE
````````

Variational AutoEncoder

.. image:: /images/vae-gaussian.png

**Problem formulation**

We have a sample of independent identically distributed random values from the true
distribution: :math:`x_i \sim p_d(x)`, :math:`i = 1, \dots, N`. The goal is to build a
probability model :math:`p_\theta(x)` of the true distribution :math:`p_d(x)`. The model
:math:`p_\theta(x)` should allow both estimating the probability density of a given item
:math:`x` and sampling :math:`x \sim p_\theta(x)`.

**Probability model**

:math:`z \in \mathbb{R}^d` is a local latent variable, one for each item :math:`x`.

The generative process of the variational autoencoder is:

1. Sample :math:`z \sim p(z)`.
2. Sample :math:`x \sim p_\theta(x | z)`.

The parameters of the distribution :math:`p_\theta(x | z)` are produced by a neural
network with weights :math:`\theta` that takes :math:`z` as input.

The probability density of an item :math:`x`:

.. math::
    p_\theta(x) = \mathbb{E}_{z \sim p(z)} p_\theta(x | z)

We optimize a lower bound on the log likelihood.

.. math::
    \log p_\theta(x) = \mathbb{E}_{z \sim q_\phi(z | x)} \log p_\theta(x) =
    \mathbb{E}_{z \sim q_\phi(z | x)} \log \frac{p_\theta(x, z) q_\phi(z | x)}
    {q_\phi(z | x) p_\theta(z | x)} = \\
    = \mathbb{E}_{z \sim q_\phi(z | x)} \log \frac{p_\theta(x, z)}{q_\phi(z | x)} +
    KL(q_\phi(z | x) || p_\theta(z | x))

.. math::
    \log p_\theta(x) \geqslant \mathbb{E}_{z \sim q_\phi(z | x)} \log
    \frac{p_\theta(x | z) p(z)}{q_\phi(z | x)} =
    \mathbb{E}_{z \sim q_\phi(z | x)} \log p_\theta(x | z) -
    KL(q_\phi(z | x) || p(z)) = \\
    = L(x; \phi, \theta) \to \max\limits_{\phi, \theta}

:math:`q_\phi(z | x)` is the proposal (recognition) distribution: a Gaussian whose
parameters are produced by a network with weights :math:`\phi`,
:math:`q_\phi(z | x) = \mathcal{N}(z | \mu_\phi(x), \sigma^2_\phi(x)I)`.

The gap between the lower bound :math:`L(x; \phi, \theta)` and the log likelihood
:math:`\log p_\theta(x)` is the KL divergence between the proposal and the posterior
distribution over :math:`z`: :math:`KL(q_\phi(z | x) || p_\theta(z | x))`. For fixed model
parameters :math:`\theta`, the maximum of :math:`L(x; \phi, \theta)` is reached at
:math:`q_\phi(z | x) = p_\theta(z | x)`, but :math:`p_\theta(z | x)` cannot be computed
efficiently, so the bound is also optimized over :math:`\phi`. The closer
:math:`q_\phi(z | x)` is to :math:`p_\theta(z | x)`, the tighter the bound.

We usually take a standard normal distribution for :math:`p(z)` and use the
reparameterization trick:

.. math::
    \varepsilon \sim \mathcal{N}(\varepsilon | 0, I)

.. math::
    z = \mu + \sigma \varepsilon \Rightarrow z \sim \mathcal{N}(z | \mu, \sigma^2 I)

.. math::
    \frac{\partial}{\partial \phi} L(x; \phi, \theta) = \mathbb{E}_{\varepsilon \sim
    \mathcal{N}(\varepsilon | 0, I)} \frac{\partial}{\partial \phi}
    \log p_\theta(x | \mu_\phi(x) + \sigma_\phi(x) \varepsilon) -
    \frac{\partial}{\partial \phi} KL(q_\phi(z | x) || p(z))

.. math::
    \frac{\partial}{\partial \theta} L(x; \phi, \theta) = \mathbb{E}_{z \sim q_\phi(z | x)}
    \frac{\partial}{\partial \theta} \log p_\theta(x | z)

In this case

.. math::
    KL(q_\phi(z | x) || p(z)) = -\frac{1}{2}\sum_{i=1}^{dimZ}(1 +
    \log(\sigma_i^2) - \mu_i^2 - \sigma_i^2)

The KL term can also be taken with a weight :math:`\beta` different from one; in this case:

.. math::
    L(x; \phi, \theta) = \mathbb{E}_{z \sim q_\phi(z | x)} \log p_\theta(x | z) -
    \beta \cdot KL(q_\phi(z | x) || p(z)) \to \max\limits_{\phi, \theta}

With :math:`\beta = 0` the VAE is equivalent to a Denoising AutoEncoder.
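The reparameterization trick and the :math:`\beta`-weighted objective above map directly
to code. The snippet below is a simplified PyTorch sketch with a multinomial likelihood
(illustrative only, not the MultVAE implementation in RePlay; the function names are
placeholders):

.. code-block:: python

    # Illustrative sketch of the reparameterization trick and the beta-weighted ELBO;
    # not RePlay's MultVAE code.
    import torch


    def reparameterize(mu: torch.Tensor, log_var: torch.Tensor) -> torch.Tensor:
        """z = mu + sigma * eps, where eps ~ N(0, I)."""
        std = torch.exp(0.5 * log_var)
        eps = torch.randn_like(std)
        return mu + std * eps


    def neg_elbo(logits: torch.Tensor, x: torch.Tensor, mu: torch.Tensor,
                 log_var: torch.Tensor, beta: float = 1.0) -> torch.Tensor:
        """Negative L(x; phi, theta) averaged over the batch."""
        # Reconstruction term: E_q log p_theta(x | z) for a multinomial decoder.
        log_probs = torch.log_softmax(logits, dim=-1)
        recon = -(log_probs * x).sum(dim=-1)
        # Closed-form KL(q_phi(z | x) || N(0, I)) from the formula above.
        kl = -0.5 * (1 + log_var - mu.pow(2) - log_var.exp()).sum(dim=-1)
        return (recon + beta * kl).mean()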
.. autoclass:: replay.models.MultVAE
   :special-members: __init__

Wrappers and other models with distributed inference
____________________________________________________

Wrappers for popular recommendation libraries and algorithms implemented in python with
distributed inference in pyspark.

ADMM SLIM
`````````

.. autoclass:: replay.models.ADMMSLIM
   :special-members: __init__

LightFM
```````

.. autoclass:: replay.models.LightFMWrap
   :special-members: __init__

implicit
````````

.. autoclass:: replay.models.ImplicitWrap
   :special-members: __init__
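As a rough illustration of how a wrapper might be used, the sketch below wraps an ALS
model from the ``implicit`` library. It assumes that ``ImplicitWrap`` takes the
``implicit`` model as its constructor argument and exposes the usual ``fit``/``predict``
interface of RePlay recommenders; check the class reference above for the exact
signatures.

.. code-block:: python

    # Hypothetical usage sketch; see the ImplicitWrap reference above for the exact API.
    import implicit

    from replay.models import ImplicitWrap

    # Assumption: the wrapper is constructed from a model of the implicit library.
    model = ImplicitWrap(implicit.als.AlternatingLeastSquares(factors=64))

    # Assumed fit/predict calls, where `log` is a Spark DataFrame with interactions:
    # model.fit(log)
    # recs = model.predict(log, k=10)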