Time Smoothing

The time module provides functions to apply time smoothing to item or interaction relevance.

replay.time.smoothe_time(log, decay=30, limit=0.1, kind='exp')

Weights the relevance column of the log with a time-dependent weight.

Parameters
  • log (Union[pandas.DataFrame, pyspark.sql.DataFrame]) – interactions log

  • decay (float) – number of days after which the weight is reduced by half; must be greater than 1

  • limit (float) – minimal value the weight can reach

  • kind (str) – type of smoothing, one of [power, exp, linear]. The corresponding functions are power: age^c, exp: c^age, linear: 1 - c*age.

Returns

modified DataFrame
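For reference, here is a minimal pure-Python sketch of the weight applied to a single interaction. This section does not state how c is derived from decay, so the formulas below are an assumption reconstructed from the example tables that follow (with decay=30 they reproduce, for instance, 0.8312... for the 8-day-old row under exp). In particular, the power kind appears to use (age + 1)^c so that the newest interaction keeps weight 1. time_weight is a hypothetical helper, not part of replay.

```python
import math


def time_weight(age_days: float, decay: float = 30, limit: float = 0.1, kind: str = "exp") -> float:
    """Hypothetical helper: time-dependent weight of an interaction that is `age_days` old.

    The formulas are inferred from the example outputs in this section, not taken from replay's source.
    """
    if decay <= 1:
        raise ValueError("decay must be greater than 1")
    if kind == "power":
        # age^c with c < 0; age is shifted by 1 so the newest record (age 0) gets weight 1
        c = math.log(0.5) / math.log(decay)
        weight = (age_days + 1) ** c
    elif kind == "exp":
        # c^age with c chosen so that the weight halves every `decay` days
        c = 0.5 ** (1 / decay)
        weight = c ** age_days
    elif kind == "linear":
        # 1 - c*age with c chosen so that the weight is 0.5 at age == decay
        c = 0.5 / decay
        weight = 1 - c * age_days
    else:
        raise ValueError(f"unknown kind: {kind}")
    # The weight never drops below `limit`
    return max(weight, limit)


# Ages in days of the example rows below, counted back from the latest date 2099-03-27
for kind in ("power", "exp", "linear"):
    print(kind, [round(time_weight(age, kind=kind), 4) for age in (8, 7, 5, 2, 0)])
```

The limit floor matters most for the linear kind, whose raw formula would turn negative after 2 * decay days.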

>>> import pandas as pd
>>> from replay.time import smoothe_time
>>> d = {}
>>> d["item_idx"] = [1, 1, 2, 3, 3]
>>> d["timestamp"] = ["2099-03-19", "2099-03-20", "2099-03-22", "2099-03-27", "2099-03-25"]
>>> d["relevance"] = [1, 1, 1, 1, 1]
>>> df = pd.DataFrame(d)
>>> df
   item_idx   timestamp  relevance
0         1  2099-03-19          1
1         1  2099-03-20          1
2         2  2099-03-22          1
3         3  2099-03-27          1
4         3  2099-03-25          1

Power smoothing falls quickly in the beginning but decays slowly afterwards as age^c.

>>> smoothe_time(df, kind="power").orderBy("timestamp").show()
+--------+-------------------+------------------+
|item_idx|          timestamp|         relevance|
+--------+-------------------+------------------+
|       1|2099-03-19 00:00:00|0.6390430306850825|
|       1|2099-03-20 00:00:00| 0.654567945027101|
|       2|2099-03-22 00:00:00|0.6940913454809814|
|       3|2099-03-25 00:00:00|0.7994016704292545|
|       3|2099-03-27 00:00:00|               1.0|
+--------+-------------------+------------------+

Exponential smoothing is the other way around: the weight falls slowly at first, and old objects decay more quickly, as c^age.

>>> smoothe_time(df, kind="exp").orderBy("timestamp").show()
+--------+-------------------+------------------+
|item_idx|          timestamp|         relevance|
+--------+-------------------+------------------+
|       1|2099-03-19 00:00:00|0.8312378961427874|
|       1|2099-03-20 00:00:00|0.8506671609508554|
|       2|2099-03-22 00:00:00| 0.890898718140339|
|       3|2099-03-25 00:00:00|0.9548416039104165|
|       3|2099-03-27 00:00:00|               1.0|
+--------+-------------------+------------------+
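To make the difference in shape concrete, the sketch below compares the raw power and exp curves for the default decay=30, without the limit floor. The shifted power formula (age + 1)^c is an assumption inferred from the example numbers, not taken from replay's source; power_weight and exp_weight are hypothetical names. Under these assumptions a 1-day-old interaction keeps a higher weight with exp, while a 100-day-old one keeps a higher weight with power.

```python
import math

DECAY = 30  # the default decay from the signature above


def power_weight(age_days: float) -> float:
    # age^c form, shifted by one day so that age 0 maps to 1 (assumption inferred from the examples)
    return (age_days + 1) ** (math.log(0.5) / math.log(DECAY))


def exp_weight(age_days: float) -> float:
    # c^age form with c = 0.5 ** (1 / DECAY), i.e. the weight halves every DECAY days
    return 0.5 ** (age_days / DECAY)


# No `limit` clipping here: the point is the raw shape of each curve
for age in (1, 10, 100, 365):
    print(f"age={age:>3}  power={power_weight(age):.3f}  exp={exp_weight(age):.3f}")
```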

The last type is linear smoothing: 1 - c*age.

>>> smoothe_time(df, kind="linear").orderBy("timestamp").show()
+--------+-------------------+------------------+
|item_idx|          timestamp|         relevance|
+--------+-------------------+------------------+
|       1|2099-03-19 00:00:00|0.8666666666666667|
|       1|2099-03-20 00:00:00|0.8833333333333333|
|       2|2099-03-22 00:00:00|0.9166666666666666|
|       3|2099-03-25 00:00:00|0.9666666666666667|
|       3|2099-03-27 00:00:00|               1.0|
+--------+-------------------+------------------+

These examples use a constant relevance of 1, so the resulting value equals the time-dependent weight. In general, the output relevance is the original relevance multiplied by this weight.

>>> d = {}
>>> d["item_idx"] = [1, 2, 3]
>>> d["timestamp"] = ["2099-03-19", "2099-03-20", "2099-03-22"]
>>> d["relevance"] = [10, 3, 0.1]
>>> df = pd.DataFrame(d)
>>> df
   item_idx   timestamp  relevance
0         1  2099-03-19       10.0
1         2  2099-03-20        3.0
2         3  2099-03-22        0.1
>>> smoothe_time(df).orderBy("timestamp").show()
+--------+-------------------+------------------+
|item_idx|          timestamp|         relevance|
+--------+-------------------+------------------+
|       1|2099-03-19 00:00:00| 9.330329915368074|
|       2|2099-03-20 00:00:00|2.8645248117312496|
|       3|2099-03-22 00:00:00|               0.1|
+--------+-------------------+------------------+
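The multiplication can be reproduced without Spark. Below is a minimal pure-Python sketch of the default exp kind, assuming ages are whole days counted back from the latest timestamp in the log and that the weight is floored at limit; smoothe_relevance is a hypothetical name, not part of replay, and it works on plain tuples rather than DataFrames.

```python
from datetime import date


def smoothe_relevance(rows, decay=30, limit=0.1):
    """Hypothetical sketch of exp smoothing: relevance * max(0.5 ** (age / decay), limit).

    rows: list of (item_idx, "YYYY-MM-DD", relevance) tuples.
    """
    dates = [date.fromisoformat(ts) for _, ts, _ in rows]
    latest = max(dates)  # age is counted back from the most recent interaction
    result = []
    for (item, ts, relevance), d in zip(rows, dates):
        age = (latest - d).days
        weight = max(0.5 ** (age / decay), limit)  # never below `limit`
        result.append((item, ts, relevance * weight))
    return result


rows = [(1, "2099-03-19", 10), (2, "2099-03-20", 3), (3, "2099-03-22", 0.1)]
for row in smoothe_relevance(rows):
    print(row)
```

With these inputs the sketch yields the same relevance values as the doctest above: the newest row keeps 0.1 unchanged, and the older rows are scaled down by their weights.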

replay.time.get_item_recency(log, decay=30, limit=0.1, kind='exp')

Calculates an item weight that reflects when most interactions with the item happened.

Parameters
  • log (Union[pandas.DataFrame, pyspark.sql.DataFrame]) – interactions log

  • decay (float) – number of days after which the weight is reduced by half; must be greater than 1

  • limit (float) – minimal value the weight can reach

  • kind (str) – type of smoothing, one of [power, exp, linear]. The corresponding functions are power: age^c, exp: c^age, linear: 1 - c*age.

Returns

DataFrame with item weights

>>> import pandas as pd
>>> from replay.time import get_item_recency
>>> d = {}
>>> d["item_idx"] = [1, 1, 2, 3, 3]
>>> d["timestamp"] = ["2099-03-19", "2099-03-20", "2099-03-22", "2099-03-27", "2099-03-25"]
>>> d["relevance"] = [1, 1, 1, 1, 1]
>>> df = pd.DataFrame(d)
>>> df
   item_idx   timestamp  relevance
0         1  2099-03-19          1
1         1  2099-03-20          1
2         2  2099-03-22          1
3         3  2099-03-27          1
4         3  2099-03-25          1

The age in days is calculated for every item and then transformed into a weight. Three smoothing types are available: power, exp and linear. Each type calculates a parameter c from the decay argument so that an item with age == decay has weight 0.5.
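Solving the half-weight condition for each formula gives the parameter c. For exp and linear these values reproduce the example tables exactly (for the oldest row in the smoothe_time linear example, 1 - 8/60 ≈ 0.8667). For power, the example numbers are instead consistent with (age + 1)^c using this c, which would place the 0.5 point at age = decay - 1; that reading is an inference from the outputs, not a documented guarantee. The power case also explains the requirement decay > 1: at decay = 1 the exponent is undefined.

```latex
\begin{aligned}
\text{exp:}\quad    & c^{\,\mathit{decay}} = 0.5        &&\Rightarrow\quad c = 0.5^{1/\mathit{decay}} \\
\text{linear:}\quad & 1 - c\cdot\mathit{decay} = 0.5    &&\Rightarrow\quad c = \frac{0.5}{\mathit{decay}} \\
\text{power:}\quad  & \mathit{decay}^{\,c} = 0.5        &&\Rightarrow\quad c = \frac{\ln 0.5}{\ln \mathit{decay}}
\end{aligned}
```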

Power smoothing falls quickly in the beginning but decays slowly afterwards as age^c.

>>> get_item_recency(df, kind="power").orderBy("item_idx").show()
+--------+-------------------+------------------+
|item_idx|          timestamp|         relevance|
+--------+-------------------+------------------+
|       1|2099-03-19 12:00:00|0.6632341020947187|
|       2|2099-03-22 00:00:00|0.7203662792445817|
|       3|2099-03-26 00:00:00|               1.0|
+--------+-------------------+------------------+

Exponential smoothing is the other way around: the weight falls slowly at first, and old objects decay more quickly, as c^age.

>>> get_item_recency(df, kind="exp").orderBy("item_idx").show()
+--------+-------------------+------------------+
|item_idx|          timestamp|         relevance|
+--------+-------------------+------------------+
|       1|2099-03-19 12:00:00|0.8605514372443298|
|       2|2099-03-22 00:00:00|0.9117224885582166|
|       3|2099-03-26 00:00:00|               1.0|
+--------+-------------------+------------------+

The last type is linear smoothing: 1 - c*age.

>>> get_item_recency(df, kind="linear").orderBy("item_idx").show()
+--------+-------------------+------------------+
|item_idx|          timestamp|         relevance|
+--------+-------------------+------------------+
|       1|2099-03-19 12:00:00|0.8916666666666666|
|       2|2099-03-22 00:00:00|0.9333333333333333|
|       3|2099-03-26 00:00:00|               1.0|
+--------+-------------------+------------------+

This function does not take relevance values of interactions into account. Only item age is used.
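The item-level computation can be sketched in pure Python as follows. This is an assumption-laden sketch, not replay's implementation: it takes each item's timestamp to be the mean of its interaction dates (the output above, e.g. 2099-03-19 12:00:00 for item 1, is consistent with this, although with two interactions per item a median would give the same result) and measures age in fractional days from the most recent item. item_recency is a hypothetical name, and only the exp kind is shown.

```python
from collections import defaultdict
from datetime import date


def item_recency(rows, decay=30, limit=0.1):
    # Hypothetical pure-Python sketch of get_item_recency with kind="exp".
    # rows: list of (item_idx, "YYYY-MM-DD", relevance) tuples; relevance is ignored.
    days = defaultdict(list)
    for item, ts, _relevance in rows:
        # Day number of each interaction (proleptic Gregorian ordinal)
        days[item].append(date.fromisoformat(ts).toordinal())
    # Assumption: an item's timestamp is the mean of its interaction dates
    mean_day = {item: sum(v) / len(v) for item, v in days.items()}
    latest = max(mean_day.values())
    # Age in (possibly fractional) days relative to the most recent item, exp weight floored at limit
    return {
        item: max(0.5 ** ((latest - day) / decay), limit)
        for item, day in sorted(mean_day.items())
    }


rows = [
    (1, "2099-03-19", 1), (1, "2099-03-20", 1), (2, "2099-03-22", 1),
    (3, "2099-03-27", 1), (3, "2099-03-25", 1),
]
print(item_recency(rows))
```

Because relevance is dropped on the first line of the loop, scaling every relevance value leaves the weights unchanged, matching the note above.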