Time Smoothing
The time module provides functions to apply time smoothing to item or interaction relevance.
- replay.time.smoothe_time(log, decay=30, limit=0.1, kind='exp')
Weighs the relevance column with a time-dependent weight.
- Parameters
  - log (Union[DataFrame, DataFrame]) – interactions log
  - decay (float) – number of days after which the weight is reduced by half, must be greater than 1
  - limit (float) – minimal value the weight can reach
  - kind (str) – type of smoothing, one of [power, exp, linear]. Corresponding functions are power: age^c, exp: c^age, linear: 1-c*age
- Returns
modified DataFrame
>>> import pandas as pd
>>> from replay.time import smoothe_time
>>> d = {}
>>> d["item_idx"] = [1, 1, 2, 3, 3]
>>> d["timestamp"] = ["2099-03-19", "2099-03-20", "2099-03-22", "2099-03-27", "2099-03-25"]
>>> d["relevance"] = [1, 1, 1, 1, 1]
>>> df = pd.DataFrame(d)
>>> df
   item_idx   timestamp  relevance
0         1  2099-03-19          1
1         1  2099-03-20          1
2         2  2099-03-22          1
3         3  2099-03-27          1
4         3  2099-03-25          1
Power smoothing falls quickly in the beginning but decays slowly afterwards as age^c.
>>> smoothe_time(df, kind="power").orderBy("timestamp").show()
+--------+-------------------+------------------+
|item_idx|          timestamp|         relevance|
+--------+-------------------+------------------+
|       1|2099-03-19 00:00:00|0.6390430306850825|
|       1|2099-03-20 00:00:00| 0.654567945027101|
|       2|2099-03-22 00:00:00|0.6940913454809814|
|       3|2099-03-25 00:00:00|0.7994016704292545|
|       3|2099-03-27 00:00:00|               1.0|
+--------+-------------------+------------------+
Exponential smoothing is the other way around. Old objects decay more quickly as c^age.
>>> smoothe_time(df, kind="exp").orderBy("timestamp").show()
+--------+-------------------+------------------+
|item_idx|          timestamp|         relevance|
+--------+-------------------+------------------+
|       1|2099-03-19 00:00:00|0.8312378961427874|
|       1|2099-03-20 00:00:00|0.8506671609508554|
|       2|2099-03-22 00:00:00| 0.890898718140339|
|       3|2099-03-25 00:00:00|0.9548416039104165|
|       3|2099-03-27 00:00:00|               1.0|
+--------+-------------------+------------------+
The last type is linear smoothing: 1 - c*age.
>>> smoothe_time(df, kind="linear").orderBy("timestamp").show()
+--------+-------------------+------------------+
|item_idx|          timestamp|         relevance|
+--------+-------------------+------------------+
|       1|2099-03-19 00:00:00|0.8666666666666667|
|       1|2099-03-20 00:00:00|0.8833333333333333|
|       2|2099-03-22 00:00:00|0.9166666666666666|
|       3|2099-03-25 00:00:00|0.9666666666666667|
|       3|2099-03-27 00:00:00|               1.0|
+--------+-------------------+------------------+
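The numbers in the three tables above can be reproduced with a few lines of plain Python. This is a minimal sketch, not the library's implementation: `time_weight` is a hypothetical helper, and the way `c` is derived from `decay` is inferred from the documented outputs. In particular, the power values are consistent with `(age + 1)^c` and `c = log(0.5) / log(decay)` rather than a bare `age^c`, and clipping the weight at `limit` is an assumption about how the minimum is enforced.

```python
import math


def time_weight(age_days: float, decay: float = 30, limit: float = 0.1,
                kind: str = "exp") -> float:
    """Hypothetical helper: weight for an interaction that is `age_days` old."""
    if kind == "power":
        # Inferred from the documented outputs: (age + 1) ** c, c < 0.
        c = math.log(0.5) / math.log(decay)
        weight = (age_days + 1) ** c
    elif kind == "exp":
        # c ** age with c chosen so that the weight is 0.5 at age == decay.
        c = 0.5 ** (1 / decay)
        weight = c ** age_days
    elif kind == "linear":
        # 1 - c * age with c chosen so that the weight is 0.5 at age == decay.
        c = 0.5 / decay
        weight = 1 - c * age_days
    else:
        raise ValueError(f"unknown kind: {kind!r}")
    # Assumption: the weight is never allowed to drop below `limit`.
    return max(weight, limit)


# Ages in days relative to the newest timestamp (2099-03-27) in the example log.
ages = [8, 7, 5, 2, 0]
for kind in ("power", "exp", "linear"):
    print(kind, [time_weight(age, kind=kind) for age in ages])
```

With `decay=30`, the exp and linear weights reach exactly 0.5 at an age of 30 days, while under the inferred power rule the weight reaches 0.5 when `age + 1` equals `decay`.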
These examples use a constant relevance of 1, so the resulting value equals the time-dependent weight. In general, the returned value is the original relevance multiplied by this weight.
>>> d = {}
>>> d["item_idx"] = [1, 2, 3]
>>> d["timestamp"] = ["2099-03-19", "2099-03-20", "2099-03-22"]
>>> d["relevance"] = [10, 3, 0.1]
>>> df = pd.DataFrame(d)
>>> df
   item_idx   timestamp  relevance
0         1  2099-03-19       10.0
1         2  2099-03-20        3.0
2         3  2099-03-22        0.1
>>> smoothe_time(df).orderBy("timestamp").show()
+--------+-------------------+------------------+
|item_idx|          timestamp|         relevance|
+--------+-------------------+------------------+
|       1|2099-03-19 00:00:00| 9.330329915368074|
|       2|2099-03-20 00:00:00|2.8645248117312496|
|       3|2099-03-22 00:00:00|               0.1|
+--------+-------------------+------------------+
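The same update can be sketched in pandas alone, which makes the multiplication explicit. `smooth_relevance_exp` is a hypothetical helper covering only the default `kind='exp'`; it assumes age is measured in days back from the newest timestamp in the log and that `limit` clips the weight before it is applied. Neither detail is confirmed by the documentation beyond the outputs shown above.

```python
import pandas as pd


def smooth_relevance_exp(log: pd.DataFrame, decay: float = 30,
                         limit: float = 0.1) -> pd.DataFrame:
    """Hypothetical pandas-only sketch of exponential time smoothing."""
    result = log.copy()
    result["timestamp"] = pd.to_datetime(result["timestamp"])
    # Age in days, measured back from the newest interaction in the log.
    newest = result["timestamp"].max()
    age = (newest - result["timestamp"]).dt.total_seconds() / 86400
    # c ** age, with c chosen so that the weight is 0.5 when age == decay.
    c = 0.5 ** (1 / decay)
    weight = (c ** age).clip(lower=limit)  # assumption: limit bounds the weight
    result["relevance"] = result["relevance"] * weight
    return result


log = pd.DataFrame({
    "item_idx": [1, 2, 3],
    "timestamp": ["2099-03-19", "2099-03-20", "2099-03-22"],
    "relevance": [10, 3, 0.1],
})
smoothed = smooth_relevance_exp(log)
print(smoothed.sort_values("timestamp"))
```

The newest interaction keeps its relevance unchanged because its age is 0 and its weight is therefore 1.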
- replay.time.get_item_recency(log, decay=30, limit=0.1, kind='exp')
Calculate item weight showing when the majority of interactions with this item happened.
- Parameters
  - log (Union[DataFrame, DataFrame]) – interactions log
  - decay (float) – number of days after which the weight is reduced by half, must be greater than 1
  - limit (float) – minimal value the weight can reach
  - kind (str) – type of smoothing, one of [power, exp, linear]. Corresponding functions are power: age^c, exp: c^age, linear: 1-c*age
- Returns
DataFrame with item weights
>>> import pandas as pd
>>> from replay.time import get_item_recency
>>> d = {}
>>> d["item_idx"] = [1, 1, 2, 3, 3]
>>> d["timestamp"] = ["2099-03-19", "2099-03-20", "2099-03-22", "2099-03-27", "2099-03-25"]
>>> d["relevance"] = [1, 1, 1, 1, 1]
>>> df = pd.DataFrame(d)
>>> df
   item_idx   timestamp  relevance
0         1  2099-03-19          1
1         1  2099-03-20          1
2         2  2099-03-22          1
3         3  2099-03-27          1
4         3  2099-03-25          1
Age in days is calculated for every item and then transformed into a weight by the selected smoothing function. Three kinds of smoothing are available: power, exp and linear. Each kind calculates a parameter c based on the decay argument, so that an item with age==decay has weight 0.5.
Power smoothing falls quickly in the beginning but decays slowly afterwards as age^c.
>>> get_item_recency(df, kind="power").orderBy("item_idx").show()
+--------+-------------------+------------------+
|item_idx|          timestamp|         relevance|
+--------+-------------------+------------------+
|       1|2099-03-19 12:00:00|0.6632341020947187|
|       2|2099-03-22 00:00:00|0.7203662792445817|
|       3|2099-03-26 00:00:00|               1.0|
+--------+-------------------+------------------+
Exponential smoothing is the other way around. Old objects decay more quickly as c^age.
>>> get_item_recency(df, kind="exp").orderBy("item_idx").show()
+--------+-------------------+------------------+
|item_idx|          timestamp|         relevance|
+--------+-------------------+------------------+
|       1|2099-03-19 12:00:00|0.8605514372443298|
|       2|2099-03-22 00:00:00|0.9117224885582166|
|       3|2099-03-26 00:00:00|               1.0|
+--------+-------------------+------------------+
The last type is linear smoothing: 1 - c*age.
>>> get_item_recency(df, kind="linear").orderBy("item_idx").show()
+--------+-------------------+------------------+
|item_idx|          timestamp|         relevance|
+--------+-------------------+------------------+
|       1|2099-03-19 12:00:00|0.8916666666666666|
|       2|2099-03-22 00:00:00|0.9333333333333333|
|       3|2099-03-26 00:00:00|               1.0|
+--------+-------------------+------------------+
This function does not take relevance values of interactions into account. Only item age is used.
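The item-level timestamps in the tables above (for example 2099-03-19 12:00:00 for item 1) are consistent with averaging the interaction timestamps of each item and measuring age back from the newest of those averages. The sketch below follows that reading for the exponential kind only. `item_recency_exp` is a hypothetical pandas helper, and both the averaging step and the clipping at `limit` are assumptions inferred from the documented outputs, not the library's code.

```python
import pandas as pd


def item_recency_exp(log: pd.DataFrame, decay: float = 30,
                     limit: float = 0.1) -> pd.DataFrame:
    """Hypothetical sketch: per-item recency weight from mean interaction time."""
    ts = pd.to_datetime(log["timestamp"])
    origin = ts.min()
    # Work in fractional days since the earliest interaction.
    days = (ts - origin).dt.total_seconds() / 86400
    # Mean interaction time per item; relevance values are deliberately ignored.
    mean_days = days.groupby(log["item_idx"]).mean()
    # Item age is measured back from the most recent per-item mean.
    age = mean_days.max() - mean_days
    # Exponential rule: weight halves every `decay` days, floored at `limit`.
    weight = (0.5 ** (age / decay)).clip(lower=limit)
    return pd.DataFrame({
        "item_idx": mean_days.index.to_numpy(),
        "timestamp": origin + pd.to_timedelta(mean_days.to_numpy(), unit="D"),
        "relevance": weight.to_numpy(),
    })


log = pd.DataFrame({
    "item_idx": [1, 1, 2, 3, 3],
    "timestamp": ["2099-03-19", "2099-03-20", "2099-03-22",
                  "2099-03-27", "2099-03-25"],
    "relevance": [1, 1, 1, 1, 1],
})
recency = item_recency_exp(log)
print(recency)
```

Changing the `relevance` column of `log` leaves the result untouched, which mirrors the statement that only item age is used.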