Compare Results
- class replay.experiment.Experiment(test, metrics, calc_median=False, calc_conf_interval=None)
This class calculates and stores metric values. Initialize it with test data and a dictionary mapping metrics to their depth cut-offs.
Results are available as a pandas DataFrame via the results attribute. Example:
>>> import pandas as pd
>>> from replay.experiment import Experiment
>>> from replay.metrics import Coverage, NDCG, Precision, Surprisal
>>> log = pd.DataFrame({"user_idx": [2, 2, 2, 1], "item_idx": [1, 2, 3, 3], "relevance": [5, 5, 5, 5]})
>>> test = pd.DataFrame({"user_idx": [1, 1, 1], "item_idx": [1, 2, 3], "relevance": [5, 3, 4]})
>>> pred = pd.DataFrame({"user_idx": [1, 1, 1], "item_idx": [4, 1, 3], "relevance": [5, 4, 5]})
>>> recs = pd.DataFrame({"user_idx": [1, 1, 1], "item_idx": [1, 4, 5], "relevance": [5, 4, 5]})
>>> ex = Experiment(test, {NDCG(): [2, 3], Surprisal(log): 3})
>>> ex.add_result("baseline", recs)
>>> ex.add_result("model", pred)
>>> ex.results
            NDCG@2    NDCG@3  Surprisal@3
baseline  0.613147  0.469279     1.000000
model     0.386853  0.530721     0.666667
>>> ex.compare("baseline")
           NDCG@2  NDCG@3 Surprisal@3
baseline        –       –           –
model     -36.91%  13.09%     -33.33%
>>> ex = Experiment(test, {Precision(): [3]}, calc_median=True, calc_conf_interval=0.95)
>>> ex.add_result("baseline", recs)
>>> ex.add_result("model", pred)
>>> ex.results
          Precision@3  Precision@3_median  Precision@3_0.95_conf_interval
baseline     0.333333            0.333333                             0.0
model        0.666667            0.666667                             0.0
>>> ex = Experiment(test, {Coverage(log): 3}, calc_median=True, calc_conf_interval=0.95)
>>> ex.add_result("baseline", recs)
>>> ex.add_result("model", pred)
>>> ex.results
          Coverage@3  Coverage@3_median  Coverage@3_0.95_conf_interval
baseline         1.0                1.0                            0.0
model            1.0                1.0                            0.0
- __init__(test, metrics, calc_median=False, calc_conf_interval=None)
- Parameters
    - test (Any) – test DataFrame
    - metrics (Dict[Metric, Union[Iterable[int], int]]) – dictionary of metrics to calculate; the key is a metric and the value is an int or a list of ints (depth cut-offs), both shapes are sketched below
    - calc_median (bool) – flag to calculate the median value across users
    - calc_conf_interval (Optional[float]) – quantile value for the confidence interval calculation; the resulting value is half of the confidence interval
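For instance, the metrics dictionary accepts either a single cut-off or a list of cut-offs per metric. A minimal sketch reusing test, NDCG, and Precision from the example above (the particular combination is hypothetical):

>>> # list of cut-offs for NDCG, a single cut-off for Precision
>>> ex = Experiment(test, {NDCG(): [2, 3], Precision(): 3}, calc_median=True)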
- add_result(name, pred)
Calculate metrics for the given predictions.
- Parameters
    - name (str) – name of the run to store in the resulting DataFrame
    - pred (Any) – model recommendations
- Return type
    - None
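Each call appends one row, labeled name, to the results table. A short sketch continuing the example above:

>>> ex.add_result("baseline", recs)
>>> ex.results.loc["baseline"]  # metric values for the "baseline" run, as a pandas Series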
- compare(name)
Show results as a percentage difference relative to the record name.
- Parameters
    - name (str) – name of the baseline record
- Return type
    - DataFrame
- Returns
    - results table in percentage format
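In the returned table the baseline row shows dashes and every other row shows the relative change, as in the example above. Since the return value is a plain pandas DataFrame, it can be post-processed as usual; a sketch assuming the ex object from the example:

>>> diff = ex.compare("baseline")
>>> diff.loc["model"]  # percentage differences of "model" relative to "baseline"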