Using Bandit Algorithms on Changing Reward Rates

One problem we face at System1 is keeping our estimate of a feature’s performance up to date. Even if our initial estimate is correct, the feature’s true performance can change later, and an estimate that stops adapting will quietly go stale.
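To make the problem concrete, here is a small sketch. It is a generic illustration, not a description of System1's method. It assumes a hypothetical feature whose reward rate drops from 0.10 to 0.05 halfway through, and it uses a made-up deterministic reward stream so the numbers are reproducible. The function names `sample_average`, `recency_weighted`, and `periodic_rewards` are also hypothetical. A plain running average weights every observation equally, so after the change it is still pulled toward the old rate. A constant step-size (exponentially recency-weighted) average is one common alternative that discounts old observations.

```python
def sample_average(rewards):
    """Incremental sample mean: every observation ends up with weight 1/n."""
    estimate = 0.0
    for n, r in enumerate(rewards, start=1):
        estimate += (r - estimate) / n
    return estimate


def recency_weighted(rewards, alpha=0.01, initial=0.0):
    """Constant step size: an observation's weight shrinks by (1 - alpha) per later step."""
    estimate = initial
    for r in rewards:
        estimate += alpha * (r - estimate)
    return estimate


def periodic_rewards(n, period):
    """Hypothetical deterministic stream: reward 1 on every `period`-th trial, else 0."""
    return [1.0 if (i + 1) % period == 0 else 0.0 for i in range(n)]


# Assumed scenario: true rate 0.10 for 1000 trials, then 0.05 for 1000 trials.
stream = periodic_rewards(1000, 10) + periodic_rewards(1000, 20)

print(round(sample_average(stream), 4))    # 0.075 (stuck between old and new rates)
print(round(recency_weighted(stream), 3))  # 0.055 (close to the current rate of 0.05)
```

The step size `alpha` is the trade-off: a larger value forgets the old rate faster but makes the estimate noisier, while a smaller value smooths out noise but reacts more slowly to a real change.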