In a multi-armed bandit problem, the objective is to maximize the expected cumulative reward. This objective is usually (equivalently?) stated in terms of minimizing the expected cumulative regret.
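For concreteness, here is the usual textbook definition of expected cumulative regret. The notation is an assumption made for this sketch, not taken from a particular source: $K$ arms with mean rewards $\mu_1, \dots, \mu_K$, best mean $\mu^* = \max_a \mu_a$, reward $X_t$ received at round $t$, and horizon $T$.

```latex
R_T = T \mu^* - \mathbb{E}\left[ \sum_{t=1}^{T} X_t \right]
```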
Question: Why not just deal with the reward? Why formulate the objective in terms of regret?
I would like to perform the following operation on a DataFrame.
Calculate the cumulative sum of a column, with two conditions:
It looks at the previous rows only, not including the current one, so the very first value is zero because there is no previous data to look at.
When the column stops accumulating, i.e. the increment is zero, the count restarts.
   Number  Cumulative
0       1           0
1       1           1
2       1           2
3       0           3
4       0           0
5       1           0
6       1           1
7       0           2
I know there is an expanding function, but it doesn't restart when it sees a zero.
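One possible approach, as a minimal sketch assuming pandas and the example data above: start a new group at every zero, take the cumulative sum within each group, then shift the result down one row so each value reflects previous rows only. The function name `lagged_reset_cumsum` is hypothetical and chosen only for illustration.

```python
import pandas as pd


def lagged_reset_cumsum(numbers: pd.Series) -> pd.Series:
    """Running sum over previous rows only, restarting after each zero."""
    # Every zero opens a new block, so the block id increases at each zero.
    block_id = (numbers == 0).cumsum()
    # Cumulative sum within each block (this still includes the current row).
    running = numbers.groupby(block_id).cumsum()
    # Shift down one row so each value covers previous rows only;
    # the first row has no history and is filled with 0.
    return running.shift(fill_value=0)


# Example data copied from the question.
df = pd.DataFrame({"Number": [1, 1, 1, 0, 0, 1, 1, 0]})
df["Cumulative"] = lagged_reset_cumsum(df["Number"])
print(df)
```

On the example data this gives 0, 1, 2, 3, 0, 0, 1, 2 for Cumulative, which matches the expected column. The zero row itself still shows the total accumulated before it (3 at index 3), because the shift happens after the per-block sum.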