The framework of OCO with memory is:
- At each round $t$, the player chooses $x_t \in K \subset \mathbb{R}^n$
- Then, a loss function $f_t: K^{m+1} \rightarrow \mathbb{R}$ is revealed and the player suffers loss $f_t(x_{t-m}, \cdots, x_t)$
- Assume $0 \in K$, and $f_t(x_0, \cdots, x_m) \in [0,1]$ for any $x_0, \cdots, x_m \in K$
- Assume that after $f_t$ is revealed, the player knows the loss she would have suffered had she played any sequence of decisions $x_{t-m}, \cdots, x_t$ (counterfactual feedback model)
The goal is to minimize the policy regret $$R_{T, m} = \sum_{t=m}^T f_t(x_{t-m}, \cdots, x_t) - \min_{x \in K} \sum_{t=m}^T f_t(x, \cdots, x) $$
The functions $f_t$ are convex loss functions with memory, i.e. the function $\tilde{f}_t(x) = f_t(x, \cdots, x)$ is convex in $x$.
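As a simple illustrative instance (not part of the setup above, just to fix ideas), take $m = 1$ and $$f_t(x_{t-1}, x_t) = \ell_t(x_t) + \lambda ||x_t - x_{t-1}||$$ for some convex $\ell_t$ and $\lambda > 0$: the memory term penalizes movement between consecutive decisions, and $\tilde{f}_t(x) = \ell_t(x)$ is convex in $x$.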
Algorithm
The algorithm is simply online gradient descent run on the unary functions $\tilde{f}_t$.
Online Gradient Descent with Memory: Step size $\eta$, functions $\{f_t\}_{t=m}^T$
- Initialize $x_0 = \cdots = x_m \in K$ arbitrarily (all equal to the same point)
- for $t=m, \cdots, T$
- Play $x_t$, suffer loss $f_t(x_{t-m}, \cdots, x_t)$
- Set $x_{t+1} = \Pi_{K}(x_t - \eta\nabla \tilde{f}_t(x_t))$
- end
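A minimal Python sketch of the procedure, assuming a gradient oracle for $\tilde{f}_t$ and taking $K$ to be a Euclidean ball purely for illustration (`ogd_with_memory`, `grad_tilde_f`, and `project_ball` are placeholder names, not from these notes):

```python
import numpy as np

def project_ball(x, radius=1.0):
    # Euclidean projection onto K = {x : ||x|| <= radius}; an illustrative choice of K.
    norm = np.linalg.norm(x)
    return x if norm <= radius else (radius / norm) * x

def ogd_with_memory(grad_tilde_f, T, m, n, eta, project=project_ball):
    # grad_tilde_f(t, x): gradient of the unary loss tilde{f}_t at x (assumed oracle).
    # Returns the iterates x_0, ..., x_{T+1}; the decision played at round t is xs[t].
    xs = [np.zeros(n) for _ in range(m + 1)]  # x_0 = ... = x_m, all set to 0 (which lies in K)
    for t in range(m, T + 1):
        x_t = xs[t]
        # Play x_t; the loss f_t(x_{t-m}, ..., x_t) is suffered at this point.
        xs.append(project(x_t - eta * grad_tilde_f(t, x_t)))  # gradient step on tilde{f}_t, then project
    return xs

# Illustrative usage with quadratic unary losses tilde{f}_t(x) = ||x - y_t||^2 / 2.
rng = np.random.default_rng(0)
ys = 0.1 * rng.normal(size=(101, 5))
plays = ogd_with_memory(lambda t, x: x - ys[t], T=100, m=2, n=5, eta=0.05)
```

Note that the update itself uses only the most recent iterate; the memory affects which loss value is suffered, not the gradient step.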
Regret Analysis
The functions $\{f_t\}_{t=1}^T$ are coordinate-wise Lipschitz continuous, i.e. for any $j$, $$|f_t(x_0, \cdots, x_j, \cdots, x_m) - f_t(x_0, \cdots,\tilde{x}_j, \cdots, x_m)| \leq L ||x_j - \tilde{x}_j|| $$
Denote $$G = \sup_{t \in \{1, \cdots, T\}, x \in K} ||\nabla \tilde{f}_t(x)||$$ $$D = \sup_{x, y \in K} ||x - y||$$
Regret Guarantee: The algorithm generates a sequence of decisions $x_m, \cdots, x_T$ such that $$R_{T, m} \leq \frac{D^2}{\eta} + TG^2\eta + Lm^2\eta GT$$ Furthermore, setting $\eta = \frac{D}{\sqrt{G(G + Lm^2)T}}$ implies that $$R_{T, m} \leq O(D\sqrt{G(G + Lm^2)T})$$
Proof: From the OGD analysis for Lipschitz loss functions (refer to this note) we have $$\sum_{t=m}^T \tilde{f}_t(x_t) - \min_{x \in K} \sum_{t=m}^T \tilde{f}_t(x) \leq \frac{D^2}{\eta} + TG^2\eta$$
We know from the coordinate-wise lipschitz assumption $$\begin{aligned} |f_t(x_{t-m}, \cdots, x_t) - f_t(x_t, \cdots, x_t)| &\leq L\sum_{j=1}^m||x_t - x_{t-j}|| \leq L\sum_{j=1}^m\sum_{l=1}^j||x_{t-l+1} - x_{t-l}|| \\ &\leq L\sum_{j=1}^m\sum_{l=1}^j\eta||\nabla \tilde{f}_{t-l}(x_{t-l})|| \leq Lm^2\eta G \end{aligned}$$
The first inequality uses the coordinate-wise Lipschitz assumption, the second is simply the triangle inequality, the third uses the update rule together with the projection lemma (refer to this note), and the final inequality uses the gradient bound $||\nabla \tilde{f}_{t-l}(x_{t-l})|| \leq G$ together with $\sum_{j=1}^m j = \frac{m(m+1)}{2} \leq m^2$.
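Spelled out, the third inequality uses that $x_{t-l} \in K$ and that Euclidean projection onto the convex set $K$ does not increase the distance to points of $K$: $$||x_{t-l+1} - x_{t-l}|| = ||\Pi_K(x_{t-l} - \eta\nabla \tilde{f}_{t-l}(x_{t-l})) - x_{t-l}|| \leq \eta||\nabla \tilde{f}_{t-l}(x_{t-l})||$$ (for indices with $t - l < m$, i.e. before the first update, the left-hand side is $0$ since the initialization points coincide).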
Summing over $t$ we get $$\left|\sum_{t=m}^T f_t(x_{t-m}, \cdots, x_t) - \sum_{t=m}^T f_t(x_t, \cdots, x_t)\right| \leq Lm^2\eta GT$$ Combining this with the OGD guarantee above we get $$\sum_{t=m}^T f_t(x_{t-m}, \cdots, x_t) - \min_{x \in K} \sum_{t=m}^T f_t(x, \cdots, x) \leq \frac{D^2}{\eta} + TG^2\eta + Lm^2\eta GT$$ $\square$
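For the final claim of the theorem, which the argument above leaves implicit: the right-hand side equals $\frac{D^2}{\eta} + \eta TG(G + Lm^2)$, and by the AM-GM inequality $$\frac{D^2}{\eta} + \eta TG(G + Lm^2) \geq 2D\sqrt{G(G + Lm^2)T},$$ with equality exactly at $\eta = \frac{D}{\sqrt{G(G + Lm^2)T}}$. Plugging in this choice of $\eta$ gives $R_{T, m} \leq 2D\sqrt{G(G + Lm^2)T} = O(D\sqrt{G(G + Lm^2)T})$, as stated.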