Online Learning for Adversaries with Past Memory

2020/07/09

Reading time: 2 minutes.

The framework of OCO with memory: at each round $t$, the learner plays $x_t \in K$ and suffers a loss $f_t(x_{t-m}, \cdots, x_t)$ that depends on its last $m+1$ decisions. The goal is to minimize the policy regret $$R_{T, m} = \sum_{t=m}^T f_t(x_{t-m}, \cdots, x_t) - \min_{x \in K} \sum_{t=m}^T f_t(x, \cdots, x) $$

The functions $f_t$ are convex loss functions with memory, i.e. the induced unary function $\tilde{f}_t(x) = f_t(x, \cdots, x)$ is convex in $x$.
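
As a toy illustration (not taken from the references), consider a quadratic tracking loss with a switching cost, which has memory $m = 1$; its unary version drops the switching term:

```python
import numpy as np

# Illustrative example only: a quadratic tracking loss with a switching cost,
# i.e. a loss with memory m = 1.  theta_t is the target at round t.
def f_t(theta_t, x_prev, x_curr):
    """Loss with memory: depends on the previous and the current decision."""
    return np.sum((x_curr - theta_t) ** 2) + np.sum((x_curr - x_prev) ** 2)

def f_tilde_t(theta_t, x):
    """Induced unary loss f_t(x, x): the switching term vanishes on the diagonal."""
    return f_t(theta_t, x, x)
```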

Algorithm

The algorithm is simply online gradient descent run on the unary functions $\tilde{f}_t$:

Online Gradient Descent with Memory: Step size $\eta$, functions $\{f_t\}_{t=m}^T$

  1. Initialize $x_0, \cdots, x_m \in K$ arbitrarily
  2. for $t=m, \cdots, T$
  3. Play $x_t$, suffer loss $f_t(x_{t-m}, \cdots, x_t)$
  4. Set $x_{t+1} = \Pi_{K}(x_t - \eta\nabla \tilde{f}_t(x_t))$
  5. end
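
A minimal sketch of the algorithm, assuming the projection onto $K$, the gradients of $\tilde{f}_t$, and the losses are given as callables (all names and signatures below are illustrative):

```python
import numpy as np

def ogd_with_memory(project, grad_f_tilde, loss, x_init, eta, m, T):
    """Online Gradient Descent with Memory (sketch).

    project(x)         -- Euclidean projection onto the decision set K
    grad_f_tilde(t, x) -- gradient of the unary loss f~_t at x
    loss(t, history)   -- f_t(x_{t-m}, ..., x_t), used only to record the loss
    x_init             -- arbitrary points x_0, ..., x_m in K
    """
    xs = list(x_init)                # xs[t] holds the decision x_t
    total_loss = 0.0
    for t in range(m, T + 1):
        # play x_t and suffer the loss with memory
        total_loss += loss(t, xs[t - m : t + 1])
        # gradient step on the unary loss, then project back onto K
        xs.append(project(xs[t] - eta * grad_f_tilde(t, xs[t])))
    return xs, total_loss
```

Note that the update only uses the gradient of the unary function $\tilde{f}_t$; the memory enters only through the suffered loss.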

Regret Analysis

The functions $\{f_t\}_{t=m}^T$ are assumed to be coordinate-wise Lipschitz continuous, i.e. for any coordinate $j$, $$|f_t(x_0, \cdots, x_j, \cdots, x_m) - f_t(x_0, \cdots,\tilde{x}_j, \cdots, x_m)| \leq L ||x_j - \tilde{x}_j|| $$

Denote $$G = \sup_{t \in \{m, \cdots, T\}, x \in K} ||\nabla \tilde{f}_t(x)||$$ $$D = \sup_{x, y \in K} ||x - y||$$

Regret Guarantee: The algorithm generates a sequence $x_1, \cdots, x_T$ such that $$R_{T, m} \leq \frac{D^2}{\eta} + TG^2\eta + Lm^2\eta GT$$ Furthermore, setting $\eta = \frac{D}{\sqrt{G(G + Lm^2)T}}$ implies that $$R_{T, m} \leq O(D\sqrt{G(G + Lm^2)T})$$
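
To see why this choice of $\eta$ works, note that the bound has the form $\frac{D^2}{\eta} + \eta TG(G + Lm^2)$, and the two terms are balanced at $$\eta = \frac{D}{\sqrt{G(G + Lm^2)T}}, \qquad \frac{D^2}{\eta} = \eta TG(G + Lm^2) = D\sqrt{G(G + Lm^2)T},$$ so the total is $2D\sqrt{G(G + Lm^2)T} = O(D\sqrt{G(G + Lm^2)T})$.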

Proof: From the standard OGD analysis for Lipschitz loss functions (refer to this note), we have $$\sum_{t=m}^T \tilde{f}_t(x_t) - \min_{x \in K} \sum_{t=m}^T \tilde{f}_t(x) \leq \frac{D^2}{\eta} + TG^2\eta$$

We know from the coordinate-wise lipschitz assumption $$\begin{aligned} |f_t(x_{t-m}, \cdots, x_t) - f_t(x_t, \cdots, x_t)| &\leq L\sum_{j=1}^m||x_t - x_{t-j}|| \leq L\sum_{j=1}^m\sum_{l=1}^j||x_{t-l+1} - x_{t-l}|| \\ &\leq L\sum_{j=1}^m\sum_{l=1}^j\eta||\nabla \tilde{f}_{t-l}(x_{t-l})|| \leq Lm^2\eta G \end{aligned}$$

The first inequality uses the coordinate-wise Lipschitz assumption (changing one coordinate at a time), the second is the triangle inequality, the third uses the update rule together with the non-expansiveness of the projection (refer to this note), and the final one bounds each gradient norm by $G$ and the number of terms in the double sum by $m^2$.

Summing over $t$ we get $$\left|\sum_{t=m}^T f_t(x_{t-m}, \cdots, x_t) - \sum_{t=m}^T f_t(x_t, \cdots, x_t)\right| \leq Lm^2\eta GT$$ Combining this with the OGD guarantee above gives $$\sum_{t=m}^T f_t(x_{t-m}, \cdots, x_t) - \min_{x \in K} \sum_{t=m}^T f_t(x, \cdots, x) \leq \frac{D^2}{\eta} + TG^2\eta + Lm^2\eta GT$$ $\square$

References

  1. Online Convex Optimization Against Adversaries with Memory and Application to Statistical Arbitrage
  2. Online Control with Adversarial Disturbances