It turns out we can improve on the $\sqrt{T}$ regret we saw in the OGD analysis for convex Lipschitz losses if the losses possess some curvature that can be exploited.
Strong Convexity
$f$ is $\mu$-strongly convex over a convex set $V$ w.r.t. a norm $\|\cdot\|$ if $\forall x, y \in V$ and $g \in \partial f(x)$,
$$f(y) \geq f(x) + \langle g, y - x \rangle + \frac{\mu}{2}\|x - y\|^2.$$
In other words, a strongly convex function is lower bounded by a quadratic (instead of a linear function, as for convex functions), so we have a tighter lower bound. In fact, there can be several such quadratic lower bounds, since there can be more than one subgradient.
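As a quick sanity check (a standard example, not from the definition above), the function $f(x) = \frac{\mu}{2}\|x\|_2^2$ satisfies the definition with equality:
$$f(y) - f(x) - \langle \nabla f(x), y - x \rangle = \frac{\mu}{2}\|y\|_2^2 - \frac{\mu}{2}\|x\|_2^2 - \mu \langle x, y - x \rangle = \frac{\mu}{2}\|y - x\|_2^2,$$
so $f$ is $\mu$-strongly convex w.r.t. $\|\cdot\|_2$, and here the quadratic lower bound is exact.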
Lemma: If $f$ is twice-differentiable, then a sufficient condition for $\mu$-strong convexity is that $\forall x, y \in V$ we have
$$\langle \nabla^2 f(x) y, y \rangle \geq \mu \|y\|^2.$$
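For instance (again an illustrative example), for a quadratic $f(x) = \frac{1}{2} x^\top A x$ with symmetric $A \succeq \mu I$, we have $\nabla^2 f(x) = A$, so
$$\langle \nabla^2 f(x) y, y \rangle = y^\top A y \geq \mu \|y\|_2^2,$$
and the lemma gives $\mu$-strong convexity w.r.t. $\|\cdot\|_2$.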
Regret Analysis
Theorem: Assume the losses $\ell_t$ are $\mu_t$-strongly convex w.r.t. $\|\cdot\|_2$, where $\mu_t > 0$. Using OGD with stepsizes $\eta_t = \frac{1}{\sum_{i=1}^t \mu_i}$, for any $u \in V$ we have
$$\sum_{t=1}^T \left(\ell_t(x_t) - \ell_t(u)\right) \leq \frac{1}{2} \sum_{t=1}^T \eta_t \|g_t\|_2^2.$$
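To make the theorem concrete, here is a minimal numerical sketch (not from the original note): we run OGD with the stepsizes $\eta_t = 1/\sum_{i=1}^t \mu_i$ on hypothetical quadratic losses $\ell_t(x) = \frac{\mu}{2}\|x - z_t\|_2^2$, which are $\mu$-strongly convex w.r.t. $\|\cdot\|_2$, over $V = \mathbb{R}^d$ so no projection step is needed. The data $z_t$ and the comparator $u$ are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
d, T, mu = 5, 10_000, 1.0
z = rng.normal(size=(T, d))            # data defining l_t(x) = (mu/2)||x - z_t||^2

def loss(x, t):
    return 0.5 * mu * np.sum((x - z[t]) ** 2)

x = np.zeros(d)                        # x_1
alg_loss, bound, mu_sum = 0.0, 0.0, 0.0
for t in range(T):
    g = mu * (x - z[t])                # g_t = gradient of l_t at x_t
    mu_sum += mu                       # sum_{i=1}^t mu_i
    eta = 1.0 / mu_sum                 # eta_t = 1 / sum_{i=1}^t mu_i
    alg_loss += loss(x, t)
    bound += 0.5 * eta * np.sum(g ** 2)  # accumulate (1/2) eta_t ||g_t||_2^2
    x = x - eta * g                    # OGD update; V = R^d, so no projection

u = z.mean(axis=0)                     # best fixed comparator for these losses
regret = alg_loss - sum(loss(u, t) for t in range(T))
print(f"regret = {regret:.2f} <= bound = {bound:.2f}")
```

For these losses the OGD iterate turns out to be the running mean of the $z_i$, and the printed regret should sit below the accumulated bound, which grows only logarithmically in $T$.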
Proof: Observe that $\eta_1 = \frac{1}{\mu_1}$ implies $\frac{1}{2\eta_1} - \frac{\mu_1}{2} = 0$, and $\frac{1}{2\eta_t} - \frac{\mu_t}{2} = \frac{1}{2\eta_{t-1}}$ for $t = 2, \cdots, T$. We can use the instantaneous regret lemma (refer to this note) and the strong-convexity condition to get
$$\begin{aligned}
\sum_{t=1}^T \left(\ell_t(x_t) - \ell_t(u)\right)
&\leq \sum_{t=1}^T \left( \frac{1}{2\eta_t}\|x_t - u\|_2^2 - \frac{1}{2\eta_t}\|x_{t+1} - u\|_2^2 - \frac{\mu_t}{2}\|x_t - u\|_2^2 + \frac{\eta_t}{2}\|g_t\|_2^2 \right) \\
&= -\frac{1}{2\eta_1}\|x_2 - u\|_2^2 + \sum_{t=2}^T \left( \frac{1}{2\eta_{t-1}}\|x_t - u\|_2^2 - \frac{1}{2\eta_t}\|x_{t+1} - u\|_2^2 \right) + \sum_{t=1}^T \frac{\eta_t}{2}\|g_t\|_2^2 \\
&= -\frac{1}{2\eta_T}\|x_{T+1} - u\|_2^2 + \sum_{t=1}^T \frac{\eta_t}{2}\|g_t\|_2^2
\leq \frac{1}{2}\sum_{t=1}^T \eta_t \|g_t\|_2^2,
\end{aligned}$$
where the second equality telescopes the middle sum and the final inequality drops the nonpositive term. $\Box$
We can obtain a more interpretable regret upper bound if the loss functions are Lipschitz in addition to being strongly convex.
Corollary: If $\mu_t = \mu > 0$ for all $t$ and $\ell_t$ is $L$-Lipschitz w.r.t. $\|\cdot\|_2$ for $t = 1, \cdots, T$, then
$$\sum_{t=1}^T \left(\ell_t(x_t) - \ell_t(u)\right) \leq \frac{L^2}{2\mu}\left(1 + \log T\right).$$
Proof: $\ell_t$ being $L$-Lipschitz gives us $\|g_t\|_2^2 \leq L^2$, and the stepsizes become $\eta_t = \frac{1}{\sum_{i=1}^t \mu} = \frac{1}{\mu t}$. Thus, using the above theorem, we have
$$\sum_{t=1}^T \left(\ell_t(x_t) - \ell_t(u)\right) \leq \frac{L^2}{2\mu} \sum_{t=1}^T \frac{1}{t}.$$
We can bound this using the inequality $\sum_{t=1}^T \frac{1}{t} \leq 1 + \log T$ to get
$$\sum_{t=1}^T \left(\ell_t(x_t) - \ell_t(u)\right) \leq \frac{L^2}{2\mu}\left(1 + \log T\right). \quad \Box$$
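For completeness, the harmonic-sum inequality used in the last step follows from a standard integral comparison:
$$\sum_{t=1}^T \frac{1}{t} \leq 1 + \int_1^T \frac{ds}{s} = 1 + \log T.$$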
Thus, OGD with strongly convex and Lipschitz loss functions has regret with a logarithmic dependence on $T$! This is much better than the $\sqrt{T}$ dependence for convex, Lipschitz loss functions.
It is very important to note that the corollary implicitly requires a bounded domain: over an unbounded domain, the loss functions cannot be Lipschitz given that they are also strongly convex.
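To see why (a standard one-dimensional example), take $f(x) = \frac{\mu}{2} x^2$ on $V = \mathbb{R}$. Then
$$|f'(x)| = \mu |x| \to \infty \quad \text{as } |x| \to \infty,$$
so no finite $L$ bounds the gradient norm and $f$ is not $L$-Lipschitz. Restricting $V$ to a bounded set makes the two properties compatible.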