
Online Gradient Descent with Strongly Convex Functions

2020/07/03

Reading time: 2 minutes.

It turns out we can do better than the $\sqrt{T}$ regret we saw in the OGD analysis for convex Lipschitz losses, provided the losses possess some curvature that can be exploited.

Strong Convexity

$f$ is $\mu$-strongly convex over a convex set $V$ w.r.t. a norm $\|\cdot\|$ if for all $x, y \in V$ and $g \in \partial f(x)$,
$$f(y) \ge f(x) + \langle g, y - x \rangle + \frac{\mu}{2}\|x - y\|^2.$$

In other words, a strongly convex function is lower bounded by a quadratic (instead of by a linear function, as for convex functions), so we get a tighter lower bound. In fact, there can be several such quadratic lower bounds, since there can be more than one subgradient.

Lemma: If $f$ is twice differentiable, then a sufficient condition for $\mu$-strong convexity is that for all $x, y \in V$ we have $\langle \nabla^2 f(x)\, y, y \rangle \ge \mu \|y\|^2$.
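For a concrete check (a standard example, not specific to these notes), take $f(x) = \frac{\mu}{2}\|x\|_2^2$:
$$\nabla^2 f(x) = \mu I \;\Rightarrow\; \langle \nabla^2 f(x)\, y, y \rangle = \mu \|y\|_2^2, \qquad f(y) = f(x) + \langle \mu x,\, y - x \rangle + \frac{\mu}{2}\|y - x\|_2^2,$$
so the Hessian condition of the lemma holds, and the strong-convexity inequality is in fact an equality at every point.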

Regret Analysis

Theorem: Assume the losses $\ell_t$ are $\mu_t$-strongly convex w.r.t. $\|\cdot\|_2$, where $\mu_t > 0$. Run OGD with step sizes $\eta_t = \frac{1}{\sum_{i=1}^{t}\mu_i}$. Then, for any $u \in V$,
$$\sum_{t=1}^{T}\big(\ell_t(x_t) - \ell_t(u)\big) \le \frac{1}{2}\sum_{t=1}^{T}\eta_t\|g_t\|_2^2.$$

Proof: Observe that $\eta_1 = \frac{1}{\mu_1}$ implies $\frac{1}{2\eta_1} - \frac{\mu_1}{2} = 0$, and $\frac{1}{2\eta_t} - \frac{\mu_t}{2} = \frac{1}{2\eta_{t-1}}$ for $t = 2, \dots, T$. We can use the instantaneous regret lemma (refer to this note) and the strong-convexity condition to get
$$\begin{aligned}
\sum_{t=1}^{T}\big(\ell_t(x_t) - \ell_t(u)\big)
&\le \sum_{t=1}^{T}\left(\frac{1}{2\eta_t}\|x_t - u\|_2^2 - \frac{1}{2\eta_t}\|x_{t+1} - u\|_2^2 - \frac{\mu_t}{2}\|x_t - u\|_2^2 + \frac{\eta_t}{2}\|g_t\|_2^2\right) \\
&= -\frac{1}{2\eta_1}\|x_2 - u\|_2^2 + \sum_{t=2}^{T}\left(\frac{1}{2\eta_{t-1}}\|x_t - u\|_2^2 - \frac{1}{2\eta_t}\|x_{t+1} - u\|_2^2\right) + \sum_{t=1}^{T}\frac{\eta_t}{2}\|g_t\|_2^2 \\
&\le \frac{1}{2}\sum_{t=1}^{T}\eta_t\|g_t\|_2^2,
\end{aligned}$$
where the last step follows because the telescoping sum leaves only $-\frac{1}{2\eta_T}\|x_{T+1} - u\|_2^2 \le 0$, which we drop.
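To make the update analyzed above concrete, here is a minimal Python/NumPy sketch; the function and argument names (`ogd_strongly_convex`, `grad_fn`, `project`) are illustrative choices of this note, not anything prescribed by the references.

```python
import numpy as np

def ogd_strongly_convex(grad_fn, mus, x1, project=lambda x: x):
    """OGD with step sizes eta_t = 1 / (mu_1 + ... + mu_t), as in the theorem above.

    grad_fn(t, x): returns a (sub)gradient g_t of the loss ell_t at x (0-indexed t).
    mus[t]:        strong-convexity parameter mu_t > 0 of ell_t.
    project:       maps a point back onto the feasible set V (identity by default).
    """
    x = np.asarray(x1, dtype=float)
    iterates = [x.copy()]                # x_1, x_2, ..., x_{T+1}
    mu_sum = 0.0
    for t, mu_t in enumerate(mus):
        g = grad_fn(t, x)                # subgradient of ell_t at the played point x_t
        mu_sum += mu_t
        eta = 1.0 / mu_sum               # eta_t = 1 / sum_{i<=t} mu_i
        x = project(x - eta * g)         # gradient step, then projection onto V
        iterates.append(x.copy())
    return iterates
```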

We can obtain a more interpretable regret upper bound if the loss functions are Lipschitz in addition to being strongly convex.

Corollary: If $\mu_t = \mu > 0$ for all $t$ and $\ell_t$ is $L$-Lipschitz w.r.t. $\|\cdot\|_2$ for $t = 1, \dots, T$, then
$$\sum_{t=1}^{T}\big(\ell_t(x_t) - \ell_t(u)\big) \le \frac{L^2}{2\mu}\big(1 + \log T\big).$$

Proof: $\ell_t$ being $L$-Lipschitz gives us $\|g_t\|_2^2 \le L^2$, and with $\mu_t = \mu$ the step sizes become $\eta_t = \frac{1}{\mu t}$. Using the theorem above,
$$\sum_{t=1}^{T}\big(\ell_t(x_t) - \ell_t(u)\big) \le \frac{L^2}{2\mu}\sum_{t=1}^{T}\frac{1}{t}.$$
We can bound this using the inequality $\sum_{t=1}^{T}\frac{1}{t} \le 1 + \log T$ to get
$$\sum_{t=1}^{T}\big(\ell_t(x_t) - \ell_t(u)\big) \le \frac{L^2}{2\mu}\big(1 + \log T\big).$$

Thus, OGD with strongly convex and Lipschitz loss functions has regret with a logarithmic dependence on $T$! Much better than the $\sqrt{T}$ dependence for convex, Lipschitz loss functions.
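As an illustrative sanity check of the corollary (a toy setup, not taken from the references), we can run the sketch above on quadratic losses $\ell_t(x) = \frac{\mu}{2}\|x - z_t\|_2^2$ over the Euclidean ball of radius $R$, where each loss is $\mu$-strongly convex and $2\mu R$-Lipschitz, and compare the realized regret with $\frac{L^2}{2\mu}(1 + \log T)$:

```python
import numpy as np

# Toy check of the logarithmic bound; all constants here are illustrative choices.
rng = np.random.default_rng(0)
mu, R, T = 1.0, 1.0, 1000
targets = rng.uniform(-R / np.sqrt(2), R / np.sqrt(2), size=(T, 2))   # each z_t lies in the ball

def loss(t, x):
    return 0.5 * mu * np.sum((x - targets[t]) ** 2)

def grad(t, x):
    return mu * (x - targets[t])

def project(x):                          # Euclidean projection onto the ball of radius R
    n = np.linalg.norm(x)
    return x if n <= R else x * (R / n)

xs = ogd_strongly_convex(grad, [mu] * T, np.zeros(2), project)        # sketch defined earlier
u = targets.mean(axis=0)                 # best fixed point in hindsight for these quadratics
regret = sum(loss(t, xs[t]) - loss(t, u) for t in range(T))

L = 2 * mu * R                           # ||grad ell_t(x)||_2 <= mu * ||x - z_t||_2 <= 2*mu*R on the ball
print(regret, L ** 2 / (2 * mu) * (1 + np.log(T)))                    # realized regret vs. bound
```

Since the comparator $u$ stays inside the ball, the corollary guarantees the printed regret is at most the $\frac{L^2}{2\mu}(1 + \log T)$ value (which is, of course, only an upper bound).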

It is important to note that the corollary implicitly requires a bounded domain: strong convexity forces subgradient norms to grow linearly with the distance between points, while $L$-Lipschitzness bounds them by $L$, so a loss cannot have both properties on an unbounded domain.

References

  1. Logarithmic regret algorithms for online convex optimization
  2. Adaptive online gradient descent
  3. Francesco Orabona’s Notes