
Add momentum FISTA #2057

Closed
wants to merge 3 commits into from

Conversation

@epapoutsellis (Contributor) commented Feb 2, 2025

Description

Since the beginning of CIL, FISTA (or now APGD) has been implemented with the default momentum ($\beta_k$ below):

$$t_{k+1} = \frac{1 + \sqrt{1 + 4 t_k^2}}{2}, \qquad \beta_k = \frac{t_k - 1}{t_{k+1}}$$

I suggest replacing this with a momentum callback, similar to the step_size callback which is already available.

```python
import numpy
from abc import ABC, abstractmethod
from dataclasses import dataclass

@dataclass
class MomentumCoefficient(ABC):
    momentum: float = None

    @abstractmethod
    def __call__(self, algo=None):
        pass

class ConstantMomentum(MomentumCoefficient):

    def __call__(self, algo):
        return self.momentum

class Nesterov(MomentumCoefficient):

    def __call__(self, algo=None):
        t_old = algo.t
        algo.t = 0.5 * (1 + numpy.sqrt(1 + 4 * (t_old**2)))
        return (t_old - 1) / algo.t
```

```python
# Inside the FISTA/APGD class (Number is from the numbers module):
@property
def momentum(self):
    return self._momentum

def set_momentum(self, momentum):
    if momentum is None:
        self._momentum = Nesterov()
    elif isinstance(momentum, Number):
        self._momentum = ConstantMomentum(momentum)
    else:
        self._momentum = momentum
```

In the case of deterministic optimisation, we can easily implement other FISTA versions ([restart schemes](https://arxiv.org/pdf/1204.3982)) and modified schemes with/without backtracking.
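As an illustration of how a restart scheme could fit the proposed callback interface, here is a minimal, self-contained sketch of a function-value adaptive restart (in the style of O'Donoghue & Candès): the `t` sequence is reset whenever the objective increases. The `RestartNesterov` name, the `_Algo` stand-in, and the `algo.loss`/`algo.t` attributes are assumptions; the base class is restated so the snippet runs on its own.

```python
import numpy
from abc import ABC, abstractmethod
from dataclasses import dataclass

# Minimal stand-in for the proposed interface (restated so this runs alone)
@dataclass
class MomentumCoefficient(ABC):
    momentum: float = None

    @abstractmethod
    def __call__(self, algo=None):
        pass

class RestartNesterov(MomentumCoefficient):
    """Nesterov momentum with function-value adaptive restart:
    reset t to 1 whenever the objective increased at the last iteration."""

    def __call__(self, algo=None):
        if algo.loss and len(algo.loss) >= 2 and algo.loss[-1] > algo.loss[-2]:
            algo.t = 1.0  # restart: drop the accumulated momentum
        t_old = algo.t
        algo.t = 0.5 * (1 + numpy.sqrt(1 + 4 * t_old**2))
        return (t_old - 1) / algo.t

class _Algo:
    """Toy algorithm object, just to exercise the callback."""
    def __init__(self):
        self.t = 1.0
        self.loss = []

algo = _Algo()
rn = RestartNesterov()
algo.loss = [10.0, 9.0]                 # monotone decrease: no restart
betas = [rn(algo) for _ in range(3)]    # momentum coefficient builds up
algo.loss = [9.0, 9.5]                  # objective went up: restart fires
beta_restart = rn(algo)                 # back to zero momentum
print(betas, beta_restart)
```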

In the case of stochastic optimisation, the default momentum is not stable when combined with stochastic estimators, e.g., SGFunction, SAGAFunction, etc. For FISTA+SVRG we need a constant momentum, as suggested here for strongly convex objectives.
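To show where a constant momentum coefficient plugs into the iteration, here is a numpy-only toy (no CIL dependencies) running an APGD-style loop on a least-squares problem with a fixed coefficient; the problem, the `apgd` helper, and all names are illustrative, not CIL code.

```python
import numpy

rng = numpy.random.default_rng(0)
A = rng.standard_normal((20, 5))
b = rng.standard_normal(20)
L = numpy.linalg.norm(A.T @ A, 2)  # Lipschitz constant of the gradient
step = 1.0 / L

def grad(x):
    return A.T @ (A @ x - b)

def apgd(momentum, n_iter=1000):
    """Toy APGD on 0.5*||Ax - b||^2 (g = 0) with a fixed momentum coefficient."""
    x = numpy.zeros(5)
    y = x.copy()
    for _ in range(n_iter):
        x_new = y - step * grad(y)
        y = x_new + momentum * (x_new - x)  # momentum-corrected extrapolation
        x = x_new
    return x

x_sol = numpy.linalg.lstsq(A, b, rcond=None)[0]
x_mom = apgd(momentum=0.5)  # constant coefficient instead of the Nesterov sequence
err = numpy.linalg.norm(x_mom - x_sol)
print(err)
```

In the proposed interface this corresponds to passing `momentum=ConstantMomentum(0.5)` to the algorithm instead of the default `Nesterov()` sequence.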

For stochastic gradients (Amortized Nesterov) or other variance-reduced stochastic estimators (Accelerating Variance-Reduced Stochastic Gradient Methods), the convergence analysis is based on an equivalent formulation of the Nesterov momentum acceleration, in particular the Auslender–Teboulle scheme:

[Screenshot: the Auslender–Teboulle scheme]
Therefore, using the step_size and momentum callbacks, we can use our stochastic framework with the FISTA/APGD algorithm.

Example Usage

```python
from cil.optimisation.algorithms.FISTA import MomentumCoefficient

class DossalChambolle(MomentumCoefficient):

    def __call__(self, algo=None):
        return (algo.iteration - 1) / (algo.iteration + 50)

momentum = DossalChambolle()
fista_dc = FISTA(initial=ig.allocate(), f=f, g=g, update_objective_interval=1, momentum=momentum)
fista_dc.run(500)
```

❤️ Thanks for your contribution!

@samdporter (Contributor) commented Feb 4, 2025

Hey all,
I've been having a look at implementing a few different types of momentum within this new framework. I've just come across one that Nesterov outlines in this paper and that is suggested for use with ordered subsets by Kim et al. in this paper. It uses an accumulation of gradients, as opposed to the previous two iterates alone, and so might lend itself to stochastic reconstruction (I'll have a look at it once I figure out how to implement it).

The problem is that it can't be implemented easily within the currently suggested framework for momentum. Could we instead do something similar to the Preconditioner.apply method, where Momentum.apply spits out the momentum-corrected update as opposed to just the momentum coefficient?

i.e., rather than (at the end of the APGD.update method)

```python
self.x.subtract(self.x_old, out=self.y)
momentum = self.momentum(self)
self.y.sapyb(momentum, self.x, 1.0, out=self.y)
```

we would have something like

```python
self.momentum.apply(self, out=self.y)
```

which can then accommodate more complex types of momentum.
What do you think?
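A self-contained sketch of what such an apply-style interface could look like, using plain numpy arrays in place of CIL data containers. Everything here is an assumption for illustration: the `Momentum`/`NesterovMomentum` names, the `_Algo` stand-in, and the accumulation rule (a toy running sum of past updates, only in the spirit of the gradient-accumulation methods mentioned above).

```python
import numpy
from abc import ABC, abstractmethod

class Momentum(ABC):
    """Hypothetical apply-style interface, analogous to Preconditioner.apply:
    writes the momentum-corrected update into `out` instead of returning a
    scalar coefficient."""

    @abstractmethod
    def apply(self, algo, out):
        pass

class NesterovMomentum(Momentum):
    """Default coefficient-based momentum, expressed through apply()."""

    def __init__(self):
        self.t = 1.0

    def apply(self, algo, out):
        t_old = self.t
        self.t = 0.5 * (1 + numpy.sqrt(1 + 4 * t_old**2))
        beta = (t_old - 1) / self.t
        numpy.subtract(algo.x, algo.x_old, out=out)  # out <- x - x_old
        out *= beta                                  # out <- beta * (x - x_old)
        out += algo.x                                # out <- x + beta * (x - x_old)

class GradientAccumulationMomentum(Momentum):
    """Toy accumulation-based momentum: corrects the iterate with a
    weighted running sum of all past updates, which a coefficient-only
    callback could not express."""

    def __init__(self, weight=0.1):
        self.weight = weight
        self.accum = None

    def apply(self, algo, out):
        update = algo.x - algo.x_old
        self.accum = update if self.accum is None else self.accum + update
        numpy.multiply(self.accum, self.weight, out=out)
        out += algo.x

class _Algo:
    """Minimal stand-in for the algorithm object."""
    def __init__(self):
        self.x = numpy.array([1.0, 2.0])
        self.x_old = numpy.array([0.5, 1.5])

algo = _Algo()
y1 = numpy.empty(2)
NesterovMomentum().apply(algo, out=y1)  # first call: beta = 0, so y1 == x
y2 = numpy.empty(2)
GradientAccumulationMomentum(weight=0.1).apply(algo, out=y2)
print(y1, y2)
```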

P.S. Would it be more appropriate to create a discussion for this?

@MargaretDuff (Member) commented

Closed as superseded by #2061.
