Optimizers

An optimizer performs a variant of stochastic gradient descent. It takes one-dimensional arrays for the weights and their gradients, along with an optional identifier key. The optimizer is expected to update the weights and zero the gradients in place. The optimizers are registered in the function registry and can also be used via Thinc's config mechanism.
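For illustration, the in-place contract described above can be sketched in plain NumPy. The function name `sgd_step` and the fixed learning rate are hypothetical, not part of Thinc's API; the point is the shape of the call and the in-place mutation:

```python
import numpy as np

def sgd_step(key, weights, gradient, *, learn_rate=0.001):
    """Minimal sketch of the optimizer contract: update the weights
    from the gradients and zero the gradients, both in place."""
    weights -= learn_rate * gradient  # vanilla SGD update
    gradient.fill(0.0)                # gradients are zeroed in place

weights = np.array([1.0, 2.0, 3.0])
gradient = np.array([0.5, 0.5, 0.5])
sgd_step(("model", "W"), weights, gradient)
```

Because the arrays are modified in place, the caller's references to `weights` and `gradient` see the update without any return value.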

Optimizer functions

SGD function

Function to create an SGD optimizer. Returns an instance of Optimizer. If a hyperparameter specifies a schedule as a list or generator, its value will be replaced with the next item on each call to Optimizer.step_schedules. Once the schedule is exhausted, its last value will be used.
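The exhaustion behavior can be mimicked with a plain Python generator. The helper `step_schedule` below is hypothetical, not Thinc code, but it captures the rule: take the next scheduled value if one is available, otherwise keep the last value:

```python
def step_schedule(current, schedule):
    """Advance a hyperparameter: take the next scheduled value if
    available, otherwise keep the current (i.e. last) value."""
    try:
        return next(schedule)
    except StopIteration:
        return current

rates = iter([0.001, 0.0005])
lr = step_schedule(None, rates)  # 0.001
lr = step_schedule(lr, rates)    # 0.0005
lr = step_schedule(lr, rates)    # schedule exhausted: stays at 0.0005
```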

Example:

```python
from thinc.api import SGD

optimizer = SGD(
    learn_rate=0.001,
    L2=1e-6,
    grad_clip=1.0
)
```

config.cfg:

```ini
[optimizer]
@optimizers = "SGD.v1"
learn_rate = 0.001
L2 = 1e-6
L2_is_weight_decay = true
grad_clip = 1.0
use_averages = true
```
| Argument | Type | Description |
| --- | --- | --- |
| `learn_rate` | `Union[float, List[float], Generator]` | The initial learning rate. |
| _keyword-only_ | | |
| `L2` | `Union[float, List[float], Generator]` | The L2 regularization term. |
| `grad_clip` | `Union[float, List[float], Generator]` | Gradient clipping. |
| `use_averages` | `bool` | Whether to track moving averages of the parameters. |
| `L2_is_weight_decay` | `bool` | Whether to interpret the L2 parameter as a weight decay term, in the style of the AdamW optimizer. |
| `ops` | `Optional[Ops]` | A backend object. Defaults to the currently selected backend. |
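As a rough sketch of what the `grad_clip` hyperparameter does, gradient clipping is commonly implemented as rescaling the gradient so its L2 norm does not exceed the threshold. The helper name below is hypothetical, not Thinc's internal function:

```python
import numpy as np

def clip_gradient(gradient, grad_clip):
    """Rescale the gradient in place so its L2 norm is at most grad_clip."""
    norm = np.linalg.norm(gradient)
    if norm > grad_clip:
        gradient *= grad_clip / norm
    return gradient

g = np.array([3.0, 4.0])  # L2 norm is 5.0
clip_gradient(g, 1.0)     # rescaled so the norm becomes 1.0
```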

Adam function

Function to create an Adam optimizer. Returns an instance of Optimizer. If a hyperparameter specifies a schedule as a list or generator, its value will be replaced with the next item on each call to Optimizer.step_schedules. Once the schedule is exhausted, its last value will be used.

Example:

```python
from thinc.api import Adam

optimizer = Adam(
    learn_rate=0.001,
    beta1=0.9,
    beta2=0.999,
    eps=1e-08,
    L2=1e-6,
    grad_clip=1.0,
    use_averages=True,
    L2_is_weight_decay=True
)
```

config.cfg:

```ini
[optimizer]
@optimizers = "Adam.v1"
learn_rate = 0.001
beta1 = 0.9
beta2 = 0.999
eps = 1e-08
L2 = 1e-6
L2_is_weight_decay = true
grad_clip = 1.0
use_averages = true
```
| Argument | Type | Description |
| --- | --- | --- |
| `learn_rate` | `Union[float, List[float], Generator]` | The initial learning rate. |
| _keyword-only_ | | |
| `L2` | `Union[float, List[float], Generator]` | The L2 regularization term. |
| `beta1` | `Union[float, List[float], Generator]` | First-order momentum. |
| `beta2` | `Union[float, List[float], Generator]` | Second-order momentum. |
| `eps` | `Union[float, List[float], Generator]` | Epsilon term for Adam etc. |
| `grad_clip` | `Union[float, List[float], Generator]` | Gradient clipping. |
| `use_averages` | `bool` | Whether to track moving averages of the parameters. |
| `L2_is_weight_decay` | `bool` | Whether to interpret the L2 parameter as a weight decay term, in the style of the AdamW optimizer. |
| `ops` | `Optional[Ops]` | A backend object. Defaults to the currently selected backend. |
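For reference, the standard Adam update that `beta1`, `beta2`, and `eps` parameterize can be sketched in NumPy (this is the textbook formula, not Thinc's internal implementation; the function and variable names are hypothetical):

```python
import numpy as np

def adam_step(weights, gradient, m, v, t, *,
              learn_rate=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update with bias correction, modifying arrays in place."""
    m[:] = beta1 * m + (1 - beta1) * gradient       # first moment (momentum)
    v[:] = beta2 * v + (1 - beta2) * gradient ** 2  # second moment (variance)
    m_hat = m / (1 - beta1 ** t)                    # bias-corrected moments
    v_hat = v / (1 - beta2 ** t)
    weights -= learn_rate * m_hat / (np.sqrt(v_hat) + eps)

w = np.array([1.0])
g = np.array([0.5])
m, v = np.zeros(1), np.zeros(1)
adam_step(w, g, m, v, t=1)  # first step moves w by roughly learn_rate
```

On the first step the bias correction cancels the moment initialization, so the update magnitude is close to the learning rate regardless of the gradient's scale.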

RAdam function

Function to create an RAdam optimizer. Returns an instance of Optimizer. If a hyperparameter specifies a schedule as a list or generator, its value will be replaced with the next item on each call to Optimizer.step_schedules. Once the schedule is exhausted, its last value will be used.

Example:

```python
from thinc.api import RAdam

optimizer = RAdam(
    learn_rate=0.001,
    beta1=0.9,
    beta2=0.999,
    eps=1e-08,
    weight_decay=1e-6,
    grad_clip=1.0,
    use_averages=True,
)
```

config.cfg:

```ini
[optimizer]
@optimizers = "RAdam.v1"
learn_rate = 0.001
beta1 = 0.9
beta2 = 0.999
eps = 1e-08
weight_decay = 1e-6
grad_clip = 1.0
use_averages = true
```
| Argument | Type | Description |
| --- | --- | --- |
| `learn_rate` | `Union[float, List[float], Generator]` | The initial learning rate. |
| _keyword-only_ | | |
| `beta1` | `Union[float, List[float], Generator]` | First-order momentum. |
| `beta2` | `Union[float, List[float], Generator]` | Second-order momentum. |
| `eps` | `Union[float, List[float], Generator]` | Epsilon term for Adam etc. |
| `weight_decay` | `Union[float, List[float], Generator]` | Weight decay term. |
| `grad_clip` | `Union[float, List[float], Generator]` | Gradient clipping. |
| `use_averages` | `bool` | Whether to track moving averages of the parameters. |
| `ops` | `Optional[Ops]` | A backend object. Defaults to the currently selected backend. |
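RAdam ("Rectified Adam") differs from Adam by rectifying the adaptive learning rate while its variance estimate is still unreliable in early steps. A sketch of the rectification term from the RAdam formulation (the helper name is hypothetical; Thinc's implementation may differ in detail):

```python
import math

def radam_rectification(t, beta2=0.999):
    """Compute RAdam's variance rectification factor for step t.
    Returns None when the variance is not yet tractable (early steps),
    in which case RAdam falls back to a plain momentum update."""
    rho_inf = 2.0 / (1.0 - beta2) - 1.0
    rho_t = rho_inf - 2.0 * t * beta2 ** t / (1.0 - beta2 ** t)
    if rho_t <= 4.0:
        return None  # adaptive term switched off for this step
    return math.sqrt(
        (rho_t - 4.0) * (rho_t - 2.0) * rho_inf
        / ((rho_inf - 4.0) * (rho_inf - 2.0) * rho_t)
    )

radam_rectification(1)    # None: too early, variance not tractable
radam_rectification(100)  # a rectification factor below 1.0
```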

Optimizer class

Performs various flavors of stochastic gradient descent, with first and second order momentum. Currently supports "vanilla" SGD, Adam, and RAdam.

Optimizer.__init__ method

Initialize an optimizer. If a hyperparameter specifies a schedule as a list or generator, its value will be replaced with the next item on each call to Optimizer.step_schedules. Once the schedule is exhausted, its last value will be used.

Example:

```python
from thinc.api import Optimizer

optimizer = Optimizer(learn_rate=0.001, L2=1e-6, grad_clip=1.0)
```
| Argument | Type | Description |
| --- | --- | --- |
| `learn_rate` | `Union[float, List[float], Generator]` | The initial learning rate. |
| _keyword-only_ | | |
| `L2` | `Union[float, List[float], Generator]` | The L2 regularization term. |
| `beta1` | `Union[float, List[float], Generator]` | First-order momentum. |
| `beta2` | `Union[float, List[float], Generator]` | Second-order momentum. |
| `eps` | `Union[float, List[float], Generator]` | Epsilon term for Adam etc. |
| `grad_clip` | `Union[float, List[float], Generator]` | Gradient clipping. |
| `use_averages` | `bool` | Whether to track moving averages of the parameters. |
| `use_radam` | `bool` | Whether to use the RAdam optimizer. |
| `L2_is_weight_decay` | `bool` | Whether to interpret the L2 parameter as a weight decay term, in the style of the AdamW optimizer. |
| `ops` | `Optional[Ops]` | A backend object. Defaults to the currently selected backend. |

Optimizer.__call__ method

Call the optimizer function, updating parameters using the current parameter gradients. The key is the identifier for the parameter, usually the node ID and parameter name.

| Argument | Type | Description |
| --- | --- | --- |
| `key` | `Tuple[int, str]` | The parameter identifier. |
| `weights` | `FloatsXd` | The model's current weights. |
| `gradient` | `FloatsXd` | The model's current gradient. |
| _keyword-only_ | | |
| `lr_scale` | `float` | Rescale the learning rate. Defaults to 1.0. |
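The key-based calling convention can be sketched with a plain dict of parameters keyed by `(node ID, parameter name)`. The NumPy sketch below uses a hypothetical vanilla-SGD callable in place of a real `Optimizer`, just to show the per-parameter loop and the `lr_scale` rescale:

```python
import numpy as np

def sgd_call(key, weights, gradient, *, lr_scale=1.0, learn_rate=0.001):
    """Sketch of the __call__ contract: one in-place update per
    (node ID, parameter name) key, with an optional lr rescale."""
    weights -= learn_rate * lr_scale * gradient
    gradient.fill(0.0)

params = {
    (1, "W"): (np.ones(3), np.full(3, 0.5)),
    (1, "b"): (np.zeros(3), np.full(3, 0.1)),
}
for key, (weights, gradient) in params.items():
    sgd_call(key, weights, gradient)
```

Keying updates by `(node ID, name)` lets the optimizer keep per-parameter state (such as momentum buffers) across calls without holding references to the model itself.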

Optimizer.step_schedules method

Replace the named hyperparameters with the next item from the schedules iterator, if available. Once the schedule is exhausted, its last value will be used.

Example:

```python
from thinc.api import Optimizer, decaying

optimizer = Optimizer(learn_rate=decaying(0.001, 1e-4), grad_clip=1.0)
assert optimizer.learn_rate == 0.001
optimizer.step_schedules()
assert optimizer.learn_rate == 0.000999900009999  # using a schedule
assert optimizer.grad_clip == 1.0                 # not using a schedule
```
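The learning rate in the example follows from hyperbolic decay, `base_rate / (1 + decay * t)`, which is consistent with the value shown above after one step (this arithmetic sketch reproduces the number, it is not Thinc's source):

```python
base_rate, decay = 0.001, 1e-4

def decayed(t):
    # Hyperbolic decay: step t yields base_rate / (1 + decay * t).
    return base_rate * (1.0 / (1.0 + decay * t))

decayed(0)  # 0.001, the initial value
decayed(1)  # 0.000999900009999..., the value after one step_schedules call
```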

Optimizer.to_gpu method

Transfer the optimizer to a given GPU device.

Example:

```python
optimizer.to_gpu()
```

Optimizer.to_cpu method

Copy the optimizer to CPU.

Example:

```python
optimizer.to_cpu()
```
