Optimizers

An optimizer essentially performs stochastic gradient descent. It takes one-dimensional arrays for the weights and their gradients, along with an optional identifier key. The optimizer is expected to update the weights and zero the gradients in place. The optimizers are registered in the function registry and can also be used via Thinc’s config mechanism.
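
As a minimal sketch of this contract (the key, shapes, and values below are purely illustrative, and capturing the returned arrays is an assumption rather than documented behaviour):

import numpy
from thinc.api import SGD

optimizer = SGD(learn_rate=0.1)
weights = numpy.ones((5,), dtype="float32")
gradient = numpy.full((5,), 0.5, dtype="float32")
# The key identifies the parameter, e.g. (model ID, parameter name).
weights, gradient = optimizer((0, "W"), weights, gradient)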

Optimizer functions

SGD function

Function to create an SGD optimizer. Returns an instance of Optimizer. If a hyperparameter specifies a schedule, the step that is passed to the schedule will be incremented on each call to Optimizer.step_schedules.

Example:

from thinc.api import SGD

optimizer = SGD(
    learn_rate=0.001,
    L2=1e-6,
    grad_clip=1.0
)
config.cfg:

[optimizer]
@optimizers = "SGD.v1"
learn_rate = 0.001
L2 = 1e-6
L2_is_weight_decay = true
grad_clip = 1.0
use_averages = true
| Argument | Type | Description |
| --- | --- | --- |
| learn_rate | Union[float, List[float], Generator] | The initial learning rate. |
| keyword-only | | |
| L2 | Union[float, List[float], Generator] | The L2 regularization term. |
| grad_clip | Union[float, List[float], Generator] | Gradient clipping. |
| use_averages | bool | Whether to track moving averages of the parameters. |
| L2_is_weight_decay | bool | Whether to interpret the L2 parameter as a weight decay term, in the style of the AdamW optimizer. |
| ops | Optional[Ops] | A backend object. Defaults to the currently selected backend. |
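
The config block above can be resolved with Thinc's registry, along these lines (a minimal sketch; the config string and the "optimizer" section name are assumptions about how your own config is structured):

from thinc.api import Config, registry

CONFIG = """
[optimizer]
@optimizers = "SGD.v1"
learn_rate = 0.001
L2 = 1e-6
grad_clip = 1.0
"""

config = Config().from_str(CONFIG)
resolved = registry.resolve(config)
optimizer = resolved["optimizer"]  # the Optimizer instance created by SGD.v1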

Adam function

Function to create an Adam optimizer. Returns an instance of Optimizer. If a hyperparameter specifies a schedule, the step that is passed to the schedule will be incremented on each call to Optimizer.step_schedules.

Example:

from thinc.api import Adam

optimizer = Adam(
    learn_rate=0.001,
    beta1=0.9,
    beta2=0.999,
    eps=1e-08,
    L2=1e-6,
    grad_clip=1.0,
    use_averages=True,
    L2_is_weight_decay=True
)
config.cfg:

[optimizer]
@optimizers = "Adam.v1"
learn_rate = 0.001
beta1 = 0.9
beta2 = 0.999
eps = 1e-08
L2 = 1e-6
L2_is_weight_decay = true
grad_clip = 1.0
use_averages = true
| Argument | Type | Description |
| --- | --- | --- |
| learn_rate | Union[float, List[float], Generator] | The initial learning rate. |
| keyword-only | | |
| L2 | Union[float, List[float], Generator] | The L2 regularization term. |
| beta1 | Union[float, List[float], Generator] | First-order momentum. |
| beta2 | Union[float, List[float], Generator] | Second-order momentum. |
| eps | Union[float, List[float], Generator] | Epsilon term for Adam etc. |
| grad_clip | Union[float, List[float], Generator] | Gradient clipping. |
| use_averages | bool | Whether to track moving averages of the parameters. |
| L2_is_weight_decay | bool | Whether to interpret the L2 parameter as a weight decay term, in the style of the AdamW optimizer. |
| ops | Optional[Ops] | A backend object. Defaults to the currently selected backend. |
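
For context, a typical training step pairs the optimizer with a model's finish_update method, which calls the optimizer on each parameter and gradient of the model. A minimal sketch with a toy model and random data (layer sizes, shapes, and the loss gradient are illustrative):

import numpy
from thinc.api import Adam, Relu, Softmax, chain

model = chain(Relu(nO=8, nI=4), Softmax(nO=2))
optimizer = Adam(learn_rate=0.001)

X = numpy.random.uniform(size=(16, 4)).astype("float32")
Y = numpy.zeros((16, 2), dtype="float32")
Y[numpy.arange(16), numpy.random.randint(0, 2, size=16)] = 1.0

model.initialize(X=X, Y=Y)
Yh, backprop = model.begin_update(X)
backprop(Yh - Y)                # accumulate gradients on the model's parameters
model.finish_update(optimizer)  # apply the optimizer to each parameter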

RAdam function

Function to create an RAdam optimizer. Returns an instance of Optimizer. If a hyperparameter specifies a schedule, the step that is passed to the schedule will be incremented on each call to Optimizer.step_schedules.

Example:

from thinc.api import RAdam

optimizer = RAdam(
    learn_rate=0.001,
    beta1=0.9,
    beta2=0.999,
    eps=1e-08,
    weight_decay=1e-6,
    grad_clip=1.0,
    use_averages=True,
)
config.cfg:

[optimizer]
@optimizers = "RAdam.v1"
learn_rate = 0.001
beta1 = 0.9
beta2 = 0.999
eps = 1e-08
weight_decay = 1e-6
grad_clip = 1.0
use_averages = true
| Argument | Type | Description |
| --- | --- | --- |
| learn_rate | Union[float, List[float], Generator] | The initial learning rate. |
| keyword-only | | |
| beta1 | Union[float, List[float], Generator] | First-order momentum. |
| beta2 | Union[float, List[float], Generator] | Second-order momentum. |
| eps | Union[float, List[float], Generator] | Epsilon term for Adam etc. |
| weight_decay | Union[float, List[float], Generator] | Weight decay term. |
| grad_clip | Union[float, List[float], Generator] | Gradient clipping. |
| use_averages | bool | Whether to track moving averages of the parameters. |
| ops | Optional[Ops] | A backend object. Defaults to the currently selected backend. |
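
As noted above, any of these hyperparameters can be given as a schedule rather than a fixed float. A minimal sketch using a warmup schedule for the learning rate (the schedule values are illustrative):

from thinc.api import RAdam, warmup_linear

# Warm up to 0.001 over the first 100 steps, then decay linearly towards zero at step 1000.
optimizer = RAdam(learn_rate=warmup_linear(0.001, 100, 1000))
optimizer.step_schedules()  # advances the step that is passed to the schedule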

Optimizer class

Performs various flavors of stochastic gradient descent, with first- and second-order momentum. Currently supports “vanilla” SGD, Adam, and RAdam.

Optimizer.__init__ method

Initialize an optimizer. If a hyperparameter specifies a schedule, the step that is passed to the schedule will be incremented on each call to Optimizer.step_schedules.

Example:

from thinc.api import Optimizer

optimizer = Optimizer(learn_rate=0.001, L2=1e-6, grad_clip=1.0)
| Argument | Type | Description |
| --- | --- | --- |
| learn_rate | Union[float, List[float], Generator] | The initial learning rate. |
| keyword-only | | |
| L2 | Union[float, List[float], Generator] | The L2 regularization term. |
| beta1 | Union[float, List[float], Generator] | First-order momentum. |
| beta2 | Union[float, List[float], Generator] | Second-order momentum. |
| eps | Union[float, List[float], Generator] | Epsilon term for Adam etc. |
| grad_clip | Union[float, List[float], Generator] | Gradient clipping. |
| use_averages | bool | Whether to track moving averages of the parameters. |
| use_radam | bool | Whether to use the RAdam optimizer. |
| L2_is_weight_decay | bool | Whether to interpret the L2 parameter as a weight decay term, in the style of the AdamW optimizer. |
| ops | Optional[Ops] | A backend object. Defaults to the currently selected backend. |
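
For illustration, a sketch of configuring an AdamW-style optimizer directly through the class, using the keyword-only arguments above (the hyperparameter values are illustrative):

from thinc.api import Optimizer

optimizer = Optimizer(
    learn_rate=0.001,
    beta1=0.9,
    beta2=0.999,
    eps=1e-8,
    L2=0.01,
    L2_is_weight_decay=True,  # interpret L2 as decoupled weight decay, AdamW-style
    grad_clip=1.0,
    use_averages=True,
)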

Optimizer.__call__ method

Call the optimizer function, updating parameters using the current parameter gradients. The key is the identifier for the parameter, usually the node ID and parameter name.

| Argument | Type | Description |
| --- | --- | --- |
| key | Tuple[int, str] | The parameter identifier. |
| weights | FloatsXd | The model’s current weights. |
| gradient | FloatsXd | The model’s current gradient. |
| keyword-only | | |
| lr_scale | float | Rescale the learning rate. Defaults to 1.0. |

Optimizer.last_score property (New in v9)

Get or set the last evaluation score. The optimizer passes this score to the learning rate schedule, so that the schedule can take training dynamics into account (see e.g. the plateau schedule).

Example:

from thinc.api import Optimizer, constant, plateau

schedule = plateau(2, 0.5, constant(1.0))
optimizer = Optimizer(learn_rate=schedule)
optimizer.last_score = (1000, 88.34)
| Argument | Type | Description |
| --- | --- | --- |
| RETURNS | Optional[Tuple[int, float]] | The step and score of the last evaluation. |
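
A sketch of how this might fit into a training loop, reporting an evaluation score so the plateau schedule can react to it (the evaluation function and the loop structure are hypothetical placeholders):

from thinc.api import Optimizer, constant, plateau

# Scale the base rate by 0.5 when scores plateau for 2 evaluations (illustrative values).
optimizer = Optimizer(learn_rate=plateau(2, 0.5, constant(0.001)))

for step in range(1000):
    ...  # run one training step here
    if step % 100 == 0:
        score = evaluate()  # hypothetical evaluation function
        optimizer.last_score = (step, score)
    optimizer.step_schedules()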

Optimizer.step_schedules method

Increase the current step of the optimizer. This step will be used by schedules to determine their next value.

Example:

from thinc.api import Optimizer, decaying

optimizer = Optimizer(learn_rate=decaying(0.001, 1e-4), grad_clip=1.0)
assert optimizer.learn_rate == 0.001
optimizer.step_schedules()
assert optimizer.learn_rate == 0.000999900009999  # using a schedule
assert optimizer.grad_clip == 1.0                 # not using a schedule

Optimizer.to_gpu method

Transfer the optimizer to the GPU.

Example:

optimizer.to_gpu()

Optimizer.to_cpu method

Copy the optimizer to the CPU.

Example:

optimizer.to_cpu()
