Optimizers
An optimizer essentially performs stochastic gradient descent. It takes one-dimensional arrays for the weights and their gradients, along with an optional identifier key. The optimizer is expected to update the weights and zero the gradients in place. The optimizers are registered in the function registry and can also be used via Thinc’s config mechanism.
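For orientation, here is a minimal sketch of the typical usage from a training loop: gradients are accumulated during the backward pass, and Model.finish_update then calls the optimizer on each parameter and its gradient. The toy model and data below are illustrative placeholders, not part of the API.

import numpy
from thinc.api import Adam, Relu, Softmax, chain

# Toy data and model, for illustration only.
X = numpy.random.uniform(size=(128, 10)).astype("float32")
Y = numpy.zeros((128, 2), dtype="float32")
Y[numpy.arange(128), numpy.random.randint(0, 2, size=128)] = 1.0

model = chain(Relu(nO=32), Softmax(nO=2))
model.initialize(X=X, Y=Y)
optimizer = Adam(learn_rate=0.001)

Yh, backprop = model.begin_update(X)
backprop(Yh - Y)                 # simple error gradient, just for demonstration
model.finish_update(optimizer)   # calls the optimizer for each (key, weights, gradient)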
Optimizer functions
SGD function
Function to create an SGD optimizer. If a hyperparameter specifies a schedule, the step that is passed to the schedule will be incremented on each call to Optimizer.step_schedules.
Example
from thinc.api import SGD

optimizer = SGD(
    learn_rate=0.001,
    L2=1e-6,
    grad_clip=1.0
)
config.cfg
[optimizer]
@optimizers = "SGD.v1"
learn_rate = 0.001
L2 = 1e-6
L2_is_weight_decay = true
grad_clip = 1.0
use_averages = true
Argument | Type | Description |
---|---|---|
learn_rate | Union[float, List[float], Generator] | The initial learning rate. |
keyword-only | ||
L2 | Union[float, List[float], Generator] | The L2 regularization term. |
grad_clip | Union[float, List[float], Generator] | Gradient clipping. |
use_averages | bool | Whether to track moving averages of the parameters. |
L2_is_weight_decay | bool | Whether to interpret the L2 parameter as a weight decay term, in the style of the AdamW optimizer. |
ops | Optional[Ops] | A backend object. Defaults to the currently selected backend. |
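For reference, a config block like the one above can be turned into an Optimizer via the registry. A sketch, assuming the config is available as a string (Config.from_disk works the same way for a config.cfg file):

from thinc.api import Config, registry

CONFIG = """
[optimizer]
@optimizers = "SGD.v1"
learn_rate = 0.001
L2 = 1e-6
grad_clip = 1.0
"""

config = Config().from_str(CONFIG)
resolved = registry.resolve(config)
optimizer = resolved["optimizer"]   # an Optimizer created by SGD.v1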
Adam function
Function to create an Adam optimizer. Returns an instance of Optimizer. If a hyperparameter specifies a schedule, the step that is passed to the schedule will be incremented on each call to Optimizer.step_schedules.
Example
from thinc.api import Adam

optimizer = Adam(
    learn_rate=0.001,
    beta1=0.9,
    beta2=0.999,
    eps=1e-08,
    L2=1e-6,
    grad_clip=1.0,
    use_averages=True,
    L2_is_weight_decay=True
)
config.cfg
[optimizer]
@optimizers = "Adam.v1"
learn_rate = 0.001
beta1 = 0.9
beta2 = 0.999
eps = 1e-08
L2 = 1e-6
L2_is_weight_decay = true
grad_clip = 1.0
use_averages = true
Argument | Type | Description |
---|---|---|
learn_rate | Union[float, List[float], Generator] | The initial learning rate. |
keyword-only | ||
L2 | Union[float, List[float], Generator] | The L2 regularization term. |
beta1 | Union[float, List[float], Generator] | First-order momentum. |
beta2 | Union[float, List[float], Generator] | Second-order momentum. |
eps | Union[float, List[float], Generator] | Epsilon term for Adam etc. |
grad_clip | Union[float, List[float], Generator] | Gradient clipping. |
use_averages | bool | Whether to track moving averages of the parameters. |
L2_is_weight_decay | bool | Whether to interpret the L2 parameter as a weight decay term, in the style of the AdamW optimizer. |
ops | Optional[Ops] | A backend object. Defaults to the currently selected backend. |
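Because each hyperparameter also accepts a schedule, the learning rate can, for instance, be warmed up and then decayed. A sketch using the warmup_linear schedule; the step counts are arbitrary:

from thinc.api import Adam, warmup_linear

# Learning rate ramps up over the first 1000 steps, then decays linearly
# towards zero by step 10000. step_schedules() advances the step.
optimizer = Adam(learn_rate=warmup_linear(0.001, 1000, 10000))
for _ in range(3):
    optimizer.step_schedules()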
RAdam function
Function to create an RAdam optimizer. Returns an instance of Optimizer. If a hyperparameter specifies a schedule, the step that is passed to the schedule will be incremented on each call to Optimizer.step_schedules.
Example
from thinc.api import RAdam

optimizer = RAdam(
    learn_rate=0.001,
    beta1=0.9,
    beta2=0.999,
    eps=1e-08,
    weight_decay=1e-6,
    grad_clip=1.0,
    use_averages=True,
)
config.cfg
[optimizer]
@optimizers = "RAdam.v1"
learn_rate = 0.001
beta1 = 0.9
beta2 = 0.999
eps = 1e-08
weight_decay = 1e-6
grad_clip = 1.0
use_averages = true
Argument | Type | Description |
---|---|---|
learn_rate | Union[float, List[float], Generator] | The initial learning rate. |
keyword-only | ||
beta1 | Union[float, List[float], Generator] | First-order momentum. |
beta2 | Union[float, List[float], Generator] | Second-order momentum. |
eps | Union[float, List[float], Generator] | Epsilon term for Adam etc. |
weight_decay | Union[float, List[float], Generator] | Weight decay term. |
grad_clip | Union[float, List[float], Generator] | Gradient clipping. |
use_averages | bool | Whether to track moving averages of the parameters. |
ops | Optional[Ops] | A backend object. Defaults to the currently selected backend. |
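When use_averages is enabled, the optimizer keeps moving averages of the parameters, which are commonly swapped in for evaluation. A sketch, assuming the averages are exposed as the optimizer's averages dict and used with Model.use_params; the tiny model and dummy gradient are placeholders:

import numpy
from thinc.api import Linear, RAdam

X = numpy.zeros((4, 3), dtype="float32")
model = Linear(nO=2, nI=3)
model.initialize(X=X)
optimizer = RAdam(learn_rate=0.001, use_averages=True)

Yh, backprop = model.begin_update(X)
backprop(numpy.ones_like(Yh))     # dummy gradient, illustration only
model.finish_update(optimizer)

# Temporarily swap in the averaged parameters, e.g. for evaluation.
# optimizer.averages is assumed to hold the moving averages per parameter key.
with model.use_params(optimizer.averages):
    Yh_avg = model.predict(X)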
Optimizer class
Performs various flavors of stochastic gradient descent, with first- and second-order momentum. Currently supports “vanilla” SGD, Adam, and RAdam.
Optimizer.__init__ method
Initialize an optimizer. If a hyperparameter specifies a schedule, the step that is passed to the schedule will be incremented on each call to Optimizer.step_schedules.
Example
from thinc.api import Optimizer

optimizer = Optimizer(learn_rate=0.001, L2=1e-6, grad_clip=1.0)
Argument | Type | Description |
---|---|---|
learn_rate | Union[float, List[float], Generator] | The initial learning rate. |
keyword-only | ||
L2 | Union[float, List[float], Generator] | The L2 regularization term. |
beta1 | Union[float, List[float], Generator] | First-order momentum. |
beta2 | Union[float, List[float], Generator] | Second-order momentum. |
eps | Union[float, List[float], Generator] | Epsilon term for Adam etc. |
grad_clip | Union[float, List[float], Generator] | Gradient clipping. |
use_averages | bool | Whether to track moving averages of the parameters. |
use_radam | bool | Whether to use the RAdam optimizer. |
L2_is_weight_decay | bool | Whether to interpret the L2 parameter as a weight decay term, in the style of the AdamW optimizer. |
ops | Optional[Ops] | A backend object. Defaults to the currently selected backend. |
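The keyword arguments combine to select the optimizer variant. For example, the following sketch sets L2_is_weight_decay to treat L2 as a decoupled weight decay term, giving AdamW-style behavior; the hyperparameter values are illustrative only:

from thinc.api import Optimizer

# Adam-style update with decoupled weight decay (AdamW-like behavior).
optimizer = Optimizer(
    learn_rate=0.001,
    beta1=0.9,
    beta2=0.999,
    L2=0.01,
    L2_is_weight_decay=True,
    grad_clip=1.0,
)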
Optimizer.__call__ method
Call the optimizer function, updating parameters using the current parameter gradients. The key is the identifier for the parameter, usually the node ID and parameter name.
Argument | Type | Description |
---|---|---|
key | Tuple[int, str] | The parameter identifier. |
weights | FloatsXd | The model’s current weights. |
gradient | FloatsXd | The model’s current gradient. |
keyword-only | ||
lr_scale | float | Rescale the learning rate. Defaults to 1.0. |
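Model.finish_update normally calls the optimizer for you, but it can also be called directly. A minimal sketch with hand-made arrays; the key (0, "W") and the shapes are arbitrary illustrations:

import numpy
from thinc.api import Adam

optimizer = Adam(learn_rate=0.001)
W = numpy.zeros((10,), dtype="float32")
dW = numpy.ones((10,), dtype="float32")

# Per the overview at the top of this page, the optimizer is expected to
# update the weights and zero the gradient; lr_scale rescales the learning
# rate for this call.
optimizer((0, "W"), W, dW, lr_scale=0.5)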
Optimizer.last_score property New: v9
Get or set the last evaluation score. The optimizer passes this score to the learning rate schedule, so that the schedule can take training dynamics into account (see e.g. the plateau schedule).
Example
from thinc.api import Optimizer, constant, plateau

schedule = plateau(2, 0.5, constant(1.0))
optimizer = Optimizer(learn_rate=schedule)
optimizer.last_score = (1000, 88.34)
Argument | Type | Description |
---|---|---|
RETURNS | Optional[Tuple[int, float]] | The step and score of the last evaluation. |
Optimizer.step_schedules method
Increase the current step of the optimizer. This step will be used by schedules to determine their next value.
Example
from thinc.api import Optimizer, decaying

optimizer = Optimizer(learn_rate=decaying(0.001, 1e-4), grad_clip=1.0)
assert optimizer.learn_rate == 0.001
optimizer.step_schedules()
assert optimizer.learn_rate == 0.000999900009999  # using a schedule
assert optimizer.grad_clip == 1.0  # not using a schedule
Optimizer.to_gpu method
Transfer the optimizer to a given GPU device.
Example
optimizer.to_gpu()
Optimizer.to_cpu method
Copy the optimizer to CPU.
Example
optimizer.to_cpu()