Optimizers
An optimizer essentially performs stochastic gradient descent. It takes one-dimensional arrays for the weights and their gradients, along with an optional identifier key. The optimizer is expected to update the weights and zero the gradients in place. The optimizers are registered in the function registry and can also be used via Thinc’s config mechanism.
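For orientation, here is a minimal sketch of the typical usage from a training loop: gradients are accumulated during the backward pass, and Model.finish_update then calls the optimizer on each parameter and its gradient. The toy model and data below are illustrative placeholders, not part of the API.

import numpy
from thinc.api import Adam, Relu, Softmax, chain

# Toy data and model, for illustration only.
X = numpy.random.uniform(size=(128, 10)).astype("float32")
Y = numpy.zeros((128, 2), dtype="float32")
Y[numpy.arange(128), numpy.random.randint(0, 2, size=128)] = 1.0

model = chain(Relu(nO=32), Softmax(nO=2))
model.initialize(X=X, Y=Y)
optimizer = Adam(learn_rate=0.001)

Yh, backprop = model.begin_update(X)
backprop(Yh - Y)                 # simple error gradient, just for demonstration
model.finish_update(optimizer)   # calls the optimizer for each (key, weights, gradient)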
Optimizer functions
SGD function
Function to create an SGD optimizer. If a hyperparameter specifies a schedule, the step that is passed to the schedule will be incremented on each call to Optimizer.step_schedules.
Example
from thinc.api import SGD

optimizer = SGD(
    learn_rate=0.001,
    L2=1e-6,
    grad_clip=1.0
)
config.cfg
[optimizer]
@optimizers = "SGD.v1"
learn_rate = 0.001
L2 = 1e-6
L2_is_weight_decay = true
grad_clip = 1.0
use_averages = true
Argument | Type | Description |
---|---|---|
learn_rate | Union[float, List[float], Generator] | The initial learning rate. |
keyword-only | ||
L2 | Union[float, List[float], Generator] | The L2 regularization term. |
grad_clip | Union[float, List[float], Generator] | Gradient clipping. |
use_averages | bool | Whether to track moving averages of the parameters. |
L2_is_weight_decay | bool | Whether to interpret the L2 parameter as a weight decay term, in the style of the AdamW optimizer. |
ops | Optional[Ops] | A backend object. Defaults to the currently selected backend. |
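For reference, a config block like the one above can be turned into an Optimizer via the registry. A sketch, assuming the config is available as a string (Config.from_disk works the same way for a config.cfg file):

from thinc.api import Config, registry

CONFIG = """
[optimizer]
@optimizers = "SGD.v1"
learn_rate = 0.001
L2 = 1e-6
grad_clip = 1.0
"""

config = Config().from_str(CONFIG)
resolved = registry.resolve(config)
optimizer = resolved["optimizer"]   # an Optimizer created by SGD.v1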
Adam function
Function to create an Adam optimizer. Returns an instance of Optimizer. If a hyperparameter specifies a schedule, the step that is passed to the schedule will be incremented on each call to Optimizer.step_schedules.
Example
from thinc.api import Adam

optimizer = Adam(
    learn_rate=0.001,
    beta1=0.9,
    beta2=0.999,
    eps=1e-08,
    L2=1e-6,
    grad_clip=1.0,
    use_averages=True,
    L2_is_weight_decay=True
)
config.cfg
[optimizer]
@optimizers = "Adam.v1"
learn_rate = 0.001
beta1 = 0.9
beta2 = 0.999
eps = 1e-08
L2 = 1e-6
L2_is_weight_decay = true
grad_clip = 1.0
use_averages = true
Argument | Type | Description |
---|---|---|
learn_rate | Union[float, List[float], Generator] | The initial learning rate. |
keyword-only | ||
L2 | Union[float, List[float], Generator] | The L2 regularization term. |
beta1 | Union[float, List[float], Generator] | First-order momentum. |
beta2 | Union[float, List[float], Generator] | Second-order momentum. |
eps | Union[float, List[float], Generator] | Epsilon term for Adam etc. |
grad_clip | Union[float, List[float], Generator] | Gradient clipping. |
use_averages | bool | Whether to track moving averages of the parameters. |
L2_is_weight_decay | bool | Whether to interpret the L2 parameter as a weight decay term, in the style of the AdamW optimizer. |
ops | Optional[Ops] | A backend object. Defaults to the currently selected backend. |
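Because each hyperparameter also accepts a schedule, the learning rate can, for instance, be warmed up and then decayed. A sketch using the warmup_linear schedule; the step counts are arbitrary:

from thinc.api import Adam, warmup_linear

# Learning rate ramps up over the first 1000 steps, then decays linearly
# towards zero by step 10000. step_schedules() advances the step.
optimizer = Adam(learn_rate=warmup_linear(0.001, 1000, 10000))
for _ in range(3):
    optimizer.step_schedules()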
RAdam function
Function to create an RAdam optimizer. Returns an instance of Optimizer. If a hyperparameter specifies a schedule, the step that is passed to the schedule will be incremented on each call to Optimizer.step_schedules.
Example
from thinc.api import RAdam

optimizer = RAdam(
    learn_rate=0.001,
    beta1=0.9,
    beta2=0.999,
    eps=1e-08,
    weight_decay=1e-6,
    grad_clip=1.0,
    use_averages=True,
)
config.cfg
[optimizer]
@optimizers = "RAdam.v1"
learn_rate = 0.001
beta1 = 0.9
beta2 = 0.999
eps = 1e-08
weight_decay = 1e-6
grad_clip = 1.0
use_averages = true
Argument | Type | Description |
---|---|---|
learn_rate | Union[float, List[float], Generator] | The initial learning rate. |
keyword-only | ||
beta1 | Union[float, List[float], Generator] | First-order momentum. |
beta2 | Union[float, List[float], Generator] | Second-order momentum. |
eps | Union[float, List[float], Generator] | Epsilon term for Adam etc. |
weight_decay | Union[float, List[float], Generator] | Weight decay term. |
grad_clip | Union[float, List[float], Generator] | Gradient clipping. |
use_averages | bool | Whether to track moving averages of the parameters. |
ops | Optional[Ops] | A backend object. Defaults to the currently selected backend. |
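When use_averages is enabled, the optimizer keeps moving averages of the parameters, which are commonly swapped in for evaluation. A sketch, assuming the averages are exposed as the optimizer's averages dict and used with Model.use_params; the tiny model and dummy gradient are placeholders:

import numpy
from thinc.api import Linear, RAdam

X = numpy.zeros((4, 3), dtype="float32")
model = Linear(nO=2, nI=3)
model.initialize(X=X)
optimizer = RAdam(learn_rate=0.001, use_averages=True)

Yh, backprop = model.begin_update(X)
backprop(numpy.ones_like(Yh))     # dummy gradient, illustration only
model.finish_update(optimizer)

# Temporarily swap in the averaged parameters, e.g. for evaluation.
# optimizer.averages is assumed to hold the moving averages per parameter key.
with model.use_params(optimizer.averages):
    Yh_avg = model.predict(X)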
Optimizer class
Performs various flavors of stochastic gradient descent, with first- and second-order momentum. Currently supports “vanilla” SGD, Adam, and RAdam.
Optimizer.__init__ method
Initialize an optimizer. If a hyperparameter specifies a schedule, the step that is passed to the schedule will be incremented on each call to Optimizer.step_schedules.
Example
from thinc.api import Optimizer

optimizer = Optimizer(learn_rate=0.001, L2=1e-6, grad_clip=1.0)
Argument | Type | Description |
---|---|---|
learn_rate | Union[float, List[float], Generator] | The initial learning rate. |
keyword-only | ||
L2 | Union[float, List[float], Generator] | The L2 regularization term. |
beta1 | Union[float, List[float], Generator] | First-order momentum. |
beta2 | Union[float, List[float], Generator] | Second-order momentum. |
eps | Union[float, List[float], Generator] | Epsilon term for Adam etc. |
grad_clip | Union[float, List[float], Generator] | Gradient clipping. |
use_averages | bool | Whether to track moving averages of the parameters. |
use_radam | bool | Whether to use the RAdam optimizer. |
L2_is_weight_decay | bool | Whether to interpret the L2 parameter as a weight decay term, in the style of the AdamW optimizer. |
ops | Optional[Ops] | A backend object. Defaults to the currently selected backend. |
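The keyword arguments combine to select the optimizer variant. For example, the following sketch sets L2_is_weight_decay to treat L2 as a decoupled weight decay term, giving AdamW-style behavior; the hyperparameter values are illustrative only:

from thinc.api import Optimizer

# Adam-style update with decoupled weight decay (AdamW-like behavior).
optimizer = Optimizer(
    learn_rate=0.001,
    beta1=0.9,
    beta2=0.999,
    L2=0.01,
    L2_is_weight_decay=True,
    grad_clip=1.0,
)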
Optimizer.__call__ method
Call the optimizer function, updating parameters using the current parameter gradients. The key is the identifier for the parameter, usually the node ID and parameter name.
Argument | Type | Description |
---|---|---|
key | Tuple[int, str] | The parameter identifier. |
weights | FloatsXd | The model’s current weights. |
gradient | FloatsXd | The model’s current gradient. |
keyword-only | ||
lr_scale | float | Rescale the learning rate. Defaults to 1.0. |
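Model.finish_update normally calls the optimizer for you, but it can also be called directly. A minimal sketch with hand-made arrays; the key (0, "W") and the shapes are arbitrary illustrations:

import numpy
from thinc.api import Adam

optimizer = Adam(learn_rate=0.001)
W = numpy.zeros((10,), dtype="float32")
dW = numpy.ones((10,), dtype="float32")

# Per the overview at the top of this page, the optimizer is expected to
# update the weights and zero the gradient; lr_scale rescales the learning
# rate for this call.
optimizer((0, "W"), W, dW, lr_scale=0.5)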
Optimizer.last_score property New: v9
Get or set the last evaluation score. The optimizer passes this score to the learning rate schedule, so that the schedule can take training dynamics into account (see e.g. the plateau schedule).
Example
from thinc.api import Optimizer, constant, plateau

schedule = plateau(2, 0.5, constant(1.0))
optimizer = Optimizer(learn_rate=schedule)
optimizer.last_score = (1000, 88.34)
Argument | Type | Description |
---|---|---|
RETURNS | Optional[Tuple[int, float]] | The step and score of the last evaluation. |
Optimizer.step_schedules method
Increase the current step of the optimizer. This step will be used by schedules to determine their next value.
Example
from thinc.api import Optimizer, decaying

optimizer = Optimizer(learn_rate=decaying(0.001, 1e-4), grad_clip=1.0)
assert optimizer.learn_rate == 0.001
optimizer.step_schedules()
assert optimizer.learn_rate == 0.000999900009999  # using a schedule
assert optimizer.grad_clip == 1.0  # not using a schedule
Optimizer.to_gpu method
Transfer the optimizer to a given GPU device.
Example
optimizer.to_gpu()
Optimizer.to_cpu method
Copy the optimizer to CPU.
Example
optimizer.to_cpu()