All Thinc models have a reference to an Ops instance, which provides access to
memory allocation and mathematical routines. The Model.ops instance
also keeps track of state and settings, so that you can have different models in
your network executing on different devices or delegating to different
underlying libraries.
Each Ops instance holds a reference to a numpy-like module (numpy or
cupy), which you can access at Model.ops.xp. This is enough to make most
layers work on both CPU and GPU devices. Additionally, there are several
routines that we have implemented as methods on the Ops object, so that
specialized versions can be called for different backends. You can also create
your own Ops subclasses with specialized routines for your layers, and use the
set_current_ops function to change the default.
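For example, a layer can stay device-agnostic by writing its array math against
Model.ops.xp. A minimal sketch (the Linear layer is used only for illustration):

    from thinc.api import Linear

    model = Linear(4, 2)
    # model.ops.xp is numpy on CPU backends and cupy on GPU backends,
    # so the same array code runs on either device
    X = model.ops.xp.zeros((10, 4), dtype="float32")
    Y = model.ops.xp.tanh(X)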
Backend    CPU   GPU   TPU   Description
AppleOps   ✓     ✗     ✗     Use AMX matrix multiplication units on Apple Silicon Macs. Added in Thinc 9.0.
The Ops class is typically not used directly but via NumpyOps, AppleOps,
CupyOps or MPSOps, which are subclasses of Ops and implement a more
efficient subset of the methods. You also have access to the ops via the
Model.ops attribute. The documented methods
below list which backends provide optimized and more efficient versions
(indicated by ✓), and which use the default implementation.
Thinc also provides various helper functions for getting and setting
different backends.
Example:

    from thinc.api import Linear, get_ops, use_ops, set_current_ops

    model = Linear(4, 2)
    X = model.ops.alloc2f(10, 2)
    blis_ops = get_ops("numpy", use_blis=True)
    set_current_ops(blis_ops)               # make blis_ops the default backend
    with use_ops("numpy", use_blis=True):   # or switch backends only within a block
        ...
Iterate slices from a sequence, optionally shuffled. Slices may be either views
or copies of the underlying data. Supports the batchable data types
Pairs, Ragged and
Padded, as well as arrays, lists and tuples. The
size argument may be either an integer, or a sequence of integers. If a
sequence, a new size is drawn before every output. If shuffle is True,
shuffled batches are produced by first generating an index array, shuffling it,
and then using it to slice into the sequence. An internal queue of buffer
items is accumulated before each output is yielded. Buffering is useful on some
devices, because it allows the network to run asynchronously without blocking on
every batch.
The method returns a SizedGenerator that
exposes a __len__ and is rebatched and reshuffled every time it’s executed,
allowing you to move the batching outside of the training loop.
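Assuming this describes Ops.minibatch, a minimal sketch of typical usage:

    from thinc.api import NumpyOps

    ops = NumpyOps()
    train_X = ops.alloc2f(1000, 128)
    # fixed batch size of 128; a sequence of sizes (e.g. a schedule) also works
    batches = ops.minibatch(128, train_X, shuffle=True)
    for batch in batches:
        ...  # each batch is a slice (view or copy) of train_X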
Given an (M, N) sequence of vectors, return an (M, N*(nW*2+1)) sequence. The
new sequence is constructed by concatenating nW preceding and succeeding
vectors onto each column in the sequence, to extract a window of features.
The reverse/backward operation of the seq2col function: calculate the gradient
of the original (M, N) sequence, as a function of the gradient of the output
(M, N*(nW*2+1)) sequence.
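Assuming these describe Ops.seq2col and Ops.backprop_seq2col, a sketch of how
the shapes line up:

    from thinc.api import NumpyOps

    ops = NumpyOps()
    X = ops.alloc2f(8, 4)              # (M, N) sequence of vectors
    Y = ops.seq2col(X, 1)              # window of nW=1 on each side: (M, N * 3)
    assert Y.shape == (8, 12)
    dY = ops.alloc2f(8, 12)            # gradient of the output sequence
    dX = ops.backprop_seq2col(dY, 1)   # gradient of the original (M, N) input
    assert dX.shape == (8, 4)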
Perform padding on a list of arrays so that they each have the same length, by
taking the maximum dimension across each axis. This only works on non-empty
sequences with the same ndim and dtype.
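A sketch of the expected shapes, assuming the method is Ops.pad:

    from thinc.api import NumpyOps

    ops = NumpyOps()
    seqs = [ops.alloc2f(3, 5), ops.alloc2f(7, 5)]
    padded = ops.pad(seqs)
    # one array per sequence, padded to the longest length along each axis
    assert padded.shape == (2, 7, 5)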
Create a random mask for applying dropout, in which a certain proportion of the
positions (defined by drop) contain zeros. The neurons at those positions are
deactivated during training, resulting in a more robust network and less
overfitting.
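A minimal sketch, assuming the method is Ops.get_dropout_mask(shape, drop):

    from thinc.api import NumpyOps

    ops = NumpyOps()
    X = ops.alloc2f(10, 2)
    mask = ops.get_dropout_mask(X.shape, 0.2)  # roughly 20% of positions are zero
    Y = X * mask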
Allocate an array of a certain shape. If possible, you should always use the
type-specific methods listed below, as they make the code more readable and
allow more sophisticated static type checking of
the inputs and outputs.
Shortcuts to allocate an array of a certain shape and data type (f refers to
float32 and i to int32). For instance, Ops.alloc2f will allocate a
two-dimensional array of floats.
Example:

    X = model.ops.alloc2f(10, 2)  # Floats2d
    Y = model.ops.alloc1i(4)      # Ints1d
Reshape an array and return an array containing the same data with the given
shape. If possible, you should always use the type-specific methods listed
below, as they make the code more readable and allow more sophisticated static
type checking of the inputs and outputs.
Shortcuts to reshape an array of a certain shape and data type (f refers to
float32 and i to int32). For instance, reshape2f can be used to reshape
an array of floats to a 2d-array of floats.
Example:

    X = model.ops.reshape2f(X, 10, 2)  # Floats2d
    Y = model.ops.reshape1i(Y, 4)      # Ints1d
Ensure a given array is of the correct type, e.g. numpy.ndarray for NumpyOps
or cupy.ndarray for CupyOps. If possible, you should always use the
type-specific methods listed below, as they make the code more readable and
allow more sophisticated static type checking of
the inputs and outputs.
Shortcuts for specific dimensions and data types (f refers to float32 and
i to int32). For instance, Ops.asarray2f will return a two-dimensional
array of floats.
Example:

    X = model.ops.asarray2f(X)  # Floats2d
    Y = model.ops.asarray1i(Y)  # Ints1d
Allow the backend to make a contiguous copy of an array. Implementations of
Ops do not have to make a copy or make it contiguous if that would not improve
efficiency for the execution engine.
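A sketch, assuming the method is Ops.as_contig:

    from thinc.api import NumpyOps

    ops = NumpyOps()
    X = ops.alloc2f(10, 2)
    # a transposed view is typically not contiguous; as_contig may return a contiguous copy
    Xt = ops.as_contig(X.T)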
Swish (Ramachandran et al., 2017) is a
self-gating non-monotonic activation function similar to the GELU
activation: whereas GELU uses the CDF of the Gaussian distribution Φ
for self-gating x * Φ(x), Swish uses the logistic CDF x * σ(x). Sometimes
referred to as “SiLU” for “Sigmoid Linear Unit”.
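A plain-numpy reference sketch of the formula (not the Ops method itself):

    import numpy

    def swish_reference(x):
        # Swish / SiLU: x * sigmoid(x), with the logistic sigmoid as the gate
        return x * (1.0 / (1.0 + numpy.exp(-x)))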
Dish or “Daniël’s Swish-like activation” is an activation function with a
non-monotonic shape similar to GELU, Swish and Mish. However, Dish does not
rely on elementary functions like exp or erf, making it much faster to compute
in most cases.
GELU or “Gaussian Error Linear Unit”
(Hendrycks and Gimpel, 2016) is a
self-gating non-monotonic activation function similar to the Swish
activation: whereas GELU uses the CDF of the Gaussian distribution Φ
for self-gating x * Φ(x), the Swish activation uses the logistic CDF σ and
computes x * σ(x). Various approximations exist, but Thinc implements the
exact GELU. The use of GELU is popular within transformer feed-forward blocks.
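A reference sketch of the exact form in plain numpy/scipy (not the Ops method
itself; scipy is assumed to be available for erf):

    import numpy
    from scipy.special import erf

    def gelu_reference(x):
        # exact GELU: x * Φ(x), where Φ is the standard normal CDF
        return x * 0.5 * (1.0 + erf(x / numpy.sqrt(2.0)))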
ReLU activation function with the maximum value clipped at k. A common choice
is k=6, introduced for convolutional deep belief networks
(Krizhevsky, 2010).
The resulting function relu6 is commonly used in low-precision scenarios.
Flexible clipped linear activation function of the form
max(min_value, min(max_value, x * slope + offset)). It is used to implement
the relu_k, hard_sigmoid, and
hard_tanh methods.
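A plain-numpy reference sketch of that formula, and of how relu_k and hard_tanh
fall out of it (parameter names are illustrative, not the documented signature):

    import numpy

    def clipped_linear_reference(x, slope=1.0, offset=0.0, min_value=0.0, max_value=1.0):
        # max(min_value, min(max_value, x * slope + offset))
        return numpy.clip(x * slope + offset, min_value, max_value)

    def relu_k_reference(x, k=6.0):
        # ReLU clipped at k: min_value=0, max_value=k
        return clipped_linear_reference(x, max_value=k)

    def hard_tanh_reference(x):
        # hard tanh: clip values to the range [-1, 1]
        return clipped_linear_reference(x, min_value=-1.0, max_value=1.0)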