# Layers

Weights layers, transforms, combinators and wrappers

This page describes functions for defining your model. Each layer is implemented
in its own module in `thinc.layers` and can be imported from `thinc.api`. Most
layer files define two public functions: a **creation function** that returns a
`Model` instance, and a **forward function** that performs the computation.

Section | Description |
---|---|
Weights layers | Layers that use an internal weights matrix for their computations. |
Reduction operations | Layers that perform rank reductions, e.g. pooling from word to sentence vectors. |
Combinators | Layers that combine two or more existing layers. |
Data type transfers | Layers that transform data to different types. |
Wrappers | Wrapper layers for other libraries like PyTorch and TensorFlow. |

## Weights layers

### CauchySimilarity function

Compare input vectors according to the Cauchy similarity function proposed by
Chen (2013). Primarily used within `siamese` neural networks.
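To illustrate the kind of score involved, here is a minimal pure-Python sketch for a single pair of vectors. It assumes that the layer learns a per-dimension weight vector (called `w` here, a hypothetical name) and maps the weighted squared distance `d` between the two inputs to `1 / (1 + d)`. This is not Thinc's implementation, which operates on whole batches of arrays.

```python
from typing import Sequence


def cauchy_similarity(u: Sequence[float], v: Sequence[float], w: Sequence[float]) -> float:
    """Similarity in (0, 1]: 1.0 for identical vectors, decaying as they diverge."""
    # Weighted squared distance between the two input vectors (w is an assumed learned weight)
    total = sum(wi * (ui - vi) ** 2 for ui, vi, wi in zip(u, v, w))
    # Cauchy-style kernel: 1 / (1 + d)
    return 1.0 / (1.0 + total)


weights = [1.0, 1.0, 1.0]
print(cauchy_similarity([1.0, 2.0, 3.0], [1.0, 2.0, 3.0], weights))  # 1.0
print(cauchy_similarity([1.0, 2.0, 3.0], [1.0, 2.0, 4.0], weights))  # 0.5
```

The score is symmetric in its two arguments, which is why this kind of layer suits siamese architectures.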

Argument | Type | Description |
---|---|---|
`nI` | `Optional[int]` | The size of the input vectors. |
RETURNS | `Model[Tuple[Floats2d, Floats2d], Floats1d]` | The created similarity layer. |

Source: `thinc/layers/cauchysimilarity.py`


### Dropout function

Helps prevent overfitting by adding a random distortion to the input data during
training. Specifically, cells of the input are zeroed with probability
determined by the `dropout_rate` argument. Cells which are not zeroed are
rescaled by `1 / (1 - rate)`, so the expected value of each cell is unchanged.
When not in training mode, the distortion is disabled (see Hinton et al., 2012).

**Example**

```python
import numpy
from thinc.api import chain, Linear, Dropout

model = chain(Linear(10, 2), Dropout(0.2))
X = numpy.zeros((4, 2), dtype="f")  # a batch of 4 inputs of width 2
model.initialize(X=X)
Y, backprop = model(X, is_train=True)
# Configure dropout rate via the dropout_rate attribute.
for node in model.walk():
    if node.name == "dropout":
        node.attrs["dropout_rate"] = 0.5
```
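The following minimal pure-Python sketch shows "inverted" dropout on a flat list of activations, to make the rescaling concrete. The function name and signature are illustrative and are not Thinc's API. Thinc works on whole arrays instead of lists.

```python
import random
from typing import List


def dropout(xs: List[float], rate: float, is_train: bool, rng: random.Random) -> List[float]:
    """Inverted dropout on a flat list of activations (illustrative helper)."""
    if not is_train or rate == 0.0:
        # Outside training (or with rate 0) the input passes through unchanged
        return list(xs)
    scale = 1.0 / (1.0 - rate)
    # Zero each cell with probability `rate`; rescale survivors so the expected value is preserved
    return [0.0 if rng.random() < rate else x * scale for x in xs]


rng = random.Random(0)
print(dropout([1.0, 2.0, 3.0, 4.0], rate=0.5, is_train=True, rng=rng))
print(dropout([1.0, 2.0, 3.0, 4.0], rate=0.5, is_train=False, rng=rng))  # [1.0, 2.0, 3.0, 4.0]
```

Because surviving cells are scaled up during training, no correction is needed at prediction time.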

Argument | Type | Description |
---|---|---|
`dropout_rate` | `float` | The probability of zeroing the activations (default: 0). Higher dropout rates mean more distortion. Values around `0.2` are often good. |
RETURNS | `Model[ArrayXd, ArrayXd]` | The created dropout layer. |

Source: `thinc/layers/dropout.py`


### Embed function

Map integers to vectors, using a fixed-size lookup table. The input to the layer should be a two-dimensional array of integers, one column of which the embeddings table will slice as the indices.

Argument | Type | Description |
---|---|---|
`nO` | `Optional[int]` | The size of the output vectors. |
`nV` | `int` | Number of input vectors. Defaults to `1`. |
_keyword-only_ | | |
`column` | `int` | The column to slice from the input, to get the indices. |
`initializer` | `Callable` | A function to initialize the internal parameters. Defaults to `uniform_init`. |
`dropout` | `Optional[float]` | Dropout rate to avoid overfitting (default `None`). |
RETURNS | `Model[Union[Ints1d, Ints2d], Floats2d]` | The created embedding layer. |

Source: `thinc/layers/embed.py`


### FeatureExtractor function

spaCy-specific layer to extract arrays of input features from `Doc` objects.
Expects a list of feature names to extract, which should refer to spaCy token
attributes.

Argument | Type | Description |
---|---|---|
`columns` | `List[Union[int, str]]` | The spaCy token attributes to extract. |
RETURNS | `Model[List[spacy.tokens.Doc], List[Ints2d]]` | The created feature extraction layer. |

Source: `thinc/layers/featureextractor.py`

### HashEmbed function

An embedding layer that uses the “hashing trick” to map keys to distinct values. The hashing trick involves hashing each key four times with distinct seeds, to produce four likely differing values. Each value is taken modulo the number of rows in the table, and the four resulting vectors are summed to produce a single result. Because it’s unlikely that two different keys will collide on all four “buckets”, most distinct keys will receive a distinct vector under this scheme, even when the number of vectors in the table is very low.
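The following pure-Python sketch shows the four-hash lookup for a single key. It makes two assumptions. First, SHA-256 from `hashlib` stands in as the hash function, which is not the hash Thinc uses. Second, the four seeds are taken as `seed + i`. The helpers `bucket` and `hash_embed` are hypothetical.

```python
import hashlib
from typing import List


def bucket(key: int, seed: int, n_rows: int) -> int:
    # Stand-in hash for illustration only: derive a row index from (seed, key)
    digest = hashlib.sha256(f"{seed}:{key}".encode("utf8")).digest()
    return int.from_bytes(digest[:8], "little") % n_rows


def hash_embed(key: int, table: List[List[float]], seed: int = 0) -> List[float]:
    """Sum the rows selected by four differently seeded hashes of `key`."""
    n_rows, width = len(table), len(table[0])
    rows = [bucket(key, seed + i, n_rows) for i in range(4)]
    return [sum(table[r][j] for r in rows) for j in range(width)]


# A deliberately tiny table: 8 rows of width 2
table = [[float(i), float(-i)] for i in range(8)]
print(hash_embed(12345, table))
print(hash_embed(67890, table))
```

Even with only 8 rows, two keys would need to land on the same four buckets to receive identical vectors.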

Argument | Type | Description |
---|---|---|
`nO` | `int` | The size of the output vectors. |
`nV` | `int` | Number of input vectors. |
_keyword-only_ | | |
`seed` | `Optional[int]` | A seed to use for the hashing. |
`column` | `int` | The column to select features from. |
`initializer` | `Callable` | A function to initialize the internal parameters. Defaults to `uniform_init`. |
`dropout` | `Optional[float]` | Dropout rate to avoid overfitting (default `None`). |
RETURNS | `Model[Union[Ints1d, Ints2d], Floats2d]` | The created embedding layer. |

Source: `thinc/layers/hashembed.py`

### LayerNorm function

Perform layer normalization on the inputs (Ba et al., 2016). This layer does not change the dimensionality of the vectors.
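Below is a pure-Python sketch of layer normalization applied to one vector. The learned scale and shift are called `gain` and `bias`, and these names, like the `eps` value, are assumptions for illustration. The sketch shows why the width is preserved: every element is shifted and scaled, and nothing is added or removed.

```python
import math
from typing import List


def layer_norm(x: List[float], gain: List[float], bias: List[float], eps: float = 1e-8) -> List[float]:
    """Normalize one vector to zero mean and unit variance, then apply a learned scale and shift."""
    n = len(x)
    mean = sum(x) / n
    var = sum((xi - mean) ** 2 for xi in x) / n
    inv_std = 1.0 / math.sqrt(var + eps)  # eps guards against division by zero
    return [g * (xi - mean) * inv_std + b for xi, g, b in zip(x, gain, bias)]


y = layer_norm([1.0, 2.0, 3.0, 4.0], gain=[1.0] * 4, bias=[0.0] * 4)
print(len(y))  # 4: the width is unchanged
```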

Argument | Type | Description |
---|---|---|
`nI` | `Optional[int]` | The size of the input vectors. |
RETURNS | `Model[Floats2d, Floats2d]` | The created normalization layer. |

Source: `thinc/layers/layernorm.py`

### Linear function

The `Linear` layer multiplies inputs by a weights matrix `W` and adds a bias
vector `b`. In PyTorch this is called a `Linear` layer, while Keras calls it a
`Dense` layer.

**Example**

```python
from thinc.api import Linear

model = Linear(10, 5)  # nO=10, nI=5
model.initialize()
Y = model.predict(model.ops.alloc2f(2, 5))
assert Y.shape == (2, 10)
```

Argument | Type | Description |
---|---|---|
`nO` | `Optional[int]` | The size of the output vectors. |
`nI` | `Optional[int]` | The size of the input vectors. |
_keyword-only_ | | |
`init_W` | `Callable` | A function to initialize the weights matrix. Defaults to `glorot_uniform_init`. |
`init_b` | `Callable` | A function to initialize the bias vector. Defaults to `zero_init`. |
RETURNS | `Model[Floats2d, Floats2d]` | The created `Linear` layer. |

Source: `thinc/layers/linear.py`

### Logistic function

Apply the logistic function as an activation to the inputs. This is often used
as an output activation for multi-label classification, because each element of
the output vectors will be between `0` and `1`.

Argument | Type | Description |
---|---|---|
RETURNS | `Model[Floats2d, Floats2d]` | The created `Logistic` layer. |

Source: `thinc/layers/logistic.py`

### LSTM and BiLSTM function

An LSTM recurrent neural network. The BiLSTM is bidirectional: that is, each
layer concatenates a forward LSTM with an LSTM running in the reverse direction.
If you are able to install PyTorch, you should usually prefer the
`PyTorchLSTM` layer over Thinc’s implementations, as PyTorch’s LSTM
implementation is significantly faster.

Argument | Type | Description |
---|---|---|
`nO` | `Optional[int]` | The size of the output vectors. |
`nI` | `Optional[int]` | The size of the input vectors. |
_keyword-only_ | | |
`bi` | `bool` | Whether to use the bidirectional variant (BiLSTM). |
`depth` | `int` | Number of layers (default `1`). |
`dropout` | `float` | Dropout rate to avoid overfitting (default `0`). |
RETURNS | `Model[Padded, Padded]` | The created LSTM layer(s). |

Source: `thinc/layers/lstm.py`

### Maxout function

A dense layer with a “maxout” activation (Goodfellow et al., 2013). Maxout
layers require a weights array of shape `(nO, nP, nI)` in order to compute
outputs of width `nO` given inputs of width `nI`. The extra multiple, `nP`,
determines the number of “pieces” that the piecewise-linear activation will
consider.
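To show how the `(nO, nP, nI)` weights are used, here is a pure-Python sketch for one input vector. It assumes a bias of shape `(nO, nP)`, one bias per piece. Each output unit computes `nP` affine pieces and keeps the largest. This is an illustration of the technique, not Thinc's code.

```python
from typing import List

Matrix = List[List[float]]


def maxout(x: List[float], W: List[Matrix], b: Matrix) -> List[float]:
    """W has shape (nO, nP, nI) and b has shape (nO, nP); returns a vector of width nO."""
    out = []
    for W_o, b_o in zip(W, b):
        # One affine "piece" per row of W_o; the output unit keeps the largest
        pieces = [sum(w * xi for w, xi in zip(W_op, x)) + b_op for W_op, b_op in zip(W_o, b_o)]
        out.append(max(pieces))
    return out


# nO=1, nP=2, nI=1: max(x, -x) computes |x| exactly
W = [[[1.0], [-1.0]]]
b = [[0.0, 0.0]]
print(maxout([-3.0], W, b))  # [3.0]
```

The absolute-value example shows how two linear pieces already give a non-linear activation. More pieces let the layer approximate more complex convex functions.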

Argument | Type | Description |
---|---|---|
`nO` | `Optional[int]` | The size of the output vectors. |
`nI` | `Optional[int]` | The size of the input vectors. |
`nP` | `int` | Number of maxout pieces (default: 3). |
_keyword-only_ | | |
`init_W` | `Callable` | A function to initialize the weights matrix. Defaults to `glorot_uniform_init`. |
`init_b` | `Callable` | A function to initialize the bias vector. Defaults to `zero_init`. |
`dropout` | `Optional[float]` | Dropout rate to avoid overfitting. |
`normalize` | `bool` | Whether or not to apply layer normalization (default: False). |
RETURNS | `Model[Floats2d, Floats2d]` | The created maxout layer. |

Source: `thinc/layers/maxout.py`

### Mish function

A dense layer with Mish activation (Misra, 2019).
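The following sketch covers only the Mish activation itself, `x * tanh(softplus(x))`, which the layer applies elementwise after its affine transform. It uses a numerically stable form of softplus, `max(x, 0) + log1p(exp(-|x|))`. This is an illustrative helper, not Thinc's implementation.

```python
import math


def mish(x: float) -> float:
    """Mish activation: x * tanh(softplus(x)), where softplus(x) = ln(1 + e^x)."""
    # Stable softplus: avoids overflow in exp() for large positive x
    softplus = max(x, 0.0) + math.log1p(math.exp(-abs(x)))
    return x * math.tanh(softplus)


for value in (-2.0, 0.0, 2.0):
    print(value, round(mish(value), 4))
```

Unlike ReLU, Mish is smooth and lets small negative values through, while behaving like the identity for large positive inputs.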

Argument | Type | Description |
---|---|---|
`nO` | `Optional[int]` | The size of the output vectors. |
`nI` | `Optional[int]` | The size of the input vectors. |
_keyword-only_ | | |
`init_W` | `Callable` | A function to initialize the weights matrix. Defaults to `glorot_uniform_init`. |
`init_b` | `Callable` | A function to initialize the bias vector. Defaults to `zero_init`. |
`dropout` | `Optional[float]` | Dropout rate to avoid overfitting. |
`normalize` | `bool` | Whether or not to apply layer normalization (default: False). |
RETURNS | `Model[Floats2d, Floats2d]` | The created dense layer. |

Source: `thinc/layers/mish.py`

### MultiSoftmax function

Neural network layer that predicts several multi-class attributes at once. For instance, we might predict one attribute with six possible classes, and another with five. We predict the 11 neurons required for this, and then apply a separate softmax to each group, so that columns 0 to 5 form one probability distribution and columns 6 to 10 form another.
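This pure-Python sketch shows the grouped softmax on one output row, with `nOs` giving the width of each group. It covers only the activation step. In the real layer, a weights matrix first produces the `sum(nOs)` scores.

```python
import math
from typing import List, Sequence


def softmax(xs: Sequence[float]) -> List[float]:
    # Subtract the max for numerical stability before exponentiating
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def multi_softmax(row: Sequence[float], nOs: Sequence[int]) -> List[float]:
    """Apply a separate softmax to each consecutive slice of `row` with the widths in `nOs`."""
    out: List[float] = []
    start = 0
    for width in nOs:
        out.extend(softmax(row[start:start + width]))
        start += width
    return out


probs = multi_softmax([0.5] * 11, nOs=(6, 5))
print(round(sum(probs[:6]), 6), round(sum(probs[6:]), 6))  # 1.0 1.0
```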

Argument | Type | Description |
---|---|---|
`nOs` | `Tuple[int, ...]` | The sizes of the output vectors. |
`nI` | `Optional[int]` | The size of the input vectors. |
RETURNS | `Model[Floats2d, Floats2d]` | The created multi softmax layer. |

Source: `thinc/layers/multisoftmax.py`

### ParametricAttention function

A layer that uses the parametric attention scheme described by Yang et al. (2016). The layer learns a parameter vector that is used as the keys in a single-headed attention mechanism.
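The sketch below shows one sequence in pure Python and follows one reading of the layer. Each row is scored by its dot product with the learned vector, which is called `q` here as a hypothetical name. The scores are softmax-normalized within the sequence, and each row is then scaled by its weight. Because the layer maps `Ragged` to `Ragged`, the sketch returns one row per input row. Summing those rows afterwards, for example with `reduce_sum`, would give a single attention-pooled vector.

```python
import math
from typing import List

Vector = List[float]


def parametric_attention(seq: List[Vector], q: Vector) -> List[Vector]:
    """Weight each row of one sequence by its softmax-normalized score against the learned vector q."""
    scores = [sum(a * b for a, b in zip(row, q)) for row in seq]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # One (re-weighted) row per input row, so the ragged shape is unchanged
    return [[w * v for v in row] for w, row in zip(weights, seq)]


seq = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
q = [1.0, 0.0]
print(parametric_attention(seq, q))
```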

Argument | Type | Description |
---|---|---|
`nO` | `Optional[int]` | The size of the output vectors. |
RETURNS | `Model[Ragged, Ragged]` | The created attention layer. |

Source: `thinc/layers/parametricattention.py`

### Relu function

A dense layer with ReLU activation.

Argument | Type | Description |
---|---|---|
`nO` | `Optional[int]` | The size of the output vectors. |
`nI` | `Optional[int]` | The size of the input vectors. |
_keyword-only_ | | |
`init_W` | `Callable` | A function to initialize the weights matrix. Defaults to `glorot_uniform_init`. |
`init_b` | `Callable` | A function to initialize the bias vector. Defaults to `zero_init`. |
`dropout` | `Optional[float]` | Dropout rate to avoid overfitting. |
`normalize` | `bool` | Whether or not to apply layer normalization (default: False). |
RETURNS | `Model[Floats2d, Floats2d]` | The created `Relu` layer. |

Source: `thinc/layers/relu.py`

### Softmax function

A dense layer with a softmax activation. This is usually used as a prediction layer. Vectors produced by the softmax function sum to 1, and have values between 0 and 1, so each vector can be interpreted as a probability distribution.

Argument | Type | Description |
---|---|---|
`nO` | `Optional[int]` | The size of the output vectors. |
`nI` | `Optional[int]` | The size of the input vectors. |
_keyword-only_ | | |
`init_W` | `Callable` | A function to initialize the weights matrix. Defaults to `zero_init`. |
`init_b` | `Callable` | A function to initialize the bias vector. Defaults to `zero_init`. |
RETURNS | `Model[Floats2d, Floats2d]` | The created softmax layer. |

Source: `thinc/layers/softmax.py`

### SparseLinear function

A sparse linear layer using the “hashing trick”. Useful for tasks such as text
classification. Inputs to the layer should be a tuple of arrays
`(keys, values, lengths)`, where the `keys` and `values` are arrays of the same
length, describing the concatenated batch of input features and their values.
The `lengths` array should have one entry per sequence in the batch, and the sum
of the lengths should equal the length of the `keys` and `values` arrays.
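This pure-Python sketch shows how the `(keys, values, lengths)` triple could be consumed. It assumes a weight table `W` with one row per hashed bucket and one column per output class, with keys reduced modulo the table size. That layout is hypothetical, and Thinc may store its weights differently.

```python
from typing import List, Sequence


def sparse_linear(
    keys: Sequence[int],
    values: Sequence[float],
    lengths: Sequence[int],
    W: List[List[float]],
    b: List[float],
) -> List[List[float]]:
    """Score each sequence by summing hashed weight rows, scaled by the feature values."""
    if not (len(keys) == len(values) == sum(lengths)):
        raise ValueError("keys and values must have the same length, equal to sum(lengths)")
    n_rows = len(W)
    out, start = [], 0
    for length in lengths:
        scores = list(b)  # start from the bias, one entry per output class
        for key, value in zip(keys[start:start + length], values[start:start + length]):
            row = W[key % n_rows]  # hashing trick: the key is mapped into a fixed-size table
            scores = [s + value * w for s, w in zip(scores, row)]
        out.append(scores)
        start += length
    return out


W = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [0.0, 0.0]]  # 4 hashed rows, 2 output classes
b = [0.0, 0.0]
# Two sequences: the first has 2 features, the second has 1
print(sparse_linear([10, 5, 4], [1.0, 2.0, 1.0], [2, 1], W, b))  # [[0.5, 2.5], [1.0, 0.0]]
```

The `length` argument in the table below plays the role of the table size here: larger tables mean fewer collisions at the cost of memory.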

Argument | Type | Description |
---|---|---|
`nO` | `Optional[int]` | The size of the output vectors. |
`length` | `int` | The size of the weights vector, to be tuned empirically. |
RETURNS | `Model[Tuple[ArrayXd, ArrayXd, ArrayXd], ArrayXd]` | The created layer. |

Source: `thinc/layers/sparselinear.pyx`

### StaticVectors function

Argument | Type | Description |
---|---|---|
`nO` | `Optional[int]` | The size of the output vectors. |
`vectors` | `Optional[Floats2d]` | The vectors. |
_keyword-only_ | | |
`column` | `int` | The column of values to slice for the indices. |
`dropout` | `Optional[float]` | Dropout rate to avoid overfitting (default `None`). |
RETURNS | `Model[Ints2d, Floats2d]` | The created embedding layer. |

Source: `thinc/layers/staticvectors.py`

## Reduction operations

### reduce_max function

Pooling layer that reduces the dimensions of the data by selecting the maximum value for each feature.

Argument | Type | Description |
---|---|---|
RETURNS | `Model[Ragged, Floats2d]` | The created pooling layer. |

Source: `thinc/layers/reduce_max.py`

### reduce_mean function

Pooling layer that reduces the dimensions of the data by computing the average value of each feature.

Argument | Type | Description |
---|---|---|
RETURNS | `Model[Ragged, Floats2d]` | The created pooling layer. |

Source: `thinc/layers/reduce_mean.py`

### reduce_sum function

Pooling layer that reduces the dimensions of the data by computing the sum for each feature.

Argument | Type | Description |
---|---|---|
RETURNS | `Model[Ragged, Floats2d]` | The created pooling layer. |

Source: `thinc/layers/reduce_sum.py`
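The three pooling layers differ only in the per-feature statistic they compute. Below is a pure-Python sketch over a ragged batch given as concatenated rows plus `lengths`, mirroring the `Ragged` input type. `reduce_ragged` is a hypothetical helper, and it assumes every sequence has at least one row.

```python
from typing import Callable, List, Sequence

Rows = List[List[float]]


def reduce_ragged(data: Rows, lengths: Sequence[int], op: Callable[[List[float]], float]) -> Rows:
    """Pool each sequence's rows into one vector by applying `op` to every feature column."""
    out, start = [], 0
    for length in lengths:
        rows = data[start:start + length]
        out.append([op([row[j] for row in rows]) for j in range(len(rows[0]))])
        start += length
    return out


def mean(xs: List[float]) -> float:
    return sum(xs) / len(xs)


# Two sequences of 2-d vectors: lengths 2 and 1
data = [[1.0, 5.0], [3.0, 1.0], [4.0, 4.0]]
lengths = [2, 1]
print(reduce_ragged(data, lengths, max))   # [[3.0, 5.0], [4.0, 4.0]]
print(reduce_ragged(data, lengths, mean))  # [[2.0, 3.0], [4.0, 4.0]]
print(reduce_ragged(data, lengths, sum))   # [[4.0, 6.0], [4.0, 4.0]]
```

In every case the output has one row per sequence, which is why these layers return `Floats2d` rather than `Ragged`.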

## Combinators

Combinators are layers that express **higher-order functions**: they take one or
more layers as arguments and express some relationship or perform some
additional logic around the child layers. Combinators can also be used to
overload operators. For example, binding `chain` to `>>` allows you to write
`Relu(512) >> Softmax()` instead of `chain(Relu(512), Softmax())`.
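To make the semantics of the combinators concrete, here is a plain-Python sketch that treats layers as functions on lists of floats. It ignores weights and backpropagation entirely, and `Layer` is a hypothetical stand-in for Thinc's `Model`. In Thinc itself, the operator binding is done with `Model.define_operators`.

```python
from typing import Callable, List

Fn = Callable[[List[float]], List[float]]


def chain(*layers: Fn) -> Fn:
    # chain(f, g)(x) == g(f(x))
    def forward(x: List[float]) -> List[float]:
        for layer in layers:
            x = layer(x)
        return x
    return forward


def add(*layers: Fn) -> Fn:
    # add(f, g)(x) == f(x) + g(x), elementwise
    def forward(x: List[float]) -> List[float]:
        outputs = [layer(x) for layer in layers]
        return [sum(vals) for vals in zip(*outputs)]
    return forward


def concatenate(*layers: Fn) -> Fn:
    # concatenate(f, g)(x) == hstack(f(x), g(x))
    def forward(x: List[float]) -> List[float]:
        return [v for layer in layers for v in layer(x)]
    return forward


def residual(layer: Fn) -> Fn:
    # residual(f)(x) == f(x) + x
    return add(layer, lambda x: x)


class Layer:
    """Minimal wrapper that binds `>>` to chain, mimicking operator overloading."""

    def __init__(self, fn: Fn) -> None:
        self.fn = fn

    def __call__(self, x: List[float]) -> List[float]:
        return self.fn(x)

    def __rshift__(self, other: "Layer") -> "Layer":
        return Layer(chain(self.fn, other.fn))


double = Layer(lambda x: [2 * v for v in x])
inc = Layer(lambda x: [v + 1 for v in x])

print((double >> inc)([1.0, 2.0]))      # [3.0, 5.0]
print(add(double, inc)([1.0, 2.0]))     # [4.0, 7.0]
print(concatenate(double, inc)([1.0]))  # [2.0, 2.0]
print(residual(double)([1.0, 2.0]))     # [3.0, 6.0]
```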

### add function

Compose two or more models `f`, `g`, etc., such that their outputs are added,
i.e. `add(f, g)(x)` computes `f(x) + g(x)`.

Argument | Type | Description |
---|---|---|
`*layers` | `Model[ArrayXd, ArrayXd]` | The models to compose. |
RETURNS | `Model[ArrayXd, ArrayXd]` | The composed model. |

Source: `thinc/layers/add.py`

### bidirectional function

Stitch two RNN models into a bidirectional layer. Expects padded (rectangular) sequences, i.e. `Padded` input.

Argument | Type | Description |
---|---|---|
`l2r` | `Model[Padded, Padded]` | The first model. |
`r2l` | `Optional[Model[Padded, Padded]]` | The second model. |
RETURNS | `Model[Padded, Padded]` | The composed bidirectional layer. |

Source: `thinc/layers/bidirectional.py`

### chain function

Compose two models `f` and `g` such that they become layers of a single
feed-forward model that computes `g(f(x))`.

Argument | Type | Description |
---|---|---|
`*layers` | `Model[ArrayXd, ArrayXd]` | The models to compose. |
RETURNS | `Model[ArrayXd, ArrayXd]` | The composed feed-forward model. |

Source: `thinc/layers/chain.py`

### clone function

Construct `n` copies of a layer, with distinct weights. For example,
`clone(f, 3)(x)` computes `f(f'(f''(x)))`.

Argument | Type | Description |
---|---|---|
`orig` | `Model[ArrayXd, ArrayXd]` | The layer to copy. |
`n` | `int` | The number of copies to construct. |
RETURNS | `Model[ArrayXd, ArrayXd]` | The composed model. |

Source: `thinc/layers/clone.py`

### concatenate function

Compose two or more models `f`, `g`, etc., such that their outputs are
concatenated, i.e. `concatenate(f, g)(x)` computes `hstack(f(x), g(x))`.

Argument | Type | Description |
---|---|---|
`*layers` | `Model[ArrayXd, ArrayXd]` | The models to compose. |
RETURNS | `Model[ArrayXd, ArrayXd]` | The composed model. |

Source: `thinc/layers/concatenate.py`

### expand_window function

For each vector in an input, construct an output vector that contains the input
and a window of surrounding vectors. This is one step in a convolution. If the
`window_size` is three, the output size `nO` will be `nI * 7` after
concatenating three contextual vectors from the left, and three from the right,
to each input vector. In general, `nO` equals `nI * (2 * window_size + 1)`.
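The following pure-Python sketch builds the windowed rows for a single sequence and checks the `nI * (2 * window_size + 1)` width. It assumes that positions beyond the sequence boundaries are filled with zero vectors. The function shares the layer's name but is a standalone illustration.

```python
from typing import List

Rows = List[List[float]]


def expand_window(X: Rows, window_size: int = 1) -> Rows:
    """Concatenate each row with `window_size` neighbours on each side (zero-padded at the edges)."""
    n, width = len(X), len(X[0])
    zeros = [0.0] * width
    out = []
    for i in range(n):
        row: List[float] = []
        for offset in range(-window_size, window_size + 1):
            j = i + offset
            row.extend(X[j] if 0 <= j < n else zeros)
        out.append(row)
    return out


X = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
Y = expand_window(X, window_size=1)
print(len(Y[0]))  # 6 == nI * (2 * 1 + 1)
print(Y[0])       # [0.0, 0.0, 1.0, 2.0, 3.0, 4.0]
```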

Argument | Type | Description |
---|---|---|
`window_size` | `int` | The window size (default 1) that determines the number of surrounding vectors. |
RETURNS | `Model[Floats2d, Floats2d]` | The created layer for adding context to vectors. |

Source: `thinc/layers/expand_window.py`

### noop function

Transform a sequence of layers into a null operation.

Argument | Type | Description |
---|---|---|
`*layers` | `Model[ArrayXd, ArrayXd]` | The models to compose. |
RETURNS | `Model[ArrayXd, ArrayXd]` | The composed model. |

Source: `thinc/layers/noop.py`

### residual function

A unary combinator creating a residual connection. This converts a layer
computing `f(x)` into one that computes `f(x)+x`. Gradients flow through
residual connections directly, helping the network to learn more smoothly.

Argument | Type | Description |
---|---|---|
`layer` | `Model[T, T]` | A model with the same input and output types. |
RETURNS | `Model[T, T]` | A model with the unchanged input and output types. |

Source: `thinc/layers/residual.py`

### siamese function

Combine and encode a layer and a similarity function to form a siamese architecture. Typically used to learn symmetric relationships, such as redundancy detection.

Argument | Type | Description |
---|---|---|
`layer` | `Model` | The layer to run over the pair of inputs. |
`similarity` | `Model` | The similarity layer. |
RETURNS | `Model[Tuple, ArrayXd]` | The created siamese layer. |

Source: `thinc/layers/siamese.py`

### uniqued function

Group inputs to a layer, so that the layer only has to compute for the unique
values. The data is transformed back before output, and the same transformation
is applied for the gradient. Effectively, this is a cache local to each
minibatch. The `uniqued` wrapper is useful for word inputs, because common words
are seen often, but we may want to compute complicated features for the words,
using e.g. a character LSTM.
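The minibatch-local cache can be sketched in plain Python as a wrapper around a batch function. This is a simplified illustration: it works on a list of hashable values, while Thinc's layer operates on a column of an integer array and also handles the gradient. `expensive_features` is a hypothetical stand-in for a costly per-word computation.

```python
from typing import Callable, Dict, Hashable, List, TypeVar

K = TypeVar("K", bound=Hashable)
V = TypeVar("V")


def uniqued(fn: Callable[[List[K]], List[V]]) -> Callable[[List[K]], List[V]]:
    """Run `fn` once per distinct input in the batch, then map the results back to every position."""
    def forward(batch: List[K]) -> List[V]:
        unique_keys = list(dict.fromkeys(batch))  # distinct values, first-seen order
        results = fn(unique_keys)
        lookup: Dict[K, V] = dict(zip(unique_keys, results))
        return [lookup[key] for key in batch]
    return forward


calls: List[int] = []


def expensive_features(words: List[str]) -> List[int]:
    calls.append(len(words))  # record how many items were actually computed
    return [len(word) for word in words]


layer = uniqued(expensive_features)
print(layer(["the", "cats", "the", "mat", "the"]))  # [3, 4, 3, 3, 3]
print(calls)  # [3]: only "the", "cats" and "mat" were computed
```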

Argument | Type | Description |
---|---|---|
`layer` | `Model` | The layer. |
_keyword-only_ | | |
`column` | `int` | The column. Defaults to `0`. |
RETURNS | `Model[ArrayXd, FloatsXd]` | The composed model. |

Source: `thinc/layers/uniqued.py`

## Data type transfers

### array_getitem, ints_getitem, floats_getitem function

Index into input arrays, and return the subarrays. Multi-dimensional indexing
can be performed by passing in a tuple, and slicing can be performed using the
slice object. For instance, `X[:, :-1]` would be
`(slice(None, None), slice(None, -1))`.
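If you are unsure which `index` value matches a given subscript, Python can build it for you. `IndexProbe` is a hypothetical helper, not part of Thinc. It simply returns whatever object Python passes to `__getitem__`.

```python
class IndexProbe:
    """Return the index object Python builds for a subscript expression."""

    def __getitem__(self, index):
        return index


probe = IndexProbe()
print(probe[:, :-1])  # (slice(None, None, None), slice(None, -1, None))
print(probe[:, :-1] == (slice(None, None), slice(None, -1)))  # True
```

The resulting tuple can then be passed as `index`, e.g. `array_getitem((slice(None, None), slice(None, -1)))`.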

Argument | Type | Description |
---|---|---|
`index` | `Union[Union[int, slice, Sequence[int]], Tuple[Union[int, slice, Sequence[int]], ...]]` | A valid numpy-style index. |

Source: `thinc/layers/array_getitem.py`

### list2array function

Concatenate a list of two-dimensional arrays into a single two-dimensional
array, stacking the rows of each sequence in order. On the backward pass, the
gradient is split back into a list of arrays with the original lengths.

Argument | Type | Description |
---|---|---|
RETURNS | `Model[List[Array2d], Array2d]` | The layer to compute the transformation. |

Source: `thinc/layers/list2array.py`

### list2ragged function

Transform sequences to ragged arrays if necessary and return the ragged array.
If sequences are already ragged, do nothing. A ragged array is a tuple
`(data, lengths)`, where `data` is the concatenated data.

Argument | Type | Description |
---|---|---|
RETURNS | `Model[List[Array2d], Ragged]` | The layer to compute the transformation. |

Source: `thinc/layers/list2ragged.py`

### list2padded function

Create a layer to convert a list of array inputs into `Padded`.

Argument | Type | Description |
---|---|---|
RETURNS | `Model[List[Array2d], Padded]` | The layer to compute the transformation. |

Source: `thinc/layers/list2padded.py`

### ragged2list function

Transform sequences from a ragged format into lists.

Argument | Type | Description |
---|---|---|
RETURNS | `Model[Ragged, List[Floats2d]]` | The layer to compute the transformation. |

Source: `thinc/layers/ragged2list.py`
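The round trip between a list of sequences and the ragged `(data, lengths)` form can be sketched with plain lists. The function names mirror the `list2ragged` and `ragged2list` layers, but these are standalone illustrations. They operate on nested lists rather than on Thinc's `Ragged` class.

```python
from typing import List, Tuple

Rows = List[List[float]]


def list2ragged(seqs: List[Rows]) -> Tuple[Rows, List[int]]:
    # Concatenate the rows of all sequences and remember each sequence's length
    data = [row for seq in seqs for row in seq]
    lengths = [len(seq) for seq in seqs]
    return data, lengths


def ragged2list(data: Rows, lengths: List[int]) -> List[Rows]:
    # Split the concatenated rows back into one list per sequence
    seqs, start = [], 0
    for length in lengths:
        seqs.append(data[start:start + length])
        start += length
    return seqs


seqs = [[[1.0], [2.0]], [[3.0]], [[4.0], [5.0], [6.0]]]
data, lengths = list2ragged(seqs)
print(lengths)  # [2, 1, 3]
print(ragged2list(data, lengths) == seqs)  # True
```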

### padded2list function

**Input:** `Padded`

**Output:** `List[Array]`

Create a layer to convert a `Padded` input into a list of arrays.

Argument | Type | Description |
---|---|---|
RETURNS | `Model[Padded, List[Array]]` | The layer to compute the transformation. |

Source: `thinc/layers/padded2list.py`

### remap_ids function

**Input:** `Sequence[Any]`

**Output:** `Ints2d`

Remap string or integer inputs using a mapping table, usually as a preprocess
before embeddings. The mapping table can be passed in when the layer is created,
or updated afterwards. The mapping table is stored in the `"mapping_table"`
attribute.
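The lookup-with-default behaviour can be sketched in plain Python. The sketch assumes that the `Ints2d` output is a single column, with one row per input. `remap_ids` here is a standalone function, not the Thinc layer, and updating the dictionary plays the role of writing to `model.attrs["mapping_table"]`.

```python
from typing import Any, Dict, List, Sequence


def remap_ids(inputs: Sequence[Any], mapping_table: Dict[Any, int], default: int = 0) -> List[List[int]]:
    """Look up each input in the table, falling back to `default`; one single-column row per input."""
    return [[mapping_table.get(value, default)] for value in inputs]


table = {"the": 1, "cat": 2}
print(remap_ids(["the", "cat", "sat"], table, default=0))  # [[1], [2], [0]]
table["sat"] = 3  # the table can be updated after the fact
print(remap_ids(["the", "cat", "sat"], table, default=0))  # [[1], [2], [3]]
```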

Argument | Type | Description |
---|---|---|
`mapping_table` | `Dict[Any, int]` | The mapping table to use. Can also be set after initialization by writing to `model.attrs["mapping_table"]`. |
`default` | `int` | The default value if the input does not have an entry in the mapping table. |
`dtype` | `DTypes` | The data type of the array. |
RETURNS | `Model[Sequence[Any], Ints2d]` | The layer to compute the transformation. |

Source: `thinc/layers/remap_ids.py`

### strings2arrays function

**Input:** `Sequence[Sequence[str]]`

**Output:** `List[Ints2d]`

Transform a sequence of string sequences to a list of arrays.

Argument | Type | Description |
---|---|---|
RETURNS | `Model[Sequence[Sequence[str]], List[Ints2d]]` | The layer to compute the transformation. |

Source: `thinc/layers/strings2arrays.py`

### with_array function

Transform sequence data into a contiguous two-dimensional array on the way into and out of a model. Handles a variety of sequence types: lists, padded and ragged. If the input is a two-dimensional array, it is passed through unchanged.

Argument | Type | Description |
---|---|---|
`layer` | `Model[Array2d, Array2d]` | The layer to wrap. |
_keyword-only_ | | |
`pad` | `int` | The padding. Defaults to `0`. |
RETURNS | `Model` | The wrapped layer. |

Source: `thinc/layers/with_array.py`

### with_flatten function

**Input:** `Sequence[Sequence[Any]]`

**Output:** `List[Array2d]`

Flatten nested inputs on the way into a layer and reverse the transformation over the outputs.
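The flatten-then-regroup pattern can be sketched as a plain-Python wrapper. This is a simplified illustration: it handles only the forward direction, while the Thinc layer also regroups gradients on the way back. The wrapped function must return one output per flattened input.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")
U = TypeVar("U")


def with_flatten(layer: Callable[[List[T]], List[U]]) -> Callable[[Sequence[Sequence[T]]], List[List[U]]]:
    """Flatten nested inputs for `layer`, then regroup its outputs using the original lengths."""
    def forward(nested: Sequence[Sequence[T]]) -> List[List[U]]:
        lengths = [len(seq) for seq in nested]
        flat = [item for seq in nested for item in seq]
        flat_out = layer(flat)
        out, start = [], 0
        for length in lengths:
            out.append(flat_out[start:start + length])
            start += length
        return out
    return forward


upper = with_flatten(lambda words: [w.upper() for w in words])
print(upper([["a", "b"], [], ["c"]]))  # [['A', 'B'], [], ['C']]
```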

Argument | Type | Description |
---|---|---|
`layer` | `Model` | The layer to wrap. |
RETURNS | `Model` | The wrapped layer. |

Source: `thinc/layers/with_flatten.py`

### with_padded function

Convert sequence input into the `Padded` data type on the way into a layer and
reverse the transformation on the output.

Argument | Type | Description |
---|---|---|
`layer` | `Model[Padded, Padded]` | The layer to wrap. |
RETURNS | `Model` | The wrapped layer. |

Source: `thinc/layers/with_padded.py`

### with_ragged function

Convert sequence input into the `Ragged` data type on the way into a layer and
reverse the transformation on the output.

Argument | Type | Description |
---|---|---|
`layer` | `Model[Ragged, Ragged]` | The layer to wrap. |
RETURNS | `Model` | The wrapped layer. |

Source: `thinc/layers/with_ragged.py`

### with_list function

Convert sequence input into lists on the way into a layer and reverse the transformation on the outputs.

Argument | Type | Description |
---|---|---|
`layer` | `Model[List[Array2d], List[Array2d]]` | The layer to wrap. |
RETURNS | `Model` | The wrapped layer. |

Source: `thinc/layers/with_list.py`

### with_getitem function

**Input:** `Tuple`

**Output:** `Tuple`

Transform data on the way into and out of a layer by plucking an item from a tuple.

Argument | Type | Description |
---|---|---|
`idx` | `int` | The index to pluck from the tuple. |
`layer` | `Model[ArrayXd, ArrayXd]` | The layer to wrap. |
RETURNS | `Model[Tuple, Tuple]` | The wrapped layer. |

Source: `thinc/layers/with_getitem.py`

### with_reshape function

Reshape data on the way into and out from a layer.

Argument | Type | Description |
---|---|---|
`layer` | `Model[Array2d, Array2d]` | The layer to wrap. |
RETURNS | `Model[Array3d, Array3d]` | The wrapped layer. |

Source: `thinc/layers/with_reshape.py`

### with_debug function

**Input:** `Any`

**Output:** `Any`

Debugging layer that wraps any layer and allows executing callbacks during the forward pass, backward pass and initialization. The callbacks will receive the same arguments as the functions they’re called in and are executed before the function runs.

**Example**

```python
from thinc.api import Linear, with_debug

def on_init(model, X, Y):
    # X and Y are whatever was passed to model.initialize() (None here)
    print(f"X: {type(X)}, Y: {type(Y)}")

model = with_debug(Linear(2, 5), on_init=on_init)
model.initialize()
```

Argument | Type | Description |
---|---|---|
`layer` | `Model` | The layer to wrap. |
`name` | `Optional[str]` | Optional name for the wrapped layer, will be prefixed by `debug:`. Defaults to name of the wrapped layer. |
_keyword-only_ | | |
`on_init` | `Callable[[Model, Any, Any], None]` | Function called on initialization. Receives the model and the `X` and `Y` passed to `Model.initialize`, if available. |
`on_forward` | `Callable[[Model, Any, bool], None]` | Function called at the start of the forward pass. Receives the model, the inputs and the value of `is_train`. |
`on_backprop` | `Callable[[Any], None] = do_nothing` | Function called at the start of the backward pass. Receives the gradient. |
RETURNS | `Model` | The wrapped layer. |

Source: `thinc/layers/with_debug.py`

## Wrappers

### PyTorchWrapper, PyTorchRNNWrapper function

**Input:** `Any`

**Output:** `Any`

Wrap a PyTorch model so that it has the same API as Thinc models. To optimize
the model, you’ll need to create a PyTorch optimizer and call `optimizer.step`
after each batch. The `PyTorchRNNWrapper` has the same signature as the
`PyTorchWrapper` and lets you pass in a custom sequence model that has the same
inputs and output behavior as a `torch.nn.RNN` object.

Your PyTorch model’s forward method can take arbitrary positional arguments and
keyword arguments, but must return either a **single tensor** as output or a
**tuple**. You may find PyTorch’s `register_forward_hook` helpful if you need to
adapt the output. The convert functions are used to map inputs and outputs to
and from your PyTorch model. Each function should return the converted output,
and a callback to use during the backward pass:

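```python
# Forward pass: convert inputs, call the wrapped PyTorch model via its shim, convert outputs
Xtorch, get_dX = convert_inputs(X)
Ytorch, torch_backprop = model.shims[0](Xtorch, is_train)
Y, get_dYtorch = convert_outputs(Ytorch)
# Backward pass: apply the returned callbacks in reverse order
dYtorch = get_dYtorch(dY)
dXtorch = torch_backprop(dYtorch)
dX = get_dX(dXtorch)
```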

To allow maximum flexibility, the `PyTorchShim` expects `ArgsKwargs` objects on
the way into the forward and backward passes. The `ArgsKwargs` objects will be
passed straight into the model in the forward pass, and straight into
`torch.autograd.backward` during the backward pass.

Argument | Type | Description |
---|---|---|
`pytorch_model` | `Any` | The PyTorch model. |
`convert_inputs` | `Callable` | Function to convert inputs to PyTorch tensors (same signature as `forward` function). |
`convert_outputs` | `Callable` | Function to convert outputs from PyTorch tensors (same signature as `forward` function). |
RETURNS | `Model[Any, Any]` | The Thinc model. |

Source: `thinc/layers/pytorchwrapper.py`

### TensorFlowWrapper function

**Input:** `Any`

**Output:** `Any`

Wrap a TensorFlow model, so that it has the same API as Thinc models. To
optimize the model, you’ll need to create a TensorFlow optimizer and call
`optimizer.apply_gradients` after each batch. To allow maximum flexibility, the
`TensorFlowShim` expects `ArgsKwargs` objects on the way into the forward and
backward passes.

Argument | Type | Description |
---|---|---|
`tensorflow_model` | `Any` | The TensorFlow model. |
RETURNS | `Model[Any, Any]` | The Thinc model. |

Source: `thinc/layers/tensorflowwrapper.py`

### MXNetWrapper function

**Input:** `Any`

**Output:** `Any`

Wrap an MXNet model, so that it has the same API as Thinc models. To optimize
the model, you’ll need to create an MXNet optimizer and call `optimizer.step()`
after each batch. To allow maximum flexibility, the `MXNetShim` expects
`ArgsKwargs` objects on the way into the forward and backward passes.

Argument | Type | Description |
---|---|---|
`mxnet_model` | `Any` | The MXNet model. |
RETURNS | `Model[Any, Any]` | The Thinc model. |

Source: `thinc/layers/mxnetwrapper.py`