Defining and Using Models

Thinc’s model-definition API is based on functional programming. The library provides a compact set of useful combinator functions, which combine layers together in different ways. It’s also very easy to write your own combinators, to implement custom logic. Thinc also provides shim classes that let you wrap models from other libraries, allowing you to use them within Thinc.

There are a few great advantages to Thinc’s approach: there’s less syntax to remember, complex models can be defined very concisely, we can perform shape inference to save you from passing in redundant values, and we’re able to perform sophisticated network validation, making it easier to raise errors early if there’s a problem with your network.

Thinc’s approach does come with some disadvantages, however. If you write custom combinators, you’ll have to take care to pass your gradients through correctly during the backward pass. Thinc also doesn’t try to perform sophisticated graph optimizations, so “native” Thinc models may be slower than PyTorch or TensorFlow. That’s where the shim layers come in: you can use Thinc to make all the data manipulation and preprocessing operations easy and readable, and then call into TensorFlow or PyTorch for the expensive part of your model, such as your transformer layers or BiLSTM.

Intro to Thinc · Everything you need to know to get started. Composing and training a model on the MNIST data, using config files, registering custom functions and wrapping PyTorch, TensorFlow and MXNet models.
Intro to Thinc's Model class · Defining and composing models, the model methods and working with dimensions, parameters and gradients.
Basic CNN part-of-speech tagger · Implementing and training a basic CNN for part-of-speech tagging model without external dependencies and using different levels of Thinc's configuration system.
Training a part-of-speech tagger with transformers (BERT) · How to use Thinc, Transformers and PyTorch to train a part-of-speech tagger. From model definition and config to the training loop.

Composing models

Thinc follows a functional-programming approach to model definition. Its approach is especially effective for complicated network architectures, and use-cases where different data-types need to be passed through the network to reach specific subcomponents. However, individual Thinc components are often less performant than implementations from other libraries, so we suggest the use of wrapper objects around performance-sensitive components such as LSTM or transformer encoders. You then use Thinc to wire these building blocks together in complicated ways, taking advantage of the type checking, configuration system and concise syntax to make your code easier to write, read and maintain.

For instance, let’s say you want a simple three-layer network, where you apply two fully-connected layers, each with a non-linear activation function, and then an output layer with a softmax activation. Thinc provides creation functions that pair weights + activation with optional dropout and layer normalization, and the feed-forward relationship is expressed using the chain combinator. So the three layer network would be:

Simple modelmodel = chain(
    Relu(nO=hidden_width, dropout=0.2),
    Relu(nO=hidden_width, dropout=0.2),
    Softmax(nO=n_classes)
)

The chain function is similar to the Sequential class in PyTorch and Keras: it expresses a feed-forward relationship between the layers you pass in. We refer to wiring functions like chain as combinators. While most libraries provide a few combinators, Thinc takes this general approach a bit further. Instead of expressing operations over data, you’ll often create your network by expressing operations over functions. This is what we mean when we say Thinc favors a “functional” as opposed to “imperative” approach.

Imperativedef multiply_reshape_sum(linear, X, pieces=4):
    Y = linear.forward(X)
    Y = Y.reshape((Y.shape[0], -1, pieces))
    Y = Y.sum(axis=-1)
    return Y

Functionalmultiply_reshape_sum = chain(
    linear,
    reshape(lambda Y, pieces: (X.shape[0], -1, pieces), {"pieces": 4}),
    reduce_sum(axis=-1)
)

In the imperative code above, you write a function that takes a batch of data, do some things to it, and return the result. The functional code is one step more abstract: you define the function using a relationship (chain) and pass in functions to do each step. This approach is confusing at first, but we promise it pays off once the network gets more complicated, especially when combined with the option to define your own infix notation through operator overloading.

Tagger with multi-feature embedding and CNN encodingfrom thinc.api import HashEmbed, Maxout, Softmax, expand_window
from thinc.api import residual, with_array, clone, chain, concatenate

width = 128
depth = 4
n_tags = 17

def MultiEmbed(width):
    return concatenate(
        HashEmbed(width, 4000, column=0),
        HashEmbed(width // 2, 2000, column=1),
        HashEmbed(width // 2, 2000, column=2),
        HashEmbed(width // 2, 2000, column=3),
    )

def Hidden(nO, dropout=0.2):
    return Maxout(nO, pieces=3, normalize=True, dropout=dropout)

def CNN(width):
    return residual(chain(expand_window(1), Hidden(width)))

model = with_array(
    chain(
        MultiEmbed(width),
        Hidden(width),
        clone(CNN(width), depth),
        Softmax(n_tags)
    )
)

The example above shows the definition of a tagger model with a multi-feature CNN token-to-vector encoder, similar to the one we used in spaCy v2.x. Multiple numeric ID features are extracted for each word, and each feature is separately embedded. The separate vectors are concatenated and passed through a hidden layer, and then several convolutional layers are applied for contextual encoding. Each CNN layer performs a “sequence-to-column” transformation, where a window of surrounding words is concatenated to each vector. A hidden layer then maps the result back to the original dimensionality. Residual connections and layer normalization are used to assist convergence.

Overloading operators

The Model.define_operators classmethod allows you to bind arbitrary binary functions to Python operators, for use in any Model instance. The method can (and should) be used as a contextmanager, so that the overloading is limited to the immediate block. This allows concise and expressive model definitions, using custom infix notations.

from thinc.api import Model, chain, Relu, Softmax

with Model.define_operators({">>": chain}):
    model = Relu(512) >> Relu(512) >> Softmax()

Model.define_operators takes a dict of operators mapped to functions, typically combinators. Each function should expect two arguments, one of which is a Model instance. The second argument can be any type, but will usually be another model. Within the block you can now use the defined operators to compose layers – for instance, a >> b is equivalent to chain(a, b). The overloading is cleaned up again at the end of the block. The following operators are supported: +, -, *, @, /, //, %, **, <<, >>, &, ^ and |.

If your models are very complicated, operator overloading can make your code more concise and readable, while also making it easier to change things and experiment with different architectures. Here’s the same CNN-based tagger, written with operator overloading.

with operator overloadingfrom thinc.api import Model, HashEmbed, Maxout, Softmax, expand_window
from thinc.api import residual, with_array, clone, chain, concatenate

width = 128
depth = 4
n_tags = 17

def Hidden(nO, dropout=0.2):
    return Maxout(nO, pieces=3, normalize=True, dropout=dropout)

with Model.define_operators({">>": chain, "**": clone, "|": concatenate}):
    model = with_array(
        (
            HashEmbed(width, 4000, column=0)
            | HashEmbed(width // 2, 2000, column=1)
            | HashEmbed(width // 2, 2000, column=2)
            | HashEmbed(width // 2, 2000, column=3)
        )
        >> Hidden(width)
        >> residual(expand_window(1) >> Hidden(width)) ** depth
        >> Softmax(n_tags)
    )

You won’t always want to use operator overloading, but sometimes it’s the best way to show how information flows through the network. It can also help you toggle debugging, logging or other configuration over individual components. For instance, you might set up your operators so that you can write LSTM(width, width) @ 4 to set logging level 4 over just that component. Note that the binding is defined at the beginning of each block, so you’re free to bind operators to your own functions, allowing you to define something like a local domain-specific language.

Initialization and data validation

After defining your model, you can call Model.initialize to initialize the weights, calculate unset dimensions, set attributes or perform any other setup that’s required. Combinators are in charge of initializing their child layers. Model.initialize takes an optional sample of input and output data that’s used to infer missing shapes and validate your network. If possible, you should always provide at least some data to ensure all dimensions are set and to spot potential problems early on.

If a layer receives an unexpected data type, Thinc will raise a DataValidationError – like in this case where a Linear layer that expects a 2d array is initialized with a 3d array:

Invalid datafrom thinc.api import Linear
import numpy

X = numpy.zeros((1, 1, 1), dtype="f")
model = Linear(1, 2)
model.initialize(X=X)

ErrorData validation error in 'linear'
X: <class 'numpy.ndarray'>
Y: <class 'NoneType'>

X   wrong array dimensions (expected 2, got 3)

During initialization, the inputs and outputs that pass through the model are checked against the signature of the layer’s forward function. If a layer’s forward pass annotates the input as X: Floats2d but receives a 3d array of floats, an error is raised. Similarly, if the forward pass annotates its return value as -> Tuple[List[FloatsXd], Callable] but the model is initialized with an array as the output data sample, you’ll also see an error.

Because each layer is only responsible for itself (and its direct children), data validation also works out-of-the-box for complex and nested networks. That’s also where it’s most powerful, since it lets you detect problems as the data is transformed. In this example, the Relu layer outputs a 2d array, but the ParametricAttention layer expects a ragged array of data and lengths.

Invalid networkX = [numpy.zeros((4, 75), dtype="f")]
Y = numpy.zeros((1,), dtype="f")
model = chain(
    list2ragged(),
    reduce_sum(),
    Relu(12, dropout=0.5),  # -> Floats2d
    ParametricAttention(12)
)
model.initialize(X=X, Y=Y)

ErrorData validation error in 'para-attn'
X: <class 'numpy.ndarray'>
Y: <class 'NoneType'>

X   instance of Ragged expected

Note that if a layer accepts multiple types, the data will be validated against each type and if it doesn’t match any of them, you’ll see an error describing all mismatches. For instance, with_array accepts a ragged array, a padded array, a 2d array or a list of 2d arrays. If you pass in a 3d array, which is invalid, the error will look like this:

Invalid datafrom thinc.api import with_array, Linear
import numpy

X = numpy.zeros((1, 1, 1), dtype="f")
model = with_array(Linear())
model.initialize(X=X)

ErrorData validation error in 'with_array-linear'
X: <class 'numpy.ndarray'>
Y: <class 'numpy.ndarray'>

X   instance of Padded expected
X   instance of Ragged expected
X   value is not a valid list
X   wrong array dimensions (expected 2, got 3)

To take advantage of runtime validation, config validation and static type checking, you should add type hints to any custom layers, wrappers and functions you define. Type hints are optional, though, and if no types are provided, the data won’t be validated and any inputs will be accepted.

Defining new layers

Thinc favors a composition rather than inheritance approach to creating custom sublayers: the base Model class should be all you need. You can define new layers by simply passing in different data, especially the forward function, which is where you’ll implement the layer’s actual logic. You’ll usually want to make a function to put the pieces together. We refer to such functions as constructors. The constructor is responsible for defining the layer. Parameter allocation and initialization takes place in an optional init function, which is called by model.initialize.

Layer definitionmodel = Model(
    "layer-name",                   # string name of layer
    forward,                        # forward function
    init=init,                      # optional initialize function
    dims={"nO": 128, "nI": None},   # optional dimensions
    params={"W": None, "b": None},  # optional parameters
    attrs={"my_attr": True},        # optional non-parameter attributes
    refs={},                        # optional references to other layers
    layers=[],                      # child layers
    shims=[]                        # child shims
)

Constructor functions

Thinc layers are almost always instances of Model. You usually don’t need to create a subclass, although you can if you prefer. Because layers usually reuse the same Model class, the constructor takes on some responsibilities for defining the layer, even if all the data isn’t available. The refs, params and dim dictionaries are all mappings from string names to optional values (where “optional” means you can make the value None). You should use None to indicate the full set of names that should be present once the layer is fully initialized. However, you cannot pass None values for missing child layers or shims: these lists do not support None values.

Dimension, attribute and parameter names are identified using strings. Thinc’s built-in layers use the convention that "nI" refers to the model’s input width, and "nO" refers to the model’s output width. You should usually try to provide these, unless they are undefined for your model (for instance, if your model accepts an arbitrary unsized object like a database connector, it doesn’t make sense to provide an "nI" dimension.) Your constructor should define all dimensions and parameters you want to attach to the model, mapping them to None if the values aren’t available yet. You’ll usually map parameters to None, and only allocate them in your init function.

from thinc.api import Model

def random_chain(child_layer1: Model, child_layer2: Model, prob: float = 0.2) -> Model:
    """Randomly invert the order of two layers during training."""
    return Model(
        "random_order",
        random_chain_forward,
        init=init,
        attrs={"prob": prob},
        layers=[child_layer1, child_layer2],
    )

Many model instances will have one or more child layers. For composite models that have several distinct parts, you should usually write your creation functions to receive instances of the child layers, rather than settings to construct it. This will make your layer more modular, and let you take better advantage of Thinc’s config system. You can add or remove child layers after creation via the Model.layers list property, but you should usually prefer to set up the child layers on creation if possible.

In complicated networks, sometimes you need to refer back to specific parts of the model. For instance, you might want to access the embedding layer of a network directly. The refs dict lets you create named references to nodes. You can have nodes referring to their siblings or parents, so long as you don’t try to serialize only that component: when you call Model.to_bytes or Model.to_disk, all of the reference targets must be somewhere within the model’s tree. Under the hood, references are implemented using Python’s weakref feature, to avoid circular dependencies.

For instance, let’s say you needed each child layer in the random_order example above to refer to each other. You could do this by setting node references for them:

child_layer1.set_ref("sibling", child_layer2)
child_layer2.set_ref("sibling", child_layer1)

If you call model.to_bytes, both references will be within the tree, so there will be no problem. But you would not be able to call child_layer1.to_bytes or child_layer2.to_bytes, as the link targets aren’t reachable from Model.walk.

Thinc’s built-in layers follow a naming convention where combinators and stateless transformations are created from snake_case functions, while weights layers or higher level components are created from CamelCased names. This naming reflects the general usage purpose of the layer, rather than the details of exactly what top-level container is returned. Constructing models via functions allows your code to do some things that would be difficult or impossible with an API that exposes __init__ methods directly, because it’s difficult to return a different object instance from a class constructor. For instance, Thinc’s Relu layer accepts the options dropout and normalize. These operations are implemented as separate layers, so the constructor uses the chain combinator to put everything together. You should feel free to take this type of approach in your own constructors too: you can design the components of your network to be smaller reusable pieces, while making the user-facing API refer to larger units.

The forward function and backprop callback

Writing the forward function is the main part of writing a new layer — it’s where the computation actually takes place. The forward function is passed into Model.__init__, and then kept as a reference within the model instance. You won’t normally call your forward function directly. It will usually be invoked indirectly, via the __call__, predict and begin_update methods. The implementation of the Model class is pretty straightforward, so you can have a look at the code to see how it all fits together.

Because the forward function is invoked within the Model instance, it needs to stick to a strict signature. Your forward function needs to accept exactly three arguments: the model instance, the input data and a flag indicating whether the model is being invoked for training, or for prediction. It needs to return a tuple with the output, and the backprop callback.

Forward function for a Linear layerdef linear_forward(model: Model, X, is_train):
    W = model.get_param("W")
    b = model.get_param("b")
    Y = X @ W.T + b

    def backprop(dY):
        model.inc_grad("b", dY.sum(axis=0))
        model.inc_grad("W", dY.T @ X)
        return dY @ W

    return Y, backprop

Argument	Type	Description
`model`	`Model`	The model instance.
`X`	`Any`	The inputs.
`is_train`	`bool`	Whether the model is running in a training context.
RETURNS	`Tuple[Any, Callable]`	The output and the backprop callback.

The model won’t invoke your forward function with any extra positional or keyword arguments, so you’ll normally attach what you need to the model, as params, dims, attrs, layers, shims or refs. At the beginning of your function, you’ll fetch what you need from the model and carry out your computation. For example, the random_chain_forward function retrieves its child layers and the prob attribute, uses them to compute the output, and then returns it along with the backprop callback. The backprop callback uses some results from the forward’s scope (specifically the two child callbacks and the prob attribute), and returns the gradient of the input.

def random_chain_forward(model: Model, X, is_train: bool):
    child_layer1 = model.layers[0]
    child_layer2 = model.layers[1]
    prob = model.get_attr("prob")
    is_reversed = is_train and prob >= random.random()
    if is_reversed:
        Y, get_dX = child_layer2(X, is_train)
        Z, get_dY = child_layer1(Y, is_train)
    else:
        Y, get_dX = child_layer1(X, is_train)
        Z, get_dY = child_layer2(Y, is_train)

    def backprop(dZ):
        dY = get_dY(dZ)
        dX = get_dX(dY)
        return dX

    return Z, backprop

Instead of defining the forward function separately, it’s sometimes more elegant to write it as a closure within the constructor. This is especially helpful for quick utilities and transforms. If you don’t otherwise need a setting to be accessible from a model or serialized with it, you can simply reference it from the outer scope, rather than passing it in as an attribute. You should avoid referencing child layers in this way, however, as you do need to pass the child layers into the layers list – otherwise they will be not part of the model’s tree.

def random_chain(child_layer1, child_layer2, prob=0.2):
    ...
    def random_chain_forward(model: Model, X, is_train: bool):
        # You can define the `forward` function as a closure. If so, it's fine
        # to use attributes from the outer scope, but child layers should be
        # retrieved from the model.
        child_layer1, child_layer2 = model.layers
        is_reversed = is_train and prob >= random.random()
        ...
    ...

Another way to pass data from the constructor into the forward function is partial function application. This is the best approach for static data that will be reliably available at creation and does not need to be serialized. Partial application is also the best way to establish temporary buffers for your forward function to reuse. Reusable buffers can be helpful for performance tuning to prevent repeat additional memory allocation.

There are a few constraints that well-behaved forward functions should follow in order to make them interoperate better with other layers, and to help your code work smoothly in distributed settings. Many of these considerations are less relevant for quick hacks or experimentation, but even if everything stays between you and your editor, you’re likely to find your code easier to work with and reason about if you keep these rules in mind, as they all amount to “avoid unnecessary side-effects”.

Params can be read-only. The model.get_params method is allowed to return read-only arrays, or copies of the internal data. You should not assume that in-place changes to the params will have any effect.
Inputs can be read-only. You should avoid writing to input variables, but if it’s really necessary for efficiency, you should at least check the array.flags["WRITEABLE"] attribute, and make a copy if necessary. Equally, if it’s crucial for your layer that variables are not written to, use array.setflags(write=False) to prevent any shenanigans.
Writeable variables might get written to. You’ll often want to reference a variable returned by your forward function inside your backprop callback, as otherwise you’d have to recompute it. However, after your function has returned, the caller might write to the array, changing it out from under you. If the data is small, the best solution is to make a copy of variables you want to retain. If you’re really worried about efficiency, you can set the array to read-only using array.set_flags(write=False).
Avoid calls to Model.set_param. If you do have to change the params during the forward function or the backprop callback, set_param is the best way to do it – but you should try to prefer other designs, as other code may not expect params to change during forward or backprop.
Don’t call Model.set_dim. There’s not really a need to do this, and you’re likely to cause a lot of problems for other layers. If the parent layer checks a child dimension and then invokes it, the parent should not have to double check that the dimensions of the child have changed.
It’s okay to update the Model.attrs. If you do need to change state during the forward pass (for instance, for batch normalization), model.attrs["some_attr"] = new_value is the best approach. Of course, all else being equal, side-effects are worse than no side-effects – but sometimes it’s just what you need to do. Consequently, your layer should expect that child layers are allowed to modify or set attrs during their forward or backward functions.

You can avoid any of the read-only stuff by following a policy of never modifying your inputs, and always making a copy of your outputs if you need to retain them. This is the most conservative approach, and for most situations it’s what we would recommend – it ensures your layer will work well even when combined with other layers that aren’t written carefully. However, you’ll sometimes be writing code where it’s reasonable to worry about unnecessary copies. In these situations, the read-only flags work sort of like traffic rules. If everyone cooperates, you can go faster, but if someone forgets to indicate, there might be a crash.

Writing the backprop callback

Your forward function must return a callback to compute gradients of the parameters and weights during training. The callback must accept inputs that match the outputs of the forward pass and return outputs that match the inputs to the forward pass (the specifics of “matching” might differ between types, but for arrays, assume it means the same shape and type). If the forward function is Y = forward(X), then the backprop callback should be dX = backprop(dY), with X.shape == dX.shape and Y.shape == dY.shape.

Your backprop callback will often refer to variables in the outer scope. This allows you to easily reuse state from the forward pass. The Python runtime will increment the reference count of all variables that your backprop callback references, and then decrement the reference counts once the callback is destroyed. We can see this working by attaching a __del__ method to some classes, which show when the objects are being destroyed.

class Thing:
    def __del__(self):
        print("Deleted thing")

class Other:
    def __del__(self):
        print("Deleted other")

def outer():
    thing = Thing()
    other = Other()
    def inner():
        return thing
    return inner

>>> callback = outer()
Deleted other

>>> callback = None
Deleted thing

This behavior makes managing memory very easy: objects you reference will be kept alive, and objects you don’t are eligible to be freed. You should therefore avoid unnecessary references if possible. For instance, if you only need the shape of an array, it is better to assign that to a local variable, rather than accessing it via the parent array.

Your backprop callback is not guaranteed to be called, so you should not rely on it to compute side-effects. It is also valid for code to execute the backprop callback more than once, so your function should be prepared for that. However, it is not valid to call the backprop callback if the forward function was executed with is_train=False, so you can implement predict-only optimizations. It is also invalid for layers to change each others’ parameters or dimensions during execution, so your function does not need to be prepared for that.

Thinc does leave you with the responsibility for calculating the gradients correctly. If you do not get them right, your layer will not learn correctly. If you’re having trouble, you might find Thinc’s built-in layers a helpful reference, as they show how the backward pass should look for a number of different situations. They also serve as examples for how we would suggest you structure your code to make calculating the backward pass easier. Naming is especially important: you need to see the order of steps in the forward pass and unwind them in reverse. For complex cases, it also helps a lot to break out calculations into helper functions that return a backprop callback. Then your outer layer can simply call the callbacks in reverse. It also helps to follow a consistent naming scheme for these callbacks. We usually either name our callbacks by the result returned (like get_dX), or the variable that you’ll pass in (like bp_dY).

Often you’ll write layers that are not meaningfully differentiable, or for which you do not need the gradients of the inputs. For instance, you might have a layer that lower-cases textual inputs. In these cases, the backprop callback should return some appropriate falsy value of the same type as the input to avoid raising spurious type errors. For instance, if the input is a list of strings, you would return an empty list; if the input is an array, you can return an empty array.

The initialize function

The last function you may need to define is the initializer, or “init”, function. Like the forward function, your init function will be stored on the model instance, and then called by the Model.initialize method. You do not have to expect that the function will be called in other situations.

Your init function will be called with an instance of your model and two optional arguments, which may provide an example batch of inputs (X) and an example batch of outputs (Y). The arguments may be provided positionally or by keyword (so the naming is significant: you must call the arguments X and Y).

Your model can use the provided example data to help calculate unset dimensions, assist with parameter initialization, or calculate attributes. It is valid for your construction function to return your model with missing information, hoping that the information will be filled in later or at initialization via the example data. If the example data is not provided and you’re left with unset dimensions or other incomplete state, you should raise an error. It is up to you to decide how you should handle receiving conflict information at construction and initialization time. Sometimes it will be better to overwrite the construction data, and other times it will be better to raise an error.

Initialize functiondef random_chain_init(model, X=None, Y=None):
    if X is not None and model.has_dim("nI") is None:
        model.set_dim("nI", X.shape[1])
    if Y is not None and model.has_dim("nO") is None:
        model.set_dim("nO", Y.shape[1])
    for child_layer in model.layers:
        child_layer.initialize(X=X, Y=Y)

The model.initialize method will not call your init function with any extra arguments, but you will often want to parameterize your init with settings from the constructor. This is especially common for weight initialization: there are many possible schemes, and the choice is often an important hyper-parameter. While you can communicate this information via the attrs dict, our favorite solution is to use functools.partial. Partial function application lets you fill in a function’s arguments at two different times, instead of doing it all at once — which is exactly what we need here. This lets you write the init function with extra arguments, which you provide at creation time when the information is available.

Passing init args using partialfrom functools import partialfrom thinc.api import Model

def constructor(nO=None, initial_value=1):
    return Model(
        "my-model",
        forward,
        dims={"nO": nO},
        params={"b": None},
        init=partial(init, initial_value=initial_value)    )

def forward(model, X, is_train):
    ...

def init(initial_value, model, X=None, Y=None):    if Y is not None and model.has_dim("nO") is None:
        model.set_dim("nO", None)
    if not model.get_dim("nO"):
        raise ValueError(f"Cannot initialize {model.name}: dimension nO unset")
    b = model.ops.alloc1f(model.get_dim("nO"))
    b += initial_value
    model.set_param("b", b)

A call to Model.initialize will trigger subsequent calls down the model tree, as each layer is responsible for calling the initialize method of each of its children (although not necessarily in order). Prior to training, you can rely on your model’s init function being called at least once in between the constructor and the first execution of your forward function. It is invalid for a parent to not call initialize on one of its children, and it is invalid to start training without calling Model.initialize first.

However, it is valid to create a layer and then load its state back with Model.from_dict, Model.from_bytes or Model.from_disk without calling model.initialize first. This means that you should not write any side-effects in your init function that will not be replicated by deserialization. Deserialization will restore dimensions, parameters, node references and serializable attributes, but it will not replicate any changes you have made to the layer’s node structure, such as changes to the model.layers list. You should therefore avoid making those changes within the init, or your model will not deserialize correctly.

It is valid for your init function to be called more than once. This usually happens because the same model instance occurs twice within a a tree (which is allowed). Your initialize function should therefore be prepared to run twice. If you’re simply setting dimensions and allocating and initializing parameters, having the init function run will generally be unproblematic. However, in special situations you may need to call external setup code, in which case having your init function run twice could be problematic. The best solution would probably be to set a global variable, possibly using the model.id attribute to allow the init to run once per model instance. You could also use partial function application to attach the flag in a mutable argument variable.

Inspecting and updating model state

As you build more complicated models, you’ll often need to inspect your model in various ways. This is especially important when you’re writing your own layers. Here’s a quick summary of the different types of information you can attach and query.


`Model.id`	A numeric identifier, to distinguish different model instances. During `Model.__init__`, the `Model.global_id` class attribute is incremented and the next value is used for the `model.id` value.
`Model.name`	A string name for the model.
`Model.layers` `Model.walk`	List the immediate sublayers of a model, or iterate over the model’s whole subtree (including the model itself).
`Model.shims`	Wrappers for external libraries, such as PyTorch and TensorFlow. `Shim` objects hold a reference to the external object, and provide a consistent interface for Thinc to work with, while also letting Thinc treat them separately from `Model` instances for the purpose of serialization and optimization.
`Model.has_dim` `Model.get_dim` `Model.set_dim` `Model.dim_names`	Check, get, set and list the layer’s dimensions. A dimension is an integer value that affects a model’s parameters or the shape of its input data.
`Model.has_param` `Model.get_param` `Model.set_param` `Model.param_names`	Check, get, set and list the layer’s weights parameters. A parameter is an array that can have a gradient and can be optimized.
`Model.has_grad` `Model.get_grad` `Model.set_grad` `Model.inc_grad` `Model.grad_names`	Check, get, set, increment and list the layer’s weights gradients. A gradient is an array of the same shape as a weights parameter, that increments values used to update the parameter during training.
`Model.has_ref` `Model.get_ref` `Model.set_ref` `Model.ref_names`	Check, get, set and list the layer’s node references. A node reference lets you easily refer to particular nodes within your model’s subtree. For instance, if you want to expose the embedding table from your model, you can add a reference to it.
`Model.attrs`	A dict of the layer’s attributes. Attributes are other information the layer needs, such as configuration or settings. You should ensure that attribute values you set are either JSON-serializable, or support a `to_bytes` method, or the attribute will prevent model serialization.

Naming conventions

Thinc names dimensions and parameters with strings, so you can use arbitrary names on your models. For the built-in layers and combinators, we use the following conventions:


`nO`	The width of output arrays from the layer.
`nI`	The width of input arrays from the layer.
`nP`	Number of “pieces”. Used in the `Maxout` layer.
`W`	A 2-dimensional weights parameter, for connection weights.
`b`	A 1-dimensional weights parameter, for biases.
`E`	A 2-dimensional weights parameter, for an embedding table.

Serializing models and data

Of course, training a model isn’t very useful without a way to save out the weights and load them back in later. Thinc supports three ways of saving and loading your model:

The most flexible is to use the Model.to_bytes method, which saves the model state to a byte string, serialized using the msgpack library. The result can then be loaded back using the Model.from_bytes method.
The Model.to_disk method works similarly, except the result is saved to a path you provide instead. The result can be loaded back using the Model.from_disk method.
Pickle the Model instance. This should work, but is not our recommendation for most use-cases. Pickle is inefficient in both time and space, does not work reliably across Python versions or platforms, and is not suitable for untrusted inputs, as unpickling an object allows arbitrary code execution by design.

The from_bytes and from_disk methods are intended to be relatively safe: unlike formats such as Pickle, untrusted inputs are not intended to allow arbitrary code execution. This means you have to create the Model object yourself first, and then use that object to load in the state.

To make this easier, you’ll usually want to put your model creation code inside a function, and then register it. The registry allows you to look up the function by name later, so you can pass along all the details to recreate your model in one message. Check out our guide on the config system for more details.

Serializing attributes

When you call Model.to_bytes or Model.to_disk, the model and its layers, weights, parameters and attributes will be serialized to a byte string. Calling Model.from_bytes or Model.from_disk lets you load a model back in. By default, Thinc uses MessagePack, which works out-of-the-box for all JSON-serializable data types. The serialize_attr and deserialize_attr functions that Thinc uses under the hood are single-dispatch generic functions. This means that you can register different versions of them that are chosen based on the value and type of the attribute.

For example, let’s say your model takes attributes that are DataFrame objects:

Model with custom attrsfrom thinc.api import Model
import pandas as pd

attrs = {"df": pd.DataFrame([10, 20, 30], columns=["a"])}model = Model("custom-model", lambda X: (X, lambda dY: dY), attrs=attrs)

To tell Thinc how to save and load them, you can use the @serialize_attr.register and @deserialize_attr.register decorators with the type pd.DataFrame. Whenever Thinc encounters an attribute value that’s a dataframe, it will use these functions to serialize and deserialize it.

Custom attr serializationfrom thinc.api import serialize_attr, deserialize_attr
import pandas as pd
import numpy

@serialize_attr.register(pd.DataFrame)def serialize_dataframe(_, value, name, model):    """Serialize the value (a dataframe) to bytes."""
    rec = value.to_records(index=False)
    return rec.tostring()

@deserialize_attr.register(pd.DataFrame)def deserialize_dataframe(_, value, name, model):    """Deserialize bytes to a dataframe."""
    rec = numpy.frombuffer(value, dtype="i")
    return pd.DataFrame().from_records(rec)

The first argument of the function is always an instance of the attribute. This is used to decide which function to call. The value is the value to save or load – a dataframe to serialize or the bytestring to deserialize. The functions also give you access to the string name of the current attribute and the Model instance. This is useful if you need additional information or if you want to perform other side-effects when loading the data back in. For example, you could check if the model has another attribute specifying the data type of the array and use that when loading back the data:

@deserialize_attr.register(pd.DataFrame)
def deserialize_dataframe(_, value, name, model):
    """Deserialize bytes to a dataframe."""
    dtype = model.attrs.get("dtype", "i")    rec = numpy.frombuffer(value, dtype=dtype)    return pd.DataFrame().from_records(rec)

Since the attribute value is used to decide which serialization and deserialization function to use, make sure that your model defines default values for its attributes. This way, the correct function will be called when you run model.from_bytes or model.from_disk to load in the data.

- attrs = {"df": None}
+ attrs = {"df": pd.DataFrame([10, 20, 30], columns=["a"])}
model = Model("custom-model", lambda X: (X, lambda dY: dY), attrs=attrs)