{"componentChunkName":"component---src-templates-docs-js","path":"/docs/concept","result":{"data":{"site":{"siteMetadata":{"sidebar":[{"label":"Get started","items":[{"text":"Introduction","url":"/docs/"},{"text":"Concept & Design","url":"/docs/concept"},{"text":"Installation & Setup","url":"/docs/install"},{"text":"Examples & Tutorials","url":"https://github.com/explosion/thinc/#-selected-examples-and-notebooks"},{"text":"Backprop 101","url":"/docs/backprop101"}]},{"label":"Usage","items":[{"text":"Configuration System","url":"/docs/usage-config"},{"text":"Defining & Using Models","url":"/docs/usage-models"},{"text":"Training Models","url":"/docs/usage-training"},{"text":"PyTorch, TensorFlow etc.","url":"/docs/usage-frameworks"},{"text":"Variable-length Sequences","url":"/docs/usage-sequences"},{"text":"Type Checking","url":"/docs/usage-type-checking"}]},{"label":"API","items":[{"text":"Model","url":"/docs/api-model"},{"text":"Layers","url":"/docs/api-layers"},{"text":"Optimizers","url":"/docs/api-optimizers"},{"text":"Initializers","url":"/docs/api-initializers"},{"text":"Schedules","url":"/docs/api-schedules"},{"text":"Losses","url":"/docs/api-loss"},{"text":"Config & Registry","url":"/docs/api-config"},{"text":"Types & Dataclasses","url":"/docs/api-types"},{"text":"Backends & Math","url":"/docs/api-backends"},{"text":"Utilities & Extras","url":"/docs/api-util"}]}]}},"markdownRemark":{"htmlAst":{"type":"root","children":[{"type":"element","tagName":"p","properties":{},"children":[{"type":"text","value":"Thinc is built on a fairly simple conceptual model that’s a little bit different\nfrom other neural network libraries. On this page, we build up the library from\nfirst principles, so you can see how everything fits together. 
This page assumes\nsome conceptual familiarity with "},{"type":"element","tagName":"a","properties":{"href":"/docs/backprop101"},"children":[{"type":"text","value":"backpropagation"}]},{"type":"text","value":", but you\nshould be able to follow along even if you’re hazy on some of the details."}]},{"type":"text","value":"\n"},{"type":"element","tagName":"h2","properties":{},"children":[{"type":"text","value":"The model composition problem"}]},{"type":"text","value":"\n"},{"type":"element","tagName":"p","properties":{},"children":[{"type":"text","value":"The central problem for a neural network implementation is this: during the\n"},{"type":"element","tagName":"strong","properties":{},"children":[{"type":"text","value":"forward pass"}]},{"type":"text","value":", you compute results that will later be useful during the\n"},{"type":"element","tagName":"strong","properties":{},"children":[{"type":"text","value":"backward pass"}]},{"type":"text","value":". How do you keep track of this arbitrary state, while making\nsure that layers can be cleanly composed?"}]},{"type":"text","value":"\n"},{"type":"element","tagName":"p","properties":{},"children":[{"type":"text","value":"Instead of starting with the problem directly, let’s start with a simple and\nobvious approach, so that we can run into the problem more naturally. The most\nobvious idea is that we have some thing called a "},{"type":"element","tagName":"code","properties":{},"children":[{"type":"text","value":"model"}]},{"type":"text","value":", and this thing holds\nsome parameters (“weights”) and has a method to predict from some inputs to some\noutputs using the current weights. So far so good. But we also need a way to\nupdate the weights. 
The most obvious API for this is to add an "},{"type":"element","tagName":"code","properties":{},"children":[{"type":"text","value":"update"}]},{"type":"text","value":" method,\nwhich will take a batch of inputs and a batch of correct labels, and compute the\nweight update."}]},{"type":"text","value":"\n"},{"type":"element","tagName":"pre","properties":{},"children":[{"type":"element","tagName":"code","properties":{"className":["language-python"],"lang":"python"},"children":[{"type":"text","value":"class UncomposableModel:\n    def __init__(self, W):\n        self.W = W\n\n    def predict(self, inputs):\n        return inputs @ self.W.T\n\n    def update(self, inputs, targets, learn_rate=0.001):\n        guesses = self.predict(inputs)\n        d_guesses = (guesses-targets) / targets.shape[0]  # gradient of loss w.r.t. output\n        # The @ is newish Python syntax for matrix multiplication\n        d_inputs = d_guesses @ self.W\n        dW = d_guesses.T @ inputs  # gradient of parameters\n        self.W -= learn_rate * dW  # update weights\n        return d_inputs\n"}]}]},{"type":"text","value":"\n"},{"type":"element","tagName":"p","properties":{},"children":[{"type":"text","value":"This API design works in itself, but the "},{"type":"element","tagName":"code","properties":{},"children":[{"type":"text","value":"update()"}]},{"type":"text","value":" method only works as the\nouter-level API. You wouldn’t be able to put another layer with the same API\nafter this one and backpropagate through both of them. 
Let’s look at the steps\nfor backpropagating through two matrix multiplications:"}]},{"type":"text","value":"\n"},{"type":"element","tagName":"pre","properties":{},"children":[{"type":"element","tagName":"code","properties":{"className":["language-python"],"lang":"python"},"children":[{"type":"text","value":"def backprop_two_layers(W1, W2, inputs, targets):\n    hiddens = inputs @ W1.T\n    guesses = hiddens @ W2.T\n    d_guesses = (guesses-targets) / targets.shape[0]  # gradient of loss w.r.t. output\n    dW2 = d_guesses.T @ hiddens  # gradient of second layer's parameters\n    d_hiddens = d_guesses @ W2\n    dW1 = d_hiddens.T @ inputs  # gradient of first layer's parameters\n    d_inputs = d_hiddens @ W1\n    return dW1, dW2, d_inputs\n"}]}]},{"type":"text","value":"\n"},{"type":"element","tagName":"p","properties":{},"children":[{"type":"text","value":"In order to update the first layer, we need to know the gradient with respect to\nits output. We can’t calculate that value until we’ve finished the full forward\npass, calculated the gradient of the loss, and then backpropagated through the\nsecond layer. This is why the "},{"type":"element","tagName":"code","properties":{},"children":[{"type":"text","value":"UncomposableModel"}]},{"type":"text","value":" is uncomposable: the "},{"type":"element","tagName":"code","properties":{},"children":[{"type":"text","value":"update"}]},{"type":"text","value":"\nmethod expects the input and the target to both be available. That only works\nfor the outermost API – the same API can’t work for intermediate layers."}]},{"type":"text","value":"\n"},{"type":"element","tagName":"p","properties":{},"children":[{"type":"text","value":"Although nobody thinks of it this way, reverse-mode auto-differentiation (as\nsupported by PyTorch, TensorFlow, etc.) can be seen as a solution to this API\nproblem. 
The solution is to base the API around the "},{"type":"element","tagName":"code","properties":{},"children":[{"type":"text","value":"predict"}]},{"type":"text","value":" method, which\ndoesn’t have the same composition problem: there’s no problem with writing\n"},{"type":"element","tagName":"code","properties":{},"children":[{"type":"text","value":"model3.predict(model2.predict(model1.predict(X)))"}]},{"type":"text","value":", or\n"},{"type":"element","tagName":"code","properties":{},"children":[{"type":"text","value":"model3.predict(model2.predict(X) + model1.predict(X))"}]},{"type":"text","value":", etc. We can easily\nbuild a larger model from smaller functions when we’re programming the forward\ncomputations, and so that’s exactly the API that reverse-mode\nautodifferentiation was invented to offer."}]},{"type":"text","value":"\n"},{"type":"element","tagName":"p","properties":{},"children":[{"type":"text","value":"The key idea behind Thinc is that it’s possible to just fix the API problem\ndirectly, so that models can be composed cleanly both forwards and backwards.\nThis results in an interestingly different developer experience: the code is far\nmore explicit and there are very few details of the framework to consider.\nThere’s potentially more flexibility, but also a potential cost in performance, and\nsometimes more opportunities to make mistakes."}]},{"type":"text","value":"\n"},{"type":"element","tagName":"p","properties":{},"children":[{"type":"text","value":"We don’t want to suggest that Thinc’s approach is uniformly better than a\nhigh-performance computational graph engine such as PyTorch or TensorFlow. It\nisn’t. The trick is to use them together: you can use PyTorch, TensorFlow or\nsome other library to do almost all of the actual computation, while doing\nalmost all of your programming with a much more transparent, flexible and\nsimpler system. 
Here’s how it works."}]},{"type":"text","value":"\n"},{"type":"element","tagName":"h2","properties":{},"children":[{"type":"text","value":"No (explicit) computational graph – just higher order functions"}]},{"type":"text","value":"\n"},{"type":"element","tagName":"p","properties":{},"children":[{"type":"text","value":"The API design problem we’re facing here is actually pretty basic. We’re trying\nto compute two values, but before we can compute the second one, we need to pass\ncontrol back to the caller, so they can use the first value to give us an extra\ninput. The general solution to this type of problem is a "},{"type":"element","tagName":"strong","properties":{},"children":[{"type":"text","value":"callback"}]},{"type":"text","value":", and in\nfact a callback is exactly what we need here."}]},{"type":"text","value":"\n"},{"type":"element","tagName":"p","properties":{},"children":[{"type":"text","value":"Specifically, we need to make sure our model functions return a result, and then\na callback that takes a gradient of outputs, and computes the corresponding\ngradient of inputs."}]},{"type":"text","value":"\n"},{"type":"element","tagName":"pre","properties":{},"children":[{"type":"element","tagName":"code","properties":{"className":["language-python"],"lang":"python"},"children":[{"type":"text","value":"def forward(X: InT) -> Tuple[OutT, Callable[[OutT], InT]]:\n    Y: OutT = _do_whatever_computation(X)\n\n    def backward(dY: OutT) -> InT:\n        dX: InT = _do_whatever_backprop(dY, X)\n        return dX\n\n    return Y, backward\n"}]}]},{"type":"text","value":"\n"},{"type":"element","tagName":"p","properties":{},"children":[{"type":"text","value":"To make this less abstract, here are two "},{"type":"element","tagName":"a","properties":{"href":"/docs/api-layers"},"children":[{"type":"text","value":"layers"}]},{"type":"text","value":" following\nthis signature. 
For now, we’ll stick to layers that don’t introduce any\ntrainable weights, to keep things simple."}]},{"type":"text","value":"\n"},{"type":"element","tagName":"pre","properties":{},"children":[{"type":"element","tagName":"code","properties":{"className":["language-python"],"lang":"python","title":"reduce_sum layer"},"children":[{"type":"text","value":"def reduce_sum(X: Floats3d) -> Tuple[Floats2d, Callable[[Floats2d], Floats3d]]:\n    Y = X.sum(axis=1)\n    X_shape = X.shape\n\n    def backprop_reduce_sum(dY: Floats2d) -> Floats3d:\n        dX = zeros(X_shape)\n        dX += dY.reshape((dY.shape[0], 1, dY.shape[1]))\n        return dX\n\n    return Y, backprop_reduce_sum\n"}]}]},{"type":"text","value":"\n"},{"type":"element","tagName":"pre","properties":{},"children":[{"type":"element","tagName":"code","properties":{"className":["language-python"],"lang":"python","title":"Relu layer"},"children":[{"type":"text","value":"def relu(inputs: Floats2d) -> Tuple[Floats2d, Callable[[Floats2d], Floats2d]]:\n    mask = inputs >= 0\n    def backprop_relu(d_outputs: Floats2d) -> Floats2d:\n        return d_outputs * mask\n    return inputs * mask, backprop_relu\n\n"}]}]},{"type":"text","value":"\n"},{"type":"element","tagName":"p","properties":{},"children":[{"type":"text","value":"Notice that the "},{"type":"element","tagName":"code","properties":{},"children":[{"type":"text","value":"reduce_sum"}]},{"type":"text","value":" layer’s output is a different shape from its input.\nThe forward pass runs from input to output, while the backward pass runs from\ngradient-of-output to gradient-of-input. This means that we’ll always have two\nmatching pairs: "},{"type":"element","tagName":"code","properties":{},"children":[{"type":"text","value":"(input_to_forward, output_of_backprop)"}]},{"type":"text","value":" and\n"},{"type":"element","tagName":"code","properties":{},"children":[{"type":"text","value":"(output_of_forward, input_of_backprop)"}]},{"type":"text","value":". 
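For example, with the "},{"type":"element","tagName":"code","properties":{},"children":[{"type":"text","value":"relu"}]},{"type":"text","value":" layer above and the same "},{"type":"element","tagName":"code","properties":{},"children":[{"type":"text","value":"uniform"}]},{"type":"text","value":" helper that the later examples on this page assume, the pairing can be checked directly:"}]},{"type":"text","value":"\n"},{"type":"element","tagName":"pre","properties":{},"children":[{"type":"element","tagName":"code","properties":{"className":["language-python"],"lang":"python"},"children":[{"type":"text","value":"X = uniform((2, 6))         # input to forward\nY, backprop_relu = relu(X)  # output of forward, plus the callback\ndY = uniform(Y.shape)       # input to backprop: pairs with Y\ndX = backprop_relu(dY)      # output of backprop: pairs with X\nassert dX.shape == X.shape\n"}]}]},{"type":"text","value":"\n"},{"type":"element","tagName":"p","properties":{},"children":[{"type":"text","value":"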
These pairs must match in type. If our\nfunctions obey this invariant, we’ll be able to write\n"},{"type":"element","tagName":"a","properties":{"href":"/docs/api-layers#combinators"},"children":[{"type":"text","value":"combinator functions"}]},{"type":"text","value":" that can wire together\nlayers in standard ways."}]},{"type":"text","value":"\n"},{"type":"element","tagName":"p","properties":{},"children":[{"type":"text","value":"The most basic way we’ll want to combine layers is a feed-forward relationship.\nWe call this combinator "},{"type":"element","tagName":"code","properties":{},"children":[{"type":"text","value":"chain"}]},{"type":"text","value":", after the chain rule:"}]},{"type":"text","value":"\n"},{"type":"element","tagName":"pre","properties":{},"children":[{"type":"element","tagName":"code","properties":{"className":["language-python"],"lang":"python","title":"Chain combinator"},"children":[{"type":"text","value":"def chain(layer1, layer2):\n    def forward_chain(X):\n        Y, get_dX = layer1(X)\n        Z, get_dY = layer2(Y)\n\n        def backprop_chain(dZ):\n            dY = get_dY(dZ)\n            dX = get_dX(dY)\n            return dX\n\n        return Z, backprop_chain\n\n    return forward_chain\n"}]}]},{"type":"text","value":"\n"},{"type":"element","tagName":"p","properties":{},"children":[{"type":"text","value":"We can use the "},{"type":"element","tagName":"code","properties":{},"children":[{"type":"text","value":"chain"}]},{"type":"text","value":" combinator to build a function that runs our "},{"type":"element","tagName":"code","properties":{},"children":[{"type":"text","value":"reduce_sum"}]},{"type":"text","value":"\nand "},{"type":"element","tagName":"code","properties":{},"children":[{"type":"text","value":"relu"}]},{"type":"text","value":" layers in 
succession:"}]},{"type":"text","value":"\n"},{"type":"element","tagName":"pre","properties":{},"children":[{"type":"element","tagName":"code","properties":{"className":["language-python"],"lang":"python"},"children":[{"type":"text","value":"chained = chain(reduce_sum, relu)\nX = uniform((2, 10, 6)) # (batch_size, sequence_length, width)\ndZ = uniform((2, 6))    # (batch_size, width)\nZ, get_dX = chained(X)\ndX = get_dX(dZ)\nassert dX.shape == X.shape\n"}]}]},{"type":"text","value":"\n"},{"type":"element","tagName":"p","properties":{},"children":[{"type":"text","value":"Our "},{"type":"element","tagName":"code","properties":{},"children":[{"type":"text","value":"chain"}]},{"type":"text","value":" combinator works easily because our layers return callbacks. The\ncallbacks ensure that there is no distinction in API between the outermost layer\nand a layer that’s part of a larger network. We can see this clearly by\nimagining the alternative, where the function expects the gradient with respect\nto the output along with its input:"}]},{"type":"text","value":"\n"},{"type":"element","tagName":"pre","properties":{},"children":[{"type":"element","tagName":"code","properties":{"className":["language-python"],"lang":"python","title":"Problem without callbacks","highlight":"15-19"},"children":[{"type":"text","value":"def reduce_sum_no_callback(X, dY):\n    Y = X.sum(axis=1)\n    X_shape = X.shape\n    dX = zeros(X_shape)\n    dX += dY.reshape((dY.shape[0], 1, dY.shape[1]))\n    return Y, dX\n\ndef relu_no_callback(inputs, d_outputs):\n    mask = inputs >= 0\n    outputs = inputs * mask\n    d_inputs = d_outputs * mask\n    return outputs, d_inputs\n\ndef chain_no_callback(layer1, layer2):\n    def chain_forward_no_callback(X, dZ):\n        # How do we call layer1? We can't, because its signature expects dY\n        # as part of its input – but we don't know dY yet! We can only\n        # compute dY once we have Y. 
That's why layers must return callbacks.\n        raise CannotBeImplementedError()\n"}]}]},{"type":"text","value":"\n"},{"type":"element","tagName":"p","properties":{},"children":[{"type":"text","value":"The "},{"type":"element","tagName":"code","properties":{},"children":[{"type":"text","value":"reduce_sum"}]},{"type":"text","value":" and "},{"type":"element","tagName":"code","properties":{},"children":[{"type":"text","value":"relu"}]},{"type":"text","value":" layers are easy to work with, because they don’t\nintroduce any parameters. But networks that don’t have any parameters aren’t\nvery useful. So how should we handle the parameters? We can’t just treat parameters as\nanother type of input variable, because that’s not how we want to use the\nnetwork. We want the parameters of a layer to be an internal detail – "},{"type":"element","tagName":"strong","properties":{},"children":[{"type":"text","value":"we don’t\nwant to have to pass in the parameters on each input"}]},{"type":"text","value":"."}]},{"type":"text","value":"\n"},{"type":"element","tagName":"p","properties":{},"children":[{"type":"text","value":"Parameters need to be handled differently from input variables, because we want\nto specify them at different times. We’d like to specify the parameters once\nwhen we create the function, and then have them be an internal detail that\ndoesn’t affect the function’s signature. The most direct approach is to\nintroduce another layer of closures, and make the parameters and their gradients\narguments to the outer layer. 
The gradients can then be incremented during the\nbackward pass:"}]},{"type":"text","value":"\n"},{"type":"element","tagName":"pre","properties":{},"children":[{"type":"element","tagName":"code","properties":{"className":["language-python"],"lang":"python"},"children":[{"type":"text","value":"def Linear(W, b, dW, db):\n    def forward_linear(X):\n\n        def backward_linear(dY):\n            nonlocal dW, db  # += rebinds these names, so declare them nonlocal\n            dW += dY.T @ X\n            db += dY.sum(axis=0)\n            return dY @ W\n\n        return X @ W.T + b, backward_linear\n    return forward_linear\n\nn_batch = 128\nn_in = 16\nn_out = 32\nW = uniform((n_out, n_in))\nb = uniform((n_out,))\ndW = zeros(W.shape)\ndb = zeros(b.shape)\nX = uniform((n_batch, n_in))\nY_true = uniform((n_batch, n_out))\n\nlinear = Linear(W, b, dW, db)\nY_out, get_dX = linear(X)\n\n# Now we could calculate a loss and backpropagate\ndY = (Y_out - Y_true) / Y_true.shape[0]\ndX = get_dX(dY)\n\n# Now we could do an optimization step like\nW -= 0.001 * dW\nb -= 0.001 * db\ndW.fill(0.0)\ndb.fill(0.0)\n"}]}]},{"type":"text","value":"\n"},{"type":"element","tagName":"p","properties":{},"children":[{"type":"text","value":"While the above approach would work, handling the parameters and their gradients\nexplicitly will quickly get unmanageable. To make things easier, we need to\nintroduce a "},{"type":"element","tagName":"code","properties":{},"children":[{"type":"text","value":"Model"}]},{"type":"text","value":" class, so that we can "},{"type":"element","tagName":"strong","properties":{},"children":[{"type":"text","value":"keep track of the parameters,\ngradients, dimensions"}]},{"type":"text","value":" and other attributes that each layer might require."}]},{"type":"text","value":"\n"},{"type":"element","tagName":"p","properties":{},"children":[{"type":"text","value":"The most obvious thing to do at this point would be to introduce one class per\nlayer type, with the forward pass implemented as a method on the class. 
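For illustration, a hypothetical class-based sketch of the "},{"type":"element","tagName":"code","properties":{},"children":[{"type":"text","value":"Linear"}]},{"type":"text","value":" layer (not Thinc’s actual API, reusing the "},{"type":"element","tagName":"code","properties":{},"children":[{"type":"text","value":"uniform"}]},{"type":"text","value":" and "},{"type":"element","tagName":"code","properties":{},"children":[{"type":"text","value":"zeros"}]},{"type":"text","value":" helpers from the examples above) might look like this:"}]},{"type":"text","value":"\n"},{"type":"element","tagName":"pre","properties":{},"children":[{"type":"element","tagName":"code","properties":{"className":["language-python"],"lang":"python"},"children":[{"type":"text","value":"class LinearLayer:\n    def __init__(self, nO, nI):\n        self.W = uniform((nO, nI))\n        self.b = uniform((nO,))\n        self.dW = zeros((nO, nI))\n        self.db = zeros((nO,))\n\n    def forward(self, X):\n        def backprop(dY):\n            self.dW += dY.T @ X\n            self.db += dY.sum(axis=0)\n            return dY @ self.W\n        return X @ self.W.T + self.b, backprop\n"}]}]},{"type":"text","value":"\n"},{"type":"element","tagName":"p","properties":{},"children":[{"type":"text","value":"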
While\nthis approach would work reasonably well, we’ve preferred a slightly different\nimplementation that relies on composition rather than inheritance. The\nimplementation of the "},{"type":"element","tagName":"a","properties":{"href":"/docs/api-layers#linear"},"children":[{"type":"element","tagName":"code","properties":{},"children":[{"type":"text","value":"Linear"}]},{"type":"text","value":" layer"}]},{"type":"text","value":" provides a good\nexample."}]},{"type":"text","value":"\n"},{"type":"element","tagName":"p","properties":{},"children":[{"type":"text","value":"Instead of defining a subclass of "},{"type":"element","tagName":"code","properties":{},"children":[{"type":"text","value":"thinc.model.Model"}]},{"type":"text","value":", the layer provides a\nfunction "},{"type":"element","tagName":"code","properties":{},"children":[{"type":"text","value":"Linear"}]},{"type":"text","value":" that constructs a "},{"type":"element","tagName":"a","properties":{"href":"/docs/api-model"},"children":[{"type":"element","tagName":"code","properties":{},"children":[{"type":"text","value":"Model"}]},{"type":"text","value":" instance"}]},{"type":"text","value":", passing\nin the function "},{"type":"element","tagName":"code","properties":{},"children":[{"type":"text","value":"forward"}]},{"type":"text","value":" in "},{"type":"element","tagName":"code","properties":{},"children":[{"type":"text","value":"thinc.layers.linear"}]},{"type":"text","value":":"}]},{"type":"text","value":"\n"},{"type":"element","tagName":"pre","properties":{},"children":[{"type":"element","tagName":"code","properties":{"className":["language-python"],"lang":"python"},"children":[{"type":"text","value":"def forward(model: Model, X: InT, is_train: bool):\n"}]}]},{"type":"text","value":"\n"},{"type":"element","tagName":"p","properties":{},"children":[{"type":"text","value":"The function receives a 
"},{"type":"element","tagName":"code","properties":{},"children":[{"type":"text","value":"model"}]},{"type":"text","value":" instance as its first argument, which gives\nyou access to the dimensions, parameters, gradients, attributes and layers. The\nsecond argument is the input data, and the third argument is a boolean that lets\nlayers run differently during training and prediction – an important requirement\nfor layers like dropout and batch normalization."}]},{"type":"text","value":"\n"},{"type":"element","tagName":"p","properties":{},"children":[{"type":"text","value":"As well as the "},{"type":"element","tagName":"code","properties":{},"children":[{"type":"text","value":"forward"}]},{"type":"text","value":" function, the "},{"type":"element","tagName":"code","properties":{},"children":[{"type":"text","value":"Model"}]},{"type":"text","value":" also lets you pass in a function\n"},{"type":"element","tagName":"code","properties":{},"children":[{"type":"text","value":"init"}]},{"type":"text","value":", allowing us to support "},{"type":"element","tagName":"strong","properties":{},"children":[{"type":"text","value":"shape inference"}]},{"type":"text","value":"."}]},{"type":"text","value":"\n"},{"type":"element","tagName":"pre","properties":{},"children":[{"type":"element","tagName":"code","properties":{"className":["language-python"],"lang":"python","title":"Linear","highlight":"3-4"},"children":[{"type":"text","value":"model = Model(\n    \"linear\",\n    forward,\n    init=init,\n    dims={\"nO\": nO, \"nI\": nI},\n    params={\"W\": None, \"b\": None},\n)\n"}]}]},{"type":"text","value":"\n"},{"type":"element","tagName":"p","properties":{},"children":[{"type":"text","value":"We want to be able to define complex networks concisely, passing in "},{"type":"element","tagName":"strong","properties":{},"children":[{"type":"text","value":"only\ngenuine configuration"}]},{"type":"text","value":" — we shouldn’t have to pass in a lot of variables whose\nvalues are 
dictated by the rest of the network. The more redundant the\nconfiguration, the more ways the values we pass in can be invalid. In the\nexample above, there are many different ways for the inputs to "},{"type":"element","tagName":"code","properties":{},"children":[{"type":"text","value":"Linear"}]},{"type":"text","value":" to be\ninvalid: the "},{"type":"element","tagName":"code","properties":{},"children":[{"type":"text","value":"W"}]},{"type":"text","value":" and "},{"type":"element","tagName":"code","properties":{},"children":[{"type":"text","value":"dW"}]},{"type":"text","value":" variables could be different shapes, the size of "},{"type":"element","tagName":"code","properties":{},"children":[{"type":"text","value":"b"}]},{"type":"text","value":"\ncould fail to match the first dimension of "},{"type":"element","tagName":"code","properties":{},"children":[{"type":"text","value":"W"}]},{"type":"text","value":", the second dimension of "},{"type":"element","tagName":"code","properties":{},"children":[{"type":"text","value":"W"}]},{"type":"text","value":"\ncould fail to match the second dimension of the input, etc. With inputs like\nthese, there’s no way we can expect functions to validate their inputs reliably,\nleading to unpredictable logic errors that make the calling code difficult to\ndebug."}]},{"type":"text","value":"\n"},{"type":"element","tagName":"p","properties":{},"children":[{"type":"text","value":"In a network with two "},{"type":"element","tagName":"code","properties":{},"children":[{"type":"text","value":"Linear"}]},{"type":"text","value":" layers, only one dimension is an actual\nhyperparameter. The input size to the first layer and the output size of the\nsecond layer are both "},{"type":"element","tagName":"strong","properties":{},"children":[{"type":"text","value":"determined by the shape of the data"}]},{"type":"text","value":". 
The only choice\nto make is the number of “hidden units”, which will determine the output size of\nthe first layer and the input size of the second layer. So we want to be able to\nwrite something like this:"}]},{"type":"text","value":"\n"},{"type":"element","tagName":"pre","properties":{},"children":[{"type":"element","tagName":"code","properties":{"className":["language-python"],"lang":"python"},"children":[{"type":"text","value":"model = chain(Linear(nO=n_hidden), Linear())\n"}]}]},{"type":"text","value":"\n"},{"type":"element","tagName":"p","properties":{},"children":[{"type":"text","value":"… and have the missing dimensions "},{"type":"element","tagName":"strong","properties":{},"children":[{"type":"text","value":"inferred later"}]},{"type":"text","value":", based on the input and\noutput data. In order to make this work, we need to specify initialization logic\nfor each layer we define. For example, here’s the initialization logic for the\n"},{"type":"element","tagName":"code","properties":{},"children":[{"type":"text","value":"Linear"}]},{"type":"text","value":" and "},{"type":"element","tagName":"code","properties":{},"children":[{"type":"text","value":"chain"}]},{"type":"text","value":" layers:"}]},{"type":"text","value":"\n"},{"type":"element","tagName":"pre","properties":{},"children":[{"type":"element","tagName":"code","properties":{"className":["language-python"],"lang":"python","title":"Initialization logic"},"children":[{"type":"text","value":"from typing import Optional\nfrom thinc.api import Model, glorot_uniform_init\nfrom thinc.types import Floats2d\nfrom thinc.util import get_width\n\ndef init(model: Model, X: Optional[Floats2d] = None, Y: Optional[Floats2d] = None) -> None:\n    if X is not None:\n        model.set_dim(\"nI\", get_width(X))\n    if Y is not None:\n        model.set_dim(\"nO\", get_width(Y))\n    W = model.ops.alloc2f(model.get_dim(\"nO\"), model.get_dim(\"nI\"))\n    b = model.ops.alloc1f(model.get_dim(\"nO\"))\n    
W = glorot_uniform_init(model.ops, W.shape)  # the initializer returns a new array\n    model.set_param(\"W\", W)\n    model.set_param(\"b\", b)\n"}]}]}],"data":{"quirksMode":false}},"frontmatter":{"title":"Concept and Design","teaser":"Thinc's conceptual model and how it works","next":"/docs/install"}},"allMarkdownRemark":{"nodes":[{"fields":{"slug":"/docs/api-initializers"},"frontmatter":{"title":"Initializers"}},{"fields":{"slug":"/docs/api-config"},"frontmatter":{"title":"Config & Registry"}},{"fields":{"slug":"/docs/api-loss"},"frontmatter":{"title":"Loss Calculators"}},{"fields":{"slug":"/docs/api-model"},"frontmatter":{"title":"Model"}},{"fields":{"slug":"/docs/api-optimizers"},"frontmatter":{"title":"Optimizers"}},{"fields":{"slug":"/docs/api-schedules"},"frontmatter":{"title":"Schedules"}},{"fields":{"slug":"/docs/api-types"},"frontmatter":{"title":"Types & Dataclasses"}},{"fields":{"slug":"/docs/api-util"},"frontmatter":{"title":"Utilities & Extras"}},{"fields":{"slug":"/docs/backprop101"},"frontmatter":{"title":"Backpropagation 101"}},{"fields":{"slug":"/docs/concept"},"frontmatter":{"title":"Concept and Design"}},{"fields":{"slug":"/docs/"},"frontmatter":{"title":"Introduction"}},{"fields":{"slug":"/docs/install"},"frontmatter":{"title":"Installation & Setup"}},{"fields":{"slug":"/docs/usage-config"},"frontmatter":{"title":"Configuration System"}},{"fields":{"slug":"/docs/usage-frameworks"},"frontmatter":{"title":"PyTorch, TensorFlow & MXNet"}},{"fields":{"slug":"/docs/usage-models"},"frontmatter":{"title":"Defining and Using Models"}},{"fields":{"slug":"/docs/usage-sequences"},"frontmatter":{"title":"Variable-length sequences"}},{"fields":{"slug":"/docs/usage-training"},"frontmatter":{"title":"Training Models"}},{"fields":{"slug":"/docs/usage-type-checking"},"frontmatter":{"title":"Type Checking"}},{"fields":{"slug":"/docs/api-layers"},"frontmatter":{"title":"Layers"}},{"fields":{"slug":"/docs/api-backends"},"frontmatter":{"title":"Backends & 
Math"}}]},"headerTopRight":{"childImageSharp":{"fluid":{"base64":"data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAABQAAAAeCAYAAAAsEj5rAAAACXBIWXMAAAsTAAALEwEAmpwYAAAE5ElEQVRIx53Uf1DTZRwH8Pc2yj+6K//oyq46vbzqss660zO6FMW8y0KZBekhJGE/+akJIZAwkCnboo1tbEO+m9v4NVB+OYYwhuD4LUj+CAQFxnJM9t2GRhooyp7O/VNKf8jed597nr9e93mez3MP8J/ww8U4GiaBSFXP5GaUIoH89XTF8BVLxdUBoh26PN9H2z0z92c8t+dm7rdfc5DSPnthcY8Nlf12v2rrbSyIF/xKju8RjFgEL5XXdKRr+waIpuvSfHHPZU9Z7++keWjMo6u7+CBHYiYCmTmaJzWDLzOz/h8MEyGHLWQd3XwUvB3CmuyIfFLVMjDX1j1GznRcI2c6RzynW6940pNOzif/UEzS4nX+KTElSIkpZeZdGoT40uCjoGCPBPwDGmZmAgVBTuWX4t2y6bayPs9A88j8hYYhz2DLKGkt65+PD5KS6I/zXPFB+ctitorx3YZcBnVlGJHGHuajYIQY06IWGFuHodaeha2gvf96QTuxKto8FkXbg0l111zTz/q7X/v/QmI3iroPb9LiCCHg6cxeaH/nb491GCnBJPc07LxGjPIbnnEcNqjoLMMfdJbhjiOzjvyZXU+qvi0ncYHy7rQt8ncSNknBKWz2c884EWjoRGbPxeQF93jrWDum8814UHQeLoUZd8I1flPJ1W+6D1azb6bUZNcl1iYRNQd7PypAUXLVU3sJDUIIK7X7ojbE0EkWgFaqHZN8I2yyVowV9+DeTgo3kyrh/qkK02nVKN2tAz3S+Ar3ixK/Do4eu8j0q3FdF84FVbeRdzN09xaATk4d6CwDnBl6796VrocrrZbhSK3y6+XKcC21dMV4YeWaicRyWKPLN9T82kCHqprIGwc0sx9uFizs8PFMpdTAEV0OS7gSljAK4zuppYQQuLcrVzh2qKb/DqLIWfaxu1sDBSR9i9iIJ8lYgByWPSpYIpSIMklBB1FwslXrncFKMsVWEet2iii3KaR40njB3UpvXQ+hYA9Vwsl+WKoMF1s5aA85HlXjL3w4HCwqdlsT3KvlcK+SwemvwK1VMtij1HAVnEL/pvxnyXvCxYE3XCZM0M2YmDoDW0Yxw2rQw5pdtsoSqZKMrZetHAuULx60DzVikphh6zDAVnsqwiqqEI0naJZPfVMEXZzab3FHpk1wkl7cIwT2YeMhm8kQZ6Eq4EovxzZEIne/xocOaRPsTtOSG3bTCltDHd5CCD5/LhZZ6zjLc7NO7ls86DR51xt2E2zGOmb+vuMswR4ZBFGKYF6U4u6iQW+5TbBPNDGss2qM3K/CrDESNYQwue9zlsKXjDuUGKeVsDgob4071QgQaplCRiR8Bi0PQZqChS5cMmrTLovo7QC12If9L6jCOK1i2mgK12mKc5U+TpL6zx3REOLjkWkVRq0qBiFazBLtB/oTiuEtB7Rka3HzpM/gxIwSrfUFDMtVJYBcxK/lxXz6mfSmT+CF88dQX5SPCp4QjQV5rF5Syfrk5VR0rRG96BOYiu3gBnCRuTIJnBfiwU/XeX+aJiT6NpQf8Ta4G7mMzNcSkfF87EsCbuXBEpfz9RIy5xvIj5SBF57P4u2SgBcm4eSESYmooFHNSynxDRRTDRBTjQzpibOQVppX54TmlWcH5wYcXsfxDcyT6iEpaYG0toMhPWkGP0qB7HApstYewj+BxMBZvJr8CAAAAABJRU5ErkJggg==","aspectRatio":0.6756756756756757,"src":"/static/a592cfceebdbf105bac40baa898f12a9/53f65/landing_top-right.png","srcSet":"/static/a592cf
ceebdbf105bac40baa898f12a9/a90ce/landing_top-right.png 125w,\n/static/a592cfceebdbf105bac40baa898f12a9/002c1/landing_top-right.png 250w,\n/static/a592cfceebdbf105bac40baa898f12a9/53f65/landing_top-right.png 500w,\n/static/a592cfceebdbf105bac40baa898f12a9/f26e3/landing_top-right.png 750w,\n/static/a592cfceebdbf105bac40baa898f12a9/5d2c5/landing_top-right.png 1000w,\n/static/a592cfceebdbf105bac40baa898f12a9/6050d/landing_top-right.png 1200w","sizes":"(max-width: 500px) 100vw, 500px"}}}},"pageContext":{"slug":"/docs/concept"}},"staticQueryHashes":["34836940","3699375715"]}