Wrapping models from other frameworks is a core use case for Thinc: we want to
make it easy for people to write spaCy components using
their preferred machine learning solution. We expect a lot of code-bases will
have similar requirements. As well as wrapping whole models, Thinc lets you
call into an external framework for just part of your model: you can have a
model where you use PyTorch just for the transformer layers, using “native”
Thinc layers to do fiddly input and output transformations and add on
task-specific “heads”, as efficiency is less of a consideration for those parts
of the network.
Thinc uses a special class, Shim, to hold references
to external objects. This allows each wrapper to define a custom shim type,
with whatever attributes and methods are helpful, to assist in managing the
communication between Thinc and the external library. The
Model class holds shim instances in a separate
list, and communicates with the shims about updates, serialization, changes of
device, etc.
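As a quick illustration (a minimal sketch using the wrapper from the examples below; `shims` is the attribute on `Model` that holds these instances):

```python
from thinc.api import PyTorchWrapper
import torch.nn

wrapped = PyTorchWrapper(torch.nn.Linear(16, 8))
# The wrapper is an ordinary Thinc Model; the reference to the torch.nn.Linear
# instance is held by a shim rather than by the model directly.
print(wrapped.shims)  # e.g. [<PyTorchShim>]
```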
The wrapper will receive each batch of inputs, convert them into a suitable form
for the underlying model instance, and pass them over to the shim, which will
manage the actual communication with the model. The output is then passed
back into the wrapper, and converted for use in the rest of the network. The
equivalent procedure happens during backpropagation. Array conversion is handled
via the DLPack standard wherever possible, so
that data can be passed between the frameworks without copying the data back
to the host device unnecessarily.
To see wrapped models in action, check out the following examples:

- Intro to Thinc: Everything you need to know to get started. Composing and training a model on the MNIST data, using config files, registering custom functions and wrapping PyTorch, TensorFlow and MXNet models.
The PyTorchWrapper and
TensorFlowWrapper layers allow you to
easily use your predefined models in Thinc, as part or all of your network. For
simple models that accept one array as input and return one array as output, all
you need to do is create the PyTorch/TensorFlow layer and pass it into the
wrapper. The wrapper model will behave like any other Thinc layer.
PyTorch Example

```python
from thinc.api import PyTorchWrapper, chain, Linear
import torch.nn

model = chain(
    PyTorchWrapper(torch.nn.Linear(16, 8)),  # 🚨 PyTorch goes (nI, nO)!
    Linear(4, 8),
)
X = model.ops.alloc2f(1, 16)  # make a dummy batch
model.initialize(X=X)
Y, backprop = model(X, is_train=True)
dX = backprop(Y)
```
TensorFlow Example

```python
from thinc.api import TensorFlowWrapper, chain, Linear
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

model = chain(
    TensorFlowWrapper(Sequential([Dense(8, input_shape=(16,))])),
    Linear(4, 8),
)
X = model.ops.alloc2f(1, 16)  # make a dummy batch
model.initialize(X=X)
Y, backprop = model(X, is_train=True)
dX = backprop(Y)
```
In theory, you can also chain together layers and models written in PyTorch
and TensorFlow. However, this is likely a bad idea for actual production use,
especially since TensorFlow tends to hog the GPU. It could come in handy during
development, though, for instance if you need to port your models from one
framework to another.
Frankenmodel

```python
from thinc.api import PyTorchWrapper, TensorFlowWrapper, chain, Linear
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
import torch.nn

model = chain(  # 🚨 probably don't do this in production
    PyTorchWrapper(torch.nn.Linear(16, 8)),
    TensorFlowWrapper(Sequential([Dense(4, input_shape=(8,))])),
    Linear(2, 4),
)
model.initialize(X=model.ops.alloc2f(1, 16))
```
For more complex cases, you can control the way data is passed into the wrapper
using the convert_inputs and convert_outputs callbacks. Both callbacks have
input signatures like normal forward functions and return a tuple with their
output and a callback to handle the backward pass. However, the converters will
send and receive different data during the backward pass.
|        | Forward           | Backward          |
| ------ | ----------------- | ----------------- |
| Input  | Thinc → Framework | Framework → Thinc |
| Output | Framework → Thinc | Thinc → Framework |
To convert arrays, you can use the xp2torch, torch2xp, xp2tensorflow and tensorflow2xp utility functions, which translate to and from numpy and cupy arrays.
The ArgsKwargs object is a little dataclass that
represents the tuple (args, kwargs). Whatever you put in the ArgsKwargs you
return from your convert_inputs function will be passed directly into
PyTorch/TensorFlow/etc. In the backward pass, the shim will call your callback with the gradients from PyTorch, in matching positions on another ArgsKwargs object, and you'll then return an object that matches the original input, to pass the gradient back down the Thinc model.
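As a minimal sketch of what this can look like for PyTorch, assuming a single array input (the names convert_inputs and convert_inputs_backward are just illustrative; ArgsKwargs, xp2torch and torch2xp are Thinc helpers):

```python
from thinc.api import ArgsKwargs, xp2torch, torch2xp

def convert_inputs(model, X, is_train):
    # Wrap the incoming Thinc array as a single positional Torch tensor argument.
    converted = ArgsKwargs(args=(xp2torch(X, requires_grad=is_train),), kwargs={})

    def convert_inputs_backward(dX_torch: ArgsKwargs):
        # The shim hands back the PyTorch gradients in matching positions;
        # return a gradient object that matches the original input X.
        return torch2xp(dX_torch.args[0])

    return converted, convert_inputs_backward
```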
For the output, the converter will receive a tuple that contains the original input (i.e. whatever was passed into convert_inputs) and the output from the PyTorch/TensorFlow/etc. layer. The input is provided because the output may not contain all the information you need to manage the conversion. The convert_outputs function should return the output object for the rest of the network, converting via the torch2xp / tensorflow2xp helpers as necessary, along with the un-convert callback.
The un-convert callback will receive a gradient object from the Thinc layer above it in the network, and will return an ArgsKwargs object that will be passed into torch.autograd.backward (for PyTorch) or used with the TensorFlow model and tensorflow.GradientTape.watch (for TensorFlow). In the PyTorch case, for example, convert_outputs returns a tuple of the PyTorch outputs and a callback to un-convert the gradient: it takes the output gradients from Thinc and returns the output gradients for PyTorch.
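Continuing the sketch above on the output side, again for a single output tensor (convert_outputs and convert_outputs_backward are illustrative names; the grad_tensors keyword mirrors torch.autograd.backward's signature):

```python
from thinc.api import ArgsKwargs, PyTorchWrapper, xp2torch, torch2xp
import torch.nn

def convert_outputs(model, inputs_outputs, is_train):
    X, Y_torch = inputs_outputs
    Y = torch2xp(Y_torch)

    def convert_outputs_backward(dY):
        # dY is the gradient from the Thinc layer above; wrap it so the shim
        # can call torch.autograd.backward(*args, **kwargs).
        return ArgsKwargs(
            args=((Y_torch,),), kwargs={"grad_tensors": (xp2torch(dY),)}
        )

    return Y, convert_outputs_backward

# Hypothetical usage, together with the convert_inputs sketch above:
model = PyTorchWrapper(
    torch.nn.Linear(16, 8),
    convert_inputs=convert_inputs,
    convert_outputs=convert_outputs,
)
```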
Thinc also includes some more specific PyTorch layers for common use-cases. The
PyTorchLSTM layer creates and wraps the
torch.nn.LSTM class,
making creation particularly easy. The
PyTorchRNNWrapper provides a little more
flexibility, allowing you to pass in a custom sequence model that has the same input and output behavior as a torch.nn.RNN object.
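For example, a rough sketch (the keyword arguments follow PyTorchLSTM's documented signature; the specific sizes are arbitrary):

```python
from thinc.api import PyTorchLSTM

# A bidirectional, two-layer LSTM: Thinc creates and wraps torch.nn.LSTM for you.
lstm = PyTorchLSTM(nO=64, nI=32, bi=True, depth=2)
```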
If you use the PyTorchWrapper for part of your network while using Thinc’s
layers for other parts, you may find yourself running out of GPU memory
unexpectedly. This can occur because both PyTorch and cupy reserve their own
internal memory pools, and the two libraries do not communicate with each
other. When PyTorch needs more memory, it can only ask the device – so you may
get an out-of-memory error even though cupy’s pool has plenty of spare memory
available.
The best solution to this problem is to reroute the memory requests so that
only one library is in charge. Specifically, cupy offers a
cupy.cuda.set_allocator
function, which should allow a custom allocator to be created that requests its
memory via PyTorch. Thinc provides a handy shortcut for this via the
set_gpu_allocator helper function.
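For example, using the set_gpu_allocator and require_gpu helpers from thinc.api:

```python
from thinc.api import set_gpu_allocator, require_gpu

# Route cupy's memory requests through PyTorch, so both libraries
# draw from a single memory pool.
set_gpu_allocator("pytorch")
require_gpu()
```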