Using AI (or ML) models in practice has been a challenge for years. Yes, many nice examples do exist, but in our experience, most companies first face the challenge of bringing their data together, then the challenge of building and validating useful models, and finally the challenge of deploying these models wherever they are most needed (be it in-house, in the cloud, or on the edge).
However, each of these challenges has become easier and easier over the last few years. None of them are exactly trivial yet, but tools abound for each step of the process. Even deploying relatively complex convolutional neural networks in the browser turns out to be pretty straightforward: let me share how.
1. Training the model: handwritten digit recognition
To demonstrate CNN deployment in the browser, we pick a simple task that CNNs often excel at: image recognition. For the sake of demonstration, we focus on the well-known MNIST dataset of handwritten digits. This tutorial shows how to use Microsoft’s Cognitive Toolkit to train a model with pretty impressive performance. However, you don’t really need to train the model yourself; it is available for direct download in ONNX format from the ONNX Model Zoo.
ONNX is an open format built to represent machine learning models. It is a great unified way of storing trained models (including pre- and post-processing) created using different tools (PyTorch, TensorFlow, scikit-learn, etc.).
2. Model conversion: WebAssembly for easy deployment.
While ONNX.js (and, similarly, the ONNX runtimes available for other edge devices) might seem easier by virtue of skipping the transpilation-to-WebAssembly step, ONNX runtimes are often orders of magnitude larger, and slower in execution, than WASM runtimes (see Wasmer for a collection). We refer interested readers to this article for a more detailed comparison of the ONNX.js and WebAssembly runtimes. In any case, in our experience the various ONNX runtimes currently available are much harder to maintain and govern in actual (edge) production environments than simple, stand-alone WebAssembly binaries.
Scailable offers automatic off-the-shelf conversion of (most of) ONNX to .WASM for less than a dollar (and the first 100 conversions are free). A detailed tutorial describing the conversion process can be found here. Effectively, it is as simple as uploading the .ONNX file to a web form.
Scailable currently supports ONNX 1.3 to .WASM; support for ONNX 1.7 (including IR 5 and OPS 13) is coming up shortly.
After conversion, we end up with a pretty small (472 KB) .WASM executable that can be used to carry out the digit recognition.
3. Model deployment in the browser (with a fallback).
After conversion to WebAssembly, the actual deployment of the CNN is a piece of cake. Most browsers support running WebAssembly off-the-shelf, and the exact functions exported by the .WASM binary created by Scailable are described here. Thus, coding up the in-browser inference takes only a few lines of JavaScript.
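To illustrate how little machinery is involved, here is a self-contained example of the WebAssembly JavaScript API. It instantiates a tiny hand-assembled module standing in for the (much larger) model binary — the actual exports of the Scailable .WASM are described in its documentation, not here:

```javascript
// A minimal, hand-assembled WebAssembly module exporting one function,
// add(a, b). It stands in for the model binary purely to show the API.
const bytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00,       // magic + version
  0x01, 0x07, 0x01, 0x60, 0x02, 0x7f, 0x7f, 0x01, 0x7f, // type: (i32, i32) -> i32
  0x03, 0x02, 0x01, 0x00,                               // one function of that type
  0x07, 0x07, 0x01, 0x03, 0x61, 0x64, 0x64, 0x00, 0x00, // export it as "add"
  0x0a, 0x09, 0x01, 0x07, 0x00, 0x20, 0x00,             // body: local.get 0,
  0x20, 0x01, 0x6a, 0x0b,                               //       local.get 1, i32.add
]);

// For a real binary served over HTTP, one would typically use
// WebAssembly.instantiateStreaming(fetch('model.wasm')) instead.
WebAssembly.instantiate(bytes).then(({ instance }) => {
  console.log(instance.exports.add(2, 3)); // 5
});
```

That is the entire loading story: fetch (or inline) the bytes, instantiate, call the exports.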
Putting it all together results in an implementation that is surprisingly lean and fast, with the model’s performance on par with the original. You can try it out here (and obviously view the source of that page to see how the UX was done).
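The source of that page is the authoritative reference; as an illustration of the kind of glue code involved, here is a sketch of turning a 28×28 canvas drawing into the flat grayscale float array a typical MNIST model expects. The exact input format and export names of the generated .WASM binary are assumptions here — consult its documentation:

```javascript
// Convert RGBA pixel data (as returned by CanvasRenderingContext2D.getImageData)
// into a flat Float32Array of grayscale values in [0, 1] — the shape of input a
// typical MNIST model expects. The format required by the generated .WASM
// binary may differ; check its documentation.
function rgbaToGrayscale(rgba, width, height) {
  const out = new Float32Array(width * height);
  for (let i = 0; i < width * height; i++) {
    const r = rgba[4 * i], g = rgba[4 * i + 1], b = rgba[4 * i + 2];
    // Standard luminance weights; dividing by 255 normalizes to [0, 1].
    out[i] = (0.299 * r + 0.587 * g + 0.114 * b) / 255;
  }
  return out;
}

// In-browser usage (sketch; `predict` is a hypothetical export name):
//   const ctx = canvas.getContext('2d');
//   const { data } = ctx.getImageData(0, 0, 28, 28);
//   const input = rgbaToGrayscale(data, 28, 28);
//   const digit = instance.exports.predict(input); // hypothetical
```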
Intermediate representations like ONNX, and effective, fast, and portable compilation targets like WebAssembly, are slowly but surely changing the way in which we deploy AI on various targets. This is pretty awesome. For us, the process described above is appealing for various reasons:
- Performance: WebAssembly is super fast, and runs in a tiny container. It delivers near-native performance when computing predictions, with a minimal memory footprint.
- Portability: ONNX allows storing any model (including pre- and post-processing pipelines) from pretty much any modern data analysis tool, and subsequently the WebAssembly binaries can be run effectively anywhere.
- Privacy: We often deploy models in situations that are privacy sensitive; in these cases we want the input data for the model to remain on premise (e.g., in the hospital). Instead of sending sensitive data around, we send models around.
- Governance: Hugely advantageous over the ONNX runtime (or docker containers used to create model inferences) is the fact that the stand-alone .WASM binaries are easily version controlled and checked. We can guarantee the validity of the model’s predictions without having to worry about its surroundings (the Python version, the libraries, etc.).
I hope this small tutorial (and the references therein) is useful for others getting started!