We received a number of responses after posting the video above on LinkedIn (the original can be found on YouTube). The video shows how an edge device equipped with a camera can be used to recognize hand-written digits and characters. The crux of the video is its demonstration of modular AI deployment: in the first ~20 seconds we show that the deployed model is able to recognize digits but fails on characters. Next, the video demonstrates an Over the Air (OtA) update of the deployed model such that the device is also able to recognize characters. This is made possible by installing a small WebAssembly micro-container (in effect a small virtual CPU) on the device that supports modular deployment of trained AI models in WebAssembly format. In this post we detail, step by step, how this video was made.
Decomposing the video: the Making of…
Let’s decompose the video chronologically and discuss the hardware, the AI models, and the tooling involved:
- 0:00: The video starts with a shot of the edge device. The device involved is a simple Raspberry Pi, equipped with an off-the-shelf camera and screen. This device is relatively large compared to many other (I)IoT devices. However, what is shown in the video can — depending on the size of the trained AI model — also be done on small MCUs. Note that the cable that can be seen in the video solely provides the power for the device.
- 0:07: At this point in the video we start demonstrating the performance of the AI model running on the device. The device is equipped with a simple digit recognition model (see below) in Scailable `wasi` format that runs in a small WebAssembly runtime. For this video we used the Scailable `golang-medium` runtime (which consumes a bit under 4Mb). Note that the `c-minimal` runtime, which provides the same modular AI deployment functionality (but without logging, OtA updates, and model testing), weighs in at only 64Kb. A custom application running on the device takes a picture every 1/10th of a second and feeds these images to the runtime containing the AI model.
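The structure of that capture-and-infer loop can be sketched as follows. Note that `capture_frame`, `run_model`, and `render` are hypothetical stand-ins (in the real application they wrap the camera driver, the WebAssembly runtime, and the screen), so this illustrates the shape of the application rather than any actual API:

```python
import time

def inference_loop(capture_frame, run_model, render, interval=0.1, max_frames=None):
    """Feed camera frames to the model at a fixed rate (~10 fps by default)."""
    predictions = []
    frames = 0
    while max_frames is None or frames < max_frames:
        frame = capture_frame()       # grab an image from the camera
        label = run_model(frame)      # pass it to the runtime holding the model
        render(label)                 # draw the prediction on the screen
        predictions.append(label)
        frames += 1
        time.sleep(interval)          # wait before taking the next picture
    return predictions
```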
- 0:09: Here we clearly demonstrate the performance of the model. This model is trained on the famous MNIST dataset. Model training details can be found in this Jupyter Notebook. The same model can be tested here using the Scailable web node (technically, a `wasi` runtime implemented in `.js`). As can be seen in the notebook, the trained model is stored in `ONNX` format and subsequently, using the Scailable Platform, compiled to WebAssembly. The resulting binary weighs in at 313Kb.
- 0:15: At this point it becomes clear that while the digit `3` is correctly identified, the character `E` fails. Note that we render a `?` when the inference is poor: whenever the recognition certainty of the model is below .6 for all digits, we show the question mark. This logic is implemented in the custom application that is also responsible for feeding the images to the model.
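The question-mark logic amounts to a few lines of Python. The label set and probability vectors below are illustrative; the .6 threshold matches the one mentioned above:

```python
def label_or_unknown(probs, labels, threshold=0.6):
    """Return the most likely label, or '?' if no class is certain enough."""
    best = max(range(len(probs)), key=lambda i: probs[i])
    if probs[best] < threshold:
        return "?"          # certainty below the threshold for every class
    return labels[best]

# A confident prediction yields its label; a hesitant one yields '?'.
print(label_or_unknown([0.05, 0.90, 0.05], ["1", "2", "3"]))  # -> 2
print(label_or_unknown([0.40, 0.35, 0.25], ["1", "2", "3"]))  # -> ?
```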
- 0:26: After establishing the poor performance on characters, we move to the Scailable Platform. We briefly demonstrate our login to the platform; however, our intent was not to show the UI of the Platform but rather the functionality of seamless OtA deployment…
- 0:29: …Things progress rapidly at this point in the video. First we show how the Scailable Platform allows users to manage their models. All the visible models have been converted to WebAssembly automatically, either by using the `sclblpy` package or by uploading a trained model in `ONNX` format to the Platform. Uploaded models can immediately be tested and consumed as a `REST` endpoint (in which case inference is executed in the Scailable cloud). The trained eye will notice that an alternative model for both digit and character recognition is available in the model list (see this Jupyter Notebook for training details). After compiling, this WebAssembly binary weighs in at 335Kb. The screenshot below demonstrates the model list.
- 0:33: After looking at the various models, we move to the Deploy tab of the Platform. Next, the demo device is selected (left column), the new character and digit recognition model is selected (middle column), and we press the “Deploy to Device” button (see screenshot below). At this point, the selected model is added to the assignments of the demo device. Since both models have the same alias (`chars` in this case), the newly assigned model overwrites the previously assigned one and is accessible inside the runtime installed on the device under that same alias.
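The alias semantics can be pictured as a mapping from alias to model binary. This is only an illustration of the overwrite behavior (the `.wasm` file names are made up), not Scailable's actual implementation:

```python
# One model per alias: assigning under an existing alias replaces the old model,
# so the application on the device keeps calling the same alias after an update.
assignments = {}

def assign(alias, model):
    assignments[alias] = model

assign("chars", "digits-only.wasm")        # initial deployment
assign("chars", "digits-and-chars.wasm")   # OtA update under the same alias
print(assignments["chars"])                # -> digits-and-chars.wasm
```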
- 0:46: Here we move back to the device. Admittedly, we hide the fact that the runtime only checks for new assignments once every `x` minutes (obviously this is configurable); we simply make sure the new assignment is retrieved by the runtime before moving forward. At this point in the video the new WebAssembly binary, this time encoding a model that is able to recognize both digits and characters, is received by the device (or technically, by the runtime) Over the Air. Note that deployment of this newly trained model does not require a restart of the device. Deployment is fully modular and sandboxed.
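The periodic assignment check can be sketched as a simple polling loop. Here `fetch_assignment` and `apply_model` are hypothetical stand-ins for the runtime's internals; the point is only that a changed assignment is hot-swapped without a restart:

```python
import time

def poll_for_updates(fetch_assignment, apply_model, interval_s=60, max_polls=None):
    """Check the platform periodically and hot-swap the model when it changes."""
    current = None
    polls = 0
    while max_polls is None or polls < max_polls:
        latest = fetch_assignment()   # ask the platform for the active assignment
        if latest != current:
            apply_model(latest)       # swap in the new binary; no device restart
            current = latest
        polls += 1
        time.sleep(interval_s)
    return current
```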
- 0:51: With the new model deployed on the device, the functionality of the device has effectively been changed. The video demonstrates how the OtA update allows the device to recognize both the digit `3` and the written character `E`.
With the Scailable Platform, fully modular, efficient, and secure deployment of trained AI models on virtually any edge device (or in the cloud) becomes easy. We think that efficient edge deployment of trained AI models is a necessity: in many use cases it is simply not feasible to send data back to a central cloud. WebAssembly provides a portable compilation target that enables seamless deployment in micro runtimes; we have even been able to deploy DNNs on ESP32 devices using this setup.