[Image: the repair drone]

Silo surface maintenance using drones


Imagine having to maintain hundreds of concrete, metal, or plastic silos located in your nearby harbor. As time progresses, the surfaces of these silos, which are often cumbersome to inspect due to their size, will require maintenance: cracks emerge through wear and tear, and early detection and repair significantly drive down maintenance costs and prolong each silo's lifetime.

The scenario above is exactly the one we are facing as part of a joint European INTERREG project called SPEED [1]. Our solution is conceptually easy, but challenging to build: we are endowing a fleet of drones with the ability to autonomously fly around the silos, detect irregularities in their surfaces (we will call these "cracks"), and fill up these irregularities as they go along.

In this post, I will detail how the crack-detection logic itself was built (yes, using AI), and how we are able to deploy and monitor the crack-detection model on various devices. Thus, this post will provide an applied overview of deploying and monitoring AI on the edge.

In this post I will tackle the following topics in sequence:

  1. Model training using TensorFlow.
  2. Model optimization and transpilation to WebAssembly to allow for efficient deployment over multiple devices.
  3. Model deployment on a single device: the WebAssembly runtime.
  4. Model monitoring: orchestrating the deployment of a single AI model over a fleet of devices.

Model training using TensorFlow

Although there is more to the full silo surface maintenance solution, in this post I will focus solely on the crack-detection model (and its deployment).

For the crack detection itself, we trained a neural network using TensorFlow. The training data is publicly available and called SDNET2018. It is described by the authors as follows:

“SDNET2018 is an annotated image dataset for training, validation, and benchmarking of artificial intelligence based crack detection algorithms for concrete. SDNET2018 contains over 56,000 images of cracked and non-cracked concrete bridge decks, walls, and pavements. The dataset includes cracks as narrow as 0.06 mm and as wide as 25 mm. The dataset also includes images with a variety of obstructions, including shadows, surface roughness, scaling, edges, holes, and background debris.”

The convolutional neural network, trained on the images (resized to 64×64 pixels), attains ~85% accuracy on our validation set. This is deemed sufficient for now, although we will likely improve the model (and deploy the updated version) in the future. When saved in Keras's standard format and visualized using Netron, the model looks like this [2]:

The crack detection CNN model (Image by author).
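
To give a feel for the training setup, here is a minimal sketch of a comparable Keras model. The layer sizes, hyperparameters, and file paths are illustrative assumptions on my part; the exact architecture and training process will be detailed in the forthcoming publication [2].

```python
# A minimal, illustrative Keras CNN for 64x64 crack/no-crack
# classification. Layer sizes, hyperparameters, and file paths are
# assumptions, not the exact architecture used in the project.
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(64, 64, 3)),
    tf.keras.layers.Conv2D(16, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(32, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),  # P(crack)
])
model.compile(optimizer="adam",
              loss="binary_crossentropy",
              metrics=["accuracy"])

# SDNET2018 images resized to 64x64; the directory layout is assumed.
train_ds = tf.keras.preprocessing.image_dataset_from_directory(
    "SDNET2018/train", image_size=(64, 64),
    batch_size=32, label_mode="binary")
val_ds = tf.keras.preprocessing.image_dataset_from_directory(
    "SDNET2018/val", image_size=(64, 64),
    batch_size=32, label_mode="binary")

model.fit(train_ds, validation_data=val_ds, epochs=10)
model.save("crack_detector.h5")
```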

Note that the size of this model, in its default form, is a bit over 5 MB (this might be important depending on the deployment target; see below).

The performance of the model is shown below: evaluated on a slab of concrete, the model correctly identifies almost all of the areas that contain cracks (shown in red in the right-hand image).

Image of the surface as captured by the camera (left), and the cracks identified by the trained CNN (right, in red). (Image by the author).
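
An overlay like the one on the right can be produced by sliding the model's 64×64 input window across the photo and tinting the patches classified as cracked. A rough sketch, reusing the hypothetical file names from the training sketch above (the stride and the 0.5 threshold are also illustrative):

```python
# Rough sketch: slide a 64x64 window over a surface photo and tint
# the patches the model classifies as cracked. File names, stride,
# and the 0.5 decision threshold are illustrative assumptions.
import tensorflow as tf

model = tf.keras.models.load_model("crack_detector.h5")
img = tf.keras.preprocessing.image.load_img("slab.jpg")
image = tf.keras.preprocessing.image.img_to_array(img) / 255.0

overlay = image.copy()
for y in range(0, image.shape[0] - 63, 64):
    for x in range(0, image.shape[1] - 63, 64):
        patch = image[y:y + 64, x:x + 64]
        p_crack = float(model.predict(patch[None, ...], verbose=0)[0, 0])
        if p_crack > 0.5:
            overlay[y:y + 64, x:x + 64, 0] = 1.0  # tint the red channel

tf.keras.preprocessing.image.save_img("slab_overlay.png", overlay)
```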

Model optimization and transpiling

Although one might be tempted to simply run the model in a Docker container in the cloud and submit the images captured by the drone to the model using REST, we decided against this approach: the drones will not have a continuous and stable internet connection, and the overhead of sending data back and forth to the cloud makes this solution simply infeasible. Rather, we aim to deploy the model directly on the drone itself. This way, the drones can autonomously scan a silo and detect cracks in its surface without sending any data back and forth (note that the drones are able to fill up the cracks as well; to do so, they use the little "elephant's trunk" that is clearly visible in the image at the top of this post).

The drones we are using contain a simple Raspberry Pi 3B (although we have also been able to run the same model on an ESP32). Hence, we were looking for a way to deploy the trained TensorFlow model efficiently on the (relatively small) hardware available, without the need to rebuild the model in a target-specific project or language. Furthermore, we wanted the model's deployment to be modular: as we will likely develop new models in the future, we want to be able to push updated models to our drones without significant development effort.

An impression of the Raspberry Pi 3B single-board computer used (image by author).

We selected the Scailable platform for deployment. The process involved is simple:

  1. First, we stored the TensorFlow model in ONNX format. This is easy enough using the tf2onnx package (a minimal sketch of this step follows the list below).
  2. Next, we uploaded the resulting .onnx file (which, after some optimization operations present in the onnx package, weighs in at 2.6 MB) to the Scailable platform.
  3. Scailable's toolchains automatically transpile the .onnx file to WebAssembly: a binary instruction format for a stack-based virtual machine. The process used by Scailable creates a .wasm binary that exposes a number of functions which can be called by any WebAssembly runtime. The full binary weighs in at 2.5 MB; this is a highly efficient representation of the trained model that can be executed in extremely small runtimes (or containers, if you wish).
  4. For deployment on the Raspberry Pi we used the Scailable C runtime: the smallest instance of this runtime only requires 64 KB of memory and can easily be integrated in any existing C project.
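
As promised in step 1, here is a minimal sketch of the ONNX export and a sanity check before uploading. The file names and opset version are illustrative assumptions; the tf2onnx and onnx calls themselves are the standard ones.

```python
# Export the trained Keras model to ONNX (step 1 above) and check it
# before uploading. File names and the opset version are illustrative.
import tensorflow as tf
import tf2onnx
import onnx

model = tf.keras.models.load_model("crack_detector.h5")
spec = (tf.TensorSpec((None, 64, 64, 3), tf.float32, name="input"),)
tf2onnx.convert.from_keras(model, input_signature=spec, opset=13,
                           output_path="crack_detector.onnx")

# Verify that the exported graph is well-formed.
onnx.checker.check_model(onnx.load("crack_detector.onnx"))
```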

“Wasm is designed as a portable compilation target for programming languages, enabling deployment on the web for client and server applications.” (See webassembly.org.)
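
The Scailable toolchain produces these binaries for you, but to make the portability claim concrete, here is a generic sketch of calling a function exported by a .wasm module from a Python host using the open-source wasmtime bindings. The module path and the export name predict are hypothetical; the actual Scailable binaries define their own interface.

```python
# Generic illustration only: invoking a function exported by a .wasm
# module from a Python host via the open-source wasmtime bindings.
# The module path and the export name "predict" are hypothetical; the
# Scailable binaries and runtimes define their own interface.
from wasmtime import Store, Module, Instance

store = Store()
module = Module.from_file(store.engine, "crack_detector.wasm")
instance = Instance(store, module, [])
predict = instance.exports(store)["predict"]
result = predict(store)  # real modules pass inputs via linear memory
```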

In the next section, we will discuss the runtime in a bit more detail.

Model deployment on a single device: the WebAssembly runtime

Although by transpiling the model to a .wasm binary we now have an efficient, device-independent representation of the model (effectively an executable that can be run efficiently anywhere), we still need to make sure that it runs on our Raspberry Pi. To do so, we used the (proprietary) C runtime provided by Scailable, which we included in the larger application that runs on the device and coordinates the drone's actions.

The runtime itself is extremely small (so no hassle to include), but the benefits are huge:

  • The runtime makes over-the-air, modular updates of the models to the device possible. So, anytime we train a better model, we can safely deploy it on the maintenance drone without any rebuilding.
  • The runtime sandboxes the model's binary, making execution safe and fault-tolerant.
  • The runtime is extremely easy to use after inclusion in the project: one can simply call sclbl.predict("model_alias") to generate inferences (see the conceptual sketch after this list).
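
To illustrate why this alias-based call makes over-the-air updates painless, here is a conceptual mock. Everything in it except the predict call mentioned above is a hypothetical illustration, not the documented Scailable API.

```python
# Conceptual mock only: illustrates why calling the model by alias
# enables over-the-air updates. All names here are hypothetical stand-ins,
# not the documented Scailable API.

class MockRuntime:
    def __init__(self):
        # The taskmanager can repoint an alias to a newer model
        # version without rebuilding the drone application.
        self.assignments = {"crack_detector": "crack-cnn-v1.wasm"}

    def predict(self, alias: str) -> bool:
        binary = self.assignments[alias]  # resolve alias -> current model
        print(f"running inference with {binary}")
        return False  # a real runtime would execute the .wasm model

sclbl = MockRuntime()
if sclbl.predict("crack_detector"):
    print("crack detected: schedule a repair pass")
```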

Thus, including the Scailable runtime in the drone’s main application project gives us a fully modular and secure way of deploying the CNN (and any future versions thereof).

Although the specific C runtime used in the project is proprietary, both the WebAssembly binaries and a number of the runtimes used by Scailable are very well documented and fully open.

Model monitoring: orchestrating the deployment of a single AI model over a fleet of devices

The modular deployment of the .wasm CNN detecting cracks on a single drone really is only the beginning. The Scailable toolchain and runtimes are part of a larger platform which is, schematically, illustrated below:

The full Scailable platform (image by author).

We detail each of the components in turn:

  • The conversion toolchains: The conversion toolchains automatically transpile trained models (and pipelines!) to .wasm. There are toolchains available for any model or pipeline stored in ONNX format, as well as specific toolchains for sklearn, xgboost, and statsmodels (Python) and for glm(m) and bart (R) models.
  • The model checking tools: The model checking tools allow for easy evaluation of transpiled models and pipelines. This suite of tools makes it easy to check the I/O of a newly converted model and replicate validation-set performance.
  • The taskmanager: The taskmanager is, in some ways, the beating heart of the platform: after the transpiled models are tested and available, the taskmanager allows a user to distribute the models over-the-air (OTA) to the various runtimes (often one per physical device). Models can be updated instantly, tested remotely, and their operation on the whole fleet of devices is instantly visible.

A screenshot of the taskmanager (cloud) application: the taskmanager allows efficient allocation of models to various devices (image by author).
  • The model governance tools: The model governance tools supplement the taskmanager: this suite of tools allows easy parsing of the logs created by the runtimes. Furthermore, these tools make it possible to automatically raise alerts if models are not functioning as they should, need updating, or otherwise need attention.
  • The runtimes: The runtimes are the small and modular "containers" that can be included in the software running on the device. Depending on the selected runtime (these range from the extremely small 64 KB version with little additional functionality to slightly larger versions that include extensive logging and automatic testing), the runtime will auto-subscribe to the taskmanager with a unique device ID, allowing users of the platform to easily manage which model (and which version thereof) is available on which device (or group of devices). Runtimes are available in a large number of languages (e.g., Rust, C, Java, and JavaScript) and for diverse targets (think: Intel, ARM).
  • The cloud execution fallback: Depending on the runtime (see above), the task at hand (i.e., the specific inference task a device is trying to execute) will automatically be executed in the cloud if the on-device processing fails. This is easily demonstrated in the JavaScript runtime (a conceptual sketch follows this list).
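
A conceptual illustration of that fallback pattern: try the on-device runtime first, and only reach out to the cloud when local execution fails. The local_predict stub, the endpoint URL, and the payload format are all assumptions for illustration.

```python
# Conceptual illustration of the cloud execution fallback: try the
# on-device runtime first and fall back to a remote endpoint if local
# execution fails. The stub, URL, and payload format are assumptions.
import requests

def local_predict(image_bytes: bytes):
    # Stand-in for the on-device WebAssembly runtime call; raises
    # RuntimeError when local execution is unavailable or fails.
    raise RuntimeError("on-device runtime unavailable")

def infer(image_bytes: bytes):
    try:
        return local_predict(image_bytes)
    except RuntimeError:
        # Fall back to cloud execution of the same inference task.
        response = requests.post(
            "https://example.com/models/crack-detector/predict",
            data=image_bytes, timeout=10)
        response.raise_for_status()
        return response.json()
```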

Jointly, the Scailable platform allows for the over-the-air deployment of multiple models to specific devices. Thus, we can work towards managing a whole fleet of drones which autonomously fly around the harbor to detect cracks in the silos (and even fill them up automatically). Over time, we can update our detection model and deploy it safely without any additional overhead. We can monitor performance, and be sure that things work efficiently.

Conclusions

In this post, I have tried to describe some of the steps, admittedly not all, of our approach towards surface maintenance of storage silos in the harbor. There are still numerous technical challenges that need to be tackled before a whole fleet of drones is able to fly around autonomously, detect cracks, and repair them. However, some of the core challenges are solved: the CNN model trained using TensorFlow on (publicly available) data seems to perform well enough, for now, to warrant deployment. By converting the trained model to a .wasm binary, the inferences are extremely fast and the memory footprint of the model is small. Furthermore, the Scailable platform makes it simple to deploy (and monitor) the model (and any future versions thereof) on pretty much any device.

Notes:

[1] A large number of parties are involved in the overall SPEED project: The Antwerp School of Management, the University of Antwerp, Lille University, JADS Den Bosch, Scailable, WSX, NXTPort, CTIC, the Port of Moerdijk, Bournemouth University, Portsmouth Port, Startups.be, PHC, and BCP Council. In this specific project the data were prepared by Ozgenel and Gonenc (Middle East Technical University) and the CNN was trained by Xinrui Yang (Polytech Lille). The CNN was transpiled to WASM and deployed on the drone using the Scailable platform by Robin van Emden (JADS & Scailable). This specific work package was led by Prof. Rochdi Merzouki (Lille University) and Arjan Haring (JADS).

[2] I am skipping some of the details regarding model training in this post; Xinrui Yang, under the supervision of Rochdi Merzouki, is preparing a publication describing the exact training process and network performance. We will share it once it becomes available.
