This is a computer vision application to keep track of instruments and materials used during surgery as an extra layer of control to avoid Retained Surgical Bodies. To achieve this, an object detection model was created using over 30,000 annotated synthetic training images generated with NVIDIA Omniverse Replicator. The dataset was used to fine-tune an object detection model in Edge Impulse Studio using integrated NVIDIA TAO transfer learning. Further, the model was validated on synthetic scenes using the Edge Impulse Omniverse extension. Finally, the model was built, ready to be deployed to an NVIDIA Jetson Orin Nano for use with camera feeds from surgical lights in operating rooms (OR).
Project resources

- Kaggle dataset: 30,000 images in 320x320 pixel RGB, including labels
- GitHub project code repository
- Edge Impulse Studio project repository
The problem

Extensive routines are in place before, during, and after an operation to make sure no unintentional items are left in the patient. In the small number of cases where items are left behind, the consequences can be severe, in some cases fatal.
In the following drawing we see how equipment and disposable materials are typically organized during surgery. Tools are pre-packaged in sets for the appropriate type of surgery and noted when organized on trays or tables. Swabs are packaged in counted numbers and contain tags that are noted and kept safe. When swabs are used they are displayed individually in transparent pockets on a stand so they can be counted and checked against the tags from the originating package. Extensive routines are in place to continuously count all equipment used; still, errors occur at an estimated rate of between 0.3 and 1 per 1,000 abdominal operations.
Existing solutions are mainly based on either x-ray or RFID. With x-ray, the patient needs to be scanned using a scanner on wheels. Metal objects will obviously be visible, while other items such as swabs need to have metal strips woven in to be detected, and the surgery team has to wear lead aprons. Some items have passive RFID circuits embedded and can be detected by a handheld scanner.
This project started as a proof-of-concept demonstrating that an object detection model could be deployed and run on heavily constrained hardware, an Arduino Nicla Vision, as a wearable. This was documented on Edge Impulse. Many operating rooms (OR) are equipped with adjustable lights with a camera embedded. A video feed from such a camera could make a viable source for the object detection model. The wearable PoC proved a success and sparked the question: could the work be transferred to an application that could utilize more powerful hardware and greater video resolution? This article documents the steps involved in transferring the application to a model that can run on an NVIDIA Jetson with the potential to be hooked up to a surgical camera feed.
Most of the tools used in surgery have a chrome surface. Due to the reflective properties of chrome, especially the strong specular reflections and highlights, a given item's appearance, judged by its composition of pixels (in this context known as visual features), will vary greatly with lighting and viewing angle.
More distinct types of objects can be added to the training data simply by importing 3D models. The goal of the device isn't to identify specific types of items, but rather to make the surgery team aware if item count changes from surgery start to end. With this approach we can group items with similar shapes and surfaces.
Edge Impulse Studio offers a web-based development platform for creating machine learning solutions from concept to deployment.
Using the camera on the intended device for deployment, around 600 images were initially captured and labeled. Most images contained several items, and a fraction were unlabeled images of unrelated background objects.
The model trained on this data was quickly deemed useless at detecting reflective items, but it provided a nice baseline for the subsequent approaches.
To isolate the chrome surfaces as a problematic issue, a number of chrome instruments were spray painted matte and a few plastic and cloth based items were used to make a new manually captured and labeled dataset of the same size. For each image the items were scattered and the camera angle varied.
A crucial part of any ML-solution is the data the model is trained, tested and validated on. In the case of a visual object detection model this comes down to a large number of images of the objects to detect. In addition each object of interest in each image needs to be labeled. Edge Impulse offers an intuitive tool for drawing boxes around the objects in question and to define labels. On large datasets manual labeling can be a daunting task, thankfully EI offers an auto-labeling tool. Other tools for managing datasets offer varying approaches for automatic labeling, for instance using large image datasets. However, often these datasets are too general and fall short for specific use cases.
NVIDIA Omniverse Replicator

One of the main goals of this project is to explore creating synthetic object images that come complete with labels. This is achieved by creating a 3D scene in NVIDIA Omniverse and using its Replicator synthetic data generation toolbox to create thousands of slightly varying images, a concept called domain randomization. With an NVIDIA RTX 3090 graphics card from 2020 it is possible to produce a ray-traced image roughly every two seconds; creating 10,000 images thus takes about 5 hours.
We will be walking through the following steps to create an object detection model that can run on an NVIDIA Jetson Orin Nano. The 3D models used to generate the training images are not distributed in the project repository due to licensing limitations; the models can be purchased at TurboSquid. An up-to-date Python environment with Visual Studio Code is recommended. A 3D geometry editor such as Blender is needed if object 3D models are not in USD format (Universal Scene Description).
- Installing Omniverse Code, Replicator and setting up debugging with Visual Studio Code
- Creating a 3D stage/scene in Omniverse
- Working with 3D models in Blender
- Importing 3D models in Omniverse, retaining transformations, applying materials
- Setting metadata on objects
- Creating script for domain randomization
- Creating label file for Edge Impulse Studio
- Creating an object detection project in Edge Impulse Studio and uploading dataset
- Training, improving and deploying model to device
- Install Omniverse from NVIDIA.
- Install Code: Open Omniverse Launcher, go to Exchange, install Code.
- Launch Code from NVIDIA Omniverse Launcher.
- Go to Window->Extensions and install Replicator
- Create a new stage/scene (USD-file)
- Create a textured plane that will be a containment area for scattering the objects
- Create a larger textured plane to fill the background
- Add some lights
If you have a hefty, heat-producing GPU next to you, you might prefer to reduce the FPS limit in the viewports of Code. It may default to 120 FPS, generating a lot of heat when the viewport is in the highest quality rendering modes. Set "UI FPS Limit" to something like 60 or 30. This setting unfortunately does not persist between sessions, so it has to be repeated every time a project is opened.
The objects we want to be able to detect need to be represented with a 3D model and a surface (material). Omniverse provides a library of ready-to-import assets; additional models can be created using editors such as Blender or purchased on sites such as TurboSquid.
A scene containing multiple geometric models should be exported on an individual model basis, with USD-format as output.
Omniverse has recently received limited support for importing BSDF material compositions, but this is still experimental. In this project, materials and textures were not imported directly.
Importing 3D models in Omniverse, retaining transformations, applying materials

To avoid overwriting any custom scaling or other transformations set on exported models, it is advisable to add a top node of type Xform to each model hierarchy. Later we can move the object around without losing these adjustments.
The Replicator toolbox has a function in its API for scattering objects on a surface. To (mostly) avoid object intersection, a few improvements can be made. In the screenshot a basic shape has been added as a bounding box to allow some clearance between objects and to make sure thin objects are handled more appropriately while scattering. The bounding box can be set as invisible.
For the chrome surfaces a material from one of the models from the library provided through Omniverse was reused, look for http://omniverse-content-production.s3-us-west-2.amazonaws.com/Materials/Base/Metals/Chrome/ in the Omniverse Asset Store. Remember to switch to "RTX - Interactive" rendering mode to see representative ray-tracing results, "RTX - Real-Time" is a simplified rendering pipeline.
For the cloth-based materials, some of the textures from the original models were used; more effort in setting up the shaders with appropriate texture maps could improve the results.
To produce images for training that include labels, we can use a feature of the Replicator toolbox found under the menu Replicator->Semantics Schema Editor.
Here we can select each top node representing an item for object detection and add a key-value pair. Choosing "class" as Semantic Type and e.g. "tweezers" as Semantic Data enables us to export these strings as labels later. The UI could benefit from a bit more work on intuitive design, as it is easy to misinterpret which fields show the actual semantic data set on an item, and which fields carry values over to make labeling many consecutive items easier.
Semantics Schema Editor may also be used with multiple items selected. It also has handy features to use the names of the nodes for automatic naming.
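The same key-value pair can also be set from a script rather than through the editor. A minimal sketch of this, using a hypothetical prim path, could look like the following:

# Hypothetical prim path; adjust to match your stage hierarchy
tweezers = rep.get.prims(path_pattern='/World/Tweezers')
with tweezers:
    # Equivalent to setting Semantic Type "class" and Semantic Data "tweezers" in the editor
    rep.modify.semantics([("class", "tweezers")])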
Creating script for domain randomization

This part describes how to write a script in Python for randomizing the content of the images we will produce with Omniverse Code. We could choose to start with an empty stage and programmatically load models (from USD files), lights, cameras and such. With a limited number of models and lights we will proceed by adding most items to the stage manually, as described earlier. Our script can be named anything, ending in .py, and preferably placed close to the stage USD file. The following is a description of such a script, replicator_init.py:
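Note that the snippets below assume the Replicator API has been imported at the top of the script; a minimal header could look like this:

import asyncio

# Replicator is available inside Omniverse Code as omni.replicator.core
import omni.replicator.core as rep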
To keep the items generated in our script separate from the manually created content we start by creating a new layer in the 3D stage:
with rep.new_layer():
Next we specify that we want to use ray tracing for our image output. We create a camera and hard-code its position; we will point it at our items for each render later. Then we use our previously defined semantics data to get references to items, background items and lights for easier manipulation. Lastly we define our render output by selecting the camera and setting the desired resolution, 320x320 pixels. Edge Impulse Studio will take care of scaling the images should we use images of a different size than the configured model input size.
rep.settings.set_render_pathtraced(samples_per_pixel=64)
camera = rep.create.camera(position=(0, 24, 0))
tools = rep.get.prims(semantics=[("class", "tweezers"), ("class", "scissors"), ("class", "scalpel"), ("class", "sponge")])
backgrounditems = rep.get.prims(semantics=[("class", "background")])
lights = rep.get.light(semantics=[("class", "spotlight")])
render_product = rep.create.render_product(camera, (320, 320))
Due to the asynchronous nature of Replicator we need to define our randomization logic as call-back methods by first registering them in the following fashion:
rep.randomizer.register(scatter_items)
rep.randomizer.register(randomize_camera)
rep.randomizer.register(alternate_lights)
Before we get to the meat of the randomization we define what will happen during each render:
with rep.trigger.on_frame(num_frames=10000, rt_subframes=20):
    rep.randomizer.scatter_items(tools)
    rep.randomizer.randomize_camera()
    rep.randomizer.alternate_lights()
num_frames defines how many renders we want. rt_subframes lets the render pipeline proceed a number of frames before the result is captured and written to disk. Setting this high lets advanced ray tracing effects such as reflections propagate between surfaces, though at the cost of longer render times. Each randomization sub-routine is called with optional parameters.
To write each image and semantic information to disk we use a provided API. We could customize the writer but as of Replicator 1.9.8 on Windows this resulted in errors. We will use "BasicWriter" and rather make a separate script to produce a label format compatible with EI.
writer = rep.WriterRegistry.get("BasicWriter")
writer.initialize(
    output_dir="out",
    rgb=True,
    bounding_box_2d_tight=True)
writer.attach([render_product])
asyncio.ensure_future(rep.orchestrator.step_async())
Here rgb tells the API that we want the images written to disk as png files, and bounding_box_2d_tight that we want files with labels (from the previously defined semantics) and bounding boxes as rectangles. The script ends by running a single iteration of the process in Omniverse Code, so we can visualize the results.
The bounding boxes can be visualized by clicking the sensor widget, checking "BoundingBox2DTight" and finally "Show Window".
The only thing missing is defining the randomization logic:
def scatter_items(items):
    # Get a reference to the surface the items will be scattered on
    table = rep.get.prims(path_pattern='/World/SurgeryToolsArea')
    with items:
        # Random rotation around the vertical axis, then scatter on the surface
        rep.modify.pose(rotation=rep.distribution.uniform((0, 0, 0), (0, 360, 0)))
        rep.randomizer.scatter_2d(surface_prims=table, check_for_collisions=True)
    return items.node
def randomize_camera():
    with camera:
        rep.modify.pose(
            position=rep.distribution.uniform((-10, 50, 50), (10, 120, 90)),
            look_at=(0, 0, 0))
    return camera

def alternate_lights():
    with lights:
        rep.modify.attribute("intensity", rep.distribution.uniform(10000, 90000))
    return lights.node
For scatter_items we get a reference to the area that will contain our items. Each item is then given a random rotation (0-360 degrees on the surface plane) before scatter_2d randomizes its placement. For the latter, surface_prims takes an array of items to use as possible surfaces, and check_for_collisions tries to avoid overlap. The order of operations is important to avoid overlapping items.
For the camera we simply randomize the position in all 3 axis and make sure it points to the center of the stage.
With the lights we randomize the brightness between a set range of values.
Note that in the provided example, rendering images and labels is done separately for the actual objects we want to detect and for the background items used for contrast. The process would run once for the surgery items, then the following line would be changed from
rep.randomizer.scatter_items(tools)
to
rep.randomizer.scatter_items(backgrounditems)
When rendering the items of interest the background items would have to be hidden, either manually or programmatically, and vice versa. The output path should also be changed to avoid overwriting the output.
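For the programmatic route, a small sketch reusing the semantics defined earlier could toggle visibility before each pass, assuming rep.modify.visibility is available in the installed Replicator version:

# Hide the background items while rendering the surgical tools;
# invert the two calls for the background pass
with rep.get.prims(semantics=[("class", "background")]):
    rep.modify.visibility(False)
with rep.get.prims(semantics=[("class", "tweezers"), ("class", "scissors"), ("class", "scalpel"), ("class", "sponge")]):
    rep.modify.visibility(True)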
Whether the best approach for training data is to keep objects of interest and background items in separate images or to mix them is debated, both with sound reasoning. In this project the best results were achieved by generating image sets of both approaches.
Edge Impulse Studio supports a wide range of image labeling formats for object detection. Unfortunately, the output from Replicator's BasicWriter needs to be transformed before it can be uploaded, whether through the EI Omniverse extension, the web interface, or the web API.
A simple Python program, basic_writer_to_pascal_voc.py, is provided. It was produced by writing a short ChatGPT prompt describing the Replicator output and the desired label format described by Edge Impulse. Run the program from a shell with
python basic_writer_to_pascal_voc.py <input_folder>
or debug from Visual Studio Code by setting the input folder in launch.json like this:
"args": ["../out"]
This will create a file bounding_boxes.labels that contains all labels and bounding boxes per image.
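For reference, the core of such a conversion is a small amount of Python. The sketch below is not the repository script; it assumes BasicWriter's default file naming (rgb_*.png, bounding_box_2d_tight_*.npy and a matching *_labels_*.json id-to-class map), which can vary between Replicator versions:

import glob
import json
import os
import sys

import numpy as np

def convert(input_folder):
    boxes_per_image = {}
    for npy_file in sorted(glob.glob(os.path.join(input_folder, "bounding_box_2d_tight_*.npy"))):
        frame_id = os.path.basename(npy_file).rsplit("_", 1)[-1].replace(".npy", "")
        labels_file = os.path.join(input_folder, f"bounding_box_2d_tight_labels_{frame_id}.json")
        with open(labels_file) as f:
            # Maps numeric semantic ids to class names, e.g. {"0": {"class": "tweezers"}}
            id_to_class = {int(k): (v["class"] if isinstance(v, dict) else v) for k, v in json.load(f).items()}
        boxes = []
        for row in np.load(npy_file):
            x_min, y_min = int(row["x_min"]), int(row["y_min"])
            x_max, y_max = int(row["x_max"]), int(row["y_max"])
            boxes.append({
                "label": id_to_class[int(row["semanticId"])],
                "x": x_min, "y": y_min,
                "width": x_max - x_min, "height": y_max - y_min,
            })
        boxes_per_image[f"rgb_{frame_id}.png"] = boxes
    # Edge Impulse expects all boxes collected in a single bounding_boxes.labels JSON file
    with open(os.path.join(input_folder, "bounding_boxes.labels"), "w") as f:
        json.dump({"version": 1, "type": "bounding-box-labels", "boundingBoxes": boxes_per_image}, f)

if __name__ == "__main__":
    convert(sys.argv[1])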
Look at the provided object detection Edge Impulse project or follow a guide to create a new object detection project.
For a project intended to detect objects with reflective surfaces, a large number of images is needed for training, but the exact number depends on many factors and experimentation should be expected. It is advisable to start relatively small, say 1,000 images of the objects to be detected. For this project over 30,000 images were generated.
EI creates unique identifiers per image, so you can run multiple iterations to create and upload new datasets, even with the same file names. Just upload all the images from a batch together with the bounding_boxes.labels file.
This way we can effortlessly produce thousands of labeled images and witness how performance on detecting reflective objects increases. Keep in mind to try to balance the number of labels for each class.
For uploading the images we have a few options.
Uploading images using the Edge Impulse extension for Omniverse Isaac

The Edge Impulse Omniverse Isaac extension is really handy when creating synthetic training images in Omniverse; we will get back to it later when testing the model. It can connect to your Edge Impulse project so you can upload any generated images. As of March 2024 you have to clone the extension repository and add it to Omniverse manually; follow the steps in the repo. Once registered, the extension can be enabled to reveal its UI.
- Connect to your Edge Impulse project by setting your API key (this key is obtained from your Edge Impulse project Dashboard > Keys > API Keys), then click Connect.
- Once your project is connected to your Edge Impulse Omniverse extension, select the Data Upload, then specify your dataset's local path on your computer, select the dataset category to upload to (training, testing, or anomaly), then click Upload to Edge Impulse:
Since we have generated both synthetic images and labels, we can use the CLI tool from Edge Impulse to efficiently upload both. Use
edge-impulse-uploader --category split --directory [folder]
to connect to your account and project and upload the image files and labels in bounding_boxes.labels. To switch projects, first do
edge-impulse-uploader --clean
At any time we can find "Perform train/test split" under "Danger zone" in project dashboard.
If in doubt you can always upload images and labels in Edge Impulse Studio under "Data acquisition".
Finally we can design and train our object detection model. Determining optimal Neural Network architecture and parameters can be complex. Luckily EON Tuner can do the heavy lifting for us by finding the best suited model architecture. For specific constraints EON Tuner Search Space allows us to narrow options.
EON Tuner and Search Space

We can start EON Tuner to find appropriate settings simply by specifying the target device and a rough latency target.
EON Tuner Search Space further lets us narrow parameters.
Refer to the documentation for defining block parameters.
Model training

With the tools to effortlessly create as much training data as we want at our disposal, we have a lot of options available for what approach to choose. For instance, we could train a model using the FOMO architecture from scratch, supplying as much data as needed to achieve good results.
Alternatively, we can fine-tune an existing object detection model with a fraction of the images needed for the previous approach. Make no mistake, capturing just 1,000 images and manually labeling 5-6 items in each image is incredibly labor intensive, so being able to generate the data is extremely useful in both cases.
Fine-tuning an existing model poses its own set of parameters to experiment with. For instance, "Freeze blocks" allows us to specify which initial CNN blocks we want to transfer unaltered from the backbone into our new model. The available IDs for each model vary and need to be referenced for the selected architecture; for instance, here we can find the available block IDs for the different backbones when fine-tuning an SSD model.
Edge Impulse EON Tuner Search space can help us find suitable parameters automatically.
Transfer learning in the context of computer vision and object detection involves taking a model pre-trained on a large dataset and adapting it to a specific object detection task. This approach leverages the model's learned features, such as edges, textures, and shapes, which are common across different visual recognition tasks, to achieve higher performance with less data and training time on a new task.
A critical aspect of transfer learning is the concept of "freezing" the initial layers of the pre-trained model. In deep learning models, the initial layers capture general features (e.g., edges and textures) that are applicable across various tasks. By freezing these layers, their weights are kept unchanged during the training process on the new task.
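To make the freezing idea concrete outside of Edge Impulse, a plain Keras sketch might look like the following; this is purely illustrative and not the TAO implementation, and the layer count and the small classification head are arbitrary:

import tensorflow as tf

# Pre-trained backbone; the early layers capture generic edges and textures
base = tf.keras.applications.MobileNetV2(
    include_top=False, weights="imagenet", input_shape=(320, 320, 3))

# "Freeze" the first layers so their weights stay unchanged during fine-tuning
for layer in base.layers[:100]:
    layer.trainable = False

# Arbitrary small head for a 4-class example task
model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(4, activation="softmax"),
])
model.compile(optimizer="adam", loss="categorical_crossentropy")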
The integration of NVIDIA TAO in Edge Impulse Studio facilitates the building of efficient models faster by combining the power of transfer learning and the latest NVIDIA TAO models. These can be deployed across the entire Edge Impulse ecosystem of devices, silicon, and sensors.
A really convenient feature in Edge Impulse extension for Omniverse Isaac is the ability to test the object detection model on a synthetic 3D scene as a digital twin. Just load up the scene that was created for generating annotated training images in Isaac, enable the extension as described earlier and hit "Classify". You could also use generic pre-built 3D scenes found in Omniverse and import your 3D models to simulate different environments. For this to work we only have to build a WebAssembly (wasm) target in Edge Impulse Studio under "Deployment".
Now we can navigate our 3D scene and test the model on any view. We can even compare the model inference output side-by-side to the bounding boxes in Omniverse.
In the image above, the left square shows model inference (bounding boxes, labels and confidence scores in red) side-by-side with a visualization of the bounding boxes of the actual models in the scene. This enables efficient model development iterations and can uncover a lot of issues without having to deploy the model to a device and test in a real environment.
Model improvements

The first version of the object detection model showed great results when tested on withheld images from the same generation process. However, when tested both in a virtual environment using the Edge Impulse extension for Omniverse and on real surgical equipment, it soon became clear that it produced many false positives when exposed to background items. To remedy this, a new 10,000-image dataset was generated where the objects of interest were placed on top of random background images.
For this, 5,000 images were downloaded from the COCO 2017 dataset and the Replicator script was extended as follows:
import os

def randomize_screen(screen, texture_files):
    with screen:
        # Let Replicator pick a random texture from the list of .jpg files
        rep.randomizer.texture(textures=texture_files)
    return screen.node

# Define what folder to look for .jpg files in
folder_path = 'C:/Users/eivho/source/repos/icicle-monitor/val2017/testing/'

# Create a list of strings with complete path and .jpg file names
texture_files = [os.path.join(folder_path, f) for f in os.listdir(folder_path) if f.endswith('.jpg')]

# Register randomizer
rep.randomizer.register(randomize_screen)

# For each frame, call the randomization function
with rep.trigger.on_frame(num_frames=2000, rt_subframes=50):
    # Other randomization functions...
    rep.randomizer.randomize_screen(screen, texture_files)
Adding this to the training/testing sets greatly increased performance when background objects are in view. Testing in a real surgery environment would determine if more of this type of image is needed.
In validating the model in Omniverse we have already deployed it in a form that can run on Windows or Linux, bound by CPU. In the first version of the project the model was deployed to run on an MCU, an Arduino Nicla Vision. Now we want to utilize the hardware capabilities of an NVIDIA Jetson Orin Nano, especially the increased RAM size and the GPU, so we can run inference on a video feed from cameras mounted in surgical lights.
Deploy model to Jetson Orin Nano using edge-impulse-linux

First make sure to follow the NVIDIA Jetson Getting Started Guide to flash an SD card with the Ubuntu OS provided by NVIDIA with JetPack (5.1.2 for Orin as of March 2024) and finish setup.
Following the Edge Impulse documentation for deployment to Jetson, we run the setup script for the Jetson to install the required tooling:
wget -q -O - https://cdn.edgeimpulse.com/firmware/linux/jetson.sh | bash
We can use the generic Linux wrapper to test the model by running the edge-impulse-linux CLI tool. This will start a wizard which will ask you to log in and choose an Edge Impulse project. If you want to switch projects, run the command with --clean.
To run your impulse locally, just connect to your Jetson again, and run:
edge-impulse-linux-runner
This will automatically compile the model with full GPU and hardware acceleration, download the model to the Jetson, and then start classifying. The Edge Impulse Linux SDK has examples showing how to integrate the model with a range of programming languages.
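As an illustration of what such an integration could look like with the Python variant of the Linux SDK, the following sketch uses a placeholder model path and camera index:

import cv2
from edge_impulse_linux.image import ImageImpulseRunner

MODEL_PATH = "modelfile.eim"  # placeholder: path to the downloaded .eim model

with ImageImpulseRunner(MODEL_PATH) as runner:
    runner.init()
    camera = cv2.VideoCapture(0)  # placeholder camera index
    while True:
        ok, frame = camera.read()
        if not ok:
            break
        # Extract features from an RGB frame in the format the model expects
        features, cropped = runner.get_features_from_image(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        result = runner.classify(features)
        for bb in result["result"].get("bounding_boxes", []):
            print(f"{bb['label']} ({bb['value']:.2f}) at x={bb['x']} y={bb['y']}")
    camera.release()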
Edge detection

Another interesting approach to the challenge of detecting reflective surfaces is using edge detection. This would still benefit from synthetic data generation with automatic labeling. By using OpenCV's Canny function we can easily create training images concentrating only on object edges, while maintaining the labels.
import cv2

# 'image' is a previously loaded BGR image (e.g. from cv2.imread)
# Convert the image to grayscale
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

# Use Canny edge detector
edges = cv2.Canny(gray, threshold1=50, threshold2=200)
To utilize edge detection we would have to apply a similar transformation to the input when running inference.
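A minimal sketch of that inference-time preprocessing, assuming frame is a BGR camera frame and the model was trained on 3-channel edge images, could be:

# Apply the same Canny transform to the camera frame before inference
gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
edges = cv2.Canny(gray, threshold1=50, threshold2=200)

# Stack the single-channel edge map back to 3 channels for the model input
model_input = cv2.cvtColor(edges, cv2.COLOR_GRAY2BGR)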
Application

The outline of the final application flow looks like this:
- The equipment planned for use in a given surgery is retrieved from the surgery planning system, as can be tested in the open DIPS sandbox.
- Before surgery starts, the planned inventory is compared against and confirmed with the equipment laid out on the table.
- As surgery progresses, the inventory list is automatically maintained with the tools visible on the table.
- The live inventory can be viewed in a mobile app. Extra equipment and disposables can be registered when introduced ad hoc.
- At surgery end, a final tally of the equipment prepared and returned to the table is documented in the surgery planning system.
Technically this would consist of several components:
- Integration with surgery planning system to retrieve planned equipment as a structured list for the currently planned surgery in the current OR. The equipment list would need to be mapped to the classes used in the Object Detection model.
- Live camera feed from surgery camera.
- Live inference on an NVIDIA Jetson Orin Nano that publishes MQTT messages with the currently detected objects.
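A sketch of the last component, publishing detections with the paho-mqtt client, might look like this; the broker address, topic and payload structure are placeholders:

import json
import time

import paho.mqtt.client as mqtt

BROKER = "broker.local"              # placeholder broker address
TOPIC = "or/room-1/detected-items"   # placeholder topic

# paho-mqtt 1.x style client; for paho-mqtt >= 2.0 pass mqtt.CallbackAPIVersion.VERSION2
client = mqtt.Client()
client.connect(BROKER, 1883)
client.loop_start()

def publish_detections(detections):
    # detections: list of {"label": ..., "confidence": ...} from the inference loop
    payload = {"timestamp": time.time(), "items": detections}
    client.publish(TOPIC, json.dumps(payload))

# Example output from the object detection model
publish_detections([{"label": "tweezers", "confidence": 0.92}, {"label": "sponge", "confidence": 0.88}])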
This project demonstrates that it is possible to create powerful object detection models trained purely on synthetic images. As shown, due to the difficult nature of the objects of interest, it would not have been feasible to implement this project with traditional manual data capture and annotation. The project also demonstrates the power of transfer learning, where existing computer vision models trained on large generic datasets can be fine-tuned for specific applications. Finally, the project demonstrates how easily an object detection model based on synthetic data can be targeted both at constrained hardware, such as a microcontroller, and at a more powerful GPU-based platform.
Synthetic image generation requires 3D models to represent the real-life counterparts. Luckily 3D assets are readily available, and advances in generative text-to-3D models are rapidly becoming usable.
Eivind Holt has worked as a developer at the Norwegian Electronic Health Record system vendor DIPS since 2004, lately as Tech Lead in research and innovation. In his spare time he experiments with the latest advances in sensor and AI tech, writes articles, hosts workshops and gives talks.