The aim of this project is to address the challenge of adapting functional near-infrared spectroscopy (fNIRS) devices for comprehensive cognitive testing. Current fNIRS devices, which include fNIRS sensors and post-processing modules, are primarily used for portable applications and therefore require low power consumption and stability. However, existing market devices are not power-efficient and are highly sensitive to motion artifacts, which makes it challenging to obtain precise and consistent results for comprehensive cognitive testing.
To overcome these limitations, we propose the use of an AMD mini PC, weighing only 0.47 kg, powered by the AMD RyzenAI engine and an AMD Radeon graphics card. This system, utilising customised artificial neural network algorithms, effectively detects noise components, especially those from motion artifacts (MA). The integrated system balances computational efficiency with the ability to accurately observe brain activity over extended periods. The objective of this project is to develop an optimal low-power solution for motion artifact removal in post-processing for fNIRS devices, ensuring longer operation times and enhanced usability for cognitive testing and potential therapeutic neuromodulation.
2. Approach
2.1 Introduction
Functional near-infrared spectroscopy is a non-invasive imaging technique employed to monitor brain activity by measuring blood flow. This technique is extensively utilised in neuroscience, clinical neurology, and personalised healthcare. However, fNIRS data is frequently contaminated by motion artifacts: unwanted signals resulting from movement of the subject or equipment during data collection. These artifacts can significantly degrade the quality of the data, complicating the accurate interpretation of results. This project introduces a novel approach that leverages both a denoising autoencoder (DAE) and a modified ResNet-50 to remove motion artifacts. The DAE and ResNet-50 models are optimised and deployed on AMD RyzenAI hardware using the AMD RyzenAI software.
2.2 Motion Artifacts
Traditional methods for removing motion artifacts from fNIRS data typically involve complex algorithms that require significant computational resources. These methods are usually performed offline, meaning that the data must be processed after it has been collected, which is not suitable for real-time applications.
3. Implementation
We used the AMD Ryzen 7940HS as a heterogeneous platform, which includes AMD's 8-core CPU and inference processing units (IPUs). These IPUs are equipped with an FPGA-based AI engine capable of handling up to four concurrent data streams, which is essential for meeting the inference demands of artificial neural networks. The workflow demonstrates quantised IPU-engine inference for a residual neural network (ResNet) and a denoising autoencoder (DAE) using a hardware/software co-design approach.
3.1 Data collection and processing
The downloaded experimental data was pre-processed using Homer3, and simulated data was generated using MNE. To create a ground-truth dataset for training and evaluation, motion artifacts were manually inserted into the data. These artifacts included random spikes and baseline shifts, which represent common types of motion artifacts encountered in fNIRS data. The data was then divided into 512x1 vectors, which served as input for neural network training, as sketched below.
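As a minimal illustration of this segmentation step (the signal length and helper name here are illustrative assumptions, not the project's exact values):
import numpy as np

def segment_channel(signal, window=512):
    # Split one channel's time series into non-overlapping 512x1 vectors,
    # discarding the trailing remainder that does not fill a full window
    n_segments = len(signal) // window
    return signal[:n_segments * window].reshape(n_segments, window)

# e.g. a hypothetical 3,000-point channel yields five 512-sample segments
segments = segment_channel(np.random.randn(3000))
print(segments.shape)  # (5, 512)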
3.2 Neural Network Training
- Dataset Preparation:
The experimental dataset consists of 1,392 resting-state fNIRS data samples, from which experts identified and selected data from 23 channels free of motion artifacts. The simulated dataset, generated using the MNE toolbox, consists of raw resting-state fNIRS data from 8 channels spanning 300 seconds, totalling 2,808 samples.
# load experimental data
import pandas as pd
import torch
from torch.utils.data import Dataset

class RealSignalDataset(Dataset):
    def __init__(self, clean_file, noisy_file):
        clean_data = pd.read_csv(clean_file, header=None)
        noisy_data = pd.read_csv(noisy_file, header=None)
        # Check the shape of the data
        print(f"Shape of clean_data: {clean_data.shape}")
        print(f"Shape of noisy_data: {noisy_data.shape}")
        # Reshape each 512-sample row to (1, 1, 512) to suit the Conv2d layers
        self.clean_signals = torch.tensor(clean_data.values, dtype=torch.float32).unsqueeze(1).unsqueeze(1)
        self.noisy_signals = torch.tensor(noisy_data.values, dtype=torch.float32).unsqueeze(1).unsqueeze(1)
        print(f"Shape of clean_signals: {self.clean_signals.shape}")
        print(f"Shape of noisy_signals: {self.noisy_signals.shape}")

    def __len__(self):
        return len(self.clean_signals)

    def __getitem__(self, idx):
        return self.noisy_signals[idx], self.clean_signals[idx]

# usage in main function
train_dataset = RealSignalDataset('./motion_free_signal.csv', './signal_with_motion_artifacts.csv')
- Dataset Labelling:
Both datasets were segmented into data fragments of 512x1 dimensions. Additionally, spike-like motion artifacts and baseline-shift motion artifacts were manually inserted into both the simulated and experimental datasets. The locations of these artifacts were randomised, as were the magnitudes of the spikes and the heights of the baseline shifts. The experimental dataset with such labelled noise, shown in the following figure, contains multiple motion artifacts in single-channel light intensity data.
The simulated dataset, shown in the following figure, contains randomly inserted motion artifacts (each sample contains at most one motion artifact); a sketch of the insertion step is given below.
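A minimal sketch of this labelling step (the amplitude range and placement logic below are illustrative assumptions; only the two artifact types and their randomisation follow the description above):
import numpy as np

def insert_motion_artifact(clean_segment, rng=np.random.default_rng()):
    # Return a copy of a 512-sample segment with one randomly placed artifact:
    # either a spike or a baseline shift, with randomised sign and magnitude
    noisy = clean_segment.copy()
    pos = rng.integers(1, len(noisy) - 1)
    amp = rng.uniform(0.5, 2.0) * np.std(noisy) * rng.choice([-1, 1])
    if rng.random() < 0.5:
        noisy[pos] += amp    # spike-like artifact at a single point
    else:
        noisy[pos:] += amp   # baseline shift from pos to the end
    return noisy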
- DAE Model:
The DAE model is defined as follows:
import torch
import torch.nn as nn

class EnhancedDenoisingAutoencoder(nn.Module):
    def __init__(self):
        super(EnhancedDenoisingAutoencoder, self).__init__()
        # Encoder: two conv blocks, each halving the 512-sample width
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2)),
            nn.ReLU(),
            nn.BatchNorm2d(32),
            nn.MaxPool2d((1, 2)),
            nn.Conv2d(32, 64, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2)),
            nn.ReLU(),
            nn.BatchNorm2d(64),
            nn.MaxPool2d((1, 2))
        )
        # Decoder: two upsampling blocks restoring the original width
        self.decoder = nn.Sequential(
            nn.Upsample(scale_factor=(1, 2), mode='nearest'),
            nn.Conv2d(64, 64, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2)),
            nn.ReLU(),
            nn.BatchNorm2d(64),
            nn.Upsample(scale_factor=(1, 2), mode='nearest'),
            nn.Conv2d(64, 32, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2)),
            nn.ReLU(),
            nn.BatchNorm2d(32),
            nn.Conv2d(32, 1, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2))
        )

    def forward(self, x):
        x = self.encoder(x)
        x = self.decoder(x)
        return x
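A quick sanity check of the shapes: each 512x1 segment enters as an (N, 1, 1, 512) tensor, the two pooling stages halve the width twice, and the two upsampling stages restore it, so the output matches the input shape.
model = EnhancedDenoisingAutoencoder()
x = torch.randn(10, 1, 1, 512)  # a batch of ten noisy 512-sample segments
y = model(x)
print(y.shape)  # torch.Size([10, 1, 1, 512]) -- same shape as the input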
The simulated dataset was utilised for training the DAE model for 50 epochs, while the experimental dataset was used for evaluation and testing, as shown below.
from torch.optim import Adam
from torch.utils.data import DataLoader, random_split

def main():
    global device
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    model = EnhancedDenoisingAutoencoder().to(device)
    # Use the simulated dataset for training and calibration
    train_dataset = RealSignalDataset('./clean_simulated.csv', './noisy_simulated.csv')
    train_size = len(train_dataset) - 500
    calibration_size = 500
    train_subset, calibration_subset = random_split(train_dataset, [train_size, calibration_size])
    train_loader = DataLoader(train_subset, batch_size=10, shuffle=True)
    # Extract calibration data
    calibration_loader = DataLoader(calibration_subset, batch_size=10)
    # Use the experimental dataset for evaluation
    test_dataset = RealSignalDataset('./clean_experimental.csv', './noisy_experimental.csv')
    test_loader = DataLoader(test_dataset, batch_size=10)
    # Inspect the model before quantisation (inspect_model defined elsewhere)
    sample_input = torch.randn(10, 1, 1, 512).to(device)
    inspect_model(model, sample_input)
    criterion = nn.MSELoss()
    optimizer = Adam(model.parameters(), lr=0.001)
    # Train the model
    model.train()
    for epoch in range(50):
        for inputs, targets in train_loader:
            inputs, targets = inputs.to(device), targets.to(device)
            optimizer.zero_grad()
            outputs = model(inputs)
            loss = criterion(outputs, targets)
            loss.backward()
            optimizer.step()
        print(f'Epoch {epoch+1}, Loss: {loss.item()}')
    # Save the trained model weights
    torch.save(model.state_dict(), 'model_state.pth')
- ResNet-50 Model
The training and evaluation datasets for ResNet-50 are identical to those used for the DAE. ResNet-50 is fine-tuned as follows:
from torchvision.models import resnet50

def train_and_export_model():
    global device
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    # Start from ImageNet weights, then adapt input/output for 1x512 signals
    model = resnet50(weights='ResNet50_Weights.IMAGENET1K_V1')
    model.conv1 = nn.Conv2d(1, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
    model.fc = nn.Linear(model.fc.in_features, 512)  # Adjust for regression output
    model = model.to(device)
    train_dataset = RealSignalDataset('./clean_simulated.csv', './noisy_simulated.csv')
    train_size = len(train_dataset) - 500
    calibration_size = 500
    train_subset, calibration_subset = random_split(train_dataset, [train_size, calibration_size])
    train_loader = DataLoader(train_subset, batch_size=32, shuffle=True)
    test_dataset = RealSignalDataset('./clean_experimental.csv', './noisy_experimental.csv')
    test_loader = DataLoader(test_dataset, batch_size=32)
    criterion = nn.MSELoss()
    optimizer = Adam(model.parameters(), lr=0.001)
    top_models = []  # Min-heap to keep the top 5 models
    model.train()
    for epoch in range(50):
        for inputs, targets in train_loader:
            inputs = inputs.to(device)    # (N, 1, 1, 512)
            targets = targets.to(device)  # (N, 1, 1, 512)
            optimizer.zero_grad()
            outputs = model(inputs).view(-1, 1, 1, 512)  # Reshape output to (N, 1, 1, 512)
            loss = criterion(outputs, targets)
            loss.backward()
            optimizer.step()
        print(f'Epoch {epoch+1}, Loss: {loss.item()}')
        # Evaluate the model on the test data (evaluate defined elsewhere)
        test_loss, test_mse, test_rmse = evaluate(model, test_loader, criterion)
        score = test_mse + test_rmse  # Combined score with 1:1 weight
        save_top_models(top_models, model, epoch, score)
        print(f'Epoch {epoch+1}, Test MSE: {test_mse}, Test RMSE: {test_rmse}, Score: {score}')
    save_models(top_models)
    print("Top models saved.")
    # Export the final model to ONNX for quantisation
    dummy_input = torch.randn(1, 1, 1, 512).to(device)
    torch.onnx.export(model, dummy_input, "model.onnx", opset_version=13, input_names=['input'], output_names=['output'])
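The evaluate helper called inside the epoch loop is not reproduced in the original listing; a minimal sketch consistent with how it is called (returning loss, MSE, and RMSE, where the first two coincide under MSELoss) could be:
import math

def evaluate(model, loader, criterion):
    # Average the MSE loss over the test set; RMSE is its square root
    model.eval()
    total_loss, n_batches = 0.0, 0
    with torch.no_grad():
        for inputs, targets in loader:
            inputs, targets = inputs.to(device), targets.to(device)
            outputs = model(inputs).view(-1, 1, 1, 512)
            total_loss += criterion(outputs, targets).item()
            n_batches += 1
    mse = total_loss / n_batches
    return mse, mse, math.sqrt(mse)  # loss and MSE coincide under MSELoss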
The ONNX model snapshots for the DAE and ResNet-50 are shown below:
3.3 AMD Ryzen AI Quantisation Settings
To take advantage of the Ryzen AI engine's capabilities, we used the AMD Ryzen AI development flow to implement quantisation techniques. The quantisation code:
import onnx
import vai_q_onnx

def quantize_model(preprocessed_model_path, output_model_path, clean_file, noisy_file):
    # Calibrate with representative signal batches, then quantise statically
    dr = ResnetCalibrationDataReader(clean_file, noisy_file, batch_size=32)
    vai_q_onnx.quantize_static(
        preprocessed_model_path,
        output_model_path,
        dr,
        quant_format=vai_q_onnx.QuantFormat.QDQ,
        calibrate_method=vai_q_onnx.PowerOfTwoMethod.MinMSE,
        activation_type=vai_q_onnx.QuantType.QUInt8,
        weight_type=vai_q_onnx.QuantType.QInt8,
        enable_dpu=True,
        extra_options={'ActivationSymmetric': True}
    )
    print('Calibrated and quantized model saved at:', output_model_path)

# Load the pre-trained model (paths defined elsewhere)
model = EnhancedDenoisingAutoencoder()
model.load_state_dict(torch.load("model_state.pth", map_location=torch.device('cpu')))
model.eval()
input_tensor = torch.randn(1, 1, 1, 512)  # Must match the model's expected input shape
export_model_to_onnx(model, input_tensor, input_model_path)
# Preprocess the ONNX model
preprocess_onnx_model(input_model_path, preprocessed_model_path)
# Quantize the preprocessed model
quantize_model(preprocessed_model_path, output_model_path, clean_file, noisy_file)
# Load the quantized model for inference
quantized_model = onnx.load(output_model_path)
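The helpers export_model_to_onnx and preprocess_onnx_model, and the ResnetCalibrationDataReader class, are not shown in the original listing; minimal sketches following standard ONNX Runtime patterns (the quant_pre_process step is an assumption about how preprocessing was done) could be:
from onnxruntime.quantization import CalibrationDataReader
from onnxruntime.quantization.shape_inference import quant_pre_process

def export_model_to_onnx(model, input_tensor, path):
    # Export the PyTorch model to ONNX with fixed input/output names
    torch.onnx.export(model, input_tensor, path, opset_version=13,
                      input_names=['input'], output_names=['output'])

def preprocess_onnx_model(input_path, output_path):
    # Shape inference and graph optimisation before static quantisation
    quant_pre_process(input_path, output_path)

class ResnetCalibrationDataReader(CalibrationDataReader):
    # Feeds batches of noisy signals to the quantiser for calibration
    def __init__(self, clean_file, noisy_file, batch_size=32):
        loader = create_dataloader(clean_file, noisy_file, batch_size=batch_size)
        self.iterator = iter([{'input': noisy.numpy()} for noisy, _ in loader])

    def get_next(self):
        return next(self.iterator, None)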
3.4 Neural Network Inference Settings
The DAE network inference code:
import onnx
import onnxruntime as ort
from pathlib import Path

# Load the quantized model for inference
quantized_model = onnx.load(output_model_path)
# providers = ['CPUExecutionProvider']
# provider_options = [{}]
providers = ['VitisAIExecutionProvider']
cache_dir = Path(__file__).parent.resolve()
provider_options = [{
    'config_file': 'vaip_config.json',
    'cacheDir': str(cache_dir),
    'cacheKey': 'denoise_cache'
}]
session = ort.InferenceSession(quantized_model.SerializeToString(), providers=providers, provider_options=provider_options)
# Create the test data loader
test_loader = create_dataloader(clean_file, noisy_file, batch_size=32)
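With the session created, inference runs batch by batch; a minimal sketch of the loop (assuming the input name 'input' set at export time):
import numpy as np

denoised_batches = []
for noisy, clean in test_loader:
    # ONNX Runtime expects NumPy inputs keyed by the exported input name
    (batch_out,) = session.run(None, {'input': noisy.numpy()})
    denoised_batches.append(batch_out)
denoised = np.concatenate(denoised_batches)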
The code sample for ResNet-50 network inference:
# Load the quantized model for inference
quantized_model = onnx.load(output_model_path)
cache_dir = Path(__file__).parent.resolve()
if use_cpu:
    providers = ['CPUExecutionProvider']
    provider_options = [{}]  # one (empty) options dict per provider
else:
    providers = ['VitisAIExecutionProvider']
    provider_options = [{
        'config_file': 'vaip_config.json',
        'cacheDir': str(cache_dir),
        'cacheKey': 'modelcachekey'
    }]
session = ort.InferenceSession(quantized_model.SerializeToString(), providers=providers, provider_options=provider_options)
# Create the test data loader
test_loader = create_dataloader(clean_file, noisy_file, batch_size=32)
3.5 Demonstration
The successful deployment of the DAE model on the IPU of the AMD Ryzen 7940HS chip, following the quantisation process, is demonstrated below.
The successful deployment of the ResNet-50 model on the IPU of the AMD Ryzen 7940HS chip, following quantisation to achieve enhanced performance and efficiency, is likewise demonstrated below.
3.6 Performance Evaluation
- Quality evaluation
The quality of motion artifact correction was evaluated using five randomly chosen samples, as shown in the plotted images (Fig. 3.5.a & Fig. 3.5.b). The input signal (blue solid line), the model's predicted output (orange dashed line), and the target (orange dotted line) indicate that the DAE model corrected three baseline-shift motion artifacts (Examples 1, 2, and 5) and two spike motion artifacts (Examples 3 and 4). The ResNet-50 model shows similar trends.
- Quantitative evaluation
To compare the predictions of the artificial neural networks against standard motion artifact removal methods, such as spline interpolation and principal component analysis (PCA), the mean squared error (MSE) was chosen as the evaluation metric. As demonstrated in the following analysis, ResNet-50 outperforms the other methods.
# Evaluate the model
mse, rmse, avg_inference_time, avg_memory_usage, outputs, targets = evaluate_model(session, test_loader)
print(f'MSE: {mse}')
print(f'RMSE: {rmse}')
print(f'Average Inference Time: {avg_inference_time} seconds')
print(f'Average Memory Usage: {avg_memory_usage / (1024 ** 2)} MB')  # Convert bytes to MB
# Show examples
plot_examples([test_loader.dataset[i][0].squeeze().numpy() for i in range(5)], outputs[:5], targets[:5])
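The evaluate_model helper is not shown in the original listing; a minimal sketch matching its return signature (psutil is assumed here for the memory measurement):
import time
import numpy as np
import psutil

def evaluate_model(session, loader):
    # Run inference over the loader, timing each batch and sampling resident
    # memory, then compute MSE/RMSE against the clean targets
    proc = psutil.Process()
    times, mems, outs, tgts = [], [], [], []
    for noisy, clean in loader:
        start = time.perf_counter()
        out = session.run(None, {'input': noisy.numpy()})[0]
        times.append(time.perf_counter() - start)
        mems.append(proc.memory_info().rss)
        outs.append(out)
        tgts.append(clean.numpy())
    outputs, targets = np.concatenate(outs), np.concatenate(tgts)
    mse = float(np.mean((outputs - targets) ** 2))
    return mse, np.sqrt(mse), np.mean(times), np.mean(mems), outputs, targets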
The table below compares inference time and memory usage.
To compare the efficiency ratios of the DAE and ResNet-50 models under various hardware configurations, a box plot was generated using CPU and IPU operational data. The CPU operates within a frequency range of 4.0 GHz to 5.2 GHz, while the IPU operates at a fixed frequency of 1.0 GHz. Load distributions were considered with ResNet-50 at 99.49% on the IPU and 0.51% on the CPU, and the DAE at 87.3% on the IPU and 12.7% on the CPU, alongside 100% CPU-usage scenarios. Efficiency ratios were calculated as inversely proportional to load and frequency, combining CPU and IPU contributions where applicable. The box plot visualises these ratios across four categories (DAE with 100% CPU, DAE with combined CPU and IPU, ResNet-50 with 100% CPU, and ResNet-50 with combined CPU and IPU), illustrating the performance impact of the different hardware setups; a sketch of the calculation follows.
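A sketch of how such ratios could be computed and plotted from the stated loads and frequencies (the sampled CPU frequencies and the exact normalisation are illustrative assumptions):
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
cpu_freq = rng.uniform(4.0, 5.2, 1000)  # GHz, sampled across the CPU's range
ipu_freq = 1.0                          # GHz, fixed

def efficiency_ratio(ipu_load, cpu_load):
    # Efficiency modelled as inversely proportional to load x frequency,
    # summing the CPU and IPU contributions where both are active
    return 1.0 / (ipu_load * ipu_freq + cpu_load * cpu_freq)

data = [
    1.0 / (1.0 * cpu_freq),            # DAE, 100% CPU
    efficiency_ratio(0.873, 0.127),    # DAE, CPU + IPU split
    1.0 / (1.0 * cpu_freq),            # ResNet-50, 100% CPU
    efficiency_ratio(0.9949, 0.0051),  # ResNet-50, CPU + IPU split
]
plt.boxplot(data, labels=['DAE CPU', 'DAE CPU+IPU', 'ResNet CPU', 'ResNet CPU+IPU'])
plt.ylabel('Relative efficiency ratio')
plt.show()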
The integration of CPU and IPU for model inference demonstrates a significant enhancement in power efficiency, achieving up to a 5.1-fold improvement compared to using the CPU alone. This substantial increase in efficiency greatly extends battery life, making it particularly advantageous for portable applications, such as portable fNIRS health monitoring systems.
4. Monte Carlo Simulation
We also used the MCX toolkit to run Monte Carlo simulations of photon propagation paths, evaluating the accuracy of the physical 3D light distribution in a biologically plausible brain model on the AMD Radeon 780M GPU.
The initial absorption coefficients, summarised in the table below, were referenced from similar studies. The number of photons used for the brain model was 1e7.
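A minimal sketch of such a simulation using the pmcx Python binding for MCX (the volume, source placement, and optical properties below are illustrative placeholders, not the project's brain model; for AMD GPUs such as the Radeon 780M, the OpenCL variant pmcxcl exposes a similar interface):
import numpy as np
import pmcx

res = pmcx.run(
    nphoton=int(1e7),                          # photon count used for the brain model
    vol=np.ones([60, 60, 60], dtype='uint8'),  # placeholder homogeneous volume
    tstart=0, tend=5e-9, tstep=5e-9,           # a single time gate
    srcpos=[30, 30, 0], srcdir=[0, 0, 1],      # source on the top surface, pointing inward
    prop=np.array([[0, 0, 1, 1],               # medium 0: background (mua, mus, g, n)
                   [0.02, 9.0, 0.89, 1.37]])   # medium 1: illustrative tissue values
)
fluence = res['flux']  # 3D fluence distribution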
The screenshot of the simulation results is shown below:
5. Conclusion
The implementation of the DAE and ResNet-50 models through AMD RyzenAI hardware/software co-design on the AMD mini PC represents a significant advancement in power efficiency for fNIRS data processing. This project combines the efficacy of deep learning with the practicality of embedded-system deployment, offering a highly efficient solution for removing motion artifacts from fNIRS data. We found the RyzenAI development flow exceptionally developer-friendly, with clear and straightforward steps from model preparation and quantisation to deployment. This work opens up the opportunity to bring fNIRS from laboratory research into the broader community, paving the way for more advanced and efficient fNIRS data processing techniques and contributing to the wider field of brain imaging and neurotechnology.