The aim of this project is to address the challenge of adapting functional near-infrared spectroscopy (fNIRS) devices for comprehensive cognitive testing. Current fNIRS devices, which include fNIRS sensors and post-processing modules, are primarily used for portable applications and therefore require low power consumption and stability. However, existing market devices are not power-efficient and are highly sensitive to motion artifacts, which makes it challenging to obtain precise and consistent results for comprehensive cognitive testing.
To overcome these limitations, we propose the use of an AMD mini PC, weighing only 0.47 kg, powered by the AMD RyzenAI engine and an AMD Radeon graphics card. This system, utilising customised artificial neural network algorithms, effectively detects noise components, especially those from motion artifacts (MA). The integrated system balances computational efficiency with the ability to accurately observe brain activity over extended periods. The objective of this project is to develop an optimal low-power solution for motion artifact removal in post-processing for fNIRS devices, ensuring longer operation times and enhanced usability for cognitive testing and potential therapeutic neuromodulation.
2. Approach
2.1 Introduction
Functional near-infrared spectroscopy is a non-invasive imaging technique employed to monitor brain activity by measuring blood flow. This technique is extensively utilised in neuroscience, clinical neurology, and personalised healthcare. However, fNIRS data is frequently contaminated by motion artifacts: unwanted signals resulting from movement of the subject or equipment during data collection. These artifacts can significantly degrade the quality of the data, complicating the accurate interpretation of results. This project introduces a novel approach that leverages both a denoising autoencoder (DAE) and a modified ResNet-50 to remove motion artifacts. The DAE and ResNet-50 models are optimised and deployed on AMD RyzenAI hardware using the AMD RyzenAI software.
2.2 Motion Artifacts
Traditional methods for removing motion artifacts from fNIRS data typically involve complex algorithms that require significant computational resources. These methods are usually performed offline, meaning that the data must be processed after it has been collected, which is not suitable for real-time applications.
3. Implementation
We used the AMD Ryzen 7940HS as a heterogeneous platform, which includes AMD's 8-core CPU and inference processing units (IPUs). These IPUs are equipped with an FPGA-based AI engine capable of handling up to four concurrent data streams, which is essential for meeting the inference demands of artificial neural networks. The workflow demonstrates quantised IPU-engine inference for a residual neural network (ResNet) and a denoising autoencoder (DAE) using a hardware/software co-design approach.
3.1 Data collection and processing
The downloaded experimental data was pre-processed using Homer3, and simulated data was generated using MNE. To create a ground-truth dataset for training and evaluation, motion artifacts were manually inserted into the data. These artifacts included random spikes and baseline shifts, which represent common types of motion artifacts encountered in fNIRS data. The data was then divided into 512x1 vectors, which served as input for neural network training, as sketched below.
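As a minimal illustration of this segmentation step (the signal length and helper name here are illustrative assumptions, not the project's exact values):
import numpy as np

def segment_channel(signal, window=512):
    # Split one channel's time series into non-overlapping 512x1 vectors,
    # discarding the trailing remainder that does not fill a full window
    n_segments = len(signal) // window
    return signal[:n_segments * window].reshape(n_segments, window)

# e.g. a hypothetical 3,000-point channel yields five 512-sample segments
segments = segment_channel(np.random.randn(3000))
print(segments.shape)  # (5, 512)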
3.2 Neural Network Training
- Dataset Preparation:
The experimental dataset consists of 1,392 resting-state fNIRS data samples, from which experts identified and selected data from 23 channels free of motion artifacts. The simulated dataset, generated using the MNE toolbox, consists of raw resting-state fNIRS data from 8 channels spanning 300 seconds, totalling 2,808 samples.
# load experimental data
import pandas as pd
import torch
from torch.utils.data import Dataset

class RealSignalDataset(Dataset):
    def __init__(self, clean_file, noisy_file):
        clean_data = pd.read_csv(clean_file, header=None)
        noisy_data = pd.read_csv(noisy_file, header=None)
        # Check the shape of the data
        print(f"Shape of clean_data: {clean_data.shape}")
        print(f"Shape of noisy_data: {noisy_data.shape}")
        # Reshape each 512-sample row to (1, 1, 512) to suit the Conv2d layers
        self.clean_signals = torch.tensor(clean_data.values, dtype=torch.float32).unsqueeze(1).unsqueeze(1)
        self.noisy_signals = torch.tensor(noisy_data.values, dtype=torch.float32).unsqueeze(1).unsqueeze(1)
        print(f"Shape of clean_signals: {self.clean_signals.shape}")
        print(f"Shape of noisy_signals: {self.noisy_signals.shape}")

    def __len__(self):
        return len(self.clean_signals)

    def __getitem__(self, idx):
        return self.noisy_signals[idx], self.clean_signals[idx]

# usage in main function
train_dataset = RealSignalDataset('./motion_free_signal.csv', './signal_with_motion_artifacts.csv')
- Dataset Labelling:
Both datasets were segmented into data fragments of 512x1 dimensions. Additionally, spike-like motion artifacts and baseline-shift motion artifacts were manually inserted into both the simulated and experimental datasets. The locations of these artifacts were randomised, as were the magnitudes of the spikes and the heights of the baseline shifts. The experimental dataset with such labelled noise, shown in the following figure, contains multiple motion artifacts in single-channel light intensity data.
The simulated dataset, shown in the following figure, contains randomly inserted motion artifacts (each sample contains at most one motion artifact); a sketch of the insertion step is given below.
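A minimal sketch of this labelling step (the amplitude range and placement logic below are illustrative assumptions; only the two artifact types and their randomisation follow the description above):
import numpy as np

def insert_motion_artifact(clean_segment, rng=np.random.default_rng()):
    # Return a copy of a 512-sample segment with one randomly placed artifact:
    # either a spike or a baseline shift, with randomised sign and magnitude
    noisy = clean_segment.copy()
    pos = rng.integers(1, len(noisy) - 1)
    amp = rng.uniform(0.5, 2.0) * np.std(noisy) * rng.choice([-1, 1])
    if rng.random() < 0.5:
        noisy[pos] += amp    # spike-like artifact at a single point
    else:
        noisy[pos:] += amp   # baseline shift from pos to the end
    return noisy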
- DAE Model:
The DAE model is defined as follows:
import torch
import torch.nn as nn

class EnhancedDenoisingAutoencoder(nn.Module):
    def __init__(self):
        super(EnhancedDenoisingAutoencoder, self).__init__()
        # Encoder: two conv blocks, each halving the 512-sample width
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2)),
            nn.ReLU(),
            nn.BatchNorm2d(32),
            nn.MaxPool2d((1, 2)),
            nn.Conv2d(32, 64, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2)),
            nn.ReLU(),
            nn.BatchNorm2d(64),
            nn.MaxPool2d((1, 2))
        )
        # Decoder: two upsampling blocks restoring the original width
        self.decoder = nn.Sequential(
            nn.Upsample(scale_factor=(1, 2), mode='nearest'),
            nn.Conv2d(64, 64, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2)),
            nn.ReLU(),
            nn.BatchNorm2d(64),
            nn.Upsample(scale_factor=(1, 2), mode='nearest'),
            nn.Conv2d(64, 32, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2)),
            nn.ReLU(),
            nn.BatchNorm2d(32),
            nn.Conv2d(32, 1, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2))
        )

    def forward(self, x):
        x = self.encoder(x)
        x = self.decoder(x)
        return x
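A quick sanity check of the shapes: each 512x1 segment enters as an (N, 1, 1, 512) tensor, the two pooling stages halve the width twice, and the two upsampling stages restore it, so the output matches the input shape.
model = EnhancedDenoisingAutoencoder()
x = torch.randn(10, 1, 1, 512)  # a batch of ten noisy 512-sample segments
y = model(x)
print(y.shape)  # torch.Size([10, 1, 1, 512]) -- same shape as the input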
The simulated dataset was utilised for training the DAE model for 50 epochs, while the experimental dataset was used for evaluation and testing, as shown below.
from torch.optim import Adam
from torch.utils.data import DataLoader, random_split

def main():
    global device
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    model = EnhancedDenoisingAutoencoder().to(device)
    # Use the simulated dataset for training and calibration
    train_dataset = RealSignalDataset('./clean_simulated.csv', './noisy_simulated.csv')
    train_size = len(train_dataset) - 500
    calibration_size = 500
    train_subset, calibration_subset = random_split(train_dataset, [train_size, calibration_size])
    train_loader = DataLoader(train_subset, batch_size=10, shuffle=True)
    # Extract calibration data
    calibration_loader = DataLoader(calibration_subset, batch_size=10)
    # Use the experimental dataset for evaluation
    test_dataset = RealSignalDataset('./clean_experimental.csv', './noisy_experimental.csv')
    test_loader = DataLoader(test_dataset, batch_size=10)
    # Inspect the model before quantisation (inspect_model defined elsewhere)
    sample_input = torch.randn(10, 1, 1, 512).to(device)
    inspect_model(model, sample_input)
    criterion = nn.MSELoss()
    optimizer = Adam(model.parameters(), lr=0.001)
    # Train the model
    model.train()
    for epoch in range(50):
        for inputs, targets in train_loader:
            inputs, targets = inputs.to(device), targets.to(device)
            optimizer.zero_grad()
            outputs = model(inputs)
            loss = criterion(outputs, targets)
            loss.backward()
            optimizer.step()
        print(f'Epoch {epoch+1}, Loss: {loss.item()}')
    # Save the trained model weights
    torch.save(model.state_dict(), 'model_state.pth')
- ResNet-50 Model
The training and evaluation datasets for ResNet-50 are identical to those used for the DAE. ResNet-50 is fine-tuned as follows:
from torchvision.models import resnet50

def train_and_export_model():
    global device
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    # Start from ImageNet weights, then adapt input/output for 1x512 signals
    model = resnet50(weights='ResNet50_Weights.IMAGENET1K_V1')
    model.conv1 = nn.Conv2d(1, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
    model.fc = nn.Linear(model.fc.in_features, 512)  # Adjust for regression output
    model = model.to(device)
    train_dataset = RealSignalDataset('./clean_simulated.csv', './noisy_simulated.csv')
    train_size = len(train_dataset) - 500
    calibration_size = 500
    train_subset, calibration_subset = random_split(train_dataset, [train_size, calibration_size])
    train_loader = DataLoader(train_subset, batch_size=32, shuffle=True)
    test_dataset = RealSignalDataset('./clean_experimental.csv', './noisy_experimental.csv')
    test_loader = DataLoader(test_dataset, batch_size=32)
    criterion = nn.MSELoss()
    optimizer = Adam(model.parameters(), lr=0.001)
    top_models = []  # Min-heap to keep the top 5 models
    model.train()
    for epoch in range(50):
        for inputs, targets in train_loader:
            inputs = inputs.to(device)    # (N, 1, 1, 512)
            targets = targets.to(device)  # (N, 1, 1, 512)
            optimizer.zero_grad()
            outputs = model(inputs).view(-1, 1, 1, 512)  # Reshape output to (N, 1, 1, 512)
            loss = criterion(outputs, targets)
            loss.backward()
            optimizer.step()
        print(f'Epoch {epoch+1}, Loss: {loss.item()}')
        # Evaluate the model on the test data (evaluate defined elsewhere)
        test_loss, test_mse, test_rmse = evaluate(model, test_loader, criterion)
        score = test_mse + test_rmse  # Combined score with 1:1 weight
        save_top_models(top_models, model, epoch, score)
        print(f'Epoch {epoch+1}, Test MSE: {test_mse}, Test RMSE: {test_rmse}, Score: {score}')
    save_models(top_models)
    print("Top models saved.")
    # Export the final model to ONNX for quantisation
    dummy_input = torch.randn(1, 1, 1, 512).to(device)
    torch.onnx.export(model, dummy_input, "model.onnx", opset_version=13, input_names=['input'], output_names=['output'])
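The evaluate helper called inside the epoch loop is not reproduced in the original listing; a minimal sketch consistent with how it is called (returning loss, MSE, and RMSE, where the first two coincide under MSELoss) could be:
import math

def evaluate(model, loader, criterion):
    # Average the MSE loss over the test set; RMSE is its square root
    model.eval()
    total_loss, n_batches = 0.0, 0
    with torch.no_grad():
        for inputs, targets in loader:
            inputs, targets = inputs.to(device), targets.to(device)
            outputs = model(inputs).view(-1, 1, 1, 512)
            total_loss += criterion(outputs, targets).item()
            n_batches += 1
    mse = total_loss / n_batches
    return mse, mse, math.sqrt(mse)  # loss and MSE coincide under MSELoss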
The ONNX model snapshots for the DAE and ResNet-50 are shown below:
3.3 AMD Ryzen AI Quantisation Settings
To take advantage of the Ryzen AI engine's capabilities, we used the AMD Ryzen AI development flow to implement quantisation techniques. The quantisation code:
import onnx
import vai_q_onnx

def quantize_model(preprocessed_model_path, output_model_path, clean_file, noisy_file):
    # Calibrate with representative signal batches, then quantise statically
    dr = ResnetCalibrationDataReader(clean_file, noisy_file, batch_size=32)
    vai_q_onnx.quantize_static(
        preprocessed_model_path,
        output_model_path,
        dr,
        quant_format=vai_q_onnx.QuantFormat.QDQ,
        calibrate_method=vai_q_onnx.PowerOfTwoMethod.MinMSE,
        activation_type=vai_q_onnx.QuantType.QUInt8,
        weight_type=vai_q_onnx.QuantType.QInt8,
        enable_dpu=True,
        extra_options={'ActivationSymmetric': True}
    )
    print('Calibrated and quantized model saved at:', output_model_path)

# Load the pre-trained model (paths defined elsewhere)
model = EnhancedDenoisingAutoencoder()
model.load_state_dict(torch.load("model_state.pth", map_location=torch.device('cpu')))
model.eval()
input_tensor = torch.randn(1, 1, 1, 512)  # Must match the model's expected input shape
export_model_to_onnx(model, input_tensor, input_model_path)
# Preprocess the ONNX model
preprocess_onnx_model(input_model_path, preprocessed_model_path)
# Quantize the preprocessed model
quantize_model(preprocessed_model_path, output_model_path, clean_file, noisy_file)
# Load the quantized model for inference
quantized_model = onnx.load(output_model_path)
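The helpers export_model_to_onnx and preprocess_onnx_model, and the ResnetCalibrationDataReader class, are not shown in the original listing; minimal sketches following standard ONNX Runtime patterns (the quant_pre_process step is an assumption about how preprocessing was done) could be:
from onnxruntime.quantization import CalibrationDataReader
from onnxruntime.quantization.shape_inference import quant_pre_process

def export_model_to_onnx(model, input_tensor, path):
    # Export the PyTorch model to ONNX with fixed input/output names
    torch.onnx.export(model, input_tensor, path, opset_version=13,
                      input_names=['input'], output_names=['output'])

def preprocess_onnx_model(input_path, output_path):
    # Shape inference and graph optimisation before static quantisation
    quant_pre_process(input_path, output_path)

class ResnetCalibrationDataReader(CalibrationDataReader):
    # Feeds batches of noisy signals to the quantiser for calibration
    def __init__(self, clean_file, noisy_file, batch_size=32):
        loader = create_dataloader(clean_file, noisy_file, batch_size=batch_size)
        self.iterator = iter([{'input': noisy.numpy()} for noisy, _ in loader])

    def get_next(self):
        return next(self.iterator, None)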
3.4 Neural Network Inference Settings
The DAE network inference code:
import onnx
import onnxruntime as ort
from pathlib import Path

# Load the quantized model for inference
quantized_model = onnx.load(output_model_path)
# providers = ['CPUExecutionProvider']
# provider_options = [{}]
providers = ['VitisAIExecutionProvider']
cache_dir = Path(__file__).parent.resolve()
provider_options = [{
    'config_file': 'vaip_config.json',
    'cacheDir': str(cache_dir),
    'cacheKey': 'denoise_cache'
}]
session = ort.InferenceSession(quantized_model.SerializeToString(), providers=providers, provider_options=provider_options)
# Create the test data loader
test_loader = create_dataloader(clean_file, noisy_file, batch_size=32)
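With the session created, inference runs batch by batch; a minimal sketch of the loop (assuming the input name 'input' set at export time):
import numpy as np

denoised_batches = []
for noisy, clean in test_loader:
    # ONNX Runtime expects NumPy inputs keyed by the exported input name
    (batch_out,) = session.run(None, {'input': noisy.numpy()})
    denoised_batches.append(batch_out)
denoised = np.concatenate(denoised_batches)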
The code sample for ResNet-50 network inference:
# Load the quantized model for inference
quantized_model = onnx.load(output_model_path)
cache_dir = Path(__file__).parent.resolve()
if use_cpu:
    providers = ['CPUExecutionProvider']
    provider_options = [{}]  # one (empty) options dict per provider
else:
    providers = ['VitisAIExecutionProvider']
    provider_options = [{
        'config_file': 'vaip_config.json',
        'cacheDir': str(cache_dir),
        'cacheKey': 'modelcachekey'
    }]
session = ort.InferenceSession(quantized_model.SerializeToString(), providers=providers, provider_options=provider_options)
# Create the test data loader
test_loader = create_dataloader(clean_file, noisy_file, batch_size=32)
3.5 Demonstration
The successful deployment of the DAE model on the IPU of the AMD Ryzen 7940HS chip, following the quantisation process, is demonstrated below.
The successful deployment of the ResNet-50 model on the IPU of the AMD Ryzen 7940HS chip, following quantisation to achieve enhanced performance and efficiency, is likewise demonstrated below.
3.6 Performance Evaluation
- Quality evaluation
The quality of motion artifact correction was evaluated using five randomly chosen samples, as shown in the plotted images (Fig. 3.5.a & Fig. 3.5.b). The input signal (blue solid line), the model's predicted output (orange dashed line), and the target (orange dotted line) indicate that the DAE model corrected three baseline-shift motion artifacts (Examples 1, 2, and 5) and two spike motion artifacts (Examples 3 and 4). The ResNet-50 model shows similar trends.
- Quantitative evaluation
To compare the predictions of the artificial neural networks against standard motion artifact removal methods, such as spline interpolation and principal component analysis (PCA), the mean squared error (MSE) was chosen as the evaluation metric. As demonstrated in the following analysis, ResNet-50 outperforms the other methods.
# Evaluate the model
mse, rmse, avg_inference_time, avg_memory_usage, outputs, targets = evaluate_model(session, test_loader)
print(f'MSE: {mse}')
print(f'RMSE: {rmse}')
print(f'Average Inference Time: {avg_inference_time} seconds')
print(f'Average Memory Usage: {avg_memory_usage / (1024 ** 2)} MB')  # Convert bytes to MB
# Show examples
plot_examples([test_loader.dataset[i][0].squeeze().numpy() for i in range(5)], outputs[:5], targets[:5])
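The evaluate_model helper is not shown in the original listing; a minimal sketch matching its return signature (psutil is assumed here for the memory measurement):
import time
import numpy as np
import psutil

def evaluate_model(session, loader):
    # Run inference over the loader, timing each batch and sampling resident
    # memory, then compute MSE/RMSE against the clean targets
    proc = psutil.Process()
    times, mems, outs, tgts = [], [], [], []
    for noisy, clean in loader:
        start = time.perf_counter()
        out = session.run(None, {'input': noisy.numpy()})[0]
        times.append(time.perf_counter() - start)
        mems.append(proc.memory_info().rss)
        outs.append(out)
        tgts.append(clean.numpy())
    outputs, targets = np.concatenate(outs), np.concatenate(tgts)
    mse = float(np.mean((outputs - targets) ** 2))
    return mse, np.sqrt(mse), np.mean(times), np.mean(mems), outputs, targets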
The table below compares inference time and memory usage.
To compare the efficiency ratios of the DAE and ResNet-50 models under various hardware configurations, a box plot was generated using CPU and IPU operational data. The CPU operates within a frequency range of 4.0 GHz to 5.2 GHz, while the IPU operates at a fixed frequency of 1.0 GHz. Load distributions were considered with ResNet-50 at 99.49% on the IPU and 0.51% on the CPU, and the DAE at 87.3% on the IPU and 12.7% on the CPU, alongside 100% CPU-usage scenarios. Efficiency ratios were calculated as inversely proportional to load and frequency, combining CPU and IPU contributions where applicable. The box plot visualises these ratios across four categories (DAE with 100% CPU, DAE with combined CPU and IPU, ResNet-50 with 100% CPU, and ResNet-50 with combined CPU and IPU), illustrating the performance impact of the different hardware setups; a sketch of the calculation follows.
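A sketch of how such ratios could be computed and plotted from the stated loads and frequencies (the sampled CPU frequencies and the exact normalisation are illustrative assumptions):
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
cpu_freq = rng.uniform(4.0, 5.2, 1000)  # GHz, sampled across the CPU's range
ipu_freq = 1.0                          # GHz, fixed

def efficiency_ratio(ipu_load, cpu_load):
    # Efficiency modelled as inversely proportional to load x frequency,
    # summing the CPU and IPU contributions where both are active
    return 1.0 / (ipu_load * ipu_freq + cpu_load * cpu_freq)

data = [
    1.0 / (1.0 * cpu_freq),            # DAE, 100% CPU
    efficiency_ratio(0.873, 0.127),    # DAE, CPU + IPU split
    1.0 / (1.0 * cpu_freq),            # ResNet-50, 100% CPU
    efficiency_ratio(0.9949, 0.0051),  # ResNet-50, CPU + IPU split
]
plt.boxplot(data, labels=['DAE CPU', 'DAE CPU+IPU', 'ResNet CPU', 'ResNet CPU+IPU'])
plt.ylabel('Relative efficiency ratio')
plt.show()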
The integration of CPU and IPU for model inference demonstrates a significant enhancement in power efficiency, achieving up to a 5.1-fold improvement compared to using the CPU alone. This substantial increase in efficiency greatly extends battery life, making it particularly advantageous for portable applications, such as portable fNIRS health monitoring systems.
4. Monte Carlo Simulation
We also used the MCX toolkit to run Monte Carlo simulations of photon propagation paths, evaluating the accuracy of the physical 3D light distribution in a biologically plausible brain model on the AMD Radeon 780M GPU.
The initial absorption coefficients, summarised in the table below, were referenced from similar studies. The number of photons used for the brain model was 1e7.
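A minimal sketch of such a simulation using the pmcx Python binding for MCX (the volume, source placement, and optical properties below are illustrative placeholders, not the project's brain model; for AMD GPUs such as the Radeon 780M, the OpenCL variant pmcxcl exposes a similar interface):
import numpy as np
import pmcx

res = pmcx.run(
    nphoton=int(1e7),                          # photon count used for the brain model
    vol=np.ones([60, 60, 60], dtype='uint8'),  # placeholder homogeneous volume
    tstart=0, tend=5e-9, tstep=5e-9,           # a single time gate
    srcpos=[30, 30, 0], srcdir=[0, 0, 1],      # source on the top surface, pointing inward
    prop=np.array([[0, 0, 1, 1],               # medium 0: background (mua, mus, g, n)
                   [0.02, 9.0, 0.89, 1.37]])   # medium 1: illustrative tissue values
)
fluence = res['flux']  # 3D fluence distribution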
The screenshot of the simulation results is shown below:
5. Conclusion
The implementation of the DAE and ResNet-50 models through AMD RyzenAI hardware/software co-design on the AMD mini PC represents a significant advancement in power efficiency for fNIRS data processing. This project combines the efficacy of deep learning with the practicality of embedded-system deployment, offering a highly efficient solution for removing motion artifacts from fNIRS data. We found the RyzenAI development flow exceptionally developer-friendly, with clear and straightforward steps from model preparation and quantisation to deployment. This work opens up the opportunity to bring fNIRS from laboratory research into the broader community, paving the way for more advanced and efficient fNIRS data processing techniques and contributing to the wider field of brain imaging and neurotechnology.