Voice technology has many applications in the modern world. Consumers have come to expect voice interactions in smartphones, home devices, and services. In addition, projects that use AI to learn the voices of popular singers and recreate their songs have recently attracted attention. Based on these trends, we have started a new project that combines a WIZnet IoT speaker with AI TTS and singer-voice recreation technology.
*The following is the original description, not mine. I used it because it is more descriptive.*
Organize the requirements to achieve the goal and define and schedule detailed tasks based on them.
- Utilize the high-speed characteristics of the W5300 chip.
- Be able to deliver audio data in real time.
- It should be able to output data from the network to the speaker.
- It should be possible to check necessary information such as IP using an information display device such as an LCD.
- Build our own board and share the design data.
- Audio data can be sent from a PC or from another identical board (a rough streaming sketch follows this list).
- The development environment must be multi-platform (Windows/Mac/Linux).
- Control the W5300 by reading the datasheet and building a register-level library by hand.
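To make the real-time requirement above concrete, here is a minimal sketch of what sending audio from a PC could look like. Everything in it is an assumption for illustration, not the project's actual protocol: the board's IP and port, the raw 16-bit PCM framing, and the chunk-based pacing.

```python
# Minimal sketch of real-time audio streaming from a PC to the board.
# Assumptions (illustrative only, not the project's actual protocol):
# the board listens on TCP port 5000 and accepts raw PCM frames.
import socket
import time
import wave

BOARD_ADDR = ("192.168.0.100", 5000)  # hypothetical board IP/port
CHUNK_FRAMES = 1024                   # frames sent per packet

with wave.open("test.wav", "rb") as wav:
    rate = wav.getframerate()
    with socket.create_connection(BOARD_ADDR) as sock:
        while True:
            data = wav.readframes(CHUNK_FRAMES)
            if not data:
                break
            sock.sendall(data)
            # Pace the sender so audio arrives at roughly playback speed.
            time.sleep(CHUNK_FRAMES / rate)
```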
As you progress through the project, feel free to draw out and reflect on things to consider.
- I make my own boards and enjoy the process.
- Create additional I/O on the board beyond the basic functions so that it can be utilized as a development board.
- How do you want to send audio data from the PC?
- If you're creating a PC program, should it be a GUI? (A minimal PySide6 sketch follows this list.)
- Do I need a microphone sensor, and if so, should I mount it on the board?
- Can we send video data as well?
- Could you send motion data to control a remote robot?
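On the GUI question above: a minimal PySide6 sketch of the kind of program this could become. The window title and the send_audio() placeholder are mine for illustration; the project's actual GUI is in the linked GitHub repository.

```python
# Minimal PySide6 sketch: a window with a single Play button that
# would trigger the audio sender. send_audio() is a placeholder.
import sys

from PySide6.QtWidgets import QApplication, QMainWindow, QPushButton


def send_audio():
    print("TODO: stream the selected file to the board here")


app = QApplication(sys.argv)
window = QMainWindow()
window.setWindowTitle("IoT Speaker Sender")  # hypothetical title

button = QPushButton("Play")
button.clicked.connect(send_audio)  # start streaming on click
window.setCentralWidget(button)

window.show()
sys.exit(app.exec())
```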
PCB Design
- Schematic and artwork online with EasyEDA
- SMT directly through JLCPCB
Build Environments
- CMake, GCC, Make
Editing Tools
- Visual Studio Code
Libraries
STM32CubeH7
- MCU library provided by ST
LittleFS
- A file system for embedded systems with fail-safe features
LVGL
- Free GUI library for embedded use
WIZnet ioLibrary_Driver
- dhcp, sntp, socket
MCU code generator
- STM32CubeMX
PySide6
- Python bindings for Qt6
MinGW GCC
- For creating PC programs to update firmware
As the first-place project, the implementation process was documented in YouTube videos and related blog posts, and the hardware, software, and GUI code are organized on GitHub. Here I would like to introduce how our chip was used in the project, and the principles and code behind the implementation.
ToE Contest 1st project by chobaram
AI Solution
- MP3 file of the original song
- CUDA GPU or Colab Pro
- pretrained artist voice
- ChatGPT
- BandLab
- GAUDIO Studio
Reference: the RVC v2 Colab documentation
Making
The first step is to download an MP3 file of the music you want and split it into instrumental and vocal tracks in GAUDIO Studio.
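If you prefer a scriptable alternative to GAUDIO Studio for this separation step, the same split can be done locally; a minimal sketch, assuming the open-source Demucs separator (pip install demucs) is installed:

```python
# Sketch: split a song into vocal and accompaniment stems with Demucs.
# This is an alternative to GAUDIO Studio, not the author's workflow.
import subprocess

subprocess.run(
    ["python", "-m", "demucs", "--two-stems", "vocals", "song.mp3"],
    check=True,
)
# Stems are written under ./separated/<model>/song/ as
# vocals.wav and no_vocals.wav.
```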
If you go to the RVC v2 docs and read through them, there are many different hyperparameter values and many variants of the source code. The authors and others have already run extensive parameter-tuning experiments, so it is best to use the default values.
Find a download link for a trained model of the singer you want, either on a model hub with many AI models, like Hugging Face, or through a community.
You can do this by running the Colab code in the RVC v2 docs straight through.
Code
```python
# Model load: derive a local file name and path for the model zip
# from its download URL.
from urllib.parse import urlparse

url = 'https://huggingface.co/leeloli/rosiesoft/resolve/main/rosiesoft.zip' #@param {type:"string"}
model_zip = urlparse(url).path.split('/')[-2] + '.zip'
model_zip_path = '/content/zips/' + model_zip
```
Find the trained model of the singer you want and put the link to the model's location into `url`.
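In the notebook, the cells that follow this one fetch and unpack the zip. A rough sketch of that step in plain Python (not the notebook's exact code, reusing `url` and `model_zip_path` from the cell above):

```python
# Rough sketch of the follow-up step: download the model zip to
# model_zip_path and extract it for the UI to pick up.
import os
import urllib.request
import zipfile

os.makedirs('/content/zips/', exist_ok=True)
urllib.request.urlretrieve(url, model_zip_path)
with zipfile.ZipFile(model_zip_path) as zf:
    zf.extractall('/content/unzips/')
```

Once the model is in place, the next cell launches the UI: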
```python
import os
import time
import fileinput
from subprocess import getoutput
import sys
from IPython.utils import capture
from IPython.display import display, HTML, clear_output

%cd /content/Retrieval-based-Voice-Conversion-{reeee}UI/

#@markdown Keep this option enabled to use the simplified, easy interface.
#@markdown <br>Otherwise, it will use the advanced one that you see in the YouTube guide.
ezmode = True #@param{type:"boolean"}
#@markdown You can try using cloudflare as a tunnel instead of gradio.live if you get Connection Errors.
tunnel = "gradio" #@param ["cloudflared", "gradio"]

if ezmode:
    if tunnel == "cloudflared":
        for line in fileinput.FileInput(f'Easier{weeee}.py', inplace=True):
            if line.strip() == 'app.queue(concurrency_count=511, max_size=1022).launch(share=True, quiet=True)':
                # replace the line with the edited version
                line = f' app.queue(concurrency_count=511, max_size=1022).launch(quiet=True)\n'
            sys.stdout.write(line)
        !pkill cloudflared
        time.sleep(4)
        !nohup cloudflared tunnel --url http://localhost:7860 > /content/srv.txt 2>&1 &
        time.sleep(4)
        !grep -o 'https[^[:space:]]*\.trycloudflare.com' /content/srv.txt >/content/srvr.txt
        time.sleep(2)
        srv = getoutput('cat /content/srvr.txt')
        display(HTML('<h1>Your <span style="color:orange;">Cloudflare URL</span> is printed below! Click the link once you see "Running on local URL".</span></h1><br><h2><a href="' + srv + '" target="_blank">' + srv + '</a></h2>'))
        !rm /content/srv.txt /content/srvr.txt
    elif tunnel == "gradio":
        for line in fileinput.FileInput(f'Easier{weeee}.py', inplace=True):
            if line.strip() == 'app.queue(concurrency_count=511, max_size=1022).launch(quiet=True)':
                # replace the line with the edited version
                line = f' app.queue(concurrency_count=511, max_size=1022).launch(share=True, quiet=True)\n'
            sys.stdout.write(line)
        !python3 Easier{weeee}.py --colab --pycmd python3
else:
    !python3 infer-web.py --colab --pycmd python3
```
This code launches the Gradio UI window used to apply the loaded model to the desired target data.
As noted above, we leave the parameters at their default settings. Changing the parameters can improve quality, but it takes longer and may be more resource-intensive; the current values are based on experiments by the original authors and on what has been shared in the community.
1. In Choose model, press the refresh button several times if the model with the voice of the singer you want does not load properly.
2. The target data is the vocal file you separated in GAUDIO Studio earlier; the model selected in step 1 is applied to this target data.
3. The optional field is an octave-related pitch parameter: a shift of ±12 changes female -> male or male -> female. I tried different octave values, but it is probably better not to use it unless the song specifically needs the change. After converting, the synthesis proceeds, and the result is a vocal file.
- After the voice conversion is complete, you can download the resulting file. The parameters below are TTS-related, but after experimenting with them, the best performance came from leaving them at their default values.
- After that, the instrumental elements extracted earlier (drums and so on) must be recombined with the converted vocals in a music production program. There are various options such as Ableton, Cubase, Logic, and so on; these are easy to pick up for anyone who has used them for a while, and if you are an Apple user, GarageBand is said to offer very good value for money. You can also combine the tracks through BandLab, which runs on the web (a minimal mixdown sketch follows).
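If you would rather do the final mixdown in code than in a DAW, here is a minimal sketch using pydub (pip install pydub, with ffmpeg available); the file names are placeholders for your own stems:

```python
# Sketch: overlay the AI-converted vocals on the instrumental track
# and export the result. File names are placeholders.
from pydub import AudioSegment

instrumental = AudioSegment.from_file("instrumental.wav")
vocals = AudioSegment.from_file("converted_vocals.wav")

mix = instrumental.overlay(vocals)  # mix the two stems together
mix.export("final_song.mp3", format="mp3")
```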
Result
Connect the IoT speaker built above to Ethernet and load the AI-converted audio into the GUI program.
1. The IoT speaker first prepares for communication by setting its IP.
2. Once that is complete, press the Play button in the GUI program and the music plays.
3. You can drive the speaker in the output format you want with I2S or SAI. The red button controls the SAI volume, and the blue button controls I2S.