Intro
I saw the link and wanted to work on a project using Arducam, the RP2040, and the W5100S. (https://github.com/Innovation4x/WIZnet-EVB-Pico-ArduCam)
We asked ChatGPT to summarize the contents of an existing UCC link and to propose an AI project that could be linked to the project above.
#Chat GPT Examples
This project is about upscaling images with AI using the W5100S-EVB-Pico and Arducam.
The project started with an interest in a project utilizing Arducam, RP2040, and W5100S. To this end, we asked ChatGPT to summarize the content of the existing UCC links and create an AI project that could be linked to the above projects. As a result, ChatGPT proposed using AI to upscale images.
As you can see, ChatGPT suggested upscaling the image using AI.
AI Model
Following ChatGPT's recommendation, I looked at a few models that can upscale images and found the Real-ESRGAN model to be the best.
https://github.com/xinntao/Real-ESRGAN
Real-ESRGAN is an AI model built on ESRGAN, which stands for Enhanced Super-Resolution Generative Adversarial Networks. It transforms low-resolution images into high-resolution images.
The model is based on the concept of Generative Adversarial Networks (GANs). A GAN consists of two neural networks, the generator and the discriminator, which compete against each other during training. The generator tries to produce fake data that resembles the real data, while the discriminator tries to tell the generated data apart from the real data. Through this competition, the generator gradually produces data closer to the real data, and the discriminator becomes better at distinguishing real from fake.
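The competition between the two networks can be written down as a pair of loss functions. The following is a minimal sketch of the standard GAN objective using only the Python standard library. It is not the loss used by ESRGAN or Real-ESRGAN, which combine an adversarial term with other terms; the function names are illustrative, and the inputs are assumed to be discriminator outputs expressed as probabilities that a sample is real.

```python
import math
from typing import Sequence


def bce(prediction: float, target: float, eps: float = 1e-12) -> float:
    """Binary cross-entropy for a single probability."""
    p = min(max(prediction, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(target * math.log(p) + (1.0 - target) * math.log(1.0 - p))


def discriminator_loss(d_real: Sequence[float], d_fake: Sequence[float]) -> float:
    """The discriminator wants to score real samples as 1 and fakes as 0."""
    real_term = sum(bce(p, 1.0) for p in d_real) / len(d_real)
    fake_term = sum(bce(p, 0.0) for p in d_fake) / len(d_fake)
    return real_term + fake_term


def generator_loss(d_fake: Sequence[float]) -> float:
    """The generator wants the discriminator to score its fakes as real (1)."""
    return sum(bce(p, 1.0) for p in d_fake) / len(d_fake)


if __name__ == "__main__":
    # A confident, correct discriminator has a low loss ...
    print(discriminator_loss(d_real=[0.9], d_fake=[0.1]))
    # ... which is exactly the situation where the generator's loss is high.
    print(generator_loss(d_fake=[0.1]))
```

Training alternates between lowering the first loss by updating the discriminator and lowering the second by updating the generator, which is the competition described above.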
ESRGAN applies this GAN concept to super-resolution image generation. In particular, ESRGAN makes several improvements over the earlier SRGAN (Super-Resolution Generative Adversarial Network). One of them is a structure called the Residual-in-Residual Dense Block (RRDB). An RRDB places densely connected blocks inside a residual block, which preserves more information and reproduces image details better.
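The data flow of an RRDB can be illustrated without a deep learning framework. The sketch below is a toy model under stated assumptions: features are plain lists of floats, each "layer" is a hypothetical stand-in for a convolution plus activation, and the residual scaling factor of 0.2 follows the value described in the ESRGAN paper. It shows only how the dense connections and the two levels of residual connections fit together, not a trainable network.

```python
from typing import Callable, List

Vector = List[float]
Layer = Callable[[Vector], Vector]  # stand-in for a convolution + activation

BETA = 0.2  # residual scaling factor (assumed from the ESRGAN paper)


def dense_block(x: Vector, layers: List[Layer]) -> Vector:
    """Dense connections: each layer sees the input and all earlier outputs."""
    features = [x]
    out = x
    for layer in layers:
        concatenated = [value for feature in features for value in feature]
        out = layer(concatenated)
        features.append(out)
    # Inner residual: add the scaled output of the last layer to the input.
    return [xi + BETA * oi for xi, oi in zip(x, out)]


def rrdb(x: Vector, blocks: List[List[Layer]]) -> Vector:
    """Residual-in-residual: chain dense blocks, then add an outer residual."""
    out = x
    for layers in blocks:
        out = dense_block(out, layers)
    return [xi + BETA * oi for xi, oi in zip(x, out)]


def make_mean_layer(width: int) -> Layer:
    """Toy layer: emits `width` copies of the mean of everything it sees."""
    def layer(inputs: Vector) -> Vector:
        mean = sum(inputs) / len(inputs)
        return [mean] * width
    return layer


if __name__ == "__main__":
    toy_blocks = [[make_mean_layer(2) for _ in range(3)] for _ in range(3)]
    print(rrdb([1.0, 2.0], toy_blocks))
```

Because every block adds its result back onto its own input, the original features always have a direct path to the output, which is the sense in which more information is preserved.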
Moreover, Real-ESRGAN integrates with GFPGAN to support face enhancement and can also restore anime images and videos. With this model, a variety of projects can enhance low-resolution images to high resolution.
The project at this link is an image captioning model built on transformers. It was trained by @ydshieh in Flax, and this is the PyTorch version of it.
The model takes an image as input and generates a caption for the image. The model uses Vision Transformer (ViT) as the encoder and GPT-2 as the decoder. The encoder processes the image and generates a sequence of image features, which are then fed into the decoder to generate the caption.
Here is sample code for using the model:
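The original sample did not survive on this page, so the following is a minimal sketch rather than the author's code. It assumes the model is the `nlpconnect/vit-gpt2-image-captioning` checkpoint on Hugging Face, which matches the description above but is not named in this post, and it assumes the `transformers`, `torch`, and `Pillow` packages are installed. The generation settings are illustrative defaults.

```python
# Assumption: this checkpoint name is inferred from the description,
# not stated in the post. Replace it with the model you actually use.
ASSUMED_CHECKPOINT = "nlpconnect/vit-gpt2-image-captioning"


def normalize_caption(text: str) -> str:
    """Collapse whitespace so the caption is a single clean line."""
    return " ".join(text.split())


def caption_image(
    image_path: str,
    checkpoint: str = ASSUMED_CHECKPOINT,
    max_length: int = 16,
    num_beams: int = 4,
) -> str:
    """Generate a caption with a ViT encoder and a GPT-2 decoder."""
    # Imported here so the helper above works without the heavy packages.
    from PIL import Image
    from transformers import (
        AutoTokenizer,
        VisionEncoderDecoderModel,
        ViTImageProcessor,
    )

    model = VisionEncoderDecoderModel.from_pretrained(checkpoint)
    processor = ViTImageProcessor.from_pretrained(checkpoint)
    tokenizer = AutoTokenizer.from_pretrained(checkpoint)

    # The encoder expects RGB pixel values as a tensor.
    image = Image.open(image_path).convert("RGB")
    pixel_values = processor(images=image, return_tensors="pt").pixel_values

    # The decoder generates caption token ids from the image features.
    output_ids = model.generate(
        pixel_values, max_length=max_length, num_beams=num_beams
    )
    caption = tokenizer.decode(output_ids[0], skip_special_tokens=True)
    return normalize_caption(caption)
```

Calling `caption_image("upscaled.png")` with a hypothetical file name downloads the checkpoint on first use and returns one caption string.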
https://www.hackster.io/louis_m/w5100s-poe-web-camera-88002f
Follow the link above to build the hardware: combine the W5100S-EVB-Pico board with the Arducam, and use CircuitPython to get the web camera working.
We used the Bundle for Version 7.x of the CircuitPython libraries, and release 1.12.15 of the Adafruit_CircuitPython_Wiznet5k library.
https://circuitpython.org/libraries
https://github.com/ArduCAM/PICO_SPI_CAM/tree/master/Python
https://github.com/adafruit/Adafruit_CircuitPython_Wiznet5k/releases/tag/1.12.15
We changed the existing streaming method to a single-frame capture method, and lowered the resolution as much as possible so that captures are quick.
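On the PC side, a single captured frame can then be fetched over the network and saved for upscaling. The sketch below is an assumption-heavy illustration: it supposes the board serves one JPEG frame per HTTP request, and the URL and file name in the usage note are hypothetical. It uses only the Python standard library and checks the JPEG start and end markers before saving.

```python
import urllib.request
from pathlib import Path

JPEG_SOI = b"\xff\xd8"  # JPEG start-of-image marker
JPEG_EOI = b"\xff\xd9"  # JPEG end-of-image marker


def looks_like_jpeg(data: bytes) -> bool:
    """Cheap sanity check: a complete JPEG starts with SOI and ends with EOI."""
    return len(data) >= 4 and data.startswith(JPEG_SOI) and data.endswith(JPEG_EOI)


def fetch_capture(url: str, timeout: float = 10.0) -> bytes:
    """Request one frame from the board and return the raw JPEG bytes."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        data = response.read()
    if not looks_like_jpeg(data):
        raise ValueError("response is not a complete JPEG frame")
    return data


def save_capture(url: str, path: str) -> Path:
    """Fetch one frame and write it to disk for the upscaling step."""
    target = Path(path)
    target.write_bytes(fetch_capture(url))
    return target
```

For example, `save_capture("http://192.168.0.50/capture", "capture.jpg")` would store one frame, where both the address and the `/capture` path are placeholders for whatever your CircuitPython code exposes.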
Curation
The rest was carried out in VS Code. The code was written in Python: we saved the images captured via Arducam, then upscaled them by a factor of four.
Here is an example of upscaling using an image of IU.
You can find the detailed code on GitHub.
https://github.com/WiznetAI/CCC_image_upscaling_esrgan_img2txt_with_GPT
Inference
I wanted to build an image-to-text example on top of the GAN project above, so I wrote some multimodal code that combines GPT with image captioning to make inferences from pictures. It should be a useful reference for extending AIoT projects.
The code uses GPT through its API to take an upscaled image, generate a name for it, and store it.
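The naming step can be sketched as follows. This is not the code from the repository: the GPT call is represented by a hypothetical `suggest_name` callable that you would implement with whichever API client you use, so no specific API is assumed here. The sketch turns the suggested name into a safe file name and renames the upscaled image without overwriting existing files.

```python
import re
from pathlib import Path
from typing import Callable


def to_safe_stem(name: str, max_length: int = 40) -> str:
    """Turn free text into a lowercase, underscore-separated file stem."""
    stem = re.sub(r"[^a-z0-9]+", "_", name.lower()).strip("_")
    return stem[:max_length].rstrip("_") or "untitled"


def store_with_generated_name(
    image_path: str,
    caption: str,
    suggest_name: Callable[[str], str],  # hypothetical wrapper around a GPT call
) -> Path:
    """Rename the image using a name suggested from its caption."""
    source = Path(image_path)
    stem = to_safe_stem(suggest_name(caption))
    target = source.with_name(stem + source.suffix)

    # Avoid overwriting an existing file by appending a counter.
    counter = 1
    while target.exists():
        target = source.with_name(f"{stem}_{counter}{source.suffix}")
        counter += 1

    return source.rename(target)
```

Passing the model's reply through `to_safe_stem` matters because a language model may return punctuation, spaces, or an empty string, none of which should reach the file system unchecked.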
Next step
While the example is of a human face, natural upscaling is possible for a variety of images used in real life, not just people.
As a next project, we are considering video upscaling, and we plan to upgrade our features by adding a function that describes the photo using an AI model that provides image-captioning.
How to
VS Code (Python 3)
You can run the tutorial Python file after setting its input and output to match your Pico setup.
!git clone https://github.com/jh941213/w5100s_image_upscaling
!git clone https://github.com/xinntao/Real-ESRGAN.git
Refer to the code above and work through it step by step. You must have a CUDA-ready PC environment!
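Once the Real-ESRGAN repository is cloned and its requirements are installed, the upscaling step can be driven from Python. The sketch below builds a command for the `inference_realesrgan.py` script that ships with Real-ESRGAN, using its `-n`, `-i`, `-o`, `-s`, and `--face_enhance` options. The directory and file names are assumptions for illustration, and this is not the tutorial file from the project repository.

```python
import subprocess
import sys
from pathlib import Path
from typing import List


def build_upscale_command(
    input_path: str,
    output_dir: str,
    repo_dir: str = "Real-ESRGAN",          # assumed clone location
    model_name: str = "RealESRGAN_x4plus",  # general-purpose 4x model
    outscale: int = 4,
    face_enhance: bool = False,
) -> List[str]:
    """Build the command line for Real-ESRGAN's inference script."""
    script = Path(repo_dir) / "inference_realesrgan.py"
    command = [
        sys.executable, str(script),
        "-n", model_name,
        "-i", str(input_path),
        "-o", str(output_dir),
        "-s", str(outscale),
    ]
    if face_enhance:
        command.append("--face_enhance")  # uses GFPGAN for faces
    return command


def run_upscale(input_path: str, output_dir: str, **options) -> None:
    """Run the upscaler and raise if the script exits with an error."""
    subprocess.run(
        build_upscale_command(input_path, output_dir, **options), check=True
    )
```

For example, `run_upscale("capture.jpg", "results")` upscales a hypothetical captured frame by a factor of four and writes the result into `results`.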
That concludes this post. Thanks!