In the visual world, images appear before our eyes as a continuous signal, but during image processing, machines can only store and present images in discrete form. Images are usually stored in machines as two-dimensional arrays, and their level of detail and precision is governed by resolution. Distortion refers to a change in the original shape (or other characteristics) of objects, images, sounds, waveforms, or other information. This project is mostly about preserving high fidelity when images are rendered at any resolution we want.

When we look at a region of an image, we tend to zoom in on it, but a low-resolution image becomes blurry when magnified: when the resolution is too low, the computer stores too few numerical values, so some pixel values (features) are missing and the computer cannot display the region in high definition, which causes a certain degree of distortion in the image.

Training an ideal neural network also requires preprocessing the images before training. This preprocessing downsamples all images in the dataset to the same size, which reduces memory consumption; the processed dataset is then batched and loaded into the network for training. However, this processing causes a dramatic drop in image fidelity.

Therefore, we use the VCK5000 to accelerate inference of a local implicit image function model that predicts RGB values at the desired image resolution. This lets us take advantage of the accelerated model when capturing pictures with a low-resolution camera or during dataset preprocessing, obtaining images of any resolution and size while preserving image fidelity and accuracy.
The steps for my project are as follows:
First, let me introduce the local implicit image function (LIIF) proposed by Yinbo Chen et al. It takes an image's pixel coordinates, pixel values, and the two-dimensional deep feature vectors computed for all coordinates of the image as input; given a query coordinate, it looks up the local deep features near that coordinate, predicts the local RGB value, and reconstructs an image at a new resolution as output.
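The query step above can be sketched as follows. This is a minimal NumPy illustration of the idea, not the authors' implementation: the encoder's feature map, the decoder MLP, and the nearest-neighbor lookup are all simplified placeholders here.

```python
import numpy as np

def liif_query(feat, coords, mlp):
    """Predict RGB values at continuous coordinates in [-1, 1].

    feat:   (H, W, C) deep feature map produced by an encoder (assumed given).
    coords: (N, 2) continuous (y, x) query coordinates in [-1, 1].
    mlp:    decoder mapping a (C + 2) vector to an RGB triple; any callable.
    """
    H, W, _ = feat.shape
    # Centers of the feature-map cells, mapped into [-1, 1]
    ys = (np.arange(H) + 0.5) / H * 2 - 1
    xs = (np.arange(W) + 0.5) / W * 2 - 1
    rgb = np.empty((len(coords), 3))
    for n, (qy, qx) in enumerate(coords):
        # Nearest latent code to the query coordinate
        i = int(np.abs(ys - qy).argmin())
        j = int(np.abs(xs - qx).argmin())
        # Relative offset from the cell center: conditioning on this offset
        # is what makes the representation continuous in resolution
        rel = np.array([qy - ys[i], qx - xs[j]])
        rgb[n] = mlp(np.concatenate([feat[i, j], rel]))
    return rgb
```

Because the query coordinates are continuous, sampling a denser coordinate grid yields a higher-resolution output from the same feature map.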
Secondly, I will introduce the operating framework of the devices and functions used in the project. I combine their functions with a low-resolution Raspberry Pi CSI camera: pictures taken by the camera are fed into the LIIF model quantized for the VCK5000, inference is accelerated on the card, and the output is a high-resolution image at any desired size. This solves the problem that low-cost, low-resolution cameras cannot capture high-resolution images, reduces the cost of high-resolution camera sensor modules, and converts images to high resolution while effectively preserving image fidelity.
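The arbitrary-resolution output stage of this pipeline can be sketched as below. The `query_fn` argument is a hypothetical hook standing in for the VCK5000-accelerated model call; here any coordinate-conditioned predictor works, and the coordinate grid is built the same way regardless of the backend.

```python
import numpy as np

def make_coord_grid(h, w):
    """Continuous (y, x) cell-center coordinates in [-1, 1] for an h x w image."""
    ys = (np.arange(h) + 0.5) / h * 2 - 1
    xs = (np.arange(w) + 0.5) / w * 2 - 1
    gy, gx = np.meshgrid(ys, xs, indexing="ij")
    return np.stack([gy, gx], axis=-1).reshape(-1, 2)

def upscale(lr_image, target_h, target_w, query_fn):
    """Render lr_image at an arbitrary target resolution.

    query_fn(image, coords) -> (N, 3) RGB values. On the VCK5000 this would
    wrap the quantized LIIF model's accelerated runner (an assumption for
    this sketch, not shown here).
    """
    coords = make_coord_grid(target_h, target_w)
    rgb = query_fn(lr_image, coords)
    return rgb.reshape(target_h, target_w, 3)
```

The key point is that `target_h` and `target_w` are free parameters: the same low-resolution capture can be rendered at any size just by changing the grid.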
Finally, I will introduce the steps for capturing and generating pictures. The camera saves a low-resolution picture after shooting. Using the self-made GUI, you enter the desired resolution into the size input box, put the image save path into the path box, and click the "Generate" button to produce a picture at any resolution you want, saved to the file path you specify.
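A minimal sketch of such a GUI is shown below, assuming tkinter is available. The `generate_fn` callback is hypothetical: it stands in for the routine that runs the accelerated model and writes the output image, which is not shown here.

```python
import os

def parse_size(text):
    """Parse a 'WIDTHxHEIGHT' string such as '1920x1080' into two integers."""
    w, h = text.lower().split("x")
    return int(w), int(h)

def build_gui(generate_fn):
    """Build the size box, path box, and Generate button described above.

    generate_fn(width, height, path) is the hypothetical hook that runs the
    VCK5000-accelerated LIIF model and saves the result to `path`.
    """
    import tkinter as tk

    root = tk.Tk()
    root.title("Arbitrary-Resolution Image Generator")

    tk.Label(root, text="Size (e.g. 1920x1080):").grid(row=0, column=0)
    size_box = tk.Entry(root)
    size_box.grid(row=0, column=1)

    tk.Label(root, text="Save path:").grid(row=1, column=0)
    path_box = tk.Entry(root)
    path_box.grid(row=1, column=1)

    def on_generate():
        w, h = parse_size(size_box.get())
        generate_fn(w, h, os.path.expanduser(path_box.get()))

    tk.Button(root, text="Generate", command=on_generate).grid(row=2, column=1)
    root.mainloop()
```

Keeping the size parsing in its own function makes it easy to validate the user's input before any inference is launched.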
In this way, I can reduce the memory the image consumes in the computer, while still letting you see the local details of the image clearly when you zoom in.