3D models are hard to make: they take a lot of time and effort, and they require skilled artists, who are in short supply.
The demand for 3D models is increasing, as game worlds become bigger and more realistic and simulations become more complex and detailed.
What if we could automate the process of creating 3D models?
Concept
What if we could generate new 3D models from existing ones? A generative AI model could learn the structure of existing 3D models: trained on a dataset of such models, it would then be able to generate new ones.
This concept already exists for 2D images, for example in Stable Diffusion or DALL-E 2; we want to apply it to 3D models.
This AI model could then be packaged into a tool that 3D artists use to generate new models in their preferred 3D modeling software. Another use case would be on-the-fly generation of 3D models in games or simulations. In both cases the model would run locally, as the required hardware is already available in these situations.
First Attempt
The datasets we used are ShapeNet and ModelNet. They contain many different classes of 3D models, but we only used the airplane class for this project.
For our model, we first tried a Variational Autoencoder (VAE) in TensorFlow. As a starting point, we used prior work from a different lecture at our university and adapted the code to work with 3D models instead of 2D images.
The results were not very good, as can be seen in the following image of a generated model:
Because we had forgotten to normalize the input data, the model was not able to learn correctly. We also didn't fully understand the VAE's loss function. This part is non-trivial: the loss is based on concepts from statistics and probability theory and has to be adapted to the specific use case.
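To illustrate what we mean by normalization, here is a minimal sketch (names and shapes are illustrative, not our exact code): each point cloud is centered on its centroid and scaled into the unit sphere before training.

```python
import numpy as np

def normalize_point_cloud(points: np.ndarray) -> np.ndarray:
    """Center a point cloud of shape (N, 3) on its centroid and scale it into the unit sphere."""
    centered = points - points.mean(axis=0)          # move the centroid to the origin
    scale = np.linalg.norm(centered, axis=1).max()   # distance of the furthest point from the origin
    return centered / scale                          # all points now lie within radius 1
```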
Second Attempt
In addition to fixing the issues from the first attempt, we also wanted to try bigger models, which we could not run in Google Colab. We had access to a new RDNA3 GPU, but it is currently not supported by TensorFlow. This was a good opportunity to learn PyTorch, which does support RDNA3.
We rewrote everything in PyTorch from scratch. To gain a better understanding of the theory behind VAEs, we watched a very good video on this topic.
We also built an ordinary autoencoder (AE) to figure out a good model architecture. VAEs and AEs are very similar, but a VAE additionally pushes its latent space towards a normal distribution. Finding a good architecture as a starting point for the VAE is easier with an AE, as it doesn't have this additional constraint.
Here we tried networks with fully connected layers and networks with convolutional layers. The convolutional networks performed better, so we decided to use them for the VAE.
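As an illustration, here is a rough PyTorch sketch of the kind of convolutional autoencoder we mean; the layer sizes and structure are simplified examples, not our exact model. The encoder applies shared 1D convolutions across the points and max-pools them into a latent vector, and the decoder maps that vector back to point coordinates with fully connected layers.

```python
import torch
import torch.nn as nn

class PointCloudAE(nn.Module):
    """Toy autoencoder for point clouds of shape (batch, 3, num_points)."""

    def __init__(self, num_points: int = 2048, latent_dim: int = 128):
        super().__init__()
        self.num_points = num_points
        # Encoder: shared 1D convolutions over the points, later pooled into a single vector.
        self.encoder = nn.Sequential(
            nn.Conv1d(3, 64, 1), nn.ReLU(),
            nn.Conv1d(64, 128, 1), nn.ReLU(),
            nn.Conv1d(128, latent_dim, 1),
        )
        # Decoder: fully connected layers that output num_points * 3 coordinates.
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 256), nn.ReLU(),
            nn.Linear(256, 512), nn.ReLU(),
            nn.Linear(512, num_points * 3),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        features = self.encoder(x)                  # (batch, latent_dim, num_points)
        latent = torch.max(features, dim=2).values  # global max-pool -> (batch, latent_dim)
        out = self.decoder(latent)                  # (batch, num_points * 3)
        return out.view(-1, 3, self.num_points)     # back to point-cloud shape
```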
Here's an example of a reconstructed airplane model from the AE:
Using the same model architecture, we then trained a VAE. Here is an example of a generated airplane model:
The result is worse than that of the AE, but it is still recognizable as an airplane if you squint your eyes a bit ;).
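For reference, this is roughly how the AE above turns into a VAE; again a simplified sketch under our assumptions, not our exact implementation. The encoder predicts a mean and log-variance per latent dimension, a latent sample is drawn with the reparameterization trick, and the training loss adds a KL-divergence term that pushes the latent distribution towards a standard normal.

```python
import torch
import torch.nn as nn

class PointCloudVAE(nn.Module):
    """Toy VAE for point clouds of shape (batch, 3, num_points)."""

    def __init__(self, num_points: int = 2048, latent_dim: int = 128):
        super().__init__()
        self.num_points = num_points
        self.encoder = nn.Sequential(
            nn.Conv1d(3, 64, 1), nn.ReLU(),
            nn.Conv1d(64, 128, 1), nn.ReLU(),
        )
        self.to_mu = nn.Linear(128, latent_dim)      # mean of q(z|x)
        self.to_logvar = nn.Linear(128, latent_dim)  # log-variance of q(z|x)
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 256), nn.ReLU(),
            nn.Linear(256, 512), nn.ReLU(),
            nn.Linear(512, num_points * 3),
        )

    def forward(self, x: torch.Tensor):
        features = torch.max(self.encoder(x), dim=2).values      # (batch, 128)
        mu, logvar = self.to_mu(features), self.to_logvar(features)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterization trick
        recon = self.decoder(z).view(-1, 3, self.num_points)
        return recon, mu, logvar

def kl_divergence(mu: torch.Tensor, logvar: torch.Tensor) -> torch.Tensor:
    """KL(q(z|x) || N(0, I)), averaged over the batch."""
    return (-0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).sum(dim=1)).mean()

# Total training loss per batch: reconstruction loss (e.g. Chamfer distance) + weight * KL term.
```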
Conclusion
We were able to generate new 3D models with our model. The results are not great, and the model only generates very similar point clouds with little variation. One reason for this could be that the dataset is too small.
We also found a paper that ran into the same issue and solved it with a different reconstruction loss. We use the Chamfer distance, which, for each point in the original point cloud, measures the distance to the nearest point in the reconstructed point cloud. The paper proposes the earth mover's distance instead, which defines the distance between two point clouds as the minimum cost of transforming one into the other.
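As an illustration, here is a straightforward PyTorch sketch of the Chamfer distance used as a reconstruction loss; we show the common symmetric variant, which sums the nearest-neighbour distances in both directions. It builds the full pairwise distance matrix, so memory grows with N·M.

```python
import torch

def chamfer_distance(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    """Symmetric Chamfer distance between point clouds a (batch, N, 3) and b (batch, M, 3)."""
    # Pairwise squared distances between every point in a and every point in b.
    diff = a.unsqueeze(2) - b.unsqueeze(1)        # (batch, N, M, 3)
    dist = (diff ** 2).sum(dim=-1)                # (batch, N, M)
    # For each point, the squared distance to its nearest neighbour in the other cloud.
    a_to_b = dist.min(dim=2).values.mean(dim=1)   # (batch,)
    b_to_a = dist.min(dim=1).values.mean(dim=1)   # (batch,)
    return (a_to_b + b_to_a).mean()
```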