Arti - an overview
Art comes in many beautiful forms, but you can only fit so many paintings on your wall. What looks interesting one day might fail to catch your attention the next. What if you could have infinitely diverse artwork generated every few minutes? What about art work based on your conversations? What if it was completely powered by AI? Welcome to this tutorial, where we will be showing you how to build Arti, your personal art-bot.
Arti will capture your conversations and create images in real time. It’s powered by many forms of Generative AI, which help it create accurate yet distinct images relatable to the conversation at hand. It’s also completely open-source, meaning that you can create your own art-bot with just a Jetson.
We created Arti to constantly generate inspirational, meaningful, and relevant works of art, and we hope you enjoy using it as much as we did. Let’s get started!
How Arti works:
Arti turns your conversations into images over three steps:
First, it starts a new recording of a conversation every 60 seconds. This means the image will change every 60 seconds - you can adjust this according to your preferences. It will save the recording as an audio file and send it to Whisper. Whisper is an open-source speech-to-text software recently released by OpenAI, with automatic speech recognition built in. It will write the audio file into text.
Next, LLaMa 2, an open-source large language model similar to ChatGPT, will translate this raw text into a prompt which is more understandable for Stable Diffusion (our image generation model, also open-source). This step will allow Stable Diffusion to generate more accurate images representing the topic of the conversation.
Finally, Stable Diffusion will receive its prompt from LLaMa 2, and generate the image. In this way, your art bot will be able to generate images based on your conversation!
We won’t be writing code in this tutorial, but you can access it at the top of this Github page. However, please read through everything first - your device may not satisfy the specifications for this project, and you may not have everything set up on your device.
Since we didn’t want our conversations to have the potential to be shared, we stored everything in a local directory, which also enabled offline access.
Examples
Following are a few of our favorite examples of the images Arti generated, along with the conversations taking place at those times.
EXAMPLE 1:
In this example, two people were discussing a diagram explaining nuclear fission.
Conversation: and this lighter isotope is less tightly bound. Wait, what? Compared to its... And this light... Wait, wait, what happened here? ...lighter isotope is less tightly bound. That's an isotope. Compared to its... And this, but roughly one in every 140, lacks three neutrons, and this lighter isotope is less tightly bound. It's not on the... Compared to its more abundant cousin, a strike by a neutron easily splits the U-235 nuclei into lighter radioactive elements called fission products, in addition to two to three neutrons, gamma rays, and a few neutrinos. During fission, some nuclear mass transforms into energy. A fraction of the newfound energy powers the fast moving neutrons. And if some of them strike uranium nuclei, fission results in a second larger generation of neutrons. So that's a little bit very complicated. It's very complicated. It's not . So there's three types of radioactivity that happen with uranium, alpha, beta, and gamma. And so the gamma first starts and then does these other two. It's more complicated than this. This video will not really explain it very well. But it starts a nuclear reaction. The gamma starts, you know, things that happen. Yeah, but maybe I can more actually show what it looks like. If you want actual uranium? Like this. Well, you want to know the nuclear... No, like this. You want to know the uranium really actually, how it works. Uranium decay, okay? That's what it is. That's if you really want to... Don't take each of your autonomous. This is how uranium becomes the chain reaction. It's a physics one, but it might be complicated. I don't know, just go back for one second. Yeah, this one. This is fission. There's something called fission and fusion in nuclear reactions. Do you want to know? Let's try this one, and if it's too hard. Yeah, but there's another one. Actually, how it works. Do you want to know? Let's try this one if it's too hard. Yeah, but there's another one. Actually how it works. Not like, hmm, but the thing is. Like actually how it works. But do you want to know the reaction of the radio? No, it just goes back to normal. So this new Wi-Fi camera is taking the US by storm. This brand new subscription. Electrobel has seven nuclear power plants, four in Doule and three in Tiange, covering half of the electricity consumption in Belgium without producing CO2. But how exactly does a nuclear power plant work? A nuclear power plant works to a large extent exactly does a nuclear power plant work? A nuclear power plant works to a large extent like a conventional thermal power plant. Water is converted into steam which drives a turbine connected to a generator. This generator converts the mechanical energy into electrical energy. The only difference is that the heat which converts water into steam is produced by nuclear fission and not by burning coal, natural gas or biomass. The nuclear power plants of Doubs and Thiages use fissile uranium.
LLaMa's Interpretation: A nuclear power plant with a glowing core and steam rising from its cooling towers, surrounded by a futuristic cityscape with sleek skyscrapers and neon lights.
Image generated:
EXAMPLE 2:
In this example Whisper didn't interpret all the audio correctly - for example, "San Diego" was its interpretation of an alarm going off on a phone - which was one of the reasons why we chose to use LLaMa 2.
Conversation: This is what San Diego San Diego San Diego San Diego San Diego San Diego San Diego San Diego San Diego San Diego San Diego San Diego San Diego I'm sorry. Where? I want to. Dad? Yeah? Mom needs you for pancakes. In the middle of something, man. Do I have to? It's fine. They don't want pancakes. That's like, I don't want to. Mom. Mom, I'm in the middle of something with Maya. Look it's raining outside. I don't want to do pancakes right now. Mom. Mom. Mom. Mom. Mom. Mom. Mom. Mom. Mom. Mom. Mom. Mom. Mom. Mom. Mom. Mom. Mom. I don't want to do pancakes right now. Thank you. I'm sorry. . . . . . . . . . . . . . . . . . Remember, because he's doing the virtual environment.
LLaMa's Interpretation: A young boy, surrounded by rain and fog, reluctantly helps his mom make pancakes while using a virtual reality headset.
Image generated:
EXAMPLE 3:
In this example a video about training autonomous cars was being played.
Conversation: not reversing the vehicle, and not requiring online calculations. Additionally, where all previous methods have used low constant speeds, our method uses variable speeds up to 6 m per second and ensures the vehicle remains within the friction limit. We presented a method of safe learning that reformulates reinforcement learning to incorporate the supervisor. The safe learning method was evaluated in the F-10-1-10 simulator at speeds of up to 6 meters per second. The results showed that safe learning presents a 5x or 5 times improvement in sample efficiency, requiring only 10,000 steps. The supervisor and the learning formulation effectively train the agent to not require supervision. The safe learning agents select lower speed profiles than the conventional learning agents. This results in the safe learning agents achieving slower lap times and higher success rates. A major advantage of our methods is that the vehicles never crash during training. Future work should use this method to train agents on board physical vehicles. The ability to train agents for high performance robotic control while ensuring safety during the training process means that these methods can be used to train deep reinforcement learning agents on real world robots, thus bypassing the sim to real problem. Future work should evaluate how well safe learning uses the supervisor performs using the supervisor performs on real-world high-performance platforms. Bypassing a simple gap will mean that there is no difference between the training and testing behavior since both will be on the same physical device. The improvement in sample efficiency means that it is easier to use deep reinforcement learning since training time is reduced. Training more conservative policies leads to safer solutions which are essential for
LLaMa's Interpretation: Generate an image of a high-performance robotic vehicle navigating a challenging track while ensuring safety during training.
Image generated:
Video
And now, a short video to demonstrate Arti in action!
Conclusion
We hope you enjoy using Arti! Remember that it’s not perfectly accurate, but you may find the implications entertaining.
Thank you and happy generating!
Comments