For our entry in the AMD Pervasive AI Developer Contest, we decided to explore how to combine Large Language Models (LLMs) with text-to-image (TTI) models in an interesting application. We built a game-like application in which you cooperatively work with AI to continuously generate a never-ending story that is, at the same time, illustrated with images.
Solution

Storyteller is an application that uses AI to generate an adventure fantasy story in which the user can participate in the creation and direction of the story. At the same time, the story is visualized through images.
When the user arrives at the application webpage, a new story is started. The LLM generates an output, which is displayed to the user. At the same time, an initial visualization of the story is generated and displayed.
When the LLM finishes the introduction of the story, the user can input their part of the story. The new part is added to the existing story, and the LLM is used to generate a continuation. The resulting story is again visualized with an image on the right of the page.
Then the cycle repeats: the user inputs their ideas, the LLM continues the story, and a new image is generated.
Even though the story never ends per se, exploring fantasy worlds this way is a very interesting and addictive experience.
System overview

We designed the system to be modular, so we created two servers: one for the LLM and one for the TTI model. The logic of the application and the front-end were combined in a third server.
For development we used the Python programming language. The AI model servers were deployed in the AMD Accelerator Cloud, where we used AMD Instinct MI210 accelerators. The basis for using the accelerators is the AMD ROCm software. In our case, we used the provided container based on PyTorch v2.1.2 and ROCm v6.1.2.
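As a quick sanity check inside the container, you can verify that PyTorch sees the MI210 accelerator; on ROCm builds of PyTorch, AMD GPUs are exposed through the usual torch.cuda API. This is just a minimal sketch, not part of our repository:

```python
import torch

# On ROCm builds of PyTorch, AMD GPUs are exposed through the torch.cuda API.
print("PyTorch version:", torch.__version__)
print("GPU available:", torch.cuda.is_available())

if torch.cuda.is_available():
    # Should report the AMD Instinct MI210 when run inside the ROCm container.
    print("Device:", torch.cuda.get_device_name(0))
```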
The models used were Llama3.1 8B Instruct for the LLM and the Stable Diffusion 3 medium model for TTI. These two models are among the most recent and most capable of their kind. For building inference servers around the models we used the transformers and diffusers libraries with their pipeline generation capabilities. The servers were implemented with the FastAPI library. For the Llama3.1 server we implemented a streaming API, which allows us to display the text to the user as it is being generated.
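The full servers are in our repository; purely as a sketch of the idea (the endpoint name, request schema, and generation parameters below are placeholders, and we load the model with model.generate and a TextIteratorStreamer rather than showing our exact code), a streaming LLM endpoint with transformers and FastAPI can look roughly like this:

```python
from threading import Thread

from fastapi import FastAPI
from fastapi.responses import StreamingResponse
from pydantic import BaseModel
from transformers import AutoModelForCausalLM, AutoTokenizer, TextIteratorStreamer

MODEL_ID = "meta-llama/Meta-Llama-3.1-8B-Instruct"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype="auto", device_map="auto")

app = FastAPI()


class StoryRequest(BaseModel):
    prompt: str  # the story so far, plus the user's latest idea


@app.post("/generate")  # hypothetical endpoint name
def generate(request: StoryRequest):
    inputs = tokenizer(request.prompt, return_tensors="pt").to(model.device)
    streamer = TextIteratorStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)

    # Run generation in a background thread so tokens can be streamed as they arrive.
    thread = Thread(
        target=model.generate,
        kwargs=dict(**inputs, streamer=streamer, max_new_tokens=512),
    )
    thread.start()

    # The streamer yields decoded text chunks, which FastAPI streams to the client.
    return StreamingResponse(streamer, media_type="text/plain")
```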
For the main application we used the Streamlit Python library, in which we designed both the logic and the front-end. We leveraged some of our knowledge from another project on Hackster.io, where we had already implemented asynchronous behaviour for Streamlit, which is synchronous by default. The application was deployed on our local machine. To enable connection to the model servers in the AMD Accelerator Cloud we opened SSH tunnels for the required ports.
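As an illustration only, the front-end can consume the streaming endpoint roughly as below. This is a simplified synchronous sketch (the actual app uses the asynchronous approach from our earlier project), and the URL, port, and endpoint name are placeholders that assume an SSH tunnel forwards the LLM server to a local port:

```python
import requests
import streamlit as st

# Placeholder URL: assumes an SSH tunnel forwards the LLM server to this local port.
LLM_URL = "http://localhost:8000/generate"

st.title("Storyteller")
user_idea = st.text_input("How should the story continue?")

if user_idea:
    placeholder = st.empty()
    story_so_far = st.session_state.get("story", "") + "\n" + user_idea

    # Stream the continuation chunk by chunk and update the page as it arrives.
    text = ""
    with requests.post(LLM_URL, json={"prompt": story_so_far}, stream=True) as response:
        for chunk in response.iter_content(chunk_size=None):
            text += chunk.decode("utf-8")
            placeholder.markdown(text)

    st.session_state["story"] = story_so_far + "\n" + text
```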
Below is a schematic of the end system.
The front-end is exposed on localhost. When the user connects to the webpage, the Storyteller asks the LLM to create and introduce a world for the story. Next, the LLM is used to introduce the main protagonists. In the meantime, an image is generated with the TTI model: the existing story is first condensed into an image prompt by another request to the LLM, the generated prompt is sent to the TTI model, and the resulting image is displayed.
When the LLM stops introducing the protagonists, input is enabled for the user to provide ideas on how the story continues. When the user prompt is received, a new request is made to the LLM to continue the combined story. At the same time, a prompt for image generation is created and sent to the TTI model. The generated image replaces the previous one.
Below is a schematic of the program flow.
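For the image-generation side of this flow, the TTI server wraps a diffusers pipeline in much the same way as the LLM server. The following is only a rough sketch under our own assumptions (endpoint name and generation parameters are placeholders; the model repository shown is the gated diffusers-format release of Stable Diffusion 3 medium):

```python
import io

import torch
from diffusers import StableDiffusion3Pipeline
from fastapi import FastAPI, Response
from pydantic import BaseModel

# Diffusers-format repository for Stable Diffusion 3 medium (gated on Hugging Face).
pipe = StableDiffusion3Pipeline.from_pretrained(
    "stabilityai/stable-diffusion-3-medium-diffusers", torch_dtype=torch.float16
)
pipe.to("cuda")  # the ROCm build of PyTorch maps the AMD GPU to the "cuda" device

app = FastAPI()


class ImageRequest(BaseModel):
    prompt: str  # condensed story produced by the LLM


@app.post("/image")  # hypothetical endpoint name
def generate_image(request: ImageRequest):
    image = pipe(request.prompt, num_inference_steps=28, guidance_scale=7.0).images[0]

    # Return the generated image as PNG bytes for the front-end to display.
    buffer = io.BytesIO()
    image.save(buffer, format="PNG")
    return Response(content=buffer.getvalue(), media_type="image/png")
```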
To run the Storyteller application you can follow our code repository. There you will find detailed instructions on how to run the servers and the Streamlit application in the repository README file. Here is just a short recap.
First, set up the servers. If you have access to the AMD Accelerator Cloud, all of the information on how to create the workload with the container, clone the repository, and run the servers is provided in our repository. The servers will take some time to set up the first time, as the models are downloaded. To download the models, set up a Hugging Face account and request access to both models: Llama3.1 8B Instruct and Stable Diffusion 3 medium.
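Both models are gated on Hugging Face, so the servers need to authenticate before they can download them. One way to provide the token from Python (a sketch, assuming you have created an access token in your Hugging Face account settings) is:

```python
from huggingface_hub import login

# Authenticate with a Hugging Face access token so the gated models
# (Llama3.1 8B Instruct, Stable Diffusion 3 medium) can be downloaded.
login(token="hf_...")  # replace with your own token
```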
It is also possible to install everything on other ROCm-supported hardware, or even other platforms, but you would have to install the appropriate libraries yourself.
When the servers are set up (and the SSH tunnels are opened, if you are using the AMD Accelerator Cloud), install and run the Storyteller main application as well. You will then be able to connect to it on localhost port 8501: http://localhost:8501.
Conclusion

We think that we developed an interesting application that people will find entertaining, and that they will have a lot of fun exploring new stories. The Storyteller application is a deviation from our competition proposal, but we think it is even more interesting to a broader audience.
During this project we learned a lot, especially about how to develop in a cloud environment, how to use pipelines, and about AI models in general, with a focus on inference. Once we climbed over the initial learning curve of the AMD Accelerator Cloud, we found it a very convenient environment for development, as the software is already set up for you.
For our project we have many additional ideas that we will work on in the future. We will integrate Storyteller with speech-to-text and text-to-speech capabilities to make it even more user-friendly.