The main idea of the project was to address one of the main problems of modern AI systems: almost all of them require copious amounts of computational power to run. This means most people can't run a model locally on their own computer, so the models have to run on servers, which consume a lot of energy. For instance, one message to ChatGPT produces about 4.32 grams of CO2. That may seem like little, but multiplied by the millions of messages sent every day it adds up to approximately 43,200 kilograms of CO2. And the problem is not only the pollution but also the cost of operation, as ChatGPT costs around $700,000 per day to run. To solve this problem I designed a simpler AI model that uses a simplified and modified version of a transformer architecture, trained from scratch by me. The model went through many iterations and changes. After training the best iteration (model 14), the outputs are sometimes comparable to, or even better than, GPT-2, while using only a fraction of the parameters and memory: my model has 64,070,048 parameters and weighs 733 MB, whereas GPT-2 has about 1,500,000,000 parameters and weighs an estimated 6 GB. The idea behind the model was to provide a lightweight base model that developers can fine-tune for their own programs and purposes.
Model: The model uses a simplified transformer architecture with 64,070,048 parameters. The model summary looks like this:
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_3 (InputLayer)         [(None, 49)]              0
embedding_2 (Embedding)      (None, 49, 64)            1280000
transformer_block_33         (None, 49, 64)            50048
flatten_2 (Flatten)          (None, 3136)              0
dropout_34 (Dropout)         (None, 3136)              0
dense_34 (Dense)             (None, 20000)             62740000
=================================================================
Total params: 64070048 (244.41 MB)
Trainable params: 64070048 (244.41 MB)
Non-trainable params: 0 (0.00 Byte)
_________________________________________________________________
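For readers who want to reproduce the shape of this architecture, here is a minimal sketch of how a model with these layer sizes could be built in Keras. The sequence length (49), embedding size (64) and vocabulary size (20000) are taken from the summary above; the internals of the transformer block (number of heads, feed-forward width) are assumptions and may not match Mollie's exact block.

import tensorflow as tf
from tensorflow.keras import layers

VOCAB_SIZE = 20000  # size of the output Dense layer in the summary
SEQ_LEN = 49        # input shape (None, 49)
EMBED_DIM = 64      # embedding output (None, 49, 64)

# Assumed transformer block: self-attention + feed-forward with residual connections.
class TransformerBlock(layers.Layer):
    def __init__(self, embed_dim, num_heads=2, ff_dim=128, rate=0.1):
        super().__init__()
        self.att = layers.MultiHeadAttention(num_heads=num_heads, key_dim=embed_dim)
        self.ffn = tf.keras.Sequential([
            layers.Dense(ff_dim, activation="relu"),
            layers.Dense(embed_dim),
        ])
        self.norm1 = layers.LayerNormalization(epsilon=1e-6)
        self.norm2 = layers.LayerNormalization(epsilon=1e-6)
        self.drop1 = layers.Dropout(rate)
        self.drop2 = layers.Dropout(rate)

    def call(self, x, training=False):
        attn_out = self.att(x, x)
        x = self.norm1(x + self.drop1(attn_out, training=training))
        ffn_out = self.ffn(x)
        return self.norm2(x + self.drop2(ffn_out, training=training))

inputs = layers.Input(shape=(SEQ_LEN,), dtype="int32")
x = layers.Embedding(VOCAB_SIZE, EMBED_DIM)(inputs)          # 20000 * 64 = 1,280,000 params
x = TransformerBlock(EMBED_DIM)(x)
x = layers.Flatten()(x)                                      # (None, 49 * 64) = (None, 3136)
x = layers.Dropout(0.1)(x)
outputs = layers.Dense(VOCAB_SIZE, activation="softmax")(x)  # next-word probabilities

model = tf.keras.Model(inputs, outputs)
model.summary()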
Performance: The performance of the model varies depending on the prompt. It performs excellently (sometimes even better than GPT-2) on prompts that allow for a wide range of answers. On the other hand, it doesn't perform as well on very specific prompts. Fortunately, the idea of the model was to provide a pretrained base for developers to build on, and it can be fine-tuned on whatever specific datasets they need to use.
Here are some examples of the model output:
Max length: 30
Prompt: "That is" Model output: "That is a great way to keep everything you use in one place it helps reduce the time it takes to get your daily computer task done because it keeps it all"
Prompt: "Of course" Model output: "Of course i'll be starting to blog again next monday fresh and spirited up thanks for being such a great company in my alone time life's been great so far I think"
Prompt: "Do you ever" Model output: "Do you ever heard somebody say that you can’t re order tracks in your well today i’m going to show you just how to do that it is possible though it’s not easy"
Mollie vs GPT-2:
Here are two 50-word text generations side by side, made by Mollie and GPT-2 with the prompt "That is a":
GPT-2: That is a very important point. The fact that the government is trying to make it harder for people to get jobs is a very important point. "The fact that the government is trying to make it harder for people to get jobs is a
Mollie: That is a great way to keep everything you use in one place it helps reduce the time it takes to get your daily computer task done because it keeps it all in one place in a list form no more searching so to all over your is visit their website today for
In this example, the Mollie model performs much better than the GPT-2 model while using far fewer resources.
Uses for the model: This model wasn't intended to be used as a chat AI like ChatGPT. Its purpose is to make it easier for developers to create local generative AI apps that run on low computational power. For example, a video game developer could train the model on a conversational dataset made for their game and use it to drive the characters' dialogue, making the game more immersive without consuming the resources the game itself needs. A rough sketch of that fine-tuning workflow follows below.
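As an illustration of that workflow, fine-tuning the pretrained weights on a custom dialogue dataset could look something like this sketch. The file names, hyperparameters and placeholder data are hypothetical, not values from the project; a real developer would substitute their own tokenized sequences.

import numpy as np
import tensorflow as tf

# Reuse the architecture sketched earlier and load the pretrained weights
# (hypothetical file name for the released Mollie weights).
model.load_weights("mollie_model_14_weights.h5")

# Placeholder data: 49-token context windows and the index of the next word.
# In practice these come from the developer's own conversational dataset,
# tokenized with the same vocabulary used for pretraining.
X_train = np.random.randint(1, 20000, size=(1000, 49))
y_train = np.random.randint(1, 20000, size=(1000,))

model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),  # small learning rate for fine-tuning
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
)
model.fit(X_train, y_train, batch_size=128, epochs=5, validation_split=0.1)
model.save("mollie_finetuned.h5")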
Limitations and future improvements: One of the main limitations while training this model was the dataset size. Because the dataset is very big, the computer used for training had difficulty loading the entire dataset and training the model, so only a fraction of the dataset was used, and for a limited number of epochs, since the computer had an 8-hour training limit. Future improvements would be training with the entire dataset and for more epochs.
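One possible way to train on the full dataset within the same memory budget would be to stream it from disk instead of loading everything at once, for example with tf.data. The sketch below is only a suggestion along those lines, not what was actually done; the file name and preprocessing format are placeholders.

import tensorflow as tf

def parse_line(line):
    # Placeholder format: each line holds 50 space-separated token ids,
    # the first 49 being the context window and the last the target word.
    ids = tf.strings.to_number(tf.strings.split(line), out_type=tf.int32)
    return ids[:49], ids[49]

ds = (
    tf.data.TextLineDataset("dataset.txt")                 # placeholder file name
    .map(parse_line, num_parallel_calls=tf.data.AUTOTUNE)
    .shuffle(10_000)
    .batch(256)
    .prefetch(tf.data.AUTOTUNE)
)

# With streaming, more of the corpus and more epochs fit in the same 8-hour window.
model.fit(ds, epochs=20)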