This project is most suitable for beginners in GenAI, and beginner-to-intermediate Python users.
What does this project cover? Several of the key GenAI concepts discussed in the Google Cloud Kaggle 5-day intensive GenAI course: embeddings, retrieval-augmented generation (RAG), vector databases, few-shot prompting, prompt caching, and more.
Project Aim
An AI assistant for a (fake) plant shop that I developed previously, named Planted!
This project uses the Google Gemini family of LLMs so you will need to generate an API key via AI Studio. You can find detailed instructions in the docs.
1: Smart FAQ (RAG)
This part of the project uses Retrieval Augmented Generation (RAG) to provide additional context (internal policy documents) to the LLM, helping it answer customer queries accurately.
1.1 Data (generate some dummy data)
My project data includes:
- Plant shop policies
- Sample FAQ questions; these give us something to start with in our FAQ prompt cache
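As a sketch, the dummy data can be as simple as two small Python lists; the wording and structure below are illustrative, not the project's actual documents:

```python
# Hypothetical dummy data for the plant shop (illustrative content only).
PLANT_SHOP_POLICIES = [
    "Refund policy: unused plants may be returned within 30 days with a receipt.",
    "Delivery policy: orders over £50 qualify for free standard delivery.",
    "Plant guarantee: plants that die within 14 days are replaced free of charge.",
]

# Seed questions/answers to pre-populate the FAQ prompt cache.
SAMPLE_FAQS = [
    {"question": "Can I get a refund?",
     "answer": "Yes, unused plants can be returned within 30 days with a receipt."},
    {"question": "Do you offer free delivery?",
     "answer": "Orders over £50 include free standard delivery."},
]
```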
1.2 Creating the Vector Database
Initiate a vector database (e.g. ChromaDB) and assign your embedding function; this manages and calls an embedding model (e.g. text-embedding-004 from Google Gemini) for you whenever you add to or query the database.
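A minimal sketch of that wiring, assuming a ChromaDB-style callable "embedding function" and the google-genai `embed_content` call; the class name and model default are illustrative, and the Gemini client is injected so the wrapper stays testable:

```python
class GeminiEmbeddingFunction:
    """Callable that a ChromaDB-style store invokes to embed documents/queries."""

    def __init__(self, client, model: str = "text-embedding-004"):
        self.client = client  # e.g. genai.Client(api_key=...) from google-genai
        self.model = model

    def __call__(self, input: list[str]) -> list[list[float]]:
        # Embed a batch of texts and return plain lists of floats.
        response = self.client.models.embed_content(model=self.model, contents=input)
        return [e.values for e in response.embeddings]

# Usage (assumes chromadb and google-genai are installed and an API key is set):
# import chromadb
# from google import genai
# collection = chromadb.Client().get_or_create_collection(
#     name="planted_faq",
#     embedding_function=GeminiEmbeddingFunction(genai.Client()),
# )
```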
1.3 Using the vector DB in the RAG bot
The diagram below explains the RAG and prompt-caching process used in this project, re-using answers for similar queries where appropriate.
1.4 Diagram explained:
- First, the user asks their query, e.g. "Can I get a refund?"
- The model then checks whether a semantically similar question has been asked previously
- This is done by converting the query to an embedding, which is searched against the vector database for similar queries (measured using the distance between embeddings)
- This determines which route (or bucket) is triggered next!
from typing import List, Optional


def ask_planted_bot(message: str, history: Optional[List[List[str]]] = None) -> List[str]:
    """Main function to ask the Planted bot a question."""
    answer = None
    error_message = "I'm sorry, I cannot find an answer to your question in our FAQs."
    query = message
    cached_result_dict = retrieve_cached_answer(query)
    if cached_result_dict and cached_result_dict["matched_answers"]:
        if not cached_result_dict["further_investigation"]:  # Bucket 1
            top_match = cached_result_dict["matched_answers"][0]
            answer = top_match["answer"]
            cached_id = top_match.get("cache_id")
            if cached_id:
                print(f"Found a high-similarity cached answer (Bucket 1) - ID: {cached_id}")
                _increment_prompt_cache_hit_count(cached_id, "1")
        else:  # Bucket 2
            similar_questions = [item["answer"] for item in cached_result_dict["matched_answers"]]
            cached_ids_bucket2 = [item.get("cache_id") for item in cached_result_dict["matched_answers"] if item.get("cache_id")]
            print(f"Found moderately similar cached answers (Bucket 2) with IDs: {cached_ids_bucket2}, retrieving relevant docs...")
            for cached_id in cached_ids_bucket2:
                _increment_prompt_cache_hit_count(cached_id, "2")
            relevant_docs = retrieve_relevant_docs(query)
            if relevant_docs:
                answer = generate_answer(query, relevant_docs, similar_questions=similar_questions)
                print("Generated answer based on relevant docs and similar cached answers (Bucket 2)")
                add_to_prompt_cache(query, answer)
    else:  # Bucket 3
        print("No highly similar cached answer found (Bucket 3), retrieving relevant docs...")
        relevant_docs = retrieve_relevant_docs(query)
        if relevant_docs:
            answer = generate_answer(query, relevant_docs, similar_questions=None)
            print("Generated answer based on FAQ documents (Bucket 3)")
            # Add the newly generated answer to the prompt cache
            add_to_prompt_cache(query, answer)
    follow_up_message = "Is there anything else I can help you with today?"
    return [answer] if answer else [error_message, follow_up_message]
Bucket 1: A very similar cached query is found (embedding distance <= 0.75), so its answer is returned with no LLM processing needed.
Bucket 2: Somewhat similar query/queries are found (distance > 0.75 and <= 0.9), so the top X similar cached questions are returned and the top K relevant docs are retrieved from the larger document pool; the LLM uses both in answer generation (few-shot).
Bucket 3: No similar queries are found (distance > 0.9), so only the top K relevant docs are retrieved from the larger document pool.
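The three routes above can be sketched as a small distance-based router; the thresholds come from the write-up, while the function name is illustrative:

```python
# Embedding distance: lower = more similar.
BUCKET_1_MAX_DISTANCE = 0.75   # near-duplicate query: reuse the cached answer
BUCKET_2_MAX_DISTANCE = 0.90   # somewhat similar: RAG + few-shot cached Q&As

def route_query(distance: float) -> str:
    """Map the best cached-query distance to a processing bucket."""
    if distance <= BUCKET_1_MAX_DISTANCE:
        return "bucket_1"   # return cached answer directly, no LLM call
    if distance <= BUCKET_2_MAX_DISTANCE:
        return "bucket_2"   # retrieve docs + use similar cached Q&As few-shot
    return "bucket_3"       # retrieve docs only
```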
- Any newly generated answers are added to the prompt cache
- Safeguarding and formatting (e.g. removing typos) are applied to avoid adding irrelevant or badly written queries to the prompt cache
- If no relevant information is found in the documents, the user is told that we cannot answer their query at this time
- The prompt-cache "hit" count is updated whenever a Bucket 1 or Bucket 2 answer is located; this allows the FAQ questions to be sorted by the number of "hits" in the prompt cache, and the top 4 are displayed as suggestions in the chatbot interface.
1.5 Gradio Interface
Lastly, Gradio is used to spin up a clean chatbot interface:
import gradio as gr
gr.ChatInterface(
fn=ask_planted_bot,
title="Planted Intelligence",
theme="ocean",
examples=top_n_questions_list,
type="messages",
chatbot=gr.Chatbot(placeholder="<strong>Hello! Welcome to Planted, your online plant shop.</strong><br>How can I assist you today?", type="messages"),
).launch()
This provides an easy mechanism to interact with the tool!
2: Intelligent Database Querying
This tool uses function calling over a local database to get information about the plants and plant products stocked at Planted.
2.1 Create the SQLite database
Tables created:
- plants with columns: plant_id, plant_name, scientific_name, price, category, light_requirement, water_requirement, humidity_requirement, care_level, description
- plant_products with columns: product_id, product_name, description, application_plants, symptoms_addressed
⚠️ Make sure to add enough data to the tables
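A minimal schema sketch matching the columns listed above; the column types and sample row are assumptions:

```python
import sqlite3

# In the project this would be a file, e.g. "planted.db"; :memory: keeps the sketch self-contained.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE plants (
    plant_id INTEGER PRIMARY KEY,
    plant_name TEXT, scientific_name TEXT, price REAL, category TEXT,
    light_requirement TEXT, water_requirement TEXT, humidity_requirement TEXT,
    care_level TEXT, description TEXT
);
CREATE TABLE plant_products (
    product_id INTEGER PRIMARY KEY,
    product_name TEXT, description TEXT,
    application_plants TEXT, symptoms_addressed TEXT
);
""")
# Illustrative sample row -- add plenty more in practice.
conn.execute(
    "INSERT INTO plants (plant_name, price, category) VALUES (?, ?, ?)",
    ("ZZ Plant", 18.99, "Foliage"),
)
```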
2.2 Write functions which enable the LLM to interact with these
- list_tables() - retrieve the names of all tables in the database
- describe_table(table_name) - look up the table schema; returns a list of columns and corresponding data types
- execute_query(sql) - execute an SQL statement, returning the results
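Hedged sketches of the three functions over a sqlite3 connection (the in-memory connection here stands in for the database created in 2.1); the docstrings matter, because the LLM reads them when deciding which tool to call:

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # in the project: the SQLite database from 2.1

def list_tables() -> list[str]:
    """Retrieve the names of all tables in the database."""
    rows = conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'").fetchall()
    return [r[0] for r in rows]

def describe_table(table_name: str) -> list[tuple[str, str]]:
    """Look up the table schema: returns (column name, column type) pairs."""
    rows = conn.execute(f"PRAGMA table_info({table_name})").fetchall()
    return [(r[1], r[2]) for r in rows]  # row layout: (cid, name, type, ...)

def execute_query(sql: str) -> list[tuple]:
    """Execute an SQL SELECT statement, returning all result rows."""
    return conn.execute(sql).fetchall()
```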
2.3 Provide the functions as Tools to the LLM
The model can now use these functions (tools) to answer user queries, for example:
- What is the cheapest plant you sell?
- What are the light and water requirements for a ZZ Plant?
- Do you have any Pothos that cost less than £15?
- Describe the Peace Lily and its light requirements
# Functions defined above, used to interact with the database
db_tools = [list_tables, describe_table, execute_query]
instruction = """You are a helpful chatbot that can interact with an SQL database
for an online plant shop. You will take the user's questions and turn them into SQL
queries using the tools available. Once you have the information you need, you will
answer the user's question using the data returned.
Use list_tables to see what tables are present, describe_table to understand the
schema, and execute_query to issue an SQL SELECT query."""
# Start a chat with automatic function calling enabled.
database_chat = client.chats.create(
model="gemini-2.0-flash",
config=types.GenerateContentConfig(
system_instruction=instruction,
tools=db_tools,
),
)
resp = database_chat.send_message("What is the cheapest plant?")
print(f"\n{resp.text}")
💡 This example, "What is the cheapest plant?", returns the following:
- DB CALL: list_tables()
- DB CALL: describe_table(plants)
- DB CALL: execute_query(SELECT plant_name, price FROM plants ORDER BY price ASC LIMIT 1)
The cheapest plant is Tulip Bulbs (Mixed Colors - 10 Pack), which costs $9.5.
Overall, database querying capability is very powerful but careful security considerations would be needed when looking at a real-world scenario!
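As one illustrative mitigation (not part of the project), LLM-generated SQL could be screened before execution, e.g. allowing only single read-only SELECT statements; this is a rough sketch, not a complete defence:

```python
def is_safe_select(sql: str) -> bool:
    """Very rough guard: accept a single read-only SELECT, reject mutations."""
    lowered = sql.strip().lower().rstrip(";")
    forbidden = ("insert", "update", "delete", "drop", "alter", "attach", "pragma", ";")
    return lowered.startswith("select") and not any(tok in lowered for tok in forbidden)
```

A guard like this could wrap execute_query so that rejected statements return an error message to the model instead of touching the database.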
Potential Limitations (Tool 1: Smart FAQ)
- Data Quality & Coverage: Incomplete or outdated policies could restrict the bot's ability to accurately answer a wider range of customer queries.
- Embedding Model Semantic Understanding: While helpful, the embedding model's interpretation of language isn't perfect and might miss subtle differences in phrasing or context, leading to incorrect similarity matches.
- Fixed Similarity Thresholds: The current hardcoded thresholds for identifying similar queries might be too strict or lenient, potentially leading to suboptimal routing of queries to the LLM or the prompt cache.
- RAG Retrieval Relevance: The top retrieved documents from the vector database might not always contain the most precise information needed to answer the user's specific question, even if the initial semantic search is good.
- Prompt Cache Accuracy Over Time: Cached answers might become outdated due to policy changes, and the potential for irrelevant or incorrect entries could degrade the quality of future responses for similar queries.
Potential Limitations (Tool 2: Intelligent Database Querying)
- Data Dependency & Query Accuracy: The tool's accuracy depends heavily on well-structured database information and the LLM's ability to correctly translate natural language into precise SQL queries. Errors in either can lead to inaccurate results.
- Limited Implicit Relationship Understanding: The tool might struggle with user queries that require understanding indirect connections between the plants and plant_products tables beyond explicitly defined relationships.
Improve Smart FAQ (additional functionality)
- Take Human-In-The-Loop feedback on FAQs; unhelpful answers could have their "hit" score reduced
- Compare long-context / in-context learning vs current RAG approach
Combine tools into an Agent
- Agent orchestration decides how to handle query, utilising all the tools available
- For example, it can answer general questions about the shop (tool one - Smart FAQ) and also answer queries related to products stocked at Planted (tool two - Intelligent Database Querying)
Add a third tool: Plant Health Assistant
- Image Uploader: user uploads an image of their poorly plant
- Plant Identification & Symptom Detection: The Gemini API identifies the plant species and describes any visible symptoms
- If no visible symptoms: prompt the user for more info / provide general potential issues from Google Search
- (Agent Extension) Plant Health Information Retrieval: the Google Search API retrieves information on common diseases and treatments for the identified plant and symptoms
- (Agent Extension) Weather Data Integration: the Google Weather API provides current weather conditions, which are factored into the plant health analysis
- Provide treatment recommendations and potential cause explanation to user & recommend any relevant products from Planted shop
Fourth tool: Shopping Assistant (application integration)
- Provide shopping assistance e.g. discuss options before building order, and account creation/management
I will be continuing to work on this project over the coming weeks. Feel free to follow me at @pip.codes in the meantime!