In today's digital age, platforms like Google Maps provide a wealth of information about local businesses, but they often fall short when it comes to specific amenities, image-based queries, or personalized recommendations. Enter our advanced location-based recommendation system, designed to bridge this gap and offer users a more comprehensive and tailored experience while minimizing the effort required to discover specific information on the web.
Imagine asking, "When does Dr. Smith's dental office open on Wednesday?", "Does the Oceanview Hotel actually have rooms with a sea view?", or "Does Majestic Security provide a transportation service?" Our system goes beyond static business information, tapping into a rich database of business details, user reviews, and images to provide accurate, personalized answers along with relevant images.
But we don't stop there. By analyzing your past dining experiences and preferences, our system can offer restaurant recommendations that truly resonate with your taste: "Based on your love for spicy Thai cuisine and cozy atmospheres, we think you'll enjoy the Pad Thai at the House of Thai restaurant."
In this post, we'll explore the development of a cutting-edge personalized location-based recommendation system. This system leverages advanced natural language processing techniques, including Retrieval-Augmented Generation (RAG) and fine-tuned Large Language Models (LLMs), to provide users with highly tailored location recommendations based on their queries and past experiences.
The code for the project, including RAFT dataset generation, fine-tuning, and the UI, can be found here.
Our recommendation and Q&A system consists of several key components:
- FastAPI Backend: Handles data processing, retrieval, and integration with the LLM.
- Vector Store: Utilizes Chroma DB for efficient similarity search.
- Embedding Model: Employs HuggingFace's BAAI/bge-small-en-v1.5 for text embeddings.
- Large Language Model: Uses a fine-tuned LLaMA3-8B model for natural language understanding and generation.
- Image Embedding and Matching: Incorporates image-based search for enhanced recommendations using CLIP's features.
- Streamlit Frontend: Provides an intuitive user interface for interacting with the system.
Packages Required
- ROCm
- PyTorch
- fastapi
- streamlit
- llama_index
- llama-index-embeddings-huggingface
- ollama
- chromadb
- datasets
- CLIP
- transformers
- unsloth
Each business's metadata follows this JSON structure:
{
  'name': 'Walgreens Pharmacy',
  'address': 'Walgreens Pharmacy, 124 E North St, Kendallville, IN 46755',
  'gmap_id': '0x881614ce7c13acbb:0x5c7b18bbf6ec4f7e',
  'description': 'Department of the Walgreens chain providing prescription medications & other health-related items.',
  'latitude': 41.451859999999996,
  'longitude': -85.2666757,
  'category': ['Pharmacy'],
  'avg_rating': 4.2,
  'num_of_reviews': 5,
  'price': '$$',
  'hours': [['Thursday', '8AM–1:30PM'], ['Friday', '8AM–1:30PM'], ['Saturday', '9AM–1:30PM'], ['Sunday', '10AM–1:30PM'], ['Monday', '8AM–1:30PM'], ['Tuesday', '8AM–1:30PM'], ['Wednesday', '8AM–1:30PM']],
  'MISC': {
    'Service options': ['Curbside pickup', 'Drive-through', 'In-store pickup', 'In-store shopping'],
    'Health & safety': ['Mask required', 'Staff wear masks', 'Staff get temperature checks'],
    'Accessibility': ['Wheelchair accessible entrance', 'Wheelchair accessible parking lot'],
    'Planning': ['Quick visit'],
    'Payments': ['Checks', 'Debit cards']
  },
  'state': 'Closes soon ⋅ 1:30PM ⋅ Reopens 2PM',
  'relative_results': ['0x881614cd49e4fa33:0x2d507c24ff4f1c74', '0x8816145bf5141c89:0x535c1d605109f94b', '0x881614cda24cc591:0xca426e3a9b826432', '0x88162894d98b91ef:0xd139b34de70d3e03', '0x881615400b5e57f9:0xc56d17dbe420a67f'],
  'url': 'https://www.google.com/maps/place//data=!4m2!3m1!1s0x881614ce7c13acbb:0x5c7b18bbf6ec4f7e?authuser=-1&hl=en&gl=us'
}
and each user review follows this JSON structure:
{
  'user_id': '101463350189962023774',
  'name': 'Jordan Adams',
  'time': 1627750414677,
  'rating': 5,
  'text': 'Cool place, great people, awesome dentist!',
  'pics': [{'url': ['https://lh5.googleusercontent.com/p/AF1QipNq2nZC5TH4_M7h5xRAd61hoTgvY1o9lozABguI=w150-h150-k-no-p']}],
  'resp': {'time': 1628455067818, 'text': 'Thank you for your five-star review! -Dr. Blake'},
  'gmap_id': '0x87ec2394c2cd9d2d:0xd1119cfbee0da6f3'
}
Given our system's hardware constraints, we used the data for Indiana, a US state with 100,391 businesses and 7,638,803 reviews.

Key Features
1. Personalized Recommendations
The system takes into account the user's past reviews and preferences when generating recommendations (a short sketch follows this list). This personalization is achieved by:
- Storing and retrieving user-specific review data
- Incorporating user reviews into the context provided to the LLM
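A minimal sketch of the second point, folding a user's stored reviews into the LLM context; the helper name and the reviews_by_user mapping are our own assumptions, not the project's actual API:

def build_user_context(user_id, business_context, reviews_by_user):
    # Pull up to five of the user's stored reviews (assumed keyed by user_id)
    past_reviews = reviews_by_user.get(user_id, [])[:5]
    review_lines = "\n".join(
        f"- {r['text']} (rating: {r['rating']})" for r in past_reviews
    )
    # Append them to the business context handed to the LLM
    return f"{business_context}\n\nUser's past reviews:\n{review_lines}"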
2. Multi-modal Search
Our system goes beyond text-based search by incorporating image data.
- Stores and retrieves business images
- Performs image-text matching to find visually relevant results
- Presents top-matching images alongside text recommendations
3. Conversational Interface
The Streamlit frontend provides a chat-like interface, allowing users to:
- Ask natural language queries
- Receive detailed responses from the LLM
- View relevant images and business details
- Engage in follow-up questions for deeper exploration
4. Efficient Data Retrieval
The system employs vector search techniques for fast and accurate retrieval.
- Uses Chroma DB as a vector store for both text and image embeddings
- Implements semantic search using the BAAI/bge-small-en-v1.5 embedding model
- Multi-modal retrieval using CLIP embeddings for queries and images
1. Data Processing and Storage
The system processes and stores several types of data, including:
1. Business metadata (name, address, GPS coordinates, etc.)
2. User reviews
3. Business images
This data is loaded from gzipped JSON files and stored in memory for quick access. Vector embeddings of business names and addresses are created and stored in a Chroma DB collection for efficient similarity search.
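As a minimal sketch of this loading step (the metadata filename appears in the RAFT section below; the review filename follows the dataset's naming convention and is our assumption):

import gzip
import json

def load_jsonl_gz(path):
    # Each line in the Google Local dump is a single JSON record
    with gzip.open(path, "rt", encoding="utf-8") as f:
        return [json.loads(line) for line in f]

businesses = load_jsonl_gz("datasets/indiana/meta-Indiana.json.gz")
reviews = load_jsonl_gz("datasets/indiana/review-Indiana.json.gz")  # assumed filename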
2. Query Processing Pipeline
When a user submits a query, the system follows these steps (a condensed sketch follows the list):
- Performs a vector similarity search to find relevant businesses
- Retrieves associated metadata, user reviews, and images
- Conducts image-text matching if images are available
- Generates a context string containing relevant information
- Sends the context and user query to the fine-tuned LLM
- Processes the LLM's response and returns it to the user
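Condensed into Python, the pipeline looks roughly like this; the helper names and lookup dicts are illustrative, not the project's actual code:

def answer_query(user_id, query):
    # 1. Vector similarity search over business name/address embeddings
    hits = vector_index.as_retriever(similarity_top_k=5).retrieve(query)

    # 2. Gather each business's metadata and the user's own reviews
    context_parts = []
    for hit in hits:
        gmap_id = hit.metadata["gmap_id"]
        context_parts.append(
            format_context(metadata_by_id[gmap_id], reviews_by_user.get(user_id, []))
        )

    # 3. Image-text matching over the stored CLIP embeddings, if any
    top_images = image_text_matching(query, image_collection)

    # 4-6. Build the context string, query the fine-tuned LLM, return the result
    context = "\n\n".join(context_parts)
    answer = llm.complete(f"Context: {context}\nQuestion: {query}")
    return {"response": str(answer), "images": top_images}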
3. Image Search and Matching
The system incorporates image search capabilities.
- Image embeddings are generated and stored in a separate Chroma DB collection
- The image_text_matching function finds images that best match the user's query
- Top-matching images are displayed alongside text recommendations
In this section, we'll dive into the process of fine-tuning a Large Language Model (LLM) using the RAFT dataset and integrating it with a Retrieval-Augmented Generation (RAG) technique. We'll be using the Meta-Llama-3-8B-Instruct model as our base model and fine-tuning it for improved performance on our specific task.
1. RAFT Dataset Generation
Before diving into the fine-tuning process, it's important to understand how the RAFT (Retrieval Augmented Fine-Tuning) dataset was generated. The dataset generation process uses a raft.py script with several configurable parameters:
- datapath: Path to the source document (default: "./datasets/indiana/meta-Indiana.json.gz")
- output: Path to save the generated dataset (default: "./gen_data/")
- output-format: Format of the output dataset (default: "hf" for HuggingFace)
- distractors: Number of distractor documents per data point (default: 3)
- p: Probability of including the oracle document in the context (default: 1.0)
- questions: Number of data points to generate per chunk document (default: 5)
- completion_model: Model used to generate questions and answers (default: "llama3")
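Assuming raft.py exposes these parameters as command-line flags (the post lists the parameters but not the exact invocation), a run might look like:

python raft.py \
  --datapath ./datasets/indiana/meta-Indiana.json.gz \
  --output ./gen_data/ \
  --output-format hf \
  --distractors 3 \
  --p 1.0 \
  --questions 5 \
  --completion_model llama3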
The generated dataset will have the following structure:

'question': 'How would you rate the overall quality of service provided by Majestic Security based on user reviews?'
'oracle_context': {"name": "Majestic Security", "address": "Majestic Security, 3128 Lexington Park Dr, Elkhart, IN 46514", "gmap_id": "0x8816c4b2fb8fb6a1:0x80451636e10ca83f", "description": null, "latitude": 41.6899261, "longitude": -86.02416989999999, "category": ["Security guard service", "Business to business service", "Public safety office", "Security service", "Training centre", "Training school", "Transportation escort service"], "avg_rating": 4.3, "num_of_reviews": 48, "price": null, "hours": [["Thursday", "9AM\\u20135PM"], ["Friday", "9AM\\u20135PM"], ["Saturday", "Closed"], ["Sunday", "Closed"], ["Monday", "9AM\\u20135PM"], ["Tuesday", "9AM\\u20135PM"], ["Wednesday", "9AM\\u20135PM"]], "MISC": null, "state": "Open \\u22c5 Closes 5PM", "relative_results": ["0x8816e8092cc37eff:0xa138075153591bc7", "0x8816ce61cc404e23:0x71a5e9e0898036a4", "0x8816e9eb8afbc539:0x7d7ee677df3fafa3", "0x8816cd46eed45c35:0x7d80db2d3b489fc3", "0x8816ebe5c65cf3ad:0xe8dede77091f4ecf"], "url": "https://www.google.com/maps/place//data=!4m2!3m1!1s0x8816c4b2fb8fb6a1:0x80451636e10ca83f?authuser=-1&hl=en&gl=us"}'
'cot_answer': 'To rate the overall quality of service provided by Majestic Security based on user reviews, I can analyze the average rating (4.3) and the number of reviews (48). This suggests that a significant portion of customers have had positive experiences with the company.\n\n##begin_quote##The high average rating indicates a strong level of satisfaction among users.##end_quote##\n\nTaking this into account, I would rate the overall quality of service provided by Majestic Security\n\n<ANSWER>: Excellent'
2. Model and Dataset Preparation
First, we set up our base model and tokenizer:
from transformers import AutoModelForCausalLM, AutoTokenizer

base_model = AutoModelForCausalLM.from_pretrained(
    'meta-llama/Meta-Llama-3-8B-Instruct',
    device_map="auto",
)
llama_tokenizer = AutoTokenizer.from_pretrained(
    'meta-llama/Meta-Llama-3-8B-Instruct', trust_remote_code=True
)
Next, we prepare our dataset for fine-tuning:
from datasets import load_from_disk

def preprocess_function(examples):
    # Flatten each RAFT record into a single training string
    texts = [
        f"Question: {q}\nContext: {c}\nCOTAnswer: {a}"
        for q, c, a in zip(examples['question'], examples['oracle_context'], examples['cot_answer'])
    ]
    return {"text": texts}

def prepare_dataset_for_sft(dataset_path):
    dataset = load_from_disk(dataset_path)
    preprocessed_dataset = dataset.map(
        preprocess_function,
        batched=True,
        remove_columns=dataset.column_names,
    )
    preprocessed_dataset.set_format(type="torch")
    return preprocessed_dataset

training_data = prepare_dataset_for_sft("datasets/arrow_dataset")
3. LoRA Configuration and Model Setup
We use the LoRA (Low-Rank Adaptation) technique for efficient fine-tuning.
from peft import get_peft_model, LoraConfig

peft_parameters = LoraConfig(
    lora_alpha=8,
    lora_dropout=0.1,
    r=8,
    target_modules=['q_proj', 'v_proj'],  # adapt only the attention query/value projections
    bias="none",
    task_type="CAUSAL_LM"
)

model = get_peft_model(base_model, peft_parameters)
model.print_trainable_parameters()
trainable params: 3,407,872 || all params: 8,033,669,120 || trainable%: 0.0424
4. Fine-tuning Process
We use the SFTTrainer from the TRL library for supervised fine-tuning.
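The training arguments (train_params, passed to the trainer below) are not shown in the post; a plausible sketch using HuggingFace's TrainingArguments, with all values being illustrative assumptions:

from transformers import TrainingArguments

train_params = TrainingArguments(
    output_dir="./results",          # assumed output path
    num_train_epochs=1,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,
    learning_rate=2e-4,
    logging_steps=50,
    save_steps=1000,
)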
from trl import SFTTrainer

fine_tuning = SFTTrainer(
    model=base_model,            # SFTTrainer applies the LoRA adapters via peft_config
    train_dataset=training_data,
    peft_config=peft_parameters,
    dataset_text_field="text",
    tokenizer=llama_tokenizer,
    args=train_params,
)
Start training
fine_tuning.train()
5. Evaluation: Base Model vs. Fine-tuned Model
After fine-tuning, we compare the performance of the base model and the fine-tuned model:
Question: How would you rate the overall quality of service provided by Majestic Security based on user reviews?
Context: {"name": "Majestic Security", "address": "Majestic Security, 3128 Lexington Park Dr, Elkhart, IN 46514", "gmap_id": "0x8816c4b2fb8fb6a1:0x80451636e10ca83f", "description": null, "latitude": 41.6899261, "longitude": -86.02416989999999, "category": ["Security guard service", "Business to business service", "Public safety office", "Security service", "Training centre", "Training school", "Transportation escort service"], "avg_rating": 4.3, "num_of_reviews": 48, "price": null, "hours": [["Thursday", "9AM–5PM"], ["Friday", "9AM–5PM"], ["Saturday", "Closed"], ["Sunday", "Closed"], ["Monday", "9AM–5PM"], ["Tuesday", "9AM–5PM"], ["Wednesday", "9AM–5PM"]], "MISC": null, "state": "Open ⋅ Closes 5PM", "relative_results": ["0x8816e8092cc37eff:0xa138075153591bc7", "0x8816ce61cc404e23:0x71a5e9e0898036a4", "0x8816e9eb8afbc539:0x7d7ee677df3fafa3", "0x8816cd46eed45c35:0x7d80db2d3b489fc3", "0x8816ebe5c65cf3ad:0xe8dede77091f4ecf"], "url": "https://www.google.com/maps/place//data=!4m2!3m1!1s0x8816c4b2fb8fb6a1:0x80451636e10ca83f?authuser=-1&hl=en&gl=us"}.
Base Model Output
Solution: Based on the provided user reviews, I would rate the overall quality of service provided by Majestic Security as 4.3 out of 5 stars. This rating suggests that the majority of customers are satisfied with the service, but there may be some minor issues or areas for improvement. The high rating indicates that Majestic Security is doing a good job in terms of providing reliable and effective security services. However, it's essential to note that individual experiences may vary, and some customers may have had different experiences. To get a more comprehensive understanding, it would be helpful to read the actual user reviews to identify common themes and areas of improvement.
Fine-tuned Model Output
COTAnswer: Based on the user reviews, I would rate the overall quality of service provided by Majestic Security as 4.3 out of 5 stars. This is indicated by the "avg_rating" field in the business information, which shows an average rating of 4.3 based on 48 reviews. This suggests that most customers have had a positive experience with the company.
<ANSWER>: 4.3 out of 5 stars.
6. 📊 Analysis of Results: Base Model vs. Fine-tuned Model
The fine-tuned model demonstrates several improvements over the base model:
- Conciseness: The fine-tuned model provides a more concise answer, focusing on the key information requested.
- Direct reference to data: It explicitly mentions the "avg_rating" field and the number of reviews, showing a clearer connection to the provided context.
- Structured output: The fine-tuned model uses the <ANSWER> tag to clearly delineate its final rating, making it easier to extract the key information.
- Reduced speculation: Unlike the base model, the fine-tuned version doesn't speculate about potential issues or variations in individual experiences; it sticks closer to the provided data and gives precise answers by quoting from the context.
- Chain of Thought: As demonstrated in this example, the fine-tuned model is primed for providing a chain of thought, as evidenced by the inclusion of reasoning before the final answer.
These improvements suggest that the fine-tuning process has successfully adapted the model to provide more targeted, data-driven responses for this specific task.
Model Conversion to Q4_K_M using Unsloth
You can install Unsloth directly from its Git repository using pip install "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git" and follow the Ollama installation instructions from here.
from unsloth import FastLanguageModel

# Load the fine-tuned checkpoint
model, tokenizer = FastLanguageModel.from_pretrained("../checkpoint_12000")
# Export to GGUF with 4-bit (q4_k_m) quantization
model.save_pretrained_gguf("gguf_model", tokenizer, quantization_method = "q4_k_m")
You can use the generated GGUF model with Ollama via the following Modelfile:
FROM /home/saidinesh/Desktop/Projects/gmap-recomm/notebook/gguf_model/unsloth.Q4_K_M.gguf
TEMPLATE "{{ if .System }}<|start_header_id|>system<|end_header_id|>
{{ .System }}<|eot_id|>{{ end }}{{ if .Prompt }}<|start_header_id|>user<|end_header_id|>
{{ .Prompt }}<|eot_id|>{{ end }}<|start_header_id|>assistant<|end_header_id|>
{{ .Response }}<|eot_id|>"
PARAMETER num_keep 24
PARAMETER stop <|start_header_id|>
PARAMETER stop <|end_header_id|>
PARAMETER stop <|eot_id|>
LICENSE "META LLAMA 3 COMMUNITY LICENSE AGREEMENT"
After that, create the Ollama model using the Modelfile:

ollama create gmap_recomm_llama3 -f ./gguf_model/Modelfile

Now the model's results can be accessed directly from the terminal with curl:
curl -XPOST localhost:11434/api/generate -d '{"model": "gmap_recomm_llama3", "prompt": "<Your Prompt>", "stream": false }'
Please note that the quantized model may not follow the same output structure as the unquantized model, but this can be mitigated by fine-tuning the model beyond 16k training steps.
We use ChromaDB to store both the text embeddings generated by the sentence transformer (BAAI/bge-small-en-v1.5) and the CLIP embeddings for user images. Business metadata embeddings are computed and stored in ChromaDB for quick retrieval:
import os

import chromadb
from llama_index.core import StorageContext, VectorStoreIndex
from llama_index.vector_stores.chroma import ChromaVectorStore

db = chromadb.PersistentClient(path="./chroma_db")
chroma_collection = db.get_or_create_collection("quickstart")
vector_store = ChromaVectorStore(chroma_collection=chroma_collection)
storage_context = StorageContext.from_defaults(vector_store=vector_store)

# Reuse the persisted index if it exists; otherwise build it from the documents
if os.path.exists("datasets/indiana/chroma_index"):
    vector_index = VectorStoreIndex.from_vector_store(
        vector_store, storage_context=storage_context
    )
else:
    vector_index = VectorStoreIndex.from_documents(
        documents, storage_context=storage_context, embed_model=embed_model
    )
For image search, we find the images closest to a given query embedding in the common latent space learned by CLIP.
import clip
import requests
import torch
from PIL import Image

# Load CLIP and its preprocessing transform (ViT-B/32 is an assumed choice)
device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

def initialize_image_embeddings(gmap_id, urls, image_collection, n_images=500):
    for i, url in enumerate(urls[:n_images]):
        try:
            # Download the image and preprocess it for CLIP
            image = (
                preprocess(Image.open(requests.get(url, stream=True).raw))
                .unsqueeze(0)
                .to(device)
            )
            with torch.no_grad():
                image_features = (
                    model.encode_image(image).cpu().numpy().flatten().tolist()
                )
            # Store the embedding in ChromaDB along with its URL and business id
            image_collection.add(
                embeddings=[image_features],
                ids=[f"{gmap_id}_img_{i}"],
                metadatas=[{"url": url, "gmap_id": gmap_id}],
            )
        except Exception as e:
            print(f"Error processing image {url}: {e}")
Swagger docs for the FastAPI backend
Each query API call requires a user_id, input (the user query), and conversation_id.
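A minimal sketch of what such an endpoint could look like; the route name, the schema class, and the answer_query helper (from the pipeline sketch earlier) are our assumptions, not the project's actual code:

from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class QueryRequest(BaseModel):
    user_id: str
    input: str           # the user's natural-language query
    conversation_id: str

@app.post("/query")
async def handle_query(request: QueryRequest):
    # answer_query wraps the retrieval + LLM pipeline sketched earlier
    return answer_query(request.user_id, request.input)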
The Streamlit-based frontend provides an intuitive chat interface (a minimal sketch follows the list).
- Users can enter natural language queries about businesses
- The system displays LLM-generated responses
- Relevant images and additional business data are shown below the text response for the first query
- The conversation history is maintained for context in follow-up queries
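A minimal sketch of such a chat interface, assuming the FastAPI backend above is served on localhost:8000 (the URL and payload values are assumptions):

import requests
import streamlit as st

st.title("Location-based Recommendations")

if "messages" not in st.session_state:
    st.session_state.messages = []

# Replay the conversation history kept in session state
for msg in st.session_state.messages:
    st.chat_message(msg["role"]).write(msg["content"])

if prompt := st.chat_input("Ask about a business..."):
    st.session_state.messages.append({"role": "user", "content": prompt})
    st.chat_message("user").write(prompt)
    resp = requests.post(
        "http://localhost:8000/query",
        json={"user_id": "demo", "input": prompt, "conversation_id": "1"},
    ).json()
    st.session_state.messages.append({"role": "assistant", "content": resp["response"]})
    st.chat_message("assistant").write(resp["response"])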
The current version uses a small subset of the very large Google Local dataset. Training on the entire dataset could improve performance considerably, but it would require substantial compute. There is also room to introduce multiple agents, including a router agent, for more complex requirements. Even without additional ranking methods, results could be further improved by surfacing more businesses based on the user's location data.
Conclusion
Our personalized location-based recommendation system demonstrates the power of combining advanced NLP techniques with multi-modal data to create highly tailored user experiences. By leveraging RAG, fine-tuned LLMs, and efficient vector search, we've created a system that understands user preferences, provides contextually relevant recommendations, and enhances the results with visual information.
License
MIT
References
1. Google Local Reviews Dataset
2. ROCm documentation for fine-tuning Llama 3 on the MI210 GPU
3. RAFT for generating the dataset used to fine-tune Llama 3
🙏 Thank you for reading
I would like to thank the AMD team for providing me with the MI210 GPU for my fine-tuning workload and for their quick resolution of issues.