In today's digital age, platforms like Google Maps provide a wealth of information about local businesses, but they often fall short when it comes to specific amenities, image-based queries, or personalized recommendations. Enter our advanced location-based recommendation system, designed to bridge this gap and offer users a more comprehensive and tailored experience while minimizing the effort required to discover specific information on the web.
Imagine asking, "When does Dr. Smith's dental office open on Wednesday?", "Does the Oceanview Hotel actually have rooms with a sea view?", or "Does Majestic Security provide a transportation service?" Our system goes beyond static business information, tapping into a rich database of business details, user reviews, and images to provide accurate, personalized answers along with relevant images.
But we don't stop there. By analyzing your past dining experiences and preferences, our system can offer restaurant recommendations that truly resonate with your taste: "Based on your love for spicy Thai cuisine and cozy atmospheres, we think you'll enjoy the Pad Thai at the House of Thai restaurant."
In this post, we'll explore the development of a cutting-edge personalized location-based recommendation system. This system leverages advanced natural language processing techniques, including Retrieval-Augmented Generation (RAG) and fine-tuned Large Language Models (LLMs), to provide users with highly tailored location recommendations based on their queries and past experiences.
The code for the project, including RAFT dataset generation, fine-tuning, and the UI, can be found here.
Our recommendation and Q&A system consists of several key components:
- FastAPI Backend: Handles data processing, retrieval, and integration with the LLM.
- Vector Store: Utilizes Chroma DB for efficient similarity search.
- Embedding Model: Employs HuggingFace's BAAI/bge-small-en-v1.5 for text embeddings.
- Large Language Model: Uses a fine-tuned LLaMA3-8B model for natural language understanding and generation.
- Image Embedding and Matching: Incorporates image-based search for enhanced recommendations using CLIP's features.
- Streamlit Frontend: Provides an intuitive user interface for interacting with the system.
Packages Required
- ROCm
- PyTorch
- fastapi
- streamlit
- llama_index
- llama-index-embeddings-huggingface
- ollama
- chromadb
- datasets
- CLIP
- transformers
- unsloth
Each business's metadata follows this JSON structure:
{
  'name': 'Walgreens Pharmacy',
  'address': 'Walgreens Pharmacy, 124 E North St, Kendallville, IN 46755',
  'gmap_id': '0x881614ce7c13acbb:0x5c7b18bbf6ec4f7e',
  'description': 'Department of the Walgreens chain providing prescription medications & other health-related items.',
  'latitude': 41.451859999999996,
  'longitude': -85.2666757,
  'category': ['Pharmacy'],
  'avg_rating': 4.2,
  'num_of_reviews': 5,
  'price': '$$',
  'hours': [['Thursday', '8AM–1:30PM'], ['Friday', '8AM–1:30PM'], ['Saturday', '9AM–1:30PM'], ['Sunday', '10AM–1:30PM'], ['Monday', '8AM–1:30PM'], ['Tuesday', '8AM–1:30PM'], ['Wednesday', '8AM–1:30PM']],
  'MISC': {
    'Service options': ['Curbside pickup', 'Drive-through', 'In-store pickup', 'In-store shopping'],
    'Health & safety': ['Mask required', 'Staff wear masks', 'Staff get temperature checks'],
    'Accessibility': ['Wheelchair accessible entrance', 'Wheelchair accessible parking lot'],
    'Planning': ['Quick visit'],
    'Payments': ['Checks', 'Debit cards']
  },
  'state': 'Closes soon ⋅ 1:30PM ⋅ Reopens 2PM',
  'relative_results': ['0x881614cd49e4fa33:0x2d507c24ff4f1c74', '0x8816145bf5141c89:0x535c1d605109f94b', '0x881614cda24cc591:0xca426e3a9b826432', '0x88162894d98b91ef:0xd139b34de70d3e03', '0x881615400b5e57f9:0xc56d17dbe420a67f'],
  'url': 'https://www.google.com/maps/place//data=!4m2!3m1!1s0x881614ce7c13acbb:0x5c7b18bbf6ec4f7e?authuser=-1&hl=en&gl=us'
}
and each user review follows this JSON structure:
{
  'user_id': '101463350189962023774',
  'name': 'Jordan Adams',
  'time': 1627750414677,
  'rating': 5,
  'text': 'Cool place, great people, awesome dentist!',
  'pics': [{'url': ['https://lh5.googleusercontent.com/p/AF1QipNq2nZC5TH4_M7h5xRAd61hoTgvY1o9lozABguI=w150-h150-k-no-p']}],
  'resp': {'time': 1628455067818, 'text': 'Thank you for your five-star review! -Dr. Blake'},
  'gmap_id': '0x87ec2394c2cd9d2d:0xd1119cfbee0da6f3'
}
Given our system's hardware constraints, we used the data for Indiana, a US state with 100,391 businesses and 7,638,803 reviews.

Key Features
1. Personalized Recommendations
The system takes into account the user's past reviews and preferences when generating recommendations (a short sketch follows this list). This personalization is achieved by:
- Storing and retrieving user-specific review data
- Incorporating user reviews into the context provided to the LLM
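A minimal sketch of the second point, folding a user's stored reviews into the LLM context; the helper name and the reviews_by_user mapping are our own assumptions, not the project's actual API:

def build_user_context(user_id, business_context, reviews_by_user):
    # Pull up to five of the user's stored reviews (assumed keyed by user_id)
    past_reviews = reviews_by_user.get(user_id, [])[:5]
    review_lines = "\n".join(
        f"- {r['text']} (rating: {r['rating']})" for r in past_reviews
    )
    # Append them to the business context handed to the LLM
    return f"{business_context}\n\nUser's past reviews:\n{review_lines}"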
2. Multi-modal Search
Our system goes beyond text-based search by incorporating image data.
- Stores and retrieves business images
- Performs image-text matching to find visually relevant results
- Presents top-matching images alongside text recommendations
3. Conversational Interface
The Streamlit frontend provides a chat-like interface, allowing users to:
- Ask natural language queries
- Receive detailed responses from the LLM
- View relevant images and business details
- Engage in follow-up questions for deeper exploration
4. Efficient Data Retrieval
The system employs vector search techniques for fast and accurate retrieval.
- Uses Chroma DB as a vector store for both text and image embeddings
- Implements semantic search using the BAAI/bge-small-en-v1.5 embedding model
- Multi-modal retrieval using CLIP embeddings for queries and images
1. Data Processing and Storage
The system processes and stores several types of data, including:
1. Business metadata (name, address, GPS coordinates, etc.)
2. User reviews
3. Business images
This data is loaded from gzipped JSON files and stored in memory for quick access. Vector embeddings of business names and addresses are created and stored in a Chroma DB collection for efficient similarity search.
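As a minimal sketch of this loading step (the metadata filename appears in the RAFT section below; the review filename follows the dataset's naming convention and is our assumption):

import gzip
import json

def load_jsonl_gz(path):
    # Each line in the Google Local dump is a single JSON record
    with gzip.open(path, "rt", encoding="utf-8") as f:
        return [json.loads(line) for line in f]

businesses = load_jsonl_gz("datasets/indiana/meta-Indiana.json.gz")
reviews = load_jsonl_gz("datasets/indiana/review-Indiana.json.gz")  # assumed filename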
2. Query Processing Pipeline
When a user submits a query, the system follows these steps (a condensed sketch follows the list):
- Performs a vector similarity search to find relevant businesses
- Retrieves associated metadata, user reviews, and images
- Conducts image-text matching if images are available
- Generates a context string containing relevant information
- Sends the context and user query to the fine-tuned LLM
- Processes the LLM's response and returns it to the user
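Condensed into Python, the pipeline looks roughly like this; the helper names and lookup dicts are illustrative, not the project's actual code:

def answer_query(user_id, query):
    # 1. Vector similarity search over business name/address embeddings
    hits = vector_index.as_retriever(similarity_top_k=5).retrieve(query)

    # 2. Gather each business's metadata and the user's own reviews
    context_parts = []
    for hit in hits:
        gmap_id = hit.metadata["gmap_id"]
        context_parts.append(
            format_context(metadata_by_id[gmap_id], reviews_by_user.get(user_id, []))
        )

    # 3. Image-text matching over the stored CLIP embeddings, if any
    top_images = image_text_matching(query, image_collection)

    # 4-6. Build the context string, query the fine-tuned LLM, return the result
    context = "\n\n".join(context_parts)
    answer = llm.complete(f"Context: {context}\nQuestion: {query}")
    return {"response": str(answer), "images": top_images}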
3. Image Search and Matching
The system incorporates image search capabilities.
- Image embeddings are generated and stored in a separate Chroma DB collection
- The image_text_matching function finds images that best match the user's query
- Top-matching images are displayed alongside text recommendations
In this section, we'll dive into the process of fine-tuning a Large Language Model (LLM) using the RAFT dataset and integrating it with a Retrieval-Augmented Generation (RAG) technique. We'll be using the Meta-Llama-3-8B-Instruct model as our base model and fine-tuning it for improved performance on our specific task.
1. RAFT Dataset Generation
Before diving into the fine-tuning process, it's important to understand how the RAFT (Retrieval Augmented Fine-Tuning) dataset was generated. The dataset generation process uses a raft.py script with several configurable parameters:
- datapath: Path to the source document (default: "./datasets/indiana/meta-Indiana.json.gz")
- output: Path to save the generated dataset (default: "./gen_data/")
- output-format: Format of the output dataset (default: "hf" for HuggingFace)
- distractors: Number of distractor documents per data point (default: 3)
- p: Probability of including the oracle document in the context (default: 1.0)
- questions: Number of data points to generate per chunk document (default: 5)
- completion_model: Model used to generate questions and answers (default: "llama3")
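Assuming raft.py exposes these parameters as command-line flags (the post lists the parameters but not the exact invocation), a run might look like:

python raft.py \
  --datapath ./datasets/indiana/meta-Indiana.json.gz \
  --output ./gen_data/ \
  --output-format hf \
  --distractors 3 \
  --p 1.0 \
  --questions 5 \
  --completion_model llama3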
The generated dataset will have the following structure:

'question': 'How would you rate the overall quality of service provided by Majestic Security based on user reviews?'
'oracle_context': {"name": "Majestic Security", "address": "Majestic Security, 3128 Lexington Park Dr, Elkhart, IN 46514", "gmap_id": "0x8816c4b2fb8fb6a1:0x80451636e10ca83f", "description": null, "latitude": 41.6899261, "longitude": -86.02416989999999, "category": ["Security guard service", "Business to business service", "Public safety office", "Security service", "Training centre", "Training school", "Transportation escort service"], "avg_rating": 4.3, "num_of_reviews": 48, "price": null, "hours": [["Thursday", "9AM\\u20135PM"], ["Friday", "9AM\\u20135PM"], ["Saturday", "Closed"], ["Sunday", "Closed"], ["Monday", "9AM\\u20135PM"], ["Tuesday", "9AM\\u20135PM"], ["Wednesday", "9AM\\u20135PM"]], "MISC": null, "state": "Open \\u22c5 Closes 5PM", "relative_results": ["0x8816e8092cc37eff:0xa138075153591bc7", "0x8816ce61cc404e23:0x71a5e9e0898036a4", "0x8816e9eb8afbc539:0x7d7ee677df3fafa3", "0x8816cd46eed45c35:0x7d80db2d3b489fc3", "0x8816ebe5c65cf3ad:0xe8dede77091f4ecf"], "url": "https://www.google.com/maps/place//data=!4m2!3m1!1s0x8816c4b2fb8fb6a1:0x80451636e10ca83f?authuser=-1&hl=en&gl=us"}'
'cot_answer': 'To rate the overall quality of service provided by Majestic Security based on user reviews, I can analyze the average rating (4.3) and the number of reviews (48). This suggests that a significant portion of customers have had positive experiences with the company.\n\n##begin_quote##The high average rating indicates a strong level of satisfaction among users.##end_quote##\n\nTaking this into account, I would rate the overall quality of service provided by Majestic Security\n\n<ANSWER>: Excellent'
2. Model and Dataset Preparation
First, we set up our base model and tokenizer:
from transformers import AutoModelForCausalLM, AutoTokenizer

base_model = AutoModelForCausalLM.from_pretrained(
    'meta-llama/Meta-Llama-3-8B-Instruct',
    device_map="auto",
)
llama_tokenizer = AutoTokenizer.from_pretrained(
    'meta-llama/Meta-Llama-3-8B-Instruct', trust_remote_code=True
)
Next, we prepare our dataset for fine-tuning:
from datasets import load_from_disk

def preprocess_function(examples):
    # Flatten each RAFT record into a single training string
    texts = [
        f"Question: {q}\nContext: {c}\nCOTAnswer: {a}"
        for q, c, a in zip(examples['question'], examples['oracle_context'], examples['cot_answer'])
    ]
    return {"text": texts}

def prepare_dataset_for_sft(dataset_path):
    dataset = load_from_disk(dataset_path)
    preprocessed_dataset = dataset.map(
        preprocess_function,
        batched=True,
        remove_columns=dataset.column_names,
    )
    preprocessed_dataset.set_format(type="torch")
    return preprocessed_dataset

training_data = prepare_dataset_for_sft("datasets/arrow_dataset")
3. LoRA Configuration and Model Setup
We use the LoRA (Low-Rank Adaptation) technique for efficient fine-tuning.
from peft import get_peft_model, LoraConfig

peft_parameters = LoraConfig(
    lora_alpha=8,
    lora_dropout=0.1,
    r=8,
    target_modules=['q_proj', 'v_proj'],  # adapt only the attention query/value projections
    bias="none",
    task_type="CAUSAL_LM"
)

model = get_peft_model(base_model, peft_parameters)
model.print_trainable_parameters()
trainable params: 3,407,872 || all params: 8,033,669,120 || trainable%: 0.0424
4. Fine-tuning Process
We use the SFTTrainer from the TRL library for supervised fine-tuning.
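The training arguments (train_params, passed to the trainer below) are not shown in the post; a plausible sketch using HuggingFace's TrainingArguments, with all values being illustrative assumptions:

from transformers import TrainingArguments

train_params = TrainingArguments(
    output_dir="./results",          # assumed output path
    num_train_epochs=1,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,
    learning_rate=2e-4,
    logging_steps=50,
    save_steps=1000,
)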
from trl import SFTTrainer

fine_tuning = SFTTrainer(
    model=base_model,            # SFTTrainer applies the LoRA adapters via peft_config
    train_dataset=training_data,
    peft_config=peft_parameters,
    dataset_text_field="text",
    tokenizer=llama_tokenizer,
    args=train_params,
)
Start training
fine_tuning.train()
5. Evaluation: Base Model vs. Fine-tuned Model
After fine-tuning, we compare the performance of the base model and the fine-tuned model:
Question: How would you rate the overall quality of service provided by Majestic Security based on user reviews?
Context: {"name": "Majestic Security", "address": "Majestic Security, 3128 Lexington Park Dr, Elkhart, IN 46514", "gmap_id": "0x8816c4b2fb8fb6a1:0x80451636e10ca83f", "description": null, "latitude": 41.6899261, "longitude": -86.02416989999999, "category": ["Security guard service", "Business to business service", "Public safety office", "Security service", "Training centre", "Training school", "Transportation escort service"], "avg_rating": 4.3, "num_of_reviews": 48, "price": null, "hours": [["Thursday", "9AM–5PM"], ["Friday", "9AM–5PM"], ["Saturday", "Closed"], ["Sunday", "Closed"], ["Monday", "9AM–5PM"], ["Tuesday", "9AM–5PM"], ["Wednesday", "9AM–5PM"]], "MISC": null, "state": "Open ⋅ Closes 5PM", "relative_results": ["0x8816e8092cc37eff:0xa138075153591bc7", "0x8816ce61cc404e23:0x71a5e9e0898036a4", "0x8816e9eb8afbc539:0x7d7ee677df3fafa3", "0x8816cd46eed45c35:0x7d80db2d3b489fc3", "0x8816ebe5c65cf3ad:0xe8dede77091f4ecf"], "url": "https://www.google.com/maps/place//data=!4m2!3m1!1s0x8816c4b2fb8fb6a1:0x80451636e10ca83f?authuser=-1&hl=en&gl=us"}.
Base Model Output
Solution: Based on the provided user reviews, I would rate the overall quality of service provided by Majestic Security as 4.3 out of 5 stars. This rating suggests that the majority of customers are satisfied with the service, but there may be some minor issues or areas for improvement. The high rating indicates that Majestic Security is doing a good job in terms of providing reliable and effective security services. However, it's essential to note that individual experiences may vary, and some customers may have had different experiences. To get a more comprehensive understanding, it would be helpful to read the actual user reviews to identify common themes and areas of improvement.
Fine-tuned Model Output
COTAnswer: Based on the user reviews, I would rate the overall quality of service provided by Majestic Security as 4.3 out of 5 stars. This is indicated by the "avg_rating" field in the business information, which shows an average rating of 4.3 based on 48 reviews. This suggests that most customers have had a positive experience with the company.
<ANSWER>: 4.3 out of 5 stars.
6. 📊 Analysis of Results: Base Model vs. Fine-tuned Model
The fine-tuned model demonstrates several improvements over the base model:
- Conciseness: The fine-tuned model provides a more concise answer, focusing on the key information requested.
- Direct reference to data: It explicitly mentions the "avg_rating" field and the number of reviews, showing a clearer connection to the provided context.
- Structured output: The fine-tuned model uses the <ANSWER> tag to clearly delineate its final rating, making it easier to extract the key information.
- Reduced speculation: Unlike the base model, the fine-tuned version doesn't speculate about potential issues or variations in individual experiences; it sticks closer to the provided data and gives precise answers by quoting from the context.
- Chain of Thought: As demonstrated in this example, the fine-tuned model is primed for providing a chain of thought, as evidenced by the inclusion of reasoning before the final answer.
These improvements suggest that the fine-tuning process has successfully adapted the model to provide more targeted, data-driven responses for this specific task.
Model Conversion to Q4_K_M using Unsloth
You can install Unsloth directly from its Git repository using pip install "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git" and follow the Ollama installation instructions from here.
from unsloth import FastLanguageModel

# Load the fine-tuned checkpoint
model, tokenizer = FastLanguageModel.from_pretrained("../checkpoint_12000")
# Export to GGUF with 4-bit (q4_k_m) quantization
model.save_pretrained_gguf("gguf_model", tokenizer, quantization_method = "q4_k_m")
You can use the generated GGUF model with Ollama via the following Modelfile:
FROM /home/saidinesh/Desktop/Projects/gmap-recomm/notebook/gguf_model/unsloth.Q4_K_M.gguf
TEMPLATE "{{ if .System }}<|start_header_id|>system<|end_header_id|>
{{ .System }}<|eot_id|>{{ end }}{{ if .Prompt }}<|start_header_id|>user<|end_header_id|>
{{ .Prompt }}<|eot_id|>{{ end }}<|start_header_id|>assistant<|end_header_id|>
{{ .Response }}<|eot_id|>"
PARAMETER num_keep 24
PARAMETER stop <|start_header_id|>
PARAMETER stop <|end_header_id|>
PARAMETER stop <|eot_id|>
LICENSE "META LLAMA 3 COMMUNITY LICENSE AGREEMENT"
After that, create the Ollama model using the Modelfile:

ollama create gmap_recomm_llama3 -f ./gguf_model/Modelfile

Now the model's results can be accessed directly from the terminal with curl:
curl -XPOST localhost:11434/api/generate -d '{"model": "gmap_recomm_llama3", "prompt": "<Your Prompt>", "stream": false }'
Please note that the quantized model may not follow the same output structure as the unquantized model, but this can be mitigated by fine-tuning the model beyond 16k training steps.
We use ChromaDB to store both the text embeddings generated by the sentence transformer (BAAI/bge-small-en-v1.5) and the CLIP embeddings for user images. Business metadata embeddings are computed and stored in ChromaDB for quick retrieval:
import os

import chromadb
from llama_index.core import StorageContext, VectorStoreIndex
from llama_index.vector_stores.chroma import ChromaVectorStore

db = chromadb.PersistentClient(path="./chroma_db")
chroma_collection = db.get_or_create_collection("quickstart")
vector_store = ChromaVectorStore(chroma_collection=chroma_collection)
storage_context = StorageContext.from_defaults(vector_store=vector_store)

# Reuse the persisted index if it exists; otherwise build it from the documents
if os.path.exists("datasets/indiana/chroma_index"):
    vector_index = VectorStoreIndex.from_vector_store(
        vector_store, storage_context=storage_context
    )
else:
    vector_index = VectorStoreIndex.from_documents(
        documents, storage_context=storage_context, embed_model=embed_model
    )
For image search, we find the images closest to a given query embedding in the common latent space learned by CLIP.
import clip
import requests
import torch
from PIL import Image

# Load CLIP and its preprocessing transform (ViT-B/32 is an assumed choice)
device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

def initialize_image_embeddings(gmap_id, urls, image_collection, n_images=500):
    for i, url in enumerate(urls[:n_images]):
        try:
            # Download the image and preprocess it for CLIP
            image = (
                preprocess(Image.open(requests.get(url, stream=True).raw))
                .unsqueeze(0)
                .to(device)
            )
            with torch.no_grad():
                image_features = (
                    model.encode_image(image).cpu().numpy().flatten().tolist()
                )
            # Store the embedding in ChromaDB along with its URL and business id
            image_collection.add(
                embeddings=[image_features],
                ids=[f"{gmap_id}_img_{i}"],
                metadatas=[{"url": url, "gmap_id": gmap_id}],
            )
        except Exception as e:
            print(f"Error processing image {url}: {e}")
Swagger docs for the FastAPI backend
Each query API call requires a user_id, input (the user query), and conversation_id.
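A minimal sketch of what such an endpoint could look like; the route name, the schema class, and the answer_query helper (from the pipeline sketch earlier) are our assumptions, not the project's actual code:

from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class QueryRequest(BaseModel):
    user_id: str
    input: str           # the user's natural-language query
    conversation_id: str

@app.post("/query")
async def handle_query(request: QueryRequest):
    # answer_query wraps the retrieval + LLM pipeline sketched earlier
    return answer_query(request.user_id, request.input)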
The Streamlit-based frontend provides an intuitive chat interface (a minimal sketch follows the list).
- Users can enter natural language queries about businesses
- The system displays LLM-generated responses
- Relevant images and additional business data are shown below the text response for the first query
- The conversation history is maintained for context in follow-up queries
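A minimal sketch of such a chat interface, assuming the FastAPI backend above is served on localhost:8000 (the URL and payload values are assumptions):

import requests
import streamlit as st

st.title("Location-based Recommendations")

if "messages" not in st.session_state:
    st.session_state.messages = []

# Replay the conversation history kept in session state
for msg in st.session_state.messages:
    st.chat_message(msg["role"]).write(msg["content"])

if prompt := st.chat_input("Ask about a business..."):
    st.session_state.messages.append({"role": "user", "content": prompt})
    st.chat_message("user").write(prompt)
    resp = requests.post(
        "http://localhost:8000/query",
        json={"user_id": "demo", "input": prompt, "conversation_id": "1"},
    ).json()
    st.session_state.messages.append({"role": "assistant", "content": resp["response"]})
    st.chat_message("assistant").write(resp["response"])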
The current version uses a small subset of the very large Google Local dataset. Training on the entire dataset could improve performance considerably, but it would require substantial compute. There is also room to introduce multiple agents, including a router agent, for more complex requirements. Even without additional ranking methods, results could be further improved by surfacing more businesses based on the user's location data.
Conclusion
Our personalized location-based recommendation system demonstrates the power of combining advanced NLP techniques with multi-modal data to create highly tailored user experiences. By leveraging RAG, fine-tuned LLMs, and efficient vector search, we've created a system that understands user preferences, provides contextually relevant recommendations, and enhances the results with visual information.
License
MIT
References
1. Google Local Reviews Dataset
2. ROCm documentation for fine-tuning Llama 3 on the MI210 GPU
3. RAFT for generating the dataset used to fine-tune Llama 3
🙏 Thank you for reading
I would like to thank the AMD team for providing me with the MI210 GPU for my fine-tuning workload and for their quick resolution of issues.