Have you ever visited a museum and left feeling unfulfilled, wishing you could have learned more in a way that resonated with you personally? Traditional museum experiences often rely on static placards or one-size-fits-all audio guides, which may not cater to the diverse interests and learning styles of all visitors.
This solution aims to revolutionise this experience by introducing an AI-driven interactive guide. Utilising the NVIDIA Jetson Orin Dev kit's edge computing capabilities, this guide offers real-time, personalised interaction, adapting to individual learning preferences and engaging visitors in a dynamic and informative way. This not only enhances the educational aspect of museum visits but also promises to make cultural exploration more accessible and appealing to a broader audience, fostering a deeper connection with our heritage and arts.
Let's get started...
1. Setting up the NVIDIA Dev kit
1.1 Intro to the Orin and why it was chosen
1.2 Set-up
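Once JetPack is flashed, a quick way to confirm the GPU is visible before pulling any model containers is a short PyTorch check. This is only a sketch and assumes PyTorch has already been installed from NVIDIA's Jetson (aarch64/CUDA) wheels, which isn't covered above.

```python
# Sanity check: confirm the Orin's GPU is visible to PyTorch after setup.
# Assumes PyTorch was installed from NVIDIA's Jetson (aarch64/CUDA) wheels.
import torch

print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    print("Device:", props.name)
    print("Total memory (GB):", round(props.total_memory / 1e9, 1))
```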
2. Choosing the LLMs
TODO: work out how to combine vision (to interpret which painting we are looking at and which room of the visit we are in) with speech interaction (to talk live with the chatbot/model).
-> Should those be llamaspeak and LLaVA?
Will we need to use NanoDB to feed in the relevant content? (A retrieval sketch appears under section 3 below.)
https://github.com/dusty-nv/jetson-containers/tree/master/packages/llm/local_llm#multimodal-chat
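While those choices are being settled, here is a minimal sketch of the vision step using the Hugging Face transformers LLaVA integration, purely as an illustration; on the Orin itself the optimised local_llm / llamaspeak containers linked above would replace this. The image path and the question are made up for the example.

```python
# Sketch: ask a LLaVA-style model which painting the camera is looking at.
# Uses the Hugging Face transformers LLaVA integration for illustration only;
# the jetson-containers local_llm stack would be the optimised route on-device.
import torch
from PIL import Image
from transformers import AutoProcessor, LlavaForConditionalGeneration

MODEL_ID = "llava-hf/llava-1.5-7b-hf"
processor = AutoProcessor.from_pretrained(MODEL_ID)
model = LlavaForConditionalGeneration.from_pretrained(
    MODEL_ID, torch_dtype=torch.float16, device_map="auto"
)

# Hypothetical camera frame saved to disk.
image = Image.open("frame.jpg")
prompt = "USER: <image>\nWhich painting is this, and which room of the gallery might we be in? ASSISTANT:"

inputs = processor(text=prompt, images=image, return_tensors="pt").to(model.device, torch.float16)
output_ids = model.generate(**inputs, max_new_tokens=128)
print(processor.decode(output_ids[0], skip_special_tokens=True))
```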
3. Training
TODO: find resources related to the paintings on display at the National Gallery - perhaps focus on two rooms (to demonstrate geospatial awareness within the museum through vision).
3.1 Identifying and confirming the content
3.2 Training the model?
3.3 How does it know whether you are a child or an adult, so it can adapt its speech?
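Until the training questions above are answered, one low-cost option is to skip fine-tuning and instead ground answers in retrieved gallery notes while switching the speaking register per audience. The sketch below uses a generic sentence-transformers index standing in for NanoDB; GALLERY_NOTES, build_prompt and the audience flag are hypothetical names, and how the audience is detected (a menu choice, or a vision-based estimate) remains an open question.

```python
# Sketch: ground answers in curated gallery notes and adapt the register per audience.
# sentence-transformers stands in for NanoDB here; all names below are hypothetical.
from sentence_transformers import SentenceTransformer, util

GALLERY_NOTES = {
    "sunflowers": "Vincent van Gogh painted Sunflowers in Arles in 1888 ...",
    "fighting_temeraire": "J. M. W. Turner's The Fighting Temeraire (1839) shows ...",
}

embedder = SentenceTransformer("all-MiniLM-L6-v2")
note_texts = list(GALLERY_NOTES.values())
note_embeddings = embedder.encode(note_texts, convert_to_tensor=True)

def build_prompt(question: str, audience: str = "adult") -> str:
    """Pick the most relevant note and wrap the question in an audience-appropriate style."""
    q_emb = embedder.encode(question, convert_to_tensor=True)
    best = int(util.cos_sim(q_emb, note_embeddings).argmax())
    style = (
        "Explain this as if talking to a curious 8-year-old: short sentences, no jargon."
        if audience == "child"
        else "Give a concise, art-historically informed answer."
    )
    return (
        f"{style}\n\n"
        f"Context about the artwork:\n{note_texts[best]}\n\n"
        f"Visitor question: {question}\nGuide:"
    )

print(build_prompt("Who painted the big yellow flowers?", audience="child"))
```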
4. Testing
Does it behave as expected and ...
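A first, very small behavioural test could target the prompt builder sketched in section 3 before any end-to-end trial in the gallery; guide.py and the assertions below are hypothetical and would grow alongside the real pipeline.

```python
# test_guide.py -- early behavioural checks for the hypothetical build_prompt helper
# sketched in section 3 (assumed to live in guide.py). Run with: pytest test_guide.py
from guide import build_prompt

def test_child_prompt_uses_simple_register():
    prompt = build_prompt("Who painted the Sunflowers?", audience="child")
    assert "8-year-old" in prompt           # child style instruction included
    assert "Visitor question:" in prompt    # the question is carried through

def test_child_and_adult_prompts_differ():
    question = "Who painted the Sunflowers?"
    assert build_prompt(question, audience="adult") != build_prompt(question, audience="child")
```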
5. Final build
Putting it all together: neat packaging using 3D printing, plus the final demo.
Conclusion and next steps
Useful links: