First of all, I thank AMD for giving me this wonderful opportunity to learn. I am a beginner on this project, with only theoretical knowledge of AI and algorithms. I have created the following modules. The first module is speech recognition in the Tamil language using Python. The second is a set of computer vision modules, including color detection for people affected by Tritanopia. The third is a virtual mouse, which lets you control the mouse virtually and is also part of the computer vision set. The fourth module is a text-to-speech converter in Python for people who are deaf or mute: once they enter the text they want, it is converted into an audio file in the documents folder, so they can speak with its help. The fifth module is a local and private Llama application that supports both Tamil and English. The sixth module is model quantization, used to shrink the size of the model (which has a .h extension).
My project's themes are given below:
1. Voice Recognition for the Blind: Enables interaction with devices through spoken commands.
2. Text-to-Voice Conversion for the Mute: Converts written text into spoken words for communication.
3. Voice-to-Text Conversion for the Deaf: Transcribes spoken words into written text in real-time.
4. Optimized with AMD AI Hardware: Utilizes AMD hardware to enhance performance and efficiency.
5. Seamless Accessibility: Empowers individuals with disabilities to engage with technology effectively.
The pictures given below show the implementation. Four options are shown in the picture. When the program executes, you press the letter "F" or "f" on the keyboard for speech recognition, "T" or "t" for text-to-voice, and "J" or "j" for voice-to-text conversion. It supports simple usage only.
We use the Python Eel module and a speech recognition model to build the voice recognition. We support two languages, Tamil and English, for differently abled people. Automatic voice recognition starts when the app.py Python file is executed; both voice searches and text searches are implemented. The app asks the user whether to communicate in English or Tamil, and previously executed commands are displayed in the Tkinter window.
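As a rough sketch of how the language choice and command history described above might fit together (the function names here are illustrative, not the project's actual code; the real app would pass the chosen code to a recognizer such as the `speech_recognition` package's `recognize_google()`):

```python
# Illustrative sketch only: maps the user's menu choice to a BCP-47
# language code that speech recognizers such as recognize_google() accept.
LANG_CODES = {"english": "en-IN", "tamil": "ta-IN"}

def pick_language(choice: str) -> str:
    """Return the recognizer language code for the user's choice."""
    key = choice.strip().lower()
    if key not in LANG_CODES:
        raise ValueError(f"unsupported language: {choice!r}")
    return LANG_CODES[key]

def log_command(history: list, command: str) -> list:
    """Append an executed command so it can be shown in the Tkinter window."""
    history.append(command)
    return history

# In the real app, recognition would look roughly like:
#   r = sr.Recognizer()
#   text = r.recognize_google(audio, language=pick_language(choice))
```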
To use it, simply type the request in the input column and submit it; the command is executed immediately.
We use the Tamil Llama model to implement the generative AI, and the Streamlit framework to implement the user interface.
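A minimal sketch of how a Streamlit front end might call the local Tamil Llama model through Ollama's HTTP API (the helper name `build_request` is illustrative; the actual app.py may be wired differently):

```python
# Ollama serves a local REST API on this port by default.
OLLAMA_URL = "http://localhost:11434/api/generate"
MODEL = "conceptsintamil/tamil-llama-7b-instruct-v0.2"

def build_request(prompt: str, model: str = MODEL) -> dict:
    """Build the JSON payload for Ollama's /api/generate endpoint."""
    return {"model": model, "prompt": prompt, "stream": False}

# In the Streamlit app, this could be wired up roughly as:
#   import requests, streamlit as st
#   prompt = st.text_input("உங்கள் கேள்வி / Your question")
#   if prompt:
#       resp = requests.post(OLLAMA_URL, json=build_request(prompt))
#       st.write(resp.json()["response"])
```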
Here we quantize the model to a smaller size.
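To illustrate the idea of quantization (this is a toy sketch of symmetric 8-bit quantization, not the actual scheme the project's quantizer uses): storing one byte per weight instead of four is what shrinks the model.

```python
def quantize_int8(weights):
    """Map float weights to int8-range values with one shared scale factor."""
    scale = max(abs(w) for w in weights) / 127 or 1.0  # avoid zero scale
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Approximate reconstruction of the original weights."""
    return [v * scale for v in q]
```

The reconstruction is only approximate; the quality/size trade-off comes from how much precision the rounding throws away.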
Computer Vision
1.) Color detection using Python: I implemented color detection for Tritanopia. (Tritanopia is a rare type of genetic color blindness that affects a person's ability to distinguish between the colors blue and yellow. It occurs when the blue-sensitive cones, also known as short-wavelength or S cones, in the retina are either not functioning or completely missing.) Using computer vision, I produced the following screenshots. We use the Python cv2 module to access the camera.
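The detection logic can be sketched without a camera. The helper below is illustrative (pure Python via the standard-library `colorsys`, rather than the project's cv2 pipeline): it classifies a pixel's hue so the app can announce the blue/yellow pair that Tritanopia makes hard to distinguish. In the actual module, `cv2.inRange` over an HSV frame performs the same masking per pixel.

```python
import colorsys

def name_confusable_color(r, g, b):
    """Label blue/yellow hues for a pixel given 0-255 RGB channel values."""
    h, s, v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
    if s < 0.2 or v < 0.2:   # too gray or too dark to name reliably
        return "other"
    deg = h * 360            # hue in degrees on the standard color wheel
    if 45 <= deg <= 70:
        return "yellow"
    if 200 <= deg <= 260:
        return "blue"
    return "other"
```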
2.) Virtual mouse using hand gestures: In this module we use the webcam to control the mouse; our hands can control the primary click.
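The core of a webcam virtual mouse is mapping the detected fingertip position (normalized 0..1 coordinates, as hand-tracking libraries like MediaPipe report them) onto screen pixels, with smoothing so the pointer does not jitter. A hypothetical sketch of that mapping (the real notebook may differ):

```python
def to_screen(norm_x, norm_y, screen_w=1920, screen_h=1080):
    """Map normalized fingertip coordinates (0..1) to screen pixels,
    clamping values that fall outside the camera frame."""
    x = min(max(norm_x, 0.0), 1.0) * (screen_w - 1)
    y = min(max(norm_y, 0.0), 1.0) * (screen_h - 1)
    return int(x), int(y)

def smooth(prev, new, factor=0.3):
    """Exponential smoothing: move only part of the way toward the new point."""
    return (prev[0] + (new[0] - prev[0]) * factor,
            prev[1] + (new[1] - prev[1]) * factor)
```

A library such as pyautogui would then position the real cursor with `pyautogui.moveTo(x, y)` and fire the primary click with `pyautogui.click()` when the gesture is detected.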
Text to speech: In this module we have the ability to convert text to speech; the generated audio is stored in the project folder.
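A sketch of the save step described above (the filename scheme here is hypothetical; actual synthesis would use a TTS library such as gTTS or pyttsx3):

```python
import re
from pathlib import Path

def audio_path_for(text, out_dir="output"):
    """Derive a safe .mp3 filename from the first words of the text and
    return the path inside the project's output folder (created if absent)."""
    stem = re.sub(r"[^\w]+", "_", text.strip())[:40].strip("_") or "speech"
    folder = Path(out_dir)
    folder.mkdir(exist_ok=True)
    return folder / f"{stem}.mp3"

# Hedged example of the actual synthesis step with gTTS:
#   from gtts import gTTS
#   gTTS(text, lang="ta").save(str(audio_path_for(text)))
```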
Voice to text does something similar: it gets voice input from the user and saves it with a .txt extension. When it executes, it asks the user to select the language.
Clone the repository:
C:\Users\(your_path)>git clone https://github.com/JAYASIMMA/AMD_Hack.git
C:\Users\(your_path)>pip install -r requirements.txt
After installing all the requirements, create a virtual environment in Python.
C:\Users\(your_path)\amd_hack>cd project
C:\Users\(your_path)\projects>python main_app.py
After running this, simultaneously run the computer vision modules:
C:\Users\(your_path)>cd ..
C:\Users\(your_path)>cd computer_vision
cd color_detection
python main.py
cd ..
cd live_mouse_control_using_hand_gestures
python main.py
cd ..
cd virual_mouse
cd mouse
cd scripts
code .
Using Jupyter, run Hand_Gesture_Mouse.ipynb
Generative AI
Install Ollama, then open the command prompt:
ollama pull conceptsintamil/tamil-llama-7b-instruct-v0.2
Test that the model runs correctly:
ollama run conceptsintamil/tamil-llama-7b-instruct-v0.2
Then:
C:\Users\(your_path)>code .
cd ..
cd ..
cd ..
cd ..
cd ollama
python -m venv venv
cd venv\scripts
activate.bat
cd ..
pip install -r requirements.txt
streamlit run app.py
cd ..
cd quantize
streamlit run app.py