In pursuit of more intuitive and efficient human-computer interaction, we developed a hand gesture-based system designed to control computer functions. Our project aimed to transcend traditional input methods by enabling users to adjust volume and brightness levels effortlessly through natural hand movements. Building upon core functionalities, we iteratively expanded the system's capabilities to include mouse control, culminating in a comprehensive solution that enhances user experience and productivity.
By leveraging computer vision techniques and advanced algorithms, we were able to transform hand gestures into actionable commands. This project represents a significant step towards a future where technology seamlessly adapts to human behavior.
We are incredibly grateful to AMD for providing us with a powerful mini-PC as part of the AMD PC AI category, which served as the backbone of our project. This cutting-edge hardware enabled us to experiment and iterate rapidly.
Project Overview
Our project involved developing a system that utilizes Google's MediaPipe library for real-time hand tracking and gesture recognition. The goal was to create a versatile application capable of controlling volume, brightness, and even mouse movement through intuitive hand gestures.
Due to time constraints imposed by our academic commitments, we were unable to fully realize our vision of a comprehensive application. However, we made significant progress in developing core functionalities and laying the groundwork for future enhancements.
Tools and Technologies
Hardware
- AMD Mini-PC: Provided the computational power for real-time image processing and gesture recognition.
Development Environment:
- Visual Studio Code (VSC): Used as the primary code editor for project development.
- Jupyter Notebook: Employed for rapid prototyping and data exploration during initial stages.
- GitHub: Utilized for version control, collaboration, and code sharing among team members.
Libraries
- OpenCV: Employed for computer vision tasks, including image processing and video capture.
- MediaPipe: Leveraged for real-time hand tracking and landmark detection.
- NumPy: Used for numerical operations and array manipulation.
- AutoPy: Used to interact with the operating system for mouse control and other system-level functions.
- PyCaw: Utilized for volume control on Windows operating systems.
- Win10Toast: Implemented for displaying system notifications on Windows 10.
These tools and libraries collectively formed the foundation for developing and implementing the hand gesture controlled system.
Project Development
(Image source: Hand landmarks detection guide for Python | Google AI Edge | Google for Developers)
We structured our development process in three distinct versions:
Version 1: Core Functionalities
- Focus: Volume and brightness control
Version 2: Mouse Control Integration
- Focus: Expanding control capabilities
Version 3: Optimization and Gesture Refinement
- Focus: Enhancing performance and user experience
Version 1: Core Functionalities
Focus: Establishing basic volume and brightness control using hand gestures.
Process:
Calibration:
- To ensure accurate gesture interpretation, the system begins with a calibration phase.
- Users are prompted to perform various hand movements to establish a range of distances between their thumb and index finger.
- These measurements serve as reference points for subsequent gesture analysis.
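As an illustration, here is a minimal sketch of how such a calibration pass might look. It assumes a helper named get_thumb_index_distance (not part of the original code) that returns the pixel distance between the thumb tip and index fingertip in a frame, or None when no hand is visible.

```python
# Hypothetical calibration sketch; helper names and the 5-second window are assumptions.
import time

import cv2


def calibrate(cap, get_thumb_index_distance, duration_s=5.0):
    """Record the smallest and largest thumb-index distances seen over a short window."""
    min_d, max_d = float("inf"), 0.0
    end_time = time.time() + duration_s
    while time.time() < end_time:
        ok, frame = cap.read()
        if not ok:
            continue
        d = get_thumb_index_distance(frame)
        if d is not None:
            min_d, max_d = min(min_d, d), max(max_d, d)
    return min_d, max_d  # used later as the control range


# Example usage:
# cap = cv2.VideoCapture(0)
# MIN_DIST, MAX_DIST = calibrate(cap, get_thumb_index_distance)
```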
Real-time Hand Tracking:
- During the control phase, the system continuously captures video frames from the user's webcam.
- Advanced computer vision techniques are employed to detect and track the user's hands in real time.
- The system specifically focuses on identifying the thumb and index finger.
Gesture Recognition and Control:
- The distance between the tracked thumb and index finger is calculated for each frame.
- This distance is compared to the calibration data to determine the desired action (increase, decrease, or maintain volume/brightness).
- Based on the hand's position (left or right), the system adjusts either the volume or brightness accordingly.
- Changes are reflected in the system settings in real time.
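The sketch below shows what this loop might look like for volume control, using the MediaPipe, OpenCV, NumPy, and PyCaw libraries listed above. MIN_DIST and MAX_DIST are placeholder calibration values, not the project's actual settings, and the left/right-hand routing and brightness adjustment are omitted for brevity (brightness would be handled analogously through an OS-specific call).

```python
import cv2
import mediapipe as mp
import numpy as np
from ctypes import cast, POINTER
from comtypes import CLSCTX_ALL
from pycaw.pycaw import AudioUtilities, IAudioEndpointVolume

# Placeholder calibration results (see the calibration step above).
MIN_DIST, MAX_DIST = 30, 200  # pixels

# Standard PyCaw setup: get the master volume endpoint and its dB range.
devices = AudioUtilities.GetSpeakers()
interface = devices.Activate(IAudioEndpointVolume._iid_, CLSCTX_ALL, None)
volume = cast(interface, POINTER(IAudioEndpointVolume))
min_db, max_db, _ = volume.GetVolumeRange()

mp_hands = mp.solutions.hands
hands = mp_hands.Hands(max_num_hands=1, min_detection_confidence=0.7)

cap = cv2.VideoCapture(0)
while True:
    ok, frame = cap.read()
    if not ok:
        break
    results = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    if results.multi_hand_landmarks:
        lm = results.multi_hand_landmarks[0].landmark
        h, w, _ = frame.shape
        # Landmark 4 = thumb tip, landmark 8 = index fingertip.
        x1, y1 = lm[4].x * w, lm[4].y * h
        x2, y2 = lm[8].x * w, lm[8].y * h
        dist = np.hypot(x2 - x1, y2 - y1)
        # Map the calibrated pinch range onto the system's dB range.
        vol_db = np.interp(dist, [MIN_DIST, MAX_DIST], [min_db, max_db])
        volume.SetMasterVolumeLevel(float(vol_db), None)
    cv2.imshow("Gesture control", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break

cap.release()
cv2.destroyAllWindows()
```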
Key Components:
- MediaPipe Library: Provides essential tools for hand detection and tracking.
- Image Processing: Techniques for analyzing video frames and extracting relevant information.
- Gesture Recognition Algorithm: Correlates hand movements with desired system actions.
- System Interaction: Integration with operating system functions to modify volume and brightness settings.
Limitations:
- Relies solely on thumb-index finger distance for control.
- Lacks advanced features such as multi-hand tracking or gesture customization.
While Version 1 provides a foundational framework, subsequent versions aimed to expand functionalities and enhance user experience.
Version 2: Expanding Control Capabilities
Building upon the core functionalities established in Version 1, Version 2 focused on introducing mouse control features using hand gestures.
Focus: Integration of mouse movement and click functionalities.
Hand Gestures:
- Index Finger Extension: Controls cursor movement on the screen.
- Open Thumb: Pauses/freezes cursor movement.
- Bending Both Index and Thumb: Simulates a click action.
Implementation:
- Hand Tracking: Utilizes MediaPipe to detect and track hand landmarks in real-time.
- Gesture Recognition: Interprets specific hand postures (extended index finger, open thumb, or both fingers bent) to determine the desired action.
- Cursor Control: Maps hand movements to cursor movements on the screen, incorporating smoothing techniques for accuracy.
- Click Simulation: Detects specific finger combinations to simulate mouse clicks.
- System Integration: Interacts with the operating system to move the cursor and trigger click events.
Cursor Movement:
- Similar to Version 1, the system tracks the index finger (tip landmark) in real time.
- A mapping function converts the relative position of the finger within the webcam frame to the corresponding coordinates on the user's actual screen.
- Smoothing algorithms can be applied to reduce jittery cursor movement (see the sketch below).
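A minimal sketch of one such mapping-plus-smoothing step, using NumPy and AutoPy as listed above; the frame size, margin, and smoothing factor are assumed values, not the project's actual parameters.

```python
import autopy
import numpy as np

screen_w, screen_h = autopy.screen.size()
CAM_W, CAM_H = 640, 480   # webcam frame size (assumed)
MARGIN = 100              # ignore the edges of the frame (assumed)
SMOOTHING = 5             # higher = steadier but slower cursor (assumed)

prev_x, prev_y = 0.0, 0.0

def move_cursor(index_x, index_y):
    """Map the index fingertip's pixel position in the webcam frame to screen coordinates."""
    global prev_x, prev_y
    # Interpolate from the usable frame region to the full screen.
    target_x = np.interp(index_x, (MARGIN, CAM_W - MARGIN), (0, screen_w - 1))
    target_y = np.interp(index_y, (MARGIN, CAM_H - MARGIN), (0, screen_h - 1))
    # Simple smoothing: move only a fraction of the way toward the target each frame.
    cur_x = prev_x + (target_x - prev_x) / SMOOTHING
    cur_y = prev_y + (target_y - prev_y) / SMOOTHING
    # Mirror the x axis so moving the hand right moves the cursor right.
    autopy.mouse.move(screen_w - 1 - cur_x, cur_y)
    prev_x, prev_y = cur_x, cur_y
```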
Click Action:
- Detecting when both the index finger and thumb are bent (using landmark positions and angles) triggers a mouse click event.
- A time delay mechanism (a click_delay threshold between accepted clicks) prevents accidental clicks and ensures intentional user interaction.
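For example, bend detection and the delay could be combined as sketched below; the landmark indices follow MediaPipe's hand model, while the bend heuristic and the 0.5-second threshold are assumptions rather than the project's actual values.

```python
import time
import autopy

CLICK_DELAY = 0.5      # minimum seconds between accepted clicks (assumed)
last_click_time = 0.0

def is_finger_bent(landmarks, tip_id, pip_id):
    """Treat a fingertip lying below its middle joint (larger y in image coords) as bent."""
    return landmarks[tip_id].y > landmarks[pip_id].y

def maybe_click(landmarks):
    """Fire a left click when both the index finger and thumb read as bent."""
    global last_click_time
    index_bent = is_finger_bent(landmarks, tip_id=8, pip_id=6)  # index tip vs. PIP joint
    thumb_bent = is_finger_bent(landmarks, tip_id=4, pip_id=3)  # thumb tip vs. IP joint
    now = time.time()
    if index_bent and thumb_bent and (now - last_click_time) > CLICK_DELAY:
        autopy.mouse.click()
        last_click_time = now
```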
Benefits:
- Version 2 provides basic mouse control capabilities, enhancing user interaction with the system beyond just volume and brightness adjustment.
- This functionality can be particularly useful for users who prefer a hands-free approach to navigating their computer screens.
Limitations:
- The single-finger mouse control might lack precision compared to traditional methods like a mouse or trackpad.
- The click detection based on finger bending might need further refinement to ensure accuracy and avoid accidental clicks.
Version 2 represents a significant step forward by introducing mouse control features, paving the way for further advancements in user experience and interaction capabilities.
Version 3: Optimization and Integration
Focus: Combining the best aspects of Versions 1 and 2 while enhancing efficiency and user experience.
Implementation:
- Merged Functionalities: Integrated volume, brightness, and mouse control functionalities into a single, cohesive system (a structural sketch follows this list).
- Code Optimization: Streamlined code for improved performance and resource utilization.
- Refactored Structure: Organized code into modular components for better maintainability and scalability.
- Gesture Refinement: Refined hand gesture recognition for increased accuracy and responsiveness.
- User Interface Enhancements: Improved visual feedback and user experience, with potential additions such as progress bars or on-screen displays.
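To illustrate what the modular, merged structure could look like in practice, here is a hypothetical skeleton; the class names and boundaries are illustrative, not the project's actual layout.

```python
class HandTracker:
    """Wraps MediaPipe: captures a frame and returns hand landmarks."""
    ...

class MediaController:
    """Maps the thumb-index pinch distance to volume (PyCaw) or brightness."""
    ...

class MouseController:
    """Maps index-finger position to cursor movement and click events (AutoPy)."""
    ...

class GestureApp:
    """Owns the capture loop and dispatches each frame to the active controller."""
    def __init__(self):
        self.tracker = HandTracker()
        self.controllers = {"media": MediaController(), "mouse": MouseController()}
        self.mode = "media"

    def run(self):
        # capture frame -> detect landmarks -> route to self.controllers[self.mode]
        ...
```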
Key Improvements:
- Comprehensive control suite offering volume, brightness, and mouse management.
- Optimized performance for smoother operation.
- Enhanced code structure for easier maintenance and future expansion.
- Improved gesture accuracy and responsiveness.
Version 3 represents the culmination of the project, providing a robust and user-friendly hand gesture control system with a wide range of functionalities.
Challenges and Future Work
While we made substantial progress, several challenges hindered our ability to fully complete the project. These included time constraints due to academic commitments, complexities in real-time hand tracking, and the need for extensive testing and refinement.
Despite these challenges, we believe that the foundation we have laid is strong. We envision future iterations of this project incorporating advanced features such as:
- Support for multiple users
- Integration with popular operating systems (macOS, Linux)
- Development of a mobile application
- Incorporation of additional gestures for extended control capabilities
Our project represents a significant step towards creating a more intuitive and user-centric computing experience. Although we faced limitations, the knowledge and skills gained through this process are invaluable. We are confident that with continued development, this project has the potential to become a groundbreaking solution for human-computer interaction.
We're here to help! Feel free to ask us anything about our project, from technical details to the inspiration behind it.