AccessiCommand is a hackathon project aimed at creating an alternative and intuitive way for users to interact with their computers using facial gestures, hand gestures, and voice commands. Traditional input methods (keyboard/mouse) can be limiting for various reasons, including accessibility needs or simply providing a different kind of user experience (like gaming). AccessiCommand provides a flexible framework to map these alternative inputs to standard system actions.
- Multimodal Input: Supports control via:
  - Voice Commands (Configurable keywords)
  - Facial Gestures (Blinks, eyebrow raise, mouth open, head tilt)
  - Hand Gestures (Open palm, fist, thumbs up, pointing index, victory sign)
- Configurable Bindings: Easily map specific voice commands, facial gestures, or hand gestures to system actions (like pressing keyboard keys, mouse clicks, or executing shortcuts).
- Intuitive GUI: Simple graphical interface to start/stop the input detection engine and configure custom bindings.
- Separate UI Voice Control: Use voice commands (e.g., "Computer, start engine", "Computer, open configuration") to control the application's GUI itself, even before the main detection engine is running.
- Combined Visual Detection: Process face and hand tracking from a single camera feed simultaneously.
- Modular Architecture: Designed with separate components for detection (Voice, Face, Hand), core logic (Engine), actions (Registry, System Actions), and configuration (ConfigManager, config file), making it extensible.
- Offline Voice Transcription: Uses Whisper via `speech_recognition` for fast, local speech-to-text (sketched below).
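For context, offline transcription with Whisper through `speech_recognition` looks roughly like the following. This is a minimal sketch, not the project's `voice_detector.py`; keyword matching and error handling are omitted.

```python
# Minimal sketch: local (offline) transcription with Whisper via speech_recognition.
# Requires the openai-whisper and PyAudio packages; the model downloads on first use.
import speech_recognition as sr

recognizer = sr.Recognizer()
with sr.Microphone() as source:
    recognizer.adjust_for_ambient_noise(source, duration=0.5)
    print("Say a command...")
    audio = recognizer.listen(source, phrase_time_limit=3)

# recognize_whisper runs the model locally; no network request is made.
text = recognizer.recognize_whisper(audio, model="tiny.en").strip().lower()
print(f"Heard: {text!r}")
```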
AccessiCommand follows a modular architecture:
- Detectors (`detectors/`): Independent modules (`voice_detector.py`, `facial_detector.py`, `hand_detector.py`) responsible only for detecting specific events from input streams (microphone, camera). They do not know what action should result. When an event is detected (e.g., "go" heard, "left eyebrow raised start"), they report it by calling the Engine's `handle_event` method.
- Engine (`core/engine.py`): The central coordinator.
  - Initializes based on configuration loaded by the `ConfigManager`.
  - Creates instances of the configured Detectors, providing them its `handle_event` method as a callback.
  - Manages camera capture for visual detectors and distributes frames.
  - When `handle_event` is called by a detector, the Engine looks up the incoming event in its loaded bindings.
  - If a matching binding is found, it uses the `ActionRegistry` to find the corresponding action function and executes it (see the sketch after this list).
  - Runs visual processing in a background thread.
- Configuration (`config/`):
  - `config.json`: The primary file (in the project root) storing user-defined bindings and settings.
  - `manager.py`: Contains the `ConfigManager` class responsible for reading and writing `config.json`.
- Actions (`actions/`):
  - `registry.py`: Maps Action IDs (like "PRESS_SPACE", "MOUSE_CLICK_LEFT") defined in `config.json` to actual Python functions.
  - `system_actions.py`: Contains the low-level Python functions (using `pyautogui`) that perform operations on the computer.
- User Interface (`ui/`):
  - `main.py`: The application entry point; initializes the main GUI window and the core `Engine`.
  - `main_window.py`: Defines the main application window with Start/Stop and Configure buttons. Interacts with the `Engine`.
  - `config_dialog.py`: Defines the dialog window for adding, deleting, and saving bindings to `config.json`. Interacts with `ConfigManager` and `Engine` (to trigger a reload).
- UI Voice Commander (`ui_commander.py`): A separate script/process that runs alongside the main app. It listens for specific UI command phrases (like "start engine") and uses `pyautogui` to simulate clicks on the main application window buttons.
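To make the event flow concrete, here is a heavily simplified sketch of the dispatch path described above (detector → `handle_event` → bindings lookup → `ActionRegistry`). The real code in `core/engine.py`, `actions/registry.py`, and `actions/system_actions.py` is more elaborate; the dictionary lookup and lambda actions below are illustrative assumptions, not the project's actual implementation.

```python
# Illustrative sketch of the dispatch path: detector -> Engine.handle_event -> ActionRegistry.
import pyautogui

# Maps Action IDs (as used in config.json) to callables that perform the system action.
ACTION_REGISTRY = {
    "PRESS_RIGHT": lambda: pyautogui.press("right"),
    "PRESS_K_DOWN": lambda: pyautogui.keyDown("k"),
    "PRESS_K_UP": lambda: pyautogui.keyUp("k"),
    "MOUSE_CLICK_LEFT": lambda: pyautogui.click(button="left"),
}


class Engine:
    def __init__(self, bindings):
        # bindings come from config.json via the ConfigManager, e.g.
        # {"trigger_type": "voice", "trigger_event": "go", "action_id": "PRESS_RIGHT"}
        self._lookup = {
            (b["trigger_type"], b["trigger_event"]): b["action_id"] for b in bindings
        }

    def handle_event(self, trigger_type, trigger_event):
        """Callback passed to every detector; runs the bound action, if any."""
        action_id = self._lookup.get((trigger_type, trigger_event))
        action = ACTION_REGISTRY.get(action_id)
        if action:
            action()


# A voice detector hearing the keyword "go" would simply call:
#   engine.handle_event("voice", "go")   # -> presses the Right Arrow key
```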
- Clone the Repository (if applicable):
  - `git clone https://github.com/YourRepo/AccessiCommand.git` (replace with your actual repo URL)
  - `cd FacialGestures`
- Create a Virtual Environment: `python -m venv .venv`
- Activate the Virtual Environment:
  - On Windows Command Prompt or PowerShell: `.\.venv\Scripts\activate`
  - On Linux or macOS (or Git Bash on Windows): `source .venv/bin/activate`
- Install Dependencies: `pip install -r requirements.txt`
  - If you don't have `requirements.txt`, manually install the key packages: `pip install mediapipe opencv-python pyautogui speechrecognition pyaudio openai-whisper`
  - PyAudio Note for Windows: `pip install PyAudio` can sometimes fail on Windows. If it does, you'll need to download a pre-compiled `.whl` file from https://www.lfd.uci.edu/~gohlke/pythonlibs/#pyaudio that matches your Python version (`cp311` for 3.11) and architecture (`win_amd64`) and install it manually (e.g., `pip install "PyAudio-0.2.11-cp311-cp311-win_amd64.whl"`).
  - Whisper Model: The `openai-whisper` library will automatically download the necessary model file (e.g., `tiny.en.pt`) on its first use, which might take a moment.
- Ensure `__init__.py` Files Exist: Verify that you have an empty file named `__init__.py` inside the `accessicommand` directory and all its subdirectories (`actions`, `config`, `core`, `detectors`, `ui`). Python needs these to recognize directories as packages.
- Create Default Configuration: Create a file named `config.json` in the root directory (`FacialGestures/`) with your desired initial bindings and settings. See the example in the Architecture section above or create a basic one:

  ```json
  {
    "bindings": [
      { "trigger_type": "voice", "trigger_event": "go", "action_id": "PRESS_RIGHT" },
      { "trigger_type": "face", "trigger_event": "MOUTH_OPEN_START", "action_id": "PRESS_K_DOWN" },
      { "trigger_type": "face", "trigger_event": "MOUTH_OPEN_STOP", "action_id": "PRESS_K_UP" }
    ],
    "settings": {
      "voice_detector": {},
      "facial_detector": { "camera_index": 0 },
      "hand_detector": { "camera_index": 0 }
    }
  }
  ```

  The empty `voice_detector` object uses the defaults; the facial and hand detectors can share camera 0 or use different indices. You can add more bindings later via the GUI.
- Find PyAutoGUI Coordinates (for `ui_commander.py`):
  - Run the main application: `python accessicommand/main.py`
  - Position the window where you want it.
  - Open a separate terminal (with venv active) and run: `python -c "import pyautogui, time; print('Hover mouse over target in 5s...'); time.sleep(5); print(f'Coords: {pyautogui.position()}')"`
  - Quickly move your mouse over the center of the "START ENGINE", "STOP ENGINE", and "CONFIGURE BINDINGS" buttons and note the coordinates.
  - Open `ai_commander.py` and replace the placeholder coordinates in the `BUTTON_COORDS` dictionary with the coordinates you found (see the sketch after these setup steps).
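As referenced in the coordinates step above, `BUTTON_COORDS` maps UI voice commands to on-screen button positions. The sketch below shows one plausible shape for that dictionary once filled in; the key names and numbers are placeholders, so match whatever the script actually expects.

```python
# Hypothetical shape of the BUTTON_COORDS dictionary in ai_commander.py after you have
# filled in the coordinates you measured. Key names and numbers are placeholders only.
import pyautogui

BUTTON_COORDS = {
    "start engine": (412, 300),        # centre of the "START ENGINE" button
    "stop engine": (412, 360),         # centre of the "STOP ENGINE" button
    "open configuration": (412, 420),  # centre of the "CONFIGURE BINDINGS" button
}


def click_button(command: str) -> None:
    """Simulate a click on the main-window button bound to a UI voice command."""
    coords = BUTTON_COORDS.get(command)
    if coords:
        pyautogui.click(*coords)


# e.g. after hearing "Computer, start engine":
#   click_button("start engine")
```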
- Start the Main Application:
  - Open your terminal in the `AccessiCommand/` root directory.
  - Activate your virtual environment (`.\.venv\Scripts\activate` or `source .venv/bin/activate`).
  - Run the main application script: `python accessicommand/main.py`
  - This will open the GUI window and initialize the core engine components (but they won't be actively detecting yet).
- Start the UI Voice Commander (Optional):
  - Open a separate terminal window (or use a split terminal).
  - Navigate to the `AccessiCommand/` root directory.
  - Activate your virtual environment.
  - Run the standalone UI commander script: `python ai_commander.py`
  - This script will start listening for UI command phrases immediately.
- Control the GUI (Using `ui_commander.py`):
  - With both windows/scripts running, activate the terminal where `ai_commander.py` is running.
  - Speak clearly: say the activation phrase ("computer") followed by a command. Examples:
    - "Computer, start engine"
    - "Computer, open configuration"
    - "Computer, stop engine" (only works if the engine is running)
    - "Computer, close application" (will attempt to close the main window)
  - Watch the console output of `ai_commander.py` to see the transcription and the simulated click action. Watch the main GUI to see if the corresponding button is clicked.
- Control the Computer (Using Configured Bindings):
  - In the main GUI window, click the "START ENGINE" button (or use the voice command "Computer, start engine").
  - The GUI status should change to "Engine Running". The console will show the engine and detectors starting (including camera initialization).
  - Now perform facial or hand gestures, or speak system trigger words defined in your `config.json` (e.g., "go", "back").
  - The corresponding actions (like pressing keys via `pyautogui`) should be triggered.
- Configure Bindings:
  - Click the "CONFIGURE BINDINGS" button (or use the voice command "Computer, open configuration").
  - A new configuration window will open.
  - Add new bindings by selecting Type (voice, face, hand), Event (keyword or gesture event), and Action (from the list).
  - Select existing bindings and click "Delete Selected" to remove them.
  - Click "Save & Close".
  - Console output will show the configuration being saved. The engine will be stopped (if running), and the GUI buttons will reset to the "Stopped" state.
  - To use the new bindings, click "START ENGINE" again (or use the voice command) to restart the engine with the updated configuration (see the sketch after these usage steps for how bindings are stored).
- Stopping:
  - To stop the AI Commander, press Ctrl+C in the terminal where you ran `ai_commander.py`.
  - To stop the main application, close the GUI window or press Ctrl+C in the terminal where you ran `python accessicommand/main.py`.
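As referenced in the Configure Bindings step, bindings ultimately live in `config.json` and are read and written by the `ConfigManager`. The helper below is only a sketch of that load/save cycle under assumed names (`load_config`, `add_binding`); the real `config/manager.py` may structure this differently.

```python
# Sketch of the config.json load/save cycle (in the spirit of config/manager.py).
# load_config and add_binding are assumed names for illustration only.
import json
from pathlib import Path

CONFIG_PATH = Path("config.json")  # stored in the project root


def load_config() -> dict:
    """Return the bindings and settings stored in config.json."""
    with CONFIG_PATH.open("r", encoding="utf-8") as f:
        return json.load(f)


def add_binding(trigger_type: str, trigger_event: str, action_id: str) -> None:
    """Append a binding and write the file back (roughly what 'Save & Close' does)."""
    config = load_config()
    config["bindings"].append(
        {"trigger_type": trigger_type, "trigger_event": trigger_event, "action_id": action_id}
    )
    with CONFIG_PATH.open("w", encoding="utf-8") as f:
        json.dump(config, f, indent=2)


# Example: bind the voice keyword "back" to PRESS_SPACE, then restart the engine
# from the GUI so it reloads the configuration.
# add_binding("voice", "back", "PRESS_SPACE")
```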
- Python
- Tkinter / ttk (for GUI)
- MediaPipe (for Face and Hand tracking)
- OpenCV (`cv2`) (for camera access and image manipulation)
- SpeechRecognition (for voice input and Whisper transcription)
- PyAudio (for microphone access)
- PyAutoGUI (for simulating system input)
- threading (for running detection loops in the background)
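To illustrate the "Combined Visual Detection" feature with this stack, a minimal sketch of feeding one camera stream to both MediaPipe trackers might look like the following. It is illustrative only; the project's detectors, gesture heuristics, and threading model are more involved.

```python
# Minimal sketch: one camera feed shared by MediaPipe Face Mesh and Hands,
# similar in spirit to the Engine distributing frames to both visual detectors.
# Not the project's actual detector code; gesture heuristics are omitted.
import cv2
import mediapipe as mp

face_mesh = mp.solutions.face_mesh.FaceMesh(max_num_faces=1)
hands = mp.solutions.hands.Hands(max_num_hands=1)

cap = cv2.VideoCapture(0)  # camera_index, as in the "settings" block of config.json
try:
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)

        # The same frame goes to both trackers ("Combined Visual Detection").
        face_results = face_mesh.process(rgb)
        hand_results = hands.process(rgb)

        if face_results.multi_face_landmarks:
            pass  # blink / eyebrow / mouth-open / head-tilt checks would go here
        if hand_results.multi_hand_landmarks:
            pass  # open-palm / fist / thumbs-up / pointing / victory checks would go here

        cv2.imshow("AccessiCommand preview", frame)
        if cv2.waitKey(1) & 0xFF == 27:  # press Esc to quit
            break
finally:
    cap.release()
    cv2.destroyAllWindows()
    face_mesh.close()
    hands.close()
```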