Unsupervised mouse cursor detection and tracking in instructional videos using tracking-by-detection.
```bash
# Install dependencies
poetry install && poetry shell

# From a YouTube URL: one command runs everything
python cursor_tracker.py \
    --url https://youtube.com/watch?v=VIDEO_ID \
    --output-dir ./data/my_video

# View results
open data/my_video/our_results_1/tracked_video_our_results_1.mp4
```

- Fully Unsupervised: Automatically discovers cursor templates; no manual annotation needed
- End-to-End Pipeline: YouTube URL → Download → Extract → Track → Visualize
- Robust Tracking: Handles fast motion (over 200px per frame) and instant appearance changes
- Visual Output: Generates annotated videos with bounding boxes around detected cursors
- Unsupervised Template Discovery: Uses background subtraction + blob detection to identify cursor templates
- Multi-Scale Template Matching: Generates cursor proposals for each frame
- Spatiotemporal Path Optimization: Finds optimal tracking trajectory through entire video
- Visualization: Draws bounding boxes on frames and creates annotated video
```bash
# Install Poetry
curl -sSL https://install.python-poetry.org | python3 -

# Clone and install dependencies
git clone https://github.com/yourusername/CursorTracker.git
cd CursorTracker
poetry install
poetry shell

# Create directories
mkdir -p data templates saved_models
```

Basic usage:
```bash
python cursor_tracker.py \
    --url "https://youtube.com/watch?v=VIDEO_ID" \
    --output-dir ./data/my_video
```

Options:

```bash
# Custom quality
--quality 1080p   # Options: 144p, 360p, 480p, 720p, 1080p, 1440p, 2160p

# Process a specific frame range
--start-frame 100 --end-frame 500

# Skip tracking (preprocessing only)
--skip-tracking

# Custom configuration
--config my_config.yaml
```

```bash
# Step 1: Preprocess video
python preprocess_video.py \
    --video_path /path/to/video.mp4 \
    --output_dir ./data/my_video \
    --extract_templates

# Step 2: Track cursor
python cursor_tracker_dp.py \
    --video_name my_video \
    --base_dir ./data

# Step 3: Visualize (optional; automatic with the YouTube pipeline)
python visualize_results.py \
    --video_name my_video \
    --base_dir ./data
```

Output directory layout:

```
data/my_video/
├── original_video.mp4        # Downloaded video
├── images/                   # Extracted frames
├── background/               # Background masks
├── estimated_templates/      # Auto-discovered cursor templates
└── our_results_1/
    ├── our_results.txt       # Tracking results (CSV)
    ├── visualizations/       # Annotated frames
    └── tracked_video_our_results_1.mp4   # Annotated video
```
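For downstream analysis, the results file can be loaded with a few lines of Python. This is a minimal sketch: the column layout assumed here (`frame, x, y, w, h, score`) is illustrative, so check the actual format of your `our_results.txt` before relying on it.

```python
import csv

def load_tracking_results(path):
    """Parse a tracking-results CSV into {frame: (x, y, w, h)}.

    Assumes columns frame, x, y, w, h, score (an illustrative layout,
    not necessarily the repo's exact format). Lines starting with '#'
    are treated as comments.
    """
    results = {}
    with open(path, newline="") as f:
        for row in csv.reader(f):
            if not row or row[0].startswith("#"):
                continue
            frame, x, y, w, h = (int(float(v)) for v in row[:5])
            results[frame] = (x, y, w, h)
    return results
```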
Visualizations are generated automatically when using `cursor_tracker.py`.
```bash
# --bbox_color is BGR, so "0,255,0" draws green boxes
python visualize_results.py \
    --video_name my_video \
    --base_dir ./data \
    --bbox_color "0,255,0" \
    --bbox_thickness 2 \
    --fps 30 \
    --quality 9
```

Edit `config/config.yaml` to customize:

```yaml
template_matching:
  score_threshold: 0.5        # Min template match score
  use_laplacian: true         # Edge detection
  template_vicinity: 300      # Temporal window for templates
  max_scale: 2                # Max template scale factor
  nms_overlap_threshold: 0.3  # IoU threshold for NMS

tracking:
  enabled: true               # Enable path optimization
  dist_threshold: 150         # Max pixel distance between frames
  scale_threshold: 1.3        # Max scale change ratio
```

Tested on 8 Adobe Photoshop instructional videos (3595 frames):
| Method | VIOU | Success Rate |
|---|---|---|
| CursorTracker (Ours) | 0.365 | ~87% |
| Faster-RCNN | 0.05 | ~25% |
| Online Trackers (TLD/MIL) | 0.03 | ~15% |
- Speed: ~0.5 seconds/frame (1280×720)
- Robustness: Handles 200+ pixel movements and instant appearance changes
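For context on the VIOU column, here is a sketch of the metric, assuming VIOU denotes the per-frame IoU averaged over a video with boxes given as `(x, y, w, h)`; the exact evaluation protocol used for the table is an assumption here.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x, y, w, h)."""
    ax2, ay2 = a[0] + a[2], a[1] + a[3]
    bx2, by2 = b[0] + b[2], b[1] + b[3]
    iw = max(0, min(ax2, bx2) - max(a[0], b[0]))
    ih = max(0, min(ay2, by2) - max(a[1], b[1]))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union else 0.0

def viou(preds, gts):
    """Mean per-frame IoU over a video (assumed VIOU definition)."""
    return sum(iou(p, g) for p, g in zip(preds, gts)) / len(gts)
```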
| Script | Purpose |
|---|---|
| `cursor_tracker.py` | Main pipeline: YouTube → Track → Visualize |
| `preprocess_video.py` | Extract frames + background masks from local video |
| `extract_templates.py` | Discover cursor templates from preprocessed data |
| `cursor_tracker_dp.py` | Run cursor tracking with DP path optimization |
| `visualize_results.py` | Generate annotated frames and video |
Core:
- Python >=3.10,<3.13
- OpenCV, NumPy, scikit-image
- PyYAML, tqdm, imageio
- youtube-downloader (git dependency)
Optional (install with `poetry install --with ml`):
- TensorFlow, Keras (for CNN filtering)
```bibtex
@inproceedings{cursortracker2020,
  title={Mouse Cursor Detection and Tracking in Instructional Videos},
  booktitle={IEEE Winter Conference on Applications of Computer Vision (WACV)},
  year={2020}
}
```

Few templates discovered?
- Check background subtraction quality in the `background/` folder
- Adjust the `--consecutive_frames` parameter in template extraction

Poor tracking results?
- Tune `dist_threshold` and `scale_threshold` in the config
- Try adjusting `score_threshold` (lower = more proposals)

Out of memory?
- Reduce the `template_vicinity` parameter
- Process in segments with `--start-frame`/`--end-frame`
Template Discovery:
- Apply MOG background subtraction
- Detect blobs (moving objects) in difference images
- Track sequences where exactly 1 blob appears for N consecutive frames
- Extract and save cursor templates from these sequences
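The core of the discovery test can be sketched in a few lines. This illustrative version substitutes simple frame differencing for MOG and a BFS flood fill for the blob detector; the threshold and minimum-area values are assumptions, not the repo's settings.

```python
import numpy as np
from collections import deque

def count_blobs(prev_frame, frame, thresh=25, min_area=20):
    """Count moving blobs between two grayscale frames.

    A frame run where exactly one blob persists is a cursor-template
    candidate. Frame differencing stands in for MOG here.
    """
    mask = np.abs(frame.astype(int) - prev_frame.astype(int)) > thresh
    seen = np.zeros_like(mask, dtype=bool)
    blobs = 0
    for y, x in zip(*np.nonzero(mask)):
        if seen[y, x]:
            continue
        # BFS over 4-connected neighbours to measure this blob's area
        queue, area = deque([(y, x)]), 0
        seen[y, x] = True
        while queue:
            cy, cx = queue.popleft()
            area += 1
            for ny, nx in ((cy-1, cx), (cy+1, cx), (cy, cx-1), (cy, cx+1)):
                if (0 <= ny < mask.shape[0] and 0 <= nx < mask.shape[1]
                        and mask[ny, nx] and not seen[ny, nx]):
                    seen[ny, nx] = True
                    queue.append((ny, nx))
        if area >= min_area:
            blobs += 1
    return blobs
```

A sequence where `count_blobs` returns 1 for N consecutive frame pairs would mark a template-extraction candidate.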
Template Matching:
- Select templates from temporal vicinity of current frame
- Generate multi-scale template versions
- Perform normalized cross-correlation matching
- Apply non-maximum suppression to proposals
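A minimal stand-in for the matching and suppression steps: brute-force normalized cross-correlation (a real implementation would use an FFT-based or OpenCV routine), followed by greedy NMS. The function names and thresholds here are illustrative, though `score_threshold` and the NMS overlap mirror the config keys.

```python
import numpy as np

def ncc_scores(image, template):
    """Normalized cross-correlation of a grayscale template over an image."""
    ih, iw = image.shape
    th, tw = template.shape
    t = template - template.mean()
    tnorm = np.sqrt((t ** 2).sum())
    out = np.zeros((ih - th + 1, iw - tw + 1))
    for y in range(out.shape[0]):
        for x in range(out.shape[1]):
            patch = image[y:y+th, x:x+tw]
            p = patch - patch.mean()
            denom = np.sqrt((p ** 2).sum()) * tnorm
            out[y, x] = (p * t).sum() / denom if denom else 0.0
    return out

def proposals(scores, th, tw, score_threshold=0.5):
    """Turn a score map into (x, y, w, h, score) cursor proposals."""
    ys, xs = np.nonzero(scores >= score_threshold)
    return [(x, y, tw, th, scores[y, x]) for y, x in zip(ys, xs)]

def _iou(a, b):
    iw = max(0, min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0]))
    ih = max(0, min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union else 0.0

def nms(boxes, overlap=0.3):
    """Greedy NMS: keep highest-scoring boxes, drop heavy overlaps."""
    kept = []
    for b in sorted(boxes, key=lambda b: -b[4]):
        if all(_iou(b, k) <= overlap for k in kept):
            kept.append(b)
    return kept
```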
Path Optimization:
- Model as graph optimization problem
- Find highest-scoring spatiotemporal path through video
- Enforce distance and scale constraints between consecutive frames
- Output optimal cursor trajectory
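The optimization above can be sketched as a per-frame dynamic program. This toy version keeps only the distance constraint (the repo also enforces a scale constraint) and assumes each frame's proposals have been reduced to `(x, y, score)` tuples; it is a sketch, not the repo's `cursor_tracker_dp.py`.

```python
import math

def optimal_path(frames, dist_threshold=150):
    """Pick one detection per frame maximizing total score, subject to
    a maximum inter-frame movement. frames: list of lists of (x, y, score).
    Returns the chosen (x, y) per frame."""
    # best[i][j]: best total score of a path ending at detection j of frame i
    best = [[s for (_, _, s) in frames[0]]]
    back = [[None] * len(frames[0])]
    for i in range(1, len(frames)):
        cur_best, cur_back = [], []
        for (x, y, s) in frames[i]:
            cand = [
                (best[i-1][k] + s, k)
                for k, (px, py, _) in enumerate(frames[i-1])
                if math.hypot(x - px, y - py) <= dist_threshold
            ]
            score, prev = max(cand) if cand else (float("-inf"), None)
            cur_best.append(score)
            cur_back.append(prev)
        best.append(cur_best)
        back.append(cur_back)
    # Backtrack from the best final detection
    j = max(range(len(frames[-1])), key=lambda k: best[-1][k])
    path = []
    for i in range(len(frames) - 1, -1, -1):
        path.append(frames[i][j][:2])
        j = back[i][j]
    return path[::-1]
```

High-scoring but spatially implausible detections (e.g. a 400-pixel jump) are pruned by the distance constraint, which is what makes the tracker robust to spurious matches.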
Key Insight: Cursors in screencasts exhibit unique motion signatures (movement while background stays static), enabling unsupervised discovery without labeled training data.
MIT