QuickTube is a simple, clean YouTube downloader with a GUI interface. Just paste a URL and click download - that's it!
- Simple Interface - Paste URL, click download, done
- Single Video or Channel - Download one video or entire channels
- Real-time Progress - Watch downloads happen live
- Download History - Track what you've downloaded
- Quality Settings - Choose video quality (best, 1080p, 720p, 480p)
- Audio Only Option - Download as MP3
- Download Queue - Queue multiple videos, downloads one at a time automatically
- Clean UI - Matches CCL launcher theme
- YouTube Search - Search videos without leaving the app
- Video Preview Panel - See thumbnail, title, duration before downloading
- 30-Second Preview Clips - Preview videos with embedded VLC player
- Quick Links - Open video/channel in browser directly
- Batch Selection - Select multiple videos to download at once
- Open Folder Button - Quick access to downloads folder in tab bar
- AI-Powered Physical Comedy Detection - Detect falls, slaps, chases, slapstick
- Scene-Based Analysis - PySceneDetect + perceptual hash deduplication
- CLIP Model (Current) - Semantic matching (limited accuracy for specific actions)
- Future: Qwen2.5-VL-7B - Recommended upgrade for true action understanding
- Multi-Select Action Filters - Choose categories: Slapstick, Falls, Fighting, Chases, etc.
- Configurable Thresholds - Min clip length, min confidence, max duration
- Download Detected Clips - Save clips directly to
visual_clipsfolder - Re-Analyze Support - Previously analyzed videos can be re-processed with new filters
- Automatic Codec Detection - Scans downloads using FFprobe
- Compatibility Warnings - Alerts for problematic files (Opus audio, VP9/AV1 video)
- One-Click Conversion - Convert incompatible files to H.264 + AAC
- Mobile/PLEX Friendly - Ensures videos work on all devices
pip install yt-dlp customtkinter pyperclip pillow requests python-vlc- FFmpeg - For codec detection and conversion
- VLC Media Player - For embedded video preview playback
All videos save to: D:\stacher_downloads
cd D:\QuickTube
python quicktube.py- Copy YouTube video URL
- Click Paste URL (or Ctrl+V)
- Click Download Video
- Watch progress in real-time
- Done! File is in
D:\stacher_downloads
- Click Paste Multiple URLs to open the bulk-add dialog
- Paste URLs (one per line) and click Add URLs to Queue
- Or paste a single URL and click Add to Queue to add one at a time
- The queue list shows all pending videos with a count
- Click Start Queue to begin downloading
- Videos download one at a time automatically - no babysitting needed
- Progress log shows
[QUEUE]messages with remaining count - Click Clear Queue to remove all pending videos
- Copy YouTube channel URL
- Click Paste URL
- Click Download Channel
- Confirm you want to download all videos
- Videos download one by one
- All videos save to
D:\stacher_downloads\[Channel Name]\
- Enter search terms in the search box
- Click Search or press Enter
- Browse results with thumbnails on the left
- Click Preview on any result to see details in the preview panel
- Click Play 30s Preview to watch a clip embedded in the app
- Check boxes to select videos for download
- Click Download Selected to download all checked videos
- Single videos:
https://youtube.com/watch?v=... - Short links:
https://youtu.be/... - Channels:
https://youtube.com/@channelname - Playlists:
https://youtube.com/playlist?list=...
Click Settings to configure:
- Best (default) - Highest quality available
- 1080p - Full HD
- 720p - HD
- 480p - SD
- Check this to download MP3 audio only (no video)
- Check Compatibility - Scan downloads for codec issues (default: on)
- Auto Convert - Automatically convert incompatible files (default: off)
- Prefer H.264 - Request H.264 codec from YouTube when available
QuickTube automatically checks downloaded videos for compatibility issues with mobile devices and media servers like PLEX.
YouTube sometimes encodes videos with codecs that don't work well on all devices:
- Opus audio in MP4 - Won't play on many mobile devices or PLEX mobile apps
- VP9 video - Limited support on older devices
- AV1 video - Cutting-edge codec with limited device support
Videos may play fine on your computer but fail on phones, tablets, or streaming servers.
| Level | Video | Audio | Works On |
|---|---|---|---|
| Excellent | H.264 | AAC | Everything - mobile, PLEX, TV, web |
| Good | H.264 | MP3 | Most devices |
| Moderate | H.265 | AAC | Newer devices, some issues |
| Poor | Any | Opus | Computer only, fails on mobile |
| Very Poor | VP9/AV1 | Opus | Limited device support |
When a compatibility issue is detected, QuickTube offers:
- Convert Now - Convert audio to AAC for mobile compatibility
- Skip - Keep the file as-is (works on computer)
- Auto-Convert - Enable in settings to always convert problematic files
The Visual Analysis tab uses AI to detect physical comedy moments in videos with automatic scene detection and duplicate removal.
- Search - Enter a search query (e.g., "benny hill slapstick") or browse local files
- Select ONE Video - Check a single video to analyze with full progress tracking
- Set Confidence - Set minimum confidence threshold (default 15%)
- Analyze - Click "Analyze Selected Videos" to start 6-step process:
- Step 1: Download video
- Step 2: Detect scene boundaries (PySceneDetect)
- Step 3: Extract thumbnails from each scene
- Step 4: Remove duplicate scenes (perceptual hash deduplication)
- Step 5: Classify scenes with CLIP AI
- Step 6: Build clip candidates
- Preview - View thumbnail grid with scene info, click "Preview" to watch clips
- Select & Download - Check clips to keep, click "Download Selected Clips"
- Visual step indicators (1-6) showing completed/current/pending steps
- Progress bar showing overall completion
- Spinner animation during processing
- Real-time status messages
When multiple videos are selected, uses the original batch processing mode without scene-based deduplication.
| Model | Best For | How It Works | Accuracy |
|---|---|---|---|
| CLIP (Current) | Visual similarity filtering | Matches frames to text descriptions | ~29% on slapstick |
| SlowFast | Sports, defined movements | Kinetics-400 action classes | 0.27% on slapstick |
| Qwen2.5-VL-7B (Planned) | True action understanding | Temporal video reasoning | High (recommended) |
Note: CLIP cannot reliably detect specific actions like "slaps" - see limitations section below.
- Slapstick: "slapstick comedy scene", "person slapping another person", "comedic fighting"
- Falls: "person falling down", "someone tripping", "pratfall"
- Chases: "people running and chasing", "comedy chase scene", "fast motion"
- Physical Humor: "pie in face", "being pushed", "knocked over", "silly movements"
- Excluded: "dialogue", "scenery", "credits" (auto-filtered)
# Separate conda environment required
conda create -n video_analysis python=3.10
conda activate video_analysis
pip install torch torchvision pytorchvideo pillow numpy imagehash
pip install git+https://github.com/openai/CLIP.git
pip install scenedetect[opencv]After extensive testing with Benny Hill slapstick videos, we discovered fundamental limitations with the CLIP-based approach:
The Problem:
- CLIP matches visual aesthetics, not temporal actions
- A single frame of a "slap" doesn't look distinctly like a slap
- CLIP sees "two people close together" rather than "someone slapping someone"
- 29% accuracy is insufficient for reliable action detection
- Many false positives from visually similar but non-action scenes
Why CLIP Fails for Action Detection:
- CLIP was trained on image-caption pairs, not video action sequences
- Physical actions like slaps, falls, and chases require temporal context (before/during/after)
- A mid-slap frame looks similar to someone waving or gesturing
We researched several alternatives for improved action detection. Hardware available: Dual RTX 4090 server with vLLM/Ollama, plus API accounts (Anthropic, Google, OpenAI).
Why This Is The Preferred Option:
- True video understanding with temporal reasoning
- Can process multiple frames and understand actions
- Natural language queries: "Find all scenes where someone gets slapped"
- Returns timestamps and descriptions
- Runs locally on 4090 (14-18GB VRAM)
Specifications:
| Property | Value |
|---|---|
| Model Size | 7B parameters |
| VRAM Required | 14-18GB (fits single 4090) |
| Framework | vLLM, Ollama, or transformers |
| Input | Video frames + text prompt |
| Output | Descriptions with timestamps |
Implementation Approach:
# Example workflow
from transformers import Qwen2VLForConditionalGeneration, AutoProcessor
model = Qwen2VLForConditionalGeneration.from_pretrained("Qwen/Qwen2.5-VL-7B-Instruct")
# Process video in chunks
prompt = "Identify all scenes containing: slaps, falls, chases, physical comedy. Return timestamps."
result = model.generate(video_frames, prompt)
# Returns: "0:42-0:45: Man slaps another man on the back of the head"Advantages:
- Runs entirely locally (no API costs)
- True temporal understanding
- Can be fine-tuned on slapstick examples
- Supports natural language queries
- High accuracy on action recognition benchmarks
Best for highest accuracy without local GPU:
- Native video+audio processing
- Can hear slap sounds AND see actions
- Returns timestamps
- User already has Gemini account
| Property | Value |
|---|---|
| Input | Full video file (up to 2 hours) |
| Audio Analysis | Yes - detects slap sounds, screams |
| Timestamps | Yes - precise to seconds |
| Cost | ~$0.001/second of video |
Advantages:
- Highest accuracy (video + audio combined)
- No local GPU requirements
- Simple API integration
Disadvantages:
- Ongoing API costs
- Requires internet connection
- Data sent to Google servers
For detecting actions by sound:
- Slaps have distinct audio signatures
- Screams, crashes, running footsteps
- Can complement visual analysis
| Model | Purpose | Size |
|---|---|---|
| PANNs | Audio event detection | 80MB |
| Whisper-AT | Audio tagging + transcription | 1.5GB |
Best Used: As a secondary signal combined with visual analysis
For fine-grained temporal action localization:
- SlowFast, TimeSformer, VideoMAE models
- Requires training on slapstick dataset
- More complex setup
Best Used: If building a specialized slapstick detector with labeled training data
- Phase 1 (Immediate): Keep current CLIP system for basic filtering
- Phase 2 (Next): Implement Qwen2.5-VL-7B for accurate action detection
- Phase 3 (Optional): Add audio detection for slap sounds as secondary signal
- Phase 4 (Optional): Fine-tune on labeled slapstick clips for highest accuracy
Minimum:
- RTX 4090 (24GB) or RTX 3090 (24GB)
- 32GB system RAM
- 50GB disk space for model
Recommended (Current Server):
- Dual RTX 4090 (48GB total)
- vLLM for optimized inference
- Can process multiple videos in parallel
- Queue Multiple Videos - Add multiple URLs and download them one at a time
- Paste Multiple URLs - Bulk-add dialog for pasting a list of URLs (one per line)
- Add to Queue - Add single URL from the entry field to the queue
- Start Queue - Begin sequential download of all queued videos
- Queue Display - Shows pending URLs with count and status indicator
- Auto-chain - Each video starts downloading automatically after the previous one finishes
- Clear Queue - Remove all pending downloads
- CLIP Limitations Documented - CLIP cannot reliably detect temporal actions (slaps, falls)
- Research Completed - Evaluated Qwen2.5-VL-7B, Gemini API, MMAction2, audio detection
- Qwen2.5-VL-7B Recommended - True video understanding with temporal reasoning
- Hardware Ready - Dual RTX 4090 server available for local inference
- Upgrade Path Defined - Phased approach from CLIP to Qwen2.5-VL-7B
- Scene Detection - Uses PySceneDetect to find natural scene boundaries (not frame-by-frame)
- Perceptual Hash Deduplication - Removes duplicate/similar scenes automatically using imagehash
- 6-Step Progress UI - Visual step indicators, progress bar, and spinner animations
- Thumbnail Preview Grid - See scene thumbnails before downloading
- Preview Clips - Click to preview any scene with ffplay before saving
- User Selection - Check/uncheck scenes to include in download
- CLIP Model - Basic slapstick filtering (29% accuracy - limited for specific actions)
- Download Selected - Save only the scenes you want to
visual_clipsfolder
- New tab for AI-powered physical comedy detection
- Multi-select action category filters
- Max results and max duration search filters
- Min confidence threshold filtering
- Download detected clips feature
- Added embedded VLC player for preview clips
- Preview plays inside the app instead of launching external player
- 30-second clips download via yt-dlp
--download-sections
- Added Open Folder button to tab bar (always visible)
- Added Video/Channel/Preview link buttons under each search result
- Added preview panel on right side with thumbnail, title, and action buttons
- Two-column layout: search results (left) + preview panel (right)
- Fixed issue where cached videos caused wrong file to be downloaded
- Added parsing for "has already been downloaded" yt-dlp messages
- Improved fallback logic to sort temp files by modification time
- Added auto-cleanup of temp files older than 24 hours
- Added FFprobe-based codec detection
- Compatibility warnings for Opus audio in MP4
- Batch conversion tool for problematic files
- Fixed progress tracking with temp folder architecture
- Thread-safe UI updates
- Clean output without ANSI codes
- Duplicate file handling dialog
D:\stacher_downloads\
└── Video Title.mp4
D:\stacher_downloads\
└── Channel Name\
├── Video 1.mp4
├── Video 2.mp4
└── Video 3.mp4
D:\QuickTube\
├── quicktube.py # Main application
├── scene_analysis.py # Scene-based analysis (PySceneDetect + deduplication)
├── visual_analysis.py # Visual analysis module (CLIP + SlowFast)
├── audio_detection.py # Audio detection module
├── codec_utils.py # Codec detection utilities
├── settings.json # User preferences
├── download_history.json # Download history
├── processed_videos.json # Visual analysis cache
├── temp/ # Temporary download/frame folder
│ └── scene_cache/ # Downloaded videos and thumbnails for analysis
├── logs/ # Application logs
├── README.md # This file
└── FIXES.md # Bug fix documentation
- Ensure VLC Media Player is installed
- Check that python-vlc package is installed:
pip install python-vlc - Check logs in
D:\QuickTube\logs\for errors - Falls back to external player if embedded fails
- This was fixed in 2026-01-09 update
- Temp folder is now auto-cleaned every 24 hours
- If issue persists, manually clear
D:\QuickTube\temp\
- Open Firefox
- Go to youtube.com
- Make sure you're logged in (you should see your profile icon)
- Restart QuickTube
- Ensure download folder exists
- App will create folder if missing
- Falls back to subprocess if os.startfile fails
- Check your internet connection
- Video might be private or age-restricted
- Try updating yt-dlp:
pip install --upgrade yt-dlp
- Files with Opus audio may not play on mobile/PLEX
- Use the conversion feature to convert to AAC audio
- Check
D:\stacher_downloads\*_Converted\for converted files
Add QuickTube to your Command Center LaunchPad:
{
"name": "QuickTube",
"path": "D:\\QuickTube\\quicktube.py",
"category": "Media",
"monitor": 0
}- Ctrl+V - Paste URL from clipboard
- Enter - Submit search (in search tab)
YouTube requires authentication to bypass bot detection. QuickTube handles this automatically.
- Stay logged into YouTube in Firefox (your normal browser)
- QuickTube automatically reads Firefox cookies on startup
- Downloads work without any manual steps
- Firefox browser installed
- Logged into YouTube in Firefox
QuickTube uses yt-dlp behind the scenes. You can also use it directly:
# Download single video
yt-dlp -o "D:\stacher_downloads\%(title)s.%(ext)s" [URL]
# Download channel
yt-dlp -o "D:\stacher_downloads\%(uploader)s\%(title)s.%(ext)s" --yes-playlist [CHANNEL_URL]
# Audio only
yt-dlp -f bestaudio -x -o "D:\stacher_downloads\%(title)s.%(ext)s" [URL]For issues with:
- QuickTube UI: Check this README and FIXES.md
- yt-dlp downloads: Visit https://github.com/yt-dlp/yt-dlp
Built with CustomTkinter and yt-dlp | Theme matches CCL Launcher