Resonote is a next-generation Optical Music Recognition (OMR) platform designed to bridge the gap between static sheet music and digital interactivity. Leveraging the multimodal vision capabilities of Google's Gemini 3 Pro and Gemini 2.5 models, Resonote analyzes complex musical scores and converts them into semantically valid ABC notation in real time.
Traditional OMR software often struggles with handwritten scores, complex polyphony, and lyric alignment. Resonote addresses these challenges by utilizing Large Multimodal Models (LMMs) to "reason" about the visual structure of music rather than relying solely on heuristic pixel-matching algorithms. The system generates high-fidelity ABC notation that preserves:
- Key and Time Signatures
- Multi-voice arrangements (polyphony)
- Lyric syllabification and alignment
- Dynamics and articulation markings
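As an illustration of what such output can look like, the hand-written fragment below (not actual Resonote output) carries a key, a time signature, two voices, one lyric line, and a dynamic mark. The `summarizeAbc` helper is a hypothetical name used only for this sketch; it reads those elements back out of the text so the structure is easy to see.

```typescript
// Hand-written sample for illustration only; not produced by Resonote.
const sampleAbc: string = [
  "X:1",
  "T:Example",
  "M:4/4", // time signature
  "L:1/4", // default note length
  "K:G", // key signature
  "V:1",
  "!mf! G A B c | d2 d2 |", // !mf! is a dynamic decoration
  "w: La la la la | la la", // lyric syllables aligned to the notes above
  "V:2",
  "G,2 D2 | G,4 |",
].join("\n");

interface ScoreSummary {
  key?: string;
  meter?: string;
  voices: string[];
  lyricLines: number;
}

// Hypothetical helper: collects the K:, M:, V: fields and counts w: lines.
function summarizeAbc(abc: string): ScoreSummary {
  const summary: ScoreSummary = { voices: [], lyricLines: 0 };
  for (const rawLine of abc.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (/^w:/.test(line)) {
      summary.lyricLines += 1;
      continue;
    }
    const field = /^([A-Za-z]):\s*(.*)$/.exec(line);
    if (!field) continue; // a music line, not an information field
    const name = field[1];
    const value = field[2] ?? "";
    if (name === "K" && summary.key === undefined) summary.key = value;
    else if (name === "M" && summary.meter === undefined) summary.meter = value;
    else if (name === "V") summary.voices.push(value.split(/\s+/)[0] ?? "");
  }
  return summary;
}
```

Calling `summarizeAbc(sampleAbc)` reports key `G`, meter `4/4`, voices `1` and `2`, and one lyric line.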
The application features a fully integrated development environment (IDE) for music, allowing users to scan, edit, listen to, and export their scores instantly.
- AI-Powered Transcription: Uses the vision capabilities of Gemini 3 Pro to digitize images (PNG, JPG, SVG) into editable ABC text.
- Syntax Validation Agent: Implements a self-correcting loop where the AI validates its own generated ABC code against the abcjs parser to ensure syntactical correctness before outputting the result.
- Real-time Rendering: Instant visual feedback using vector-based music engraving.
- In-Browser Synthesis: High-performance audio playback with tempo control, looping, and instrument selection.
- Professional UI: A flat, high-contrast Material You design interface optimized for dark mode and accessibility.
- Feedback Integration: Automated GitHub Issue generation for efficient bug tracking and feature requests.
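The README does not say how the GitHub issues are generated. One common approach that needs no backend is a prefilled "new issue" link, which GitHub supports through `title` and `body` query parameters. The sketch below assumes that approach; `FeedbackReport` and `buildIssueUrl` are hypothetical names, not part of the Resonote codebase.

```typescript
// Hypothetical shape of a feedback report; field names are assumptions.
interface FeedbackReport {
  title: string;
  description: string;
  abcSnippet?: string; // optional ABC text that reproduces the problem
}

// Builds a prefilled GitHub "new issue" URL for a repo given as "owner/name".
function buildIssueUrl(repo: string, report: FeedbackReport): string {
  const bodyParts: string[] = [report.description];
  if (report.abcSnippet) {
    bodyParts.push("ABC output:\n" + report.abcSnippet);
  }
  const query =
    "title=" + encodeURIComponent(report.title) +
    "&body=" + encodeURIComponent(bodyParts.join("\n\n"));
  return "https://github.com/" + repo + "/issues/new?" + query;
}
```

The resulting link can be opened in a new tab, so the user reviews and submits the issue under their own GitHub account.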
- React 19: Utilizing the latest concurrent features for optimal UI performance.
- TypeScript: Statically typed codebase for robustness and maintainability.
- Vite: Next-generation frontend tooling.
- Google GenAI SDK: Direct integration with Gemini 1.5 Pro, Gemini 2.5 Flash, and Gemini 3 Pro Preview.
- Chain-of-Thought Prompting: Specialized system instructions that force analytical reasoning before code generation.
- abcjs: Industry-standard library for parsing and rendering ABC notation in the browser.
- Tailwind CSS: Utility-first CSS framework.
- Material Symbols: Google's variable font icon set.
- Input: User uploads an image via the UploadZone component.
- Preprocessing: Image is converted to Base64 and optimized for token usage.
- Inference:
  - The client establishes a session with the Google Gemini API.
  - A multimodal prompt containing the image and specific OMR constraints is sent.
  - The model employs a "Thinking" process to analyze the score structure.
- Validation Loop:
  - The model generates a candidate ABC string.
  - The system executes a tool call (validate_abc_notation) to check for syntax errors.
  - If errors exist, the model self-corrects based on the error log.
- Rendering: The final validated string is streamed to the Editor and rendered by the MusicDisplay component.
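The validation loop in the pipeline above can be sketched roughly as follows. This is a minimal illustration, not the project's implementation: the model call and the validator are passed in as plain functions, `transcribeWithValidation` and `toyValidate` are hypothetical names, the retry limit of three is an assumption, and the toy validator only checks for two header fields where the real application would rely on the abcjs parser.

```typescript
// All names here are illustrative; none are taken from the Resonote source.
interface ValidationResult {
  ok: boolean;
  errors: string[];
}

// Stand-in for the model call: receives the previous error log, returns ABC text.
type GenerateAbc = (errorLog: string[]) => Promise<string>;
// Stand-in for the validate_abc_notation tool.
type ValidateAbc = (abc: string) => ValidationResult;

async function transcribeWithValidation(
  generate: GenerateAbc,
  validate: ValidateAbc,
  maxAttempts: number = 3, // assumed limit; the README does not state one
): Promise<string> {
  let errorLog: string[] = [];
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    // The first call sees an empty log; later calls see the latest errors.
    const candidate = await generate(errorLog);
    const result = validate(candidate);
    if (result.ok) {
      return candidate;
    }
    errorLog = result.errors;
  }
  throw new Error(
    "ABC still invalid after " + maxAttempts + " attempts: " + errorLog.join("; "),
  );
}

// Toy validator: only checks that the X: and K: header fields are present.
function toyValidate(abc: string): ValidationResult {
  const errors: string[] = [];
  if (!/^X:/m.test(abc)) errors.push("missing X: (reference number) field");
  if (!/^K:/m.test(abc)) errors.push("missing K: (key) field");
  return { ok: errors.length === 0, errors };
}
```

Keeping the generator and validator as parameters makes the loop easy to exercise with stubs, and bounding the number of attempts prevents an unrecoverable score from retrying forever.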
To run this project locally, ensure you have Node.js (v18+) installed.
git clone https://github.com/IRedDragonICY/resonote.git
cd resonote
npm install
Create a .env file in the root directory. You must obtain an API key from Google AI Studio.
API_KEY=your_google_ai_studio_api_key_here
npm run dev
The application will be available at http://localhost:5173.
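A missing or placeholder key is an easy setup mistake to make. The guard below is a small sketch of a fail-fast check; `requireApiKey` is a hypothetical helper, and how the `API_KEY` value actually reaches the client bundle depends on the project's Vite configuration, which is not shown here.

```typescript
// Hypothetical helper: returns a usable API key or throws a descriptive error.
// `env` is any key/value map of environment variables (an assumption about
// how the application exposes them).
function requireApiKey(env: Record<string, string | undefined>): string {
  const key = (env["API_KEY"] ?? "").trim();
  // Reject both an absent key and the placeholder from the setup instructions.
  if (key === "" || key === "your_google_ai_studio_api_key_here") {
    throw new Error(
      "API_KEY is missing. Create a .env file and add a key from Google AI Studio.",
    );
  }
  return key;
}
```

Calling this once at startup surfaces the configuration problem immediately instead of as a failed API request later.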
Contributions are welcome. Please strictly adhere to the following guidelines:
- Fork the repository.
- Create a feature branch (git checkout -b feature/AmazingFeature).
- Commit your changes (git commit -m 'Add some AmazingFeature').
- Push to the branch (git push origin feature/AmazingFeature).
- Open a Pull Request.
Please ensure all new code is typed correctly and passes the existing linting rules.
Distributed under the MIT License. See LICENSE for more information.
- Author: IRedDragonICY (Mohammad Farid Hendianto)
- Engine: Powered by Google Gemini