A simplified yet sophisticated biometric authentication system built with Node.js, TypeScript, and Digital Signal Processing. This project transforms raw human speech into a unique 13-dimensional mathematical "fingerprint" to distinguish a target user from an imposter.
VoicePrint ID operates on the principle of Vocal Tract Resonance. By analyzing the frequency characteristics of speech (MFCCs), the system captures the unique physical qualities of a user's voice, rather than just the words spoken.
The first challenge was ensuring the "Brain" only receives high-quality speech data, not silence or air conditioning hum.
- Hardware Interface: Interfacing with system microphones using `@picovoice/pvrecorder-node`.
- Voice Activity Detection (VAD): Implementing Picovoice Cobra to gate the pipeline, only allowing frames with a confidence probability > 0.75.
- Audio Normalization: Converting 16-bit PCM integers to a normalized Float32 range [-1.0, 1.0].
- Hamming Window: Applying a smoothing function to each 512-sample frame (32ms) to prevent "spectral leakage" during frequency analysis.
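The normalization and windowing steps above can be sketched as follows. This is a minimal illustration, not the project's actual code; the function names `normalizePcm` and `applyHammingWindow` are assumptions.

```typescript
const FRAME_LENGTH = 512; // 32 ms at 16 kHz

// Convert 16-bit PCM integers to normalized Float32 values in [-1.0, 1.0].
function normalizePcm(frame: Int16Array): Float32Array {
  const out = new Float32Array(frame.length);
  for (let i = 0; i < frame.length; i++) {
    out[i] = frame[i] / 32768; // 2^15, so -32768 maps exactly to -1.0
  }
  return out;
}

// Apply a Hamming window to taper the frame edges, reducing the
// spectral leakage that hard frame boundaries would introduce.
function applyHammingWindow(frame: Float32Array): Float32Array {
  const N = frame.length;
  const out = new Float32Array(N);
  for (let n = 0; n < N; n++) {
    const w = 0.54 - 0.46 * Math.cos((2 * Math.PI * n) / (N - 1));
    out[n] = frame[n] * w;
  }
  return out;
}
```

Windowing before the FFT is standard practice: without it, the discontinuity at each frame edge smears energy across the spectrum and distorts the MFCCs.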
Turning time-domain waves into frequency-domain features.
- MFCC Analysis: Using the `Meyda` library to calculate Mel-Frequency Cepstral Coefficients.
- Feature Selection: Extracting 13 coefficients, specifically focusing on indices 1-12 to ignore overall volume (Energy) and focus purely on vocal timbre.
- Centroid Averaging: Collecting a matrix of 150 valid speech frames and collapsing them into a single 13-dimensional vector (the "Voice Signature").
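Centroid averaging amounts to taking the per-coefficient mean across all collected frames. A minimal sketch (the function name `computeCentroid` is illustrative):

```typescript
// Collapse a matrix of MFCC frames (one row per frame) into a single
// centroid vector by averaging each coefficient across all frames.
function computeCentroid(frames: number[][]): number[] {
  if (frames.length === 0) throw new Error("No valid speech frames collected");
  const dims = frames[0].length;
  const sums = new Array<number>(dims).fill(0);
  for (const frame of frames) {
    for (let i = 0; i < dims; i++) sums[i] += frame[i];
  }
  return sums.map((s) => s / frames.length);
}
```

Averaging over 150 frames smooths out frame-to-frame variation (breaths, transients), leaving a stable signature of the speaker's timbre.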
The final logic that determines "Access Granted" or "Denied."
- Cosine Similarity: Implementing a vector math engine to measure the angular distance between a live voiceprint and the stored identity.
- Threshold Calibration: Establishing a statistical baseline (0.85) to separate "Target" speakers from "Imposters."
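The matching logic above can be sketched in a few lines. This is an assumed implementation (the names `cosineSimilarity` and `verify` are illustrative), using the 0.85 threshold stated above:

```typescript
const THRESHOLD = 0.85; // calibrated baseline separating targets from imposters

// Cosine similarity: 1.0 means the vectors point the same way,
// 0.0 means they are orthogonal (completely dissimilar timbres).
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

function verify(live: number[], stored: number[]): "Access Granted" | "Denied" {
  return cosineSimilarity(live, stored) >= THRESHOLD ? "Access Granted" : "Denied";
}
```

Cosine similarity measures the angle between the vectors rather than their magnitude, so a quiet and a loud recording of the same voice still match.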
- Node.js (v18 or higher)
- TypeScript (`npm install -g typescript`)
- Picovoice AccessKey: Required for the Cobra VAD engine. Get one at console.picovoice.ai.
Clone the repository and install dependencies:
```bash
npm install
```

Build the project:

```bash
npm run build
```

The system uses a command-line argument to switch between Enrollment and Verification modes.
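The mode switch could be dispatched from `process.argv` along these lines; this is a hypothetical sketch, and the real entry point's names may differ:

```typescript
type Mode = "enroll" | "verify";

// Parse the CLI arguments: argv[2] selects the mode, argv[3] is an
// optional voiceprint label.
function parseArgs(argv: string[]): { mode: Mode; label?: string } {
  const [mode, label] = argv.slice(2);
  if (mode !== "enroll" && mode !== "verify") {
    throw new Error("Usage: <enroll|verify> [label]");
  }
  return { mode, label };
}
```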
Record your "Standard" voice signature. Speak clearly for about 5-8 seconds until you see the log `Voiceprint calculated and saved`.

```bash
npm run enroll <optional enrollment file name here>
```

If no enrollment file name is provided, a `voiceprint.print.json` will be generated in `/prints`. If a file name is provided, the saved print uses that name instead.
Test your live voice against the stored signature.
```bash
npm run verify <stored voiceprint label>
```

The system will output your Similarity Score (e.g., 95.43%) and compare it against the calibrated threshold.
| Metric | Result |
|---|---|
| Sample Rate | 16 kHz |
| Frame Length | 512 samples (32 ms) |
| Target Match Accuracy | 87% - 98% Similarity |
| Imposter Rejection | Typically < 80% Similarity |