A minimal Node.js starter project demonstrating how to call a locally-hosted LLM using Ollama and track token usage. Perfect for learning the basics of integrating AI into JavaScript applications without API costs or cloud dependencies.
- Node.js (v18 or higher)
- Ollama installed on your machine
Download and install Ollama:
- Go to https://ollama.com/download
- Download for your operating system (Windows/Mac/Linux)
- Run the installer
- Ollama will start automatically on Windows/Mac
Verify Ollama is installed:
```bash
ollama --version
```

You should see something like: `ollama version is 0.1.23`
Check if the Ollama service is active:
```bash
curl http://localhost:11434
```

You should see: `Ollama is running`
If Ollama is NOT running:
- On Windows/Mac: Ollama usually starts automatically. Check Task Manager (Windows) or Activity Monitor (Mac) for an "Ollama" process.
- If not running, start it manually:
```bash
ollama serve
```

Keep this terminal window open.
List all downloaded models:
```bash
ollama list
```

Expected output:

```
NAME               ID              SIZE      MODIFIED
llama3.2:latest    a80c4f17acd5    2.0 GB    2 days ago
phi4:latest        ac896e5b8b34    9.1 GB    1 week ago
```
If you have NO models installed, download one:
Recommended models:
```bash
# Lightweight and fast (2GB)
ollama pull llama3.2:latest

# Or larger and more capable (9GB)
ollama pull phi4:latest

# Other options:
ollama pull llama3.2:1b      # Even smaller (1.3GB)
ollama pull gemma2:2b        # Google's model (1.6GB)
ollama pull mistral:latest   # Popular alternative (4.1GB)
```

Wait for the download to complete (a progress bar is shown).
Test Ollama directly before running the app:
```bash
ollama run llama3.2:latest
```

Type a message and press Enter. If you get a response, everything is working! Type `/bye` to exit the chat.
Navigate to your project directory:
```bash
cd local-llm-nodejs-template
```

Open `src/llmCall.js` and verify the model name matches what you have installed.
The model name in the code:
```javascript
model: 'llama3.2:latest', // Must match output from 'ollama list'
```

If you want to use a different model, change this line to match exactly what you see in `ollama list`.
For example, to use phi4:
```javascript
model: 'phi4:latest',
```

Make sure Ollama is running, then execute:

```bash
node src/index.js
```

Or use the npm script:
```bash
npm start
```

Expected output:

```
🤖 Making a test call to local LLM (Llama 3.2)...
📝 Response: The Andromeda galaxy, our closest galactic neighbor, is approaching us at a speed of approximately 250,000 miles per hour and is expected to collide with the Milky Way in around 4 billion years.
📊 Token Usage (estimated):
   Input tokens: 13
   Output tokens: 43
   Total tokens: 56
```
The application will:
- Send a test prompt to your local LLM (via Ollama)
- Receive a response
- Display the response text
- Show estimated token usage:
- Input tokens (tokens in your prompt)
- Output tokens (tokens in the response)
- Total tokens used
Note: Token counts are estimated from word count; they are not exact figures like those returned by cloud APIs.
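The word-count estimate can be sketched in a few lines. This is a hypothetical helper (the actual logic in `src/llmCall.js` may differ), using the common rule of thumb of roughly 0.75 words per token:

```javascript
// Hypothetical token estimator: assumes ~0.75 words per token,
// a common rule of thumb. Real tokenizers split on subwords,
// so treat the result as a rough estimate only.
function estimateTokens(text) {
  const words = text.trim().split(/\s+/).filter(Boolean).length;
  return Math.ceil(words / 0.75);
}

// Estimate usage for a prompt/response pair.
const input = estimateTokens("Tell me a fun fact about space.");
const output = estimateTokens("The Andromeda galaxy is approaching the Milky Way.");
console.log({ input, output, total: input + output });
```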
To test with different prompts, edit src/index.js:
```javascript
const result = await callLLM("Your custom prompt here");
```

If you have multiple models installed, switching is easy:
- Check available models: `ollama list`
- Edit `src/llmCall.js` and change the model name
- Run the script again
Model comparison:
- `llama3.2:latest` (2GB) - Fast, good for most tasks
- `phi4:latest` (9GB) - More capable, slower, better reasoning
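Model names can also be read programmatically: a running Ollama instance exposes the same information as `ollama list` at `GET /api/tags`. A minimal sketch (Node 18+ global `fetch`; the helper names are hypothetical, not part of this template):

```javascript
// Pure helper: pull model names out of an /api/tags response body.
function modelNames(tags) {
  return (tags.models ?? []).map((m) => m.name);
}

// Query a running Ollama instance for its installed models.
async function listModels(baseUrl = "http://localhost:11434") {
  const res = await fetch(`${baseUrl}/api/tags`);
  if (!res.ok) throw new Error(`Ollama returned ${res.status}`);
  return modelNames(await res.json());
}

// Usage (requires `ollama serve` to be running):
// listModels().then((names) => console.log(names));
```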
```
local-agent-practice/
├── package.json    # Node.js configuration
├── .gitignore      # Files to ignore in git
├── README.md       # This file
└── src/
    ├── index.js    # Main entry point
    └── llmCall.js  # Ollama API call handler
```
✅ Completely free - No API costs
✅ Private - Data never leaves your machine
✅ Offline - Works without internet
✅ Fast - No network latency
✅ Full control - Choose any model you want
❌ Less capable than largest cloud models (GPT-4, Claude Opus)
❌ Hardware dependent - Performance varies by CPU/GPU
❌ Disk space - Models are 1-10GB each
❌ Estimated tokens - Not as precise as cloud API counts
Running this application in production requires hosting both your Node.js application AND Ollama on a server.
What you have now:
- Ollama runs on your laptop
- Node.js script runs locally
- Perfect for learning and testing
Cost: Free
Use case: Personal development, learning, testing
Architecture:
```
VPS Server
├── Ollama (running as service)
└── Node.js app (calls local Ollama)
```
Best for:
- Personal projects
- Internal tools
- Low-traffic applications
Hosting providers:
CPU-based servers (cheaper, slower inference):
- DigitalOcean Droplets - $12/month (4GB RAM)
- Linode - $12/month (4GB RAM)
- Vultr - $12/month (4GB RAM)
- Hetzner - €9/month (4GB RAM, Europe)
GPU-based servers (expensive, fast inference):
- Paperspace - $8+/month
- AWS EC2 with GPU - $0.50-3/hour
- DigitalOcean GPU Droplets - $30+/month
- RunPod - $0.20-1/hour (pay as you go)
Setup on VPS:
- Rent a server (minimum 4GB RAM, 20GB disk)
- SSH into your server
- Install Ollama: `curl -fsSL https://ollama.com/install.sh | sh`
- Download the model: `ollama pull llama3.2:latest`
- Install Node.js
- Clone your repository
- Run your app: `node src/index.js`
To keep it running 24/7:
```bash
# Install PM2 process manager
npm install -g pm2

# Start your app with PM2
pm2 start src/index.js --name "llm-app"

# Make it restart on server reboot
pm2 startup
pm2 save
```

Cost: $12-50+/month depending on specs
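The same PM2 setup can be kept in version control as an ecosystem file; a sketch (the app name and script path match this project, while the memory limit is an arbitrary example to tune for your server):

```javascript
// ecosystem.config.cjs — PM2 process definition.
// Start with: pm2 start ecosystem.config.cjs
module.exports = {
  apps: [
    {
      name: "llm-app",
      script: "src/index.js",
      // Restart automatically if the process crashes.
      autorestart: true,
      // Restart if memory climbs past 300 MB (example value).
      max_memory_restart: "300M",
    },
  ],
};
```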
Pros:
- Full control
- Predictable costs
- Data stays on your server
- Can run 24/7
Cons:
- You manage everything
- Need server administration skills
- Must handle updates and security
Architecture:
```
Cloud Platform (Railway/Render/Fly.io)
├── Ollama (in container)
└── Node.js app
```
Platforms that support Docker (needed for Ollama):
- Railway - $5/month minimum, usage-based
- Render - $7+/month
- Fly.io - Free tier, then usage-based
- Google Cloud Run - Pay per use
- AWS ECS - Pay per use
Setup on Railway (example):
- Create a `Dockerfile` in your project root:

```dockerfile
FROM ollama/ollama:latest

# Install Node.js
RUN apt-get update && apt-get install -y nodejs npm curl

# Copy your app
WORKDIR /app
COPY package*.json ./
COPY src ./src

# Install dependencies
RUN npm install

# Download model (this happens at build time)
RUN ollama serve & \
    sleep 5 && \
    ollama pull llama3.2:latest

# Reset the base image's entrypoint (/bin/ollama) so CMD runs as a shell command
ENTRYPOINT []

# Start both Ollama and your app
CMD ollama serve & \
    sleep 5 && \
    node src/index.js
```

- Push to GitHub
- Connect Railway to your repository
- Deploy automatically
Cost: $7-25/month typically
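The `sleep 5` in the Dockerfile above is only a guess at how long Ollama takes to start; a more robust entrypoint polls the Ollama port before launching the app. A sketch of such a readiness check (hypothetical helper, not part of this template):

```javascript
// Poll Ollama's root endpoint until it responds, or give up after
// `retries` attempts so the container fails fast instead of hanging.
async function waitForOllama(baseUrl = "http://localhost:11434", retries = 20, delayMs = 500) {
  for (let attempt = 1; attempt <= retries; attempt++) {
    try {
      const res = await fetch(baseUrl);
      if (res.ok) return; // Ollama answered; safe to start the app.
    } catch {
      // Connection refused: Ollama is not up yet, keep waiting.
    }
    await new Promise((resolve) => setTimeout(resolve, delayMs));
  }
  throw new Error(`Ollama not reachable after ${retries} attempts`);
}

// In src/index.js you could then run `await waitForOllama();` before callLLM().
```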
Pros:
- Easier than managing VPS
- Auto-deployment from Git
- Automatic HTTPS
- Better developer experience
Cons:
- More expensive than raw VPS at scale
- Less control than VPS
- Container size limits may apply
Architecture:
```
User → Frontend (Netlify/Vercel) → Your API (VPS with Ollama) → Local LLM
```
Best for:
- Web applications
- Multiple clients accessing same LLM
- When you want a separate frontend
Setup:
- Deploy frontend (React/Vue/etc) to Netlify/Vercel (free)
- Build an Express.js API that wraps your Ollama calls
- Deploy API to VPS or Railway
- Frontend makes requests to your API
Example Express.js wrapper:
```javascript
import express from "express";
import { callLLM } from "./llmCall.js";

const app = express();
app.use(express.json());

app.post("/api/generate", async (req, res) => {
  try {
    const result = await callLLM(req.body.prompt);
    res.json(result);
  } catch (error) {
    res.status(500).json({ error: error.message });
  }
});

app.listen(3000);
```

Cost:
- Frontend: Free (Netlify/Vercel)
- Backend API: $12-50/month (VPS) or $7-25/month (PaaS)
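Once the wrapper is deployed, the frontend can call it with the browser's (or Node 18+'s) built-in `fetch`. A minimal client sketch (the `/api/generate` route and port match the Express example above; the helper names are hypothetical):

```javascript
// Build the POST options separately so they are easy to test and log.
function buildGenerateRequest(prompt) {
  return {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ prompt }),
  };
}

// Call the Express wrapper; replace apiUrl with your deployed server.
async function generate(prompt, apiUrl = "http://localhost:3000") {
  const res = await fetch(`${apiUrl}/api/generate`, buildGenerateRequest(prompt));
  if (!res.ok) throw new Error(`API error: ${res.status}`);
  return res.json();
}

// Usage (requires the API to be running):
// generate("Tell me a fun fact").then((result) => console.log(result));
```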
Architecture:
```
User → Your App → Routes to either:
        ├── Ollama (for simple queries)
        └── Claude/GPT API (for complex queries)
```
Best for:
- Cost optimization
- Balancing quality and price
- High-volume applications
Implementation: Add routing logic to choose which LLM based on query complexity:
```javascript
async function smartLLMCall(prompt, complexity = "simple") {
  if (complexity === "simple") {
    return await callOllama(prompt); // Free, local
  } else {
    return await callClaudeAPI(prompt); // Paid, better quality
  }
}
```

Cost: Variable, optimized based on usage
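The `complexity` argument above has to come from somewhere; one cheap option is a keyword-and-length heuristic. A sketch (the threshold and keyword list are arbitrary examples to tune for your traffic):

```javascript
// Crude router heuristic: long prompts or reasoning-style keywords
// get sent to the paid model, everything else stays local.
function classifyComplexity(prompt) {
  const complexKeywords = ["analyze", "compare", "explain why", "step by step", "prove"];
  const lower = prompt.toLowerCase();
  if (prompt.length > 500) return "complex";
  if (complexKeywords.some((kw) => lower.includes(kw))) return "complex";
  return "simple";
}
```

`smartLLMCall(prompt, classifyComplexity(prompt))` would then route each request automatically.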
Choose your deployment based on:
- Just learning or personal use? → Keep running locally (what you have now)
- Need 24/7 availability? → Deploy to VPS ($12+/month)
- Want easy deployment? → Use Railway/Render ($7-25/month)
- Building a web app? → Split architecture (Frontend free, Backend $12+/month)
- Need fast inference? → Get GPU server ($30+/month) or use cloud APIs instead
- High volume + budget conscious? → Hybrid approach (local + API)
For a service running 24/7:
| Option | Monthly Cost | Setup Complexity | Performance |
|---|---|---|---|
| Local (your laptop) | $0 | Easy | Medium |
| CPU VPS | $12 | Medium | Medium |
| GPU VPS | $30-100 | Medium | Fast |
| Railway/Render | $15-40 | Easy | Medium |
| AWS/GCP Serverless | $10-50 | Hard | Variable |
For comparison:
- Claude API (if you used it instead): ~$3-10/month for light usage
- Self-hosting makes sense when: You exceed $12/month in API costs OR need privacy
- DO NOT commit Ollama models to Git
- Models stay on whatever machine runs Ollama
- Your repository only contains code, not models
- Each server needs to download models separately
- Commit your code (`src/`, `package.json`, `README.md`)
- Don't commit `node_modules/` (it's in `.gitignore`)
- Document which Ollama models are required
- Llama3.2 is fast enough for most uses on modern CPUs
- GPU greatly speeds up larger models like phi4
- For production, monitor response times and adjust model accordingly
- Ollama is running, but the model name is wrong
- Run `ollama list` and verify the exact model name
- Update `src/llmCall.js` with the correct model name
- Ollama service isn't running
- Run `curl http://localhost:11434` to test
- Start Ollama with `ollama serve` if needed
- Check Task Manager (Windows) or Activity Monitor (Mac) for the Ollama process
- Model isn't downloaded
- Run `ollama pull llama3.2:latest`
- Wait for the download to complete
- Model too large for your hardware
- Try a smaller model: `ollama pull llama3.2:1b`
- Consider a GPU server for production
- Model requires more RAM than available
- Use smaller model or upgrade RAM
- Close other applications
Beginner projects:
- Modify prompts and observe different responses
- Compare llama3.2 vs phi4 outputs
- Build a simple question-answering system
- Create a CLI chatbot with conversation history
Intermediate projects:
- Add Express.js to create a REST API
- Build a web interface (React/Vue + your API)
- Implement conversation memory/context
- Create a document Q&A system
Advanced projects:
- Deploy to production VPS
- Implement streaming responses
- Build a multi-agent system
- Create a RAG (Retrieval Augmented Generation) app
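For the streaming-responses idea, Ollama's `/api/generate` endpoint emits newline-delimited JSON chunks when `stream: true` is set, each shaped like `{ response, done }`. A sketch of consuming that stream (Node 18+; requires a running Ollama; note the simplifying assumption in the comment):

```javascript
// Pure helper: parse a buffer of newline-delimited JSON into objects.
function parseChunks(text) {
  return text
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.parse(line));
}

// Stream a completion, printing tokens as they arrive.
// Simplification: assumes each network chunk holds whole lines;
// production code should buffer partial lines across chunks.
async function streamLLM(prompt, model = "llama3.2:latest") {
  const res = await fetch("http://localhost:11434/api/generate", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ model, prompt, stream: true }),
  });
  const decoder = new TextDecoder();
  for await (const chunk of res.body) {
    for (const part of parseChunks(decoder.decode(chunk, { stream: true }))) {
      if (part.response) process.stdout.write(part.response);
      if (part.done) process.stdout.write("\n");
    }
  }
}

// Usage (requires `ollama serve`): streamLLM("Tell me a fun fact about space.");
```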
- Ollama Documentation
- Ollama Model Library
- Ollama API Reference
- Node.js Best Practices
- Awesome Ollama - Community projects