Local Large Language Model Inference Engine for R
edgemodelr enables R users to run large language models locally using GGUF model files and the llama.cpp inference engine. Perfect for data scientists who need privacy-preserving AI capabilities integrated seamlessly into their R workflows.
- Complete Privacy - All inference runs locally; no data leaves your machine
- High Performance - Leverages the optimized llama.cpp C++ implementation
- R-Native Interface - Integrates seamlessly with R data science workflows
- Memory Efficient - Supports quantized models for resource-constrained environments
- Zero Dependencies - Self-contained installation with no external requirements
- Universal Compatibility - Works with any GGUF model from Hugging Face, Ollama, or custom sources
```r
# Install from CRAN
install.packages("edgemodelr")

# Install from GitHub
devtools::install_github("PawanRamaMali/edgemodelr")
```

System requirements:

- R: Version 4.0 or higher
- Compiler: C++17 compatible compiler (GCC, Clang, or MSVC)
- Memory: 4GB+ RAM (varies by model size)
- Storage: 1GB+ for model files
Quick start:

```r
library(edgemodelr)

# One-line setup with automatic model download
setup <- edge_quick_setup("TinyLlama-1.1B")
ctx <- setup$context

# Generate text
result <- edge_completion(ctx,
                          prompt = "Explain what R is in one sentence:",
                          n_predict = 50)
cat("Response:", result, "\n")

# Clean up when done
edge_free_model(ctx)
```

Interactive chat:

```r
library(edgemodelr)
# Setup model
setup <- edge_quick_setup("TinyLlama-1.1B")
ctx <- setup$context

# Start interactive chat with streaming responses
edge_chat_stream(ctx,
                 system_prompt = "You are a helpful R programming assistant.",
                 max_history = 5,
                 n_predict = 200,
                 temperature = 0.7)

# Type your questions and get real-time responses!
# Type 'quit' to exit
```

Use existing local models:

```r
library(edgemodelr)
# Automatically find and use existing GGUF models from Ollama, Downloads, etc.
models_info <- edge_find_gguf_models()

if (length(models_info$models) > 0) {
  # Use any compatible model
  first_model <- models_info$models[[1]]
  ctx <- edge_load_model(first_model$path)

  # Same API as always
  result <- edge_completion(ctx, "What is machine learning?", n_predict = 100)
  cat(result)

  edge_free_model(ctx)
}
```

Core functions:

| Function | Description |
|---|---|
| `edge_load_model(path, n_ctx, n_gpu_layers)` | Load a GGUF model for inference |
| `edge_free_model(ctx)` | Release model memory |
| `is_valid_model(ctx)` | Check if a model context is valid |
| `edge_quick_setup(model_name)` | One-line model download and setup |
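A minimal lifecycle sketch using these functions (the model path below is a placeholder for a GGUF file on your machine):

```r
library(edgemodelr)

# Load a local GGUF file (placeholder path)
ctx <- edge_load_model("models/tinyllama-1.1b.Q4_K_M.gguf", n_ctx = 1024)

# Confirm the context is usable before generating
if (is_valid_model(ctx)) {
  cat(edge_completion(ctx, "Hello!", n_predict = 20))
}

# Always release the model when finished
edge_free_model(ctx)
```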
Text generation:

| Function | Description |
|---|---|
| `edge_completion(ctx, prompt, n_predict, temperature, top_p)` | Generate a text completion |
| `edge_stream_completion(ctx, prompt, callback, ...)` | Stream tokens in real time |
| `edge_chat_stream(ctx, system_prompt, max_history, ...)` | Interactive chat session |
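As a quick sketch of the streaming API (assuming `ctx` is an already-loaded context, and using the same callback fields shown in the advanced streaming example later in this README):

```r
# Print tokens as they arrive; returning TRUE continues generation
edge_stream_completion(ctx, "Name three R packages for plotting:",
                       callback = function(data) {
                         if (!data$is_final) {
                           cat(data$token)
                           flush.console()
                         }
                         TRUE
                       },
                       n_predict = 60)
```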
Model management:

| Function | Description |
|---|---|
| `edge_find_gguf_models(source_dirs, model_pattern, ...)` | Find existing GGUF models |
| `edge_list_models()` | List pre-configured popular models |
| `edge_download_model(model_id, filename)` | Download from HuggingFace |
| `edge_download_url(url, filename)` | Download from any direct URL (GPT4All, etc.) |
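For example, you can browse the pre-configured catalog and download a model manually; the Hugging Face repository id and filename below are illustrative placeholders, not guaranteed entries:

```r
# List models edgemodelr already knows about
print(edge_list_models())

# Hypothetical Hugging Face repo id and GGUF filename
path <- edge_download_model(
  model_id = "TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF",
  filename = "tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf"
)
ctx <- edge_load_model(path)
```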
Utilities:

| Function | Description |
|---|---|
| `edge_small_model_config(model_size_mb, available_ram_gb, target)` | Get optimized settings for small models |
| `edge_benchmark(ctx, prompt, n_predict, iterations)` | Benchmark model performance |
| `edge_set_verbose(enabled)` | Control logging verbosity |
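A small sketch of the logging toggle, useful for keeping batch-job output clean:

```r
# Silence low-level backend logging during scripted runs
edge_set_verbose(FALSE)

# Turn it back on when debugging model loading
edge_set_verbose(TRUE)
```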
NEW in v0.1.4: Automatic optimizations for small language models (1B-3B parameters)!
```r
library(edgemodelr)

# Get optimized configuration for your device
config <- edge_small_model_config(
  model_size_mb = 700,    # TinyLlama size
  available_ram_gb = 8,   # Your system RAM
  target = "laptop"       # mobile/laptop/desktop/server
)

# View recommendations
print(config$tips)

# Load model with optimized settings
ctx <- edge_load_model(
  "path/to/model.gguf",
  n_ctx = config$n_ctx,
  n_gpu_layers = config$n_gpu_layers
)

# Faster inference with optimized parameters
result <- edge_completion(
  ctx,
  prompt = "Hello!",
  n_predict = config$recommended_n_predict,
  temperature = config$recommended_temperature
)
```

Key Optimizations:
- 15-30% faster inference for small models through adaptive batch sizing
- Reduced memory usage with context-aware thread allocation
- Device-specific presets optimized for mobile, laptop, desktop, and server
- Automatic tuning based on model size and available RAM
All models download directly without authentication - ready for offline use!
Small models (testing and development):

| Model | Size | Source | Use Case | Command |
|---|---|---|---|---|
| TinyLlama-1.1B | ~700MB | HuggingFace | Testing, development | `edge_quick_setup("TinyLlama-1.1B")` |
| TinyLlama-OpenOrca | ~700MB | HuggingFace | Better chat | `edge_quick_setup("TinyLlama-OpenOrca")` |
Mid-size models:

| Model | Size | Source | Use Case | Command |
|---|---|---|---|---|
| Llama-3-8B | ~4.7GB | GPT4All CDN | General purpose | `edge_quick_setup("llama3-8b")` |
| Mistral-7B | ~4.1GB | GPT4All CDN | High quality | `edge_quick_setup("mistral-7b")` |
| Phi-3-mini | ~2.4GB | HuggingFace | Reasoning | `edge_quick_setup("phi3-mini")` |
Large models:

| Model | Size | Source | Use Case | Command |
|---|---|---|---|---|
| Orca-2-13B | ~7.4GB | GPT4All CDN | 13B chat model | `edge_quick_setup("orca2-13b")` |
| WizardLM-13B | ~7.4GB | GPT4All CDN | 13B instruct | `edge_quick_setup("wizardlm-13b")` |
| Hermes-13B | ~7.4GB | GPT4All CDN | 13B chat | `edge_quick_setup("hermes-13b")` |
| Starcoder | ~9GB | GPT4All CDN | Code generation (15B) | `edge_quick_setup("starcoder")` |
Note: GPT4All models download directly from their CDN - no HuggingFace account or authentication required!
Sentiment analysis:

```r
library(edgemodelr)

# Analyze survey responses
analyze_sentiment <- function(texts, model_name = "mistral-7b") {
  setup <- edge_quick_setup(model_name)
  ctx <- setup$context

  results <- data.frame(
    text = texts,
    sentiment = sapply(texts, function(text) {
      prompt <- paste("Analyze sentiment (positive/negative/neutral):", text)
      trimws(edge_completion(ctx, prompt, n_predict = 10))
    }),
    stringsAsFactors = FALSE
  )

  edge_free_model(ctx)
  return(results)
}

# Example usage
survey_data <- c(
  "This product exceeded my expectations!",
  "Poor customer service, very disappointed.",
  "It's okay, nothing special but works fine."
)

sentiment_results <- analyze_sentiment(survey_data)
print(sentiment_results)
```

R coding assistant:

```r
library(edgemodelr)
# R programming helper
setup <- edge_quick_setup("CodeLlama-7B")
ctx <- setup$context

code_prompt <- "Create an R function to calculate correlation matrix with p-values:"

response <- edge_completion(ctx,
                            paste("# R Programming Task\n", code_prompt, "\n\n# Solution:\n"),
                            n_predict = 250,
                            temperature = 0.3)
cat(response)

edge_free_model(ctx)
```

Document summarization:

```r
library(edgemodelr)
# Process multiple documents
summarize_documents <- function(file_paths, model_name = "mistral-7b") {
  setup <- edge_quick_setup(model_name)
  ctx <- setup$context

  summaries <- sapply(file_paths, function(path) {
    text <- readLines(path, warn = FALSE)
    content <- paste(text, collapse = " ")
    prompt <- paste("Summarize this document in 2-3 sentences:", content)
    edge_completion(ctx, prompt, n_predict = 100)
  })

  edge_free_model(ctx)
  return(summaries)
}
```
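A hypothetical usage sketch (file names are placeholders for plain-text documents on disk):

```r
report_files <- c("reports/q1_notes.txt", "reports/q2_notes.txt")
summaries <- summarize_documents(report_files)
print(summaries)
```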
Complete example downloading and running a 13 billion parameter model:

```r
library(edgemodelr)
# =============================================================
# Step 1: Download and Setup (one-time, ~7.4GB download)
# =============================================================
cat("Setting up Orca-2-13B (7.4GB)...\n")
cat("Source: GPT4All CDN (direct download, no auth required)\n\n")
setup <- edge_quick_setup("orca2-13b")
ctx <- setup$context
cat("Model loaded successfully!\n")
cat("Model path:", setup$model_path, "\n\n")
# =============================================================
# Step 2: Math Test
# =============================================================
cat("--- Math Test ---\n")
prompt <- "What is 123 + 456? The answer is"
result <- edge_completion(ctx, prompt, n_predict = 30, temperature = 0.1)
cat("Prompt:", prompt, "\n")
cat("Response:", trimws(gsub(prompt, "", result, fixed = TRUE)), "\n\n")
# Expected output: "579"
# =============================================================
# Step 3: Knowledge Test
# =============================================================
cat("--- Knowledge Test ---\n")
prompt <- "The theory of relativity was developed by"
result <- edge_completion(ctx, prompt, n_predict = 50, temperature = 0.1)
cat("Prompt:", prompt, "\n")
cat("Response:", trimws(gsub(prompt, "", result, fixed = TRUE)), "\n\n")
# Expected output: "Albert Einstein in the early 20th century..."
# =============================================================
# Step 4: Reasoning Test
# =============================================================
cat("--- Reasoning Test ---\n")
prompt <- "If all cats are animals, and some animals are pets, can we conclude that some cats are pets? Answer:"
result <- edge_completion(ctx, prompt, n_predict = 80, temperature = 0.3)
cat("Prompt:", prompt, "\n")
cat("Response:", trimws(gsub(prompt, "", result, fixed = TRUE)), "\n\n")
# Expected: Logical reasoning about the syllogism
# =============================================================
# Step 5: Cleanup
# =============================================================
edge_free_model(ctx)
cat("Model freed. Done!\n")These results were obtained on a Windows 11 system with CPU-only inference:
| Test | Prompt | Response | Time | Tokens/sec |
|---|---|---|---|---|
| Math | "What is 123 + 456?" | 579 (correct!) | 103s | 0.29 |
| Knowledge | "The theory of relativity was developed by" | Albert Einstein in the early 20th century | 261s | 0.15 |
| Reasoning | Syllogism about cats/animals/pets | Correct logical deduction with step-by-step reasoning | 415s | 0.12 |
Download any GGUF model from a direct URL:
```r
library(edgemodelr)

# Download from GPT4All CDN (or any direct URL)
model_path <- edge_download_url(
  url = "https://gpt4all.io/models/gguf/orca-2-13b.Q4_0.gguf",
  filename = "orca-2-13b.Q4_0.gguf"
)

# Load and use
ctx <- edge_load_model(model_path, n_ctx = 2048)
result <- edge_completion(ctx, "Explain quantum computing:", n_predict = 100)
cat(result)
edge_free_model(ctx)
```

Code generation with Starcoder:

```r
library(edgemodelr)
# Setup Starcoder - a 15B parameter code generation model
setup <- edge_quick_setup("starcoder")
ctx <- setup$context

# Generate Python code
prompt <- "def quicksort(arr):"
result <- edge_completion(ctx, prompt, n_predict = 150, temperature = 0.2)
cat(result)

# Generate R code
prompt2 <- "# R function to calculate moving average\nmoving_avg <- function(x, n) {"
result2 <- edge_completion(ctx, prompt2, n_predict = 100, temperature = 0.2)
cat(result2)

edge_free_model(ctx)
```

Typical performance:

| Model | Parameters | File Size | Load Time | Tokens/sec (CPU) |
|---|---|---|---|---|
| TinyLlama-1.1B | 1.1B | 637 MB | 0.1s | 1-2 |
| Phi-3-mini | 3.8B | 2.3 GB | 0.3s | 0.15-0.2 |
| Llama-3.2-3B | 3.2B | 1.9 GB | 0.4s | 0.2-0.3 |
| Mistral-7B | 7B | 4.1 GB | 0.5s | 0.1-0.15 |
| Orca-2-13B | 13B | 6.9 GB | 0.4s | 0.12-0.29 |
| Starcoder | 15B | 8.4 GB | 0.6s | 0.08-0.12 |
Note: Inference speed depends heavily on your CPU. GPU acceleration can increase speed 10-50x.
| Model Size | File Size | Min RAM | Recommended RAM | CPU Cores | Typical Speed |
|---|---|---|---|---|---|
| 1B params | ~700 MB | 4 GB | 8 GB | 2+ cores | 1-2 tok/s |
| 3B params | ~2 GB | 8 GB | 12 GB | 4+ cores | 0.2-0.3 tok/s |
| 7B params | ~4 GB | 12 GB | 16 GB | 4+ cores | 0.1-0.15 tok/s |
| 13B params | ~7 GB | 16 GB | 24 GB | 6+ cores | 0.1-0.3 tok/s |
| 15B params | ~9 GB | 20 GB | 32 GB | 8+ cores | 0.08-0.12 tok/s |
Tip: For faster inference, use GPU acceleration via the `n_gpu_layers` parameter.
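As an illustration of the table above, a small helper (hypothetical, using model names from the catalog earlier in this README) that picks a model tier from the recommended-RAM column:

```r
# Map available RAM (GB) to a model tier from the table above
suggest_model <- function(ram_gb) {
  if (ram_gb >= 24) "orca2-13b"        # 13B models
  else if (ram_gb >= 16) "mistral-7b"  # 7B models
  else if (ram_gb >= 12) "phi3-mini"   # ~3B models
  else "TinyLlama-1.1B"                # 1B models
}

suggest_model(8)  # "TinyLlama-1.1B"
```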
CPU optimizations:

- SIMD Instructions: Automatic AVX/AVX2/FMA detection
- Multi-threading: Uses all available CPU cores
- Memory Efficiency: Optimized batch processing
- Smart Quantization: Q4_K_M and Q5_K_M support
Benchmark your system:

```r
library(edgemodelr)

# Test your system performance
setup <- edge_quick_setup("TinyLlama-1.1B")
ctx <- setup$context

# Run benchmark
perf <- edge_benchmark(ctx, iterations = 5)
print(perf)

edge_free_model(ctx)
```

GPU acceleration:

```r
# Enable GPU layers (requires compatible hardware)
ctx <- edge_load_model("model.gguf",
n_ctx = 2048,
n_gpu_layers = 35) # Offload layers to GPU# Search specific directories for models
models <- edge_find_gguf_models(
source_dirs = c("~/Downloads", "~/my_models"),
model_pattern = "llama",
min_size_mb = 100,
test_compatibility = TRUE
)# Advanced streaming with progress tracking
result <- edge_stream_completion(ctx, prompt,
callback = function(data) {
if (!data$is_final) {
cat(data$token)
flush.console()
# Custom progress logic
if (data$total_tokens %% 10 == 0) {
cat(sprintf(" [%d tokens]", data$total_tokens))
}
}
return(TRUE) # Continue generation
},
n_predict = 200
)Installation Problems:
# Ensure build tools are available
install.packages(c("Rcpp", "devtools"))
# Check compiler
system("gcc --version") # Should show C++17 supportMemory Issues:
- Use smaller models (TinyLlama for testing)
- Reduce the `n_ctx` parameter
- Try quantized models (Q4_K_M variant)
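For instance, a low-memory loading sketch (the file path is a placeholder); a smaller context window shrinks the memory footprint:

```r
# Reduced context window lowers memory use at the cost of shorter prompts
ctx <- edge_load_model("models/tinyllama-1.1b.Q4_K_M.gguf", n_ctx = 512)
result <- edge_completion(ctx, "Summarize: R is a language for statistics.",
                          n_predict = 30)
cat(result)
edge_free_model(ctx)
```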
Performance Issues:
- Ensure all CPU cores are being used
- Check available RAM
- Consider GPU acceleration for large models
- Function Documentation
- Report Issues
- Discussions
- Contact: prm@outlook.in
We welcome contributions! Please see our Contributing Guide for details.
Development setup:

```bash
git clone https://github.com/PawanRamaMali/edgemodelr.git
cd edgemodelr
R CMD build .
R CMD check edgemodelr_*.tar.gz
```

Areas for contribution:

- Testing with different models and platforms
- Documentation and examples
- Performance optimizations
- New features and integrations
MIT License - see LICENSE file for details.
Built with ❤️ using:
- llama.cpp - Fast LLM inference engine
- GGML - Machine learning tensor library
- Rcpp - R and C++ integration
Special thanks to the open-source AI community for making local LLM inference accessible to everyone.
⭐ Star this repo if you find edgemodelr useful for your R projects!