
edgemodelr

Local Large Language Model Inference Engine for R

edgemodelr enables R users to run large language models locally using GGUF model files and the llama.cpp inference engine. Perfect for data scientists who need privacy-preserving AI capabilities integrated seamlessly into their R workflows.

✨ Key Features

  • 🔒 Complete Privacy - All inference runs locally; no data leaves your machine
  • ⚡ High Performance - Leverages the optimized llama.cpp C++ implementation
  • 🎯 R-Native Interface - Integrates seamlessly with R data science workflows
  • 💾 Memory Efficient - Supports quantized models for resource-constrained environments
  • 🛠️ Zero Runtime Dependencies - Self-contained installation; only a C++17 compiler is needed at build time
  • 🔄 Universal Compatibility - Works with any GGUF model from Hugging Face, Ollama, or custom sources

📦 Installation

From CRAN (Recommended)

install.packages("edgemodelr")

Development Version

# Install from GitHub
devtools::install_github("PawanRamaMali/edgemodelr")

System Requirements

  • R: Version 4.0 or higher
  • Compiler: C++17 compatible compiler (GCC, Clang, or MSVC)
  • Memory: 4GB+ RAM (varies by model size)
  • Storage: 1GB+ for model files

🚀 Quick Start

1. Basic Text Generation

library(edgemodelr)

# One-line setup with automatic model download
setup <- edge_quick_setup("TinyLlama-1.1B")
ctx <- setup$context

# Generate text
result <- edge_completion(ctx,
                         prompt = "Explain what R is in one sentence:",
                         n_predict = 50)
cat("Response:", result, "\n")

# Clean up when done
edge_free_model(ctx)

2. Interactive Chat Session

library(edgemodelr)

# Setup model
setup <- edge_quick_setup("TinyLlama-1.1B")
ctx <- setup$context

# Start interactive chat with streaming responses
edge_chat_stream(ctx,
  system_prompt = "You are a helpful R programming assistant.",
  max_history = 5,
  n_predict = 200,
  temperature = 0.7)

# Type your questions and get real-time responses!
# Type 'quit' to exit

3. Use Your Existing Models

library(edgemodelr)

# Automatically find and use existing GGUF models from Ollama, Downloads, etc.
models_info <- edge_find_gguf_models()

if (length(models_info$models) > 0) {
  # Use any compatible model
  first_model <- models_info$models[[1]]
  ctx <- edge_load_model(first_model$path)

  # Same API as always
  result <- edge_completion(ctx, "What is machine learning?", n_predict = 100)
  cat(result)

  edge_free_model(ctx)
}

📚 Core API Reference

Model Management

| Function | Description |
|----------|-------------|
| edge_load_model(path, n_ctx, n_gpu_layers) | Load a GGUF model for inference |
| edge_free_model(ctx) | Release model memory |
| is_valid_model(ctx) | Check if model context is valid |
| edge_quick_setup(model_name) | One-line model download and setup |
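
A minimal lifecycle sketch tying these functions together (the model path is a placeholder; substitute any local GGUF file):

library(edgemodelr)

# Load, validate, use, and free a model (path is a placeholder)
ctx <- edge_load_model("path/to/model.gguf", n_ctx = 2048)

if (is_valid_model(ctx)) {
  result <- edge_completion(ctx, "Hello", n_predict = 20)
  cat(result)
}

edge_free_model(ctx)  # always release memory when finished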

Text Generation

| Function | Description |
|----------|-------------|
| edge_completion(ctx, prompt, n_predict, temperature, top_p) | Generate text completion |
| edge_stream_completion(ctx, prompt, callback, ...) | Stream tokens in real time |
| edge_chat_stream(ctx, system_prompt, max_history, ...) | Interactive chat session |

Model Discovery & Download

| Function | Description |
|----------|-------------|
| edge_find_gguf_models(source_dirs, model_pattern, ...) | Find existing GGUF models |
| edge_list_models() | List pre-configured popular models |
| edge_download_model(model_id, filename) | Download from HuggingFace |
| edge_download_url(url, filename) | Download from any direct URL (GPT4All, etc.) |
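
A sketch of the download workflow; the model_id and filename below are illustrative placeholders, so pick real values from edge_list_models():

library(edgemodelr)

# Browse the pre-configured model catalog
print(edge_list_models())

# Download a specific GGUF file from HuggingFace
# (repository and filename are illustrative placeholders)
path <- edge_download_model(
  model_id = "TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF",
  filename = "tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf"
)

ctx <- edge_load_model(path)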

Performance Optimization

| Function | Description |
|----------|-------------|
| edge_small_model_config(model_size_mb, available_ram_gb, target) | Get optimized settings for small models |
| edge_benchmark(ctx, prompt, n_predict, iterations) | Benchmark model performance |
| edge_set_verbose(enabled) | Control logging verbosity |
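
For example, a quick sketch that quiets logging before timing a run (assumes ctx is an already-loaded model context; parameter values are illustrative):

edge_set_verbose(FALSE)  # suppress llama.cpp log output

perf <- edge_benchmark(ctx, prompt = "Hello", n_predict = 32, iterations = 3)
print(perf)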

⚡ Performance Optimizations for Small Models

NEW in v0.1.4: Automatic optimizations for small language models (1B-3B parameters)!

library(edgemodelr)

# Get optimized configuration for your device
config <- edge_small_model_config(
  model_size_mb = 700,      # TinyLlama size
  available_ram_gb = 8,     # Your system RAM
  target = "laptop"         # mobile/laptop/desktop/server
)

# View recommendations
print(config$tips)

# Load model with optimized settings
ctx <- edge_load_model(
  "path/to/model.gguf",
  n_ctx = config$n_ctx,
  n_gpu_layers = config$n_gpu_layers
)

# Faster inference with optimized parameters
result <- edge_completion(
  ctx,
  prompt = "Hello!",
  n_predict = config$recommended_n_predict,
  temperature = config$recommended_temperature
)

Key Optimizations:

  • 🚀 15-30% faster inference for small models through adaptive batch sizing
  • 💾 Reduced memory usage with context-aware thread allocation
  • 📱 Device-specific presets optimized for mobile, laptop, desktop, and server
  • 🎯 Automatic tuning based on model size and available RAM

🤖 Available Models

All models download directly without authentication - ready for offline use!

Small Models (Testing & Development)

| Model | Size | Source | Use Case | Command |
|-------|------|--------|----------|---------|
| TinyLlama-1.1B | ~700MB | HuggingFace | Testing, development | edge_quick_setup("TinyLlama-1.1B") |
| TinyLlama-OpenOrca | ~700MB | HuggingFace | Better chat | edge_quick_setup("TinyLlama-OpenOrca") |

Medium Models (4-5GB)

| Model | Size | Source | Use Case | Command |
|-------|------|--------|----------|---------|
| Llama-3-8B | ~4.7GB | GPT4All CDN | General purpose | edge_quick_setup("llama3-8b") |
| Mistral-7B | ~4.1GB | GPT4All CDN | High quality | edge_quick_setup("mistral-7b") |
| Phi-3-mini | ~2.4GB | HuggingFace | Reasoning | edge_quick_setup("phi3-mini") |

Large Models (7-9GB) - NEW!

| Model | Size | Source | Use Case | Command |
|-------|------|--------|----------|---------|
| Orca-2-13B | ~7.4GB | GPT4All CDN | 13B chat model | edge_quick_setup("orca2-13b") |
| WizardLM-13B | ~7.4GB | GPT4All CDN | 13B instruct | edge_quick_setup("wizardlm-13b") |
| Hermes-13B | ~7.4GB | GPT4All CDN | 13B chat | edge_quick_setup("hermes-13b") |
| Starcoder | ~9GB | GPT4All CDN | Code generation (15B) | edge_quick_setup("starcoder") |

Note: GPT4All models download directly from their CDN - no HuggingFace account or authentication required!

💡 Use Cases & Examples

Data Science Workflows

library(edgemodelr)

# Analyze survey responses
analyze_sentiment <- function(texts, model_name = "mistral-7b") {
  setup <- edge_quick_setup(model_name)
  ctx <- setup$context

  results <- data.frame(
    text = texts,
    sentiment = sapply(texts, function(text) {
      prompt <- paste("Analyze sentiment (positive/negative/neutral):", text)
      trimws(edge_completion(ctx, prompt, n_predict = 10))
    }),
    stringsAsFactors = FALSE
  )

  edge_free_model(ctx)
  return(results)
}

# Example usage
survey_data <- c(
  "This product exceeded my expectations!",
  "Poor customer service, very disappointed.",
  "It's okay, nothing special but works fine."
)

sentiment_results <- analyze_sentiment(survey_data)
print(sentiment_results)

Code Generation Assistant

library(edgemodelr)

# R programming helper
setup <- edge_quick_setup("CodeLlama-7B")
ctx <- setup$context

code_prompt <- "Create an R function to calculate correlation matrix with p-values:"
response <- edge_completion(ctx,
  paste("# R Programming Task\n", code_prompt, "\n\n# Solution:\n"),
  n_predict = 250,
  temperature = 0.3)

cat(response)
edge_free_model(ctx)

Batch Document Processing

library(edgemodelr)

# Process multiple documents
summarize_documents <- function(file_paths, model_name = "mistral-7b") {
  setup <- edge_quick_setup(model_name)
  ctx <- setup$context

  summaries <- sapply(file_paths, function(path) {
    text <- readLines(path, warn = FALSE)
    content <- paste(text, collapse = " ")
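    # Note: very long documents may exceed the model's context window (n_ctx);
    # consider truncating content before building the prompt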

    prompt <- paste("Summarize this document in 2-3 sentences:", content)
    edge_completion(ctx, prompt, n_predict = 100)
  })

  edge_free_model(ctx)
  return(summaries)
}

🔥 Large Model Examples (7-9GB)

End-to-End: Orca-2-13B (7GB Model)

A complete example that downloads and runs a 13-billion-parameter model:

library(edgemodelr)

# =============================================================
# Step 1: Download and Setup (one-time, ~7.4GB download)
# =============================================================
cat("Setting up Orca-2-13B (7.4GB)...\n")
cat("Source: GPT4All CDN (direct download, no auth required)\n\n")

setup <- edge_quick_setup("orca2-13b")
ctx <- setup$context

cat("Model loaded successfully!\n")
cat("Model path:", setup$model_path, "\n\n")

# =============================================================
# Step 2: Math Test
# =============================================================
cat("--- Math Test ---\n")
prompt <- "What is 123 + 456? The answer is"

result <- edge_completion(ctx, prompt, n_predict = 30, temperature = 0.1)
cat("Prompt:", prompt, "\n")
cat("Response:", trimws(gsub(prompt, "", result, fixed = TRUE)), "\n\n")
# Expected output: "579"

# =============================================================
# Step 3: Knowledge Test
# =============================================================
cat("--- Knowledge Test ---\n")
prompt <- "The theory of relativity was developed by"

result <- edge_completion(ctx, prompt, n_predict = 50, temperature = 0.1)
cat("Prompt:", prompt, "\n")
cat("Response:", trimws(gsub(prompt, "", result, fixed = TRUE)), "\n\n")
# Expected output: "Albert Einstein in the early 20th century..."

# =============================================================
# Step 4: Reasoning Test
# =============================================================
cat("--- Reasoning Test ---\n")
prompt <- "If all cats are animals, and some animals are pets, can we conclude that some cats are pets? Answer:"

result <- edge_completion(ctx, prompt, n_predict = 80, temperature = 0.3)
cat("Prompt:", prompt, "\n")
cat("Response:", trimws(gsub(prompt, "", result, fixed = TRUE)), "\n\n")
# Expected: Logical reasoning about the syllogism

# =============================================================
# Step 5: Cleanup
# =============================================================
edge_free_model(ctx)
cat("Model freed. Done!\n")

Actual Test Results (Orca-2-13B)

These results were obtained on a Windows 11 system with CPU-only inference:

| Test | Prompt | Response | Time | Tokens/sec |
|------|--------|----------|------|------------|
| Math | "What is 123 + 456?" | 579 (correct!) | 103s | 0.29 |
| Knowledge | "The theory of relativity was developed by" | Albert Einstein in the early 20th century | 261s | 0.15 |
| Reasoning | Syllogism about cats/animals/pets | Correct logical deduction with step-by-step reasoning | 415s | 0.12 |

End-to-End: Direct URL Download

Download any GGUF model from a direct URL:

library(edgemodelr)

# Download from GPT4All CDN (or any direct URL)
model_path <- edge_download_url(
  url = "https://gpt4all.io/models/gguf/orca-2-13b.Q4_0.gguf",
  filename = "orca-2-13b.Q4_0.gguf"
)

# Load and use
ctx <- edge_load_model(model_path, n_ctx = 2048)
result <- edge_completion(ctx, "Explain quantum computing:", n_predict = 100)
cat(result)
edge_free_model(ctx)

End-to-End: Starcoder (9GB Code Model)

library(edgemodelr)

# Setup Starcoder - a 15B parameter code generation model
setup <- edge_quick_setup("starcoder")
ctx <- setup$context

# Generate Python code
prompt <- "def quicksort(arr):"
result <- edge_completion(ctx, prompt, n_predict = 150, temperature = 0.2)
cat(result)

# Generate R code
prompt2 <- "# R function to calculate moving average\nmoving_avg <- function(x, n) {"
result2 <- edge_completion(ctx, prompt2, n_predict = 100, temperature = 0.2)
cat(result2)

edge_free_model(ctx)

Performance Comparison by Model Size

| Model | Parameters | File Size | Load Time | Tokens/sec (CPU) |
|-------|------------|-----------|-----------|------------------|
| TinyLlama-1.1B | 1.1B | 637 MB | 0.1s | 1-2 |
| Phi-3-mini | 3.8B | 2.3 GB | 0.3s | 0.15-0.2 |
| Llama-3.2-3B | 3.2B | 1.9 GB | 0.4s | 0.2-0.3 |
| Mistral-7B | 7B | 4.1 GB | 0.5s | 0.1-0.15 |
| Orca-2-13B | 13B | 6.9 GB | 0.4s | 0.12-0.29 |
| Starcoder | 15B | 8.4 GB | 0.6s | 0.08-0.12 |

Note: Inference speed depends heavily on your CPU. GPU acceleration can increase speed 10-50x.

⚡ Performance & Hardware

Hardware Requirements by Model Size

| Model Size | File Size | Min RAM | Recommended RAM | CPU Cores | Typical Speed |
|------------|-----------|---------|-----------------|-----------|---------------|
| 1B params | ~700 MB | 4 GB | 8 GB | 2+ | 1-2 tok/s |
| 3B params | ~2 GB | 8 GB | 12 GB | 4+ | 0.2-0.3 tok/s |
| 7B params | ~4 GB | 12 GB | 16 GB | 4+ | 0.1-0.15 tok/s |
| 13B params | ~7 GB | 16 GB | 24 GB | 6+ | 0.1-0.3 tok/s |
| 15B params | ~9 GB | 20 GB | 32 GB | 8+ | 0.08-0.12 tok/s |

Tip: For faster inference, enable GPU acceleration with the n_gpu_layers parameter.

Performance Optimizations

  • SIMD Instructions: Automatic AVX/AVX2/FMA detection
  • Multi-threading: Uses all available CPU cores
  • Memory Efficiency: Optimized batch processing
  • Smart Quantization: Q4_K_M and Q5_K_M support

Benchmark Your Setup

library(edgemodelr)

# Test your system performance
setup <- edge_quick_setup("TinyLlama-1.1B")
ctx <- setup$context

# Run benchmark
perf <- edge_benchmark(ctx, iterations = 5)
print(perf)

edge_free_model(ctx)

🔧 Advanced Configuration

GPU Acceleration (Experimental)

# Enable GPU layers (requires compatible hardware)
ctx <- edge_load_model("model.gguf",
                      n_ctx = 2048,
                      n_gpu_layers = 35)  # Offload layers to GPU

Custom Model Sources

# Search specific directories for models
models <- edge_find_gguf_models(
  source_dirs = c("~/Downloads", "~/my_models"),
  model_pattern = "llama",
  min_size_mb = 100,
  test_compatibility = TRUE
)
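
As in the Quick Start example above, the result's models field is a list of matches, and each entry's path can be passed directly to edge_load_model().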

Streaming with Custom Callbacks

# Advanced streaming with progress tracking
result <- edge_stream_completion(ctx, prompt,
  callback = function(data) {
    if (!data$is_final) {
      cat(data$token)
      flush.console()

      # Custom progress logic
      if (data$total_tokens %% 10 == 0) {
        cat(sprintf(" [%d tokens]", data$total_tokens))
      }
    }
    return(TRUE)  # Continue generation
  },
  n_predict = 200
)
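
Returning TRUE from the callback (as above) continues generation; returning FALSE should stop the stream early.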

πŸ› οΈ Troubleshooting

Common Issues

Installation Problems:

# Ensure build tools are available
install.packages(c("Rcpp", "devtools"))

# Check compiler
system("gcc --version")  # Should show C++17 support

Memory Issues:

  • Use smaller models (TinyLlama for testing)
  • Reduce the n_ctx parameter (see the sketch after this list)
  • Try quantized models (Q4_K_M variants)
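
For example, a minimal low-memory load (the file name and context size are illustrative):

# A smaller context window means a smaller KV cache and lower RAM usage
ctx <- edge_load_model("path/to/model-Q4_K_M.gguf", n_ctx = 512)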

Performance Issues:

  • Ensure all CPU cores are being used
  • Check available RAM
  • Consider GPU acceleration for large models

Getting Help

Report bugs or ask questions via GitHub Issues: https://github.com/PawanRamaMali/edgemodelr/issues

🤝 Contributing

We welcome contributions! Please see our Contributing Guide for details.

Development Setup

git clone https://github.com/PawanRamaMali/edgemodelr.git
cd edgemodelr
R CMD build .
R CMD check edgemodelr_*.tar.gz

Areas for Contribution

  • 🧪 Testing with different models and platforms
  • 📚 Documentation and examples
  • ⚡ Performance optimizations
  • 🔧 New features and integrations

📄 License

MIT License - see LICENSE file for details.

πŸ™ Acknowledgments

Built with ❤️ using:

  • llama.cpp - Fast LLM inference engine
  • GGML - Machine learning tensor library
  • Rcpp - R and C++ integration

Special thanks to the open-source AI community for making local LLM inference accessible to everyone.


⭐ Star this repo if you find edgemodelr useful for your R projects!
