COMPASS: A Framework for Policy Alignment Evaluation

Paper: https://arxiv.org/abs/2601.01836

COMPASS is a framework for evaluating policy alignment: given only an organization’s policy (e.g., allow/deny rules), it enables you to benchmark whether an LLM’s responses comply with that policy in structured, enterprise-like scenarios.

This repository provides tools to:

  1. Define a custom policy for your organization.
  2. Generate a benchmark of synthetic queries (standard and adversarial) tailored to that policy.
  3. Evaluate LLMs on how well they adhere to your rules.

🚀 Quick Start

1. Installation

conda create -n compass python=3.11
conda activate compass
pip install -r requirements.txt

Set up your API keys in .env (see .env.sample). The exact credentials you need depend on which providers/models you select in scripts/config/*.yaml (for synthesis, evaluation, and judging).

cp .env.sample .env
# Edit .env to add your keys

Required credentials (common)

  • OpenAI: OPENAI_API_KEY
  • Anthropic: ANTHROPIC_API_KEY
  • OpenRouter: OPENROUTER_API_KEY
  • Vertex AI (Claude/Gemini): GOOGLE_APPLICATION_CREDENTIALS or VERTEX_API_KEY
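
For example, a minimal .env might look like the following (placeholder values; set only the keys for the providers you actually use):

# .env (example; fill in only the providers you use)
OPENAI_API_KEY=...
ANTHROPIC_API_KEY=...
OPENROUTER_API_KEY=...
# Vertex AI: either a service-account file or an API key
GOOGLE_APPLICATION_CREDENTIALS=/path/to/service-account.json
# VERTEX_API_KEY=...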

Switching API Providers

All synthesis/verification scripts use a unified API configuration. You can switch between providers (OpenAI, Anthropic, Vertex, OpenRouter) by editing the api section in scripts/config/*.yaml:

# Example: scripts/config/base_queries_synthesis.yaml
api:
  provider: "anthropic"           # Change to: openai, anthropic, vertex, openrouter
  model: "claude-sonnet-4-20250514"
  temperature: 1.0
  max_tokens: 5000
  # Provider-specific settings (optional):
  # top_p: 1.0                    # For OpenAI/OpenRouter
  # region: "us-east5"            # For Vertex
  # project_id: "your-project"    # For Vertex
  # reasoning_effort: "medium"    # For OpenAI reasoning models

Supported providers:

Provider     provider value   Required env variable
OpenAI       openai           OPENAI_API_KEY
Anthropic    anthropic        ANTHROPIC_API_KEY
Vertex AI    vertex           GOOGLE_APPLICATION_CREDENTIALS or VERTEX_API_KEY
OpenRouter   openrouter       OPENROUTER_API_KEY

Note: Structured-output features (used by the verification scripts) currently support only OpenAI.

2. Testbed Dataset

We provide a comprehensive testbed dataset covering 8 industry verticals (Automotive, Healthcare, Financial, etc.) generated using COMPASS. You can access the Testbed Dataset on Hugging Face:

👉 AIM-Intelligence/COMPASS-Policy-Alignment-Testbed-Dataset

This dataset serves as a reference for what COMPASS generates and lets you evaluate models immediately without generating your own data. The published Parquet files correspond to the verified query buckets under scenario/queries/verified_*.
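
A minimal Python sketch for loading the testbed with the Hugging Face datasets library (the split name below is an assumption; check the dataset card for the actual configurations and splits):

from datasets import load_dataset

# Load the COMPASS testbed from the Hugging Face Hub.
# NOTE: split="train" is an assumption; consult the dataset card.
ds = load_dataset("AIM-Intelligence/COMPASS-Policy-Alignment-Testbed-Dataset", split="train")

print(ds)      # number of rows and column names
print(ds[0])   # inspect a single query record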


🛠️ Usage: Creating a Custom Benchmark

Follow these steps to create a policy alignment benchmark for your own organization.

Step 1: Define Your Policy, Context, and System Prompt

To build a custom benchmark and evaluate responses, you typically provide:

  • Policy + Context: required for query generation.
  • System prompt: required for response generation (evaluation).

1. Policy File (scenario/policies/MyOrg.json): Define an allowlist (topics you WANT to answer) and a denylist (topics you MUST refuse). A sanity-check sketch for this file appears after this list.

{
  "allowlist": {
    "product_support": "Technical support and usage guidelines for MyOrg's software products, including installation, debugging, and API usage.",
    "pricing": "Publicly available pricing tiers (Free, Pro, Enterprise) and feature comparison tables."
  },
  "denylist": {
    "competitors": "Comparisons with CompetitorX or CompetitorY, or market share analysis.",
    "internal_security": "Details about internal server infrastructure, employee credentials, or unpatched vulnerabilities."
  }
}

2. Context File (scenario/contexts/MyOrg.txt): Provide a description of your organization to help the LLM generate realistic scenarios.

MyOrg is a leading provider of cloud-based project management software...

3. System Prompt File (scenario/system_prompts/MyOrg.txt): Provide the system prompt that the model will use when responding to queries. You can write any prompt you want the model to follow.

You are a helpful assistant for MyOrg. You must strictly follow the company's content policies...
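
Before running synthesis, you can quickly verify that the policy file from step 1 parses and has the expected top-level keys. A minimal sketch (this helper script is not part of COMPASS; field names follow the example above):

import json

# Hypothetical helper: check the shape of a policy file.
with open("scenario/policies/MyOrg.json") as f:
    policy = json.load(f)

for section in ("allowlist", "denylist"):
    assert section in policy, f"missing '{section}' section"
    for topic, description in policy[section].items():
        assert description.strip(), f"empty description for topic '{topic}'"

print(f"OK: {len(policy['allowlist'])} allowed topics, "
      f"{len(policy['denylist'])} denied topics")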

Step 2: Generate and Verify Evaluation Queries

Use the synthesis scripts to generate user queries based on your policy, and then run verification scripts to ensure quality.

Note: The synthesis scripts enumerate all scenario/policies/*.json files by default.

Recommended (to run a single custom org safely): work in a separate branch/copy, and temporarily keep only these three files for your org:

  • scenario/policies/MyOrg.json
  • scenario/contexts/MyOrg.txt
  • scenario/system_prompts/MyOrg.txt

This is the most reliable way to avoid accidental API calls for other scenarios.
You can also use --debug/--max-companies to limit the run, but it is less explicit than isolating the files.

New: You can run scripts for specific companies with --company. Example:

python scripts/base_queries_synthesis.py --company MyOrg

1. Generate Standard Queries (Base):

python scripts/base_queries_synthesis.py

This generates standard questions for both allowlist and denylist topics. To run a specific company (or multiple):

python scripts/base_queries_synthesis.py --company MyOrg OtherOrg

2. Verify Base Queries:

python scripts/base_queries_verification.py

This validates the generated queries and saves the approved ones to scenario/queries/verified_base/. To run a specific company (or multiple):

python scripts/base_queries_verification.py --company MyOrg OtherOrg

3. Generate Edge Cases (Adversarial/Borderline):

  • allowed_edge: Tricky questions that seem risky but should be answered.
  • denied_edge: Adversarial attacks (jailbreaks, social engineering) trying to elicit denied info.

# allowed_edge - uses default config automatically
python scripts/allowed_edge_queries_synthesis.py

# denied_edge - requires explicit config file(s)
python scripts/denied_edge_queries_synthesis.py --config scripts/config/denied_edge_queries_synthesis_short.yaml

# Or use both short and long attack strategies:
python scripts/denied_edge_queries_synthesis.py --multi_config

To run a specific company (or multiple):

python scripts/allowed_edge_queries_synthesis.py --company MyOrg OtherOrg
python scripts/denied_edge_queries_synthesis.py --config scripts/config/denied_edge_queries_synthesis_short.yaml --company MyOrg OtherOrg

Note: denied_edge_queries_synthesis.py requires explicit config specification via --config or --multi_config. Available configs:

  • denied_edge_queries_synthesis_short.yaml - 2 attack strategies per query
  • denied_edge_queries_synthesis_long.yaml - 4 attack strategies per query
  • --multi_config - uses both configs automatically

Prerequisites:

  • Default configs use Vertex for allowed_edge_queries_synthesis.py and OpenRouter for denied_edge_queries_synthesis.py.
  • You can change the provider by editing scripts/config/*.yaml (see Switching API Providers).

4. Verify Edge Cases:

python scripts/allowed_edge_queries_verification.py
python scripts/denied_edge_queries_verification.py

Validated queries are saved to scenario/queries/verified_allowed_edge/ and scenario/queries/verified_denied_edge/.

To run a specific company (or multiple):

python scripts/allowed_edge_queries_verification.py --company MyOrg OtherOrg
python scripts/denied_edge_queries_verification.py --company MyOrg OtherOrg

Tip: Use --verbose to see progress during verification. API calls (especially with reasoning models) can take 30+ seconds each, so without --verbose the script may appear stuck:

python scripts/denied_edge_queries_verification.py --company MyOrg --verbose

You can also speed up verification with parallel processing, e.g. --n_proc 4.

Step 3: Run Evaluation

  1. Generate Responses: Run your target LLM against the generated queries. You must specify the model, company, and query type. The script will automatically load the verified queries.

    # Using unified script (recommended) - supports all providers
    python scripts/response_generation.py \
      --provider "openai" \
      --model "gpt-4o-2024-11-20" \
      --company "MyOrg" \
      --query_type "base"
    
    # Or use a config file
    python scripts/response_generation.py \
      --config scripts/config/response_generation.yaml \
      --company "MyOrg" \
      --query_type "base"

    Provider options:

    # OpenAI
    python scripts/response_generation.py --provider openai --model "gpt-4o-2024-11-20" ...
    
    # Anthropic
    python scripts/response_generation.py --provider anthropic --model "claude-sonnet-4-20250514" ...
    
    # Vertex (Claude)
    python scripts/response_generation.py --provider vertex --model "claude-opus-4-1@20250805" \
      --region "us-east5" --project_id "your-project" ...
    
    # OpenRouter
    python scripts/response_generation.py --provider openrouter --model "openai/gpt-4-turbo" ...

    (Run separately for base, allowed_edge, and denied_edge)

    Note: Legacy provider-specific scripts (response_generation_openai.py, response_generation_openrouter.py, response_generation_vertex.py) are still available for backward compatibility.

  2. Judge Compliance: Use an LLM-as-a-Judge to score the responses.

    python scripts/response_judge.py "response_results" -n 5

    The judge uses the config at scripts/config/response_judge.yaml. You can change the provider there (currently OpenAI only for structured output).
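
    A sketch of what scripts/config/response_judge.yaml might contain, assuming it follows the same unified api section shown in Switching API Providers (the exact schema may differ; check the file shipped in the repo):

    api:
      provider: "openai"            # currently the only provider with structured output
      model: "gpt-4o-2024-11-20"    # example model name reused from this README
      temperature: 0.0              # assumption: deterministic judging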

  3. Analyze Results:

    python scripts/analyze_judged_results.py --target-directory judge_results

Project Structure

  • scenario/: Your input data (policies, contexts) and generated benchmarks.
    • policies/: Put your JSON policy here.
    • contexts/: Put your company description TXT here.
    • system_prompts/: Put your system prompt TXT here.
    • queries/: Generated benchmark data.
  • scripts/: Tools for synthesis and evaluation.
  • results/: Output from model runs and evaluations.
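
A sketch of the layout after adding the MyOrg files (paths taken from the steps above):

scenario/
  policies/MyOrg.json
  contexts/MyOrg.txt
  system_prompts/MyOrg.txt
  queries/
    verified_base/
    verified_allowed_edge/
    verified_denied_edge/
scripts/
results/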

Citation

If you use COMPASS in your research, please cite:

@misc{choi2026compass,
      title={COMPASS: A Framework for Evaluating Organization-Specific Policy Alignment in LLMs}, 
      author={Dasol Choi and DongGeon Lee and Brigitta Jesica Kartono and Helena Berndt and Taeyoun Kwon and Joonwon Jang and Haon Park and Hwanjo Yu and Minsuk Kahng},
      year={2026},
      eprint={2601.01836},
      archivePrefix={arXiv},
      primaryClass={cs.AI},
      url={https://arxiv.org/abs/2601.01836}, 
}
