feat: Add constitutional transforms based on Anthropic Constitutional Classifiers++ paper #300

rdheekonda · 2026-01-16T01:43:55Z

Add transforms for probing constitutional classifiers, based on Cunningham et al., 2025, "Constitutional Classifiers++: Efficient Production-Grade Defenses Against Universal Jailbreaks" (https://arxiv.org/abs/2601.04603).

Key Changes:

- Add 7 new constitutional transform functions for AI red teaming
- Support multiple transformation modes (static, LLM-powered, hybrid)
- Add comprehensive example notebook with TAP attack integration

Added:

- dreadnode/transforms/constitutional.py - Core constitutional transforms module
  - Reconstruction attacks: code_fragmentation, document_fragmentation, multi_turn_fragmentation
  - Obfuscation attacks: metaphor_encoding, riddle_encoding, contextual_substitution, character_separation
- examples/airt/constitutional_attacks.ipynb - Complete example notebook demonstrating all transforms with TAP integration
- Support for static mappings, LLM-powered generation, and hybrid modes
- Integration with the existing TAP (Tree of Attacks with Pruning) framework

Changed:

- Updated dreadnode/transforms/__init__.py to export the constitutional module

Technical Details:

- Implements defense-probing techniques from the Constitutional Classifiers++ paper
- Static mode uses predefined mappings (chemistry_to_cooking domain)
- LLM mode uses generative models for creative transformations
- Hybrid mode tries the static mappings first and falls back to the LLM
- All transforms work with evaluation hooks and TAP attacks
- Notebook outputs are stripped for a clean commit
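
The static, LLM, and hybrid modes described above can be pictured as a table lookup with an optional fallback. The following is a minimal sketch under stated assumptions, not the code in constitutional.py: the function name `substitute_term`, the `mode` strings, the `STATIC_MAPPING` table with its neutral placeholder entries, and the `llm_fallback` callable are all hypothetical and chosen only for illustration.

```python
from typing import Callable, Optional

# Hypothetical static table with neutral placeholder entries; the real
# module's mappings (e.g. its chemistry_to_cooking domain) are not shown here.
STATIC_MAPPING: dict[str, str] = {"alpha": "apple", "beta": "banana"}


def substitute_term(
    term: str,
    mode: str = "static",
    llm_fallback: Optional[Callable[[str], str]] = None,
) -> str:
    """Return a substitute for `term` according to the selected mode."""
    if mode not in {"static", "llm", "hybrid"}:
        raise ValueError(f"unknown mode: {mode!r}")

    # Static and hybrid modes consult the predefined table first.
    if mode in {"static", "hybrid"} and term in STATIC_MAPPING:
        return STATIC_MAPPING[term]

    # LLM mode always generates; hybrid mode generates only on a table miss.
    if mode in {"llm", "hybrid"}:
        if llm_fallback is None:
            raise ValueError(f"mode {mode!r} requires an llm_fallback callable")
        return llm_fallback(term)

    # Static mode leaves unmapped terms unchanged.
    return term


def fake_llm(term: str) -> str:
    """Stand-in for a model call so the sketch runs offline."""
    return f"<generated:{term}>"


print(substitute_term("alpha", mode="hybrid", llm_fallback=fake_llm))
print(substitute_term("gamma", mode="hybrid", llm_fallback=fake_llm))
print(substitute_term("gamma", mode="static"))
```

Passing the generator as a callable keeps the lookup logic testable without a live model, which is one plausible reason to separate the modes this way.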

Generated Summary:

Introduced a new module constitutional.py that implements Constitutional Classifier transforms.
Added metaphor encoding techniques for evading classifiers based on "Constitutional Classifiers++: Efficient Production-Grade Defenses Against Universal Jailbreaks".
Updated __init__.py to include constitutional in the exported modules and the __all__ list.
Implemented various metaphors for technical terms related to chemistry, biology, and weapons, enhancing the obfuscation capabilities.
Included support for static and dynamic (LLM-powered) metaphor generation modes that map harmful terms to benign alternatives.
Added specific transforms such as code_fragmentation and document_fragmentation that help evade input and output classifiers by fragmenting harmful queries across benign contexts.
Improved utility functions for encoding and hint generation to enhance contextual understanding of metaphorical substitutions.
Overall, these changes extend the dreadnode transforms with obfuscation and fragmentation techniques for probing classifier-based safety defenses.

This summary was generated with ❤️ by rigging

Add constitutional classifiers probing transforms based on Cunningham et al. 2025 paper:
- Reconstruction attacks: code_fragmentation, document_fragmentation, multi_turn_fragmentation
- Obfuscation attacks: metaphor_encoding, riddle_encoding, contextual_substitution, character_separation
- Supports static, LLM-powered, and hybrid transformation modes
- Add comprehensive example notebook demonstrating all transforms with TAP integration
- Strip notebook outputs for clean commit

dreadnode-renovate-bot (bot) added the area/examples label ("Changes to example code and demonstrations") on Jan 16, 2026

rdheekonda added 2 commits January 15, 2026 17:48

fix: Change noqa to nosec for bandit compatibility

59ee788

Replace # noqa: S311 with # nosec B311 for bandit security scanner compatibility

fix: Add noqa comments for both ruff and bandit

c2ff6a8

Add both # noqa: S311 (ruff) and # nosec B311 (bandit) to suppress security warnings for non-cryptographic random usage
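
For context, the combined suppression described in this commit would look roughly like the line below. This is an illustrative sketch only: the function `pick_separator` and its arguments are hypothetical and not taken from constitutional.py, and only the two comment markers come from the commit message. Whether both linters honor the markers when they share one line may depend on the tool versions in use.

```python
import random


def pick_separator(candidates: list[str]) -> str:
    """Pick a separator at random; no cryptographic strength is needed."""
    # `noqa: S311` targets ruff's rule for non-cryptographic random usage,
    # and `nosec B311` targets bandit's equivalent check.
    return random.choice(candidates)  # noqa: S311  # nosec B311
```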
