@andreatgretel
Contributor

Summary

Adds support for third-party processor plugins via the existing plugin discovery mechanism. This PR builds on top of #294 (callback-based processors).

Changes

Plugin Infrastructure

  • PluginType.PROCESSOR for external processor plugins
  • ProcessorRegistry discovers and loads processor plugins
  • processor_types.py with plugin-injected type union
  • PluginRegistry uses RLock instead of Lock for nested imports
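To illustrate why a re-entrant lock matters for nested imports, here is a minimal sketch. `SketchRegistry`, `load_outer`, and `can_reenter` are hypothetical names made up for this example and are not the project's actual `PluginRegistry` API; only `threading.Lock` and `threading.RLock` are real. The sketch assumes that loading one plugin can trigger a second registration on the same thread.

```python
import threading


def can_reenter(lock):
    """Return True if the thread that holds `lock` can acquire it again."""
    with lock:
        # Non-blocking attempt, so a plain Lock reports False instead of hanging.
        acquired = lock.acquire(blocking=False)
        if acquired:
            lock.release()
        return acquired


class SketchRegistry:
    """Hypothetical stand-in for a plugin registry (not the project's class)."""

    def __init__(self, lock):
        self._lock = lock
        self._plugins = {}

    def register(self, name, loader):
        # The loader runs while the lock is held, so a loader that registers
        # another plugin re-enters this method on the same thread.
        with self._lock:
            self._plugins[name] = loader(self)

    def names(self):
        with self._lock:
            return sorted(self._plugins)


def load_outer(registry):
    # Simulates a nested import: loading "outer" also registers "inner".
    registry.register("inner", lambda _registry: "inner-plugin")
    return "outer-plugin"


registry = SketchRegistry(threading.RLock())
registry.register("outer", load_outer)

print(can_reenter(threading.Lock()))   # False
print(can_reenter(threading.RLock()))  # True
print(registry.names())                # ['inner', 'outer']
```

With a plain `threading.Lock`, the nested `register` call in this sketch would block forever on the second acquisition, which is the kind of deadlock the switch to `RLock` avoids.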

Demo Processors

  • RegexFilterProcessor - filters rows based on regex patterns (preprocess stage)
  • SemanticDedupProcessor - removes duplicate content via embeddings (postprocess stage)
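As a rough idea of what a regex-based row filter in the preprocess stage could look like, here is a small sketch. The class name `RegexFilterSketch`, its constructor parameters, and the list-of-dicts row format are assumptions for illustration; the actual `RegexFilterProcessor` in this PR may use a different config and data type.

```python
import re


class RegexFilterSketch:
    """Hypothetical illustration; not the RegexFilterProcessor from this PR."""

    def __init__(self, column, pattern, keep_matches=True):
        self.column = column
        self.pattern = re.compile(pattern)
        self.keep_matches = keep_matches

    def preprocess(self, rows):
        # Keep rows whose column value matches the pattern, or the rows that
        # do not match when keep_matches is False.
        kept = []
        for row in rows:
            value = str(row.get(self.column, ""))
            matched = self.pattern.search(value) is not None
            if matched == self.keep_matches:
                kept.append(row)
        return kept


rows = [
    {"text": "order #123 shipped"},
    {"text": "no identifier here"},
    {"text": "order #77 delayed"},
]

with_ids = RegexFilterSketch("text", r"#\d+").preprocess(rows)
without_ids = RegexFilterSketch("text", r"#\d+", keep_matches=False).preprocess(rows)

print(len(with_ids))     # 2
print(len(without_ids))  # 1
```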

Documentation

  • Plugin overview and development guide

Depends On

  • #294 (callback-based processors)

Test Plan

  • Demo processors work when installed as plugins
  • Plugin discovery correctly registers external processors
  • Type hints include plugin processor configs

Replace the stage parameter with callback methods (preprocess,
process_after_batch, postprocess). The builder now invokes each callback at its
corresponding stage: PRE_GENERATION, POST_BATCH, and POST_GENERATION, respectively.

- Remove build_stage from ProcessorConfig
- Add callback methods to Processor base class
- Update DropColumns and SchemaTransform to use process_after_batch
- Simplify ColumnWiseBuilder processor invocation
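
The callback flow described above can be sketched roughly as follows. Only the three callback names and the three stage names come from this PR; `ProcessorSketch`, `DropColumnsSketch`, `run_build`, and `fake_generate` are hypothetical, and the real `Processor` base class and `ColumnWiseBuilder` may differ in signatures and data types.

```python
class ProcessorSketch:
    """Hypothetical base class: each callback defaults to a no-op."""

    def preprocess(self, rows):
        return rows

    def process_after_batch(self, rows):
        return rows

    def postprocess(self, rows):
        return rows


class DropColumnsSketch(ProcessorSketch):
    """Overrides only the per-batch callback, as DropColumns does in this PR."""

    def __init__(self, columns):
        self.columns = set(columns)

    def process_after_batch(self, rows):
        return [
            {key: value for key, value in row.items() if key not in self.columns}
            for row in rows
        ]


def run_build(seed_rows, processors, batch_size, generate):
    # PRE_GENERATION: every processor sees the seed rows once.
    rows = list(seed_rows)
    for processor in processors:
        rows = processor.preprocess(rows)

    output = []
    for start in range(0, len(rows), batch_size):
        batch = generate(rows[start:start + batch_size])
        # POST_BATCH: every processor sees each generated batch.
        for processor in processors:
            batch = processor.process_after_batch(batch)
        output.extend(batch)

    # POST_GENERATION: every processor sees the full result once.
    for processor in processors:
        output = processor.postprocess(output)
    return output


def fake_generate(batch):
    # Stand-in for generation: adds an answer plus a scratch column.
    return [{**row, "scratch": "tmp", "answer": row["q"].upper()} for row in batch]


result = run_build(
    [{"q": "a"}, {"q": "b"}, {"q": "c"}],
    [DropColumnsSketch(["scratch"])],
    batch_size=2,
    generate=fake_generate,
)
print(result)
```

Because the base class supplies no-op defaults, a processor only overrides the callbacks it needs, and the builder can call all three on every processor without checking a stage field.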

Adds support for third-party processor plugins via plugin discovery:

- PluginType.PROCESSOR for external processor plugins
- ProcessorRegistry discovers and loads processor plugins
- processor_types.py with plugin-injected type union
- PluginRegistry uses RLock for nested imports

Demo processors:
- RegexFilterProcessor (preprocess stage)
- SemanticDedupProcessor (postprocess stage)
@andreatgretel force-pushed the andreatgretel/feat/processor-plugins branch 7 times, most recently from 403bc69 to a61848e (February 11, 2026 20:29)
Base automatically changed from andreatgretel/feat/processor-plugins to main (February 12, 2026 00:32)