@andreatgretel andreatgretel commented Feb 3, 2026

Summary

Adds allow_resize parameter to CustomColumnConfig enabling custom column generators to produce a different number of records than the input. This supports 1:N expansion (e.g., generating multiple variations per input) and N:1 retraction (e.g., filtering or aggregating records) patterns. Addresses #265.
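To make the two patterns concrete, here is a minimal sketch in plain Python. It does not use the library's real API: the function names `expand_variations` and `retract_short_texts`, the `Record` alias, and the `text` / `variation_id` keys are all assumptions made for illustration. The point is only that a full-column generator receives the whole batch and may return more or fewer records than it was given.

```python
from typing import Any

Record = dict[str, Any]  # illustrative stand-in for one dataset row


def expand_variations(records: list[Record], n: int = 3) -> list[Record]:
    """1:N expansion: emit n variations for every input record."""
    return [
        {**record, "variation_id": i}
        for record in records
        for i in range(n)
    ]


def retract_short_texts(records: list[Record], min_len: int = 5) -> list[Record]:
    """N:1 retraction: drop records whose text is shorter than min_len."""
    return [record for record in records if len(record["text"]) >= min_len]


batch = [{"text": "hi"}, {"text": "hello world"}]
expanded = expand_variations(batch, n=3)   # 2 inputs become 6 records
kept = retract_short_texts(expanded)       # only the "hello world" rows survive
```

In both cases the output length differs from the input length, which is the situation `allow_resize` is meant to permit.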

Changes

Added

  • allow_resize field on CustomColumnConfig with validation requiring full_column strategy
  • allow_resize parameter to update_records() in DatasetBatchManager
  • actual_num_records tracking in dataset metadata (may differ from target_num_records when resizing)
  • Informative logging when batch size changes during generation
  • example_allow_resize.py demonstrating expansion (1:N) and retraction (N:1) patterns
  • Documentation for the feature with examples
  • Comprehensive tests for config validation, expansion, retraction, and metadata tracking
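The interaction between the batch manager, the `allow_resize` flag, and the `actual_num_records` counter can be sketched roughly as follows. `MiniBatchManager` is a hypothetical stand-in written for this example, not the real `DatasetBatchManager`; only the names `update_records`, `allow_resize`, `actual_num_records`, and `target_num_records` come from the description above, and the `finish_batch` helper and the log message wording are assumptions.

```python
import logging
from dataclasses import dataclass, field
from typing import Any

logger = logging.getLogger("allow_resize_sketch")

Record = dict[str, Any]


@dataclass
class MiniBatchManager:
    """Hypothetical stand-in for DatasetBatchManager; not the real class."""

    target_num_records: int
    actual_num_records: int = 0
    buffer: list[Record] = field(default_factory=list)

    def update_records(self, records: list[Record], *, allow_resize: bool = False) -> None:
        old, new = len(self.buffer), len(records)
        if new != old:
            if not allow_resize:
                # Without the flag, a size change is treated as an error.
                raise ValueError(
                    f"expected {old} records, got {new}; "
                    "set allow_resize=True to permit this"
                )
            logger.info("batch resized from %d to %d records", old, new)
        self.buffer = list(records)

    def finish_batch(self) -> None:
        # Count what was actually produced, which may differ from the target.
        self.actual_num_records += len(self.buffer)
        self.buffer = []


manager = MiniBatchManager(target_num_records=2, buffer=[{"x": 1}, {"x": 2}])
manager.update_records([{"x": 1}, {"x": 1}, {"x": 2}], allow_resize=True)
manager.finish_batch()
```

After this run the sketch reports three actual records against a target of two, which mirrors why the metadata tracks both numbers.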

Changed

  • column_wise_builder.py - logs resize operations, passes allow_resize to batch manager
  • CustomColumnGenerator.log_pre_generation() - logs allow_resize when enabled

Attention Areas

Reviewers: Please pay special attention to the following:


Description updated with AI

@andreatgretel andreatgretel force-pushed the andreatgretel/feat/custom-column branch 3 times, most recently from 3c9fa49 to 8ba264c Compare February 3, 2026 20:05
@andreatgretel andreatgretel force-pushed the andreatgretel/feat/allow-resize branch 2 times, most recently from 0cedd56 to 8c9a33e Compare February 3, 2026 22:33
@andreatgretel andreatgretel changed the base branch from andreatgretel/feat/custom-column to main February 3, 2026 22:35
default=None,
description="Optional typed configuration object passed as second argument to generator function",
)
allow_resize: bool = Field(
Contributor

I'm wondering if we should elevate this to a property on the base column config (default is False), which you can override in custom columns and plugins.

Contributor Author

Yeah that was the initial solution, then I ended up doing a mixin instead.
Thought it was a bit opaque for plugins specifically, since the developer would have to find out about a specific attribute/property 🤔 But it makes things simpler I suppose?

Contributor

It's a pattern we already use for custom emojis. Also the required_columns and side_effect_columns (these ones have to be set, though).
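For readers following the thread, the "property on the base config" idea could look roughly like the sketch below. The class names `BaseColumnConfig`, `PlainPluginConfig`, and `ExpandingPluginConfig` are invented for this example and are not the project's real classes; the sketch only shows a default of `False` that a plugin overrides, with no mixin involved.

```python
class BaseColumnConfig:
    """Hypothetical base config: resizing is off unless a subclass opts in."""

    @property
    def allow_resize(self) -> bool:
        return False


class PlainPluginConfig(BaseColumnConfig):
    """A plugin that never changes the record count inherits the default."""


class ExpandingPluginConfig(BaseColumnConfig):
    """A plugin that emits several records per input overrides the property."""

    @property
    def allow_resize(self) -> bool:
        return True


configs = [PlainPluginConfig(), ExpandingPluginConfig()]
resizable = [type(c).__name__ for c in configs if c.allow_resize]
```

The trade-off raised in the thread still applies: a plugin author has to know the property exists in order to override it.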

Adds support for generators that produce a different number of records
than the input (expansion or retraction). This addresses GitHub issue #265.

Changes:
- Add `allow_resize` parameter to `update_records()` in DatasetBatchManager
- Add `allow_resize` field to CustomColumnConfig
- Add validation requiring FULL_COLUMN strategy when allow_resize=True
- Track and report actual_num_records in metadata (may differ from target)
- Add logging when batch size changes
- Add example_allow_resize.py demonstrating the feature
- Add comprehensive tests
- Merge update_records and replace_buffer into a single replace_buffer
  method with allow_resize parameter on DatasetBatchManager
- Move allow_resize field from CustomColumnConfig to SingleColumnConfig
  so plugins inherit it without needing a mixin
- Align example and logging with final CustomColumn API
- Parametrize resize tests and extract shared stub in test_columns
- Add expand->retract->expand chaining test (single batch)
- Add multi-batch resize test verifying combined parquet output
- Update example to chain expand/retract/expand with preview+build
- Use 💥/✂️ emojis for resize logging (expand/retract)
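The validation rule from the list above (`allow_resize=True` requires the FULL_COLUMN strategy) can be sketched with a standard-library dataclass. This is an illustration under assumptions, not the project's implementation: `SketchColumnConfig`, the `GenerationStrategy` enum, and the `CELL_BY_CELL` member are hypothetical names, and the real config appears to use `Field(...)` declarations rather than a dataclass. The rule presumably exists because a generator that maps one row to one row cannot change the record count.

```python
from dataclasses import dataclass
from enum import Enum


class GenerationStrategy(Enum):
    CELL_BY_CELL = "cell_by_cell"  # hypothetical name for a per-row strategy
    FULL_COLUMN = "full_column"


@dataclass
class SketchColumnConfig:
    """Hypothetical config used only to illustrate the validation rule."""

    name: str
    generation_strategy: GenerationStrategy = GenerationStrategy.CELL_BY_CELL
    allow_resize: bool = False

    def __post_init__(self) -> None:
        # Resizing only makes sense when the generator sees the whole batch.
        if self.allow_resize and self.generation_strategy is not GenerationStrategy.FULL_COLUMN:
            raise ValueError("allow_resize=True requires the full_column strategy")


valid = SketchColumnConfig(
    name="variations",
    generation_strategy=GenerationStrategy.FULL_COLUMN,
    allow_resize=True,
)

try:
    SketchColumnConfig(name="bad", allow_resize=True)
    rejected = False
except ValueError:
    rejected = True
```

Failing at construction time keeps the error close to the config definition instead of surfacing later during generation.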
@andreatgretel andreatgretel force-pushed the andreatgretel/feat/allow-resize branch from d4a6100 to d84d799 Compare February 12, 2026 15:26

2 participants