-
Notifications
You must be signed in to change notification settings - Fork 57
feat: add image generation support with multi-modal context #317
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
…eInferenceParameters, EmbeddingInferenceParameters
…th BaseInferenceParameters
…lved based on the type of InferenceParameters
packages/data-designer-engine/src/data_designer/engine/dataset_builders/column_wise_builder.py
Show resolved
Hide resolved
packages/data-designer-config/src/data_designer/config/utils/image_helpers.py
Outdated
Show resolved
Hide resolved
- Reduce num_records to 2 for image generation in tutorial notebook - Add tests for different image response formats (dict and plain string) - Parametrize PNG/JPG media storage tests for better maintainability
packages/data-designer-config/src/data_designer/config/utils/image_helpers.py
Outdated
Show resolved
Hide resolved
packages/data-designer-config/src/data_designer/config/utils/image_helpers.py
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
EDIT: added a comment here that was supposed to go into a file, sorry!
|
Curious regarding default models - should we add |
andreatgretel
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left a few more nits but overall everything looks good! Tried it out locally, tutorial runs fine. Excited about generating images on Data Designer 🖼️
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
45 files reviewed, 1 comment
packages/data-designer-config/src/data_designer/config/utils/image_helpers.py
Show resolved
Hide resolved
Yes, may be but perhaps in a different PR! I don't see many options on build.nvidia.com that work with the standard nvidia endpoint .... |
📋 Summary
Adds native image generation capabilities to DataDesigner, enabling synthetic image generation using diffusion and auto-regressive image generation models. Supports both standalone image generation and multi-modal context (using previously generated text/images as input), with robust storage management and comprehensive testing.
🏗️ Architecture
Key Design Decisions:
Auto-detection of API type:
generate_image()automatically routes to the correct LiteLLM API:image_generationAPIcompletionAPIMulti-modal context: Images can reference previously generated columns (text or images) using
multi_modal_contextfor image-to-image generationDual storage modes:
🔄 Changes
✨ Added
New Files - Core Implementation:
image.py(80 lines) -ImageCellGeneratorwith Jinja2 prompt rendering and multi-modal context resolutionmedia_storage.py(137 lines) -MediaStorageclass with DISK/DATAFRAME storage modesimage_helpers.py(238 lines) - Base64/PIL conversion, validation, format detection, diffusion model detectionNew Files - Documentation & Tests:
5-generating-images.py(296 lines) - Complete tutorial with examplestest_image.py(218 lines) - Image generator teststest_media_storage.py(228 lines) - Storage teststest_image_helpers.py(349 lines) - Utility testsConfiguration Classes:
ImageColumnConfig- Column config with prompt, multi_modal_context, and required_columns (column_configs.py)ImageInferenceParams- Parameters: size, format, quality, steps, cfg_scale, seed, n (models.py)ImageUsageStats- Usage tracking for generated images (usage.py)🔧 Changed
Model System:
facade.py- Added methods:generate_image()- Main entry point with automatic API routing_generate_image_diffusion()- Diffusion model path viaimage_generationAPI_generate_image_chat_completion()- Autoregressive model path viacompletionAPI_track_token_usage_from_image_diffusion()- Usage trackingDataset Building:
column_wise_builder.py- IntegratedMediaStoragefor image artifact managementartifact_storage.py- Addedmedia_storageattributeVisualization:
visualization.py- Enhanceddisplay_sample_record()with image handling:_display_image_if_in_notebook()for IPython/Jupyter rendering (~132 lines added)Configuration & Registry:
ImageCellGeneratorin column generator registryColumnType.IMAGEenumerationPILinlazy_heavy_imports.pyDependencies:
pillowfor image processing🗑️ Removed
🔍 Attention Areas
facade.py:307-470- Image generation implementation with auto-detection logic and dual API supportmedia_storage.py- Storage abstraction with dual modes and file organization (UUID + column subfolders)image.py:62-67- Image generator with multi-modal context injectionvisualization.py:289-418- Image display integration indisplay_sample_record()🚀 Extensibility & Future Work
Extensibility to Other Modalities:
This implementation establishes patterns that extend naturally to other media types:
AudioColumnConfig+MediaStorage.save_audio()Key extensibility points:
ModelFacade- Addgenerate_audio(),generate_video()following same patternMediaStorage- Already designed for multiple media types (see comments about future audio/video support)GenerationTypeenum - Easy to addAUDIO,VIDEO, etc.ImageCellGeneratorpattern for new modalitiesPlanned Future Work:
Improve
display_sample_record()method - Enhanced notebook display with better layouts, grid views, and interactive controls for image-containing recordsMove
artifact_storage.pyto storage module - Consolidate all storage logic (MediaStorage,ArtifactStorage) underengine/storage/for better organization (done in chore: move ArtifactStorage to engine/storage/ module #321)Documentation - Feature currently has no docs except a tutorial notebook. (done in docs: add image generation documentation and image-to-image editing tutorial #319)
✅ Testing
Comprehensive test coverage (800+ lines):
ImageUsageStatsintegrationclose #125
🤖 Generated with AI