@nabinchha (Contributor) commented Feb 10, 2026

📋 Summary

Adds native image generation to DataDesigner, enabling synthetic image generation with both diffusion and autoregressive models. Supports standalone image generation as well as multi-modal context (using previously generated text/images as input), with robust storage management and comprehensive testing.
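
For orientation, here is a minimal usage sketch built from the class and field names introduced in this PR; the import path and exact constructor signatures are assumptions, not the literal API:

```python
# Hypothetical usage sketch -- field names follow the configuration layer described
# in the architecture below; the import path and exact signatures are assumptions.
from data_designer.config import ImageColumnConfig, ImageInferenceParams  # assumed path

product_photo = ImageColumnConfig(
    name="product_photo",
    model_alias="image-model",                 # alias for a diffusion or multi-modal chat model
    prompt="A studio photo of {{ product_name }} on a white background",  # Jinja2 template
    multi_modal_context=["reference_sketch"],  # optional: earlier text/image columns as input
    inference_params=ImageInferenceParams(size="1024x1024", format="png", n=1, seed=42),
)
```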

🏗️ Architecture

┌─────────────────────────────────────────────────────────────────┐
│                    Configuration Layer                          │
│  ┌──────────────────┐         ┌─────────────────────────────┐ │
│  │ ImageColumnConfig│◄────────│ ImageInferenceParams        │ │
│  │  - model_alias   │         │  - size, format, quality    │ │
│  │  - prompt        │         │  - steps, cfg_scale, seed   │ │
│  │  - context cols  │         │  - n (number of images)     │ │
│  └────────┬─────────┘         └─────────────────────────────┘ │
└───────────┼───────────────────────────────────────────────────┘
            │
            ▼
┌─────────────────────────────────────────────────────────────────┐
│                      Engine Layer                               │
│  ┌──────────────────────────────────────────────────────────┐  │
│  │  ImageCellGenerator (Cell-by-Cell)                       │  │
│  │   1. Render Jinja2 prompt template with record data      │  │
│  │   2. Resolve multi-modal context from previous columns   │  │
│  │   3. Call ModelFacade.generate_image()                   │  │
│  │   4. Save via MediaStorage                               │  │
│  └─────────────────┬────────────────────────────────────────┘  │
│                    │                                            │
│                    ▼                                            │
│  ┌──────────────────────────────────────────────────────────┐  │
│  │  ModelFacade.generate_image()                            │  │
│  │   • Auto-detects model type:                             │  │
│  │     - Diffusion models → image_generation API            │  │
│  │     - Autoregressive models → completion API             │  │
│  │   • Returns list[base64_string]                          │  │
│  │   • Tracks usage (images + tokens)                       │  │
│  └─────────────────┬────────────────────────────────────────┘  │
│                    │                                            │
│                    ▼                                            │
│  ┌──────────────────────────────────────────────────────────┐  │
│  │  LiteLLM Router                                          │  │
│  │   • image_generation(prompt) - for diffusion models      │  │
│  │   • completion(messages) - for autoregressive models     │  │
│  └─────────────────┬────────────────────────────────────────┘  │
│                    │                                            │
│                    ▼                                            │
│  ┌──────────────────────────────────────────────────────────┐  │
│  │  MediaStorage                                            │  │
│  │   • DISK mode: Save to disk, return paths                │  │
│  │   • DATAFRAME mode: Return base64 directly               │  │
│  │   • Validates images, creates UUID filenames             │  │
│  │   • Organizes by column name in subfolders               │  │
│  └──────────────────────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────────────────┘
            │
            ▼
┌─────────────────────────────────────────────────────────────────┐
│              Visualization & Display                            │
│  • Enhanced display_sample_record() with image support          │
└─────────────────────────────────────────────────────────────────┘

Key Design Decisions:

  1. Auto-detection of API type: generate_image() automatically routes to the correct LiteLLM API:

    • Diffusion models (DALL-E, Stable Diffusion, Imagen) → image_generation API
    • Autoregressive models (multi-modal chat models) → completion API
  2. Multi-modal context: Images can reference previously generated columns (text or images) via multi_modal_context for image-to-image generation (see the sketch after this list)

  3. Dual storage modes:

    • DISK mode (dataset creation): Saves images to disk, stores relative paths
    • DATAFRAME mode (preview): Stores base64 directly for quick exploration
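
To make the multi-modal context decision (2) concrete, here is an illustrative sketch of how a rendered prompt plus previously generated base64 images could be packed into an OpenAI-style multi-modal chat message for the completion path; this is the standard message shape LiteLLM accepts, not the PR's exact code:

```python
# Illustrative sketch (not the PR's exact implementation): build a multi-modal
# chat message from a rendered prompt plus context columns from the same record.
def build_multimodal_message(prompt: str, record: dict, context_columns: list[str]) -> dict:
    content: list[dict] = [{"type": "text", "text": prompt}]
    for column in context_columns:
        value = record[column]
        if isinstance(value, str) and value.startswith("iVBOR"):  # crude "looks like base64 PNG" check
            content.append({
                "type": "image_url",
                "image_url": {"url": f"data:image/png;base64,{value}"},
            })
        else:
            content.append({"type": "text", "text": f"{column}: {value}"})
    return {"role": "user", "content": content}
```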

🔄 Changes

✨ Added

New Files - Core Implementation:

  • image.py (80 lines) - ImageCellGenerator with Jinja2 prompt rendering and multi-modal context resolution
  • media_storage.py (137 lines) - MediaStorage class with DISK/DATAFRAME storage modes
  • image_helpers.py (238 lines) - Base64/PIL conversion, validation, format detection, diffusion model detection
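
The new media_storage.py is the piece that decides between writing to disk and keeping base64 inline; a rough sketch of that behavior (class shape and method names here are simplified, not the module's exact API):

```python
# Rough sketch of the dual-mode storage behavior described above; the real
# MediaStorage class adds validation, cleanup, and a richer interface.
import base64
import enum
import uuid
from pathlib import Path


class MediaStorageMode(enum.Enum):
    DISK = "disk"
    DATAFRAME = "dataframe"


class MediaStorageSketch:
    def __init__(self, mode: MediaStorageMode, root_dir: Path | None = None):
        self.mode = mode
        self.root_dir = root_dir

    def save_image(self, b64_data: str, column_name: str, fmt: str = "png") -> str:
        if self.mode is MediaStorageMode.DATAFRAME:
            return b64_data  # preview mode: keep base64 inline in the dataframe
        # DISK mode: one subfolder per column, UUID filenames, relative path returned.
        column_dir = self.root_dir / column_name
        column_dir.mkdir(parents=True, exist_ok=True)
        file_path = column_dir / f"{uuid.uuid4().hex}.{fmt}"
        file_path.write_bytes(base64.b64decode(b64_data))
        return str(file_path.relative_to(self.root_dir))
```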

New Files - Documentation & Tests:

Configuration Classes:

  • ImageColumnConfig - Column config with prompt, multi_modal_context, and required_columns (column_configs.py)
  • ImageInferenceParams - Parameters: size, format, quality, steps, cfg_scale, seed, n (models.py)
  • ImageUsageStats - Usage tracking for generated images (usage.py)
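
A rough pydantic-style sketch of how these classes might be shaped, based only on the fields listed above; the actual base classes, defaults, and validators in column_configs.py, models.py, and usage.py will differ:

```python
# Illustrative field layout only -- the real classes likely inherit from
# DataDesigner's own config/usage base classes and carry validation not shown here.
from pydantic import BaseModel, Field


class ImageInferenceParams(BaseModel):
    size: str = "1024x1024"
    format: str = "png"
    quality: str | None = None      # e.g. "standard" / "hd" for DALL-E-style models
    steps: int | None = None        # diffusion sampling steps
    cfg_scale: float | None = None  # classifier-free guidance scale
    seed: int | None = None
    n: int = 1                      # number of images per record


class ImageColumnConfig(BaseModel):
    name: str
    model_alias: str
    prompt: str                                                    # Jinja2 template rendered per record
    multi_modal_context: list[str] = Field(default_factory=list)   # earlier text/image columns used as input
    required_columns: list[str] = Field(default_factory=list)      # columns the prompt template references


class ImageUsageStats(BaseModel):
    images_generated: int = 0
    prompt_tokens: int = 0
    completion_tokens: int = 0
```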

🔧 Changed

Model System:

  • facade.py - Added methods:
    • generate_image() - Main entry point with automatic API routing
    • _generate_image_diffusion() - Diffusion model path via image_generation API
    • _generate_image_chat_completion() - Autoregressive model path via completion API
    • _track_token_usage_from_image_diffusion() - Usage tracking
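
A condensed sketch of the routing these methods implement, using LiteLLM's image_generation and completion entry points; the real facade.py routes through the LiteLLM Router, tracks usage, and handles multiple response shapes, and the detection helper import below is an assumption:

```python
# Simplified routing sketch -- not the facade's actual code.
import litellm

from image_helpers import is_diffusion_model  # assumed import of this PR's detection helper


def generate_image(model: str, prompt: str, messages: list[dict] | None = None, n: int = 1) -> list[str]:
    """Return base64-encoded images via whichever LiteLLM API fits the model."""
    if is_diffusion_model(model):
        # Diffusion models (DALL-E, Stable Diffusion, Imagen) go through the image API.
        response = litellm.image_generation(model=model, prompt=prompt, n=n, response_format="b64_json")
        return [img.b64_json for img in response.data]
    # Autoregressive multi-modal models return images inside a chat completion.
    response = litellm.completion(
        model=model,
        messages=messages or [{"role": "user", "content": prompt}],
    )
    # Response parsing is model-dependent; here we assume the image arrives as a
    # base64 string in the message content (the PR handles dict and string shapes).
    return [response.choices[0].message.content]
```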

Dataset Building:

Visualization:

  • visualization.py - Enhanced display_sample_record() with image handling:
    • Added _display_image_if_in_notebook() for IPython/Jupyter rendering (~132 lines added)
    • Image table in record display showing base64 previews
    • Automatic image rendering at bottom of record display in notebooks
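
The notebook rendering boils down to decoding the base64 payload and handing it to IPython's display machinery when a kernel frontend is detected; a simplified sketch (the actual _display_image_if_in_notebook adds layout and more robust detection):

```python
# Minimal sketch of notebook image rendering; the PR's version is richer.
import base64


def display_image_if_in_notebook(b64_data: str) -> None:
    try:
        from IPython import get_ipython
        from IPython.display import Image, display
    except ImportError:
        return  # not running under IPython at all
    shell = get_ipython()
    if shell is None or "ZMQInteractiveShell" not in type(shell).__name__:
        return  # plain terminal IPython or no kernel: skip inline rendering
    display(Image(data=base64.b64decode(b64_data)))
```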

Configuration & Registry:

  • Registered ImageCellGenerator in column generator registry
  • Added ColumnType.IMAGE enumeration
  • Added lazy import for PIL in lazy_heavy_imports.py
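
For reference, one common way to defer a heavy import like PIL until first use is a module-level __getattr__ (PEP 562); whether lazy_heavy_imports.py uses this exact mechanism is an assumption:

```python
# Sketch of a PEP 562 lazy-import pattern; the actual lazy_heavy_imports.py may differ.
import importlib
from typing import Any

_LAZY_MODULES = {"PIL": "PIL", "Image": "PIL.Image"}


def __getattr__(name: str) -> Any:
    if name in _LAZY_MODULES:
        module = importlib.import_module(_LAZY_MODULES[name])
        globals()[name] = module  # cache so the import runs only once
        return module
    raise AttributeError(f"module {__name__!r} has no attribute {name!r}")
```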

Dependencies:

  • Added pillow for image processing

🗑️ Removed

  • Health checks workflow (unrelated cleanup)
  • Seed dataset documentation (reorganization)

🔍 Attention Areas

⚠️ Reviewers: Please pay special attention to:

  1. facade.py:307-470 - Image generation implementation with auto-detection logic and dual API support

  2. media_storage.py - Storage abstraction with dual modes and file organization (UUID + column subfolders)

  3. image.py:62-67 - Image generator with multi-modal context injection

  4. visualization.py:289-418 - Image display integration in display_sample_record()

🚀 Extensibility & Future Work

Extensibility to Other Modalities:

This implementation establishes patterns that extend naturally to other media types:

  • Audio generation: Similar AudioColumnConfig + MediaStorage.save_audio()
  • Video generation: Can reuse image storage patterns with video-specific format handling
  • 3D assets: Storage layer is format-agnostic, adaptable to GLB/USD/FBX

Key extensibility points:

  • ModelFacade - Add generate_audio(), generate_video() following same pattern
  • MediaStorage - Already designed for multiple media types (see comments about future audio/video support)
  • GenerationType enum - Easy to add AUDIO, VIDEO, etc.
  • Column generators - Follow ImageCellGenerator pattern for new modalities
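
As a purely hypothetical illustration of these extension points (none of the code below exists in this PR), an audio column could mirror the image configuration one-to-one:

```python
# Hypothetical extension sketch -- nothing here exists in this PR; it only mirrors
# the image-generation pattern described above.
from pydantic import BaseModel


class AudioInferenceParams(BaseModel):
    voice: str = "default"
    sample_rate: int = 24_000
    format: str = "wav"


class AudioColumnConfig(BaseModel):
    name: str
    model_alias: str
    prompt: str                          # Jinja2 template, rendered per record
    inference_params: AudioInferenceParams = AudioInferenceParams()
```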

Planned Future Work:

  1. Improve display_sample_record() method - Enhanced notebook display with better layouts, grid views, and interactive controls for image-containing records

  2. Move artifact_storage.py to storage module - Consolidate all storage logic (MediaStorage, ArtifactStorage) under engine/storage/ for better organization (done in #321, "chore: move ArtifactStorage to engine/storage/ module")

  3. Documentation - The feature currently has no docs beyond a tutorial notebook (done in #319, "docs: add image generation documentation and image-to-image editing tutorial")

✅ Testing

Comprehensive test coverage (800+ lines):

  • Image generation: 218 lines - single/multiple images, context resolution, error handling
  • Media storage: 228 lines - DISK/DATAFRAME modes, validation, cleanup
  • Image helpers: 349 lines - base64/PIL conversion, format detection, validation
  • Model facade: Extended tests for image generation paths (diffusion + chat completion)
  • Usage tracking: Tests for ImageUsageStats integration
  • Integration: Full end-to-end example in tutorial notebook
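
For a sense of the test shape, here is an illustrative parametrized storage test; MediaStorage's constructor, method names, and import path below are assumptions based on the description above, not the actual suite:

```python
# Illustrative test shape only -- the MediaStorage API shown here is assumed.
import base64
import io

import pytest
from PIL import Image

from data_designer.engine.storage.media_storage import MediaStorage  # assumed path


def _tiny_png_base64() -> str:
    buffer = io.BytesIO()
    Image.new("RGB", (4, 4), color="red").save(buffer, format="PNG")
    return base64.b64encode(buffer.getvalue()).decode("utf-8")


@pytest.mark.parametrize("fmt", ["png", "jpg"])
def test_disk_mode_writes_file_per_column(tmp_path, fmt):
    storage = MediaStorage(mode="disk", root_dir=tmp_path)            # assumed API
    result = storage.save_image(_tiny_png_base64(), column_name="photo", fmt=fmt)
    assert (tmp_path / "photo").is_dir()                              # one subfolder per column
    assert result.endswith(f".{fmt}")                                 # relative path returned


def test_dataframe_mode_returns_base64_unchanged():
    storage = MediaStorage(mode="dataframe")                          # assumed API
    payload = _tiny_png_base64()
    assert storage.save_image(payload, column_name="photo") == payload
```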

close #125

🤖 Generated with AI

@andreatgretel (Contributor) left a comment:

EDIT: added a comment here that was supposed to go into a file, sorry!

@andreatgretel (Contributor) commented:

Curious regarding default models - should we add {nvidia|openai|openrouter}-image aliases?

@andreatgretel (Contributor) previously approved these changes Feb 11, 2026 and left a comment:

Left a few more nits but overall everything looks good! Tried it out locally, tutorial runs fine. Excited about generating images on Data Designer 🖼️

@greptile-apps (bot) left a comment:

45 files reviewed, 1 comment

@nabinchha (Contributor, Author) commented Feb 11, 2026:

> Curious regarding default models - should we add {nvidia|openai|openrouter}-image aliases?

Yes, maybe, but perhaps in a different PR! I don't see many options on build.nvidia.com that work with the standard nvidia endpoint...
