diff --git a/docs/code_reference/models.md b/docs/code_reference/models.md index 882e03f3..98023d51 100644 --- a/docs/code_reference/models.md +++ b/docs/code_reference/models.md @@ -1,11 +1,12 @@ # Models -The `models` module defines configuration objects for model-based generation. [ModelProvider](#data_designer.config.models.ModelProvider), specifies connection and authentication details for custom providers. [ModelConfig](#data_designer.config.models.ModelConfig) encapsulates model details including the model alias, identifier, and inference parameters. [Inference Parameters](../concepts/models/inference-parameters.md) controls model behavior through settings like `temperature`, `top_p`, and `max_tokens`, with support for both fixed values and distribution-based sampling. The module includes [ImageContext](#data_designer.config.models.ImageContext) for providing image inputs to multimodal models. +The `models` module defines configuration objects for model-based generation. [ModelProvider](#data_designer.config.models.ModelProvider) specifies connection and authentication details for custom providers. [ModelConfig](#data_designer.config.models.ModelConfig) encapsulates model details including the model alias, identifier, and inference parameters. [Inference Parameters](../concepts/models/inference-parameters.md) controls model behavior through settings like `temperature`, `top_p`, and `max_tokens`, with support for both fixed values and distribution-based sampling. The module includes [ImageContext](#data_designer.config.models.ImageContext) for providing image inputs to multimodal models, and [ImageInferenceParams](#data_designer.config.models.ImageInferenceParams) for configuring image generation models. For more information on how they are used, see below: - **[Model Providers](../concepts/models/model-providers.md)** - **[Model Configs](../concepts/models/model-configs.md)** -- **[Image Context](/notebooks/4-providing-images-as-context/)** +- **[Image Context](../notebooks/4-providing-images-as-context.ipynb)** +- **[Generating Images](../notebooks/5-generating-images.ipynb)** ::: data_designer.config.models diff --git a/docs/colab_notebooks/1-the-basics.ipynb b/docs/colab_notebooks/1-the-basics.ipynb index f50209f7..9a2456e6 100644 --- a/docs/colab_notebooks/1-the-basics.ipynb +++ b/docs/colab_notebooks/1-the-basics.ipynb @@ -2,7 +2,7 @@ "cells": [ { "cell_type": "markdown", - "id": "96178d08", + "id": "00c21026", "metadata": {}, "source": [ "# 🎨 Data Designer Tutorial: The Basics\n", @@ -14,7 +14,7 @@ }, { "cell_type": "markdown", - "id": "1d02a1d6", + "id": "ece3d9a9", "metadata": {}, "source": [ "### 📦 Import Data Designer\n", @@ -26,7 +26,7 @@ }, { "cell_type": "markdown", - "id": "2292d817", + "id": "38d1b88f", "metadata": {}, "source": [ "### ⚡ Colab Setup\n", @@ -37,7 +37,7 @@ { "cell_type": "code", "execution_count": null, - "id": "8af621fc", + "id": "53321634", "metadata": {}, "outputs": [], "source": [ @@ -48,7 +48,7 @@ { "cell_type": "code", "execution_count": null, - "id": "70e6a11c", + "id": "5e8544d6", "metadata": {}, "outputs": [], "source": [ @@ -66,7 +66,7 @@ { "cell_type": "code", "execution_count": null, - "id": "41031828", + "id": "4a9e48bc", "metadata": {}, "outputs": [], "source": [ @@ -76,7 +76,7 @@ }, { "cell_type": "markdown", - "id": "0b480b10", + "id": "21b12719", "metadata": {}, "source": [ "### ⚙️ Initialize the Data Designer interface\n", @@ -89,7 +89,7 @@ { "cell_type": "code", "execution_count": null, - "id": "d434a8e2", + "id": "7d689c22", 
"metadata": {}, "outputs": [], "source": [ @@ -98,7 +98,7 @@ }, { "cell_type": "markdown", - "id": "f88f6792", + "id": "3db3eab3", "metadata": {}, "source": [ "### 🎛️ Define model configurations\n", @@ -115,7 +115,7 @@ { "cell_type": "code", "execution_count": null, - "id": "4261574c", + "id": "f4447bbe", "metadata": {}, "outputs": [], "source": [ @@ -145,7 +145,7 @@ }, { "cell_type": "markdown", - "id": "bbbc3d58", + "id": "b5af9991", "metadata": {}, "source": [ "### 🏗️ Initialize the Data Designer Config Builder\n", @@ -160,7 +160,7 @@ { "cell_type": "code", "execution_count": null, - "id": "92c0cf35", + "id": "40bdb697", "metadata": {}, "outputs": [], "source": [ @@ -169,7 +169,7 @@ }, { "cell_type": "markdown", - "id": "44246c7d", + "id": "4dad8aa0", "metadata": {}, "source": [ "## 🎲 Getting started with sampler columns\n", @@ -186,7 +186,7 @@ { "cell_type": "code", "execution_count": null, - "id": "07d20f3f", + "id": "8eecf6e8", "metadata": {}, "outputs": [], "source": [ @@ -195,7 +195,7 @@ }, { "cell_type": "markdown", - "id": "9d3c87b0", + "id": "e4d6a23a", "metadata": {}, "source": [ "Let's start designing our product review dataset by adding product category and subcategory columns.\n" @@ -204,7 +204,7 @@ { "cell_type": "code", "execution_count": null, - "id": "c646b021", + "id": "c3ce7276", "metadata": {}, "outputs": [], "source": [ @@ -285,7 +285,7 @@ }, { "cell_type": "markdown", - "id": "ff18b032", + "id": "a8aafd2c", "metadata": {}, "source": [ "Next, let's add samplers to generate data related to the customer and their review.\n" @@ -294,7 +294,7 @@ { "cell_type": "code", "execution_count": null, - "id": "78846d99", + "id": "3bdb3991", "metadata": {}, "outputs": [], "source": [ @@ -331,7 +331,7 @@ }, { "cell_type": "markdown", - "id": "97059bfc", + "id": "743bb645", "metadata": {}, "source": [ "## 🦜 LLM-generated columns\n", @@ -346,7 +346,7 @@ { "cell_type": "code", "execution_count": null, - "id": "98c66eff", + "id": "da2b9677", "metadata": {}, "outputs": [], "source": [ @@ -382,7 +382,7 @@ }, { "cell_type": "markdown", - "id": "ff2d52b9", + "id": "febed040", "metadata": {}, "source": [ "### 🔁 Iteration is key – preview the dataset!\n", @@ -399,7 +399,7 @@ { "cell_type": "code", "execution_count": null, - "id": "6e622478", + "id": "af574e1c", "metadata": {}, "outputs": [], "source": [ @@ -409,7 +409,7 @@ { "cell_type": "code", "execution_count": null, - "id": "1addc7d8", + "id": "c5cddea8", "metadata": {}, "outputs": [], "source": [ @@ -420,7 +420,7 @@ { "cell_type": "code", "execution_count": null, - "id": "7af4b9c3", + "id": "523da02f", "metadata": {}, "outputs": [], "source": [ @@ -430,7 +430,7 @@ }, { "cell_type": "markdown", - "id": "91d0ee89", + "id": "b58b6a23", "metadata": {}, "source": [ "### 📊 Analyze the generated data\n", @@ -443,7 +443,7 @@ { "cell_type": "code", "execution_count": null, - "id": "e1e3aed0", + "id": "26b9a54a", "metadata": {}, "outputs": [], "source": [ @@ -453,7 +453,7 @@ }, { "cell_type": "markdown", - "id": "6eaa402e", + "id": "ae2f9efe", "metadata": {}, "source": [ "### 🆙 Scale up!\n", @@ -466,7 +466,7 @@ { "cell_type": "code", "execution_count": null, - "id": "f6b148d4", + "id": "d8341c24", "metadata": {}, "outputs": [], "source": [ @@ -476,7 +476,7 @@ { "cell_type": "code", "execution_count": null, - "id": "f4e62e5b", + "id": "746166bb", "metadata": {}, "outputs": [], "source": [ @@ -489,7 +489,7 @@ { "cell_type": "code", "execution_count": null, - "id": "7d426ab0", + "id": "4c67992b", "metadata": {}, "outputs": [], "source": [ @@ -501,7 
+501,7 @@ }, { "cell_type": "markdown", - "id": "449d003c", + "id": "65da8b83", "metadata": {}, "source": [ "## ⏭️ Next Steps\n", diff --git a/docs/colab_notebooks/2-structured-outputs-and-jinja-expressions.ipynb b/docs/colab_notebooks/2-structured-outputs-and-jinja-expressions.ipynb index a6e04680..75e2d72d 100644 --- a/docs/colab_notebooks/2-structured-outputs-and-jinja-expressions.ipynb +++ b/docs/colab_notebooks/2-structured-outputs-and-jinja-expressions.ipynb @@ -2,7 +2,7 @@ "cells": [ { "cell_type": "markdown", - "id": "ba22504d", + "id": "3d5ec9c5", "metadata": {}, "source": [ "# 🎨 Data Designer Tutorial: Structured Outputs and Jinja Expressions\n", @@ -16,7 +16,7 @@ }, { "cell_type": "markdown", - "id": "c176fe63", + "id": "3813ccb2", "metadata": {}, "source": [ "### 📦 Import Data Designer\n", @@ -28,7 +28,7 @@ }, { "cell_type": "markdown", - "id": "32c80f72", + "id": "86173a51", "metadata": {}, "source": [ "### ⚡ Colab Setup\n", @@ -39,7 +39,7 @@ { "cell_type": "code", "execution_count": null, - "id": "4ab45e3a", + "id": "6ee5a0e0", "metadata": {}, "outputs": [], "source": [ @@ -50,7 +50,7 @@ { "cell_type": "code", "execution_count": null, - "id": "2ae70d67", + "id": "87742e65", "metadata": {}, "outputs": [], "source": [ @@ -68,7 +68,7 @@ { "cell_type": "code", "execution_count": null, - "id": "2cdc070b", + "id": "450a862c", "metadata": {}, "outputs": [], "source": [ @@ -78,7 +78,7 @@ }, { "cell_type": "markdown", - "id": "a04261b9", + "id": "8f06cd05", "metadata": {}, "source": [ "### ⚙️ Initialize the Data Designer interface\n", @@ -91,7 +91,7 @@ { "cell_type": "code", "execution_count": null, - "id": "c8bef18a", + "id": "9a880c00", "metadata": {}, "outputs": [], "source": [ @@ -100,7 +100,7 @@ }, { "cell_type": "markdown", - "id": "ed555636", + "id": "d862ae5c", "metadata": {}, "source": [ "### 🎛️ Define model configurations\n", @@ -117,7 +117,7 @@ { "cell_type": "code", "execution_count": null, - "id": "47208094", + "id": "84e6f76a", "metadata": {}, "outputs": [], "source": [ @@ -147,7 +147,7 @@ }, { "cell_type": "markdown", - "id": "36c200d9", + "id": "07b038aa", "metadata": {}, "source": [ "### 🏗️ Initialize the Data Designer Config Builder\n", @@ -162,7 +162,7 @@ { "cell_type": "code", "execution_count": null, - "id": "57c0d82f", + "id": "b7e42df4", "metadata": {}, "outputs": [], "source": [ @@ -171,7 +171,7 @@ }, { "cell_type": "markdown", - "id": "01ff63ca", + "id": "600127e0", "metadata": {}, "source": [ "### 🧑‍🎨 Designing our data\n", @@ -198,7 +198,7 @@ { "cell_type": "code", "execution_count": null, - "id": "4fb0f1ca", + "id": "ecebc077", "metadata": {}, "outputs": [], "source": [ @@ -226,7 +226,7 @@ }, { "cell_type": "markdown", - "id": "8f35bd87", + "id": "6f24c511", "metadata": {}, "source": [ "Next, let's design our product review dataset using a few more tricks compared to the previous notebook.\n" @@ -235,7 +235,7 @@ { "cell_type": "code", "execution_count": null, - "id": "43341f16", + "id": "6cd4a4a5", "metadata": {}, "outputs": [], "source": [ @@ -344,7 +344,7 @@ }, { "cell_type": "markdown", - "id": "34c3e08b", + "id": "3fa250c7", "metadata": {}, "source": [ "Next, we will use more advanced Jinja expressions to create new columns.\n", @@ -361,7 +361,7 @@ { "cell_type": "code", "execution_count": null, - "id": "c168c089", + "id": "77895d82", "metadata": {}, "outputs": [], "source": [ @@ -414,7 +414,7 @@ }, { "cell_type": "markdown", - "id": "7e6521a2", + "id": "236f32c0", "metadata": {}, "source": [ "### 🔁 Iteration is key – preview the dataset!\n", @@ -431,7 
+431,7 @@ { "cell_type": "code", "execution_count": null, - "id": "03510f78", + "id": "719d3d7f", "metadata": {}, "outputs": [], "source": [ @@ -441,7 +441,7 @@ { "cell_type": "code", "execution_count": null, - "id": "ad599c43", + "id": "d25b2a23", "metadata": {}, "outputs": [], "source": [ @@ -452,7 +452,7 @@ { "cell_type": "code", "execution_count": null, - "id": "dbd3e17c", + "id": "8cfff7c2", "metadata": {}, "outputs": [], "source": [ @@ -462,7 +462,7 @@ }, { "cell_type": "markdown", - "id": "4db52c26", + "id": "acfc4317", "metadata": {}, "source": [ "### 📊 Analyze the generated data\n", @@ -475,7 +475,7 @@ { "cell_type": "code", "execution_count": null, - "id": "f1007ac4", + "id": "02a90c0a", "metadata": {}, "outputs": [], "source": [ @@ -485,7 +485,7 @@ }, { "cell_type": "markdown", - "id": "dcd68de4", + "id": "60bac583", "metadata": {}, "source": [ "### 🆙 Scale up!\n", @@ -498,7 +498,7 @@ { "cell_type": "code", "execution_count": null, - "id": "27b6bfe8", + "id": "fd92ca3c", "metadata": {}, "outputs": [], "source": [ @@ -508,7 +508,7 @@ { "cell_type": "code", "execution_count": null, - "id": "d4e9a395", + "id": "ca5eded6", "metadata": {}, "outputs": [], "source": [ @@ -521,7 +521,7 @@ { "cell_type": "code", "execution_count": null, - "id": "946b3aa8", + "id": "29f4b884", "metadata": {}, "outputs": [], "source": [ @@ -533,7 +533,7 @@ }, { "cell_type": "markdown", - "id": "f50d996e", + "id": "18914be2", "metadata": {}, "source": [ "## ⏭️ Next Steps\n", diff --git a/docs/colab_notebooks/3-seeding-with-a-dataset.ipynb b/docs/colab_notebooks/3-seeding-with-a-dataset.ipynb index 639e88df..91c13986 100644 --- a/docs/colab_notebooks/3-seeding-with-a-dataset.ipynb +++ b/docs/colab_notebooks/3-seeding-with-a-dataset.ipynb @@ -2,7 +2,7 @@ "cells": [ { "cell_type": "markdown", - "id": "25501772", + "id": "30b0205f", "metadata": {}, "source": [ "# 🎨 Data Designer Tutorial: Seeding Synthetic Data Generation with an External Dataset\n", @@ -16,7 +16,7 @@ }, { "cell_type": "markdown", - "id": "67ffc49e", + "id": "fd7184e7", "metadata": {}, "source": [ "### 📦 Import Data Designer\n", @@ -28,7 +28,7 @@ }, { "cell_type": "markdown", - "id": "54a42504", + "id": "f229a5f3", "metadata": {}, "source": [ "### ⚡ Colab Setup\n", @@ -39,7 +39,7 @@ { "cell_type": "code", "execution_count": null, - "id": "05b45354", + "id": "3cfdeadf", "metadata": {}, "outputs": [], "source": [ @@ -50,7 +50,7 @@ { "cell_type": "code", "execution_count": null, - "id": "039360fe", + "id": "8ad3bee9", "metadata": {}, "outputs": [], "source": [ @@ -68,7 +68,7 @@ { "cell_type": "code", "execution_count": null, - "id": "028d5e8a", + "id": "b7a8d675", "metadata": {}, "outputs": [], "source": [ @@ -78,7 +78,7 @@ }, { "cell_type": "markdown", - "id": "15a1df61", + "id": "e52b2806", "metadata": {}, "source": [ "### ⚙️ Initialize the Data Designer interface\n", @@ -91,7 +91,7 @@ { "cell_type": "code", "execution_count": null, - "id": "a87b6ff6", + "id": "21ad21d1", "metadata": {}, "outputs": [], "source": [ @@ -100,7 +100,7 @@ }, { "cell_type": "markdown", - "id": "b9166cfd", + "id": "e313e1c7", "metadata": {}, "source": [ "### 🎛️ Define model configurations\n", @@ -117,7 +117,7 @@ { "cell_type": "code", "execution_count": null, - "id": "4961d3b0", + "id": "5927e232", "metadata": {}, "outputs": [], "source": [ @@ -147,7 +147,7 @@ }, { "cell_type": "markdown", - "id": "b1d8588a", + "id": "3fe284f0", "metadata": {}, "source": [ "### 🏗️ Initialize the Data Designer Config Builder\n", @@ -162,7 +162,7 @@ { "cell_type": "code", 
"execution_count": null, - "id": "cf42a4dd", + "id": "0475564b", "metadata": {}, "outputs": [], "source": [ @@ -171,7 +171,7 @@ }, { "cell_type": "markdown", - "id": "8d6b26aa", + "id": "588837c2", "metadata": {}, "source": [ "## 🏥 Prepare a seed dataset\n", @@ -196,7 +196,7 @@ { "cell_type": "code", "execution_count": null, - "id": "fc90401d", + "id": "e8dfb164", "metadata": {}, "outputs": [], "source": [ @@ -214,7 +214,7 @@ }, { "cell_type": "markdown", - "id": "6f5ee960", + "id": "ca5f46ea", "metadata": {}, "source": [ "## 🎨 Designing our synthetic patient notes dataset\n", @@ -227,7 +227,7 @@ { "cell_type": "code", "execution_count": null, - "id": "e9db2ff0", + "id": "830810e8", "metadata": {}, "outputs": [], "source": [ @@ -308,7 +308,7 @@ }, { "cell_type": "markdown", - "id": "00efc894", + "id": "cbb1e2ad", "metadata": {}, "source": [ "### 🔁 Iteration is key – preview the dataset!\n", @@ -325,7 +325,7 @@ { "cell_type": "code", "execution_count": null, - "id": "3e3d824e", + "id": "f9c39104", "metadata": {}, "outputs": [], "source": [ @@ -335,7 +335,7 @@ { "cell_type": "code", "execution_count": null, - "id": "27785af7", + "id": "5750e220", "metadata": {}, "outputs": [], "source": [ @@ -346,7 +346,7 @@ { "cell_type": "code", "execution_count": null, - "id": "430998d1", + "id": "b3573753", "metadata": {}, "outputs": [], "source": [ @@ -356,7 +356,7 @@ }, { "cell_type": "markdown", - "id": "dda6458b", + "id": "14937896", "metadata": {}, "source": [ "### 📊 Analyze the generated data\n", @@ -369,7 +369,7 @@ { "cell_type": "code", "execution_count": null, - "id": "f45bc088", + "id": "cd3adb37", "metadata": {}, "outputs": [], "source": [ @@ -379,7 +379,7 @@ }, { "cell_type": "markdown", - "id": "1e913fd8", + "id": "aa4fee79", "metadata": {}, "source": [ "### 🆙 Scale up!\n", @@ -392,7 +392,7 @@ { "cell_type": "code", "execution_count": null, - "id": "30b8b7f7", + "id": "29024ffc", "metadata": {}, "outputs": [], "source": [ @@ -402,7 +402,7 @@ { "cell_type": "code", "execution_count": null, - "id": "b7ff96d1", + "id": "73da6149", "metadata": {}, "outputs": [], "source": [ @@ -415,7 +415,7 @@ { "cell_type": "code", "execution_count": null, - "id": "dbfef8a8", + "id": "bc2f927d", "metadata": {}, "outputs": [], "source": [ @@ -427,7 +427,7 @@ }, { "cell_type": "markdown", - "id": "5db3f38d", + "id": "29990c5d", "metadata": {}, "source": [ "## ⏭️ Next Steps\n", diff --git a/docs/colab_notebooks/4-providing-images-as-context.ipynb b/docs/colab_notebooks/4-providing-images-as-context.ipynb index 9797695e..cc10ec63 100644 --- a/docs/colab_notebooks/4-providing-images-as-context.ipynb +++ b/docs/colab_notebooks/4-providing-images-as-context.ipynb @@ -2,7 +2,7 @@ "cells": [ { "cell_type": "markdown", - "id": "19e57933", + "id": "911877e5", "metadata": {}, "source": [ "# 🎨 Data Designer Tutorial: Providing Images as Context for Vision-Based Data Generation" @@ -10,7 +10,7 @@ }, { "cell_type": "markdown", - "id": "25e3cc64", + "id": "c6756afd", "metadata": {}, "source": [ "#### 📚 What you'll learn\n", @@ -25,7 +25,7 @@ }, { "cell_type": "markdown", - "id": "4aae5c82", + "id": "d73b25ce", "metadata": {}, "source": [ "### 📦 Import Data Designer\n", @@ -37,7 +37,7 @@ }, { "cell_type": "markdown", - "id": "24dfae6c", + "id": "f05ece3e", "metadata": {}, "source": [ "### ⚡ Colab Setup\n", @@ -48,7 +48,7 @@ { "cell_type": "code", "execution_count": null, - "id": "619b1aae", + "id": "d84f4489", "metadata": {}, "outputs": [], "source": [ @@ -59,7 +59,7 @@ { "cell_type": "code", "execution_count": null, - "id": 
"0d49a542", + "id": "5e4cc2d4", "metadata": {}, "outputs": [], "source": [ @@ -77,7 +77,7 @@ { "cell_type": "code", "execution_count": null, - "id": "1b28f160", + "id": "4e4e8d45", "metadata": {}, "outputs": [], "source": [ @@ -100,7 +100,7 @@ }, { "cell_type": "markdown", - "id": "63dc34de", + "id": "0cdd2a8a", "metadata": {}, "source": [ "### ⚙️ Initialize the Data Designer interface\n", @@ -113,7 +113,7 @@ { "cell_type": "code", "execution_count": null, - "id": "672155c8", + "id": "4bb0ca16", "metadata": {}, "outputs": [], "source": [ @@ -122,7 +122,7 @@ }, { "cell_type": "markdown", - "id": "4b32c25e", + "id": "bd17820d", "metadata": {}, "source": [ "### 🎛️ Define model configurations\n", @@ -139,7 +139,7 @@ { "cell_type": "code", "execution_count": null, - "id": "72971915", + "id": "301f2bd2", "metadata": {}, "outputs": [], "source": [ @@ -162,7 +162,7 @@ }, { "cell_type": "markdown", - "id": "115ad20f", + "id": "ad04f82a", "metadata": {}, "source": [ "### 🏗️ Initialize the Data Designer Config Builder\n", @@ -177,7 +177,7 @@ { "cell_type": "code", "execution_count": null, - "id": "11e844d2", + "id": "ac8e2885", "metadata": {}, "outputs": [], "source": [ @@ -186,7 +186,7 @@ }, { "cell_type": "markdown", - "id": "77862fce", + "id": "7b8aafc0", "metadata": {}, "source": [ "### 🌱 Seed Dataset Creation\n", @@ -203,7 +203,7 @@ { "cell_type": "code", "execution_count": null, - "id": "e415a502", + "id": "432edd4a", "metadata": {}, "outputs": [], "source": [ @@ -218,7 +218,7 @@ { "cell_type": "code", "execution_count": null, - "id": "335f2611", + "id": "c4f94627", "metadata": {}, "outputs": [], "source": [ @@ -266,7 +266,7 @@ { "cell_type": "code", "execution_count": null, - "id": "f055e88d", + "id": "9b697311", "metadata": {}, "outputs": [], "source": [ @@ -284,7 +284,7 @@ { "cell_type": "code", "execution_count": null, - "id": "47a1c586", + "id": "bcfc97e8", "metadata": {}, "outputs": [], "source": [ @@ -294,7 +294,7 @@ { "cell_type": "code", "execution_count": null, - "id": "3a77fc52", + "id": "0a3bdc13", "metadata": {}, "outputs": [], "source": [ @@ -306,7 +306,7 @@ { "cell_type": "code", "execution_count": null, - "id": "c0941cc7", + "id": "f9665355", "metadata": {}, "outputs": [], "source": [ @@ -335,7 +335,7 @@ }, { "cell_type": "markdown", - "id": "578e77dc", + "id": "6d900aaa", "metadata": {}, "source": [ "### 🔁 Iteration is key – preview the dataset!\n", @@ -352,7 +352,7 @@ { "cell_type": "code", "execution_count": null, - "id": "9f0c11ce", + "id": "51a80346", "metadata": {}, "outputs": [], "source": [ @@ -362,7 +362,7 @@ { "cell_type": "code", "execution_count": null, - "id": "b10412c1", + "id": "ea217964", "metadata": {}, "outputs": [], "source": [ @@ -373,7 +373,7 @@ { "cell_type": "code", "execution_count": null, - "id": "766ee2d7", + "id": "be0e4ef0", "metadata": {}, "outputs": [], "source": [ @@ -383,7 +383,7 @@ }, { "cell_type": "markdown", - "id": "6370bfa5", + "id": "0c75f531", "metadata": {}, "source": [ "### 📊 Analyze the generated data\n", @@ -396,7 +396,7 @@ { "cell_type": "code", "execution_count": null, - "id": "d57ded0e", + "id": "bcbf86d1", "metadata": {}, "outputs": [], "source": [ @@ -406,7 +406,7 @@ }, { "cell_type": "markdown", - "id": "5afd8e8c", + "id": "0ab35029", "metadata": {}, "source": [ "### 🔎 Visual Inspection\n", @@ -417,7 +417,7 @@ { "cell_type": "code", "execution_count": null, - "id": "aa4bfcc3", + "id": "03314ae9", "metadata": { "lines_to_next_cell": 2 }, @@ -441,7 +441,7 @@ }, { "cell_type": "markdown", - "id": "4eeaada6", + "id": "e76a3e3b", 
"metadata": {}, "source": [ "### 🆙 Scale up!\n", @@ -454,7 +454,7 @@ { "cell_type": "code", "execution_count": null, - "id": "0ee5b1b9", + "id": "d16566c0", "metadata": {}, "outputs": [], "source": [ @@ -464,7 +464,7 @@ { "cell_type": "code", "execution_count": null, - "id": "e5e8b241", + "id": "8e7796ba", "metadata": {}, "outputs": [], "source": [ @@ -477,7 +477,7 @@ { "cell_type": "code", "execution_count": null, - "id": "23ebb3ca", + "id": "14bc1042", "metadata": {}, "outputs": [], "source": [ @@ -489,7 +489,7 @@ }, { "cell_type": "markdown", - "id": "14a78533", + "id": "1e676330", "metadata": {}, "source": [ "## ⏭️ Next Steps\n", diff --git a/docs/colab_notebooks/5-generating-images.ipynb b/docs/colab_notebooks/5-generating-images.ipynb index c8092938..ea9e0b8f 100644 --- a/docs/colab_notebooks/5-generating-images.ipynb +++ b/docs/colab_notebooks/5-generating-images.ipynb @@ -2,7 +2,7 @@ "cells": [ { "cell_type": "markdown", - "id": "735e6197", + "id": "3b8abde3", "metadata": {}, "source": [ "# 🎨 Data Designer Tutorial: Generating Images\n", @@ -15,14 +15,16 @@ "- 📝 **Jinja2 prompts**: Drive diversity by referencing other columns in your prompt template\n", "- 💾 **Preview vs create**: Preview stores base64 in the dataframe; create saves images to disk and stores paths\n", "\n", - "Data Designer supports both **diffusion** (e.g. DALL·E, Stable Diffusion, Imagen) and **autoregressive** (e.g. Gemini image, GPT image) models; the API is chosen automatically from the model name.\n", + "Data Designer supports both **diffusion** (e.g. DALL·E, Stable Diffusion, Imagen) and **autoregressive** (e.g. Gemini image, GPT image) models.\n", + "\n", + "> **Prerequisites**: This tutorial uses [OpenRouter](https://openrouter.ai) with the Flux 2 Pro image model. Set `OPENROUTER_API_KEY` in your environment before running.\n", "\n", "If this is your first time using Data Designer, we recommend starting with the [first notebook](https://nvidia-nemo.github.io/DataDesigner/latest/notebooks/1-the-basics/) in this tutorial series.\n" ] }, { "cell_type": "markdown", - "id": "92ae4afe", + "id": "1da8d75f", "metadata": {}, "source": [ "### 📦 Import Data Designer\n", @@ -33,7 +35,7 @@ }, { "cell_type": "markdown", - "id": "ccc77347", + "id": "cc461005", "metadata": {}, "source": [ "### ⚡ Colab Setup\n", @@ -44,7 +46,7 @@ { "cell_type": "code", "execution_count": null, - "id": "23627c23", + "id": "206037bf", "metadata": {}, "outputs": [], "source": [ @@ -55,7 +57,7 @@ { "cell_type": "code", "execution_count": null, - "id": "bf958dc6", + "id": "db5a4929", "metadata": {}, "outputs": [], "source": [ @@ -73,7 +75,7 @@ { "cell_type": "code", "execution_count": null, - "id": "ab0cfff8", + "id": "b3cba8b6", "metadata": {}, "outputs": [], "source": [ @@ -86,18 +88,18 @@ }, { "cell_type": "markdown", - "id": "a18ef5ce", + "id": "444aa9dc", "metadata": {}, "source": [ "### ⚙️ Initialize the Data Designer interface\n", "\n", - "When initialized without arguments, [default model providers](https://nvidia-nemo.github.io/DataDesigner/latest/concepts/models/default-model-settings/) are used. This tutorial uses [OpenRouter](https://openrouter.ai) with the Flux 2 Pro image model; set `OPENROUTER_API_KEY` in your environment.\n" + "We initialize Data Designer without arguments here—the image model is configured explicitly in the next cell. 
No default text model is needed for this tutorial.\n" ] }, { "cell_type": "code", "execution_count": null, - "id": "5fe11301", + "id": "1932342c", "metadata": {}, "outputs": [], "source": [ @@ -106,7 +108,7 @@ }, { "cell_type": "markdown", - "id": "b913d454", + "id": "aa7b90c5", "metadata": {}, "source": [ "### 🎛️ Define an image-generation model\n", @@ -118,7 +120,7 @@ { "cell_type": "code", "execution_count": null, - "id": "a50d26ee", + "id": "df7e4385", "metadata": {}, "outputs": [], "source": [ @@ -140,7 +142,7 @@ }, { "cell_type": "markdown", - "id": "122374d9", + "id": "a1325e38", "metadata": {}, "source": [ "### 🏗️ Build the config: samplers + image column\n", @@ -151,7 +153,7 @@ { "cell_type": "code", "execution_count": null, - "id": "940f2b70", + "id": "95064ed0", "metadata": {}, "outputs": [], "source": [ @@ -324,7 +326,7 @@ }, { "cell_type": "markdown", - "id": "e13e0bb4", + "id": "c6fe0620", "metadata": {}, "source": [ "### 🔁 Preview: images as base64\n", @@ -335,7 +337,7 @@ { "cell_type": "code", "execution_count": null, - "id": "2a60a76f", + "id": "7323dce5", "metadata": {}, "outputs": [], "source": [ @@ -345,7 +347,7 @@ { "cell_type": "code", "execution_count": null, - "id": "3c831ee8", + "id": "510b933c", "metadata": {}, "outputs": [], "source": [ @@ -356,7 +358,7 @@ { "cell_type": "code", "execution_count": null, - "id": "143e762f", + "id": "0c8c197f", "metadata": {}, "outputs": [], "source": [ @@ -365,7 +367,7 @@ }, { "cell_type": "markdown", - "id": "a84606b4", + "id": "4cffd205", "metadata": {}, "source": [ "### 🆙 Create: images saved to disk\n", @@ -376,7 +378,7 @@ { "cell_type": "code", "execution_count": null, - "id": "89147954", + "id": "308bf2b8", "metadata": {}, "outputs": [], "source": [ @@ -386,7 +388,7 @@ { "cell_type": "code", "execution_count": null, - "id": "04c96063", + "id": "02610965", "metadata": {}, "outputs": [], "source": [ @@ -397,23 +399,23 @@ { "cell_type": "code", "execution_count": null, - "id": "edb794bb", + "id": "189af389", "metadata": {}, "outputs": [], "source": [ - "# Display all image from the created dataset. Paths are relative to the artifact output directory.\n", + "# Display all images from the created dataset. 
Paths are relative to the artifact output directory.\n", "for index, row in dataset.iterrows():\n", " path_or_list = row.get(\"generated_image\")\n", " if path_or_list is not None:\n", - " for path in path_or_list:\n", - " base = results.artifact_storage.base_dataset_path\n", - " full_path = base / path\n", - " display(IPImage(data=full_path))" + " paths = path_or_list if not isinstance(path_or_list, str) else [path_or_list]\n", + " for path in paths:\n", + " full_path = results.artifact_storage.base_dataset_path / path\n", + " display(IPImage(filename=str(full_path)))" ] }, { "cell_type": "markdown", - "id": "e0a72bf6", + "id": "51558182", "metadata": {}, "source": [ "## ⏭️ Next steps\n", @@ -421,7 +423,8 @@ "- [The basics](https://nvidia-nemo.github.io/DataDesigner/latest/notebooks/1-the-basics/): samplers and LLM text columns\n", "- [Structured outputs and Jinja](https://nvidia-nemo.github.io/DataDesigner/latest/notebooks/2-structured-outputs-and-jinja-expressions/)\n", "- [Seeding with a dataset](https://nvidia-nemo.github.io/DataDesigner/latest/notebooks/3-seeding-with-a-dataset/)\n", - "- [Providing images as context](https://nvidia-nemo.github.io/DataDesigner/latest/notebooks/4-providing-images-as-context/)\n" + "- [Providing images as context](https://nvidia-nemo.github.io/DataDesigner/latest/notebooks/4-providing-images-as-context/)\n", + "- [Image-to-image editing](https://nvidia-nemo.github.io/DataDesigner/latest/notebooks/6-editing-images-with-image-context/): edit existing images with seed datasets\n" ] } ], diff --git a/docs/colab_notebooks/6-editing-images-with-image-context.ipynb b/docs/colab_notebooks/6-editing-images-with-image-context.ipynb new file mode 100644 index 00000000..ddfe9d37 --- /dev/null +++ b/docs/colab_notebooks/6-editing-images-with-image-context.ipynb @@ -0,0 +1,492 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "c7129daf", + "metadata": {}, + "source": [ + "# 🎨 Data Designer Tutorial: Image-to-Image Editing\n", + "\n", + "#### 📚 What you'll learn\n", + "\n", + "This notebook shows how to edit existing images by combining a seed dataset with image generation. You'll load animal portrait photographs from HuggingFace, feed them as context to an autoregressive model, and generate fun edited versions with accessories like sunglasses, top hats, and bow ties.\n", + "\n", + "- 🌱 **Seed datasets with images**: Load a HuggingFace image dataset and use it as a seed\n", + "- 🖼️ **Image context for editing**: Pass existing images to an image-generation model via `multi_modal_context`\n", + "- 🎲 **Sampler-driven diversity**: Combine sampled accessories and settings with seed images for varied results\n", + "- 💾 **Preview vs create**: Preview stores base64 in the dataframe; create saves images to disk\n", + "\n", + "This tutorial uses an **autoregressive** model (one that supports both image input *and* image output via the chat completions API). Diffusion models (DALL·E, Stable Diffusion, etc.) do not support image context—see [Tutorial 5](https://nvidia-nemo.github.io/DataDesigner/latest/notebooks/5-generating-images/) for text-to-image generation with diffusion models.\n", + "\n", + "> **Prerequisites**: This tutorial uses [OpenRouter](https://openrouter.ai) with the Flux 2 Pro model. 
Set `OPENROUTER_API_KEY` in your environment before running.\n", + "\n", + "If this is your first time using Data Designer, we recommend starting with the [first notebook](https://nvidia-nemo.github.io/DataDesigner/latest/notebooks/1-the-basics/) in this tutorial series.\n" + ] + }, + { + "cell_type": "markdown", + "id": "6a438ee3", + "metadata": {}, + "source": [ + "### 📦 Import Data Designer\n", + "\n", + "- `data_designer.config` provides the configuration API.\n", + "- `DataDesigner` is the main interface for generation.\n" + ] + }, + { + "cell_type": "markdown", + "id": "1a022157", + "metadata": {}, + "source": [ + "### ⚡ Colab Setup\n", + "\n", + "Run the cells below to install the dependencies and set up the API key. If you don't have an API key, you can generate one from [build.nvidia.com](https://build.nvidia.com).\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "752fe3eb", + "metadata": {}, + "outputs": [], + "source": [ + "%%capture\n", + "!pip install -U data-designer" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "49266cc2", + "metadata": {}, + "outputs": [], + "source": [ + "import getpass\n", + "import os\n", + "\n", + "from google.colab import userdata\n", + "\n", + "try:\n", + " os.environ[\"NVIDIA_API_KEY\"] = userdata.get(\"NVIDIA_API_KEY\")\n", + "except userdata.SecretNotFoundError:\n", + " os.environ[\"NVIDIA_API_KEY\"] = getpass.getpass(\"Enter your NVIDIA API key: \")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "d87dfa0b", + "metadata": {}, + "outputs": [], + "source": [ + "import base64\n", + "import io\n", + "import uuid\n", + "\n", + "import pandas as pd\n", + "from datasets import load_dataset\n", + "from IPython.display import Image as IPImage\n", + "from IPython.display import display\n", + "\n", + "import data_designer.config as dd\n", + "from data_designer.interface import DataDesigner" + ] + }, + { + "cell_type": "markdown", + "id": "c99ff426", + "metadata": {}, + "source": [ + "### ⚙️ Initialize the Data Designer interface\n", + "\n", + "We initialize Data Designer without arguments here—the image-editing model is configured explicitly in the next cell. No default text model is needed for this tutorial.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "9be6231b", + "metadata": {}, + "outputs": [], + "source": [ + "data_designer = DataDesigner()" + ] + }, + { + "cell_type": "markdown", + "id": "3e242b51", + "metadata": {}, + "source": [ + "### 🎛️ Define an image-editing model\n", + "\n", + "We need an **autoregressive** model that supports both image input and image output via the chat completions API. This lets us pass existing images as context and receive edited images back.\n", + "\n", + "- Use `ImageInferenceParams` so Data Designer treats this model as an image generator.\n", + "- Image-specific options are model-dependent; pass them via `extra_body`.\n", + "\n", + "> **Note**: This tutorial uses the Flux 2 Pro model via [OpenRouter](https://openrouter.ai). 
Set `OPENROUTER_API_KEY` in your environment.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "34dd8eed", + "metadata": {}, + "outputs": [], + "source": [ + "MODEL_PROVIDER = \"openrouter\"\n", + "MODEL_ID = \"black-forest-labs/flux.2-pro\"\n", + "MODEL_ALIAS = \"image-editor\"\n", + "\n", + "model_configs = [\n", + " dd.ModelConfig(\n", + " alias=MODEL_ALIAS,\n", + " model=MODEL_ID,\n", + " provider=MODEL_PROVIDER,\n", + " inference_parameters=dd.ImageInferenceParams(\n", + " extra_body={\"height\": 512, \"width\": 512},\n", + " ),\n", + " )\n", + "]" + ] + }, + { + "cell_type": "markdown", + "id": "98abe1a9", + "metadata": {}, + "source": [ + "### 🌱 Load animal portraits from HuggingFace\n", + "\n", + "We'll load animal face photographs from the [AFHQ](https://huggingface.co/datasets/huggan/AFHQv2) (Animal Faces-HQ) dataset, convert them to base64, and use them as a seed dataset.\n", + "\n", + "AFHQ contains high-quality 512×512 close-up portraits of cats, dogs, and wildlife—perfect subjects for adding fun accessories.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "233f483b", + "metadata": {}, + "outputs": [], + "source": [ + "SEED_COUNT = 10\n", + "BASE64_IMAGE_HEIGHT = 512\n", + "\n", + "ANIMAL_LABELS = {0: \"cat\", 1: \"dog\", 2: \"wild\"}\n", + "\n", + "\n", + "def resize_image(image, height: int):\n", + " \"\"\"Resize image maintaining aspect ratio.\"\"\"\n", + " original_width, original_height = image.size\n", + " width = int(original_width * (height / original_height))\n", + " return image.resize((width, height))\n", + "\n", + "\n", + "def prepare_record(record: dict, height: int) -> dict:\n", + " \"\"\"Convert a HuggingFace record to base64 with metadata.\"\"\"\n", + " image = resize_image(record[\"image\"], height)\n", + " img_buffer = io.BytesIO()\n", + " image.save(img_buffer, format=\"PNG\")\n", + " base64_string = base64.b64encode(img_buffer.getvalue()).decode(\"utf-8\")\n", + " return {\n", + " \"uuid\": str(uuid.uuid4()),\n", + " \"base64_image\": base64_string,\n", + " \"animal\": ANIMAL_LABELS[record[\"label\"]],\n", + " }" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "6b1a7b59", + "metadata": {}, + "outputs": [], + "source": [ + "print(\"📥 Streaming animal portraits from HuggingFace...\")\n", + "hf_dataset = load_dataset(\"huggan/AFHQv2\", split=\"train\", streaming=True)\n", + "\n", + "hf_iter = iter(hf_dataset)\n", + "records = [prepare_record(next(hf_iter), BASE64_IMAGE_HEIGHT) for _ in range(SEED_COUNT)]\n", + "df_seed = pd.DataFrame(records)\n", + "\n", + "print(f\"✅ Prepared {len(df_seed)} animal portraits with columns: {list(df_seed.columns)}\")\n", + "df_seed.head()" + ] + }, + { + "cell_type": "markdown", + "id": "2956a5a6", + "metadata": {}, + "source": [ + "### 🏗️ Build the configuration\n", + "\n", + "We combine three ingredients:\n", + "\n", + "1. **Seed dataset** — original animal portraits as base64 and their species labels\n", + "2. **Sampler columns** — randomly sample accessories and settings for each image\n", + "3. **Image column with context** — generate an edited image using the original as reference\n", + "\n", + "The `multi_modal_context` parameter on `ImageColumnConfig` tells Data Designer to pass the seed image to the model alongside the text prompt. 
The model receives both the image and the editing instructions, and generates a new image.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "f79ffa72", + "metadata": {}, + "outputs": [], + "source": [ + "config_builder = dd.DataDesignerConfigBuilder(model_configs=model_configs)\n", + "\n", + "# 1. Seed the original animal portraits\n", + "config_builder.with_seed_dataset(dd.DataFrameSeedSource(df=df_seed))\n", + "\n", + "# 2. Add sampler columns for accessory diversity\n", + "config_builder.add_column(\n", + " dd.SamplerColumnConfig(\n", + " name=\"accessory\",\n", + " sampler_type=dd.SamplerType.CATEGORY,\n", + " params=dd.CategorySamplerParams(\n", + " values=[\n", + " \"a tiny top hat\",\n", + " \"oversized sunglasses\",\n", + " \"a red bow tie\",\n", + " \"a knitted beanie\",\n", + " \"a flower crown\",\n", + " \"a monocle and mustache\",\n", + " \"a pirate hat and eye patch\",\n", + " \"a chef hat\",\n", + " ],\n", + " ),\n", + " )\n", + ")\n", + "\n", + "config_builder.add_column(\n", + " dd.SamplerColumnConfig(\n", + " name=\"setting\",\n", + " sampler_type=dd.SamplerType.CATEGORY,\n", + " params=dd.CategorySamplerParams(\n", + " values=[\n", + " \"a cozy living room\",\n", + " \"a sunny park\",\n", + " \"a photo studio with soft lighting\",\n", + " \"a red carpet event\",\n", + " \"a holiday card backdrop with snowflakes\",\n", + " \"a tropical beach at sunset\",\n", + " ],\n", + " ),\n", + " )\n", + ")\n", + "\n", + "config_builder.add_column(\n", + " dd.SamplerColumnConfig(\n", + " name=\"art_style\",\n", + " sampler_type=dd.SamplerType.CATEGORY,\n", + " params=dd.CategorySamplerParams(\n", + " values=[\n", + " \"a photorealistic style\",\n", + " \"a Disney Pixar 3D render\",\n", + " \"a watercolor painting\",\n", + " \"a pop art poster\",\n", + " ],\n", + " ),\n", + " )\n", + ")\n", + "\n", + "# 3. Image column that reads the seed image as context and generates an edited version\n", + "config_builder.add_column(\n", + " dd.ImageColumnConfig(\n", + " name=\"edited_image\",\n", + " prompt=(\n", + " \"Edit this {{ animal }} portrait photo. \"\n", + " \"Add {{ accessory }} on the animal. \"\n", + " \"Place the {{ animal }} in {{ setting }}. \"\n", + " \"Render the result in {{ art_style }}. \"\n", + " \"Keep the animal's face, expression, and features faithful to the original photo.\"\n", + " ),\n", + " model_alias=MODEL_ALIAS,\n", + " multi_modal_context=[\n", + " dd.ImageContext(\n", + " column_name=\"base64_image\",\n", + " data_type=dd.ModalityDataType.BASE64,\n", + " image_format=dd.ImageFormat.PNG,\n", + " )\n", + " ],\n", + " )\n", + ")\n", + "\n", + "data_designer.validate(config_builder)" + ] + }, + { + "cell_type": "markdown", + "id": "0cba69c0", + "metadata": {}, + "source": [ + "### 🔁 Preview: quick iteration\n", + "\n", + "In **preview** mode, generated images are stored as base64 strings in the dataframe. 
Use this to iterate on your prompts, accessories, and sampler values before scaling up.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "ec669ae2", + "metadata": {}, + "outputs": [], + "source": [ + "preview = data_designer.preview(config_builder, num_records=2)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "41ac4a95", + "metadata": {}, + "outputs": [], + "source": [ + "for i in range(len(preview.dataset)):\n", + " preview.display_sample_record()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "6f041d9d", + "metadata": {}, + "outputs": [], + "source": [ + "preview.dataset" + ] + }, + { + "cell_type": "markdown", + "id": "483fa24a", + "metadata": { + "lines_to_next_cell": 2 + }, + "source": [ + "### 🔎 Compare original vs edited\n", + "\n", + "Let's display the original animal portraits next to their accessorized versions.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "dd4d7dff", + "metadata": {}, + "outputs": [], + "source": [ + "def display_before_after(row: pd.Series, index: int, base_path=None) -> None:\n", + " \"\"\"Display original vs edited image for a single record.\n", + "\n", + " When base_path is None (preview mode), edited_image is decoded from base64.\n", + " When base_path is provided (create mode), edited_image is loaded from disk.\n", + " \"\"\"\n", + " print(f\"\\n{'=' * 60}\")\n", + " print(f\"Record {index}: {row['animal']} wearing {row['accessory']}\")\n", + " print(f\"Setting: {row['setting']}\")\n", + " print(f\"Style: {row['art_style']}\")\n", + " print(f\"{'=' * 60}\")\n", + "\n", + " print(\"\\n📷 Original portrait:\")\n", + " display(IPImage(data=base64.b64decode(row[\"base64_image\"])))\n", + "\n", + " print(\"\\n🎨 Edited version:\")\n", + " edited = row.get(\"edited_image\")\n", + " if edited is None:\n", + " return\n", + " if base_path is None:\n", + " images = edited if isinstance(edited, list) else [edited]\n", + " for img_b64 in images:\n", + " display(IPImage(data=base64.b64decode(img_b64)))\n", + " else:\n", + " paths = edited if not isinstance(edited, str) else [edited]\n", + " for path in paths:\n", + " display(IPImage(filename=str(base_path / path)))" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "af08dc6c", + "metadata": {}, + "outputs": [], + "source": [ + "for index, row in preview.dataset.iterrows():\n", + " display_before_after(row, index)" + ] + }, + { + "cell_type": "markdown", + "id": "9ee15c83", + "metadata": {}, + "source": [ + "### 🆙 Create at scale\n", + "\n", + "In **create** mode, images are saved to disk in an `images//` folder with UUID filenames. 
The dataframe stores relative paths.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "9f0d27f8", + "metadata": {}, + "outputs": [], + "source": [ + "results = data_designer.create(config_builder, num_records=5, dataset_name=\"tutorial-6-edited-images\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "cc17414a", + "metadata": {}, + "outputs": [], + "source": [ + "dataset = results.load_dataset()\n", + "dataset.head()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "849c03b6", + "metadata": {}, + "outputs": [], + "source": [ + "for index, row in dataset.head(10).iterrows():\n", + " display_before_after(row, index, base_path=results.artifact_storage.base_dataset_path)" + ] + }, + { + "cell_type": "markdown", + "id": "b7385f02", + "metadata": {}, + "source": [ + "## ⏭️ Next steps\n", + "\n", + "- Experiment with different autoregressive models for image editing\n", + "- Try more creative editing prompts (style transfer, background replacement, artistic filters)\n", + "- Combine image editing with text generation (e.g., generate captions for edited images using an LLM-Text column)\n", + "\n", + "Related tutorials:\n", + "\n", + "- [The basics](https://nvidia-nemo.github.io/DataDesigner/latest/notebooks/1-the-basics/): samplers and LLM text columns\n", + "- [Providing images as context](https://nvidia-nemo.github.io/DataDesigner/latest/notebooks/4-providing-images-as-context/): image-to-text with VLMs\n", + "- [Generating images](https://nvidia-nemo.github.io/DataDesigner/latest/notebooks/5-generating-images/): text-to-image generation with diffusion models\n" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": ".venv", + "language": "python", + "name": "python3" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/docs/concepts/columns.md b/docs/concepts/columns.md index 48eed5e0..03fda5de 100644 --- a/docs/concepts/columns.md +++ b/docs/concepts/columns.md @@ -7,7 +7,7 @@ Columns are the fundamental building blocks in Data Designer. Each column repres ## Column Types -Data Designer provides ten built-in column types, each optimized for different generation scenarios. +Data Designer provides eleven built-in column types, each optimized for different generation scenarios. ### 🎲 Sampler Columns @@ -96,6 +96,24 @@ Define scoring rubrics (relevance, accuracy, fluency, helpfulness) and the judge Use judge columns for data quality filtering (e.g., keep only 4+ rated responses), A/B testing generation strategies, and quality monitoring over time. +### 🖼️ Image Columns + +Image columns generate images from text prompts using either **diffusion** models (DALL·E, Stable Diffusion, Imagen) or **autoregressive** models (Gemini image, GPT image). + +Use **Jinja2 templating** in the prompt to reference other columns, driving diversity across generated images. For example, reference sampled attributes like style, subject, and composition to produce varied images without manually writing different prompts. + +Image columns require a model configured with `ImageInferenceParams`. Model-specific options (size, quality, aspect ratio) are passed via `extra_body` in the inference parameters. 
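+For example, a minimal sketch of an image column might look like the following (the model ID, provider, alias, and the `art_style`, `animal`, and `setting` sampler columns referenced in the prompt are illustrative assumptions borrowed from the tutorials, not required values):
+
+```python
+import data_designer.config as dd
+
+# A model is treated as an image generator because it uses ImageInferenceParams;
+# image-specific options vary by provider and are passed via extra_body.
+image_model = dd.ModelConfig(
+    alias="image-model",
+    model="black-forest-labs/flux.2-pro",
+    provider="openrouter",
+    inference_parameters=dd.ImageInferenceParams(extra_body={"height": 512, "width": 512}),
+)
+
+config_builder = dd.DataDesignerConfigBuilder(model_configs=[image_model])
+
+# The Jinja2 prompt references other (sampler) columns to drive diversity;
+# this assumes matching sampler columns have been added to the config.
+config_builder.add_column(
+    dd.ImageColumnConfig(
+        name="generated_image",
+        prompt="A {{ art_style }} portrait of a {{ animal }} in {{ setting }}.",
+        model_alias="image-model",
+    )
+)
+```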
+ +**Output modes:** + +- **Preview** (`data_designer.preview()`): Images are stored as base64-encoded strings directly in the DataFrame for quick iteration +- **Create** (`data_designer.create()`): Images are saved to disk in an `images//` folder with UUID filenames; the DataFrame stores relative paths + +Image columns also support `multi_modal_context` for autoregressive models that accept image inputs, enabling image-to-image generation workflows. + +!!! tip "Tutorials" + The image tutorials cover three workflows: [Providing Images as Context](../notebooks/4-providing-images-as-context.ipynb) (image → text), [Generating Images](../notebooks/5-generating-images.ipynb) (text → image), and [Editing Images with Image Context](../notebooks/6-editing-images-with-image-context.ipynb) (image → image). + ### 🧬 Embedding Columns Embedding columns generate vector embeddings (numerical representations) for text content using embedding models. These embeddings capture semantic meaning, enabling similarity search, clustering, and semantic analysis. diff --git a/docs/concepts/models/inference-parameters.md b/docs/concepts/models/inference-parameters.md index 3470611c..e0c3b678 100644 --- a/docs/concepts/models/inference-parameters.md +++ b/docs/concepts/models/inference-parameters.md @@ -1,6 +1,6 @@ # Inference Parameters -Inference parameters control how models generate responses during synthetic data generation. Data Designer provides two types of inference parameters: `ChatCompletionInferenceParams` for text/code/structured generation and `EmbeddingInferenceParams` for embedding generation. +Inference parameters control how models generate responses during synthetic data generation. Data Designer provides three types of inference parameters: `ChatCompletionInferenceParams` for text/code/structured generation, `EmbeddingInferenceParams` for embedding generation, and `ImageInferenceParams` for image generation. ## Overview @@ -136,6 +136,44 @@ The `EmbeddingInferenceParams` class controls how models generate embeddings. Th | `extra_body` | `dict[str, Any]` | No | Additional parameters to include in the API request body | +## Image Inference Parameters + +The `ImageInferenceParams` class is used for image generation models, including both diffusion models (DALL·E, Stable Diffusion, Imagen) and autoregressive models (Gemini image, GPT image). Unlike text models, image-specific options are passed entirely via `extra_body`, since they vary significantly between providers. + +### Fields + +| Field | Type | Required | Description | +|-------|------|----------|-------------| +| `max_parallel_requests` | `int` | No | Maximum concurrent API requests (default: 4, ≥ 1) | +| `timeout` | `int` | No | API request timeout in seconds (≥ 1) | +| `extra_body` | `dict[str, Any]` | No | Model-specific image options (size, quality, aspect ratio, etc.) 
| + +### Examples + +```python +import data_designer.config as dd + +# Autoregressive model (chat completions API, supports image context) +dd.ModelConfig( + alias="image-model", + model="black-forest-labs/flux.2-pro", + provider="openrouter", + inference_parameters=dd.ImageInferenceParams( + extra_body={"height": 512, "width": 512} + ), +) + +# Diffusion model (e.g., DALL·E, Stable Diffusion) +dd.ModelConfig( + alias="dalle", + model="dall-e-3", + inference_parameters=dd.ImageInferenceParams( + extra_body={"size": "1024x1024", "quality": "hd"} + ), +) +``` + + ## See Also - **[Default Model Settings](default-model-settings.md)**: Pre-configured model settings included with Data Designer diff --git a/docs/notebook_source/5-generating-images.py b/docs/notebook_source/5-generating-images.py index b445b950..dfdc5782 100644 --- a/docs/notebook_source/5-generating-images.py +++ b/docs/notebook_source/5-generating-images.py @@ -23,7 +23,9 @@ # - 📝 **Jinja2 prompts**: Drive diversity by referencing other columns in your prompt template # - 💾 **Preview vs create**: Preview stores base64 in the dataframe; create saves images to disk and stores paths # -# Data Designer supports both **diffusion** (e.g. DALL·E, Stable Diffusion, Imagen) and **autoregressive** (e.g. Gemini image, GPT image) models; the API is chosen automatically from the model name. +# Data Designer supports both **diffusion** (e.g. DALL·E, Stable Diffusion, Imagen) and **autoregressive** (e.g. Gemini image, GPT image) models. +# +# > **Prerequisites**: This tutorial uses [OpenRouter](https://openrouter.ai) with the Flux 2 Pro image model. Set `OPENROUTER_API_KEY` in your environment before running. # # If this is your first time using Data Designer, we recommend starting with the [first notebook](https://nvidia-nemo.github.io/DataDesigner/latest/notebooks/1-the-basics/) in this tutorial series. # @@ -45,7 +47,7 @@ # %% [markdown] # ### ⚙️ Initialize the Data Designer interface # -# When initialized without arguments, [default model providers](https://nvidia-nemo.github.io/DataDesigner/latest/concepts/models/default-model-settings/) are used. This tutorial uses [OpenRouter](https://openrouter.ai) with the Flux 2 Pro image model; set `OPENROUTER_API_KEY` in your environment. +# We initialize Data Designer without arguments here—the image model is configured explicitly in the next cell. No default text model is needed for this tutorial. # # %% @@ -277,14 +279,14 @@ dataset.head() # %% -# Display all image from the created dataset. Paths are relative to the artifact output directory. +# Display all images from the created dataset. Paths are relative to the artifact output directory. 
for index, row in dataset.iterrows(): path_or_list = row.get("generated_image") if path_or_list is not None: - for path in path_or_list: - base = results.artifact_storage.base_dataset_path - full_path = base / path - display(IPImage(data=full_path)) + paths = path_or_list if not isinstance(path_or_list, str) else [path_or_list] + for path in paths: + full_path = results.artifact_storage.base_dataset_path / path + display(IPImage(filename=str(full_path))) # %% [markdown] # ## ⏭️ Next steps @@ -293,4 +295,5 @@ # - [Structured outputs and Jinja](https://nvidia-nemo.github.io/DataDesigner/latest/notebooks/2-structured-outputs-and-jinja-expressions/) # - [Seeding with a dataset](https://nvidia-nemo.github.io/DataDesigner/latest/notebooks/3-seeding-with-a-dataset/) # - [Providing images as context](https://nvidia-nemo.github.io/DataDesigner/latest/notebooks/4-providing-images-as-context/) +# - [Image-to-image editing](https://nvidia-nemo.github.io/DataDesigner/latest/notebooks/6-editing-images-with-image-context/): edit existing images with seed datasets # diff --git a/docs/notebook_source/6-editing-images-with-image-context.py b/docs/notebook_source/6-editing-images-with-image-context.py new file mode 100644 index 00000000..c419ad23 --- /dev/null +++ b/docs/notebook_source/6-editing-images-with-image-context.py @@ -0,0 +1,316 @@ +# --- +# jupyter: +# jupytext: +# text_representation: +# extension: .py +# format_name: percent +# format_version: '1.3' +# jupytext_version: 1.18.1 +# kernelspec: +# display_name: .venv +# language: python +# name: python3 +# --- + +# %% [markdown] +# # 🎨 Data Designer Tutorial: Image-to-Image Editing +# +# #### 📚 What you'll learn +# +# This notebook shows how to edit existing images by combining a seed dataset with image generation. You'll load animal portrait photographs from HuggingFace, feed them as context to an autoregressive model, and generate fun edited versions with accessories like sunglasses, top hats, and bow ties. +# +# - 🌱 **Seed datasets with images**: Load a HuggingFace image dataset and use it as a seed +# - 🖼️ **Image context for editing**: Pass existing images to an image-generation model via `multi_modal_context` +# - 🎲 **Sampler-driven diversity**: Combine sampled accessories and settings with seed images for varied results +# - 💾 **Preview vs create**: Preview stores base64 in the dataframe; create saves images to disk +# +# This tutorial uses an **autoregressive** model (one that supports both image input *and* image output via the chat completions API). Diffusion models (DALL·E, Stable Diffusion, etc.) do not support image context—see [Tutorial 5](https://nvidia-nemo.github.io/DataDesigner/latest/notebooks/5-generating-images/) for text-to-image generation with diffusion models. +# +# > **Prerequisites**: This tutorial uses [OpenRouter](https://openrouter.ai) with the Flux 2 Pro model. Set `OPENROUTER_API_KEY` in your environment before running. +# +# If this is your first time using Data Designer, we recommend starting with the [first notebook](https://nvidia-nemo.github.io/DataDesigner/latest/notebooks/1-the-basics/) in this tutorial series. +# + +# %% [markdown] +# ### 📦 Import Data Designer +# +# - `data_designer.config` provides the configuration API. +# - `DataDesigner` is the main interface for generation. 
+# + +# %% +import base64 +import io +import uuid + +import pandas as pd +from datasets import load_dataset +from IPython.display import Image as IPImage +from IPython.display import display + +import data_designer.config as dd +from data_designer.interface import DataDesigner + +# %% [markdown] +# ### ⚙️ Initialize the Data Designer interface +# +# We initialize Data Designer without arguments here—the image-editing model is configured explicitly in the next cell. No default text model is needed for this tutorial. +# + +# %% +data_designer = DataDesigner() + +# %% [markdown] +# ### 🎛️ Define an image-editing model +# +# We need an **autoregressive** model that supports both image input and image output via the chat completions API. This lets us pass existing images as context and receive edited images back. +# +# - Use `ImageInferenceParams` so Data Designer treats this model as an image generator. +# - Image-specific options are model-dependent; pass them via `extra_body`. +# +# > **Note**: This tutorial uses the Flux 2 Pro model via [OpenRouter](https://openrouter.ai). Set `OPENROUTER_API_KEY` in your environment. +# + +# %% +MODEL_PROVIDER = "openrouter" +MODEL_ID = "black-forest-labs/flux.2-pro" +MODEL_ALIAS = "image-editor" + +model_configs = [ + dd.ModelConfig( + alias=MODEL_ALIAS, + model=MODEL_ID, + provider=MODEL_PROVIDER, + inference_parameters=dd.ImageInferenceParams( + extra_body={"height": 512, "width": 512}, + ), + ) +] + +# %% [markdown] +# ### 🌱 Load animal portraits from HuggingFace +# +# We'll load animal face photographs from the [AFHQ](https://huggingface.co/datasets/huggan/AFHQv2) (Animal Faces-HQ) dataset, convert them to base64, and use them as a seed dataset. +# +# AFHQ contains high-quality 512×512 close-up portraits of cats, dogs, and wildlife—perfect subjects for adding fun accessories. +# + +# %% +SEED_COUNT = 10 +BASE64_IMAGE_HEIGHT = 512 + +ANIMAL_LABELS = {0: "cat", 1: "dog", 2: "wild"} + + +def resize_image(image, height: int): + """Resize image maintaining aspect ratio.""" + original_width, original_height = image.size + width = int(original_width * (height / original_height)) + return image.resize((width, height)) + + +def prepare_record(record: dict, height: int) -> dict: + """Convert a HuggingFace record to base64 with metadata.""" + image = resize_image(record["image"], height) + img_buffer = io.BytesIO() + image.save(img_buffer, format="PNG") + base64_string = base64.b64encode(img_buffer.getvalue()).decode("utf-8") + return { + "uuid": str(uuid.uuid4()), + "base64_image": base64_string, + "animal": ANIMAL_LABELS[record["label"]], + } + + +# %% +print("📥 Streaming animal portraits from HuggingFace...") +hf_dataset = load_dataset("huggan/AFHQv2", split="train", streaming=True) + +hf_iter = iter(hf_dataset) +records = [prepare_record(next(hf_iter), BASE64_IMAGE_HEIGHT) for _ in range(SEED_COUNT)] +df_seed = pd.DataFrame(records) + +print(f"✅ Prepared {len(df_seed)} animal portraits with columns: {list(df_seed.columns)}") +df_seed.head() + +# %% [markdown] +# ### 🏗️ Build the configuration +# +# We combine three ingredients: +# +# 1. **Seed dataset** — original animal portraits as base64 and their species labels +# 2. **Sampler columns** — randomly sample accessories and settings for each image +# 3. **Image column with context** — generate an edited image using the original as reference +# +# The `multi_modal_context` parameter on `ImageColumnConfig` tells Data Designer to pass the seed image to the model alongside the text prompt. 
+# %% [markdown]
+# ### 🏗️ Build the configuration
+#
+# We combine three ingredients:
+#
+# 1. **Seed dataset** — original animal portraits as base64 and their species labels
+# 2. **Sampler columns** — randomly sample accessories and settings for each image
+# 3. **Image column with context** — generate an edited image using the original as reference
+#
+# The `multi_modal_context` parameter on `ImageColumnConfig` tells Data Designer to pass the seed image to the model alongside the text prompt. The model receives both the image and the editing instructions, and generates a new image.
+#
+
+# %%
+config_builder = dd.DataDesignerConfigBuilder(model_configs=model_configs)
+
+# 1. Seed the original animal portraits
+config_builder.with_seed_dataset(dd.DataFrameSeedSource(df=df_seed))
+
+# 2. Add sampler columns for accessory diversity
+config_builder.add_column(
+    dd.SamplerColumnConfig(
+        name="accessory",
+        sampler_type=dd.SamplerType.CATEGORY,
+        params=dd.CategorySamplerParams(
+            values=[
+                "a tiny top hat",
+                "oversized sunglasses",
+                "a red bow tie",
+                "a knitted beanie",
+                "a flower crown",
+                "a monocle and mustache",
+                "a pirate hat and eye patch",
+                "a chef hat",
+            ],
+        ),
+    )
+)
+
+config_builder.add_column(
+    dd.SamplerColumnConfig(
+        name="setting",
+        sampler_type=dd.SamplerType.CATEGORY,
+        params=dd.CategorySamplerParams(
+            values=[
+                "a cozy living room",
+                "a sunny park",
+                "a photo studio with soft lighting",
+                "a red carpet event",
+                "a holiday card backdrop with snowflakes",
+                "a tropical beach at sunset",
+            ],
+        ),
+    )
+)
+
+config_builder.add_column(
+    dd.SamplerColumnConfig(
+        name="art_style",
+        sampler_type=dd.SamplerType.CATEGORY,
+        params=dd.CategorySamplerParams(
+            values=[
+                "a photorealistic style",
+                "a Disney Pixar 3D render",
+                "a watercolor painting",
+                "a pop art poster",
+            ],
+        ),
+    )
+)
+
+# 3. Image column that reads the seed image as context and generates an edited version
+config_builder.add_column(
+    dd.ImageColumnConfig(
+        name="edited_image",
+        prompt=(
+            "Edit this {{ animal }} portrait photo. "
+            "Add {{ accessory }} on the animal. "
+            "Place the {{ animal }} in {{ setting }}. "
+            "Render the result in {{ art_style }}. "
+            "Keep the animal's face, expression, and features faithful to the original photo."
+        ),
+        model_alias=MODEL_ALIAS,
+        multi_modal_context=[
+            dd.ImageContext(
+                column_name="base64_image",
+                data_type=dd.ModalityDataType.BASE64,
+                image_format=dd.ImageFormat.PNG,
+            )
+        ],
+    )
+)
+
+data_designer.validate(config_builder)
+
+# %% [markdown]
+# ### 🔁 Preview: quick iteration
+#
+# In **preview** mode, generated images are stored as base64 strings in the dataframe. Use this to iterate on your prompts, accessories, and sampler values before scaling up.
+#
+
+# %%
+preview = data_designer.preview(config_builder, num_records=2)
+
+# %%
+for i in range(len(preview.dataset)):
+    preview.display_sample_record()
+
+# %%
+preview.dataset
+
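+# %% [markdown]
+# If you want to inspect a generated edit outside the notebook, you can write the base64 payload from the preview dataframe to a local file. This is an optional sketch; `preview_edit.png` is just an arbitrary file name, and the list handling mirrors the display logic in the next section.
+#
+
+# %%
+# Preview mode stores the edited image as a base64 string (or a list of strings) in the dataframe.
+first_edit = preview.dataset.iloc[0]["edited_image"]
+first_edit_b64 = first_edit[0] if isinstance(first_edit, list) else first_edit
+with open("preview_edit.png", "wb") as f:
+    f.write(base64.b64decode(first_edit_b64))
+print("Wrote preview_edit.png")
+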
+# %% [markdown]
+# ### 🔎 Compare original vs edited
+#
+# Let's display the original animal portraits next to their accessorized versions.
+#
+
+
+# %%
+def display_before_after(row: pd.Series, index: int, base_path=None) -> None:
+    """Display original vs edited image for a single record.
+
+    When base_path is None (preview mode), edited_image is decoded from base64.
+    When base_path is provided (create mode), edited_image is loaded from disk.
+    """
+    print(f"\n{'=' * 60}")
+    print(f"Record {index}: {row['animal']} wearing {row['accessory']}")
+    print(f"Setting: {row['setting']}")
+    print(f"Style: {row['art_style']}")
+    print(f"{'=' * 60}")
+
+    print("\n📷 Original portrait:")
+    display(IPImage(data=base64.b64decode(row["base64_image"])))
+
+    print("\n🎨 Edited version:")
+    edited = row.get("edited_image")
+    if edited is None:
+        return
+    if base_path is None:
+        images = edited if isinstance(edited, list) else [edited]
+        for img_b64 in images:
+            display(IPImage(data=base64.b64decode(img_b64)))
+    else:
+        paths = edited if not isinstance(edited, str) else [edited]
+        for path in paths:
+            display(IPImage(filename=str(base_path / path)))
+
+
+# %%
+for index, row in preview.dataset.iterrows():
+    display_before_after(row, index)
+
+# %% [markdown]
+# ### 🆙 Create at scale
+#
+# In **create** mode, images are saved to disk in an `images//` folder with UUID filenames. The dataframe stores relative paths.
+#
+
+# %%
+results = data_designer.create(config_builder, num_records=5, dataset_name="tutorial-6-edited-images")
+
+# %%
+dataset = results.load_dataset()
+dataset.head()
+
+# %%
+for index, row in dataset.head(10).iterrows():
+    display_before_after(row, index, base_path=results.artifact_storage.base_dataset_path)
+
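+# %% [markdown]
+# As a final optional check, plain pandas can summarize how the sampled accessories were distributed across the generated records. Nothing here is Data Designer specific; it simply reads the sampler column from the saved dataset.
+#
+
+# %%
+# Count how many records used each sampled accessory value.
+dataset["accessory"].value_counts()
+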
+# %% [markdown]
+# ## ⏭️ Next steps
+#
+# - Experiment with different autoregressive models for image editing
+# - Try more creative editing prompts (style transfer, background replacement, artistic filters)
+# - Combine image editing with text generation (e.g., generate captions for edited images using an LLM-Text column)
+#
+# Related tutorials:
+#
+# - [The basics](https://nvidia-nemo.github.io/DataDesigner/latest/notebooks/1-the-basics/): samplers and LLM text columns
+# - [Providing images as context](https://nvidia-nemo.github.io/DataDesigner/latest/notebooks/4-providing-images-as-context/): image-to-text with VLMs
+# - [Generating images](https://nvidia-nemo.github.io/DataDesigner/latest/notebooks/5-generating-images/): text-to-image generation with diffusion models
+#
diff --git a/docs/notebook_source/_README.md b/docs/notebook_source/_README.md
index 7bcd77d1..bbd29f9e 100644
--- a/docs/notebook_source/_README.md
+++ b/docs/notebook_source/_README.md
@@ -106,6 +106,15 @@ Generate synthetic image data with Data Designer:
 - Preview (base64 in dataframe) vs create (images saved to disk, paths in dataframe)
 - Displaying generated images in the notebook
 
+### [6. Image-to-Image Editing](6-editing-images-with-image-context.ipynb)
+
+Edit existing images by combining seed datasets with image generation:
+
+- Loading a HuggingFace image dataset and using it as a seed
+- Passing existing images to an image-generation model via `multi_modal_context`
+- Combining sampled accessories and settings with seed images for varied results
+- Comparing original vs edited images in preview and create modes
+
 ## 📖 Important Documentation Sections
 
 Before diving into the tutorials, familiarize yourself with these key documentation sections:
diff --git a/mkdocs.yml b/mkdocs.yml
index 03b852d7..27c5a74c 100644
--- a/mkdocs.yml
+++ b/mkdocs.yml
@@ -36,6 +36,8 @@ nav:
       - Structured Outputs and Jinja Expressions: notebooks/2-structured-outputs-and-jinja-expressions.ipynb
       - Seeding with an External Dataset: notebooks/3-seeding-with-a-dataset.ipynb
       - Providing Images as Context: notebooks/4-providing-images-as-context.ipynb
+      - Generating Images: notebooks/5-generating-images.ipynb
+      - Image-to-Image Editing: notebooks/6-editing-images-with-image-context.ipynb
   - Recipes:
       - Recipe Cards: recipes/cards.md
       - Code Generation:
diff --git a/packages/data-designer-engine/src/data_designer/engine/dataset_builders/column_wise_builder.py b/packages/data-designer-engine/src/data_designer/engine/dataset_builders/column_wise_builder.py
index 6e42844b..e8845ae1 100644
--- a/packages/data-designer-engine/src/data_designer/engine/dataset_builders/column_wise_builder.py
+++ b/packages/data-designer-engine/src/data_designer/engine/dataset_builders/column_wise_builder.py
@@ -182,6 +182,10 @@ def _has_image_columns(self) -> bool:
         """Check if config has any image generation columns."""
         return any(col.column_type == DataDesignerColumnType.IMAGE for col in self.single_column_configs)
 
+    def _has_image_columns(self) -> bool:
+        """Check if config has any image generation columns."""
+        return any(col.column_type == DataDesignerColumnType.IMAGE for col in self.single_column_configs)
+
     def _initialize_generators(self) -> list[ColumnGenerator]:
         return [
             self._registry.column_generators.get_for_config_type(type(config))(