Add multi-step tool-calling SDG tutorial for workplace assistant #327

shashank3959 · 2026-02-12T22:56:46Z

Add notebook, tool definitions, and utility modules for generating synthetic multi-step tool-calling training data using Data Designer. Includes dual-level LLM judge filtering and NeMo Gym export.

Add notebook, tool definitions, and utility modules for generating synthetic multi-step tool-calling training data using Data Designer. Includes dual-level LLM judge filtering and NeMo Gym export. Signed-off-by: Shashank Verma <shashankv@nvidia.com>

github-actions · 2026-02-12T22:56:57Z

Thank you for your submission! We ask that you sign our Developer Certificate of Origin before we can accept your contribution. You can sign the DCO by adding a comment below using this text:

I have read the DCO document and I hereby sign the DCO.

_{You can retrigger this bot by commenting recheck in this Pull Request.}_{Posted by the DCO Assistant Lite bot.}

greptile-apps · 2026-02-12T22:59:21Z

Greptile Overview

Greptile Summary

Adds comprehensive multi-step tool-calling synthetic data generation tutorial using Data Designer. Implements a complete pipeline for generating realistic workplace assistant queries with simulated agent trajectories, dual-level LLM judge filtering for quality control, and NeMo Gym export format compatibility.

Key Components:

Tutorial notebook with 27 workplace assistant tools across 6 databases (email, calendar, CRM, analytics, project management, company directory)
Dual-level quality filtering utilities to validate both user queries and generated trajectories
NeMo Gym format conversion for RL training compatibility
27 multi-step patterns for diverse task generation (lookup-then-send, search-then-update, etc.)

Style Issue:

All 3 Python utility files are missing required NVIDIA SPDX license headers as specified in AGENTS.md

Confidence Score: 4/5

Safe to merge after adding license headers to Python files
Well-structured tutorial with proper type annotations and good code organization. The only issue is missing NVIDIA license headers on 3 Python utility files, which is a style requirement that should be fixed before merging
The 3 Python utility files in docs/colab_notebooks/5-multistep-toolcalling/utils/ need NVIDIA SPDX license headers added

Important Files Changed

Filename	Overview
docs/colab_notebooks/5-multistep-toolcalling/multistep-toolcalling.ipynb	Comprehensive tutorial notebook for multi-step tool-calling SDG with clear examples and dual-level quality filtering
docs/colab_notebooks/5-multistep-toolcalling/utils/init.py	Package initialization - missing NVIDIA license headers required per AGENTS.md
docs/colab_notebooks/5-multistep-toolcalling/utils/convert_to_nemo_gym_format.py	NeMo Gym format converter with proper type hints - missing NVIDIA license headers required per AGENTS.md
docs/colab_notebooks/5-multistep-toolcalling/utils/quality_filtering.py	Quality filtering utilities with dual-level validation - missing NVIDIA license headers required per AGENTS.md
docs/colab_notebooks/5-multistep-toolcalling/tools/environment.json	Environment configuration with 27 multi-step patterns covering all tool combinations

Sequence Diagram

sequenceDiagram
    participant User
    participant DataDesigner
    participant LLM
    participant QualityFilter
    participant NeMoGym

    User->>DataDesigner: Load tool schemas & seed data
    DataDesigner->>LLM: Generate user query from pattern
    LLM-->>DataDesigner: Return user query
    DataDesigner->>LLM: Judge user query (feasibility, schema compliance)
    LLM-->>DataDesigner: Return query scores
    DataDesigner->>LLM: Generate trajectory (tool calls)
    LLM-->>DataDesigner: Return agent trajectory
    DataDesigner->>LLM: Judge trajectory (tool validity, completeness)
    LLM-->>DataDesigner: Return trajectory scores
    DataDesigner->>QualityFilter: Filter by dual-level scores
    QualityFilter->>QualityFilter: Stage 1: Validate query
    QualityFilter->>QualityFilter: Stage 2: Validate trajectory
    QualityFilter-->>User: Return filtered dataset
    User->>NeMoGym: Convert to NeMo Gym JSONL format
    NeMoGym-->>User: Training data ready for RL

_{Last reviewed commit: eb2b52b}

…olcalling.ipynb Signed-off-by: Shashank Verma <shashankv@nvidia.com>

- quality_filtering.py: remove FilterThresholds dataclass, quickstart print, and verbose ASCII output; de-duplicate show_rejection_reasons (285 → 85 lines) - convert_to_nemo_gym_format.py: remove factory pattern and quickstart print (108 → 76 lines) - __init__.py: export only 4 functions (24 → 10 lines) - Notebook: fix imports to use data_designer.config/interface instead of removed essentials module; use functools.partial for converter Signed-off-by: Shashank Verma <shashankv@nvidia.com>

greptile-apps

_{11 files reviewed, 3 comments}

_{Edit Code Review Agent Settings | Greptile}

greptile-apps · 2026-02-12T23:54:49Z

docs/colab_notebooks/5-multistep-toolcalling/utils/__init__.py

@@ -0,0 +1,9 @@
+from .quality_filtering import filter_high_quality, show_rejection_reasons


Missing NVIDIA license headers. Per AGENTS.md, all Python files must include:

Suggested change

from .quality_filtering import filter_high_quality, show_rejection_reasons

# SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.

# SPDX-License-Identifier: Apache-2.0

from .quality_filtering import filter_high_quality, show_rejection_reasons

Context Used: Context from dashboard - AGENTS.md (source)

Prompt To Fix With AI

This is a comment left during a code review. Path: docs/colab_notebooks/5-multistep-toolcalling/utils/__init__.py Line: 1:1 Comment: Missing NVIDIA license headers. Per AGENTS.md, all Python files must include: ```suggestion # SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved. # SPDX-License-Identifier: Apache-2.0 from .quality_filtering import filter_high_quality, show_rejection_reasons ``` **Context Used:** Context from `dashboard` - AGENTS.md ([source](https://app.greptile.com/review/custom-context?memory=95f8243f-5118-40bc-a3ab-69210b72e57e)) How can I resolve this? If you propose a fix, please make it concise.

greptile-apps · 2026-02-12T23:54:50Z

docs/colab_notebooks/5-multistep-toolcalling/utils/convert_to_nemo_gym_format.py

@@ -0,0 +1,75 @@
+"""Utilities for converting generated records to NeMo Gym JSONL format."""


Missing NVIDIA license headers. Per AGENTS.md, all Python files must include:

Suggested change

"""Utilities for converting generated records to NeMo Gym JSONL format."""

# SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.

# SPDX-License-Identifier: Apache-2.0

"""Utilities for converting generated records to NeMo Gym JSONL format."""

Context Used: Context from dashboard - AGENTS.md (source)

Prompt To Fix With AI

This is a comment left during a code review. Path: docs/colab_notebooks/5-multistep-toolcalling/utils/convert_to_nemo_gym_format.py Line: 1:1 Comment: Missing NVIDIA license headers. Per AGENTS.md, all Python files must include: ```suggestion # SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved. # SPDX-License-Identifier: Apache-2.0 """Utilities for converting generated records to NeMo Gym JSONL format.""" ``` **Context Used:** Context from `dashboard` - AGENTS.md ([source](https://app.greptile.com/review/custom-context?memory=95f8243f-5118-40bc-a3ab-69210b72e57e)) How can I resolve this? If you propose a fix, please make it concise.

greptile-apps · 2026-02-12T23:54:51Z

docs/colab_notebooks/5-multistep-toolcalling/utils/quality_filtering.py

@@ -0,0 +1,86 @@
+"""Utilities for dual-level quality filtering of generated datasets."""


Missing NVIDIA license headers. Per AGENTS.md, all Python files must include:

Suggested change

"""Utilities for dual-level quality filtering of generated datasets."""

# SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.

# SPDX-License-Identifier: Apache-2.0

"""Utilities for dual-level quality filtering of generated datasets."""

Context Used: Context from dashboard - AGENTS.md (source)

Prompt To Fix With AI

This is a comment left during a code review. Path: docs/colab_notebooks/5-multistep-toolcalling/utils/quality_filtering.py Line: 1:1 Comment: Missing NVIDIA license headers. Per AGENTS.md, all Python files must include: ```suggestion # SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved. # SPDX-License-Identifier: Apache-2.0 """Utilities for dual-level quality filtering of generated datasets.""" ``` **Context Used:** Context from `dashboard` - AGENTS.md ([source](https://app.greptile.com/review/custom-context?memory=95f8243f-5118-40bc-a3ab-69210b72e57e)) How can I resolve this? If you propose a fix, please make it concise.

shashank3959 requested a review from a team as a code owner February 12, 2026 22:56

shashank3959 added 2 commits February 12, 2026 14:59

Rename folder to 5-multistep-toolcalling and notebook to multistep-to…

f8903fc

…olcalling.ipynb Signed-off-by: Shashank Verma <shashankv@nvidia.com>

shashank3959 force-pushed the dev/multistep-toolcalling-sdg branch from 70ab957 to eb2b52b Compare February 12, 2026 23:51

greptile-apps bot reviewed Feb 12, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add multi-step tool-calling SDG tutorial for workplace assistant #327

Add multi-step tool-calling SDG tutorial for workplace assistant #327

shashank3959 commented Feb 12, 2026

Uh oh!

github-actions bot commented Feb 12, 2026

Uh oh!

greptile-apps bot commented Feb 12, 2026 •

edited

Loading

Confidence Score: 4/5

Sequence Diagram

Uh oh!

greptile-apps bot left a comment

Uh oh!

greptile-apps bot Feb 12, 2026

Uh oh!

greptile-apps bot Feb 12, 2026

Uh oh!

greptile-apps bot Feb 12, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

		@@ -0,0 +1,9 @@
		from .quality_filtering import filter_high_quality, show_rejection_reasons

		@@ -0,0 +1,75 @@
		"""Utilities for converting generated records to NeMo Gym JSONL format."""

		@@ -0,0 +1,86 @@
		"""Utilities for dual-level quality filtering of generated datasets."""

Add multi-step tool-calling SDG tutorial for workplace assistant #327

Are you sure you want to change the base?

Add multi-step tool-calling SDG tutorial for workplace assistant #327

Conversation

shashank3959 commented Feb 12, 2026

Uh oh!

github-actions bot commented Feb 12, 2026

Uh oh!

greptile-apps bot commented Feb 12, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Greptile Overview

Greptile Summary

Confidence Score: 4/5

Important Files Changed

Sequence Diagram

Uh oh!

greptile-apps bot left a comment

Choose a reason for hiding this comment

Uh oh!

greptile-apps bot Feb 12, 2026

Choose a reason for hiding this comment

Uh oh!

greptile-apps bot Feb 12, 2026

Choose a reason for hiding this comment

Uh oh!

greptile-apps bot Feb 12, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

greptile-apps bot commented Feb 12, 2026 •

edited

Loading