
@pan-x-c pan-x-c commented Feb 11, 2026

Description

Adds a trinity log command for tracking logs during training.

Checklist

Please check the following items before the code is ready to be reviewed.

  • Code has passed all tests
  • Docstrings have been added/updated in Google Style
  • Documentation has been updated
  • Code is ready for review

@gemini-code-assist

Summary of Changes

Hello @pan-x-c, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request adds a log command to the trinity CLI for real-time monitoring of experiment logs. The command helps users track training progress and debug issues, with options for filtering, searching, and displaying log data. The changes also include a minor refactoring of the configuration handling that centralizes checkpoint job directory path generation.

Highlights

  • New CLI Command: Introduced a new trinity log command for real-time monitoring of experiment logs, offering options for directory specification, configuration loading, keyword filtering, log level control, displaying last N lines, search patterns, and color output.
  • Log Management System: Added a comprehensive LogManager and LogFileTracker system to efficiently handle multiple log files, including features like file rotation detection, log level parsing, and continuous monitoring.
  • Configuration Refactoring: Refactored the Config class to include a dedicated method get_checkpoint_job_dir, centralizing the logic for determining checkpoint job directory paths and improving code reusability.
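The filtering options listed above can be illustrated with a minimal sketch that combines log level filtering, keyword filtering, and a "last N lines" view. It assumes each log line carries its level as a standalone word such as INFO or WARNING; the names LEVELS, line_level, and filter_log_lines, as well as the sample lines, are hypothetical and are not taken from the PR's LogManager.

```python
import re
from collections import deque
from typing import Iterable, List, Optional

# Hypothetical severity ordering; the PR's LogManager may parse levels differently.
LEVELS = {"DEBUG": 10, "INFO": 20, "WARNING": 30, "ERROR": 40, "CRITICAL": 50}
# Assumption: the level appears as a standalone word somewhere in the line.
LEVEL_RE = re.compile(r"\b(DEBUG|INFO|WARNING|ERROR|CRITICAL)\b")


def line_level(line: str) -> Optional[int]:
    """Return the numeric level found in a line, or None if it has no level token."""
    match = LEVEL_RE.search(line)
    return LEVELS[match.group(1)] if match else None


def filter_log_lines(
    lines: Iterable[str],
    min_level: str = "INFO",
    keyword: Optional[str] = None,
    last_n: Optional[int] = None,
) -> List[str]:
    """Keep lines at or above min_level that contain keyword; optionally only the last N."""
    threshold = LEVELS[min_level]
    # With maxlen set, the deque drops the oldest kept line once it is full.
    kept = deque(maxlen=last_n)
    for line in lines:
        level = line_level(line)
        # Lines without a level token (e.g. traceback continuations) are not level-filtered.
        if level is not None and level < threshold:
            continue
        if keyword is not None and keyword not in line:
            continue
        kept.append(line)
    return list(kept)


# Hypothetical sample lines, for illustration only.
sample = [
    "10:00:00 INFO step 1 done",
    "10:00:01 DEBUG step 1 details",
    "10:00:02 WARNING step 2 slow",
    "10:00:03 ERROR step 3 failed",
]
print(filter_log_lines(sample, min_level="WARNING", keyword="step", last_n=1))
# ['10:00:03 ERROR step 3 failed']
```

The bounded deque keeps memory proportional to N rather than to the number of matching lines, which matters when following long training logs.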
Changelog
  • trinity/cli/launcher.py
    • Added a new log command to the typer application, enabling real-time log monitoring with various configurable options.
  • trinity/common/config.py
    • Introduced a new method get_checkpoint_job_dir to the Config class for consistent retrieval of checkpoint job directory paths.
  • trinity/common/config_validator.py
    • Updated the validate method to utilize the newly added get_checkpoint_job_dir method for setting the checkpoint job directory.
  • trinity/manager/log_manager.py
    • Added a new module containing LogManager and LogFileTracker classes, providing core functionality for real-time log file tracking, parsing, and display.
Activity
  • No human activity has been recorded on this pull request yet.


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces a new trinity log command to monitor experiment logs in real-time. The core logic is encapsulated in a new LogManager. The changes also include a small refactoring in the Config class to centralize the construction of the checkpoint job directory path.

My review focuses on improving robustness and performance. I've identified a potential high-memory usage issue in the log searching feature, a case of silent exception handling that could hide bugs, and an opportunity to simplify some redundant code in the new CLI command. Overall, this is a great addition for improving observability during training.

    f"{Colors.CYAN}[INFO] Searching for pattern '{self.search_pattern}' in {self.filepath}{Colors.RESET}"
)
self.file.seek(0)
lines = self.file.readlines()

high

Reading the entire file into memory with self.file.readlines() can lead to high memory consumption and potentially a MemoryError if the log file is very large.

To make this more memory-efficient, consider processing the file line-by-line. You could use a collections.deque with a fixed size to maintain a sliding window of lines, which would allow you to print context around a match without loading the entire file.
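A minimal sketch of the deque-based approach suggested above, assuming the search should show a fixed number of context lines before and after each match. The function name search_with_context and its context parameter are hypothetical, not part of the PR; the file argument stands in for an open text file such as self.file after self.file.seek(0).

```python
import io
import re
from collections import deque
from typing import Iterator, List, TextIO, Tuple


def search_with_context(
    file: TextIO, pattern: str, context: int = 2
) -> Iterator[Tuple[int, List[str]]]:
    """Yield (line_number, block) for each line matching pattern.

    block holds up to `context` lines before the match, the match itself, and up to
    `context` lines after it. The file is streamed line by line, so memory use is
    bounded by the context window and the number of in-flight matches, not by file size.
    """
    regex = re.compile(pattern)
    before = deque(maxlen=context)  # sliding window of the most recent preceding lines
    pending: List[Tuple[int, List[str], int]] = []  # (line_no, block, trailing lines still needed)
    for lineno, raw in enumerate(file, start=1):
        line = raw.rstrip("\n")
        # Give the current line to every earlier match still collecting trailing context.
        still_pending = []
        for match_no, block, remaining in pending:
            block.append(line)
            if remaining > 1:
                still_pending.append((match_no, block, remaining - 1))
            else:
                yield match_no, block
        pending = still_pending
        if regex.search(line):
            new_block = list(before) + [line]
            if context > 0:
                pending.append((lineno, new_block, context))
            else:
                yield lineno, new_block
        before.append(line)
    # Matches near the end of the file may have fewer trailing lines than requested.
    for match_no, block, _ in pending:
        yield match_no, block


# Usage with an in-memory file standing in for the tracker's open log file.
sample = io.StringIO("a\nb\nERROR x\nc\nd\ne\n")
for lineno, block in search_with_context(sample, r"ERROR", context=1):
    print(lineno, block)
# 3 ['b', 'ERROR x', 'c']
```

In this sketch, blocks for matches that fall within each other's context window overlap; a fuller implementation might merge them into one contiguous group, as grep -C does.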

Comment on lines 417 to 426
if not log_dir:
    cfg = load_config(config)
    checkpoint_job_dir = cfg.get_checkpoint_job_dir()
    # we do not use check_and_update here because user may use this command
    # in another environment
    log_dir = os.path.join(checkpoint_job_dir, "log")
    if not os.path.exists(log_dir):
        raise FileNotFoundError(f"Log directory not found: {log_dir}")
if not os.path.exists(log_dir):
    raise FileNotFoundError(f"Log directory not found: {log_dir}")

medium

The logic for determining and checking log_dir can be simplified. The os.path.exists(log_dir) check is performed twice in the case where log_dir is derived from the config. You can consolidate this into a single check after log_dir has been determined.

Suggested change
if not log_dir:
    cfg = load_config(config)
    checkpoint_job_dir = cfg.get_checkpoint_job_dir()
    # we do not use check_and_update here because user may use this command
    # in another environment
    log_dir = os.path.join(checkpoint_job_dir, "log")
if not os.path.exists(log_dir):
    raise FileNotFoundError(f"Log directory not found: {log_dir}")

Comment on lines 118 to 119
except Exception:
    return False

medium

This broad except Exception: block silently catches all exceptions and returns False. This can hide potential bugs or unexpected issues during file stat checks (e.g., permission errors). It's better to log the exception to aid in debugging, similar to how other exception blocks in this class are handled.

Suggested change
except Exception as e:
    print(f"{Colors.RED}[ERROR] Error checking file rotation for {self.filepath}: {e}{Colors.RESET}")
    return False
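As a sketch of the same idea taken one step further, the rotation check below narrows the broad except Exception: to OSError and reports unexpected failures instead of discarding them. It assumes rotation shows up either as a new inode at the same path or as a file that has shrunk below the current read offset. The class RotationChecker and its attributes are hypothetical, and it prints to stderr rather than using the PR's Colors helper.

```python
import os
import sys
import tempfile


class RotationChecker:
    """Hypothetical helper showing rotation detection with explicit error reporting.

    Assumption: rotation means either the path now points to a different inode
    (file replaced) or the file is shorter than what has already been read
    (file truncated). The PR's LogFileTracker may use different criteria.
    """

    def __init__(self, filepath: str) -> None:
        self.filepath = filepath
        self.inode = os.stat(filepath).st_ino
        self.position = 0  # bytes already consumed by the reader

    def is_rotated(self) -> bool:
        try:
            stat = os.stat(self.filepath)
        except FileNotFoundError:
            # Old file moved away, new one not created yet: expected during rotation, retry later.
            return False
        except OSError as e:
            # Permission errors and similar failures are reported instead of silently swallowed.
            print(f"[ERROR] Error checking file rotation for {self.filepath}: {e}", file=sys.stderr)
            return False
        return stat.st_ino != self.inode or stat.st_size < self.position


# Usage: a truncated file is detected because its size drops below the read offset.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "train.log")
    with open(path, "w") as f:
        f.write("step 1\nstep 2\n")
    checker = RotationChecker(path)
    checker.position = os.path.getsize(path)  # pretend everything has been read
    print(checker.is_rotated())  # False
    open(path, "w").close()  # truncate in place
    print(checker.is_rotated())  # True
```

Treating a missing file separately from other OSError cases keeps the normal rotation window quiet while still surfacing permission problems, which is the kind of issue the broad except currently hides.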
