
Contributor

@ggoklani ggoklani commented Jan 27, 2026

Enhancement:
Added a new bash script (files/tests/test-moneo.sh) for Moneo installation verification and functional testing. The script automates validation of a Moneo GPU monitoring deployment on HPC systems.
Key features include:
  • Installation verification tests (Moneo directory exists, main script moneo.py is present)
  • Functional tests:
      • Full Moneo deployment using a hostfile
      • Grafana service health check (port 3000)
      • Prometheus service health check (port 9090)
      • Moneo shutdown
  • SELinux enforcement handling (setenforce 0 at start, setenforce 1 at end)
  • Automatic hostfile detection and creation
  • 60-second wait for services to start after deployment
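The Grafana and Prometheus checks, including the "is stopped" checks in the sample output, come down to telling an HTTP answer apart from a refused connection. A minimal sketch, assuming curl is installed and the services listen on localhost; the function names are hypothetical and not taken from the PR's script:

```shell
#!/usr/bin/env bash
# Hypothetical helpers: classify a service as "up" or "down" from curl's
# exit status and the HTTP status code it reported.

# classify_http_state RC CODE
#   RC   - curl exit status (7 means it could not connect, e.g. refused)
#   CODE - value printed by curl -w '%{http_code}' ("000" if no response)
classify_http_state() {
    local rc=$1 code=$2
    local re='^[1-5][0-9]{2}$'
    if [[ $rc -eq 0 && $code =~ $re ]]; then
        echo "up"        # any HTTP answer (200, 302, ...) means the port is served
    elif [[ $rc -eq 7 ]]; then
        echo "down"      # connection refused: nothing is listening
    else
        echo "unknown"   # timeouts, DNS errors, etc.
    fi
}

# check_http NAME PORT -> prints the state; returns 0 only when "up"
check_http() {
    local name=$1 port=$2 code rc state
    code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 5 "http://localhost:${port}/")
    rc=$?
    state=$(classify_http_state "$rc" "$code")
    echo "${name}: ${state} (curl rc=${rc}, HTTP ${code})"
    [[ $state == up ]]
}

# Usage: check_http Grafana 3000; check_http Prometheus 9090
```

curl exits with status 7 when it cannot connect, and without -f it exits 0 for any HTTP response, which is why a 302 redirect counts as "up" here.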
Reason:
To provide a lightweight testing solution, with no Python dependencies of its own, for validating Moneo installations in CI/CD pipelines and during manual verification. A bash implementation integrates natively with the shell on HPC systems, where minimal tooling is preferred.
Result:
Automated verification of Moneo installation, configuration, and functionality.

Prerequisites to run the test script:

  1. sudo yum install -y docker pssh
  2. Set up passwordless SSH between the nodes.
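These prerequisites could be checked up front so the script fails early with a clear message. A sketch; missing_commands and ssh_is_passwordless are hypothetical helpers, and the hostfile name in the usage comment is an assumption:

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check for the prerequisites listed above.

# missing_commands CMD... -> prints every command not found in PATH,
# returns 1 if any are missing
missing_commands() {
    local cmd missing=()
    for cmd in "$@"; do
        command -v "$cmd" >/dev/null 2>&1 || missing+=("$cmd")
    done
    if (( ${#missing[@]} > 0 )); then
        echo "${missing[*]}"
        return 1
    fi
    return 0
}

# ssh_is_passwordless HOST -> succeeds only if HOST accepts a login without
# prompting (BatchMode makes ssh fail instead of asking for a password).
ssh_is_passwordless() {
    ssh -o BatchMode=yes -o ConnectTimeout=5 "$1" true >/dev/null 2>&1
}

# Usage:
#   if ! m=$(missing_commands docker pssh); then
#       echo "[SETUP] Missing: $m"; exit 1
#   fi
#   while read -r host; do
#       ssh_is_passwordless "$host" || echo "[SETUP] No passwordless SSH to $host"
#   done < hostfile
```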

Sample output:
./test-moneo.sh

Moneo Test Script

[SETUP] Disabling SELinux...
[SETUP] Configuring container registries...
[SETUP] Creating temporary hostfile...

==============================================
Running Tests

Testing: Moneo directory exists
[PASS] Moneo directory exists at /opt/hpc/azure/tools/Moneo
Testing: Moneo script exists
[PASS] moneo.py exists
Testing: Moneo deployment
[PASS] Moneo deployed successfully
Testing: Grafana is running
Waiting 60 seconds for services to start...
[PASS] Grafana is running (HTTP 302)
Testing: Prometheus is running
[PASS] Prometheus is running (HTTP 302)
Testing: Moneo shutdown
[PASS] Moneo shutdown completed
Testing: Grafana is stopped
[PASS] Grafana is stopped (connection refused)
Testing: Prometheus is stopped
[PASS] Prometheus is stopped (connection refused)

==============================================
All tests passed: 8

[CLEANUP] Removing temporary hostfile...
[CLEANUP] Re-enabling SELinux...
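Output in this shape can come from a small pair of counters plus a summary that sets the exit status, which is what lets a CI job fail on a broken installation. A sketch under that assumption; pass, fail, run_test and summary are hypothetical names, not necessarily those used in test-moneo.sh:

```shell
#!/usr/bin/env bash
# Hypothetical result-tracking helpers producing "[PASS] ..." lines and a
# final "All tests passed: N" summary like the sample output above.

PASS_COUNT=0
FAIL_COUNT=0

pass() { echo "[PASS] $*"; PASS_COUNT=$((PASS_COUNT + 1)); }
fail() { echo "[FAIL] $*"; FAIL_COUNT=$((FAIL_COUNT + 1)); }

# run_test DESCRIPTION COMMAND... -> runs COMMAND and records the outcome
run_test() {
    local desc=$1; shift
    echo "Testing: $desc"
    if "$@"; then pass "$desc"; else fail "$desc"; fi
}

# summary -> prints the totals; returns non-zero if anything failed
summary() {
    echo "=============================================="
    if (( FAIL_COUNT == 0 )); then
        echo "All tests passed: $PASS_COUNT"
    else
        echo "Passed: $PASS_COUNT, Failed: $FAIL_COUNT"
        return 1
    fi
}

# Usage:
#   run_test "Moneo directory exists" test -d "$MONEO_HOME"
#   summary; exit $?
```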

Issue Tracker Tickets (Jira or BZ if any):
https://issues.redhat.com/browse/RHELHPC-125

Summary by Sourcery

Add automated bash-based verification and functional tests for Moneo and align the Moneo installation path with the Azure tools directory while preserving compatibility with existing service configuration scripts.

New Features:

  • Introduce a Moneo installation and functional validation script (test-moneo.sh) that exercises deployment, service health, and shutdown flows.
  • Install the Moneo test script into the Azure HPC tests directory for use in CI and manual verification.

Enhancements:

  • Update the Moneo installation directory to derive from the Azure tools base path instead of a fixed legacy location.
  • Create a compatibility symlink from the legacy Moneo tools path to the actual installation directory when they differ to support existing configure_service.sh usage.

Tests:

  • Add end-to-end bash tests that verify Moneo directory and script presence, deployment, Grafana and Prometheus availability, and shutdown behavior.


sourcery-ai bot commented Jan 27, 2026

Reviewer's Guide

Adds a bash-based Moneo installation/functional test script and wires it into the Azure HPC role while updating Moneo install paths and maintaining backward compatibility via a legacy symlink.

Sequence diagram for running the new Moneo validation test script

sequenceDiagram
  actor Admin
  participant TestScript as test_moneo_sh
  participant MoneoInstaller as configure_service_sh
  participant Grafana as Grafana_service
  participant Prometheus as Prometheus_service
  participant GPUDriver as nvidia_smi

  Admin->>TestScript: Invoke with CLI options
  TestScript->>TestScript: Parse options (verbose, skip_functional, json_output)
  TestScript->>TestScript: setenforce 0 (temporarily disable SELinux)

  TestScript->>TestScript: Verify Moneo installation directory (__hpc_moneo_install_dir)
  TestScript->>TestScript: Verify Moneo main script presence
  TestScript->>TestScript: Verify bashrc alias configuration

  alt Functional tests not skipped
    TestScript->>MoneoInstaller: Deploy Moneo with hostfile
    MoneoInstaller-->>TestScript: Deployment status

    TestScript->>TestScript: Wait up to 60 seconds for services

    TestScript->>Grafana: HTTP health check on port 3000
    Grafana-->>TestScript: Health status

    TestScript->>Prometheus: HTTP health check on port 9090
    Prometheus-->>TestScript: Health status

    TestScript->>GPUDriver: Run nvidia-smi for GPU detection
    GPUDriver-->>TestScript: GPU info or error

    TestScript->>MoneoInstaller: Shutdown Moneo deployment
    MoneoInstaller-->>TestScript: Shutdown status
  else Functional tests skipped
    TestScript->>TestScript: Skip functional validation steps
  end

  TestScript->>TestScript: setenforce 1 (restore SELinux)
  TestScript->>Admin: Output color-coded results and optional JSON summary
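The diagram's "Wait up to 60 seconds" step could be a polling loop that returns as soon as a service answers, rather than the fixed 60-second wait described in the PR. A sketch, assuming curl and the ports named above; wait_until and http_answers are hypothetical helpers:

```shell
#!/usr/bin/env bash
# Hypothetical polling wait: retry a probe every INTERVAL seconds until it
# succeeds or TIMEOUT seconds have elapsed, instead of a fixed sleep.

# wait_until TIMEOUT INTERVAL COMMAND... -> 0 once COMMAND succeeds, 1 on timeout
wait_until() {
    local timeout=$1 interval=$2; shift 2
    local deadline=$((SECONDS + timeout))   # SECONDS is bash's elapsed-time counter
    until "$@"; do
        if (( SECONDS >= deadline )); then
            return 1
        fi
        sleep "$interval"
    done
    return 0
}

# http_answers PORT -> succeeds if anything answers HTTP on localhost:PORT
http_answers() {
    curl -s -o /dev/null --max-time 2 "http://localhost:$1/"
}

# Usage:
#   wait_until 60 2 http_answers 3000 || echo "[FAIL] Grafana did not start"
#   wait_until 60 2 http_answers 9090 || echo "[FAIL] Prometheus did not start"
```

A healthy deployment then finishes the wait in a few seconds, while a broken one still fails after the same 60-second budget.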

File-Level Changes

Change: Update the Moneo installation directory to use the centralized Azure tools path while preserving compatibility with existing paths.
Details:
  • Change the Moneo installation directory variable to point at the Azure tools directory prefix instead of the hard-coded legacy path.
  • Add a conditional Ansible block that creates /opt/azurehpc/tools and symlinks it to the new Moneo installation path when the directory differs, to keep configure_service.sh working.
Files: vars/main.yml, tasks/main.yml

Change: Install a new bash-based Moneo validation test script as part of the role.
Details:
  • Add an Ansible task to copy the Moneo test script from the role files into the configured Azure tests directory with executable permissions.
  • Introduce a shell script that performs Moneo installation verification, deployment, Grafana/Prometheus health checks, and shutdown tests, including SELinux handling, dependency installation, and basic hostfile setup.
Files: tasks/main.yml, files/tests/test-moneo.sh
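Because configure_service.sh still relies on the legacy location, the test script could also check that the compatibility symlink resolves to the real install directory. A sketch; same_directory is a hypothetical helper, LEGACY_MONEO_DIR is an assumed variable, and MONEO_HOME keeps the path hardcoded in the PR's script only as a default that the caller can override:

```shell
#!/usr/bin/env bash
# Hypothetical check: does the legacy Moneo path resolve to the actual
# installation directory (directly or through the compatibility symlink)?

# Default mirrors the path hardcoded in the PR's script; an environment
# override lets the caller pass the role's real install directory instead.
MONEO_HOME=${MONEO_HOME:-/opt/hpc/azure/tools/Moneo}

# same_directory A B -> 0 if both paths exist and resolve to the same directory
same_directory() {
    local a b
    a=$(readlink -f -- "$1") || return 1
    b=$(readlink -f -- "$2") || return 1
    [[ -d $a && $a == "$b" ]]
}

# Usage (LEGACY_MONEO_DIR is assumed to be set by the caller):
#   if same_directory "$LEGACY_MONEO_DIR" "$MONEO_HOME"; then
#       echo "[PASS] Legacy Moneo path resolves to $MONEO_HOME"
#   else
#       echo "[FAIL] Legacy Moneo path does not point at $MONEO_HOME"
#   fi
```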


@ggoklani ggoklani force-pushed the test_moneo_tool branch 5 times, most recently from 341c3ea to c830974 January 28, 2026 08:42
@ggoklani ggoklani marked this pull request as ready for review January 28, 2026 09:49
@ggoklani ggoklani requested a review from spetrosi as a code owner January 28, 2026 09:49

@sourcery-ai sourcery-ai bot left a comment


Hey - I've left some high level feedback:

  • The test script hardcodes MONEO_HOME=/opt/hpc/azure/tools/Moneo instead of deriving it from the role variables (e.g. __hpc_moneo_install_dir), which may drift from the actual install path and break when the install prefix changes.
  • In test-moneo.sh the SELinux mode is unconditionally changed with setenforce 0/1; consider detecting the initial SELinux state and restoring it, and guarding these calls for systems without enforcing SELinux to avoid unintended behavior.
  • The test script installs packages and writes /etc/containers/registries.conf.d/99-unqualified-search.conf; making these side effects optional or merging with existing config would reduce the risk of impacting the host’s container setup during test runs.
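For the SELinux point, one option is to record the mode reported by getenforce and restore only what the script changed. A minimal sketch, assuming the script runs as root (so no sudo) and that getenforce/setenforce are present whenever SELinux is installed; the function names are hypothetical. Note that setenforce 0 switches SELinux to permissive mode; it does not disable it.

```shell
#!/usr/bin/env bash
# Hypothetical SELinux handling: remember the initial mode and restore it,
# instead of unconditionally running setenforce 0 / setenforce 1.

ORIG_SELINUX_MODE="Disabled"

selinux_save_and_relax() {
    # No SELinux tooling on this host: nothing to change or restore.
    if ! command -v getenforce >/dev/null 2>&1; then
        ORIG_SELINUX_MODE="Disabled"
        return 0
    fi
    ORIG_SELINUX_MODE=$(getenforce)
    # Only touch the mode when SELinux is actually enforcing.
    if [[ $ORIG_SELINUX_MODE == "Enforcing" ]]; then
        echo "[SETUP] Switching SELinux to permissive for the test run..."
        setenforce 0 || echo "[SETUP] Could not switch SELinux to permissive"
    fi
}

selinux_restore() {
    # Re-enable enforcing only if that is how we found the system.
    if [[ $ORIG_SELINUX_MODE == "Enforcing" ]]; then
        echo "[CLEANUP] Restoring SELinux to enforcing..."
        setenforce 1 || echo "[CLEANUP] Could not restore SELinux"
    fi
}

# Usage: register the restore as an EXIT trap so it also runs if a test aborts.
#   selinux_save_and_relax
#   trap selinux_restore EXIT
```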


@ggoklani ggoklani changed the title feat: Added Testcases for testing moneo tool test: Added Testcases for testing moneo tool Jan 28, 2026

# Disable SELinux
echo "[SETUP] Disabling SELinux..."
sudo setenforce 0 2>/dev/null || echo "[SETUP] Could not disable SELinux"
Contributor


Why does this need to disable selinux?

Contributor Author

@ggoklani ggoklani Jan 29, 2026


moneo.py cannot deploy the Prometheus container because of the SELinux context, so for testing purposes the script disables SELinux at the start and re-enables it at the end.

Contributor


For my information: what is the SELinux issue, e.g. the output of ausearch? If it is something simple, we might use the selinux system role to add a policy for this. Every time someone disables SELinux, Dan Walsh feels a disturbance in the Force.

Contributor Author

@ggoklani ggoklani Jan 30, 2026


Please see the logs below:
CONTAINER ID  IMAGE                             COMMAND               CREATED         STATUS                     PORTS     NAMES
e7996a2c3636  docker.io/prom/prometheus:latest  --storage.tsdb.pa...  12 seconds ago  Exited (2) 11 seconds ago  9090/tcp  prometheus

The container exits with the following error in its logs:
sudo docker logs prometheus
Emulate Docker CLI using podman. Create /etc/containers/nodocker to quiet msg.
time=2026-01-30T05:12:09.929Z level=ERROR source=main.go:654 msg="Error loading config (--config.file=/etc/prometheus/prometheus.yml)" file=/etc/prometheus/prometheus.yml err="open /etc/prometheus/prometheus.yml: permission denied"

This is a permission/SELinux issue when mounting volumes in Podman. The container can't access the config file due to SELinux labeling.
Fix: Add :z or :Z to your volume mount, or disable SELinux labeling:


[aliases]
"prometheus" = "docker.io/prom/prometheus"
EOF
Contributor


Why is this container setup being done here? shouldn't it have been done during moneo package installation?

Contributor Author

@ggoklani ggoklani Jan 29, 2026


The Prometheus image is pulled by its short name, and that pull is done by the Moneo installation script, so this configuration was added to support pulling the container by short name.
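To limit the side effect raised in the review, the drop-in could be written only when it is absent and removed during cleanup only if the script created it. A sketch; REGISTRIES_DIR is a hypothetical override that defaults to the drop-in directory named in the review, the function names are hypothetical, and the alias content mirrors the [aliases] snippet above. Merging into an existing file is not attempted.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: add the short-name alias only if the drop-in is absent,
# and remember whether we created it so cleanup never deletes an admin's file.

REGISTRIES_DIR=${REGISTRIES_DIR:-/etc/containers/registries.conf.d}
ALIAS_NAME=99-unqualified-search.conf
CREATED_ALIAS_FILE=0

add_prometheus_alias() {
    local f="$REGISTRIES_DIR/$ALIAS_NAME"
    if [[ -e $f ]]; then
        echo "[SETUP] $f already exists, leaving it unchanged"
        return 0
    fi
    mkdir -p "$REGISTRIES_DIR" || return 1
    cat > "$f" <<'EOF' || return 1
[aliases]
"prometheus" = "docker.io/prom/prometheus"
EOF
    CREATED_ALIAS_FILE=1
}

remove_prometheus_alias() {
    # Only remove the file if this run created it.
    if (( CREATED_ALIAS_FILE )); then
        rm -f "$REGISTRIES_DIR/$ALIAS_NAME"
        CREATED_ALIAS_FILE=0
    fi
}
```

Alternatively, if the Moneo deployment referenced the image by its fully qualified name (docker.io/prom/prometheus), no short-name alias would be needed, though that change would presumably belong in Moneo rather than in this test script.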

@ggoklani ggoklani requested a review from richm January 29, 2026 08:59
Contributor

richm commented Jan 29, 2026

lgtm but I'll defer to @dgchinner

I notice a few style issues, such as using if echo "$var" | grep somepattern instead of if [[ "$var" =~ somepattern ]], which would avoid forking and piping; otherwise, OK.
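For illustration, the two forms side by side; the function names and patterns are made up. The pipe forks a subshell for echo plus a grep process, while [[ =~ ]] evaluates an extended regular expression inside bash, and its capture groups are available in BASH_REMATCH without calling sed or awk:

```shell
#!/usr/bin/env bash
# Hypothetical example comparing the two styles mentioned in the review.

# Forking style: a subshell for echo plus a separate grep process.
is_enforcing_grep() {
    echo "$1" | grep -q 'Enforcing'
}

# Builtin style: bash matches the ERE itself, no extra processes.
is_enforcing_builtin() {
    [[ $1 =~ Enforcing ]]
}

# Keep non-trivial patterns in a variable and expand it unquoted so bash
# treats it as a regex; capture groups land in BASH_REMATCH.
http_code_from() {
    local re='HTTP ([0-9]{3})'
    [[ $1 =~ $re ]] && echo "${BASH_REMATCH[1]}"
}

# Usage: http_code_from "[PASS] Grafana is running (HTTP 302)"   # prints 302
```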
