fix: Add waitready command to verify cluster ready by johnramsden · Pull Request #683 · canonical/microceph

johnramsden · 2026-02-27T22:51:49Z

Description

Operators who run commands before the cluster is up can hit unexpected failures, because bootstrap has not finished or microcluster is not yet available. This is particularly problematic in CI and scripting.

Add a waitready subcommand, similar to lxd waitready: https://manpages.debian.org/unstable/lxd/lxd.waitready.1

To confirm the cluster is up, we check that the microcluster daemon is ready and that Ceph is ready (ceph -s).

On failure, for example if the cluster has not been bootstrapped, we get a message like the following:

microceph waitready --timeout 30
Error: ceph not ready: timed out waiting for Ceph to become ready: context deadline exceeded

When running the following, waitready should block until bootstrap completes, and the subsequent status call should succeed:

sudo microceph cluster bootstrap &
sudo microceph waitready
sudo microceph status
[1] 35966
MicroCeph deployment summary:
- microceph (10.56.203.112) Services: mds, mgr, mon Disks: 0

Fixes #653

Type of change

Bug fix (non-breaking change which fixes an issue)

How has this been tested?

Added tests showing that waitready waits and times out before bootstrap, and succeeds after bootstrap.

Contributor checklist

Please check that you have:

self-reviewed the code in this PR
added code comments, particularly in less straightforward areas
checked and added or updated relevant documentation
checked and added or updated relevant release notes
added tests to verify effectiveness of this change

Signed-off-by: John Ramsden <john.ramsden@canonical.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

johnramsden · 2026-02-27T22:52:33Z

One note: I'm not sure whether ceph -s alone is sufficient, or whether there is anything else we want to wait on.

sabaini

Hey @johnramsden thank you, lgtm in general, two comments inline

sabaini · 2026-03-02T10:05:19Z

microceph/ceph/monitor.go

+// It retries every second until success or the context is cancelled/expired.
+func WaitForCephReady(ctx context.Context) error {
+	for {
+		_, err := common.ProcessExec.RunCommand("ceph", "-s")


Hm, a hanging ceph -s could hang this forever. Not sure how often this occurs in practice, but just for robustness this could use the cephRunContext() function and pass in the ctx.

sabaini · 2026-03-02T10:18:18Z

microceph/cmd/microceph/waitready.go

+	}
+
+	ctx := context.Background()
+	if c.flagTimeout > 0 {


Minor nit: should we error out if operators pass in a negative timeout value?

sabaini requested changes Mar 2, 2026

johnramsden wants to merge 1 commit into canonical:main from johnramsden:john/CEPH-1590-wait-ready
