OCPBUGS-29894: Check if CRLs are downloaded when determining ready status by rfredette · Pull Request #595 · openshift/router

rfredette · 2024-05-13T19:13:47Z

Require all CRLs to be downloaded before the router can report that it's ready. This prevents forwarding requests to a router until it's ready to handle mTLS.

This fixes OCPBUGS-29894

openshift-ci-robot · 2024-05-13T19:13:52Z

@rfredette: This pull request references Jira Issue OCPBUGS-29894, which is invalid:

expected the bug to target the "4.16.0" version, but no target version was set

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

The bug has been updated to refer to the pull request using the external bug tracker.

rfredette · 2024-05-13T19:19:24Z

/jira refresh

openshift-ci-robot · 2024-05-13T19:19:29Z

@rfredette: This pull request references Jira Issue OCPBUGS-29894, which is valid.

3 validation(s) were run on this bug

bug is open, matching expected state (open)
bug target version (4.16.0) matches configured target version for branch (4.16.0)
bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, POST)

Requesting review from QA contact:
/cc @lihongan

rfredette · 2024-06-05T14:27:32Z

/retest

Miciah · 2024-06-05T15:22:52Z

/assign

openshift-bot · 2024-09-04T01:00:38Z

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

lihongan · 2024-09-04T02:04:16Z

/remove-lifecycle stale

Miciah

This makes a router pod start failing readiness checks if it has outdated CRLs, right?

To fix OCPBUGS-29894, it should be sufficient to fail readiness only for the initial synch, so that startup probes (which use the readiness endpoint) fail until the initial synch is done.

Once the router pod has done the initial synch, we want readiness checks to pass even if refresh fails, for two reasons:

The expectation is to restore the behavior prior to openshift/cluster-ingress-operator#939 and #472, and that behavior was to prevent a router pod from serving traffic until it had CRLs, not to prevent a router pod from serving traffic if it had outdated CRLs.
It is generally less bad to continue using outdated CRLs, rather than to stop serving traffic entirely when refresh fails.

This does make me realize that we need a Prometheus metric and an alert when refresh fails for a prolonged period. Failure to refresh has two nasty implications:

Router pods are using outdated CRLs.
The next rolling update of the router deployment (for an upgrade, configuration change, or whatever reason) could get stuck as presumably the new pods would fail on initial synch.

pkg/router/crl/crl.go

rfredette · 2024-09-17T15:41:14Z

Once the router pod has done the initial synch, we want readiness checks to pass even if refresh fails

Ack, I'll update this so that the CRLs readiness check is only used for the initial sync.

This does make me realize that we need a Prometheus metric and an alert when refresh fails for a prolonged period.

That makes sense, although I think it's out of scope for this bug. I'll open a Jira issue for that.

rfredette · 2024-09-24T16:54:17Z

e2e-upgrade failed during bootstrap.

/test e2e-upgrade

openshift-bot · 2024-12-24T01:00:46Z

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

lihongan · 2024-12-24T01:11:25Z

/remove-lifecycle stale

candita · 2025-02-05T17:28:18Z

/assign @alebedev87

alebedev87

LGTM, just a nit question.

alebedev87 · 2025-02-07T15:59:24Z

pkg/router/crl/crl.go

+	return crlsUpdated
+}
+
+func SetCRLsUpdated(value bool) {


Would it make sense to remove the ability to set crlsUpdated back to false, given that we only want to probe for the fully present CRL list at startup?

Suggested change

-func SetCRLsUpdated(value bool) {
+func SetCRLsUpdated() {

rfredette · 2025-10-07

I think that's reasonable. I've updated this to include that change 👍
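
To make the suggestion concrete, a setter without a parameter gives a one-way latch: callers can mark the CRLs as updated but cannot flip the flag back. The sketch below assumes an `atomic.Bool` for storage and a getter named `CRLsUpdated`. Only `SetCRLsUpdated` and `crlsUpdated` appear in the diff above, so the rest is hypothetical and may differ from the actual change.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// crlsUpdated latches to true once the initial CRL download finishes.
// Using atomic.Bool is an assumption; the PR may guard the flag differently.
var crlsUpdated atomic.Bool

// CRLsUpdated (hypothetical getter name) reports whether the latch is set.
func CRLsUpdated() bool {
	return crlsUpdated.Load()
}

// SetCRLsUpdated sets the latch. It takes no argument, so there is no way
// for a caller to reset the flag to false.
func SetCRLsUpdated() {
	crlsUpdated.Store(true)
}

func main() {
	fmt.Println(CRLsUpdated()) // false until the first call
	SetCRLsUpdated()
	SetCRLsUpdated() // calling it again is harmless
	fmt.Println(CRLsUpdated()) // true
}
```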

openshift-bot · 2025-07-31T09:00:24Z

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

openshift-bot · 2025-08-31T00:30:16Z

Stale issues rot after 30d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle rotten
/remove-lifecycle stale

alebedev87 · 2025-09-01T12:44:06Z

/remove-lifecycle rotten

openshift-ci · 2025-10-07T21:40:31Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please ask for approval from alebedev87. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details

Needs approval from an approver in each of these files:

OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

openshift-ci · 2025-10-08T01:11:20Z

@rfredette: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name	Commit	Details	Required	Rerun command
ci/prow/e2e-aws-serial	`4b7b65f`	link	true	`/test e2e-aws-serial`
ci/prow/e2e-agnostic	`c81119b`	link	true	`/test e2e-agnostic`

openshift-bot · 2026-01-06T09:01:14Z

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

openshift-bot · 2026-02-06T00:30:43Z

Stale issues rot after 30d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle rotten
/remove-lifecycle stale

openshift-ci bot requested review from frobware and gcs278 May 13, 2024 19:15

openshift-ci-robot added the jira/valid-bug label May 13, 2024

openshift-ci-robot removed the jira/invalid-bug label May 13, 2024

openshift-ci bot requested a review from lihongan May 13, 2024 19:19

rfredette mentioned this pull request May 13, 2024

OCPBUGS-29894: Add test verifying that routers without the required CRLs are marked not ready openshift/cluster-ingress-operator#1053

Closed

openshift-ci bot assigned Miciah Jun 5, 2024

openshift-ci bot added the lifecycle/stale label Sep 4, 2024

openshift-ci bot removed the lifecycle/stale label Sep 4, 2024

rfredette force-pushed the ocpbugs-29894 branch from 645c9ea to e6243d4 September 16, 2024 18:16

Miciah reviewed Sep 16, 2024

rfredette force-pushed the ocpbugs-29894 branch from e7b4fc2 to 4b7b65f September 20, 2024 17:44

openshift-ci bot added the lifecycle/stale label Dec 24, 2024

openshift-ci bot removed the lifecycle/stale label Dec 24, 2024

openshift-ci bot assigned alebedev87 Feb 5, 2025

alebedev87 reviewed Feb 7, 2025

openshift-ci bot added the lifecycle/stale label Jul 31, 2025

openshift-ci bot added the lifecycle/rotten label and removed the lifecycle/stale label Aug 31, 2025

openshift-ci bot removed the lifecycle/rotten label Sep 1, 2025

Check if CRLs are downloaded when determining ready status

c81119b

This fixes OCPBUGS-29894

rfredette force-pushed the ocpbugs-29894 branch from 4b7b65f to c81119b October 7, 2025 21:39

openshift-ci bot added the lifecycle/stale label Jan 6, 2026

openshift-ci bot added the lifecycle/rotten label and removed the lifecycle/stale label Feb 6, 2026
