Conversation

@saewoni saewoni commented Jan 28, 2026

… prevent race condition

What this PR does / why we need it:
Immediately after networkctl reload, DNS settings may not have propagated from systemd-networkd (via DHCP) to systemd-resolved yet. As a result, /run/systemd/resolve/resolv.conf can still reflect the previous upstream DNS servers when replace_azurednsip_in_corefile runs.

This happens because networkctl reload only triggers a reload request over D-Bus; it does not wait for systemd-networkd to finish reprocessing configuration, re-acquire DHCP leases, or update systemd-resolved.

Which issue(s) this PR fixes:

Fixes #
to test: shellspec --shell bash --format d spec/parts/linux/cloud-init/artifacts/localdns_spec.sh

Copilot AI review requested due to automatic review settings January 28, 2026 23:09
Contributor

Copilot AI left a comment

Pull request overview

This PR fixes a race condition that occurs after calling networkctl reload to update DNS configuration. Previously, the code would proceed immediately after the reload command without waiting for systemd-resolved to actually update the /run/systemd/resolve/resolv.conf file, potentially causing subsequent operations to work with stale DNS information.

Changes:

  • Added wait_for_dns_config_applied() function that polls resolv.conf to verify DNS configuration changes have been applied
  • Integrated the wait function after both networkctl reload calls to ensure DNS changes are complete before proceeding
  • Added comprehensive test coverage for the new function
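
The polling approach described in the list above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the argument order follows the documented signature (`expected_dns_ip`, `should_contain`, `max_wait_seconds`), while the one-second poll interval, the whole-address matching, and the wording of the error message are assumptions.

```shell
#!/usr/bin/env bash
# Illustrative sketch only; the real implementation lives in localdns.sh.
# RESOLV_CONF can be overridden (for example in tests); the default is the
# file that systemd-resolved maintains.
RESOLV_CONF="${RESOLV_CONF:-/run/systemd/resolve/resolv.conf}"

wait_for_dns_config_applied() {
    local expected_dns_ip="$1"
    local should_contain="${2:-true}"
    local max_wait_seconds="${3:-10}"
    local elapsed=0
    local current_dns found

    while :; do
        # Collect all nameserver addresses on a single space-separated line.
        current_dns=$(awk '/^nameserver/ {print $2}' "$RESOLV_CONF" 2>/dev/null | paste -sd' ' -)

        # Pad with spaces so that 10.0.0.1 does not match 10.0.0.10.
        found=false
        case " ${current_dns} " in
            *" ${expected_dns_ip} "*) found=true ;;
        esac

        if [ "$found" = "$should_contain" ]; then
            return 0
        fi

        if [ "$elapsed" -ge "$max_wait_seconds" ]; then
            echo "Error: timed out after ${max_wait_seconds}s waiting for ${RESOLV_CONF} (current nameservers: ${current_dns})" >&2
            return 1
        fi

        sleep 1
        elapsed=$((elapsed + 1))
    done
}
```

Under these assumptions, a caller would run `networkctl reload` and then, for example, `wait_for_dns_config_applied "$LOCALDNS_NODE_LISTENER_IP" false 10` to block until the localdns listener address has disappeared from resolv.conf, treating a non-zero return as a failure.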

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| parts/linux/cloud-init/artifacts/localdns.sh | Implements the new wait_for_dns_config_applied() function and integrates it after networkctl reload calls in disable_dhcp_use_clusterlistener and cleanup_iptables_and_dns |
| spec/parts/linux/cloud-init/artifacts/localdns_spec.sh | Adds comprehensive test coverage for wait_for_dns_config_applied with tests for success cases, timeout cases, edge cases, and partial IP matching |

@saewoni saewoni marked this pull request as ready for review January 29, 2026 00:46
@saewoni saewoni changed the title fix(localdns): wait for resolv.conf update after networkctl reload to… fix(localdns): wait for resolv.conf update after networkctl reload to prevent race condition Jan 29, 2026
Update log messages to use Error: prefix when wait_for_dns_config_applied fails, since these are failure conditions (return 1), not warnings. Updated corresponding test assertion.
Copilot AI review requested due to automatic review settings January 29, 2026 21:23
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

Copilot AI review requested due to automatic review settings January 29, 2026 22:11
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated no new comments.

# Arguments:
# $1: expected_dns_ip - The DNS IP that should appear in resolv.conf.
# $2: should_contain - "true" if the IP should be present, "false" if it should be absent.
# $3: max_wait_seconds - Maximum time to wait for the change (default: 10).
Member

Is a 10-second wait necessary?

}

# Disable DNS provided by DHCP and point the system at localdns.
disable_dhcp_use_clusterlistener() {
Member

do not wait inside here

Member

wait right before replacing vnet dns ip

Member

@yewmsft yewmsft Jan 30, 2026

call wait before here

# Replace AzureDNSIP in corefile with VNET DNS ServerIPs.
# ---------------------------------------------------------------------------------------------------------------------
replace_azurednsip_in_corefile || exit $ERR_LOCALDNS_FAIL

local current_dns
current_dns=$(awk '/^nameserver/ {print $2}' "$RESOLV_CONF" 2>/dev/null | paste -sd' ')

if [ "$should_contain" = "true" ]; then
Member

You don't need to do this. You should just check whether the resolv.conf nameserver is still LOCALDNS_NODE_LISTENER_IP. And you only need to wait for this result at startup. Don't wait for it at shutdown.

Member

Shutdown can just leave DHCP running asynchronously, because you don't need to read the value from resolv.conf when shutting down.
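
The simpler check suggested in this thread could look something like the sketch below. It is an illustration of the reviewer's idea, not code from the PR: the function name `wait_for_upstream_dns`, the 5-second default, the one-second poll interval, and the placeholder default for `LOCALDNS_NODE_LISTENER_IP` are all assumptions (the real script defines its own value).

```shell
#!/usr/bin/env bash
# Illustrative sketch only. The defaults below are placeholders.
RESOLV_CONF="${RESOLV_CONF:-/run/systemd/resolve/resolv.conf}"
LOCALDNS_NODE_LISTENER_IP="${LOCALDNS_NODE_LISTENER_IP:-169.254.10.10}"  # assumed value

# Returns 0 once no nameserver line in resolv.conf equals the localdns
# listener IP, or 1 if that is still the case after max_wait_seconds.
# Note: a missing or empty file also counts as "no longer pointing at localdns".
wait_for_upstream_dns() {  # hypothetical helper name
    local max_wait_seconds="${1:-5}"
    local elapsed=0

    # grep -x matches the whole line, -F treats the IP as a literal string.
    while awk '/^nameserver/ {print $2}' "$RESOLV_CONF" 2>/dev/null \
            | grep -qxF "$LOCALDNS_NODE_LISTENER_IP"; do
        if [ "$elapsed" -ge "$max_wait_seconds" ]; then
            return 1
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    return 0
}
```

In this sketch the helper would be called only on the startup path, immediately before `replace_azurednsip_in_corefile`; the shutdown path would call `networkctl reload` and return without waiting.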

3 participants