From a37af540a5a52bce056b0032f8d5c3fbb7b4186f Mon Sep 17 00:00:00 2001
From: Ed Santiago
Date: Mon, 8 May 2023 09:31:11 -0600
Subject: [PATCH] Disable systemd resolved

Fixes flake: 'dial tcp: lookup cdn03.quay.io: no such host'

Okay, doesn't actually fix as in _fix_, just fix as in "sweep
it under the rug". The actual bug is in systemd-resolved, or
in the quay.io/cloudflare.net DNS nameservers, or in the weird
specific setup for cdn03 (it's a CNAME, compared to cdn01/02
which are A). Maybe a combination of all of the above. I don't
care; I just want the flakes gone.

I realize that this makes our testing environment different from
default Fedora, and am okay with that because I suspect many
Fedora users disable systemd-resolved as SOP.

Signed-off-by: Ed Santiago
---
 .../files/disable_systemd_resolved.sh         | 41 +++++++++++++++++++
 .../disable_systemd_resolved/tasks/main.yml   |  3 ++
 tests/test_podman.yml                         |  1 +
 3 files changed, 45 insertions(+)
 create mode 100755 tests/roles/disable_systemd_resolved/files/disable_systemd_resolved.sh
 create mode 100644 tests/roles/disable_systemd_resolved/tasks/main.yml

diff --git a/tests/roles/disable_systemd_resolved/files/disable_systemd_resolved.sh b/tests/roles/disable_systemd_resolved/files/disable_systemd_resolved.sh
new file mode 100755
index 0000000..f5cb4da
--- /dev/null
+++ b/tests/roles/disable_systemd_resolved/files/disable_systemd_resolved.sh
@@ -0,0 +1,41 @@
+#!/bin/bash
+#
+# Excerpted from https://github.com/containers/automation_images/blob/main/systemd_banish.sh
+#
+# Early 2023: https://github.com/containers/podman/issues/16973
+#
+# We see countless instances of "lookup cdn03.quay.io" flakes.
+# Disabling the systemd resolver has completely resolved those,
+# from multiple flakes per day to zero in a month.
+#
+# Opinions differ on the merits of systemd-resolve, but the fact is
+# it breaks our CI testing. Kill it.
+nsswitch=/etc/authselect/nsswitch.conf
+if [[ -e $nsswitch ]]; then
+    if grep -q -E 'hosts:.*resolve' $nsswitch; then
+        echo "Disabling systemd-resolved"
+        sed -i -e 's/^\(hosts: *\).*/\1files dns myhostname/' $nsswitch
+        systemctl disable --now systemd-resolved
+        rm -f /etc/resolv.conf
+
+        # NetworkManager may already be running, or it may not....
+        systemctl start NetworkManager
+        sleep 1
+        systemctl restart NetworkManager
+
+        # ...and it may create resolv.conf upon start/restart, or it
+        # may not. Keep restarting until it does. (Yes, I realize
+        # this is cargocult thinking. Don't care. Not worth the effort
+        # to diagnose and solve properly.)
+        retries=10
+        while ! test -e /etc/resolv.conf;do
+            retries=$((retries - 1))
+            if [[ $retries -eq 0 ]]; then
+                echo "Timed out waiting for resolv.conf; continuing, expect failures." >&2
+                break
+            fi
+            systemctl restart NetworkManager
+            sleep 5
+        done
+    fi
+fi
diff --git a/tests/roles/disable_systemd_resolved/tasks/main.yml b/tests/roles/disable_systemd_resolved/tasks/main.yml
new file mode 100644
index 0000000..7c77540
--- /dev/null
+++ b/tests/roles/disable_systemd_resolved/tasks/main.yml
@@ -0,0 +1,3 @@
+---
+- name: disable systemd resolved
+  script: ./disable_systemd_resolved.sh
diff --git a/tests/test_podman.yml b/tests/test_podman.yml
index 9674dea..0b4c41b 100644
--- a/tests/test_podman.yml
+++ b/tests/test_podman.yml
@@ -7,6 +7,7 @@
   - artifacts: ./artifacts
     rootless_user: testuser
   roles:
+  - role: disable_systemd_resolved
   - role: rootless_user_ready
   tasks: