To work around #1999321, we'll disable dnssec validation on the
FreeIPA server when doing an upgrade to Fedora 35 or later.
This sucks but I can't find a better option.
Signed-off-by: Adam Williamson <awilliam@redhat.com>
Using .local is apparently Bad Form because it's reserved for
mDNS. However there doesn't appear to be any particularly Good
Form for what to call a test domain you never want to exist
outside of a closed system, apparently. Sigh. Let's try this.
Includes a bump to disk_ks version because the kickstarts on
that image also need to have this change applied.
Signed-off-by: Adam Williamson <awilliam@redhat.com>
The weird bug turned out to be caused by an internal DNS zone
in the new infra not being signed:
https://pagure.io/fedora-infrastructure/issue/9411
This is now resolved, so we can drop the workaround.
Signed-off-by: Adam Williamson <awilliam@redhat.com>
I noticed today that if we deploy FreeIPA with DNSSEC validation
enabled, dnf can't resolve dl.fedoraproject.org afterwards. That's
a problem: once the server is deployed, we wind up falling through
to random mirrors for metadata and package downloads, which can be
slow and give us old packages. This seems to be why the server
upgrade test on F33 sometimes fails: we get an older FreeIPA
package on upgrade, even though the newer one has been stable for
a week.
It's difficult to pin down exactly where this bug is and fix it.
I've mailed some folks to try and work it out, but until that's
figured out, let's just disable DNSSEC validation.
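As a rough illustration of what disabling DNSSEC validation means in practice, here is a Python sketch (the openQA tests themselves are Perl) that flips BIND's `dnssec-validation` option to `no` in a named options fragment. It assumes the FreeIPA-managed named config contains a `dnssec-validation yes;` or `dnssec-validation auto;` line; the helper name is hypothetical, and the exact file the test edits is not named here. named would still need restarting for the change to take effect.

```python
import re

# Matches e.g. "dnssec-validation yes;" or "dnssec-validation auto;",
# keeping the original spacing around the value.
_DNSSEC_RE = re.compile(r"(dnssec-validation\s+)(yes|auto)(\s*;)")


def disable_dnssec_validation(conf_text: str) -> str:
    """Return conf_text with every enabled dnssec-validation set to 'no'."""
    return _DNSSEC_RE.sub(r"\1no\3", conf_text)


if __name__ == "__main__":
    sample = "options {\n    dnssec-validation yes;\n};\n"
    print(disable_dnssec_validation(sample))
```

Editing the text rather than appending a second directive avoids named rejecting a duplicated option.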
Signed-off-by: Adam Williamson <awilliam@redhat.com>
This does some of the things suggested by cheimes in
https://bugzilla.redhat.com/show_bug.cgi?id=1880628#c24 . It
seems to make the replica tests work with resolved, still work
with pre-F33 resolving, and not break anything. Also remove the
workaround to disable resolved if it's running, as we can now
work with it.
Signed-off-by: Adam Williamson <awilliam@redhat.com>
Having systemd-resolved in use seems to cause problems for
FreeIPA servers:
https://bugzilla.redhat.com/show_bug.cgi?id=1880628
Until the scripts are enhanced to cope with it, let's disable it
before server/replica deployment.
Signed-off-by: Adam Williamson <awilliam@redhat.com>
This was necessary for debugging the FreeIPA 4.8 pre-release
update bug, so let's have it for all runs, just in case.
Signed-off-by: Adam Williamson <awilliam@redhat.com>
This adds a set of jobs to test FreeIPA replication. We deploy
a server, deploy a replica of that server, then enrol a client
against the replica and run the client tests.
At first I was planning to add the replica testing into the
main set of FreeIPA tests, but the test ordering/blocking (via
mutexes and barriers and what-have-you) just turns into a big
nightmare that way. This way seems rather simpler to deal with.
Signed-off-by: Adam Williamson <awilliam@redhat.com>
We'd really like to know if FreeIPA is working aside from this
crasher bug, so let's work around it.
Signed-off-by: Adam Williamson <awilliam@redhat.com>
We kind of want to know if FreeIPA is working aside from this
known bug, so let's treat it as a soft failure and work around
it. But only for Rawhide, not for F27/F28 updates tests.
Signed-off-by: Adam Williamson <awilliam@redhat.com>
Rolekit is going away. At least for the F29 cycle, though, we
still want to test basically the same functionality. This ports
the 'domain controller role' test to use ipa-server-install
directly rather than rolectl.
Signed-off-by: Adam Williamson <awilliam@redhat.com>
Modular composes don't include these packages, but we need them
to run the web UI tests for FreeIPA and Cockpit. This is the
most reasonable hack I can come up with for now: just use a
non-modular Fedora repo to source these packages when doing
Modular compose testing.
If we ever reach an all-Modular future, these packages should
presumably be available in Modular composes, but for now they
are not.
Summary:
This adds an upgrade variant of the FreeIPA tests, with only
the simplest client enrolment (sssd) for now. The server test
starts from the N-1 release and deploys the domain controller
role. The client test similarly starts from the N-1 release
and, when the server is deployed, enrols as a domain client.
Then the server upgrades itself, while the client waits (as the
server is its name server). Then the client upgrades itself,
while the server does some self-checks. The server then waits
for the client to do its checks before decommissioning itself,
as usual. So, summary: *deployment* of both server and client
occurs on N-1, then both are upgraded, then the actual *checks*
occur on N.
In my testing, this all more or less works, except the role
decommission step fails. This failure seems to be a genuine one
so far as I can tell; I intend to file a bug for it soon.
Test Plan:
Run the new tests, check they work. Run the existing
FreeIPA tests (both the compose and the update variants), check
they both behave the same.
Reviewers: jsedlak, jskladan
Reviewed By: jsedlak
Subscribers: tflink
Differential Revision: https://phab.qa.fedoraproject.org/D1204
It's not really a good idea to have the comments that explain
the test_flags in *every* test, because they can go stale and
then we either have to live with them being old or update them
all. Like, now. So let's just take 'em all out. There's always
a reference in the openQA and os-autoinst docs, and those get
updated faster.
More importantly, add the new `ignore_failure` flag to relevant
tests: all the tests that don't have the 'important' or
'fatal' flag at present. Upstream killed the 'important' flag
(making all tests 'important' by default) and I got it replaced
with the 'ignore_failure' flag, so we now need to explicitly mark
all modules we want the 'ignore_failure' behaviour for.
Summary:
This adds an entirely new workflow for testing distribution
updates. The `ADVISORY` variable is introduced: when set,
`main.pm` will load an early post-install test that sets up
a repository containing the packages from the specified update,
runs `dnf -y update`, and reboots. A new templates file is
added, `templates-updates`, which adds two new flavors called
`updates-server` and `updates-workstation`, each containing
job templates for appropriate post-install tests. The scheduler is
expected to POST `ADVISORY=(update ID) HDD_1=(base image)
FLAVOR=updates-(server|workstation)`, where (base image) is one
of the stable release base disk images produced by `createhdds`
and usually used for upgrade testing. This will result in the
appropriate job templates being loaded.
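A minimal sketch of the settings a scheduler might POST for this flow, written in Python for illustration. Only `ADVISORY`, `HDD_1` and `FLAVOR` are modelled; the real POST in the test plan also carries `DISTRI`, `VERSION`, `ARCH`, `BUILD` and the release variables. The function names are hypothetical.

```python
from typing import Dict, List

# The two flavors added by templates-updates.
UPDATE_FLAVORS = {"updates-server", "updates-workstation"}


def update_job_settings(advisory: str, base_image: str, flavor: str) -> Dict[str, str]:
    """Settings a scheduler would POST to trigger the updates flow."""
    if flavor not in UPDATE_FLAVORS:
        raise ValueError(f"unknown updates flavor: {flavor}")
    return {"ADVISORY": advisory, "HDD_1": base_image, "FLAVOR": flavor}


def as_post_args(settings: Dict[str, str]) -> List[str]:
    """Render settings as KEY=value arguments, sorted for stable output."""
    return [f"{key}={value}" for key, value in sorted(settings.items())]


if __name__ == "__main__":
    job = update_job_settings(
        "FEDORA-2017-376ae2b92c", "disk_f25_server_3_x86_64.img", "updates-server"
    )
    print(" ".join(as_post_args(job)))
```

Rejecting unknown flavors early keeps a typo from silently matching no job templates.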
We rejig postinstall test loading and static network config a
bit so that this works for both the 'compose' and 'updates' test
flows: we have to ensure we bring up networking for the tap
tests before we try and install the updates, but still allow
later adjustment of the configuration. We take advantage of the
openQA feature that was added a few months back to run the same
module multiple times, so the `_advisory_update` module can
reboot after installing the updates and the modules that take
care of bootloader, encryption and login get run again. This
looks slightly wacky in the web UI, though - it doesn't show the
later runs of each module.
We also use the recently added feature to specify `+HDD_1` in
the test suites which use a disk image uploaded by an earlier
post-install test, so the test suite value will take priority
over the value POSTed by the scheduler for those tests, and we
will use the uploaded disk image (and not the clean base image
POSTed by the scheduler) for those tests.
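To make the `+HDD_1` precedence concrete, here is a simplified Python model of how test suite and POSTed settings combine: normally the POSTed value wins, but a suite setting written as `+KEY` wins over it. This is a sketch of the behaviour described above under that assumption, not openQA's actual merging code; the function name is hypothetical.

```python
from typing import Dict


def merge_settings(suite: Dict[str, str], posted: Dict[str, str]) -> Dict[str, str]:
    """Simplified model of "+KEY" precedence between suite and POSTed settings."""
    merged: Dict[str, str] = {}
    forced: Dict[str, str] = {}
    for key, value in suite.items():
        if key.startswith("+"):
            forced[key[1:]] = value  # "+HDD_1" is stored as HDD_1
        else:
            merged[key] = value
    merged.update(posted)  # POSTed values normally override the suite...
    merged.update(forced)  # ...but "+" suite values override POSTed ones.
    return merged


if __name__ == "__main__":
    suite = {"+HDD_1": "uploaded_disk.qcow2"}
    posted = {"HDD_1": "clean_base.img", "ADVISORY": "FEDORA-2017-376ae2b92c"}
    print(merge_settings(suite, posted))
```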
My intent here is to enhance the scheduler, adding a consumer
which listens out for critpath updates, and runs this test flow
for each one, then reports the results to ResultsDB where Bodhi
could query and display them. We could also add a list of other
packages to have one or both sets of update tests run on it, I
guess.
Test Plan:
Try a post something like:
HDD_1=disk_f25_server_3_x86_64.img DISTRI=fedora VERSION=25
FLAVOR=updates-server ARCH=x86_64 BUILD=FEDORA-2017-376ae2b92c
ADVISORY=FEDORA-2017-376ae2b92c CURRREL=25 PREVREL=24
Pick an appropriate `ADVISORY` (ideally, one containing some
packages which might actually be involved in the tests), and
matching `FLAVOR` and `HDD_1`. The appropriate tests should run,
a repo with the update packages should be created and enabled
(and dnf update run), and the tests should work properly. Also
test a regular compose run to make sure I didn't break anything.
Reviewers: jskladan, jsedlak
Reviewed By: jsedlak
Subscribers: tflink
Differential Revision: https://phab.qa.fedoraproject.org/D1143
Summary:
This adds a couple of new exporter modules, renames main_common
to utils (this is a better name: openSUSE's main_common is
functions used in main.pm, utils is what they call their module
full of miscellaneous commonly-used functions), and moves a
bunch of utility functions that were previously needlessly
implemented as instance methods in base classes into the
exporter modules. That means we can get rid of all the annoying
$self-> syntax for calling them.
We get rid of `fedorabase` entirely, as it's no longer useful
for anything. Other base classes keep the 'standard' methods
(like `post_fail_hook`) and methods which actually need to be
methods (like `root_console`, whose behaviour is different in
anacondatest and installedtest).
Test Plan:
Do a full test suite run and check everything lines
up. There should be no functional differences from before at all,
this is just a re-org.
Reviewers: jskladan, garretraziel_but_actually_jsedlak_who_uses_stupid_nicknames
Reviewed By: garretraziel_but_actually_jsedlak_who_uses_stupid_nicknames
Subscribers: tflink
Differential Revision: https://phab.qa.fedoraproject.org/D1080
Summary:
I started out wanting to fix an issue I noticed today where
graphical upgrade tests were failing because they didn't wait
for the graphical login screen properly; the test was sitting
at the 'full Fedora logo' state of plymouth for a long time,
so the current boot_to_login_screen's wait_still_screen was
triggered by it and the function wound up failing on the
assert_screen, because it was still some time before the real
login screen appeared.
So I tweaked the boot_to_login_screen implementation to work
slightly differently (look for a login screen match, *then* -
if we're dealing with a graphical login - wait_still_screen
to defeat the 'old GPU buffer showing login screen' problem
and assert the login screen again). But while working on it,
I figured we really should consolidate all the various places
that handle the bootloader-to-login transition, as we were doing
it quite differently in all sorts of places. As part of that,
I converted the base tests to use POSTINSTALL (and thus go
through the shared _wait_login tests) instead of handling boot
themselves. As part of *that*, I tweaked main.pm to not require
all POSTINSTALL tests to have the _postinstall suffix on their
names, as it really doesn't make sense, and renamed the tests.
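The reworked ordering can be sketched in Python like this. `match_login_screen` and `wait_still_screen` are injected stand-ins for the os-autoinst screen-matching calls (the real implementation is Perl), and the 30-second re-check timeout is an arbitrary assumption.

```python
from typing import Callable


def boot_to_login_screen(
    match_login_screen: Callable[[int], bool],
    wait_still_screen: Callable[[], None],
    graphical: bool,
    timeout: int = 300,
) -> None:
    """Model of the reworked ordering: match first, then settle, then re-check."""
    # 1. Wait for a login screen match; a long plymouth logo phase no longer
    #    matters because nothing waits for a still screen before this point.
    if not match_login_screen(timeout):
        raise RuntimeError("no login screen before timeout")
    if graphical:
        # 2. Graphical login only: wait for the screen to stop changing, so an
        #    old GPU buffer that briefly showed a login screen cannot fool us.
        wait_still_screen()
        # 3. Assert the login screen again now the display has settled.
        if not match_login_screen(30):
            raise RuntimeError("login screen did not persist")
```

Console logins skip the settle step entirely, since the stale-buffer problem only affects graphical output.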
Test Plan: Run all tests, see if they work.
Reviewers: jskladan, garretraziel
Reviewed By: garretraziel
Subscribers: tflink
Differential Revision: https://phab.qadevel.cloud.fedoraproject.org/D1015
Summary:
We have a long-standing problem with all the tests that hit
the repositories. The tests are triggered as soon as a compose
completes. At this point in time, the compose is not synced to
the mirrors, where the default 'fedora' repo definition looks;
the sync happens after the compose completes, and there is also
a metadata sync step that must happen after *that* before any
operation that uses the 'fedora' repository definition will
actually use the packages from the new compose. Thus all net
install tests and tests that install packages have effectively
been testing the previous compose, not the current one.
We have some thoughts about how to fix this 'properly' (such
that the openQA tests wouldn't have to do anything special,
but their 'fedora' repository would somehow reflect the compose
under test), but none of them is in place right now or likely
to happen in the short term, so in the meantime this should
deal with most of the issues. With this change, everything but
the default_install tests for the netinst images should use
the compose-under-test's Everything tree instead of the 'fedora'
repository, and thus should install and test the correct
packages.
This relies on a corresponding change to openqa_fedora_tools
to set the LOCATION openQA setting (which is simply the base
location of the compose under test).
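As a sketch of how a test might turn `LOCATION` into a repository definition, the Python below builds a dnf `.repo` stanza for the compose's Everything tree. It assumes the tree lives at `<LOCATION>/Everything/<arch>/os/`; both that layout and the `compose-everything` repo id are assumptions for illustration.

```python
def everything_repo_file(location: str, arch: str) -> str:
    """Build a dnf .repo stanza pointing at the compose's Everything tree."""
    # Assumed layout: <LOCATION>/Everything/<arch>/os/
    baseurl = f"{location.rstrip('/')}/Everything/{arch}/os/"
    lines = [
        "[compose-everything]",  # hypothetical repo id
        "name=Everything tree of the compose under test",
        f"baseurl={baseurl}",
        "enabled=1",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(everything_repo_file("https://example.org/compose/", "x86_64"))
```

Stripping the trailing slash first means `LOCATION` works with or without one.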
Test Plan:
Do a full test run, check (as far as you can) tests run sensibly
and use appropriate repositories.
Reviewers: jskladan, garretraziel
Reviewed By: garretraziel
Subscribers: tflink
Differential Revision: https://phab.qadevel.cloud.fedoraproject.org/D989
Summary:
Except when running on the pre-upgrade release in the upgrade
tests (where GPG check should always be OK).
Currently we always need to use --nogpgcheck on Rawhide, and we
must also use it on Branched prior to the Bodhi activation
point. At present we don't really have any simple way to know
when the Bodhi activation point has kicked in. We could assume
that it's safe to do GPG checking for 'candidate' (not nightly)
composes, but even that isn't 100% safe and isn't really the
*right* thing to do. So I think for now it's best to just always
use --nogpgcheck until we come up with a decent way to check
for Bodhi enablement, or releng figures things out so we can
rely on packages being signed in Rawhide and in Branched before
Bodhi enablement.
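A minimal Python sketch of the rule (the tests build these commands in Perl): add `--nogpgcheck` to every dnf call except when running on the pre-upgrade release in the upgrade tests. `dnf_command` and its keyword argument are hypothetical names.

```python
from typing import List


def dnf_command(*args: str, on_pre_upgrade_release: bool = False) -> List[str]:
    """Build a dnf command line, skipping GPG checks except on the pre-upgrade release."""
    cmd = ["dnf", "-y"]
    if not on_pre_upgrade_release:
        # Rawhide, and Branched before Bodhi activation, may have unsigned packages.
        cmd.append("--nogpgcheck")
    cmd.extend(args)
    return cmd


if __name__ == "__main__":
    print(" ".join(dnf_command("install", "freeipa-client")))
```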
Test Plan:
Check the tests all still run, make sure I didn't
miss any dnf calls.
Reviewers: jskladan, garretraziel
Reviewed By: garretraziel
Subscribers: tflink
Differential Revision: https://phab.qadevel.cloud.fedoraproject.org/D964
Summary:
Again, added as a non-fatal module for realmd_join_cockpit, as
it's convenient to do it here. Also abstract a couple of IPA
bits into a new exporter package in the style of SUSE's
mm_network, rather than using ill-fitting class inheritance as
we have before; we should probably convert our existing
class-based stuff to work this way.
Also a few minor tweaks and clean-ups of the other tests:
The path in console_login() where we detect login of a regular
user when we want root or vice versa and log out was actually
broken because it would 'wait' for the result of the 'exit'
command, which obviously doesn't work (as it relies on running
another command afterwards, and we're no longer at a shell).
This commit no longer actually uses that path, but I spotted
the bug with an earlier version of this which did, and we may
as well keep the fix.
/var/log/lastlog is an apparently-extremely-large sparse file.
A couple of times it seemed to cause tar to run very slowly
while creating the /var/log archive for upload on failure. It's
no use for diagnosing bugs, so we may as well exclude it from
the archive.
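The tests drive tar from a shell on the system under test; as an equivalent illustration, this Python sketch uses `tarfile`'s `filter` hook to leave the top-level `lastlog` out of the log archive. The helper name and archive name are hypothetical.

```python
import os
import tarfile
import tempfile


def archive_logs(log_dir: str, out_path: str) -> None:
    """Write log_dir to a gzipped tarball, leaving out the top-level lastlog."""
    arcname = os.path.basename(os.path.normpath(log_dir))
    skip = f"{arcname}/lastlog"

    def drop_lastlog(info: tarfile.TarInfo):
        # Returning None tells tarfile.add() to leave this member out.
        return None if info.name == skip else info

    with tarfile.open(out_path, "w:gz") as tar:
        tar.add(log_dir, arcname=arcname, filter=drop_lastlog)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        logs = os.path.join(tmp, "log")
        os.mkdir(logs)
        for name in ("lastlog", "messages"):
            with open(os.path.join(logs, name), "w") as fh:
                fh.write("x\n")
        out = os.path.join(tmp, "var_log.tar.gz")
        archive_logs(logs, out)
        with tarfile.open(out) as tar:
            print(tar.getnames())
```

Matching the exact archive path, rather than any file called `lastlog`, keeps unrelated files in subdirectories in the upload.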
I caught the Cockpit web UI login failing one time when testing
the test, so I threw in a wait_still_screen before starting to
type the URL, as we have for the FreeIPA web UI.
I also caught a timing issue with the FreeIPA web UI policy add
step; the test flips from the Users screen to the HBAC screen
then clicks the 'add' button, but there's actually an identical
'add' button on *both* screens, so it could wind up trying to
click the one on the Users screen instead, if the web UI took
a few milliseconds to switch. So we throw in a needle match to
make sure we're actually on the HBAC screen before clicking the
button.
We make the freeipa_webui test a 'milestone' so that if the
new test fails, restoring to the last-known-good milestone
doesn't take so long; it actually seems like openQA can get
confused and try to cancel the test if restoring the milestone
takes a *really* long time, and wind up with a zombie qemu
process, which isn't good. This seems to avoid that happening.
Test Plan:
In the simple case, just run all the FreeIPA-related
tests on Fedora 24 (as Rawhide is broken) and make sure they all
work properly. To get a bit more advanced you can throw in an
`assert_script_run 'false'` in either of the non-fatal tests to
break it and make sure things go properly when that happens (the
last milestone should be restored - which should be right after
freeipa_webui, sitting at tty1 - and run properly; things are
set up so each test starts with root logged in on tty1).
Reviewers: jskladan, garretraziel
Reviewed By: garretraziel
Subscribers: tflink
Differential Revision: https://phab.qadevel.cloud.fedoraproject.org/D935
Summary:
This follows a SUSE model for tests where we need a server
end but don't want setting up the server to constitute a real
test in itself; we want it to be stable. The 'support_server'
test just boots a pre-built (by createhdds) disk image, sets up
networking, and runs the iSCSI server.
To run the iSCSI test we need to handle networking config in
anaconda (or we would need to set the support server up as a
DHCP server, which may be worth considering), so this adds that.
We also need to be able to specify the target device for a
volume in custom partitioning, so this adds that too.
Test Plan:
Build the necessary support server disk image (use
D883), then run the test and make sure it works. Also make sure
all other tests continue to work.
Reviewers: jskladan, garretraziel
Reviewed By: garretraziel
Subscribers: tflink
Differential Revision: https://phab.qadevel.cloud.fedoraproject.org/D884
Summary:
This requires a few other changes:
* turn clone_host_resolv into clone_host_file, letting you clone
any given host file (cloning /etc/hosts seems to make both
server deployment and client enrolment faster/more reliable)
* allow loading of multiple POSTINSTALL tests (so we can share
the freeipa_client_postinstall test). Note this is compatible,
existing uses will work fine
* move initial password change for the IPA test users into the
server deployment test (so the client tests don't conflict over
doing that)
* add GRUB_POSTINSTALL, for specifying boot parameters for boot of
the installed system, and make it work by tweaking
_console_wait_login (doesn't work for _graphical_wait_login yet,
as I didn't need that)
* make the static networking config for tap tests into a library
function so the tests can share it
* handle ABRT problem dirs showing up in /var/spool/abrt as well
as /var/tmp/abrt (because the enrol attempt hits #1330766 and
the crash report shows up in /var/spool/abrt, don't ask me why
the difference, I just work here)
* specify the DNS servers from the worker host's resolv.conf as
the forwarders for the FreeIPA server when deploying it; if we
don't do this, rolekit defaults to using the root servers as
forwarders(!) and thus we get the public, not phx2-appropriate,
results for e.g. mirrors.fedoraproject.org, some of which the
workers can't reach, so PackageKit package install always fails
(boy, was it fun figuring THAT mess out)
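For the forwarders item above, a sketch of pulling `nameserver` entries out of a resolv.conf and turning them into forwarder arguments. It renders them as `ipa-server-install --forwarder` options, which is an assumption: the commit configures forwarders through rolekit, whose settings format isn't shown here. The helper name is hypothetical.

```python
from typing import List


def forwarder_args(resolv_conf_text: str) -> List[str]:
    """Turn resolv.conf 'nameserver' lines into --forwarder arguments."""
    args: List[str] = []
    for line in resolv_conf_text.splitlines():
        fields = line.split()
        # Comment lines start with '#' (or ';') and are skipped naturally,
        # since their first field is not "nameserver".
        if len(fields) >= 2 and fields[0] == "nameserver":
            args.extend(["--forwarder", fields[1]])
    return args


if __name__ == "__main__":
    sample = "# generated\nsearch example.com\nnameserver 10.0.0.1\nnameserver 10.0.0.2\n"
    print(forwarder_args(sample))
```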
Even after all that, the test still doesn't actually pass, but
I'm reasonably confident this is because it's hitting actual bugs,
not because it's broken. It runs into #1330766 nearly every time
(I think I saw *one* time the enrolment actually succeeded), and
seems to run into a subsequent bug I hadn't seen before when
trying to work around that by trying the join again (see
https://bugzilla.redhat.com/show_bug.cgi?id=1330766#c37 ).
Test Plan:
Run the test, see what happens. If you're really lucky,
it'll actually pass. But you'll probably run into #1330766#c37;
I'm mostly posting for comment. You'll need a tap-capable openQA
instance to test this.
Reviewers: jskladan, garretraziel
Reviewed By: garretraziel
Subscribers: tflink
Differential Revision: https://phab.qadevel.cloud.fedoraproject.org/D880
Summary:
These require openQA tap networking to allow the server and
client boxes to communicate, and require masquerading (NAT) so
the server at least can reach a repository (dnf/rolekit really,
really do not want to work without a repo connection).
They use the 'parallel' test support to have the server deploy
run first while the client enrol test waits at the grub menu
until the server is done before it goes ahead.
This is all deployed and working on stg. The really tricky bit
was getting all the openvswitch and firewall config right in
ansible.
We *could* do the server deploy test as a follow-on from the
default install test to save the install, but then we'd have to
teach it to change the hostname and set up static networking
post-install. I'm not sure if it's worth doing that.
This requires the corresponding openqa_fedora_tools commit that
adds the hard disks (containing the kickstarts - it's possible
to get them from remote during install, but we have to set up
name resolution or hard code the IP of the server).
Test Plan:
Deploy this and the openqa_fedora_tools commit,
generate the disks, configure the networking (good luck! See
the docs in openqa_fedora_tools) and see if you can run the
tests. If you're using Docker, uh...sorry. You somehow need to
set things up so the workers can use tap interfaces that can
talk to each other and are NATed to the outside world. Have fun.
I can talk you through it on IRC...
Reviewers: jskladan, garretraziel
Reviewed By: garretraziel
Subscribers: tflink
Differential Revision: https://phab.qadevel.cloud.fedoraproject.org/D831