Summary:
I started out wanting to fix an issue I noticed today where
graphical upgrade tests were failing because they didn't wait
for the graphical login screen properly; the test was sitting
at the 'full Fedora logo' state of plymouth for a long time,
so the current boot_to_login_screen's wait_still_screen was
triggered by it and the function wound up failing on the
assert_screen, because it was still some time before the real
login screen appeared.
So I tweaked the boot_to_login_screen implementation to work
slightly differently (look for a login screen match, *then* -
if we're dealing with a graphical login - wait_still_screen
to defeat the 'old GPU buffer showing login screen' problem
and assert the login screen again). But while working on it,
I figured we really should consolidate all the various places
that handle the bootloader -> login, we were doing it quite
differently in all sorts of different places. And as part of
that, I converted the base tests to use POSTINSTALL (and thus
go through the shared _wait_login tests) instead of handling
boot themselves. As part of *that*, I tweaked main.pm to not
require all POSTINSTALL tests have the _postinstall suffix on
their names, as it really doesn't make sense, and renamed the
tests.
Test Plan: Run all tests, see if they work.
Reviewers: jskladan, garretraziel
Reviewed By: garretraziel
Subscribers: tflink
Differential Revision: https://phab.qadevel.cloud.fedoraproject.org/D1015
Summary:
by waiting for the bootloader in _boot_to_anaconda rather than
_console_wait_login, we can ensure that we use the anaconda
post-fail hook and thus get logs uploaded when a kickstart
install fails.
Test Plan:
Run a kickstart install test that fails and check
anaconda logs get uploaded. Then run one that works and make
sure it...still works.
Reviewers: jskladan, garretraziel
Reviewed By: garretraziel
Subscribers: tflink
Differential Revision: https://phab.qadevel.cloud.fedoraproject.org/D1005
Summary:
again, added as a non-fatal module for realmd_join_cockpit as
it's convenient to do it here. Also abstract a couple of ipa
bits into a new exporter package in the style of SUSE's
mm_network, rather than using ill-fitting class inheritance as
we have before - we should probably convert our existing class
based stuff to work this way.
Also a few minor tweaks and clean-ups of the other tests:
The path in console_login() where we detect login of a regular
user when we want root or vice versa and log out was actually
broken because it would 'wait' for the result of the 'exit'
command, which obviously doesn't work (as it relies on running
another command afterwards, and we're no longer at a shell).
This commit no longer actually uses that path, but I spotted
the bug with an earlier version of this which did, and we may
as well keep the fix.
/var/log/lastlog is an apparently-extremely-large sparse file.
A couple of times it seemed to cause tar to run very slowly
while creating the /var/log archive for upload on failure. It's
no use for diagnosing bugs, so we may as well exclude it from
the archive.
I caught cockpit webUI login failing one time when testing the
test, so threw in a wait_still_screen before starting to type
the URL, as we have for the FreeIPA webUI.
I also caught a timing issue with the openQA webUI policy add
step; the test flips from the Users screen to the HBAC screen
then clicks the 'add' button, but there's actually an identical
'add' button on *both* screens, so it could wind up trying to
click the one on the Users screen instead, if the web UI took
a few milliseconds to switch. So we throw in a needle match to
make sure we're actually on the HBAC screen before clicking the
button.
We make the freeipa_webui test a 'milestone' so that if the
new test fails, restoring to the last-known-good milestone
doesn't take so long; it actually seems like openQA can get
confused and try to cancel the test if restoring the milestone
takes a *really* long time, and wind up with a zombie qemu
process, which isn't good. This seems to avoid that happening.
Test Plan:
In the simple case, just run all the FreeIPA-related
tests on Fedora 24 (as Rawhide is broken) and make sure they all
work properly. To get a bit more advanced you can throw in an
`assert_script_run 'false'` in either of the non-fatal tests to
break it and make sure things go properly when that happens (the
last milestone should be restored - which should be right after
freeipa_webui, sitting at tty1 - and run properly; things are
set up so each test starts with root logged in on tty1).
Reviewers: jskladan, garretraziel
Reviewed By: garretraziel
Subscribers: tflink
Differential Revision: https://phab.qadevel.cloud.fedoraproject.org/D935
Summary:
This requires us to handle decryption each time we reboot in
the upgrade process, so factor that little block out into the
base class so we don't have to keep pasting it. It's also a
bit tricky to integrate into the 'catch a boot loop' code we
have to deal with #1349721, but I think this should work. There
is a matching openqa_fedora_tools diff to generate the disk
image.
Test Plan:
Run the tests, check that they work, run the other
upgrade and encrypted install tests and check they still work
properly too.
Reviewers: garretraziel
Reviewed By: garretraziel
Subscribers: tflink
Differential Revision: https://phab.qadevel.cloud.fedoraproject.org/D922
Summary:
this is following a SUSE model for tests where we need a server
end but don't want setting up the server to constitute a real
test in itself, we want it to be stable. The 'support_server'
test just boots a pre-built (by createhdds) disk image, sets up
networking, and runs the iSCSI server.
To run the iSCSI test we need to handle networking config in
anaconda (or we would need to set the support server up as a
DHCP server, which may be worth considering), so this adds that.
We also need to be able to specify the target device for a
volume in custom partitioning, so this adds that too.
Test Plan:
Build the necessary support server disk image (use
D883), then run the test and make sure it works. Also make sure
all other tests continue to work.
Reviewers: jskladan, garretraziel
Reviewed By: garretraziel
Subscribers: tflink
Differential Revision: https://phab.qadevel.cloud.fedoraproject.org/D884
Summary:
This requires a few other changes:
* turn clone_host_resolv into clone_host_file, letting you clone
any given host file (cloning /etc/hosts seems to make both
server deployment and client enrolment faster/more reliable)
* allow loading of multiple POSTINSTALL tests (so we can share
the freeipa_client_postinstall test). Note this is compatible,
existing uses will work fine
* move initial password change for the IPA test users into the
server deployment test (so the client tests don't conflict over
doing that)
* add GRUB_POSTINSTALL, for specifying boot parameters for boot of
the installed system, and make it work by tweaking _console_wait
_login (doesn't work for _graphical_wait_login yet, as I didn't
need that)
* make the static networking config for tap tests into a library
function so the tests can share it
* handle ABRT problem dirs showing up in /var/spool/abrt as well
as /var/tmp/abrt (because the enrol attempt hits #1330766 and
the crash report shows up in /var/spool/abrt, don't ask me why
the difference, I just work here)
* specify the DNS servers from the worker host's resolv.conf as
the forwarders for the FreeIPA server when deploying it; if we
don't do this, rolekit defaults to using the root servers as
forwarders(!) and thus we get the public, not phx2-appropriate,
results for e.g. mirrors.fedoraproject.org, some of which the
workers can't reach, so PackageKit package install always fails
(boy, was it fun figuring THAT mess out)
Even after all that, the test still doesn't actually pass, but
I'm reasonably confident this is because it's hitting actual bugs,
not because it's broken. It runs into #1330766 nearly every time
(I think I saw *one* time the enrolment actually succeeded), and
seems to run into a subsequent bug I hadn't seen before when
trying to work around that by trying the join again (see
https://bugzilla.redhat.com/show_bug.cgi?id=1330766#c37 ).
Test Plan:
Run the test, see what happens. If you're really lucky,
it'll actually pass. But you'll probably run into #1330766#c37,
I'm mostly posting for comment. You'll need a tap-capable openQA
instance to test this.
Reviewers: jskladan, garretraziel
Reviewed By: garretraziel
Subscribers: tflink
Differential Revision: https://phab.qadevel.cloud.fedoraproject.org/D880
we only 'double' the password for the user account, not for root
this is a trivial fix so just pushing it out to get cyrillic
test working at last (hopefully)
Summary:
Requires new needles and test suite and job template, plus a
few tweaks to handle 'switched' keyboard layouts (so we use the
switched layout in the username and password).
Test Plan:
Run the test and see that it...fails. But that's OK!
It's a genuine bug: RHBZ #1333998 . At least make sure it gets
to that point and no other tests have broken and all the needles
look sane.
Reviewers: garretraziel
Reviewed By: garretraziel
Subscribers: tflink
Differential Revision: https://phab.qadevel.cloud.fedoraproject.org/D846
Summary:
These require openQA tap networking to allow the server and
client boxes to communicate, and require masquerading (NAT) so
the server at least can reach a repository (dnf/rolekit really,
really do not want to work without a repo connection).
They use the 'parallel' test support to have the server deploy
run first while the client enrol test waits at the grub menu
until the server is done before it goes ahead.
This is all deployed and working on stg. The really tricky bit
was getting all the openvswitch and firewall config right in
ansible.
We *could* do the server deploy test as a follow-on from the
default install test to save the install, but then we'd have to
teach it to change the hostname and set up static networking
post-install. I'm not sure if it's worth doing that.
This requires the corresponding openqa_fedora_tools commit that
adds the hard disks (containing the kickstarts - it's possible
to get them from remote during install, but we have to set up
name resolution or hard code the IP of the server).
Test Plan:
Deploy this and the openqa_fedora_tools commit,
generate the disks, configure the networking (good luck! See
the docs in openqa_fedora_tools) and see if you can run the
tests. If you're using Docker, uh...sorry. You somehow need to
set things up so the workers can use tap interfaces that can
talk to each other and are NATed to the outside world. Have fun.
I can talk you through it on IRC...
Reviewers: jskladan, garretraziel
Reviewed By: garretraziel
Subscribers: tflink
Differential Revision: https://phab.qadevel.cloud.fedoraproject.org/D831
With the arrival of Pungi 4, the scheduler is no longer using
fedfind-provided BUILD and FLAVOR values, but ones derived from
Pungi properties. BUILD is now simply the Pungi compose_id.
FLAVOR is produced by joining the Pungi variant, type, and
format with '-' characters as the separators.
Pungi, unfortunately, does not treat 'Rawhide' as a release, it
synthesizes a release number for Rawhide composes and places
that in the compose ID. To cope with that, for now, the
scheduler will set RAWHIDE to '1' if the compose is a Rawhide
one. As we have to adapt all places where we parse the release
in any case, this commit consolidates them into a fedorabase
subroutine.
For the one place where we also used to parse the 'milestone'
from fedfind, there is a placeholder get_milestone subroutine
which currently returns an empty string, as I don't yet have a
good handle on how to draw the kinds of distinctions fedfind
mapped to 'milestone' from Pungi metadata.
Summary:
This is a bit icky, but it's the easiest way to solve a problem
I've seen a few times, the latest case being
https://openqa.stg.fedoraproject.org/tests/1664 . In that test,
_console_wait_login logs in to tty1 as user, then uefi_
postinstall wants to switch to tty3 and log in as root. When
it does that, sometimes the check_screen loop in console_login
gets hit before the display has actually switched from tty1 to
tty3, so everything gets out of sync.
An alternative would be to have root_console check that it's
either logged in or at the correct tty before handing off to
console_login, but that starts duplicating stuff, and it breaks
in the case the target tty is logged in as a user and the login
prompt is no longer visible...
Test Plan:
Check all tests run as normal, and maybe run UEFI
tests a few times to see that the bug no longer happens (but
it's hard to reliably trigger it anyway).
Reviewers: garretraziel, jskladan
Reviewed By: jskladan
Subscribers: tflink
Differential Revision: https://phab.qadevel.cloud.fedoraproject.org/D683
Summary:
Root console in anaconda got broken by RHBZ #1222413 - no
shell on tty2. Decided to clean up console use in general as
part of fixing it.
This creates a class 'fedorabase' and has 'anacondalog' and
'fedoralog' both inherit from it. boot_to_login_screen is
moved there (as it seems appropriate) and it has a new
method, console_login, which basically handles 'get me a
shell on a console': if we're already at one it returns,
if not it'll type the user name and the password *if
necessary* (sometimes it's not) and return once it sees a
prompt. It takes a hash of named parameters for user,
password and 'check', which is whether it should die if it
fails to reach a console or not (some users don't want it
to).
anacondalog and fedoralog both get 'root_console' methods
which do something appropriate and then call
console_login; both have a hash of named parameters,
anacondalog's version only bothers with 'check', while
fedoralog's also accepts 'tty' to pick the tty to use.
This also adjusts all things which try to get to a console
prompt to use either root_console or console_login as
appropriate.
It also tweaks the needle tags a bit, drops some unneeded
needles, and adds a new 'user console prompt' needle; we
really just need two versions of the root prompt needle
and two of the user prompt needle (one for <F23, one for
F23+ - the console font changed in F23, and the @ character
at least doesn't match between the two). I think we still
need the <F23 case for upgrade tests, for now.
Test Plan:
Do a full test run and see that more tests
succeed. I've done a run on happyassassin with a hack to
workaround the SELinux issue for interactive installs,
and the results look good. I also fiddled about a bit to
test some different cases, like forcing a failure in a
live test to test post_fail_hook (and hence root_console)
in that scenario, and forcing failures after some console
commands had been run to check that it DTRT when we've
already reached a console, etc.
Reviewers: jskladan, garretraziel
Reviewed By: jskladan, garretraziel
Subscribers: tflink
Differential Revision: https://phab.qadevel.cloud.fedoraproject.org/D462