67 Commits

Author SHA1 Message Date
Adam Williamson 29d1243d39 Apply some debugging hacks for the periodic failure bug
This is all in aid of trying to figure out what's going on in
https://pagure.io/fedora-qa/os-autoinst-distri-fedora/pull-request/312#comment-199540:
sometimes the upgrade process just does not work, and it seems
like the prepared update file gets wiped for some reason. These
hacks try to pin down exactly when that happens and
whether it's because we're running out of disk space.

Signed-off-by: Adam Williamson <awilliam@redhat.com>
2024-04-14 08:53:15 -07:00
Adam Williamson 5f702b0be8 Run update repo setup steps from a serial console
This is a surprisingly large change as we want to go back to
the console we were previously on after doing it. To do that we
need to know what console we were on, and to know *that*, we need
to port everything that currently uses (ctrl-)alt-fX to switch
consoles to use select_console instead.

This is primarily intended to make running setup_repos.py faster
when it has to download a lot of packages (as typing in hundreds
of package names is quite slow). But it actually makes the whole
thing faster, even when only downloading one or two packages.

Signed-off-by: Adam Williamson <awilliam@redhat.com>
2024-01-11 12:09:59 -08:00
Adam Williamson d012d31270 _post_fail_hook: only do advisory package checks when appropriate
Same conditions as used in main.pm to load the tests in the
normal flow. It makes no sense to do this on non-update tests,
or on the non-matching support server case.

Signed-off-by: Adam Williamson <awilliam@redhat.com>
2022-12-14 08:47:46 -08:00
Adam Williamson 6e9213f4b7 Drop use of _ADVISORY_REPO_DONE, always use file tests
This is safer if the advisory stuff was done on a previous test
run. Hilariously, this exposed a dumb mistake I made years ago
in installedtest.pm and never noticed before: the calls to
advisory_* at the bottom of that file are meant to be in the
post_fail_hook, but they weren't, which meant they got called
by the scheduler. This didn't cause any failures because the
first line caused them to return immediately based on a get_var
call (which it's OK to do in the scheduler), but changing it
to a script_run call (which it's *not* OK to do in scheduling)
caused all the tests to blow up immediately and confused me
*a lot* until I spotted this!

Signed-off-by: Adam Williamson <awilliam@redhat.com>
2022-12-12 08:53:02 -08:00
Adam Williamson e45473103f Upload appropriate logs on ostree_build failure
Also use get_var("TEST") for installer_build - no point trying
to upload these logs for the other tests in the same flavor,
they won't be there.

Signed-off-by: Adam Williamson <awilliam@redhat.com>
2022-12-03 11:17:17 -08:00
Adam Williamson 173bdef5cb Revert "Try and get more detailed Firefox logs from startx"
This reverts commit f424e5bac5. It
never actually worked, and the Firefox-dying bug seems to have
magically gone away recently.
2022-11-25 12:13:08 -08:00
Adam Williamson 2c6e1ec76b post_fail_hook: give dnf -y install tar a bit longer
It seems to time out a lot on lab but not on prod, for some
reason. Let's just give it a little longer.

Signed-off-by: Adam Williamson <awilliam@redhat.com>
2022-11-25 11:12:46 -08:00
Adam Williamson f424e5bac5 Try and get more detailed Firefox logs from startx
When we run Firefox directly on X lately, we often hit a bug
where X just suddenly exits in the middle of doing stuff in
Firefox. I'm not sure if this is a bug in X or in Firefox (if
Firefox crashed, X would immediately exit). Let's see if this
helps get any info on what's going on with Firefox.

Signed-off-by: Adam Williamson <awilliam@redhat.com>
2022-09-09 13:08:47 -07:00
Adam Williamson 1a65993d36 Add a perltidy check and apply it to the entire codebase
Signed-off-by: Adam Williamson <awilliam@redhat.com>
2022-07-28 14:38:38 -07:00
Adam Williamson e1ed9960f3 Let _console_avc_crash wait a bit longer for console login
On Rawhide update tests we often don't seem to get to the login
prompt in 10 seconds, so tweak the code a bit to let us specify
a timeout in root_console, and use 30 seconds here.

Signed-off-by: Adam Williamson <awilliam@redhat.com>
2022-06-06 18:34:44 -07:00
Adam Williamson a629020a9b post_fail_hook serial path: log the entire journal
When we're logging via the serial console when a test fails and
no network is available, we only log the journal from the current
boot. But we might well need to see messages from previous boots.
So let's just log the whole journal.

Signed-off-by: Adam Williamson <awilliam@redhat.com>
2022-05-11 16:13:52 -07:00
Adam Williamson f51a804357 Handle emergency console entry with no password
This happens on ARM disk images.

Signed-off-by: Adam Williamson <awilliam@redhat.com>
2020-10-28 15:21:03 -07:00
Adam Williamson 527cb63152 post_fail_hook: upload PostgreSQL init log if it exists
Why...why...WHY is this not in /var/log. Sheesh.

Signed-off-by: Adam Williamson <awilliam@redhat.com>
2020-08-25 16:18:32 -07:00
Adam Williamson f70b8ec943 post_fail_hook: don't try ctrl-c on serial console
Can't do that.

Signed-off-by: Adam Williamson <awilliam@redhat.com>
2020-07-07 17:35:32 -07:00
Adam Williamson ddea8e2169 post_fail_hook: hit ctrl-c if 'dnf -y install tar' gets stuck
This happens in the iSCSI install test.

Signed-off-by: Adam Williamson <awilliam@redhat.com>
2020-07-06 16:38:12 -07:00
Adam Williamson dc7b7a7241 Great Needle Cleanup 2020
Remove a bunch of needles that have not been used for some time,
plus a few workarounds that are similarly stale.

Signed-off-by: Adam Williamson <awilliam@redhat.com>
2020-03-20 14:02:10 -07:00
Adam Williamson 1eb4e3dca5 Add perl syntax check test, add it to CI
Inspired by openQA's 01-compile-check-all.t, this adds a perl
test which checks the syntax of main.pm and all lib and test
files, and hooks it up to CI. Requires os-autoinst and
perl-Test-Strict.

Signed-off-by: Adam Williamson <awilliam@redhat.com>
2020-02-13 15:28:09 -08:00
Adam Williamson 48b6c9d3e9 Change name we use for virtio serial consoles
There is nothing inherently 'root'-y about these so it makes no
sense to prefix their names with 'root-'. And why change from
'console' to 'terminal' compared to the naming used in the
actual qemu command and the log files? It's just confusing.
Let's be consistent (except for using - instead of _ here...
but - is easier to type!)

Signed-off-by: Adam Williamson <awilliam@redhat.com>
2019-11-20 08:39:28 -08:00
Lukáš Růžička b93a197c22 Enable Anaconda Text install via serial console.
This adds the Anaconda text installation test over
serial console and FIXES #115.
2019-11-19 22:54:55 -08:00
Adam Williamson 81387cc840 Revert "installedtest: try and fix log uploads for ostree installs"
This reverts commit 092d7dd9c3 and
the follow-up; they turn out not to be necessary or useful...
2019-08-16 11:29:53 -07:00
Adam Williamson fb439767f5 Fix ostree root
Signed-off-by: Adam Williamson <awilliam@redhat.com>
2019-08-16 11:00:15 -07:00
Adam Williamson 092d7dd9c3 installedtest: try and fix log uploads for ostree installs
The log files are all under the ostree deploy root, the 'real'
system root has nothing useful. Try and find the deploy root
and prepend it to all the relevant commands if we're a CANNED
install.

Signed-off-by: Adam Williamson <awilliam@redhat.com>
2019-08-16 09:05:26 -07:00
Adam Williamson 0d38d3f292 FreeIPA: get verbose logs from BIND
This was necessary for debugging the FreeIPA 4.8 pre-release
update bug, so let's have it for all runs, just in case.

Signed-off-by: Adam Williamson <awilliam@redhat.com>
2019-05-01 18:59:52 -07:00
Adam Williamson 2754deb28a Don't do live build postinstall steps on the install tests
It's not going to work there. Use the test name, not the flavor
name, to match on.

Signed-off-by: Adam Williamson <awilliam@redhat.com>
2019-04-12 18:54:36 -07:00
Adam Williamson 945a93aa05 Get more info on failures caused by Firefox certificate issues
We're getting an intermittent case where FreeIPA tests fail
because of a web server certificate issue. Click 'Advanced' in
Firefox when this happens so we can get a bit more info on the
problem.

Signed-off-by: Adam Williamson <awilliam@redhat.com>
2019-02-21 12:16:42 -08:00
Adam Williamson 99302c6fd4 Add a live image build test for updates
Just like the installer image build test, only...it builds a live
image. This involves reimplementing quite a chunk of the Koji
livemedia task. Ah, well. Also involves rethinking the flavor
names a bit here, these seem...better.

Signed-off-by: Adam Williamson <awilliam@redhat.com>
2019-02-07 18:28:24 -08:00
Adam Williamson 12affb145f Add update tests to build and test a netinst image
This adds a test which builds a netinst image potentially with
the package(s) from the update, and uploads that image. It also
adds a test which runs a default install using that image. This
is intended to check whether the update breaks the creation or
use of install images; particularly this will let us test
anaconda etc. updates. We also update the minimal disk image
name, as we have to make it bigger to accommodate this test,
and making it bigger changes its name - the actual change to
the disk image itself is in createhdds. We also have to redo a
bunch of installer needles for F28 fonts, after I removed them
a month or so back...

Signed-off-by: Adam Williamson <awilliam@redhat.com>
2019-01-18 08:24:44 -08:00
Adam Williamson 536f699013 Post fail: handle landing in dracut shell, upload rdsosreport
If a test fails to the dracut shell, we currently don't do
anything useful. This should recognize when that happens, and
upload rdsosreport.txt.

Signed-off-by: Adam Williamson <awilliam@redhat.com>
2019-01-16 11:24:06 -08:00
Adam Williamson 141f29c7cc fix syntax of advisory_check_nonmatching_packages in post-fail
You...can't do this like that.

Signed-off-by: Adam Williamson <awilliam@redhat.com>
2019-01-03 11:35:40 -08:00
Adam Williamson 4629e5b740 Fix console keyboard layout in installedtest post_fail_hook
This should fix log collection when a French or Japanese test
fails before the test itself would have done this.

Signed-off-by: Adam Williamson <awilliam@redhat.com>
2018-12-13 16:29:53 -08:00
Adam Williamson 12e103e3da Factor meat out of advisory_post and do it in postfail too
If an update test fails before reaching advisory_post, we don't
generate the 'what update packages were installed' and 'were
any update packages *not* installed when they should have been'
logs, but these may well be useful for diagnosing the failure -
so let's also do the same stuff there. Only let's not do it all
twice.

Signed-off-by: Adam Williamson <awilliam@redhat.com>
2018-12-12 22:17:29 -08:00
Adam Williamson 0639468de6 Use -l for systemctl status (to avoid ellipsization)
Signed-off-by: Adam Williamson <awilliam@redhat.com>
2018-12-12 12:36:21 -08:00
Adam Williamson cb035c7737 Still fixing up this serial logging stuff
Signed-off-by: Adam Williamson <awilliam@redhat.com>
2018-12-12 12:28:23 -08:00
Adam Williamson 95b227b97a Sigh, fix a syntax error in previous commit
Signed-off-by: Adam Williamson <awilliam@redhat.com>
2018-12-12 11:57:53 -08:00
Adam Williamson 0f5281f389 In post_fail_hooks, try using serial line if no network
Sometimes we get a test failing because the SUT isn't connecting
to the network for some reason. In this case we never get any
logs, because `upload_logs` relies on being able to reach at
least the worker host system via the network.

This attempts to detect when we can't ping the worker host, and
in that case, send some info out over the serial line instead.

Signed-off-by: Adam Williamson <awilliam@redhat.com>
2018-12-12 11:40:58 -08:00
Adam Williamson 23192baa5d Use better pattern for checking if there are coredumps
Signed-off-by: Adam Williamson <awilliam@redhat.com>
2018-04-04 14:13:39 -07:00
Adam Williamson cdf33cc2ae Catch and upload coredumps in post-fail hook
We were doing this in a post-install test, but not on failures.
We need it to figure out why Firefox is crashing on aarch64...

Signed-off-by: Adam Williamson <awilliam@redhat.com>
2018-04-04 12:10:33 -07:00
Adam Williamson 0915b857f9 More tweaking for this damn no root password spoke situation
Previous approach wouldn't work for tests that run after the
install test...let's just set a password from a chroot after
install completes. Don't really like this as it changes the
'real' install process a bit, but it's the least invasive short
term fix at least. We can maybe do something more sudo-y later
with a bit more thought.

Signed-off-by: Adam Williamson <awilliam@redhat.com>
2018-03-08 20:31:14 -08:00
Adam Williamson 83c32fe04e More tweaking for Workstation live scenario
It's really INSTALL_NO_USER, not USER_LOGIN='false'. Also, we
need to make root_console work with no root password, sigh.

Signed-off-by: Adam Williamson <awilliam@redhat.com>
2018-03-08 19:22:21 -08:00
Adam Williamson ba3a5152c1 Improve FreeIPA debug logging a bit
Committing without review as this is pretty trivial and I've
had it on staging for the last few days without issue. Just gets
us somewhat better info for debugging FreeIPA issues.
2017-03-16 12:36:33 -07:00
Adam Williamson 186678e98b Make log upload work when installed system hits emergency mode
Summary:
This is to handle cases like #1414904, where the system boots
to emergency mode. We really need logs to try and debug this.

Test Plan:
Force a test to hit emergency mode somehow (right now
you can just run base_services_start on Rawhide over and over
until you hit #1414904, but there's probably an easier way to
do it; I think there's a systemd boot arg to tell it which target
to boot to, for e.g.) and check logs get uploaded. Also check this
doesn't break log upload for a 'normal' failure.

Reviewers: garretraziel_but_actually_jsedlak_who_uses_stupid_nicknames

Reviewed By: garretraziel_but_actually_jsedlak_who_uses_stupid_nicknames

Subscribers: tflink

Differential Revision: https://phab.qa.fedoraproject.org/D1103
2017-02-01 12:30:21 +01:00
Adam Williamson b67f604894 Move all remaining utility functions into exporter modules
Summary:
This adds a couple of new exporter modules, renames main_common
to utils (this is a better name: openSUSE's main_common is
functions used in main.pm, utils is what they call their module
full of miscellaneous commonly-used functions), and moves a
bunch of utility functions that were previously needlessly
implemented as instance methods in base classes into the
exporter modules. That means we can get rid of all the annoying
$self-> syntax for calling them.

We get rid of `fedorabase` entirely, as it's no longer useful
for anything. Other base classes keep the 'standard' methods
(like `post_fail_hook`) and methods which actually need to be
methods (like `root_console`, whose behaviour is different in
anacondatest and installedtest).

Test Plan:
Do a full test suite run and check everything lines
up. There should be no functional differences from before at all,
this is just a re-org.

Reviewers: jskladan, garretraziel_but_actually_jsedlak_who_uses_stupid_nicknames

Reviewed By: garretraziel_but_actually_jsedlak_who_uses_stupid_nicknames

Subscribers: tflink

Differential Revision: https://phab.qa.fedoraproject.org/D1080
2017-01-17 23:15:44 -08:00
Adam Williamson e0418b3328 Update and unwrap README, move function docs in-line
The README looks pretty ugly on Pagure. So let's unwrap it.
Let's also move the function docs into the source files. We're
much more likely to keep them up to date that way, I think. We
should probably change over to proper perl POD documentation at
some point, but comments in-line are OK for now I think.
2017-01-12 14:27:42 -08:00
Adam Williamson 332a955814 disable updates as well as updates-testing in repo_setup
This should solve all those annoying "Failed to synchronize
cache for repo 'updates'" failures we've had: there's no need
for the 'updates' repository to be enabled when we've decided
we want the `repo_setup` changes to be made, and having it
enabled causes problems when we run right after the Rawhide
compose completes. We hit the awkward period where the rawhide
repo has been synced but mirrormanager has not been updated
with the new metadata checksums, so mirrormanager rejects the
metadata from dl.fp.o and DNF has to go out and hit other
mirrors until it finds one which didn't sync yet. Since the
point of `repo_setup` is specifically to hack up the config so
we only use packages from the compose *anyway*, there's no
reason at all to worry about leaving 'updates' enabled and
nerfing it like we do 'fedora' and 'rawhide', we can just turn
it off.
2016-12-15 16:11:37 -08:00
Adam Williamson c4edf8009e Improve and simplify post_fail_hook existence checks
Summary:
The current installedtest post_fail_hook assumes /var/tmp/abrt
exists at all, and dies if it doesn't, leading to no /var/log
upload. We can also avoid using openQA `script_output` - which
is annoyingly indirect and slow - by using this neat `test -n`
trick I found on SO. Let's also use it in the anacondatest
post_fail_hook to avoid uploading /var/tmp when it's empty
(which we currently do). This also drops the 0 arg from a few
more script_run calls, because it's safe to wait for the run
to complete and we should probably do so to avoid later typing
errors if the commands are slow.

Test Plan:
Cause both anaconda and installed tests to fail and
check the hooks work as intended. Maybe twiddle the failures to
ensure directories do and don't exist and/or have contents and
make sure things work OK. I've tested this to some degree and
I'm pretty sure it works right.

Reviewers: jskladan, garretraziel

Reviewed By: garretraziel

Subscribers: tflink

Differential Revision: https://phab.qadevel.cloud.fedoraproject.org/D1041
2016-10-31 11:41:10 -07:00
Adam Williamson 600dd39a64 install tar in installedtest post_fail_hook
It's not always in minimal installs. This is a simple change
and needed to make the post-fail hook work for minimal installs,
so pushing without review.
2016-10-27 17:08:27 -07:00
Adam Williamson dcb68d93c8 drop our implementation of script_run in favour of os-autoinst
Summary:
os-autoinst implements `script_run` itself now, we aren't
required to implement it ourselves any more. os-autoinst's
implementation is better than ours, as it allows for verifying
the script actually ran (via the redirect-output-to-serial-console
trick).

So this drops our implementation so we'll just use the upstream
one. Where I judged we don't want to bother with the 'check
the command actually ran' feature I've adjusted our direct
`script_run` calls to pass a wait time of 0, which skips the
'wait for command to run' stuff entirely and just does a simple
'type the string and hit enter'.

Because of how the inheritance works, our `assert_script_run`
calls already used the os-autoinst `script_run`, rather than
the one from our distribution.

This should prevent `prepare_test_packages` sometimes going
wrong right after removing the python3-kickstart package, as
we'll properly wait for that removal to complete now (before
we weren't, we'd just start typing the next command while it
was still running, which could result in lost keypresses).

Test Plan:
Check all tests still run OK (I've tried this on
staging and it seems fine).

Reviewers: jskladan, garretraziel

Reviewed By: garretraziel

Subscribers: tflink

Differential Revision: https://phab.qadevel.cloud.fedoraproject.org/D1034
2016-10-20 09:24:48 -07:00
Adam Williamson cf2ce903c5 drop cockpit workaround entirely
Now that cockpit 119 has hit F25, this is no longer needed (and
in fact it breaks the test).
2016-10-17 16:52:17 -07:00
Adam Williamson 660dde164f disable cockpit login workaround for Rawhide
Rawhide now has cockpit 119, where this is fixed.
2016-10-08 10:59:57 -07:00
Adam Williamson bacb6f1f7b redo console_login with multiple matches, move to main_common
Summary:
Since we can match on multiple needles, we can drop the loop
from console_login and instead do it this way, which is simpler
and should work better on ARM (the timeouts will scale and
allow ARM to be slow here). Also move it to main_common as
there's no logical reason for it to be a class method.

Also remove the `check` arg. `check` was only set to 0 by two
tests, _console_shutdown and anacondatest's _post_fail_hook.

For _console_shutdown, I think I just wanted to give it the
best possible chance of succeeding. But we're really not going
to lose anything significant by checking, the only case where
check=>0 would've helped is if the 'good' needle had stopped
matching, and all sorts of other tests will fail in that case.

anacondatest was only using it to save a screenshot of whatever
was on the tty if it didn't reach a root console, which doesn't
seem that useful, and we'll get screenshots from check_screen
and assert_screen anyway.

Test Plan:
Run all tests, check they behave as expected and
none inappropriately fails on console login.

Reviewers: jskladan, garretraziel

Reviewed By: garretraziel

Subscribers: tflink

Differential Revision: https://phab.qadevel.cloud.fedoraproject.org/D1016
2016-09-30 08:42:45 -07:00