pungi/doc/koji.rst
Lubomír Sedlář b625ccea06
Add integrity checking for builds
When a real build is downloaded, Koji can provide a checksum via API.
This commit adds verification of that checksum.

A mismatch will abort the compose. If Koji doesn't provide a checksum
for the particular sigkey, no checking will happen.

Nothing is still checked for scratch builds and images.

This patch requires Koji 1.32. When talking to an older version, there
is no checking done.

Signed-off-by: Lubomír Sedlář <lsedlar@redhat.com>
(cherry picked from commit 77f8fa25ad)
2023-11-10 16:55:44 +02:00


======================
Getting data from koji
======================
When Pungi is configured to get packages from a Koji tag, it needs a way to
access the actual RPM files.
Historically, this required the storage used by Koji to be directly available
on the host where Pungi was running. This was usually achieved by using NFS for
the Koji volume, and mounting it on the compose host.
The compose could be created directly on the same volume. In that case the
packages would be hardlinked, significantly reducing space consumption.
The compose could also be created on different storage, in which case the
packages would need to be either copied over or symlinked. Using symlinks
requires anything that accesses the compose (e.g. a download server) to also
mount the Koji volume in the same location.
There is also a risk with symlinks that the package in Koji can change (for
example by being re-signed), which would invalidate composes linking to it.
Using Koji without direct mount
===============================
It is now possible to run a compose from a Koji tag without direct access to
Koji storage.
Pungi can download the packages over HTTP, store them in a local cache, and
consume them from there.
The local cache has a structure similar to what is on the Koji volume.
When Pungi needs a package, it starts from the package's path on the Koji
volume and replaces the ``topdir`` with the cache location. If that file
exists, it will be used. If it doesn't exist, it will be downloaded from Koji
(by replacing the ``topdir`` with ``topurl``).

::

    Koji path   /mnt/koji/packages/foo/1/1.fc38/data/signed/abcdef/noarch/foo-1-1.fc38.noarch.rpm
    Koji URL    https://kojipkgs.fedoraproject.org/packages/foo/1/1.fc38/data/signed/abcdef/noarch/foo-1-1.fc38.noarch.rpm
    Local path  /mnt/compose/cache/packages/foo/1/1.fc38/data/signed/abcdef/noarch/foo-1-1.fc38.noarch.rpm

The packages can be hardlinked from this cache directory.
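The path substitution described above can be sketched in a few lines of
Python. This is only an illustration, not Pungi's actual code: the function
name ``cache_path_and_url`` and its parameters are hypothetical, and POSIX
paths are assumed.

.. code-block:: python

    import os


    def cache_path_and_url(koji_path, topdir, cache_dir, topurl):
        """Map a path on the Koji volume to a cache path and a download URL.

        Hypothetical helper for illustration; it is not part of Pungi.
        """
        relative = os.path.relpath(koji_path, topdir)
        if relative == os.pardir or relative.startswith(os.pardir + os.sep):
            raise ValueError("%s is not under %s" % (koji_path, topdir))
        # Same relative layout, different root: the local cache directory...
        local_path = os.path.join(cache_dir, relative)
        # ...or the HTTP location the file is downloaded from when missing.
        url = topurl.rstrip("/") + "/" + relative.replace(os.sep, "/")
        return local_path, url

Called with ``topdir="/mnt/koji"``, ``cache_dir="/mnt/compose/cache"`` and
``topurl="https://kojipkgs.fedoraproject.org"``, the Koji path from the example
above maps to the local path and Koji URL shown there.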
Cleanup
-------
While the approach above allows each RPM to be downloaded only once, it will
eventually result in the Koji volume being mirrored locally. Most of the
packages will however no longer be needed.
There is a script ``pungi-cache-cleanup`` that can help with that. It can find
and remove files from the cache that are no longer needed.
A file is no longer needed if it has a single link (meaning it is only in the
cache, not in any compose), and it has mtime older than a given threshold.
It doesn't make sense to delete files that are hardlinked in an existing
compose as it would not save any space anyway.
The mtime check is meant to preserve files that are downloaded but not actually
used in a compose, like a subpackage that is not included in any variant. Every
time its existence in the local cache is checked, the mtime is updated.
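The two conditions above (a single link and an old mtime) can be expressed
roughly as follows. This is a minimal sketch and not the implementation of
``pungi-cache-cleanup``; the function name ``find_unused_files`` and its
parameters are assumptions made for the example.

.. code-block:: python

    import os
    import time


    def find_unused_files(cache_dir, max_age_seconds, now=None):
        """Yield cache files that look safe to delete.

        Hypothetical helper: a file qualifies when nothing else hardlinks
        it (link count of 1) and its mtime is older than the threshold.
        """
        now = time.time() if now is None else now
        for dirpath, _dirnames, filenames in os.walk(cache_dir):
            for name in filenames:
                path = os.path.join(dirpath, name)
                st = os.stat(path)
                if st.st_nlink == 1 and now - st.st_mtime > max_age_seconds:
                    yield path

A caller would pass each yielded path to ``os.unlink``. Files hardlinked into
a compose have a link count above 1 and are skipped regardless of age.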
Race conditions?
----------------
It should be safe to have multiple compose hosts share the same storage volume
for generated composes and local cache.
If a cache file is accessed and it exists, there's no risk of a race condition.
If two composes need the same file at the same time and it is not present yet,
one of them will take a lock on it and start downloading. The other will wait
until the download is finished.
The lock is only valid for a set amount of time (5 minutes) to avoid issues
where the downloading process is killed in a way that blocks it from releasing
the lock.
If the file is large and the network is slow, the limit may not be enough to
finish the download. In that case the second process will steal the lock while
the first process is still downloading. This will result in the same file being
downloaded twice.
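A lock with an expiry of this kind can be sketched with a lock file whose
mtime records when it was taken. This is an illustration under stated
assumptions, not Pungi's actual locking code: the name ``try_acquire_lock``,
the lock-file approach and the ``max_age`` parameter are all hypothetical.
Only the 5 minute limit comes from the text above.

.. code-block:: python

    import os
    import time


    def try_acquire_lock(lock_path, max_age=300, now=None):
        """Try to take a lock; steal it if the holder is older than max_age.

        Hypothetical sketch. Returns True when the caller now holds the lock.
        """
        now = time.time() if now is None else now
        try:
            # O_EXCL makes creation fail if another process holds the lock.
            fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
        except FileExistsError:
            try:
                age = now - os.stat(lock_path).st_mtime
            except FileNotFoundError:
                return False  # released in the meantime; caller can retry
            if age <= max_age:
                return False  # still valid; caller waits and retries
            # Expired: take over the lock by refreshing its timestamp.
            os.utime(lock_path, (now, now))
            return True
        os.close(fd)
        return True

Stealing an expired lock does not stop the original holder, which is why a
slow download can end up running twice.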
When the first process finishes the download, it will put the file into the
local cache location. When the second process finishes, it will atomically
replace it. Since both downloads have the same content, the file in the cache
is effectively unchanged.
If the first compose already managed to hardlink the file before it gets
replaced, there will be two copies of the file present locally.
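Atomic replacement is commonly done by writing to a temporary file in the
destination directory and renaming it over the final path. The sketch below
shows that general technique; the helper name ``store_atomically`` is
hypothetical, and it is an assumption that Pungi does it this way.

.. code-block:: python

    import os
    import tempfile


    def store_atomically(chunks, dest):
        """Write an iterable of byte chunks to dest via an atomic rename.

        Hypothetical helper illustrating the technique, not Pungi's code.
        """
        dest_dir = os.path.dirname(dest)
        os.makedirs(dest_dir, exist_ok=True)
        # The temporary file is on the same filesystem, so the rename
        # below is atomic.
        fd, tmp_path = tempfile.mkstemp(dir=dest_dir, prefix=".download-")
        try:
            with os.fdopen(fd, "wb") as f:
                for chunk in chunks:
                    f.write(chunk)
            # Readers see either the old file or the new one, never a
            # partially written file.
            os.replace(tmp_path, dest)
        except BaseException:
            os.unlink(tmp_path)
            raise

Because the rename installs a new inode, a hardlink created from the earlier
file keeps pointing at the old inode. That is the source of the two local
copies mentioned above.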
Integrity checking
------------------
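There is minimal integrity checking. RPM packages belonging to real builds are
checked against the checksum provided by the Koji hub. A mismatch aborts the
compose. If Koji doesn't provide a checksum for the particular sigkey, or the
hub is older than Koji 1.32, no check is performed.
There is no checking for scratch builds or any images.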
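Comparing a downloaded file against a known checksum can be sketched as
follows. The function name ``verify_checksum`` and the default algorithm are
assumptions made for this example. How the expected value is obtained from the
Koji hub is not shown, and this is not Pungi's actual implementation.

.. code-block:: python

    import hashlib


    def verify_checksum(path, expected_hex, algorithm="sha256"):
        """Raise RuntimeError if the file's digest differs from expected_hex.

        Hypothetical helper; the algorithm is an assumed default.
        """
        digest = hashlib.new(algorithm)
        with open(path, "rb") as f:
            # Read in blocks so that large RPMs are not loaded into memory.
            for block in iter(lambda: f.read(1024 * 1024), b""):
                digest.update(block)
        actual = digest.hexdigest()
        if actual != expected_hex.lower():
            raise RuntimeError(
                "Checksum mismatch for %s: expected %s, got %s"
                % (path, expected_hex, actual)
            )

Raising an exception matches the behaviour of stopping on a mismatch rather
than continuing with a possibly corrupted file.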