b625ccea06
When a real build is downloaded, Koji can provide a checksum via API.
This commit adds verification of that checksum.
A mismatch will abort the compose. If Koji doesn't provide a checksum
for the particular sigkey, no checking will happen.
Nothing is still checked for scratch builds and images.
This patch requires Koji 1.32. When talking to an older version, there
is no checking done.
Signed-off-by: Lubomír Sedlář <lsedlar@redhat.com>
(cherry picked from commit 77f8fa25ad)
106 lines
4.2 KiB
ReStructuredText

======================
Getting data from koji
======================

When Pungi is configured to get packages from a Koji tag, it needs access to
the actual RPM files.

Historically, this required the storage used by Koji to be directly available
on the host where Pungi was running. This was usually achieved by using NFS for
the Koji volume, and mounting it on the compose host.

The compose could be created directly on the same volume. In that case the
packages would be hardlinked, significantly reducing space consumption.

The compose could also be created on different storage, in which case the
packages would either need to be copied over or symlinked. Using symlinks
requires that anything that accesses the compose (e.g. a download server) also
mounts the Koji volume in the same location.

There is also a risk with symlinks that the package in Koji can change (for
example by being re-signed), which would invalidate composes linking to it.


Using Koji without direct mount
===============================

It is now possible to run a compose from a Koji tag without direct access to
Koji storage.

Pungi can download the packages over HTTP, store them in a local cache, and
consume them from there.

The local cache has a similar structure to what is on the Koji volume.

When Pungi needs a package, it starts from the package's path on the Koji
volume. It replaces the ``topdir`` prefix with the cache location. If that file
exists, it is used. If it doesn't exist, it is downloaded from Koji (by
replacing the ``topdir`` with ``topurl``).

::

    Koji path   /mnt/koji/packages/foo/1/1.fc38/data/signed/abcdef/noarch/foo-1-1.fc38.noarch.rpm
    Koji URL    https://kojipkgs.fedoraproject.org/packages/foo/1/1.fc38/data/signed/abcdef/noarch/foo-1-1.fc38.noarch.rpm
    Local path  /mnt/compose/cache/packages/foo/1/1.fc38/data/signed/abcdef/noarch/foo-1-1.fc38.noarch.rpm
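
The substitution shown above can be sketched in Python as follows. This is
only an illustration, not Pungi's actual code: the function names
``to_cache_path`` and ``to_download_url`` are hypothetical, and the values of
``topdir``, ``topurl`` and the cache directory are taken from the example
paths.

.. code-block:: python

    from pathlib import PurePosixPath


    def to_cache_path(koji_path, topdir, cache_dir):
        """Replace the ``topdir`` prefix with the cache location (hypothetical helper)."""
        relative = PurePosixPath(koji_path).relative_to(topdir)
        return str(PurePosixPath(cache_dir) / relative)


    def to_download_url(koji_path, topdir, topurl):
        """Replace the ``topdir`` prefix with ``topurl`` (hypothetical helper)."""
        relative = PurePosixPath(koji_path).relative_to(topdir)
        return topurl.rstrip("/") + "/" + str(relative)


    koji_path = (
        "/mnt/koji/packages/foo/1/1.fc38/data/signed/abcdef/noarch/"
        "foo-1-1.fc38.noarch.rpm"
    )
    local_path = to_cache_path(koji_path, "/mnt/koji", "/mnt/compose/cache")
    url = to_download_url(
        koji_path, "/mnt/koji", "https://kojipkgs.fedoraproject.org"
    )

With these inputs, ``local_path`` and ``url`` match the "Local path" and
"Koji URL" lines of the example.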

The packages can be hardlinked from this cache directory.


Cleanup
-------

While the approach above allows each RPM to be downloaded only once, it will
eventually result in the Koji volume being mirrored locally. Most of those
packages will, however, no longer be needed.

The ``pungi-cache-cleanup`` script can help with that. It can find and remove
files from the cache that are no longer needed.

A file is no longer needed if it has a single link (meaning it is only in the
cache, not in any compose), and its mtime is older than a given threshold.

It doesn't make sense to delete files that are hardlinked in an existing
compose, as it would not save any space anyway.

The mtime check is meant to preserve files that are downloaded but not actually
used in a compose, like a subpackage that is not included in any variant. Every
time a file's existence in the local cache is checked, its mtime is updated.
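
The two criteria can be expressed as a short Python check. This is a sketch of
the rule as described here, not the code of ``pungi-cache-cleanup``; the
function names ``is_removable`` and ``mark_checked`` and the
``max_age_seconds`` parameter are hypothetical.

.. code-block:: python

    import os
    import time


    def is_removable(path, max_age_seconds):
        """Return True if the cached file looks unused (hypothetical helper).

        A file qualifies when it has a single link (it is not hardlinked into
        any compose) and its mtime is older than the threshold.
        """
        st = os.stat(path)
        age = time.time() - st.st_mtime
        return st.st_nlink == 1 and age > max_age_seconds


    def mark_checked(path):
        """Refresh the mtime, as happens whenever the cache entry is checked."""
        os.utime(path, None)

A file that was recently looked up in the cache has a fresh mtime, so it
survives cleanup even if no compose links to it yet.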


Race conditions?
----------------

It should be safe to have multiple compose hosts share the same storage volume
for generated composes and the local cache.

If a cache file is accessed and it exists, there's no risk of a race condition.

If two composes need the same file at the same time and it is not present yet,
one of them will take a lock on it and start downloading. The other will wait
until the download is finished.

The lock is only valid for a set amount of time (5 minutes) to avoid issues
where the downloading process is killed in a way that blocks it from releasing
the lock.
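
A lock with an expiry time could look roughly like the sketch below. It uses a
lock file whose mtime records when the lock was taken. This is an illustration
of the idea only: the lock-file approach, the names ``try_acquire`` and
``release``, and the ``LOCK_TIMEOUT`` constant are assumptions, and Pungi's
real locking mechanism may work differently.

.. code-block:: python

    import os
    import time

    LOCK_TIMEOUT = 5 * 60  # seconds; matches the 5 minute validity described


    def try_acquire(lock_path, timeout=LOCK_TIMEOUT):
        """Try to take the lock; return True on success (hypothetical helper)."""
        try:
            # O_EXCL makes creation fail if another process holds the lock.
            fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
        except FileExistsError:
            try:
                age = time.time() - os.stat(lock_path).st_mtime
            except FileNotFoundError:
                return False  # the holder just released it; caller can retry
            if age <= timeout:
                return False  # lock is still valid, keep waiting
            # The lock expired: steal it by refreshing its timestamp.
            # Two stealers could race here; a sketch does not guard against it.
            os.utime(lock_path, None)
            return True
        os.close(fd)
        return True


    def release(lock_path):
        """Drop the lock; ignore the case where it is already gone."""
        try:
            os.unlink(lock_path)
        except FileNotFoundError:
            pass

A waiting process would call ``try_acquire`` in a loop with a short sleep and
check whether the cached file has appeared in the meantime.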

If the file is large and the network is slow, the limit may not be enough to
finish downloading. In that case the second process will steal the lock while
the first process is still downloading. This will result in the same file being
downloaded twice.

When the first process finishes the download, it will put the file into the
local cache location. When the second process finishes, it will atomically
replace it, but since both downloads have the same content, the cached file is
effectively unchanged.

If the first compose already managed to hardlink the file before it gets
replaced, there will be two copies of the file present locally.
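
The following Python sketch shows one common way to get an atomic replacement
and why an earlier hardlink ends up as a separate copy. It assumes the
downloaded content is already in memory as ``data``; the function name
``store_atomically`` is hypothetical and this is not Pungi's actual code.

.. code-block:: python

    import os
    import tempfile


    def store_atomically(data, dest):
        """Write ``data`` to a temporary file, then rename it over ``dest``."""
        dest_dir = os.path.dirname(dest) or "."
        os.makedirs(dest_dir, exist_ok=True)
        # The temporary file lives in the same directory so the rename stays
        # on one filesystem and is therefore atomic.
        fd, tmp_path = tempfile.mkstemp(dir=dest_dir, prefix=".download-")
        try:
            with os.fdopen(fd, "wb") as f:
                f.write(data)
            os.replace(tmp_path, dest)
        except BaseException:
            if os.path.exists(tmp_path):
                os.unlink(tmp_path)
            raise

Because ``os.replace`` points the name at a new inode, a hardlink created from
the first download keeps referring to the old inode. Both files have the same
content, but they no longer share storage.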


Integrity checking
------------------

There is minimal integrity checking. RPM packages belonging to real builds will
be checked against the checksum provided by the Koji hub.

There is no checking for scratch builds or any images.
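
A checksum comparison of this kind can be sketched as below. The way the
expected value is obtained from the Koji hub is deliberately left out; the
function name ``verify_checksum``, the ``ChecksumMismatch`` exception and the
choice of SHA-256 as the default algorithm are assumptions made for the
example. Passing ``None`` models the case where Koji provides no checksum, in
which case nothing is checked.

.. code-block:: python

    import hashlib


    class ChecksumMismatch(Exception):
        """Raised when a downloaded file does not match the expected checksum."""


    def verify_checksum(path, expected, algorithm="sha256"):
        """Return True if verified, False if there was nothing to check.

        ``expected`` is a hex digest, or None when no checksum is available.
        The default algorithm is an assumption for this sketch.
        """
        if expected is None:
            return False
        digest = hashlib.new(algorithm)
        with open(path, "rb") as f:
            # Read in chunks so large RPMs do not need to fit in memory.
            for chunk in iter(lambda: f.read(1024 * 1024), b""):
                digest.update(chunk)
        if digest.hexdigest() != expected.lower():
            raise ChecksumMismatch(
                "%s: expected %s, got %s" % (path, expected, digest.hexdigest())
            )
        return True

A caller that treats ``ChecksumMismatch`` as fatal gets the behaviour where a
mismatch aborts the compose.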