======================
Getting data from Koji
======================

When Pungi is configured to get packages from a Koji tag, it needs some way to
access the actual RPM files.

Historically, this required the storage used by Koji to be directly available
on the host where Pungi was running. This was usually achieved by using NFS for
the Koji volume, and mounting it on the compose host.

The compose could be created directly on the same volume. In such a case the
packages would be hardlinked, significantly reducing space consumption.

The compose could also be created on different storage, in which case the
packages would need to be either copied over or symlinked. Using symlinks
requires that anything that accesses the compose (e.g. a download server) also
mounts the Koji volume in the same location.

Symlinks also carry the risk that the package in Koji can change (for example,
because it is re-signed), which would invalidate composes linking to it.

Using Koji without direct mount
===============================

It is now possible to run a compose from a Koji tag without direct access to
Koji storage.

Pungi can download the packages over HTTP, store them in a local cache, and
consume them from there.

The local cache has a structure similar to that of the Koji volume.

When Pungi needs a package, it knows the package's path on the Koji volume. It
replaces the ``topdir`` prefix of that path with the cache location. If such a
file exists, it is used. If it doesn't exist, it is downloaded from Koji, using
the URL obtained by replacing ``topdir`` with ``topurl``.

::

    Koji path  /mnt/koji/packages/foo/1/1.fc38/data/signed/abcdef/noarch/foo-1-1.fc38.noarch.rpm
    Koji URL   https://kojipkgs.fedoraproject.org/packages/foo/1/1.fc38/data/signed/abcdef/noarch/foo-1-1.fc38.noarch.rpm
    Local path /mnt/compose/cache/packages/foo/1/1.fc38/data/signed/abcdef/noarch/foo-1-1.fc38.noarch.rpm

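The path substitution can be sketched as a small Python function. This is an
illustration only: ``map_koji_path`` is a hypothetical helper, not part of
Pungi, and it assumes ``topdir``, ``topurl`` and the cache directory are given
as plain strings.

.. code-block:: python

    import os


    def map_koji_path(koji_path, topdir, topurl, cache_dir):
        """Map a path on the Koji volume to (local cache path, download URL).

        Hypothetical helper for illustration only; not Pungi's API.
        """
        topdir = topdir.rstrip("/")
        prefix = topdir + "/"
        if not koji_path.startswith(prefix):
            raise ValueError("%s is not under %s" % (koji_path, topdir))
        # The part after topdir is shared by the volume, the URL and the cache.
        relative = koji_path[len(prefix):]
        local_path = os.path.join(cache_dir, relative)
        url = topurl.rstrip("/") + "/" + relative
        return local_path, url

Called with ``topdir="/mnt/koji"``, ``topurl="https://kojipkgs.fedoraproject.org"``
and ``cache_dir="/mnt/compose/cache"``, it returns the local path and URL listed
in the example above.
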
The packages can then be hardlinked from this cache directory into the compose,
provided the cache and the compose are on the same filesystem.

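A cache lookup that downloads on a miss, followed by hardlinking into the
compose, might look roughly like the sketch below. It assumes a plain download
with ``urllib.request`` and uses hypothetical helpers ``ensure_cached`` and
``link_into_compose``; Pungi's own download code, locking and error handling are
not shown. Writing to a temporary file and renaming it with ``os.replace`` means
that readers never see a partially downloaded RPM.

.. code-block:: python

    import os
    import shutil
    import tempfile
    import urllib.request


    def ensure_cached(local_path, url):
        """Return local_path, downloading url into it first if it is missing.

        Hypothetical helper for illustration; not Pungi's implementation.
        """
        if os.path.exists(local_path):
            return local_path  # cache hit: nothing to download
        directory = os.path.dirname(local_path)
        os.makedirs(directory, exist_ok=True)
        # Download into a temporary file in the same directory, so the final
        # os.replace() is an atomic rename within one filesystem.
        fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".download-")
        try:
            with os.fdopen(fd, "wb") as tmp, urllib.request.urlopen(url) as resp:
                shutil.copyfileobj(resp, tmp)
            # mkstemp creates owner-only (0600) files; make it world-readable.
            os.chmod(tmp_path, 0o644)
            os.replace(tmp_path, local_path)
        except BaseException:
            os.unlink(tmp_path)
            raise
        return local_path


    def link_into_compose(local_path, compose_path):
        """Hardlink a cached package into the compose tree."""
        os.makedirs(os.path.dirname(compose_path), exist_ok=True)
        os.link(local_path, compose_path)
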
Cleanup
-------

While the approach above ensures that each RPM is downloaded only once, it will
eventually result in the Koji volume being mirrored locally, even though most
of those packages will no longer be needed.

The ``pungi-cache-cleanup`` script can help with that: it finds and removes
files from the cache that are no longer needed.

A file is considered no longer needed if it has a single link (meaning it
exists only in the cache, not in any compose) and its mtime is older than a
given threshold.

It doesn't make sense to delete files that are hardlinked into an existing
compose: the data stays on disk until its last link is removed, so deleting the
cache copy would not save any space.

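The two conditions can be expressed as a filter over the cache tree. This is a
hypothetical sketch of the criterion, not the code of ``pungi-cache-cleanup``;
the threshold is assumed to be in seconds, and ``find_stale_files``,
``cleanup`` and its ``dry_run`` parameter are illustrative names, not
documented options of the script.

.. code-block:: python

    import os
    import time


    def find_stale_files(cache_dir, max_age, now=None):
        """Yield cache files with a single link and an mtime older than max_age.

        Hypothetical sketch of the cleanup criterion; max_age is in seconds.
        """
        now = time.time() if now is None else now
        for dirpath, _dirnames, filenames in os.walk(cache_dir):
            for name in filenames:
                path = os.path.join(dirpath, name)
                st = os.lstat(path)
                # st_nlink == 1: no compose holds a hardlink to this file.
                # Old mtime: no compose has looked the file up recently.
                if st.st_nlink == 1 and now - st.st_mtime > max_age:
                    yield path


    def cleanup(cache_dir, max_age, dry_run=True):
        """Remove (or, with dry_run, only list) stale files; return their paths."""
        stale = list(find_stale_files(cache_dir, max_age))
        if not dry_run:
            for path in stale:
                os.unlink(path)
        return stale
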
The mtime check is meant to preserve files that were downloaded but are not
actually used in any compose, such as a subpackage that is not included in any
variant. Every time a file's existence in the local cache is checked, its mtime
is updated, so files that recent composes keep looking up stay within the
threshold.

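A lookup that refreshes the mtime can be a single ``os.utime`` call, which both
checks that the file exists and bumps its timestamp, with no window between the
two steps. ``is_cached`` is a hypothetical name used only for this sketch.

.. code-block:: python

    import os


    def is_cached(local_path):
        """Return True if local_path is in the cache, refreshing its mtime.

        Hypothetical helper; the refreshed mtime protects files that are
        looked up but never hardlinked from the age-based cleanup.
        """
        try:
            # With no times given, os.utime sets atime and mtime to "now".
            os.utime(local_path)
        except FileNotFoundError:
            return False
        return True
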
Race conditions?
----------------

It should be safe to have multiple compose hosts share the same storage volume
for generated composes and the local cache.

If a cache file already exists when it is accessed, there is no risk of a race
condition.

If two composes need the same file at the same time and it is not present yet,
one of them will take a lock on it and start downloading. The other will wait
until the download is finished.

The lock is only valid for a fixed amount of time (5 minutes). This prevents a
lock from blocking other composes forever if the downloading process is killed
in a way that stops it from releasing the lock.

If the file is large and the network is slow, this limit may not be enough to
finish downloading. In that case the second process will steal the lock while
the first process is still downloading, and the same file will be downloaded
twice.

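One way to implement such an expiring lock is an exclusively created lock file
whose mtime records when it was taken. The sketch below assumes that scheme; it
is hypothetical and not necessarily how Pungi implements its lock, and
``try_acquire``, ``acquire`` and ``release`` are illustrative names.

.. code-block:: python

    import os
    import time

    LOCK_TIMEOUT = 5 * 60  # seconds; the lock validity described above


    def try_acquire(lock_path, timeout=LOCK_TIMEOUT, now=None):
        """Try to take the lock; return True on success (fresh or stolen).

        Hypothetical lock-file scheme for illustration only.
        """
        now = time.time() if now is None else now
        try:
            # O_EXCL makes creation fail if another process holds the lock.
            fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
        except FileExistsError:
            try:
                expired = now - os.stat(lock_path).st_mtime >= timeout
                if expired:
                    # Expired lock: steal it by restamping it as ours.
                    os.utime(lock_path, (now, now))
            except FileNotFoundError:
                return False  # released meanwhile; the caller retries
            return expired
        os.close(fd)
        os.utime(lock_path, (now, now))
        return True


    def acquire(lock_path, poll_interval=1.0, timeout=LOCK_TIMEOUT):
        """Block until the lock is taken, either freshly or by stealing it."""
        while not try_acquire(lock_path, timeout):
            time.sleep(poll_interval)


    def release(lock_path):
        """Drop the lock; tolerate it having been removed already."""
        try:
            os.unlink(lock_path)
        except FileNotFoundError:
            pass

After ``acquire`` returns, a waiting process would typically check the cache
again before downloading, since the previous holder has most likely finished.
Stealing an expired lock is not atomic in this sketch, so two waiters may both
steal it; as described above, that only leads to a duplicate download.
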
When the first process finishes the download, it will put the file into the
local cache location. When the second process finishes, it will atomically
replace it; since both downloads have identical content, the cached file stays
the same.

If the first compose already managed to hardlink the file before it gets
replaced, there will be two copies of the file present locally: the compose
keeps a link to the first download, while the cache now holds the second one.

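The effect of the atomic replacement on an existing hardlink can be reproduced
in a temporary directory. In this sketch ``replace_atomically`` and
``simulate_double_download`` are hypothetical helpers: after the second
download is renamed over the cache entry, the compose and the cache refer to
two different inodes with identical content.

.. code-block:: python

    import os
    import tempfile


    def replace_atomically(path, data):
        """Write data to a temporary file next to path, then rename it over path."""
        fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path))
        with os.fdopen(fd, "wb") as f:
            f.write(data)
        os.replace(tmp, path)


    def simulate_double_download(workdir):
        cached = os.path.join(workdir, "foo.rpm")
        in_compose = os.path.join(workdir, "compose-foo.rpm")
        replace_atomically(cached, b"rpm data")  # first download lands in cache
        os.link(cached, in_compose)              # first compose hardlinks it
        replace_atomically(cached, b"rpm data")  # second download replaces it
        return (
            os.path.samefile(cached, in_compose),
            os.stat(cached).st_nlink,
            os.stat(in_compose).st_nlink,
        )


    with tempfile.TemporaryDirectory() as d:
        print(simulate_double_download(d))  # prints (False, 1, 1)
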
Caveats
-------

There is no integrity checking. Ideally, Koji would provide checksums for the
RPMs so that they could be verified after downloading. This is not yet
available.

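Should checksums become available, verification could be a simple digest
comparison. The sketch below is purely hypothetical: it assumes a SHA-256 hex
digest obtained from some source, which Koji does not currently provide, and
``sha256_of`` and ``verify_download`` are illustrative names.

.. code-block:: python

    import hashlib


    def sha256_of(path, chunk_size=1024 * 1024):
        """Compute the SHA-256 hex digest of a file, reading it in chunks."""
        digest = hashlib.sha256()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(chunk_size), b""):
                digest.update(chunk)
        return digest.hexdigest()


    def verify_download(path, expected_sha256):
        """Raise ValueError if the file does not match the expected digest."""
        actual = sha256_of(path)
        if actual != expected_sha256.lower():
            raise ValueError(
                "checksum mismatch for %s: %s != %s" % (path, actual, expected_sha256)
            )

Running such a check on the temporary download before it is renamed into the
cache would keep a corrupt file from ever becoming visible to composes.
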