Backport #1796 to speed up compose some more
(cherry picked from commit 130e003364be879f91c3716f9e059f319376b89c)
This commit is contained in:
parent
3ec9bd0413
commit
cc5b039197
60
1796.patch
Normal file
60
1796.patch
Normal file
@ -0,0 +1,60 @@
|
|||||||
|
From c8fe99b1aa5a9a9b941b7515cda367d24829dedf Mon Sep 17 00:00:00 2001
|
||||||
|
From: Adam Williamson <awilliam@redhat.com>
|
||||||
|
Date: Nov 19 2024 08:22:09 +0000
|
||||||
|
Subject: pkgset: optimize cache check (saves 20 minutes)
|
||||||
|
|
||||||
|
|
||||||
|
The pkgset phase takes around 35 minutes in current composes.
|
||||||
|
Around 20 minutes of that is spent creating these per-arch
|
||||||
|
subsets of the global package set. In a rather roundabout way
|
||||||
|
(see #1794 ), I figured out that almost all of this time is
|
||||||
|
spent in this cache check, which is broken for a subtle reason.
|
||||||
|
|
||||||
|
Python's `in` keyword works by first attempting to call the
|
||||||
|
container's magic `__contains__` method. If the container does
|
||||||
|
not implement `__contains__`, it falls back to iteration - it
|
||||||
|
tries to iterate over the container until it either hits what
|
||||||
|
it's looking for, or runs out. (If the container implements
|
||||||
|
neither, you get an error).
|
||||||
|
|
||||||
|
The FileCache instance's `file_cache` is a plain Python dict.
|
||||||
|
dicts have a very efficient `__contains__` implementation, so
|
||||||
|
doing `foo in (somedict)` is basically always very fast no matter
|
||||||
|
how huge the dict is. FileCache itself, though, implements
|
||||||
|
`__iter__` by returning an iterator over the `file_cache` dict's
|
||||||
|
keys, but it does *not* implement `__contains__`. So when we do
|
||||||
|
`foo in self.file_cache`, Python has to iterate over every key
|
||||||
|
in the dict until it hits foo or runs out. This is massively
|
||||||
|
slower than `foo in self.file_cache.file_cache`, which uses the
|
||||||
|
efficient `__contains__` method.
|
||||||
|
|
||||||
|
Because these package sets are so huge, and we're looping over
|
||||||
|
*one* huge set and checking each package from it against the cache
|
||||||
|
of another, increasingly huge, set, this effect becomes massive.
|
||||||
|
To make it even worse, I ran a few tests where I added a debug log
|
||||||
|
if we ever hit the cache, and it looks like we never actually do -
|
||||||
|
so every check has to iterate through the entire dict.
|
||||||
|
|
||||||
|
We could probably remove this entirely, but changing it to check
|
||||||
|
the dict instead of the FileCache instance makes it just about as
|
||||||
|
fast as taking it out, so I figured let's go with that in case
|
||||||
|
there's some unusual scenario in which the cache does work here.
|
||||||
|
|
||||||
|
Signed-off-by: Adam Williamson <awilliam@redhat.com>
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
diff --git a/pungi/phases/pkgset/pkgsets.py b/pungi/phases/pkgset/pkgsets.py
|
||||||
|
index 81c090c..ee12c3c 100644
|
||||||
|
--- a/pungi/phases/pkgset/pkgsets.py
|
||||||
|
+++ b/pungi/phases/pkgset/pkgsets.py
|
||||||
|
@@ -265,7 +265,7 @@ class PackageSetBase(kobo.log.LoggingBase):
|
||||||
|
for arch in arch_list:
|
||||||
|
self.rpms_by_arch.setdefault(arch, [])
|
||||||
|
for i in other.rpms_by_arch.get(arch, []):
|
||||||
|
- if i.file_path in self.file_cache:
|
||||||
|
+ if i.file_path in self.file_cache.file_cache:
|
||||||
|
# TODO: test if it really works
|
||||||
|
continue
|
||||||
|
if inherit_to_noarch and exclusivearch_list and arch == "noarch":
|
||||||
|
|
@ -13,6 +13,7 @@ Patch: https://pagure.io/pungi/pull-request/1782.patch
|
|||||||
Patch: https://pagure.io/pungi/pull-request/1788.patch
|
Patch: https://pagure.io/pungi/pull-request/1788.patch
|
||||||
Patch: https://pagure.io/pungi/pull-request/1789.patch
|
Patch: https://pagure.io/pungi/pull-request/1789.patch
|
||||||
Patch: https://pagure.io/pungi/pull-request/1790.patch
|
Patch: https://pagure.io/pungi/pull-request/1790.patch
|
||||||
|
Patch: https://pagure.io/pungi/pull-request/1796.patch
|
||||||
|
|
||||||
BuildRequires: make
|
BuildRequires: make
|
||||||
BuildRequires: python3-pytest
|
BuildRequires: python3-pytest
|
||||||
@ -179,6 +180,9 @@ rm %{buildroot}%{_bindir}/pungi
|
|||||||
%{_bindir}/%{name}-cache-cleanup
|
%{_bindir}/%{name}-cache-cleanup
|
||||||
|
|
||||||
%changelog
|
%changelog
|
||||||
|
* Tue Nov 19 2024 Adam Williamson <awilliam@redhat.com> - 4.7.0-7
|
||||||
|
- Backport #1796 to speed up compose some more
|
||||||
|
|
||||||
* Thu Nov 14 2024 Adam Williamson <awilliam@redhat.com> - 4.7.0-6
|
* Thu Nov 14 2024 Adam Williamson <awilliam@redhat.com> - 4.7.0-6
|
||||||
- Rebuild with no changes to bump past release used in infra tag
|
- Rebuild with no changes to bump past release used in infra tag
|
||||||
|
|
||||||
|
Loading…
Reference in New Issue
Block a user